Overview
Daemons in a Hadoop cluster:
HDFS:
NameNode (NN)
SecondaryNameNode (SNN)
DataNode (DN)
Hadoop cluster scale:
a few dozen nodes -- entry level
a few hundred nodes -- typical; enough to exercise Hadoop's real capabilities
DataNodes have their own replication mechanism, so there is no need to organize the disks into RAID;
with a large number of disks, a JBOD layout that simply strings the disks together into a single device is sufficient;
YARN:
ResourceManager
NodeManager
The master node drives the slave nodes over SSH, using key-based authentication
Users that run the cluster:
hdfs user:
passwordless login to all slave nodes and the secondary namenode
yarn user:
passwordless login to the resourcemanager and nodemanager nodes
The NN and the SNN (auxiliary node) should not be placed on the same machine
At least 3 datanode nodes
The resourcemanager runs on a dedicated machine
datanode + nodemanager are colocated on the same nodes
(In this lab setup, node1 hosts the NN, SNN and ResourceManager together, while node2/3/4 each run a DataNode plus NodeManager.)
Preparation
CentOS 7.5 1804
4 machines, running under VMware
NAT network: 192.168.25.x
Host-only network: 192.168.50.x
Hostnames of the 4 hosts:
node1.fgq.com
node2.fgq.com
node3.fgq.com
node4.fgq.com
Time synchronization on all 4 hosts
[root@node1 ~]# crontab -e
*/5 * * * * ntpdate time3.aliyun.com && hwclock -w
Hostname resolution on all 4 hosts
[root@node1 ~]# cat /etc/hosts
192.168.25.11 node1.fgq.com node1 master
192.168.25.12 node2.fgq.com node2
192.168.25.13 node3.fgq.com node3
192.168.25.14 node4.fgq.com node4
192.168.25.15 node5.fgq.com node5
Install base packages on all 4 hosts
[root@node1 ~]# yum -y install vim lrzsz ntpdate wget net-tools lsof
[root@node1 ~]# yum -y install epel-release-latest-7.noarch.rpm
Disable the firewall on all 4 hosts
[root@node1 ~]# systemctl stop firewalld
[root@node1 ~]# systemctl disable firewalld
Disable SELinux on all 4 hosts
[root@node1 ~]# vi /etc/selinux/config
SELINUX=disabled
[root@node1 ~]# setenforce 0
[root@node1 ~]# getenforce
Configuration
Perform on all 4 nodes; node1 is used as the example
[root@node1 ~]# mkdir -p /fgq/base-env/
[root@node1 ~]# cd /fgq/base-env/
Download the JDK package jdk-8u152-linux-x64.tar.gz
Download the Hadoop package hadoop-2.9.2.tar.gz
Upload both to this directory and unpack them
[root@node1 base-env]# tar zxf jdk-8u152-linux-x64.tar.gz
[root@node1 base-env]# tar zxf hadoop-2.9.2.tar.gz
[root@node1 base-env]# ln -s jdk1.8.0_152 jdk
[root@node1 base-env]# ln -s hadoop-2.9.2 hadoop
[root@node1 ~]# vim /etc/profile
Append the following at the bottom:
export JAVA_HOME=/fgq/base-env/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
[root@node1 ~]# source /etc/profile
[root@node1 ~]# java -version
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
[root@node1 ~]# vim /etc/profile.d/hadoop.sh
export HADOOP_HOME=/fgq/base-env/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
[root@node1 ~]# source /etc/profile.d/hadoop.sh
[root@node1 ~]# vim /fgq/base-env/hadoop/etc/hadoop/hadoop-env.sh
Remove this line: export JAVA_HOME=${JAVA_HOME}
Add the following:
export JAVA_HOME=/fgq/base-env/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
[root@node1 ~]# cd /fgq/base-env/hadoop
[root@node1 hadoop]# useradd hadoop
[root@node1 hadoop]# echo 'fgqhadoop' | passwd --stdin hadoop
Passwordless SSH login
node1 only
[root@node1 hadoop]# su - hadoop
[hadoop@node1 ~]$ ssh-keygen -t rsa -P ''
Just press Enter; the key pair is generated with no passphrase
Copy the generated public key id_rsa.pub to the other machines; note that the command is ssh-copy-id, not scp
[hadoop@node1 ~]$ for i in {2..4};do ssh-copy-id -i .ssh/id_rsa.pub hadoop@node$i;done
Enter the hadoop user's password when prompted
Test that passwordless login works
[hadoop@node1 ~]$ for i in {2..4};do ssh node$i 'date';done
Mon Mar 4 11:43:24 CST 2019
Mon Mar 4 11:43:25 CST 2019
Mon Mar 4 11:43:25 CST 2019
Create the data directories
Perform on all 4 nodes; node1 is used as the example
[root@node1 hadoop]# mkdir -p /fgq/data/hadoop/hdfs/{nn,snn,dn}
On node1 only nn and snn are used
On node2/3/4 only dn is used
[root@node1 hadoop]# chown -R hadoop:hadoop /fgq/data/hadoop/hdfs
[root@node1 hadoop]# ll /fgq/data/hadoop/hdfs
[root@node1 hadoop]# mkdir logs
[root@node1 hadoop]# chmod g+w logs    # make sure the group has write permission on logs
[root@node1 hadoop]# chown -R hadoop:hadoop logs
[root@node1 hadoop]# chown -R hadoop:hadoop ./*
[root@node1 hadoop]# ll
[root@node1 hadoop]# cd etc/hadoop/
[root@node1 hadoop]# vim core-site.xml
This defines the HDFS access endpoint
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
<final>true</final>
</property>
</configuration>
[root@node1 hadoop]# vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
</configuration>
[root@node1 hadoop]# vim hdfs-site.xml
3 replicas are recommended, but usable capacity then drops to one third of the raw space; this small lab cluster uses 2
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///fgq/data/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///fgq/data/hadoop/hdfs/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///fgq/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///fgq/data/hadoop/hdfs/snn</value>
</property>
</configuration>
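Note that dfs.replication is only the default applied to newly written files; files already in HDFS keep their replication factor unless it is changed explicitly. A minimal sketch (runnable once HDFS is up, using the /test/fstab file uploaded later in this walkthrough):
[hadoop@node1 ~]$ hdfs dfs -setrep -w 3 /test/fstab    # raise the replication of an existing file to 3 and wait until re-replication finishes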
[root@node1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@node1 hadoop]# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@node1 hadoop]# vim slaves
node2
node3
node4
Format and start
[root@node1 hadoop]# su - hadoop
## Format the NameNode
[hadoop@node1 ~]$ hdfs namenode -format
19/03/04 15:07:10 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = node1.fgq.com/192.168.25.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.2
... ...
19/03/04 15:07:11 INFO common.Storage: Storage directory /fgq/data/hadoop/hdfs/nn has been successfully formatted.
19/03/04 15:07:11 INFO namenode.FSImageFormatProtobuf: Saving image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
19/03/04 15:07:11 INFO namenode.FSImageFormatProtobuf: Image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 324 bytes saved in 0 seconds .
19/03/04 15:07:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/03/04 15:07:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1.fgq.com/192.168.25.11
************************************************************/
The word "successfully" in the output indicates that formatting worked
[hadoop@node1 ~]$ ls /fgq/data/hadoop/hdfs/nn/current/
fsimage_0000000000000000000 fsimage_0000000000000000000.md5 seen_txid VERSION
## Start HDFS
[hadoop@node1 ~]$ start-dfs.sh
Starting namenodes on [master]
hadoop@master's password:
master: starting namenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hadoop-namenode-node1.fgq.com.out
node2: starting datanode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hadoop-datanode-node2.fgq.com.out
node3: starting datanode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hadoop-datanode-node3.fgq.com.out
node4: starting datanode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hadoop-datanode-node4.fgq.com.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hadoop-secondarynamenode-node1.fgq.com.out
The secondarynamenode address is not configured, so it defaults to 0.0.0.0 and is started on the local machine (node1)
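To pin the SNN to a specific host instead of relying on the 0.0.0.0 default, dfs.namenode.secondary.http-address can be set in hdfs-site.xml; a sketch assuming node1 keeps hosting the SNN on the default port 50090:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1.fgq.com:50090</value>
</property>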
Check on each node whether the daemons are running
[hadoop@node1 ~]$ jps
5921 SecondaryNameNode
6056 Jps
5724 NameNode
[root@node2 hadoop]# su - hadoop
[hadoop@node2 ~]$ jps
4739 DataNode
4853 Jps
[root@node3 base-env]# su - hadoop
[hadoop@node3 ~]$ jps
2468 DataNode
2582 Jps
[root@node4 base-env]# su - hadoop
[hadoop@node4 ~]$ jps
11486 DataNode
11599 Jps
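Besides running jps on each node, the NameNode's own view of the DataNodes can be checked from node1; a quick sketch:
[hadoop@node1 ~]$ hdfs dfsadmin -report    # should report 3 live datanodes (node2/3/4) with their capacity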
## Create a directory and upload a file
[hadoop@node1 ~]$ hdfs dfs -mkdir /test
[hadoop@node1 ~]$ hdfs dfs -put /etc/fstab /test/
[hadoop@node1 ~]$ hdfs dfs -lsr /test
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 2 hadoop supergroup 501 2019-03-04 15:54 /test/fstab
[hadoop@node1 ~]$ hdfs dfs -cat /test/fstab
#
# /etc/fstab
# Created by anaconda on Sat Feb 23 09:48:36 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=17bafb33-37b7-4df5-9731-7626d2358a8c / xfs defaults 0 0
UUID=456e2300-c78e-47ae-ae3f-c86d5689ff69 /boot xfs defaults 0 0
UUID=8b23281d-2913-45cd-9111-ce6e1cc0af82 swap swap defaults 0 0
Browser: http://192.168.25.11:50070 -- click Datanodes to observe where the blocks are located
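The same information is available from the command line; a sketch that prints the block IDs of /test/fstab and the DataNodes holding each replica:
[hadoop@node1 ~]$ hdfs fsck /test/fstab -files -blocks -locations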
On the local filesystem the block file lives on node3 and node4:
[hadoop@node3 ~]$ cat /fgq/data/hadoop/hdfs/dn/current/BP-636749653-192.168.25.11-1551683231365/current/finalized/subdir0/subdir0/blk_1073741825
#
# /etc/fstab
# Created by anaconda on Sat Feb 23 09:48:36 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=17bafb33-37b7-4df5-9731-7626d2358a8c / xfs defaults 0 0
UUID=456e2300-c78e-47ae-ae3f-c86d5689ff69 /boot xfs defaults 0 0
UUID=8b23281d-2913-45cd-9111-ce6e1cc0af82 swap swap defaults 0 0
## Start the YARN services
[hadoop@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-hadoop-resourcemanager-node1.fgq.com.out
node2: starting nodemanager, logging to /fgq/base-env/hadoop/logs/yarn-hadoop-nodemanager-node2.fgq.com.out
node4: starting nodemanager, logging to /fgq/base-env/hadoop/logs/yarn-hadoop-nodemanager-node4.fgq.com.out
node3: starting nodemanager, logging to /fgq/base-env/hadoop/logs/yarn-hadoop-nodemanager-node3.fgq.com.out
[hadoop@node1 ~]$ jps
5921 SecondaryNameNode
6601 Jps
5724 NameNode
6333 ResourceManager
ResourceManager now appears
[hadoop@node2 ~]$ jps
4739 DataNode
4933 NodeManager
5049 Jps
[hadoop@node3 ~]$ jps
2468 DataNode
2666 NodeManager
2782 Jps
[hadoop@node4 ~]$ jps
11681 NodeManager
11798 Jps
11486 DataNode
[root@node1 ~]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:50090 *:*
LISTEN 0 128 192.168.25.11:8020 *:*
LISTEN 0 128 *:50070 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 128 ::ffff:192.168.25.11:8030 :::*
LISTEN 0 128 ::ffff:192.168.25.11:8031 :::*
LISTEN 0 128 ::ffff:192.168.25.11:8032 :::*
LISTEN 0 128 ::ffff:192.168.25.11:8033 :::*
LISTEN 0 128 :::22 :::*
LISTEN 0 128 ::ffff:192.168.25.11:8088 :::*
LISTEN 0 100 ::1:25 :::*
## Upload another file
[hadoop@node1 ~]$ hdfs dfs -put /etc/init.d/functions /test/
[hadoop@node1 ~]$ hdfs dfs -ls -r /test/
Found 2 items
-rw-r--r-- 2 hadoop supergroup 18104 2019-03-04 16:08 /test/functions
-rw-r--r-- 2 hadoop supergroup 501 2019-03-04 15:54 /test/fstab
A large file is split and stored as several blocks
A small file occupies only part of a block; the block size itself is large (128 MB by default in Hadoop 2.x)
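The configured block size and a file's block/replication settings can be verified from the command line; a minimal sketch:
[hadoop@node1 ~]$ hdfs getconf -confKey dfs.blocksize            # default block size in bytes (134217728 = 128 MB)
[hadoop@node1 ~]$ hdfs dfs -stat "%o %r %n" /test/functions      # block size, replication factor, file name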
Running a MapReduce program
Run
[hadoop@node1 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar
Running the jar without further arguments lists the available example programs
[hadoop@node1 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/functions /test/wc
19/03/04 16:18:10 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8032
19/03/04 16:18:11 INFO input.FileInputFormat: Total input files to process : 2
19/03/04 16:18:11 INFO mapreduce.JobSubmitter: number of splits:2
19/03/04 16:18:11 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/03/04 16:18:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551686534719_0001
19/03/04 16:18:12 INFO impl.YarnClientImpl: Submitted application application_1551686534719_0001
19/03/04 16:18:12 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1551686534719_0001/
19/03/04 16:18:12 INFO mapreduce.Job: Running job: job_1551686534719_0001
19/03/04 16:18:19 INFO mapreduce.Job: Job job_1551686534719_0001 running in uber mode : false
19/03/04 16:18:19 INFO mapreduce.Job: map 0% reduce 0%
19/03/04 16:18:26 INFO mapreduce.Job: map 100% reduce 0%
19/03/04 16:18:31 INFO mapreduce.Job: map 100% reduce 100%
19/03/04 16:18:31 INFO mapreduce.Job: Job job_1551686534719_0001 completed successfully
19/03/04 16:18:31 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=11597
FILE: Number of bytes written=618537
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=18797
HDFS: Number of bytes written=8618
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=10178
Total time spent by all reduces in occupied slots (ms)=2095
Total time spent by all map tasks (ms)=10178
Total time spent by all reduce tasks (ms)=2095
Total vcore-milliseconds taken by all map tasks=10178
Total vcore-milliseconds taken by all reduce tasks=2095
Total megabyte-milliseconds taken by all map tasks=10422272
Total megabyte-milliseconds taken by all reduce tasks=2145280
Map-Reduce Framework
Map input records=718
Map output records=2418
Map output bytes=24538
Map output materialized bytes=11603
Input split bytes=192
Combine input records=2418
Combine output records=745
Reduce input groups=738
Reduce shuffle bytes=11603
Reduce input records=745
Reduce output records=738
Spilled Records=1490
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=415
CPU time spent (ms)=1970
Physical memory (bytes) snapshot=757768192
Virtual memory (bytes) snapshot=6356271104
Total committed heap usage (bytes)=483393536
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=18605
File Output Format Counters
Bytes Written=8618
[hadoop@node1 ~]$ hdfs dfs -ls -r /test/wc
Found 2 items
-rw-r--r-- 2 hadoop supergroup 8618 2019-03-04 16:18 /test/wc/part-r-00000
-rw-r--r-- 2 hadoop supergroup 0 2019-03-04 16:18 /test/wc/_SUCCESS
[hadoop@node1 ~]$ hdfs dfs -cat /test/wc/part-r-00000
! 5
!= 6
" 10
"$#" 4
"$(cat 1
"$0" 1
"$1" 19
"$1")" 2
"$?" 2
"$@" 1
"$BOOTUP" 17
"$CONSOLETYPE" 1
"$RC" 4
"$STRING 1
"$SYSTEMCTL_IGNORE_DEPENDENCIES" 1
"$SYSTEMCTL_SKIP_REDIRECT" 1
"$_use_systemctl" 2
"$b" 1
"$base 1
"$base" 1
"$base_stime" 1
"$binary" 4
"$command" 1
"$corelimit 2
"$file" 9
"$force" 1
"$gotbase" 1
"$kill_list" 5
"$killlevel" 3
Browser: http://192.168.25.11:8088 shows the application, as in the figure below
YARN-related commands
[hadoop@node1 ~]$ yarn application -list -appStates=all
19/03/04 16:42:58 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8032
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1551686534719_0001 word count MAPREDUCE hadoop default FINISHED SUCCEEDED 100% http://node3.fgq.com:19888/jobhistory/job/job_1551686534719_0001
[hadoop@node1 ~]$ yarn application -status application_1551686534719_0001
19/03/04 16:44:37 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8032
Application Report :
Application-Id : application_1551686534719_0001
Application-Name : word count
Application-Type : MAPREDUCE
User : hadoop
Queue : default
Application Priority : 0
Start-Time : 1551687491765
Finish-Time : 1551687509915
Progress : 100%
State : FINISHED
Final-State : SUCCEEDED
Tracking-URL : http://node3.fgq.com:19888/jobhistory/job/job_1551686534719_0001
RPC Port : 39952
AM Host : node3.fgq.com
Aggregate Resource Allocation : 66744 MB-seconds, 39 vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : DISABLED
Diagnostics :
Unmanaged Application : false
Application Node Label Expression : <Not set>
AM container Node Label Expression : <DEFAULT_PARTITION>
TimeoutType : LIFETIME ExpiryTime : UNLIMITED RemainingTime : -1seconds
[hadoop@node1 ~]$ yarn node -list
19/03/04 16:46:44 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node4.fgq.com:46657 RUNNING node4.fgq.com:8042 0
node3.fgq.com:44442 RUNNING node3.fgq.com:8042 0
node2.fgq.com:38119 RUNNING node2.fgq.com:8042 0
[hadoop@node1 ~]$ yarn node -status node3.fgq.com:44442
19/03/04 16:52:58 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8032
Node Report :
Node-Id : node3.fgq.com:44442
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : node3.fgq.com:8042
Last-Health-Update : Mon 04/Mar/19 04:52:17:378CST
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 8192MB
CPU-Used : 0 vcores
CPU-Capacity : 8 vcores
Node-Labels :
Resource Utilization by Node : PMem:1082 MB, VMem:1082 MB, VCores:0.0033322228
Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
[hadoop@node1 ~]$ yarn logs -applicationId application_1551686534719_0001
19/03/04 16:57:18 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8032
File /tmp/logs/hadoop/logs/application_1551686534719_0001 does not exist.
Can not find any log file matching the pattern: [ALL] for the application: application_1551686534719_0001
Can not find the logs for the application: application_1551686534719_0001 with the appOwner: hadoop
The reason is that log aggregation has not been enabled.
Set the yarn.log-aggregation-enable property to true in yarn-site.xml.
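A sketch of the property to add to yarn-site.xml (the /tmp/logs path in the error message above is the default remote log directory); restart YARN with stop-yarn.sh / start-yarn.sh afterwards:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>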
rmadmin
[hadoop@node1 ~]$ yarn rmadmin -refreshNodes
19/03/04 17:04:27 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.25.11:8033
Refreshes the node state
Ambari can automate Hadoop deployment; worth looking into later