I. Versions
Component | Version |
---|---|
CentOS | CentOS-7-x86_64-DVD-1611.iso |
JDK | jdk-8u45-linux-x64.gz |
Hadoop | hadoop-2.6.0-cdh5.15.1.tar.gz |
Zookeeper | zookeeper-3.4.6.tar.gz |
II. Host Plan
IP | Host | Installed Software | Processes |
---|---|---|---|
192.168.174.121 | hadoop001 | hadoop, zookeeper | NameNode (active), ZKFC, JournalNode, DataNode, ResourceManager (active), JobHistoryServer, NodeManager, QuorumPeerMain |
192.168.174.122 | hadoop002 | hadoop, zookeeper | NameNode (standby), ZKFC, JournalNode, DataNode, ResourceManager (standby), NodeManager, QuorumPeerMain |
192.168.174.123 | hadoop003 | hadoop, zookeeper | JournalNode, DataNode, NodeManager, QuorumPeerMain |
III. Directory Plan
User | Name | Path | Description |
---|---|---|---|
hadoop | app | /home/hadoop/app | Final installation directory for the software |
hadoop | data | /home/hadoop/data | Test data |
hadoop | lib | /home/hadoop/lib | Jars we build ourselves |
hadoop | maven_repos | /home/hadoop/maven_repos | Local Maven repository |
hadoop | software | /home/hadoop/software | Installation packages |
hadoop | script | /home/hadoop/script | Scripts |
hadoop | source | /home/hadoop/source | Source code |
hadoop | tmp | /home/hadoop/tmp | Temporary files |
IV. Environment Setup
1. CentOS 7.2 installation, hostname, static IP, firewall, and outbound network access (all three nodes)
For the details, see my earlier post: https://www.jianshu.com/p/482cbff461bf
2. Bind IPs to hostnames (all three nodes)
[root@hadoop001 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.174.121 hadoop001
192.168.174.122 hadoop002
192.168.174.123 hadoop003
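The same mappings must be present on every node. A minimal sketch (not part of the original transcript) for pushing the file to the other machines and checking name resolution; enter root's password when scp prompts:
#Copy the hosts file to the other nodes
[root@hadoop001 ~]# scp /etc/hosts root@hadoop002:/etc/hosts
[root@hadoop001 ~]# scp /etc/hosts root@hadoop003:/etc/hosts
#Confirm the hostnames resolve
[root@hadoop001 ~]# ping -c 1 hadoop002
[root@hadoop001 ~]# ping -c 1 hadoop003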
3. Disable SELinux on all nodes
vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled
The change only takes effect after a reboot.
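If you do not want to wait for the reboot, setenforce can put SELinux into permissive mode for the running session; this is a sketch rather than one of the original steps, and the config-file change above is still what makes it permanent:
#Switch the running system to permissive and verify
[root@hadoop001 ~]# setenforce 0
[root@hadoop001 ~]# getenforce
Permissive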
4. Use the same time zone on all nodes and synchronize their clocks
1. Time zone
[root@hadoop001 ~]# date
Mon Aug 19 10:54:12 CST 2019
[root@hadoop001 ~]# timedatectl
Local time: Mon 2019-08-19 10:55:58 CST
Universal time: Mon 2019-08-19 02:55:58 UTC
RTC time: Mon 2019-08-19 02:55:53
Time zone: Asia/Shanghai (CST, +0800)
NTP enabled: n/a
NTP synchronized: no
RTC in local TZ: no
DST active: n/a
#Set the Asia/Shanghai time zone on all nodes
[root@hadoop001 ~]# timedatectl set-timezone Asia/Shanghai
[root@hadoop002 ~]# timedatectl set-timezone Asia/Shanghai
[root@hadoop003 ~]# timedatectl set-timezone Asia/Shanghai
2. Clock synchronization
#Install ntp on all nodes
[root@hadoop001 ~]# yum install -y ntp
#Use hadoop001 as the NTP server for the cluster
[root@hadoop001 ~]# vi /etc/ntp.conf
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
#time
server 0.asia.pool.ntp.org
server 1.asia.pool.ntp.org
server 2.asia.pool.ntp.org
server 3.asia.pool.ntp.org
#Fall back to the local hardware clock when the external time sources are unreachable
server 127.127.1.0 iburst  # local clock
#Allow machines on this subnet to sync time from this server; 192.168.174 is your network segment
restrict 192.168.174.0 mask 255.255.255.0 nomodify notrap
#Start ntpd and check its status
[root@hadoop001 ~]# systemctl start ntpd
[root@hadoop001 ~]# systemctl status ntpd
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2019-08-19 11:13:20 CST; 19s ago
Process: 9154 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 9155 (ntpd)
CGroup: /system.slice/ntpd.service
└─9155 /usr/sbin/ntpd -u ntp:ntp -g
Aug 19 11:13:20 hadoop001 ntpd[9155]: Listen normally on 2 lo 127.0.0.1 UDP 123
Aug 19 11:13:20 hadoop001 ntpd[9155]: Listen normally on 3 ens33 192.168.174.121 UDP 123
Aug 19 11:13:20 hadoop001 ntpd[9155]: Listen normally on 4 ens33 fe80::f3a2:882:b52f:d0b UDP 123
Aug 19 11:13:20 hadoop001 ntpd[9155]: Listen normally on 5 lo ::1 UDP 123
Aug 19 11:13:20 hadoop001 ntpd[9155]: Listening on routing socket on fd #22 for interface updates
Aug 19 11:13:20 hadoop001 systemd[1]: Started Network Time Service.
Aug 19 11:13:20 hadoop001 ntpd[9155]: 0.0.0.0 c016 06 restart
Aug 19 11:13:20 hadoop001 ntpd[9155]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Aug 19 11:13:20 hadoop001 ntpd[9155]: 0.0.0.0 c011 01 freq_not_set
Aug 19 11:13:21 hadoop001 ntpd[9155]: 0.0.0.0 c514 04 freq_mode
#Verify
[root@hadoop001 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*LOCAL(0) .LOCL. 5 l 12 64 3 0.000 0.000 0.000
send.mx.cdnetwo 216.239.35.8 2 u 11 64 3 48.414 -5021.1 16.238
ntp.gnc.am 42.204.179.159 2 u 9 64 3 377.631 -5058.7 15.043
27.54.120.10 207.148.72.47 3 u 10 64 1 447.310 -5079.8 0.000
202.28.116.236 .INIT. 16 u - 64 0 0.000 0.000 0.000
#Stop and disable the ntpd service on the other (client) nodes
[root@hadoop002 ~]# systemctl stop ntpd
[root@hadoop002 ~]# systemctl disable ntpd
[root@hadoop002 ~]# /usr/sbin/ntpdate hadoop001
19 Aug 11:17:35 ntpdate[9154]: step time server 192.168.174.121 offset 0.696211 sec
#Every minute, sync the other nodes' clocks against hadoop001
[root@hadoop002 ~]# crontab -e
*/1 * * * * /usr/sbin/ntpdate hadoop001
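To confirm that the clients stay close to hadoop001 without stepping the clock, ntpdate can be run in query-only mode; a verification sketch that is not in the original transcript:
#Query hadoop001 without changing the local clock; the reported offset should stay small
[root@hadoop002 ~]# /usr/sbin/ntpdate -q hadoop001
[root@hadoop003 ~]# /usr/sbin/ntpdate -q hadoop001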
5. Create the hadoop user and the planned directories (all three nodes)
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ mkdir app data lib maven_repos software script source tmp
6. Install lrzsz and upload the installation packages to the software directory
[root@hadoop001 ~]# yum install -y lrzsz
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ cd software/
[hadoop@hadoop001 software]$ rz -be
7. Set up passwordless SSH among the three machines (all three nodes)
[hadoop@hadoop001 software]$ ssh-keygen
[hadoop@hadoop001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$ chmod 600 ~/.ssh/authorized_keys
#Download every machine's public key to your local Windows box, merge them into a single authorized_keys file, then push that file back to every machine, replacing its own authorized_keys.
#Test
[hadoop@hadoop001 .ssh]$ ssh hadoop001 date
Mon Aug 19 14:56:54 CST 2019
[hadoop@hadoop001 .ssh]$ ssh hadoop002 date
Mon Aug 19 14:57:01 CST 2019
[hadoop@hadoop001 .ssh]$ ssh hadoop003 date
Mon Aug 19 14:57:08 CST 2019
Note: Linux may not handle a file edited on Windows correctly; run dos2unix on it. dos2unix is a utility that converts Windows-format text files to Unix/Linux format.
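If you prefer to stay on the Linux side instead of merging keys on Windows, ssh-copy-id achieves the same result; this is an alternative sketch, not the method used above:
#Run on every node; enter the hadoop password once per target host
[hadoop@hadoop001 ~]$ for h in hadoop001 hadoop002 hadoop003; do ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$h; done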
8. Install the JDK (all three nodes)
#Copy the packages to the other machines
[hadoop@hadoop001 software]$ scp ./* hadoop002:/home/hadoop/software/
hadoop-2.6.0-cdh5.15.1.tar.gz 100% 241MB 120.5MB/s 00:02
jdk-8u45-linux-x64.gz 100% 165MB 165.2MB/s 00:00
zookeeper-3.4.6.tar.gz
#As root, extract jdk-8u45-linux-x64.gz into /usr/java/
[root@hadoop001 ~]# mkdir /usr/java/
[root@hadoop001 ~]# tar -zxvf /home/hadoop/software/jdk-8u45-linux-x64.gz -C /usr/java/
#Fix the owner and group of the extracted files
[root@hadoop001 java]# chown -R root:root jdk1.8.0_45/
[root@hadoop001 java]# echo "export JAVA_HOME=/usr/java/jdk1.8.0_45" >> /etc/profile
[root@hadoop001 java]# echo "export PATH=${JAVA_HOME}/bin:${PATH}" >> /etc/profile
[root@hadoop001 java]# source /etc/profile
[root@hadoop001 java]# which java
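A quick check that the JDK is picked up from /usr/java (output abbreviated; the exact build string may differ slightly):
[root@hadoop001 java]# java -version
java version "1.8.0_45"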
9. Install ZooKeeper (all three nodes)
[hadoop@hadoop001 software]$ tar -zxvf zookeeper-3.4.6.tar.gz -C /home/hadoop/app/
[hadoop@hadoop001 app]$ ln -s zookeeper-3.4.6 zookeeper
[hadoop@hadoop001 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop001 conf]$ vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
"zoo.cfg" 32L, 1023C written
[hadoop@hadoop001 data]$ mkdir zookeeper
#Note: leave spaces around the > symbol
[hadoop@hadoop001 data]$ echo 1 > /home/hadoop/data/zookeeper/myid
[hadoop@hadoop002 data]$ echo 2 > /home/hadoop/data/zookeeper/myid
[hadoop@hadoop003 data]$ echo 3 > /home/hadoop/data/zookeeper/myid
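Optional sketch (not in the original steps): put the ZooKeeper bin directory on the hadoop user's PATH so zkServer.sh can be run from any directory; the path assumes the zookeeper symlink created above. Repeat on hadoop002 and hadoop003:
[hadoop@hadoop001 ~]$ echo 'export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper' >> ~/.bash_profile
[hadoop@hadoop001 ~]$ echo 'export PATH=${ZOOKEEPER_HOME}/bin:${PATH}' >> ~/.bash_profile
[hadoop@hadoop001 ~]$ source ~/.bash_profile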
10. Install Hadoop HA (all three nodes)
[hadoop@hadoop001 software]$ tar -zxvf hadoop-2.6.0-cdh5.15.1.tar.gz -C /home/hadoop/app/
[hadoop@hadoop001 app]$ ln -s hadoop-2.6.0-cdh5.15.1 hadoop
Configure hadoop-env.sh
[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_45
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop/etc/hadoop
Configure core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- YARN needs fs.defaultFS to locate the NameNode URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ruozeclusterg7</value>
</property>
<!--============================== Trash ======================================= -->
<property>
<!-- How often the checkpointer running on the NameNode creates a checkpoint from the Current folder; default 0 means it follows fs.trash.interval -->
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
<!-- Number of minutes after which a checkpoint directory under .Trash is deleted; the server-side setting takes priority over the client; default 0 means never delete -->
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<!-- Hadoop temporary directory; hadoop.tmp.dir is the base path that many other paths depend on. If hdfs-site.xml does not configure the NameNode and DataNode storage locations, they default to this path -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- ZooKeeper session timeout, in milliseconds -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
<!-- The second "hadoop" in the property name is the user you installed Hadoop as -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
</configuration>
[hadoop@hadoop001 hadoop]$ mkdir /home/hadoop/tmp/hadoop
[hadoop@hadoop001 hadoop]$ chmod 777 /home/hadoop/tmp/hadoop
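hadoop.tmp.dir is read on every node, so the directory should exist on hadoop002 and hadoop003 as well; the original transcript only shows hadoop001, so this is a sketch:
[hadoop@hadoop002 ~]$ mkdir -p /home/hadoop/tmp/hadoop && chmod 777 /home/hadoop/tmp/hadoop
[hadoop@hadoop003 ~]$ mkdir -p /home/hadoop/tmp/hadoop && chmod 777 /home/hadoop/tmp/hadoop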
Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS superuser group -->
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<!-- Enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/dfs/name</value>
<description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the NameNode stores the transaction file (edits); change as needed</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dfs/data</value>
<description>Local directory where the DataNode stores blocks; change as needed</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block size 128M (default 128M) -->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!--======================================================================= -->
<!-- HDFS high-availability configuration -->
<!-- The HDFS nameservice is ruozeclusterg7; it must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ruozeclusterg7</value>
</property>
<property>
<!-- NameNode IDs; this version supports at most two NameNodes -->
<name>dfs.ha.namenodes.ruozeclusterg7</name>
<value>nn1,nn2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID], RPC address -->
<property>
<name>dfs.namenode.rpc-address.ruozeclusterg7.nn1</name>
<value>hadoop001:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ruozeclusterg7.nn2</name>
<value>hadoop002:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID], HTTP address -->
<property>
<name>dfs.namenode.http-address.ruozeclusterg7.nn1</name>
<value>hadoop001:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ruozeclusterg7.nn2</name>
<value>hadoop002:50070</value>
</property>
<!--================== NameNode editlog synchronization ============================================ -->
<!-- Guarantees the edit log can be recovered -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<!-- JournalNode server addresses; QuorumJournalManager stores the editlog -->
<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/ruozeclusterg7</value>
</property>
<property>
<!-- Directory where the JournalNode stores its data -->
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/dfs/jn</value>
</property>
<!--================== Client failover ============================================ -->
<property>
<!-- How DataNodes and clients identify and select the active NameNode -->
<!-- Failover proxy provider used for automatic switchover -->
<name>dfs.client.failover.proxy.provider.ruozeclusterg7</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--================== NameNode fencing =============================================== -->
<!-- After a failover, prevent the stopped NameNode from coming back up and producing two active services -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<!-- Milliseconds after which fencing is considered to have failed -->
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!--================== NameNode automatic failover based on ZKFC and ZooKeeper ====================== -->
<!-- Enable ZooKeeper-based automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- List of DataNodes permitted to connect to the NameNode -->
<property>
<name>dfs.hosts</name>
<value>/home/hadoop/app/hadoop/etc/hadoop/slaves</value>
</property>
</configuration>
Configure mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce applications on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory Server ============================================================== -->
<!-- MapReduce JobHistory Server address, default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop001:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address, default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop001:19888</value>
</property>
<!-- Compress map-side output with Snappy -->
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
Configure slaves
hadoop001
hadoop002
hadoop003
Configure yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- NodeManager configuration ================================================= -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
<description>Address where the localizer IPC is.</description>
</property>
<property>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
<description>NM Webapp address.</description>
</property>
<!-- HA configuration =============================================================== -->
<!-- Resource Manager Configs -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Enable embedded automatic failover; in an HA setup it works with ZKRMStateStore to handle fencing -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<!-- Cluster ID, so the HA election is tied to the right cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- The RM ID must be set separately on each active/standby ResourceManager node (optional)
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
-->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<!-- ZKRMStateStore configuration -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- RPC address clients use to reach the RM (applications manager interface) -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop001:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop002:23140</value>
</property>
<!-- RPC address the ApplicationMaster uses to reach the RM (scheduler interface) -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop001:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop002:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop001:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop002:23141</value>
</property>
<!-- RPC port NodeManagers use to reach the RM -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop001:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop002:23125</value>
</property>
<!-- RM web application addresses -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop001:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop002:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>hadoop001:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>hadoop002:23189</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop001:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>Minimum memory a single task can request; default 1024 MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory a single task can request; default 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
</configuration>
11. Copy the modified configuration files to the other machines
[hadoop@hadoop001 hadoop]$ scp *.xml slaves hadoop-env.sh hadoop002:/home/hadoop/app/hadoop/etc/hadoop/
[hadoop@hadoop001 hadoop]$ scp *.xml slaves hadoop-env.sh hadoop003:/home/hadoop/app/hadoop/etc/hadoop/
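The steps below call hadoop, hdfs, start-dfs.sh, and start-yarn.sh by name, which assumes the Hadoop bin and sbin directories are already on the hadoop user's PATH. The original write-up does not show that step, so here is a minimal sketch to run on each node:
[hadoop@hadoop001 ~]$ echo 'export HADOOP_HOME=/home/hadoop/app/hadoop' >> ~/.bash_profile
[hadoop@hadoop001 ~]$ echo 'export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}' >> ~/.bash_profile
[hadoop@hadoop001 ~]$ source ~/.bash_profile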
12. Start the cluster
1. Start ZooKeeper
Command (run on every node): ./zkServer.sh start|stop|status
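A sketch of what that looks like on the three nodes, using the full path from the install in step 9; expect one node to report Mode: leader and the other two Mode: follower:
[hadoop@hadoop001 ~]$ /home/hadoop/app/zookeeper/bin/zkServer.sh start
[hadoop@hadoop002 ~]$ /home/hadoop/app/zookeeper/bin/zkServer.sh start
[hadoop@hadoop003 ~]$ /home/hadoop/app/zookeeper/bin/zkServer.sh start
[hadoop@hadoop001 ~]$ /home/hadoop/app/zookeeper/bin/zkServer.sh status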
2. Start Hadoop HA
- Before formatting, first start the JournalNode process on every JournalNode machine (all three nodes):
hadoop-daemon.sh start journalnode
- Format the NameNode (run on hadoop001 only):
hadoop namenode -format
- Sync the NameNode metadata: copy hadoop001's metadata to hadoop002
[hadoop@hadoop001 dfs]$ scp -r name hadoop002:/home/hadoop/data/dfs/
- Initialize ZKFC
hdfs zkfc -formatZK
19/08/19 17:42:09 INFO ha.ActiveStandbyElector: Session connected.
19/08/19 17:42:09 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ruozeclusterg7 in ZK.
19/08/19 17:42:09 INFO zookeeper.ZooKeeper: Session: 0x26ca8e014380000 closed
19/08/19 17:42:09 INFO zookeeper.ClientCnxn: EventThread shut down
19/08/19 17:42:09 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at hadoop001/192.168.174.121
************************************************************/
- Start HDFS, the distributed storage layer
start-dfs.sh
- Start the YARN framework
start-yarn.sh
[hadoop@hadoop001 current]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-resourcemanager-hadoop001.out
hadoop003: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-nodemanager-hadoop003.out
hadoop002: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-nodemanager-hadoop002.out
hadoop001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-nodemanager-hadoop001.out
- Start the standby ResourceManager on hadoop002
[hadoop@hadoop002 dfs]$ yarn-daemon.sh start resourcemanager
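To check that the running daemons match the host plan in section II, jps can be run on each node; a verification sketch (the output order will vary), using the full jps path so it works over non-interactive SSH:
[hadoop@hadoop001 ~]$ for h in hadoop001 hadoop002 hadoop003; do echo "==== $h ===="; ssh $h /usr/java/jdk1.8.0_45/bin/jps; done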
13. Stop the cluster
- Stop Hadoop (YARN first, then HDFS)
[hadoop@hadoop001 sbin]# stop-yarn.sh
[hadoop@hadoop002 sbin]# yarn-daemon.sh stop resourcemanager
[hadoop@hadoop001 sbin]# stop-dfs.sh
- Stop ZooKeeper
[hadoop@hadoop001 bin]# zkServer.sh stop
[hadoop@hadoop002 bin]# zkServer.sh stop
[hadoop@hadoop003 bin]# zkServer.sh stop
14. Start the cluster again
- Start ZooKeeper
[hadoop@hadoop001 bin]# zkServer.sh start
[hadoop@hadoop002 bin]# zkServer.sh start
[hadoop@hadoop003 bin]# zkServer.sh start
- Start Hadoop (HDFS first, then YARN)
[hadoop@hadoop001 sbin]# start-dfs.sh
[hadoop@hadoop001 sbin]# start-yarn.sh
[hadoop@hadoop002 sbin]# yarn-daemon.sh start resourcemanager
[hadoop@hadoop001 ~]# mr-jobhistory-daemon.sh start historyserver
15. Monitor the cluster
[root@hadoop001 ~]# hdfs dfsadmin -report
HDFS:http://192.168.174.121:50070/
HDFS:http://192.168.174.122:50070/
ResourceManager (active): http://192.168.174.121:8088
ResourceManager (standby): http://192.168.174.122:8088/cluster/cluster
JobHistory:http://192.168.174.121:19888/jobhistory
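The HA state can also be checked from the command line; the IDs nn1/nn2 and rm1/rm2 come from hdfs-site.xml and yarn-site.xml above (a sketch, run as the hadoop user):
[hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn1
[hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn2
[hadoop@hadoop001 ~]$ yarn rmadmin -getServiceState rm1
[hadoop@hadoop001 ~]$ yarn rmadmin -getServiceState rm2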