This article walks through setting up a distributed Hadoop cluster, using Hadoop version hadoop-2.6.0-cdh5.7.0. I have three machines at hand, hadoop000/hadoop001/hadoop002. One machine will carry the NameNode and ResourceManager roles and double as a storage node with the DataNode and NodeManager roles; the other two machines serve purely as storage nodes with the DataNode and NodeManager roles.
- hadoop000:NameNode/DataNode ResourceManager/NodeManager
- hadoop001:DataNode NodeManager
- hadoop002:DataNode NodeManager
Preparation
- Hostname setup
On each of the three machines, run sudo vi /etc/sysconfig/network and set the hostname; for example, on the first machine (the other two are analogous):
NETWORKING=yes
HOSTNAME=hadoop000
(This file is only read at boot on CentOS/RHEL-style systems; running sudo hostname hadoop000 applies the name to the current session as well.)
- Map hostnames to IP addresses: run sudo vi /etc/hosts on all three machines and add:
192.168.199.102 hadoop000
192.168.199.247 hadoop001
192.168.199.138 hadoop002
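A quick way to confirm the mappings resolve (run from hadoop000, for example):
ping -c 1 hadoop001
ping -c 1 hadoop002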
Prerequisites
- Passwordless SSH login
Run on each machine: ssh-keygen -t rsa
Then, taking hadoop000 as the control node, copy its public key to every node (including itself):
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
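With the keys distributed, logging in from hadoop000 should no longer prompt for a password; a quick check:
ssh hadoop001 hostname
ssh hadoop002 hostname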
- JDK installation
Unpack the JDK package on hadoop000 and add JAVA_HOME to the system environment variables:
tar -zxvf jdk-8u131-linux-x64.tar.gz -C ~/app/
Set the environment variables:
vi ~/.bash_profile
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
Run source ~/.bash_profile to make the changes take effect.
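To confirm the JDK is picked up from the PATH:
java -version
# should report java version "1.8.0_131"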
Cluster installation
- Hadoop installation
Unpack the Hadoop package on hadoop000 and add HADOOP_HOME to the system environment variables.
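A minimal sketch, assuming the tarball sits in the current directory and the same ~/app layout used for the JDK (the install path matches the jar path used at the end of this article):
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
# append to ~/.bash_profile, then source it
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
The configuration files edited below all live under $HADOOP_HOME/etc/hadoop.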
- hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
- core-site.xml (using fs.defaultFS, the current name for the deprecated fs.default.name)
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop000:8020</value>
</property>
- hdfs-site.xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/app/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/app/tmp/dfs/data</value>
</property>
- yarn-site.xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop000</value>
</property>
- mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
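Note: a fresh unpack usually ships only a template for this file; if mapred-site.xml is missing under $HADOOP_HOME/etc/hadoop, copy it first:
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml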
- slaves (lists the DataNode/NodeManager hosts, one per line)
hadoop000
hadoop001
hadoop002
- Distribute the installation packages and configuration to the hadoop001 and hadoop002 nodes:
scp -r ~/app hadoop@hadoop001:~/
scp -r ~/app hadoop@hadoop002:~/
scp ~/.bash_profile hadoop@hadoop001:~/
scp ~/.bash_profile hadoop@hadoop002:~/
Then source ~/.bash_profile on hadoop001 and hadoop002 to make the environment take effect.
- Format the NameNode; this only needs to be run on hadoop000:
bin/hdfs namenode -format
- Start the cluster; again, only on hadoop000:
sbin/start-all.sh
(start-all.sh is deprecated in Hadoop 2.x; running sbin/start-dfs.sh followed by sbin/start-yarn.sh does the same job.)
Verification
Check the running processes with jps on each node:
- hadoop000:
SecondaryNameNode
DataNode
NodeManager
NameNode
ResourceManager
- hadoop001:
NodeManager
DataNode
- hadoop002:
NodeManager
DataNode
Web UI access: http://hadoop000:50070 (HDFS) and http://hadoop000:8088 (YARN)
Stopping the cluster: sbin/stop-all.sh
Running a Hadoop job on the cluster
1) Upload the input data to the data directory on the hadoop000 machine
2) Upload the application jar to the lib directory on the hadoop000 machine
3) Put the data into HDFS (see the sketch below)
4) Run the program on the distributed cluster
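A minimal sketch of step 3, assuming the files were uploaded to ~/data and using /input as an illustrative HDFS target directory:
hdfs dfs -mkdir -p /input
hdfs dfs -put ~/data/* /input
hdfs dfs -ls /input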
For step 4, for example, I run the Pi-estimation example that ships with Hadoop:
hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
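The two trailing arguments are the number of map tasks (2) and the number of samples per map (3); tiny values like these give a crude estimate of Pi, but they are enough to verify that the job runs across the cluster.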