1. Overview
The Hadoop 3.1 setup guide I found online was riddled with errors, so I decided to write my own, to be used as a learning and test environment.
2. Software Versions
Software | Version | Download |
---|---|---|
CentOS | 7.2.1511 x86_64 | |
Hadoop | 3.1.0 | hadoop-3.1.0.tar.gz |
JDK | 8u172 | jdk-8u172-linux-x64.rpm |
Everything runs in virtual machines.
3. Server Plan
Hostname | IP address | Role | Processes |
---|---|---|---|
node1 | 10.211.55.4 | node1 (master) | NameNode, ResourceManager, SecondaryNameNode |
node2 | 10.211.55.5 | node2 (worker) | DataNode, NodeManager |
node3 | 10.211.55.6 | node3 (worker) | DataNode, NodeManager |
4. Server Environment Preparation
The configuration in this chapter must be done on every node, logged in as root.
4.1 Basic environment
# Add host entries: append the following to the end of the hosts file
[root@node1 ~]# vi /etc/hosts
10.211.55.4 node1
10.211.55.5 node2
10.211.55.6 node3
# Stop and disable the firewall
[root@node1 ~]# systemctl stop firewalld && systemctl disable firewalld
# Turn off SELinux enforcement for the current session
[root@node1 ~]# setenforce 0
# Set SELINUX to disabled so it stays off after a reboot
[root@node1 ~]# vi /etc/selinux/config
SELINUX=disabled
# Change the server hostname to match the entry in the hosts file; after a reboot the shell prompt shows node1
[root@node1 ~]# vi /etc/hostname
node1
# Switch the shell to bash: Hadoop's scripts do not support non-bash shells, so if you use zsh (e.g. oh-my-zsh), change back to bash
[root@node1 ~]# chsh -s /bin/bash
# Reboot the server
[root@node1 ~]# reboot
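After the reboot, a quick sanity check (a minimal sketch, run on each node) confirms that the hostname, SELinux and firewall settings took effect:
# Should print the node's own name, e.g. node1
hostname
# Should print Disabled
getenforce
# Should print inactive
systemctl is-active firewalld
# All three hostnames should resolve via /etc/hosts
ping -c 1 node1 && ping -c 1 node2 && ping -c 1 node3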
4.2 Configure passwordless SSH login
Log in to the servers as root.
# Run the following on node1
# Generate a key pair; just press Enter at every prompt. The keys are written to ~/.ssh
[root@node1 ~]# ssh-keygen -t rsa
[root@node1 ~]# scp ~/.ssh/id_rsa.pub root@node2:~
[root@node1 ~]# scp ~/.ssh/id_rsa.pub root@node3:~
# Run the following on node2 and node3
[root@node2 ~]# mkdir -p .ssh
[root@node2 ~]# cd .ssh/
[root@node2 .ssh]# cat ~/id_rsa.pub >> authorized_keys
# Run the following on all three nodes to fix the permissions, otherwise remote startup will fail
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
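Before continuing, passwordless login can be verified from node1 (a quick sketch, assuming the keys were copied as above); each command should print the remote hostname without asking for a password:
ssh root@node2 hostname
ssh root@node3 hostname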
4.3 Install the JDK
Download the JDK installer and upload it to every node.
# Install the JDK. By default it goes to /usr/java/jdk1.8.0_172-amd64, and the installer also creates the symlinks /usr/java/default and /usr/java/latest, either of which can be used for JAVA_HOME
[root@node1 ~]# rpm -ivh jdk-8u172-linux-x64.rpm
# Configure the environment variables
[root@node1 ~]# vi ~/.bash_profile
# Append at the end of the file
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Reload the profile
source ~/.bash_profile
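A quick check that the JDK and the environment variables are in place (run on each node):
# Should report java version "1.8.0_172"
java -version
# Should print /usr/java/default
echo $JAVA_HOME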
5. Install Hadoop
Install and configure Hadoop on node1, then copy it to the other nodes. Log in to node1 as root and apply the following steps in order.
5.1 Install Hadoop
# Create the directory
[root@node1 opt]# cd /opt/ && mkdir hadoop && cd hadoop
# Extract hadoop-3.1.0.tar.gz
[root@node1 hadoop]# tar xvf hadoop-3.1.0.tar.gz
# Edit the environment variables
[root@node1 hadoop]# vi ~/.bash_profile
# Append at the end of the file
export HADOOP_HOME=/opt/hadoop/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin
# Reload the profile
[root@node1 hadoop]# source ~/.bash_profile
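At this point the hadoop command should be on the PATH; a quick check:
# Should print Hadoop 3.1.0 plus build information
hadoop version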
5.2 Edit the configuration files
All of the following files are located under /opt/hadoop/hadoop-3.1.0/etc/hadoop.
hadoop-env.sh
#The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=
export JAVA_HOME=/usr/java/default
core-site.xml
<configuration>
    <!-- Address of the HDFS master (NameNode), used as the default file system -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <!-- Base directory for Hadoop's runtime files; default: /tmp/hadoop-${user.name} -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/data/tmp</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <!-- HTTP address of the SecondaryNameNode; default: 0.0.0.0:9868, i.e. it runs on the local node.
         If this is left unset, Hadoop runs the SecondaryNameNode on node1 on port 9868. If it is set,
         the SecondaryNameNode can be placed on another node with a different port; no masters file is
         needed, Hadoop starts it on the specified node automatically. Setting it is recommended.
    -->
    <!--
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node2:50090</value>
    </property>
    -->
    <!-- Directory where the NameNode stores its metadata; default: file://${hadoop.tmp.dir}/dfs/name -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/data/name</value>
    </property>
    <!-- HDFS replication factor; default: 3 -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Directory where DataNodes store their blocks; default: file://${hadoop.tmp.dir}/dfs/data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/data/datanode</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /opt/hadoop/hadoop-3.1.0/etc/hadoop,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/common/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/common/lib/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/hdfs/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/hdfs/lib/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/mapreduce/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/mapreduce/lib/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/yarn/*,
            /opt/hadoop/hadoop-3.1.0/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Default: org.apache.hadoop.mapred.ShuffleHandler; optional -->
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- Default: 0.0.0.0. If yarn.resourcemanager.address is not set, this value must be set. Recommended. -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    <!-- Default: ${yarn.resourcemanager.hostname}:8032; optional -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node1:8040</value>
    </property>
    <!-- Default: ${yarn.resourcemanager.hostname}:8031; optional -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node1:8025</value>
    </property>
    <!-- Default: ${yarn.resourcemanager.hostname}:8030; optional -->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node1:8030</value>
    </property>
</configuration>
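Optionally, the effective values can be read back with hdfs getconf to catch typos in the XML files (a small sketch; it only reads the files under etc/hadoop, no daemons need to be running):
cd /opt/hadoop/hadoop-3.1.0
# Should print hdfs://node1:9000
bin/hdfs getconf -confKey fs.defaultFS
# Should print 2
bin/hdfs getconf -confKey dfs.replication
# Should print /opt/hadoop/data/name
bin/hdfs getconf -confKey dfs.namenode.name.dir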
At the beginning of sbin/start-dfs.sh and sbin/stop-dfs.sh, add the following lines to set the accounts the HDFS daemons run as:
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
At the beginning of sbin/start-yarn.sh and sbin/stop-yarn.sh, add the following lines to set the accounts the YARN daemons run as:
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
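Alternatively (not what this guide uses, but it avoids editing the sbin scripts), the same accounts can be exported from etc/hadoop/hadoop-env.sh, which every launcher script sources:
# Append to /opt/hadoop/hadoop-3.1.0/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root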
workers
[root@node1 hadoop]# touch /opt/hadoop/hadoop-3.1.0/etc/hadoop/workers
[root@node1 hadoop]# vim /opt/hadoop/hadoop-3.1.0/etc/hadoop/workers
# Add the worker hostnames
node2
node3
Note: in Hadoop 3.1.0 the worker list file is named workers, not slaves as it was in Hadoop 2.x!
Create the data directories
[root@node1 hadoop]# mkdir -p /opt/hadoop/data/tmp
[root@node1 hadoop]# mkdir -p /opt/hadoop/data/name
[root@node1 hadoop]# mkdir -p /opt/hadoop/data/datanode
Copy the installation to the other nodes
[root@node1 opt]# scp -r /opt/hadoop node2:/opt/
[root@node1 opt]# scp -r /opt/hadoop node3:/opt/
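The two scp commands can also be written as a small loop, which is convenient when more workers are added later (a sketch, assuming passwordless SSH is already in place):
for host in node2 node3; do
  scp -r /opt/hadoop "${host}":/opt/
done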
6. Start the Cluster
Log in to node1 as root. The NameNode must be formatted before the first start.
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/bin/hdfs namenode -format
Start:
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/start-all.sh
Stop:
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/stop-all.sh
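HDFS and YARN can also be started and stopped separately, which makes it easier to see which layer is failing:
# Start/stop HDFS only (NameNode, DataNodes, SecondaryNameNode)
/opt/hadoop/hadoop-3.1.0/sbin/start-dfs.sh
/opt/hadoop/hadoop-3.1.0/sbin/stop-dfs.sh
# Start/stop YARN only (ResourceManager, NodeManagers)
/opt/hadoop/hadoop-3.1.0/sbin/start-yarn.sh
/opt/hadoop/hadoop-3.1.0/sbin/stop-yarn.sh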
7. Verify
Run the jps command on every node to check the Java processes.
Open the YARN web UI and check Active Nodes; there should be 2, and their status can be inspected:
http://node1:8088/
Open the NameNode web UI and check Datanodes; 2 datanodes should be listed, and their status can be inspected:
http://node1:9870/
Open the SecondaryNameNode web UI; a mostly empty Hadoop page appears:
http://node1:9868/
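The same information is available from the command line, which helps when the web UIs are not reachable from your workstation:
# Lists the live datanodes; node2 and node3 should appear
hdfs dfsadmin -report
# Lists the NodeManagers; 2 nodes should be in RUNNING state
yarn node -list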
8. Test
Run one of the bundled examples. Log in to node1 as root.
- Create the HDFS directories. Once /user/root exists, HDFS uses the directory of the current login account as the working directory, so /user/root does not have to be spelled out in relative paths:
cd /opt/hadoop/hadoop-3.1.0
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/root
hdfs dfs -mkdir input
- Copy the input files into the distributed file system:
hdfs dfs -put etc/hadoop/*.xml input
- Run one of the provided examples as a MapReduce job:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z.]+'
- Examine the output files.
Copy the output from the distributed file system to the local file system and view it there:
hdfs dfs -get output output
cat output/*
Or view it directly on HDFS:
hdfs dfs -cat output/*
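Another quick smoke test that needs no input data is the bundled pi estimator (a sketch; 10 and 100 are just the number of map tasks and samples per map):
cd /opt/hadoop/hadoop-3.1.0
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar pi 10 100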
9. Common Problems
9.1 Formatting HDFS fails: java.net.UnknownHostException: node1: node1: Name or service not known
The error occurs when running hdfs namenode -format to format HDFS.
Cause:
While formatting HDFS, Hadoop looks up the local hostname (returned by the hostname command, here localhost.localdomain) and finds no matching entry in /etc/hosts.
Solution:
- Change the hostname in /etc/hostname so that it matches the entry in /etc/hosts: node1
- Reboot the operating system (a quick check follows below)
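A quick check that the fix took effect (run on node1 after the reboot):
# Should print node1
hostname
# Should print the 10.211.55.4 node1 entry from /etc/hosts
getent hosts node1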
Reference:
Hadoop format error java.net.UnknownHostException
Error log:
2018-07-03 02:47:48,309 WARN net.DNS: Unable to determine local hostname -falling back to 'localhost'
java.net.UnknownHostException: node1: node1: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:283)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:61)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:1014)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:608)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1190)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1631)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Caused by: java.net.UnknownHostException: node1: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
... 8 more
2018-07-03 02:47:48,321 WARN net.DNS: Unable to determine address of the host -falling back to 'localhost' address
java.net.UnknownHostException: node1: node1: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:306)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:62)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:1014)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:608)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1190)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1631)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Caused by: java.net.UnknownHostException: node1: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
... 8 more
2018-07-03 02:47:48,328 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1508911316-127.0.0.1-1530600468322
2018-07-03 02:47:48,328 INFO common.Storage: Will remove files: [/opt/hadoop/data/name/current/fsimage_0000000000000000000, /opt/hadoop/data/name/current/seen_txid, /opt/hadoop/data/name/current/fsimage_0000000000000000000.md5, /opt/hadoop/data/name/current/VERSION]
2018-07-03 02:47:48,336 INFO common.Storage: Storage directory /opt/hadoop/data/name has been successfully formatted.
2018-07-03 02:47:48,346 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop/data/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-07-03 02:47:48,420 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop/data/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds .
2018-07-03 02:47:48,428 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-07-03 02:47:48,433 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: node1: node1: Name or service not known
************************************************************/
9.2 Cluster startup fails: bash v3.2+ is required. Sorry.
Running sbin/start-all.sh to start the Hadoop cluster fails with bash v3.2+ is required. Sorry.
Cause:
The system uses zsh or another non-bash shell, while Hadoop's scripts are written for bash.
Solution:
Change the shell to bash on every node: chsh -s /bin/bash
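To check which login shell root currently has (a quick sketch):
# Should print /bin/bash after running chsh and logging in again
getent passwd root | cut -d: -f7
echo $SHELL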
Error message:
sbin/start-all.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [node1]
Last login: Tue Jul 3 02:53:34 EDT 2018 from 10.211.55.2 on pts/0
bash v3.2+ is required. Sorry.
Starting datanodes
Last login: Tue Jul 3 03:06:41 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
Starting secondary namenodes [node2]
Last login: Tue Jul 3 03:06:41 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Starting resourcemanager
Last login: Tue Jul 3 03:06:42 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
Starting nodemanagers
Last login: Tue Jul 3 03:06:44 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
9.3 Cluster startup fails: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
This is an SSH error when logging in to the local machine. Fix it as follows:
- The node that runs start-all.sh must also have its own public key in authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Set the permissions on every node (a quick check follows below):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
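Passwordless login to the local machine can then be verified the same way as for the workers; each command should print the hostname without prompting for a password:
ssh root@node1 hostname
ssh localhost hostname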
9.4 Cluster startup fails: ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
See section 5.2 for the solution:
At the beginning of sbin/start-dfs.sh and sbin/stop-dfs.sh, add the following lines to set the accounts the HDFS daemons run as:
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
At the beginning of sbin/start-yarn.sh and sbin/stop-yarn.sh, add the following lines to set the accounts the YARN daemons run as:
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
[root@node1 hadoop-3.1.0]# sbin/start-all.sh
Starting namenodes on [node1]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [node2]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.