Setting Up a Hadoop Cluster Environment

Author: 大龄程序员在帝都 | Published 2017-04-04 18:35

Goal: set up a Hadoop cluster environment.

First, prepare three machines and set up passwordless SSH login between them:

I. Passwordless SSH Login

The cluster itself uses only three machines, but I provisioned four; the last one is unused and kept in reserve.
1. Name the four servers node1, node2, node3, and node4. On each machine, edit the hosts file:

vim /etc/hosts
# add:
106.75.xxx.213 node1
106.75.xxx.203 node2
106.75.xxx.162 node3
106.75.xxx.52 node4

2. Set up passwordless SSH login
Run the following on every server:

$ ssh-keygen
# press Enter at every prompt; no input is needed
$ ssh-copy-id -i /root/.ssh/id_rsa root@node1
$ ssh-copy-id -i /root/.ssh/id_rsa root@node2
$ ssh-copy-id -i /root/.ssh/id_rsa root@node3
$ ssh-copy-id -i /root/.ssh/id_rsa root@node4
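
To confirm the keys were copied correctly, each of these should now print the remote hostname without prompting for a password:

$ ssh node2 hostname
$ ssh node3 hostname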

II. Install the Java Environment

1. Update the CentOS packages (optional):

yum update

2. Download and install the JDK
Upload the JDK archive to the /root/ww directory and extract it there.
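
For example, assuming the archive is named jdk-8u121-linux-x64.tar.gz (adjust to your actual file name):

$ cd /root/ww
$ tar -xzf jdk-8u121-linux-x64.tar.gz   # produces /root/ww/jdk1.8.0_121

Finally, append the following to the end of /etc/profile: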

export GREP_OPTIONS=--color=auto
export JAVA_HOME=/root/ww/jdk1.8.0_121
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

After saving, reload the profile with source /etc/profile, then check the Java version:

java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

This output confirms that the Java environment is installed, at version 1.8.

III. Install Hadoop

The version installed here is 2.7.3.
1. Run the following on every machine:

$ cd /home
$ wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
$ tar -xzf hadoop-2.7.3.tar.gz
$ mv hadoop-2.7.3 hadoop
$ cd hadoop
$ mkdir tmp hdfs
$ mkdir hdfs/data hdfs/name

2. Configure core-site.xml

Edit the configuration file:

vim /home/hadoop/etc/hadoop/core-site.xml

Inside the <configuration> tag, add:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
</property>

3. Configure hdfs-site.xml

Edit the configuration file:

vim /home/hadoop/etc/hadoop/hdfs-site.xml

Inside the <configuration> tag, add:

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hdfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:9001</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>

4. Configure mapred-site.xml

This file does not exist by default; copy it from the template (run this inside /home/hadoop/etc/hadoop):

cp mapred-site.xml.template mapred-site.xml

Then edit the file:

vim /home/hadoop/etc/hadoop/mapred-site.xml

Inside the <configuration> tag, add:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
</property>

5. Configure yarn-site.xml

Edit the configuration file:

vim /home/hadoop/etc/hadoop/yarn-site.xml

Inside the <configuration> tag, add:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node1:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:8088</value>
</property>

6. Configure slaves

vim /home/hadoop/etc/hadoop/slaves

Replace its contents (localhost by default) with the worker hostnames:

node2
node3

7. Configure hadoop-env.sh

vim /home/hadoop/etc/hadoop/hadoop-env.sh

Change the line export JAVA_HOME=${JAVA_HOME} to your JAVA_HOME's absolute path. In my case:

export JAVA_HOME=/root/ww/jdk1.8.0_121

8. Copy the Hadoop directory configured on the master to node2 and node3
On node1, run:

$ scp -r /home/hadoop node2:/home
$ scp -r /home/hadoop node3:/home

9. Set the Hadoop environment variables
On every server, run:

vim ~/.bashrc

and add the following line:

export PATH=$PATH:/home/hadoop/bin:/home/hadoop/sbin

Apply the change:

source ~/.bashrc
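
After this, the hadoop command should resolve from any directory; a quick way to check:

hadoop version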

With configuration done, on to the final step.

10. Start Hadoop
Start the cluster on node1 and the worker daemons start automatically (node1 is our master; node2 and node3 are the workers).

First, format the NameNode (on node1 only, and only once; re-formatting destroys HDFS data):

hdfs namenode -format

Then start the daemons (start-dfs.sh already starts the NameNode and DataNodes, so separate hadoop-daemon.sh start commands are not needed):

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Startup log: (screenshot omitted)
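
For later reference, the cluster can be shut down with the matching stop scripts:

mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh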

11. Check the status of the three servers

Check the master, node1:

jps

The master's processes should all be up. Then run jps on node2 and node3 to confirm the worker processes started as well (screenshots omitted).
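
With the configuration above, the jps listings should look roughly like this (PIDs omitted, order varies):

# on node1 (master)
NameNode
SecondaryNameNode
ResourceManager
JobHistoryServer

# on node2 and node3 (workers)
DataNode
NodeManager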

With all three nodes reporting their daemons, the Hadoop cluster is up.

REF: Hadoop集群搭建

Problems Encountered and Solutions

During setup, my /etc/hosts originally mapped the node names to the public IP addresses:

106.75.xxx.213 node1
106.75.xxx.203 node2
106.75.xxx.162 node3
106.75.xxx.52 node4

With that mapping, starting the NameNode kept failing with:

Caused by: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.apache.hadoop.ipc.Server.bind(Server.java:408)

Solution: switch to the corresponding internal (private) addresses. Change the hosts file on every node to:

10.9.167.99 node1
10.9.186.20 node2
10.9.107.131 node3
10.9.94.139 node4

I never found a more elegant fix, and I am not entirely sure of the root cause, though the likely explanation is that on cloud hosts the public IP is NAT-mapped rather than assigned to any local network interface, so a daemon cannot bind to it directly. One concern was whether the pages would still be reachable from the public addresses after switching to internal IPs; I tested it just now, and they are.
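
A quick way to see this on each host is to list the locally assigned addresses; if only the 10.x internal address appears, a bind to the 106.75.x.x public address cannot succeed:

ip addr show | grep 'inet '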
The fix was based on this reference:

REF: 无法指定请求地址解决参考 (reference for resolving "Cannot assign requested address")

Verify the cluster through its web pages:

The cluster overview page is the NameNode web UI (by default on port 50070 in Hadoop 2.x, i.e. http://node1:50070).

The running-applications page is the ResourceManager web UI (configured above at node1:8088).

A breakdown of what these configuration options mean is left for a later post; for now, let's verify HDFS and MapReduce.

Upload files to HDFS
Create a directory:

hdfs dfs -mkdir -p /user/hadoop/input

Upload some of the configuration files into the directory just created:

hdfs dfs -put /home/hadoop/etc/hadoop/kms*.xml /user/hadoop/input

After uploading, the files are visible in HDFS (screenshot omitted).
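
The same can be confirmed from the command line; the two kms*.xml files shipped in etc/hadoop are kms-acls.xml and kms-site.xml, which matches the "Total input paths to process : 2" line in the job log below:

hdfs dfs -ls /user/hadoop/input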

Run the grep program from Hadoop's bundled examples jar; it searches the input files for strings matching the regex 'dfs[a-z.]+':

hadoop jar /home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /user/hadoop/input /user/hadoop/output 'dfs[a-z.]+'

The run produced the following output:

[root@node1 ~]# hadoop jar /home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /user/hadoop/input /user/hadoop/output 'dfs[a-z.]+'
17/04/08 12:21:47 INFO client.RMProxy: Connecting to ResourceManager at node1/10.9.167.99:8032
17/04/08 12:21:49 INFO input.FileInputFormat: Total input paths to process : 2
17/04/08 12:21:49 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
17/04/08 12:21:50 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
17/04/08 12:21:50 INFO mapreduce.JobSubmitter: number of splits:2
17/04/08 12:21:50 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
17/04/08 12:21:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491301159371_0001
17/04/08 12:21:51 INFO impl.YarnClientImpl: Submitted application application_1491301159371_0001
17/04/08 12:21:51 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1491301159371_0001/
17/04/08 12:21:51 INFO mapreduce.Job: Running job: job_1491301159371_0001
17/04/08 12:22:03 INFO mapreduce.Job: Job job_1491301159371_0001 running in uber mode : false
17/04/08 12:22:03 INFO mapreduce.Job:  map 0% reduce 0%
17/04/08 12:22:10 INFO mapreduce.Job:  map 50% reduce 0%
17/04/08 12:22:12 INFO mapreduce.Job:  map 100% reduce 0%
17/04/08 12:22:17 INFO mapreduce.Job:  map 100% reduce 100%
17/04/08 12:22:18 INFO mapreduce.Job: Job job_1491301159371_0001 completed successfully
17/04/08 12:22:18 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=356951
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=9255
        HDFS: Number of bytes written=86
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=10043
        Total time spent by all reduces in occupied slots (ms)=4617
        Total time spent by all map tasks (ms)=10043
        Total time spent by all reduce tasks (ms)=4617
        Total vcore-milliseconds taken by all map tasks=10043
        Total vcore-milliseconds taken by all reduce tasks=4617
        Total megabyte-milliseconds taken by all map tasks=10284032
        Total megabyte-milliseconds taken by all reduce tasks=4727808
    Map-Reduce Framework
        Map input records=308
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=12
        Input split bytes=226
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=12
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=265
        CPU time spent (ms)=1430
        Physical memory (bytes) snapshot=508633088
        Virtual memory (bytes) snapshot=6334361600
        Total committed heap usage (bytes)=307437568
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=9029
    File Output Format Counters
        Bytes Written=86
17/04/08 12:22:19 INFO client.RMProxy: Connecting to ResourceManager at node1/10.9.167.99:8032
17/04/08 12:22:19 INFO input.FileInputFormat: Total input paths to process : 1
17/04/08 12:22:19 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
17/04/08 12:22:19 INFO mapreduce.JobSubmitter: number of splits:1
17/04/08 12:22:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491301159371_0002
17/04/08 12:22:19 INFO impl.YarnClientImpl: Submitted application application_1491301159371_0002
17/04/08 12:22:19 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1491301159371_0002/
17/04/08 12:22:19 INFO mapreduce.Job: Running job: job_1491301159371_0002
17/04/08 12:22:32 INFO mapreduce.Job: Job job_1491301159371_0002 running in uber mode : false
17/04/08 12:22:32 INFO mapreduce.Job:  map 0% reduce 0%
17/04/08 12:22:38 INFO mapreduce.Job:  map 100% reduce 0%
17/04/08 12:22:45 INFO mapreduce.Job:  map 100% reduce 100%
17/04/08 12:22:45 INFO mapreduce.Job: Job job_1491301159371_0002 completed successfully
17/04/08 12:22:45 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=236927
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=212
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4179
        Total time spent by all reduces in occupied slots (ms)=4107
        Total time spent by all map tasks (ms)=4179
        Total time spent by all reduce tasks (ms)=4107
        Total vcore-milliseconds taken by all map tasks=4179
        Total vcore-milliseconds taken by all reduce tasks=4107
        Total megabyte-milliseconds taken by all map tasks=4279296
        Total megabyte-milliseconds taken by all reduce tasks=4205568
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=6
        Input split bytes=126
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=148
        CPU time spent (ms)=830
        Physical memory (bytes) snapshot=304050176
        Virtual memory (bytes) snapshot=4223426560
        Total committed heap usage (bytes)=170004480
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=86
    File Output Format Counters
        Bytes Written=0
[root@node1 ~]#

Note: the InterruptedException warnings above stem from a bug in Hadoop itself, not from this setup; the log shows that both jobs completed successfully.
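
To inspect what the grep job actually found, read the output directory (the reducer writes its result to a part-r-00000 file):

hdfs dfs -cat /user/hadoop/output/*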


