Clone three hosts and change their hostnames to hadoop01, hadoop02, and hadoop03. On each clone, set HOSTNAME in /etc/sysconfig/network accordingly; the new hostname takes effect after a reboot:
[root@hadoop01 ~]# hostname
hadoop01
[root@hadoop01 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 ~]# vi /etc/sysconfig/network
[root@hadoop01 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop02
[root@hadoop01 ~]# reboot
Add the host mappings on all three machines:
[root@hadoop01 ~]# vi /etc/hosts
[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.216.135 hadoop01
192.168.216.136 hadoop02
192.168.216.137 hadoop03
Server role planning

| hadoop01 | hadoop02 | hadoop03 |
| --- | --- | --- |
| NameNode | | |
| DataNode | DataNode | DataNode |
| NodeManager | NodeManager | NodeManager |
| HistoryServer | ResourceManager | SecondaryNameNode |
1. Install a fresh Hadoop on the first machine
To keep it separate from the pseudo-distributed Hadoop previously installed on this machine, stop all Hadoop services from that earlier install, then install another Hadoop under a new directory, /opt/modules/app. We will first extract and configure Hadoop on the first machine, then distribute it to the other two machines to build the cluster.
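A minimal sketch of this preparation, assuming the earlier pseudo-distributed install lives under /opt/modules/hadoop-2.5.0 (an assumed path; adjust it to your own setup):

[root@hadoop01 ~]# /opt/modules/hadoop-2.5.0/sbin/stop-dfs.sh
[root@hadoop01 ~]# /opt/modules/hadoop-2.5.0/sbin/stop-yarn.sh
[root@hadoop01 ~]# /opt/modules/hadoop-2.5.0/sbin/mr-jobhistory-daemon.sh stop historyserver
[root@hadoop01 ~]# mkdir -p /opt/modules/app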
2. Extract the Hadoop archive
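For example, assuming the hadoop-2.5.0 tarball was downloaded to /opt/softwares (an assumed location; use wherever your archive actually is):

[root@hadoop01 ~]# tar -zxvf /opt/softwares/hadoop-2.5.0.tar.gz -C /opt/modules/app/

Later commands in this article refer to the install as /opt/modules/app/hadoop while core-site.xml uses /opt/modules/app/hadoop-2.5.0, so a symlink (my assumption about how the two paths relate) keeps both valid:

[root@hadoop01 ~]# ln -s /opt/modules/app/hadoop-2.5.0 /opt/modules/app/hadoop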
3. Configure the JDK path for Hadoop: set the JDK path in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
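A sketch of the change, applied to each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh under etc/hadoop; the JDK location below is only an assumed example, replace it with the actual path on your machines:

export JAVA_HOME=/opt/modules/jdk1.7.0_67   # adjust to your real JDK install path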
4. Configure core-site.xml
[root@hadoop01 hadoop]# vi core-site.xml
[root@hadoop01 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/app/hadoop-2.5.0/data/tmp</value>
</property>
</configuration>
[root@hadoop01 hadoop]#
fs.defaultFS is the address of the NameNode (the default file system URI).
hadoop.tmp.dir is Hadoop's temporary directory. By default, the data files of the NameNode and DataNodes are kept in subdirectories under it. Make sure the directory exists; if it does not, create it first.
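For example, on hadoop01 (the path matches the hadoop.tmp.dir value configured above):

[root@hadoop01 hadoop]# mkdir -p /opt/modules/app/hadoop-2.5.0/data/tmp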
5. Configure hdfs-site.xml
[root@hadoop01 hadoop]# vi hdfs-site.xml
[root@hadoop01 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop03:50090</value>
</property>
</configuration>
dfs.namenode.secondary.http-address specifies the HTTP address and port of the SecondaryNameNode. Per the role planning above, hadoop03 acts as the SecondaryNameNode server.
6. Configure slaves
[root@hadoop01 hadoop]# vi /opt/modules/app/hadoop/etc/hadoop/slaves
[root@hadoop01 hadoop]# cat /opt/modules/app/hadoop/etc/hadoop/slaves
hadoop01
hadoop02
hadoop03
The slaves file lists the nodes that run DataNodes in HDFS.
7. Configure yarn-site.xml
[root@hadoop01 hadoop]# vi yarn-site.xml
[root@hadoop01 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
</configuration>
Per the plan, yarn.resourcemanager.hostname points the ResourceManager at hadoop02.
yarn.log-aggregation-enable controls whether log aggregation is enabled.
yarn.log-aggregation.retain-seconds sets how long aggregated logs are retained on HDFS.
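With log aggregation enabled, the aggregated logs of a finished job can be pulled from HDFS with the yarn logs command; <application-id> below is a placeholder for a real application ID (printed when a job is submitted and shown in the YARN web UI):

[root@hadoop01 ~]# /opt/modules/app/hadoop/bin/yarn logs -applicationId <application-id>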
8. Configure mapred-site.xml
[root@hadoop01 hadoop]# vi mapred-site.xml
[root@hadoop01 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
mapreduce.framework.name makes MapReduce jobs run on YARN.
mapreduce.jobhistory.address places the MapReduce JobHistory server on hadoop01.
mapreduce.jobhistory.webapp.address sets the address and port of the JobHistory server's web UI.
9. Set up passwordless SSH login
The machines in a Hadoop cluster access one another over SSH, and typing a password for every connection is impractical, so configure passwordless SSH between all of the machines.
a. Generate a key pair on hadoop01
[root@hadoop01 hadoop]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
6c:c6:80:64:00:ec:ab:b0:94:21:71:2e:a8:8b:c2:40 root@hadoop01
The key's randomart image is:
+--[ RSA 2048]----+
|o...o |
|...o . |
|o+ . . |
|+E. + |
|+.+ S |
|++ o |
|Bo |
|*. |
|. |
+-----------------+
Press Enter at every prompt to accept the defaults. A public key file (id_rsa.pub) and a private key file (id_rsa) are then generated in the .ssh directory under the current user's home directory.
b. Distribute the public key
[root@hadoop01 hadoop]# yum install -y openssh-clients
[root@hadoop01 hadoop]# ssh-copy-id hadoop01
The authenticity of host 'hadoop01 (192.168.216.135)' can't be established.
RSA key fingerprint is bd:5c:85:99:82:b4:b9:9d:92:fa:35:48:63:e1:5c:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop01,192.168.216.135' (RSA) to the list of known hosts.
root@hadoop01's password:
Now try logging into the machine, with "ssh 'hadoop01'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@hadoop01 hadoop]# ssh-copy-id hadoop02
[root@hadoop01 hadoop]# ssh-copy-id hadoop03
Likewise, generate a key pair on hadoop02 and hadoop03 and distribute each public key to all three machines, as sketched below.
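A sketch of the same steps on hadoop02 (repeat them on hadoop03 as well):

[root@hadoop02 ~]# ssh-keygen -t rsa
[root@hadoop02 ~]# ssh-copy-id hadoop01
[root@hadoop02 ~]# ssh-copy-id hadoop02
[root@hadoop02 ~]# ssh-copy-id hadoop03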
Distribute the Hadoop files
1. First, create the directory that will hold Hadoop on the other two machines
[root@hadoop02 ~]# mkdir -p /opt/modules/app
[root@hadoop03 ~]# mkdir -p /opt/modules/app
2. Distribute via scp
The share/doc directory under the Hadoop root holds the Hadoop documentation and is quite large; deleting it before distribution saves disk space and speeds up the copy.
[root@hadoop01 hadoop]# du -sh /opt/modules/app/hadoop/share/doc
[root@hadoop01 hadoop]# rm -rf /opt/modules/app/hadoop/share/doc/
[root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop02:/opt/modules/app
[root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop03:/opt/modules/app
3. Format the NameNode
Run the format command on the NameNode machine:
[root@hadoop01 hadoop]# /opt/modules/app/hadoop/bin/hdfs namenode -format
If you need to reformat the NameNode, first delete everything under the existing NameNode and DataNode data directories on every node, otherwise errors will occur. These directories are configured by hadoop.tmp.dir in core-site.xml and, if set, by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml.
The reason is that each format generates a new cluster ID by default and writes it into the VERSION files of the NameNode and DataNodes (located under dfs/name/current and dfs/data/current). If the old directories are not removed, the NameNode's VERSION file ends up with the new cluster ID while the DataNodes keep the old one, and the mismatch causes errors.
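A sketch of the cleanup, assuming only hadoop.tmp.dir is configured as in this article; run it on all three machines before reformatting:

[root@hadoop01 hadoop]# rm -rf /opt/modules/app/hadoop-2.5.0/data/tmp/*
[root@hadoop02 ~]# rm -rf /opt/modules/app/hadoop-2.5.0/data/tmp/*
[root@hadoop03 ~]# rm -rf /opt/modules/app/hadoop-2.5.0/data/tmp/*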
Alternatively, specify the cluster ID as a parameter when formatting, setting it to the old cluster ID.
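A sketch of that option, where <old-cluster-id> is a placeholder for the clusterID value recorded in an existing VERSION file:

[root@hadoop01 hadoop]# /opt/modules/app/hadoop/bin/hdfs namenode -format -clusterid <old-cluster-id>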
Start the cluster
[root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-dfs.sh
18/09/11 07:07:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-namenode-hadoop01.out
hadoop03: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop02.out
hadoop01: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-secondarynamenode-hadoop03.out
18/09/11 07:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop01 sbin]#
[root@hadoop01 sbin]# jps
3185 Jps
2849 NameNode
2974 DataNode
[root@hadoop02 ~]# jps
2305 Jps
2227 DataNode
[root@hadoop03 ~]# jps
2390 Jps
2312 SecondaryNameNode
2217 DataNode
Start YARN
[root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop01.out
hadoop02: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop02.out
hadoop03: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop03.out
hadoop01: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop01.out
[root@hadoop01 sbin]# jps
3473 Jps
3329 NodeManager
2849 NameNode
2974 DataNode
[root@hadoop01 sbin]#
[root@hadoop02 ~]# jps
2337 NodeManager
2227 DataNode
2456 Jps
[root@hadoop02 ~]#
[root@hadoop03 ~]# jps
2547 Jps
2312 SecondaryNameNode
2217 DataNode
2428 NodeManager
[root@hadoop03 ~]#
Because start-yarn.sh starts the ResourceManager only on the machine where it is run, and the ResourceManager is planned for hadoop02, start it on hadoop02 separately:
[root@hadoop02 ~]# /opt/modules/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop02.out
[root@hadoop02 ~]# jps
2337 NodeManager
2227 DataNode
2708 Jps
2484 ResourceManager
[root@hadoop02 ~]#
Start the JobHistory server
The MapReduce JobHistory server is started here on hadoop03. Note that the role planning and mapred-site.xml above place it on hadoop01 (mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address), so either start it on hadoop01 instead, or point those two properties at the host where you actually run it.
[root@hadoop03 ~]# /opt/modules/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/modules/app/hadoop/logs/mapred-root-historyserver-hadoop03.out
[root@hadoop03 ~]# jps
2312 SecondaryNameNode
2217 DataNode
2602 JobHistoryServer
2428 NodeManager
2639 Jps
[root@hadoop03 ~]#
Configure the hosts file on Windows
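For example, add the same mappings used in /etc/hosts to C:\Windows\System32\drivers\etc\hosts on the Windows machine so the hostnames resolve in a browser:

192.168.216.135 hadoop01
192.168.216.136 hadoop02
192.168.216.137 hadoop03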
View the HDFS web UI:
hadoop01:50070
View the YARN web UI:
hadoop02:8088
Test a job
We use the wordcount example bundled with Hadoop to run a MapReduce job and verify that the cluster works.
1. Prepare the MapReduce input file wc.input
[hadoop@bigdata-senior01 modules]$ cat /opt/data/wc.input
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
2. Create an input directory on HDFS
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir /input
3. Upload wc.input to HDFS
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/data/wc.input /input/wc.input
4. Run the MapReduce example bundled with Hadoop
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/wc.input /output
5. Check the output files
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2016-07-14 16:36 /output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 60 2016-07-14 16:36 /output/part-r-00000
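To see the actual word counts, cat the result file. Given the wc.input above, the output should look roughly like this (word and count separated by a tab):

[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /output/part-r-00000
hadoop	3
hbase	1
hive	2
mapreduce	1
spark	2
sqoop	1
storm	1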