Clone three hosts and change their hostnames to hadoop01, hadoop02, and hadoop03. On each clone, set HOSTNAME in /etc/sysconfig/network accordingly; the new hostname takes effect after a reboot:
[root@hadoop01 ~]# hostname
hadoop01
[root@hadoop01 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 ~]# vi /etc/sysconfig/network
[root@hadoop01 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop02
[root@hadoop01 ~]# reboot
Add the host mappings on all three machines:
[root@hadoop01 ~]# vi /etc/hosts
[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.216.135 hadoop01
192.168.216.136 hadoop02
192.168.216.137 hadoop03
Server role planning

| hadoop01 | hadoop02 | hadoop03 |
| --- | --- | --- |
| NameNode | | |
| DataNode | DataNode | DataNode |
| NodeManager | NodeManager | NodeManager |
| HistoryServer | ResourceManager | SecondaryNameNode |
1. Install a fresh Hadoop on the first machine
To keep it separate from the pseudo-distributed Hadoop previously installed on this machine, stop all Hadoop services from that earlier install, then install another Hadoop under a new directory, /opt/modules/app. We will first extract and configure Hadoop on the first machine, then distribute it to the other two machines to build the cluster.
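A minimal sketch of this preparation, assuming the earlier pseudo-distributed install lives under /opt/modules/hadoop-2.5.0 (an assumed path; adjust it to your own setup):

[root@hadoop01 ~]# /opt/modules/hadoop-2.5.0/sbin/stop-dfs.sh
[root@hadoop01 ~]# /opt/modules/hadoop-2.5.0/sbin/stop-yarn.sh
[root@hadoop01 ~]# /opt/modules/hadoop-2.5.0/sbin/mr-jobhistory-daemon.sh stop historyserver
[root@hadoop01 ~]# mkdir -p /opt/modules/app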
2. Extract the Hadoop archive
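For example, assuming the hadoop-2.5.0 tarball was downloaded to /opt/softwares (an assumed location; use wherever your archive actually is):

[root@hadoop01 ~]# tar -zxvf /opt/softwares/hadoop-2.5.0.tar.gz -C /opt/modules/app/

Later commands in this article refer to the install as /opt/modules/app/hadoop while core-site.xml uses /opt/modules/app/hadoop-2.5.0, so a symlink (my assumption about how the two paths relate) keeps both valid:

[root@hadoop01 ~]# ln -s /opt/modules/app/hadoop-2.5.0 /opt/modules/app/hadoop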
3. Configure the JDK path for Hadoop: set the JDK path in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
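A sketch of the change, applied to each of hadoop-env.sh, mapred-env.sh, and yarn-env.sh under etc/hadoop; the JDK location below is only an assumed example, replace it with the actual path on your machines:

export JAVA_HOME=/opt/modules/jdk1.7.0_67   # adjust to your real JDK install path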
4. Configure core-site.xml
[root@hadoop01 hadoop]# vi core-site.xml
[root@hadoop01 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/app/hadoop-2.5.0/data/tmp</value>
</property>
</configuration>
[root@hadoop01 hadoop]#
fs.defaultFS is the address of the NameNode (the default file system URI).
hadoop.tmp.dir is Hadoop's temporary directory. By default, the data files of the NameNode and DataNodes are kept in subdirectories under it. Make sure the directory exists; if it does not, create it first.
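For example, on hadoop01 (the path matches the hadoop.tmp.dir value configured above):

[root@hadoop01 hadoop]# mkdir -p /opt/modules/app/hadoop-2.5.0/data/tmp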
5. Configure hdfs-site.xml
[root@hadoop01 hadoop]# vi hdfs-site.xml
[root@hadoop01 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop03:50090</value>
</property>
</configuration>
dfs.namenode.secondary.http-address specifies the HTTP address and port of the SecondaryNameNode. Per the role planning above, hadoop03 acts as the SecondaryNameNode server.
6. Configure slaves
[root@hadoop01 hadoop]# vi /opt/modules/app/hadoop/etc/hadoop/slaves
[root@hadoop01 hadoop]# cat /opt/modules/app/hadoop/etc/hadoop/slaves
hadoop01
hadoop02
hadoop03
The slaves file lists the nodes that run DataNodes in HDFS.
7. Configure yarn-site.xml
[root@hadoop01 hadoop]# vi yarn-site.xml
[root@hadoop01 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
</configuration>
Per the plan, yarn.resourcemanager.hostname points the ResourceManager at hadoop02.
yarn.log-aggregation-enable controls whether log aggregation is enabled.
yarn.log-aggregation.retain-seconds sets how long aggregated logs are retained on HDFS.
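With log aggregation enabled, the aggregated logs of a finished job can be pulled from HDFS with the yarn logs command; <application-id> below is a placeholder for a real application ID (printed when a job is submitted and shown in the YARN web UI):

[root@hadoop01 ~]# /opt/modules/app/hadoop/bin/yarn logs -applicationId <application-id>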
8. Configure mapred-site.xml
[root@hadoop01 hadoop]# vi mapred-site.xml
[root@hadoop01 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
mapreduce.framework.name makes MapReduce jobs run on YARN.
mapreduce.jobhistory.address places the MapReduce JobHistory server on hadoop01.
mapreduce.jobhistory.webapp.address sets the address and port of the JobHistory server's web UI.
9. Set up passwordless SSH login
The machines in a Hadoop cluster access one another over SSH, and typing a password for every connection is impractical, so configure passwordless SSH between all of the machines.
a. Generate a key pair on hadoop01
[root@hadoop01 hadoop]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
6c:c6:80:64:00:ec:ab:b0:94:21:71:2e:a8:8b:c2:40 root@hadoop01
The key's randomart image is:
+--[ RSA 2048]----+
|o...o |
|...o . |
|o+ . . |
|+E. + |
|+.+ S |
|++ o |
|Bo |
|*. |
|. |
+-----------------+
Press Enter at every prompt to accept the defaults. A public key file (id_rsa.pub) and a private key file (id_rsa) are then generated in the .ssh directory under the current user's home directory.
b. Distribute the public key
[root@hadoop01 hadoop]# yum install -y openssh-clients
[root@hadoop01 hadoop]# ssh-copy-id hadoop01
The authenticity of host 'hadoop01 (192.168.216.135)' can't be established.
RSA key fingerprint is bd:5c:85:99:82:b4:b9:9d:92:fa:35:48:63:e1:5c:ce.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop01,192.168.216.135' (RSA) to the list of known hosts.
root@hadoop01's password:
Now try logging into the machine, with "ssh 'hadoop01'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@hadoop01 hadoop]# ssh-copy-id hadoop02
[root@hadoop01 hadoop]# ssh-copy-id hadoop03
Likewise, generate a key pair on hadoop02 and hadoop03 and distribute each public key to all three machines, as sketched below.
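A sketch of the same steps on hadoop02 (repeat them on hadoop03 as well):

[root@hadoop02 ~]# ssh-keygen -t rsa
[root@hadoop02 ~]# ssh-copy-id hadoop01
[root@hadoop02 ~]# ssh-copy-id hadoop02
[root@hadoop02 ~]# ssh-copy-id hadoop03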
Distribute the Hadoop files
1. First, create the directory that will hold Hadoop on the other two machines
[root@hadoop02 ~]# mkdir -p /opt/modules/app
[root@hadoop03 ~]# mkdir -p /opt/modules/app
2. Distribute via scp
The share/doc directory under the Hadoop root holds the Hadoop documentation and is quite large; deleting it before distribution saves disk space and speeds up the copy.
[root@hadoop01 hadoop]# du -sh /opt/modules/app/hadoop/share/doc
[root@hadoop01 hadoop]# rm -rf /opt/modules/app/hadoop/share/doc/
[root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop02:/opt/modules/app
[root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop03:/opt/modules/app
3. Format the NameNode
Run the format command on the NameNode machine:
[root@hadoop01 hadoop]# /opt/modules/app/hadoop/bin/hdfs namenode -format
If you need to reformat the NameNode, first delete everything under the existing NameNode and DataNode data directories on every node, otherwise errors will occur. These directories are configured by hadoop.tmp.dir in core-site.xml and, if set, by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml.
The reason is that each format generates a new cluster ID by default and writes it into the VERSION files of the NameNode and DataNodes (located under dfs/name/current and dfs/data/current). If the old directories are not removed, the NameNode's VERSION file ends up with the new cluster ID while the DataNodes keep the old one, and the mismatch causes errors.
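A sketch of the cleanup, assuming only hadoop.tmp.dir is configured as in this article; run it on all three machines before reformatting:

[root@hadoop01 hadoop]# rm -rf /opt/modules/app/hadoop-2.5.0/data/tmp/*
[root@hadoop02 ~]# rm -rf /opt/modules/app/hadoop-2.5.0/data/tmp/*
[root@hadoop03 ~]# rm -rf /opt/modules/app/hadoop-2.5.0/data/tmp/*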
Alternatively, specify the cluster ID as a parameter when formatting, setting it to the old cluster ID.
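A sketch of that option, where <old-cluster-id> is a placeholder for the clusterID value recorded in an existing VERSION file:

[root@hadoop01 hadoop]# /opt/modules/app/hadoop/bin/hdfs namenode -format -clusterid <old-cluster-id>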
Start the cluster
[root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-dfs.sh
18/09/11 07:07:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-namenode-hadoop01.out
hadoop03: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop02.out
hadoop01: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-secondarynamenode-hadoop03.out
18/09/11 07:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@hadoop01 sbin]#
[root@hadoop01 sbin]# jps
3185 Jps
2849 NameNode
2974 DataNode
[root@hadoop02 ~]# jps
2305 Jps
2227 DataNode
[root@hadoop03 ~]# jps
2390 Jps
2312 SecondaryNameNode
2217 DataNode
Start YARN
[root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop01.out
hadoop02: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop02.out
hadoop03: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop03.out
hadoop01: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop01.out
[root@hadoop01 sbin]# jps
3473 Jps
3329 NodeManager
2849 NameNode
2974 DataNode
[root@hadoop01 sbin]#
[root@hadoop02 ~]# jps
2337 NodeManager
2227 DataNode
2456 Jps
[root@hadoop02 ~]#
[root@hadoop03 ~]# jps
2547 Jps
2312 SecondaryNameNode
2217 DataNode
2428 NodeManager
[root@hadoop03 ~]#
Because start-yarn.sh starts the ResourceManager only on the machine where it is run, and the ResourceManager is planned for hadoop02, start it on hadoop02 separately:
[root@hadoop02 ~]# /opt/modules/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop02.out
[root@hadoop02 ~]# jps
2337 NodeManager
2227 DataNode
2708 Jps
2484 ResourceManager
[root@hadoop02 ~]#
Start the JobHistory server
The MapReduce JobHistory server is started here on hadoop03. Note that the role planning and mapred-site.xml above place it on hadoop01 (mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address), so either start it on hadoop01 instead, or point those two properties at the host where you actually run it.
[root@hadoop03 ~]# /opt/modules/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/modules/app/hadoop/logs/mapred-root-historyserver-hadoop03.out
[root@hadoop03 ~]# jps
2312 SecondaryNameNode
2217 DataNode
2602 JobHistoryServer
2428 NodeManager
2639 Jps
[root@hadoop03 ~]#
Configure the hosts file on Windows
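For example, add the same mappings used in /etc/hosts to C:\Windows\System32\drivers\etc\hosts on the Windows machine so the hostnames resolve in a browser:

192.168.216.135 hadoop01
192.168.216.136 hadoop02
192.168.216.137 hadoop03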
View the HDFS web UI:
hadoop01:50070
View the YARN web UI:
hadoop02:8088
Test a job
We use the wordcount example bundled with Hadoop to run a MapReduce job and verify that the cluster works.
1. Prepare the MapReduce input file wc.input
[hadoop@bigdata-senior01 modules]$ cat /opt/data/wc.input
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
2. Create an input directory on HDFS
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir /input
3. Upload wc.input to HDFS
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/data/wc.input /input/wc.input
4. Run the MapReduce example bundled with Hadoop
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/wc.input /output
5. Check the output files
[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2016-07-14 16:36 /output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 60 2016-07-14 16:36 /output/part-r-00000
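To see the actual word counts, cat the result file. Given the wc.input above, the output should look roughly like this (word and count separated by a tab):

[hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /output/part-r-00000
hadoop	3
hbase	1
hive	2
mapreduce	1
spark	2
sqoop	1
storm	1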