This article describes how to install, configure, and run Hadoop on the Windows operating system.
Contents
- Versions
- Installation
- Configuration
  - Configure environment variables
  - Configure the Hadoop cluster
- Start the services
- Web UI
- Stop the services
Versions
- Windows 10
- Hadoop 3.2.1
Installation
- Hadoop 3 requires JDK 8 or later, so first download, install, and configure the JDK (those steps are omitted here; a quick check is sketched after this list).
- Download Hadoop; this example uses hadoop-3.2.1.tar.gz.
- Extract the archive into the installation directory. Note that extraction takes two steps: first extract the .tar.gz file, then extract the resulting .tar file. If Windows 10 has trouble extracting the .tar.gz file, convert it to zip first (conversion steps omitted).
- Download the files from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and put them into the bin folder under the Hadoop installation root.
- Create a folder named hadoop-env to hold the downloaded files.
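A quick sanity check from a Command Prompt before continuing, assuming the JDK installation already set JAVA_HOME and put java on the Path (the exact output depends on the JDK build you installed):

REM Hadoop needs JAVA_HOME and java on the Path; confirm both before installing Hadoop
java -version
echo %JAVA_HOME%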
Configuration
Configure environment variables
- Add a system environment variable HADOOP_HOME pointing to the Hadoop installation root.
- Edit the system environment variable Path and add %HADOOP_HOME%\bin (a command-line alternative is sketched after the version output below).
- Type cmd to open a Windows Command Prompt and run hadoop version to check the version:
C:\Users\...>hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /D:/Dev/Hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
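As an alternative to the System Properties dialog, HADOOP_HOME can also be set from an elevated Command Prompt. A minimal sketch, assuming Hadoop was extracted to D:\Dev\Hadoop\hadoop-3.2.1 (adjust the path to your own installation root):

REM /M writes to the system (machine) environment and requires an administrator prompt
setx /M HADOOP_HOME "D:\Dev\Hadoop\hadoop-3.2.1"

Open a new Command Prompt afterwards so the change is picked up. Appending %HADOOP_HOME%\bin to Path is still easiest through the System Properties dialog, because setx rewrites the whole variable and truncates values longer than 1024 characters.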
Configure the Hadoop cluster
Four files need to be modified:
%HADOOP_HOME%\etc\hadoop\core-site.xml
%HADOOP_HOME%\etc\hadoop\mapred-site.xml
%HADOOP_HOME%\etc\hadoop\hdfs-site.xml
%HADOOP_HOME%\etc\hadoop\yarn-site.xml
- Edit %HADOOP_HOME%\etc\hadoop\core-site.xml and add the fs.defaultFS property:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
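fs.defaultFS tells Hadoop clients which filesystem a bare path refers to. The configured value can be checked right away, and once the services are running (see the Start section below) a bare path and the full URI are interchangeable; a small illustration:

REM Prints the value picked up from core-site.xml; works before any service is started
hdfs getconf -confKey fs.defaultFS

REM Later, with the services running, these two listings are equivalent
hdfs dfs -ls /
hdfs dfs -ls hdfs://localhost:9000/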
- Edit %HADOOP_HOME%\etc\hadoop\mapred-site.xml and add the mapreduce.framework.name property:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Edit %HADOOP_HOME%\etc\hadoop\hdfs-site.xml and add the dfs.replication, dfs.namenode.name.dir, and dfs.datanode.data.dir properties:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///D:/Dev/Hadoop/hadoop-env/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///D:/Dev/Hadoop/hadoop-env/data/datanode</value>
</property>
</configuration>
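The NameNode and DataNode directories are normally created on demand (by hdfs namenode -format and by the DataNode on first start), but creating them up front is a cheap way to confirm that the drive and paths in the configuration are writable. A sketch matching the paths used above:

REM Pre-create the storage directories referenced in hdfs-site.xml
mkdir D:\Dev\Hadoop\hadoop-env\data\namenode
mkdir D:\Dev\Hadoop\hadoop-env\data\datanode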
- Edit %HADOOP_HOME%\etc\hadoop\yarn-site.xml and add the yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce.shuffle.class properties:
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
- Run hdfs namenode -format to format the NameNode. Because of a bug in Hadoop 3.2.1, the command fails with the following exception:
...
2020-04-26 17:46:55,871 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsupportedOperationException
at java.nio.file.Files.setPosixFilePermissions(Files.java:2044)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1649)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
2020-04-26 17:46:55,877 INFO util.ExitUtil: Exiting with status 1: java.lang.UnsupportedOperationException
2020-04-26 17:46:55,881 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CTUY7JWX6208621/10.62.58.79
************************************************************/
Workaround:
- Download hadoop-hdfs-3.2.1.jar from https://github.com/FahaoTang/big-data;
- Rename hadoop-hdfs-3.2.1.jar under %HADOOP_HOME%\share\hadoop\hdfs to hadoop-hdfs-3.2.1.jar.bak;
- Copy the downloaded hadoop-hdfs-3.2.1.jar into the %HADOOP_HOME%\share\hadoop\hdfs directory (the equivalent commands are sketched after this list).
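The rename and copy can also be done from a Command Prompt; a sketch assuming the replacement jar was saved to %USERPROFILE%\Downloads (adjust to wherever you downloaded it):

REM Back up the original jar, then drop in the replacement
ren "%HADOOP_HOME%\share\hadoop\hdfs\hadoop-hdfs-3.2.1.jar" hadoop-hdfs-3.2.1.jar.bak
copy "%USERPROFILE%\Downloads\hadoop-hdfs-3.2.1.jar" "%HADOOP_HOME%\share\hadoop\hdfs\"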
Run hdfs namenode -format again; this time the NameNode formats successfully:
...
2020-04-26 18:04:31,888 INFO namenode.FSImageFormatProtobuf: Image file D:\Dev\Hadoop\hadoop-env\data\namenode\current\fsimage.ckpt_0000000000000000000 of size 404 bytes saved in 0 seconds .
2020-04-26 18:04:31,904 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-04-26 18:04:31,920 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2020-04-26 18:04:31,920 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CTUY7JWX6208621/10.62.58.79
************************************************************/
Start the services
- Go to the %HADOOP_HOME%\sbin directory and run start-dfs.cmd. Two windows open, one for the NameNode and one for the DataNode.
- In the same %HADOOP_HOME%\sbin directory, run start-yarn.cmd to start the Hadoop YARN services. Again two windows open, one for the ResourceManager and one for the NodeManager:
D:\Dev\Hadoop\hadoop-3.2.1\sbin>start-yarn.cmd
starting yarn daemons
- Run jps to check whether all the services started successfully:
D:\Dev\Hadoop\hadoop-3.2.1\sbin>jps
13140 DataNode
16596 NameNode
9956 Jps
10712 ResourceManager
11864
1132 NodeManager
In addition to the steps above, everything can be started at once with start-all.cmd in the %HADOOP_HOME%\sbin directory (this script is deprecated).
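With all four daemons running, a short smoke test confirms that HDFS and YARN actually work. A sketch, assuming the bundled examples jar sits at its default location inside the distribution:

REM HDFS: create a directory, upload a file, list it
hdfs dfs -mkdir -p /tmp/smoke
hdfs dfs -put %HADOOP_HOME%\etc\hadoop\core-site.xml /tmp/smoke/
hdfs dfs -ls /tmp/smoke

REM YARN: run the bundled pi estimator as a MapReduce job (2 maps, 10 samples each)
yarn jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.2.1.jar pi 2 10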
Web UI
Hadoop provides three user-facing web UIs:
- NameNode web page: http://localhost:9870/dfshealth.html#tab-overview
- DataNode web page: http://localhost:9864/datanode.html
- YARN web page: http://localhost:8088/cluster
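If a page does not load, a quick way to check whether the corresponding daemon is listening on its port (9870, 9864, or 8088) is netstat; for example, for the NameNode web UI:

REM Shows a LISTENING entry if something is bound to port 9870
netstat -ano | findstr :9870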
Stop the services
Run stop-all.cmd in the %HADOOP_HOME%\sbin directory to stop everything at once.
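The DFS and YARN services can also be stopped separately with the matching scripts in the same directory, mirroring the way they were started:

%HADOOP_HOME%\sbin\stop-yarn.cmd
%HADOOP_HOME%\sbin\stop-dfs.cmd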