Installing, Configuring, and Running Hadoop on Windows

Author: 又语 | Published 2020-04-26 17:08

    This article walks through installing, configuring, and running Hadoop on the Windows operating system.


    Contents

    • Versions
    • Installation
    • Configuration
      • Configure environment variables
      • Configure the Hadoop cluster
    • Startup
    • Web UI
    • Shutdown

    Versions

    • Windows 10
    • Hadoop 3.2.1

    Installation

    1. Hadoop 3 requires JDK 8 or later, so first download, install, and configure a JDK (steps omitted).

    2. Download Hadoop; this example uses hadoop-3.2.1.tar.gz.

    3. Extract the archive to the chosen installation directory. Note that extraction takes two steps: first decompress the .tar.gz file, then unpack the resulting .tar file. If Windows 10 has trouble extracting the .tar.gz file, convert it to .zip first (conversion steps omitted).
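    On Windows 10 version 1803 and later, the bundled bsdtar can also extract the archive in a single step from the Command Prompt (the destination directory below is an example):

    ```bat
    :: Extract hadoop-3.2.1.tar.gz in one step; D:\Dev\Hadoop is an example path.
    tar -xzf hadoop-3.2.1.tar.gz -C D:\Dev\Hadoop
    ```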

    4. Download the files under https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and place them in the bin folder of the Hadoop installation root.

    5. Create a folder named hadoop-env as a working directory (the HDFS data directories configured below are placed under it).


    Configuration

    Configure environment variables
    1. Add a system environment variable HADOOP_HOME pointing to the Hadoop installation root.

    2. Edit the system Path variable and append %HADOOP_HOME%\bin.
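    These two steps can also be scripted from an elevated Command Prompt with setx (the installation path is an example; note that setx truncates values longer than 1024 characters, so editing a long Path through the Environment Variables GUI is safer):

    ```bat
    :: Set HADOOP_HOME machine-wide; adjust the path to your installation root.
    setx /M HADOOP_HOME "D:\Dev\Hadoop\hadoop-3.2.1"
    :: Append %HADOOP_HOME%\bin to the machine Path (beware setx's 1024-character limit).
    setx /M Path "%Path%;%HADOOP_HOME%\bin"
    ```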

    3. Open a Windows Command Prompt (run cmd) and execute hadoop version to check the installed version.

    C:\Users\...>hadoop version
    Hadoop 3.2.1
    Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
    Compiled by rohithsharmaks on 2019-09-10T15:56Z
    Compiled with protoc 2.5.0
    From source with checksum 776eaf9eee9c0ffc370bcbc1888737
    This command was run using /D:/Dev/Hadoop/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
    
    Configure the Hadoop cluster

    Four files need to be modified:

    • %HADOOP_HOME%\etc\hadoop\core-site.xml
    • %HADOOP_HOME%\etc\hadoop\mapred-site.xml
    • %HADOOP_HOME%\etc\hadoop\hdfs-site.xml
    • %HADOOP_HOME%\etc\hadoop\yarn-site.xml
    1. Edit \etc\hadoop\core-site.xml, adding the fs.defaultFS property:
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
    2. Edit \etc\hadoop\mapred-site.xml, adding the mapreduce.framework.name property:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
       </property>
    </configuration>
    
    3. Edit \etc\hadoop\hdfs-site.xml, adding the dfs.replication, dfs.namenode.name.dir, and dfs.datanode.data.dir properties:
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///D:/Dev/Hadoop/hadoop-env/data/namenode</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///D:/Dev/Hadoop/hadoop-env/data/datanode</value>
        </property>
    </configuration>
    
    4. Edit \etc\hadoop\yarn-site.xml, adding the yarn.nodemanager.aux-services and yarn.nodemanager.aux-services.mapreduce.shuffle.class properties:
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
    
    5. Run hdfs namenode -format to format the NameNode. Because of a bug in Hadoop 3.2.1 on Windows, the following exception appears:
    ...
    2020-04-26 17:46:55,871 ERROR namenode.NameNode: Failed to start namenode.
    java.lang.UnsupportedOperationException
            at java.nio.file.Files.setPosixFilePermissions(Files.java:2044)
            at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
            at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
            at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1649)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
    2020-04-26 17:46:55,877 INFO util.ExitUtil: Exiting with status 1: java.lang.UnsupportedOperationException
    2020-04-26 17:46:55,881 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at CTUY7JWX6208621/10.62.58.79
    ************************************************************/
    

    Workaround:

    • Download hadoop-hdfs-3.2.1.jar from https://github.com/FahaoTang/big-data.
    • Rename the existing hadoop-hdfs-3.2.1.jar under \share\hadoop\hdfs to hadoop-hdfs-3.2.1.jar.bak.
    • Copy the downloaded hadoop-hdfs-3.2.1.jar into the \share\hadoop\hdfs directory.

    Run the NameNode format command hdfs namenode -format again; this time it succeeds:

    ...
    2020-04-26 18:04:31,888 INFO namenode.FSImageFormatProtobuf: Image file D:\Dev\Hadoop\hadoop-env\data\namenode\current\fsimage.ckpt_0000000000000000000 of size 404 bytes saved in 0 seconds .
    2020-04-26 18:04:31,904 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    2020-04-26 18:04:31,920 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
    2020-04-26 18:04:31,920 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at CTUY7JWX6208621/10.62.58.79
    ************************************************************/
    

    Startup

    1. Go to the %HADOOP_HOME%\sbin directory and run start-dfs.cmd. Two windows open: one for the NameNode and one for the DataNode.

    2. Also under %HADOOP_HOME%\sbin, run start-yarn.cmd to start the Hadoop YARN services. Again two windows open: one for the ResourceManager and one for the NodeManager.

    D:\Dev\Hadoop\hadoop-3.2.1\sbin>start-yarn.cmd
    starting yarn daemons
    
    3. Run jps to check whether all services started successfully.
    D:\Dev\Hadoop\hadoop-3.2.1\sbin>jps
    13140 DataNode
    16596 NameNode
    9956 Jps
    10712 ResourceManager
    11864
    1132 NodeManager
    

    Besides the method above, start-all.cmd in the %HADOOP_HOME%\sbin directory starts everything at once, though its use is deprecated.
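    With all four daemons running, a quick smoke test of HDFS from any Command Prompt confirms the cluster is usable (the file and directory names below are examples):

    ```bat
    :: Create a directory in HDFS, upload a local file, and list the result.
    hdfs dfs -mkdir -p /user/demo
    hdfs dfs -put somefile.txt /user/demo
    hdfs dfs -ls /user/demo
    ```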


    Web UI

    Hadoop provides three web interfaces:

    1. NameNode web page: http://localhost:9870/dfshealth.html#tab-overview
    2. DataNode web page: http://localhost:9864/datanode.html
    3. YARN web page: http://localhost:8088/cluster
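    As a quick check that the daemons are listening, curl.exe (bundled with Windows 10 1803+) can request each page from an interactive prompt; each command should print 200 while the corresponding service is running:

    ```bat
    :: Print only the HTTP status code for each web UI.
    curl.exe -s -o NUL -w "%{http_code}\n" http://localhost:9870/dfshealth.html
    curl.exe -s -o NUL -w "%{http_code}\n" http://localhost:9864/datanode.html
    curl.exe -s -o NUL -w "%{http_code}\n" http://localhost:8088/cluster
    ```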

    Shutdown

    Run stop-all.cmd in the %HADOOP_HOME%\sbin directory to stop all services at once.
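    Like start-all.cmd, the all-in-one stop script is deprecated; stopping the services individually mirrors the startup order in reverse:

    ```bat
    :: Stop YARN first, then HDFS (run from %HADOOP_HOME%\sbin or with it on Path).
    stop-yarn.cmd
    stop-dfs.cmd
    ```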

    Original link: https://www.haomeiwen.com/subject/qihjwhtx.html