Installing, Configuring, and Running Hadoop on CentOS 7.x

Author: 又语 | Published 2019-11-29 10:08

    This article walks through installing, configuring, and running Hadoop on CentOS 7.x.


    Contents

    • Versions
    • Installation
    • Configuration
    • Starting
    • Stopping

    Versions

    • CentOS Linux release 7.6
    • Hadoop 3.2.1
    • JDK 8

    Installation

    1. Hadoop needs a Java runtime, so first download, install, and configure a JDK; see: Reinstalling the JDK on CentOS 7.x.
      Note: for supported JDK versions, refer to Hadoop Java Versions.

    2. Install SSH and run sshd.
      Note: if you want to use the optional start/stop scripts, SSH must be installed and sshd must be running, since those scripts manage the remote Hadoop daemons over SSH. Installing pdsh is also recommended for better management of SSH resources.

    3. Download Hadoop. This article uses version 3.2.1; the downloaded file is hadoop-3.2.1.tar.gz.

    4. Copy the file to a suitable directory on the CentOS server, e.g. /opt.

    5. Extract it: tar -zxvf hadoop-3.2.1.tar.gz
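    Steps 2–5 above can be run as a single command sequence. The yum package names below are standard CentOS 7 defaults, and the archive URL is illustrative; check the Apache download page for a current mirror:

```shell
# Step 2: install the SSH server/client and pdsh (requires root)
yum install -y openssh-server openssh-clients pdsh
systemctl enable --now sshd

# Steps 3-5: download Hadoop 3.2.1 and extract it under /opt
cd /opt
curl -fSLO https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -zxvf hadoop-3.2.1.tar.gz
```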


    Configuration

    By default, Hadoop is configured to run in non-distributed mode as a single Java process. It can also run in pseudo-distributed mode on a single node, with each Hadoop daemon running in its own Java process.

    1. Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the JAVA_HOME setting.
    export JAVA_HOME=/opt/java/jre1.8.0_222/
    
    2. Edit $HADOOP_HOME/etc/hadoop/core-site.xml as follows.
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://127.0.0.1:9000</value>
      </property>
    </configuration>
    
    3. Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml as follows.
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/data/tmp</value>
      </property>
    </configuration>
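
    With these three files edited, the pseudo-distributed configuration is complete. Before starting anything, you can sanity-check the edited XML with xmllint (part of CentOS's libxml2 package); the path below assumes the archive was extracted under /opt as in step 4 of the installation:

```shell
# Fail fast on malformed XML now rather than at daemon startup
xmllint --noout /opt/hadoop-3.2.1/etc/hadoop/core-site.xml
xmllint --noout /opt/hadoop-3.2.1/etc/hadoop/hdfs-site.xml
```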
    

    Starting

    1. Format the HDFS filesystem before the first start.
    [root@... hadoop-3.2.1]# bin/hdfs namenode -format
    WARNING: /opt/hadoop/hadoop-3.2.1/logs does not exist. Creating.
    2020-04-27 13:56:52,470 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    ...
    2020-04-27 13:56:53,611 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
    2020-04-27 13:56:53,611 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ......
    ************************************************************/
    
    2. Run sbin/start-dfs.sh to start DFS; the following errors appear.
    [root@... hadoop-3.2.1]# sbin/start-dfs.sh
    Starting namenodes on [localhost]
    ERROR: Attempting to operate on hdfs namenode as root
    ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
    Starting datanodes
    ERROR: Attempting to operate on hdfs datanode as root
    ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
    Starting secondary namenodes [...]
    ERROR: Attempting to operate on hdfs secondarynamenode as root
    ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
    

    Solution:

    • Edit the sbin/start-dfs.sh and sbin/stop-dfs.sh scripts, adding the following near the top. (The startup logs will later warn that HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER; as the warning itself notes, the old name still takes effect in 3.2.x.)
    ...
    # limitations under the License.
    
    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    
    # Start hadoop dfs daemons.
    ...
    
    • Edit the sbin/start-yarn.sh and sbin/stop-yarn.sh scripts, adding the following near the top:
    ...
    # limitations under the License.
    
    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root
    
    ## @description  usage info
    ...
    
    3. Run sbin/start-dfs.sh again; this time a Permission denied warning appears.
    [root@... hadoop-3.2.1]# sbin/start-dfs.sh
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    Starting namenodes on [localhost]
    Last login: Mon Apr 27 15:03:13 CST 2020 on pts/0
    localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    Starting datanodes
    Last login: Mon Apr 27 15:05:46 CST 2020 on pts/0
    localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    Starting secondary namenodes [ctup000105163]
    Last login: Mon Apr 27 15:05:46 CST 2020 on pts/0
    ...: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    

    The cause is that SSH still requires interactive authentication; key-based (passwordless) SSH login to localhost must be set up.
    Solution:

    • Run ssh-keygen -t rsa to generate a key pair
    • Copy the public key into the authorized-keys file
    [root@... hadoop-3.2.1]# ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:dP3fm4pW7Zs59c6R7ppSon2ShtaEuuPOiQuub5cjsc4 root@ctup000105163
    The key's randomart image is:
    +---[RSA 2048]----+
    |                 |
    |           .     |
    |        . . .    |
    |       . .   .   |
    |        S .   o  |
    |   .     . o o o+|
    |   .o . . * = .o=|
    |  o+.+oo.+ O..o+B|
    | .=Eoo=*+ o.+++X=|
    +----[SHA256]-----+
    [root@... hadoop-3.2.1]# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
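
    One caveat about the last command above: cp overwrites ~/.ssh/authorized_keys, discarding any keys that were already authorized. A more conservative sketch appends instead, and sets the strict permissions sshd requires before it will accept the file:

```shell
# Append instead of overwriting, in case authorized_keys already has entries
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# sshd rejects key files that are group- or world-accessible
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Confirm that passwordless login to localhost now works
ssh localhost exit
```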
    
    4. Run sbin/start-dfs.sh again; this time DFS starts successfully.
    [root@... hadoop-3.2.1]# sbin/start-dfs.sh
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    Starting namenodes on [localhost]
    Last login: Mon Apr 27 15:11:31 CST 2020 from 10.62.58.79 on pts/2
    Starting datanodes
    Last login: Mon Apr 27 15:11:56 CST 2020 on pts/0
    Starting secondary namenodes [ctup000105163]
    Last login: Mon Apr 27 15:11:58 CST 2020 on pts/0
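
    After a successful start you can confirm the daemons with jps (shipped with the JDK). In Hadoop 3.x the NameNode web UI also listens on port 9870 by default:

```shell
# List running JVM processes; expect NameNode, DataNode, and SecondaryNameNode
jps
# Probe the NameNode web UI (default port 9870 in Hadoop 3.x)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/
```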
    

    Stopping

    [root@... hadoop-3.2.1]# sbin/stop-dfs.sh
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    Stopping namenodes on [localhost]
    Last login: Mon Apr 27 15:12:03 CST 2020 on pts/0
    Stopping datanodes
    Last login: Mon Apr 27 15:15:33 CST 2020 on pts/0
    Stopping secondary namenodes [ctup000105163]
    Last login: Mon Apr 27 15:15:35 CST 2020 on pts/0
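
    If YARN was also started via sbin/start-yarn.sh (the script patched earlier), stop it the same way:

```shell
# Stops the ResourceManager and NodeManagers started by start-yarn.sh
sbin/stop-yarn.sh
```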
    
