
Development Environment Setup | Troubleshooting Hadoop Startup Failures

Author: Ricsy | Published: 2023-08-29 17:38


    I. Background

    Install Hadoop on a single server running CentOS 7.4, to serve as a test environment for day-to-day development. The installation packages are hadoop-3.2.2.tar.gz and spark-3.0.1-bin-hadoop3.2.tgz.
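
    For reference, a minimal sketch of unpacking the two tarballs into the directory layout assumed in Section III. The archive locations and the rename to a version-free hadoop directory are assumptions, not part of the original steps:

    # Unpack both archives under /opt/software/big_data_env
    mkdir -p /opt/software/big_data_env
    tar -zxf hadoop-3.2.2.tar.gz -C /opt/software/big_data_env
    tar -zxf spark-3.0.1-bin-hadoop3.2.tgz -C /opt/software/big_data_env
    # Assumed rename so the path matches HADOOP_HOME in the summary below
    mv /opt/software/big_data_env/hadoop-3.2.2 /opt/software/big_data_env/hadoop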

    II. Problems Encountered

    Problem 1: Running sbin/start-all.sh from the hadoop directory fails with: localhost Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

    Set up SSH trust for the local machine (passwordless login):

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub  >> ~/.ssh/authorized_keys
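
    A quick sanity check, plus the key-file permissions sshd requires (a common reason the key is still rejected even after the steps above):

    # authorized_keys must not be group/other writable, or sshd ignores it
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
    # Should complete without a password prompt
    ssh localhost exit && echo "passwordless ssh OK"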
    

    Problem 2: Running sbin/start-all.sh from the hadoop directory fails with: ERROR: JAVA_HOME is not set and could not be found.

    Edit hadoop/etc/hadoop/hadoop-env.sh and set JAVA_HOME, for example:
    export JAVA_HOME=/opt/software/jdk/java
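
    A sketch of making that change from the shell, assuming the JDK path above and running from the hadoop directory:

    # Append the export to hadoop-env.sh
    echo 'export JAVA_HOME=/opt/software/jdk/java' >> etc/hadoop/hadoop-env.sh
    # Confirm the daemon scripts will now pick it up
    grep '^export JAVA_HOME' etc/hadoop/hadoop-env.sh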

    III. My Summary

    The following takes as an example a Hadoop installation under /opt/software/big_data_env, with source_hadoop_env.sh placed in the hadoop directory:

    1. Before Installation

    • Install the required dependencies (Java, etc.) and prepare the environment (local SSH trust, etc.)
    • Fill in the four configuration files — core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml — in advance, to head off common installation problems
    • Write an environment-variable loading script, e.g. source_hadoop_env.sh

    core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
         <name>fs.defaultFS</name>
         <value>hdfs://localhost:19000</value>
      </property>
    </configuration>
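
    The value actually picked up from core-site.xml can be spot-checked with the standard getconf subcommand:

    # Print the effective client-side value of fs.defaultFS
    hdfs getconf -confKey fs.defaultFS   # expected: hdfs://localhost:19000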
    

    hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
         <name>dfs.replication</name>
         <value>1</value>
      </property>
      <property>
         <name>dfs.namenode.http-address</name>
         <value>0.0.0.0:19870</value>
         <description>
          The address and the base port where the dfs namenode web ui will listen on.
        </description>
      </property>
      <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:///opt/software/dfs/data</value>
      </property>
    </configuration>
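
    dfs.datanode.data.dir points outside the Hadoop tree, so it is worth creating the directory up front:

    # Create the DataNode storage directory referenced above; the daemons
    # run as root here (see source_hadoop_env.sh below), so root ownership
    # is acceptable on this single-node dev box
    mkdir -p /opt/software/dfs/data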
    

    yarn-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
      </property>
      <property>
         <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
         <value>true</value>
      </property>
      <property>
         <name>yarn.nodemanager.resource.memory-mb</name>
         <value>-1</value>
      </property>
      <property>
         <name>yarn.scheduler.maximum-allocation-mb</name>
         <value>2048</value>
      </property>
      <property>
         <name>yarn.nodemanager.resource.cpu-vcores</name>
         <value>-1</value>
      </property>
      <property>
         <name>yarn.nodemanager.env-whitelist</name>
         <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
      </property>
      <property>
         <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
         <value>98.5</value>
      </property>
      <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>${yarn.resourcemanager.hostname}:18088</value>
         <description>
          The http address of the RM web application.
          If only a host is provided as the value,
          the webapp will be served on a random port.
         </description>
      </property>
    </configuration>
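
    With detect-hardware-capabilities set to true and memory-mb/cpu-vcores set to -1, the NodeManager sizes its resources from the machine itself. After startup, registration can be confirmed with:

    # List every NodeManager known to the ResourceManager, in any state
    yarn node -list -all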
    

    mapred-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
      </property>
      <property>
         <name>mapreduce.application.classpath</name>
         <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>
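
    A quick end-to-end check that the YARN framework and mapreduce.application.classpath are wired up correctly is the bundled examples jar (the jar name assumes the stock hadoop-3.2.2 binary distribution):

    # Submit a tiny MapReduce job; it should finish with an estimate of pi
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar pi 2 5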
    

    source_hadoop_env.sh

    #!/usr/bin/bash
    export JAVA_HOME=/opt/software/jdk/java
    export HADOOP_HOME=/opt/software/big_data_env/hadoop
    export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
    
    export HDFS_NAMENODE_USER="root"
    export HDFS_DATANODE_USER="root"
    export HDFS_SECONDARYNAMENODE_USER="root"
    export YARN_RESOURCEMANAGER_USER="root"
    export YARN_NODEMANAGER_USER="root"
    
    export HADOOP_CONF_DIR=/opt/software/big_data_env/hadoop/etc/hadoop
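
    A usage sketch: load the script by hand once and spot-check a few variables before wiring it into the login shell:

    source /opt/software/big_data_env/hadoop/source_hadoop_env.sh
    hadoop version          # should report Hadoop 3.2.2
    echo $HADOOP_CONF_DIR   # should print /opt/software/big_data_env/hadoop/etc/hadoop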
    

    2. During Installation

    • Before running sbin/start-all.sh, load the environment variables (hooked into ~/.bash_profile so they load automatically on login) and format the NameNode:
    sed -i '$a\source /opt/software/big_data_env/hadoop/source_hadoop_env.sh' ~/.bash_profile
    source ~/.bash_profile
    hdfs namenode -format
    sbin/start-all.sh
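
    One caveat: the sed '$a\...' line appends unconditionally, so re-running the setup duplicates the entry in ~/.bash_profile. A guarded variant with the same effect, but idempotent:

    # Only append the source line if it is not already present
    grep -q 'source_hadoop_env.sh' ~/.bash_profile || \
      echo 'source /opt/software/big_data_env/hadoop/source_hadoop_env.sh' >> ~/.bash_profile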
    

    3. After Installation

    • Check that the expected processes are running (NameNode, DataNode, ResourceManager, SecondaryNameNode, NodeManager); see the combined sketch after this list
      jps
    • Check the logs under /opt/software/big_data_env/hadoop/logs for errors, e.g.:
      tail -5000f /opt/software/big_data_env/hadoop/logs/hadoop-root-datanode-*.log
    • Check that the web pages are reachable
      1) HDFS NameNode web interface: http://{host}:19870
      2) HDFS file system browser: http://{host}:19870/explorer.html
      3) YARN ResourceManager web interface: http://{host}:18088/cluster/scheduler
      Note: starting with Hadoop 3.0, files can be uploaded and downloaded directly through the HDFS file browser.
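
    Putting the checks together in one sketch (the host variable is a placeholder, as above; PIDs in the jps output will differ):

    HOST=localhost   # replace with the server address
    jps
    # Expected daemons in the jps output: NameNode, DataNode,
    # SecondaryNameNode, ResourceManager, NodeManager (plus Jps itself)
    # Probe the two web UIs; both should return HTTP 200
    curl -s -o /dev/null -w '%{http_code}\n' "http://${HOST}:19870"
    curl -s -o /dev/null -w '%{http_code}\n' "http://${HOST}:18088/cluster/scheduler"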
