
Hadoop-Spark Cluster Installation

Author: 奔跑地蜗牛 | Published 2019-06-01 22:36

    Notes

    1. Hadoop locates the JDK and its own installation through the JAVA_HOME and HADOOP_HOME variables, so both must be set in the environment.


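    For example, on Linux they can be exported in /etc/profile (a minimal sketch, using the /opt/jdk and /opt/hadoop install paths that appear later in this article):

    export JAVA_HOME=/opt/jdk
    export HADOOP_HOME=/opt/hadoop
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin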
    2. Once the Hadoop path is set, four configuration files need to be edited: core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. These files must be encoded as UTF-8, otherwise hdfs namenode -format fails with an "Invalid UTF-8" error.

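    One way to check and fix the encoding from the shell (a sketch; the GBK source encoding passed to iconv is an assumption, adjust it to whatever the files were actually saved as):

    # check the declared charset of a config file
    file -i $HADOOP_HOME/etc/hadoop/core-site.xml
    # convert a mis-encoded file to UTF-8
    iconv -f GBK -t UTF-8 core-site.xml -o core-site.xml.utf8 && mv core-site.xml.utf8 core-site.xml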

    3. If the ResourceManager fails to start with an "UnResolvedAddress" error, the likely culprit is the yarn.resourcemanager.hostname setting in yarn-site.xml; set it to the master node's hostname.


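    The relevant property (the hostname "master" matches the yarn-site.xml shown later in this article):

    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
    </property>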

    4."there is no HDFS_NAMENODE_USER defined. Aborting operation."问题
    将start-dfs.sh,stop-dfs.sh两个文件顶部添加以下参数

    HDFS_NAMENODE_USER=root
    HDFS_DATANODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    YARN_RESOURCEMANAGER_USER=root
    YARN_NODEMANAGER_USER=root
    

    The top of start-yarn.sh and stop-yarn.sh also needs the following:

    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root
    

    5. The key to passwordless login is that every node must use the same username; otherwise starting the other nodes as the current user will not succeed.

    6. If "localhost Permission Denied" is still reported after setting up passwordless login, it is because the master node has not appended its own id_rsa.pub to authorized_keys.



    The command is:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    The three key commands for passwordless login:

      1. ssh-keygen -t rsa
      2. scp id_rsa.pub <user>@<slave-host>:.../dirForRsa  (dirForRsa is a directory on the slave node)
      3. cat .../dirForRsa/id_rsa.pub >> ~/.ssh/authorized_keys  (run on the slave node)
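    A compact alternative to steps 2 and 3, assuming OpenSSH's ssh-copy-id is available (root and node0/node1 match the username and hostnames used elsewhere in this article):

    ssh-keygen -t rsa            # generate the key pair on the master
    ssh-copy-id root@node0       # append the master's public key to node0's authorized_keys
    ssh-copy-id root@node1       # same for node1
    ssh root@node0 hostname      # should log in without prompting for a password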

    7. If start-dfs.sh fails with "localhost: ERROR: Cannot set priority of datanode process 130126", the "localhost" means it is trying to start the DataNode on the node itself: the workers file under hadoop/etc/hadoop defaults to localhost. Change it to list the designated slave nodes.


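    A sketch of the workers file, assuming node0 and node1 are the slave hostnames used in this article; add the master here too if it should also run a DataNode:

    # /opt/hadoop/etc/hadoop/workers
    node0
    node1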

    If startup succeeds, a success message is printed. Verify the running daemons with jps on the master and on each of Node0 and Node1.
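    Roughly what jps should report with the configuration in this article (the SecondaryNameNode lands on node1 per dfs.namenode.secondary.http-address; process IDs omitted):

    # master
    NameNode
    ResourceManager
    # node0
    DataNode
    NodeManager
    # node1
    DataNode
    NodeManager
    SecondaryNameNode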

    Summary

    Hadoop cluster setup:

      1. Configure the environment variables in /etc/profile as follows:
    # /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
    # and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
    export PATH="$PATH:/snap/bin"
    export JAVA_HOME=/opt/jdk
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin
    export HADOOP_HOME=/opt/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    # ... (remainder of the stock /etc/profile)
    
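    After editing /etc/profile, reload it and confirm the paths resolve (a quick sanity check):

    source /etc/profile
    java -version
    hadoop version
    echo $HADOOP_HOME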

      2. Configure the four main cluster configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. Note that if a SecondaryNameNode is wanted, the corresponding settings must be added to both hdfs-site.xml and core-site.xml. Reference configurations follow.

    core-site.xml:

    <configuration>
            <!-- address of the NameNode -->
        <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://192.168.1.113:9000</value>
        </property>
        <!-- base directory for files generated by Hadoop -->
        <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/opt/hadoop-3.1.2/tmp</value>
        </property>
        <property>
            <!-- maximum interval (seconds) between checkpoints -->
            <name>fs.checkpoint.period</name>
            <value>3600</value>
        </property>

     </configuration>
    
    

    hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <!-- number of replicas HDFS keeps for each data block -->
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
         <property>
                <name>dfs.nameservices</name>
                <value>hadoop-cluster</value>
         </property>
        <!-- local storage directory for NameNode metadata -->
        <property>
                 <name>dfs.namenode.name.dir</name> 
                 <value>file:///data/hadoop/hdfs/namenode</value>
        </property>
    
        <!-- checkpoint directories for the SecondaryNameNode -->
        <property>
                 <name>dfs.namenode.checkpoint.dir</name>
                 <value>file:///data/hadoop/hdfs/secnamenode</value>
        </property>
        <property>
                <name>dfs.namenode.checkpoint.edits.dir</name>
                <value>file:///data/hadoop/hdfs/secnamenode</value>
        </property>
        <!-- local storage directory for DataNode blocks -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///data/hadoop/hdfs/datanode</value>
        </property>
        <!-- NameNode web UI address, and the address where the SecondaryNameNode runs -->
        <property>
              <name>dfs.namenode.http-address</name> 
              <value>master:50070</value>
        </property>
       <property>
              <name>dfs.namenode.secondary.http-address</name> 
              <value>node1:50090</value>
        </property>
    </configuration>
    
    

    mapred-site.xml:

    <configuration>
    
            <!-- tell Hadoop to run MapReduce (MR) jobs on YARN -->
            <property>
                  <name>mapreduce.framework.name</name>
                  <value>yarn</value>
            </property>
    </configuration>
    

    yarn-site.xml:

    <configuration>
        <!-- the NodeManager fetches data via the mapreduce_shuffle auxiliary service -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!-- address (hostname) of the ResourceManager -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property> 
        <!-- enable YARN log aggregation (job logs) -->
        <property>    
            <name>yarn.log-aggregation-enable</name> 
            <value>true</value>    
        </property>
        <property>
                <name>yarn.nodemanager.local-dirs</name>
                <value>file:///data/hadoop/yarn/namenode</value>
        </property>
    
    </configuration>
    
      3. Set up the workers (or slaves) file.
      4. Configure /etc/hosts on the master and slave nodes; a sketch follows below.


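      A sketch of /etc/hosts for every node (192.168.1.113 is the master address used in core-site.xml; the node0/node1 entries are placeholders to replace with the real addresses):

      192.168.1.113   master
      <node0-ip>      node0
      <node1-ip>      node1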
      5. Set up passwordless SSH login across the cluster.

    Generate a key pair with ssh-keygen -t rsa, send the public key to each slave with scp, then append it to that node's authorized_keys.

      6. Add the user variables to the top of start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh, as described in note 4 above.
      7. Format the NameNode with hdfs namenode -format.
      8. Run start-dfs.sh and start-yarn.sh; the cluster should start successfully.
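      A few checks to confirm the cluster is healthy after startup (standard Hadoop commands; a sanity check rather than a required step):

      jps                      # Java daemons running on the current node
      hdfs dfsadmin -report    # DataNodes registered with the NameNode
      yarn node -list          # NodeManagers registered with the ResourceManager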

    Hadoop + Spark Cluster Setup

      1. Set up Hadoop as described above.
      2. Download the Scala and Spark packages.
      3. Add the Scala and Spark paths to the environment variables; /etc/profile becomes:
    export PATH="$PATH:/snap/bin"
    export JAVA_HOME=/opt/jdk
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin
    export HADOOP_HOME=/opt/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export SCALA_HOME=/opt/scala
    export PATH=$PATH:$SCALA_HOME/bin
    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    
    
      4. Edit the settings in spark/conf/spark-env.sh as follows:
    #!/usr/bin/env bash
    export JAVA_HOME=/opt/jdk
    export SCALA_HOME=/opt/scala
    export HADOOP_HOME=/opt/hadoop
    export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
    export SPARK_MASTER_IP=192.168.1.113
    export SPARK_MASTER_HOST=192.168.1.113
    export SPARK_LOCAL_IP=192.168.1.112
    export SPARK_WORKER_MEMORY=512m
    export SPARK_WORKER_CORES=2
    export SPARK_HOME=/opt/spark
    export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
    
    
      5. Edit spark/conf/slaves as follows:
    master
    node0
    node1
    
      6. Package the Spark directory with tar -zcvf spark.tar.gz spark, then use scp to send the archive and /etc/profile to each slave node (a sketch follows below). On each slave, change SPARK_LOCAL_IP in spark/conf/spark-env.sh to that node's own IP, copy the profile to /etc/, and run source /etc/profile to apply the environment variables.
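      A sketch of the distribution step for one slave (run from /opt on the master; root@node0 and the /opt paths follow the layout used in this article):

      cd /opt
      tar -zcvf spark.tar.gz spark
      scp spark.tar.gz root@node0:/opt/
      scp /etc/profile root@node0:/etc/profile
      # then on node0:
      #   tar -zxvf /opt/spark.tar.gz -C /opt
      #   set SPARK_LOCAL_IP in /opt/spark/conf/spark-env.sh to node0's own IP
      #   source /etc/profile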

      7. Start Spark with spark/sbin/start-master.sh.
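      start-master.sh only brings up the master process; a sketch of starting the workers and checking the cluster (start-slaves.sh is the Spark 2.x script name, renamed start-workers.sh in Spark 3.1+):

      /opt/spark/sbin/start-master.sh
      /opt/spark/sbin/start-slaves.sh    # or start-workers.sh on newer releases
      jps                                # expect Master on the master node, Worker on each slave
      # the master web UI (port 8080 by default) should list the registered workers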
