Installing a Hadoop Cluster on Linux

Author: 年少时难免轻狂Ho | Published 2018-12-25 11:05

    Prerequisites

    Install the JDK (plenty of guides online).
    Set up passwordless SSH login; see https://www.jianshu.com/p/fa06f3d77094
    Install ZooKeeper; see https://www.jianshu.com/p/d6967310777c

    1. Download the Hadoop release

    Apache releases:
    https://hadoop.apache.org/releases.html
    CDH releases:
    http://archive.cloudera.com/cdh5/cdh/5/

    2. Configuration files

    File                        Format           Purpose
    hadoop-env.sh               Bash script      Hadoop runtime environment variables
    core-site.xml               XML              Hadoop core settings, e.g. I/O
    hdfs-site.xml               XML              HDFS daemon settings: NN, JN, DN
    yarn-env.sh                 Bash script      YARN runtime environment variables
    yarn-site.xml               XML              YARN framework settings
    mapred-site.xml             XML              MapReduce properties
    capacity-scheduler.xml      XML              YARN scheduler properties
    container-executor.cfg      Cfg              YARN container settings
    mapred-queues.xml           XML              MapReduce queue settings
    hadoop-metrics.properties   Java properties  Hadoop metrics settings
    hadoop-metrics2.properties  Java properties  Hadoop metrics settings
    slaves                      Plain text       DataNode host list
    exclude                     Plain text       DataNode removal (decommission) list
    log4j.properties            Java properties  Logging settings

    3. hadoop-env.sh configuration

    # Java environment
    export JAVA_HOME=~/jdk1.8.0_101
    # Hadoop configuration directory
    export HADOOP_CONF_DIR=~/hadoop-2.5.0-cdh5.2.1-och4.0.1/etc/hadoop
    # Hadoop home
    export HADOOP_HOME=~/hadoop-2.5.0-cdh5.2.1-och4.0.1
    # PID directory
    export HADOOP_PID_DIR=~/data/hadoop/pids
    # Default heap shared by the Hadoop daemons
    # (namenode, secondarynamenode, jobtracker, datanode, tasktracker),
    # set here in hadoop-env.sh
    export HADOOP_HEAPSIZE=8192
    # NameNode heap (with enough system memory, raise to 16384m)
    export HADOOP_NAMENODE_OPTS="-Xmx4096m -Xms4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
    # DataNode heap (with enough system memory, use 2-4 GB)
    export HADOOP_DATANODE_OPTS="-Xmx3072m -Xms3072m -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
    # SecondaryNameNode heap; keep it in line with the NameNode
    export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx4096m -Xms4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
    # Heap for client-side operations
    export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
    # Hadoop log settings
    export HADOOP_LOGFILE=hadoop-${HADOOP_IDENT_STRING}-${command}-${HOSTNAME}.log
    export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,console"}
    export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"WARN,RFAS"}
    export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"WARN,NullAppender"}
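One pitfall worth guarding against here: JVM flags copied from web pages often pick up a typographic en-dash (`–Xms`) in place of the ASCII hyphen (`-Xms`), and the JVM then treats the flag as garbage. A minimal pre-flight sketch (the file path is a throwaway under /tmp purely for illustration):

```shell
# Write a sample hadoop-env.sh line to a scratch file (illustrative path),
# then scan it for the UTF-8 en-dash that breaks JVM option parsing.
conf=/tmp/hadoop-env-check.sh
printf 'export HADOOP_NAMENODE_OPTS="-Xmx4096m -Xms4096m"\n' > "$conf"
bad=$(grep -c '–' "$conf")   # '–' is U+2013, not the ASCII hyphen '-'
echo "lines with en-dash: $bad"
```

Pointing `conf` at the real `$HADOOP_CONF_DIR/hadoop-env.sh` makes this a quick check before distributing the configuration.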
    

    4. core-site.xml configuration

    <!-- The default port is 8020, but this is the RPC port that accepts client
         connections, so if hdfs-site.xml configures the RPC port as 9000, the
         fs.defaultFS port becomes 9000 as well. fs.default.name is the
         deprecated name for fs.defaultFS. -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Adjust this path for your environment -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/ocetl/data/hadoop/hadoop-${user.name}</value>
    </property>
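To double-check what clients will actually connect to, the filesystem URI can be pulled straight out of the XML. A rough sketch using sed on a throwaway copy of the snippet above (a real deployment would point at `$HADOOP_CONF_DIR/core-site.xml` instead):

```shell
# Write the snippet to a scratch file and extract the fs.default.name value.
cat > /tmp/core-site-check.xml <<'EOF'
<property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
</property>
EOF
# Print the <value> on the line following the fs.default.name <name> line.
fsname=$(sed -n '/<name>fs.default.name<\/name>/{n;s:.*<value>\(.*\)</value>.*:\1:p;}' /tmp/core-site-check.xml)
echo "$fsname"
```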
    

    5. hdfs-site.xml configuration

    <!-- Adjust these paths for your environment -->
    <!-- DataNode data directories -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data1,/data2,/data3,/data4,/data5,/data6</value>
      <final>true</final>
    </property>
    <!-- Local filesystem path where the NameNode persists the namespace and transaction logs -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/ocetl/data/hadoop/hdfs/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/ocetl/data/hadoop/journal</value>
    </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/ocetl/app/hadoop-2.5.0-cdh5.2.1-och4.0.1/etc/hadoop/excludes</value>
    </property>
    <!-- Adjust the hostnames: einvoice243 is the active NameNode, einvoice244 the standby -->
    <property>
        <name>dfs.namenode.rpc-address.ocetl.nn1</name>
        <value>einvoice243:8030</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ocetl.nn2</name>
        <value>einvoice244:8030</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ocetl.nn1</name>
        <value>einvoice243:50082</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ocetl.nn2</name>
        <value>einvoice244:50082</value>
    </property>
    <!-- einvoice243, einvoice244 and einvoice247 are all ZooKeeper nodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://einvoice243:8488;einvoice244:8488;einvoice247:8488/ocetl</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>einvoice243:21810,einvoice244:21810,einvoice247:21810</value>
    </property>
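The `ha.zookeeper.quorum` string is easy to get subtly wrong (a stray space, a missing port). A small sketch that splits the value from above and checks each host:port pair:

```shell
# Split the comma-separated quorum and print host/port for each entry.
quorum="einvoice243:21810,einvoice244:21810,einvoice247:21810"
count=0
oldifs=$IFS; IFS=','
for hp in $quorum; do
  host=${hp%%:*}   # everything before the first ':'
  port=${hp##*:}   # everything after the last ':'
  echo "zk node: $host port: $port"
  count=$((count + 1))
done
IFS=$oldifs
echo "quorum size: $count"
```

An odd quorum size (here 3) is what ZooKeeper expects; an even count or an empty port is a sign the value was mangled.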
    

    6. yarn-site.xml configuration

    <property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>einvoice243:21810,einvoice244:21810,einvoice247:21810</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
     <value>einvoice243:21810,einvoice244:21810,einvoice247:21810</value>
    </property>
    <!-- In this file, einvoice243 is the active ResourceManager and einvoice244 the standby -->
    <!-- Under "RM1 configs", use the active ResourceManager's hostname -->
    <!-- Under "RM2 configs", use the standby ResourceManager's hostname -->
    <!-- Check the hostnames in the remaining settings and adjust as needed -->
    <!-- RM1 configs -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>einvoice243:23140</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>einvoice243:23130</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>einvoice243:23188</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>einvoice243:23125</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>einvoice243:23141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.admin.address.rm1</name>
        <value>einvoice243:23142</value>
    </property>
    <!-- RM2 configs -->
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>einvoice244:23140</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>einvoice244:23130</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>einvoice244:23188</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>einvoice244:23125</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>einvoice244:23141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.admin.address.rm2</name>
        <value>einvoice244:23142</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <!-- on rm1 set to rm1, on rm2 set to rm2 -->
        <value>rm1</value>    
    </property>
    <!-- Adjust these paths for your environment -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local,/data4/yarn/local,/data5/yarn/local,/data6/yarn/local</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data1/yarn/log,/data2/yarn/log,/data3/yarn/log,/data4/yarn/log,/data5/yarn/log,/data6/yarn/log</value>
    </property>
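The NodeManager directories above have to exist and be writable on every node before YARN starts. A sketch that materializes such a comma-separated list (rooted under /tmp here so it is safe to run anywhere; a real run would use the /dataN paths from the config):

```shell
# Create each directory in a comma-separated yarn.nodemanager.local-dirs-style list.
dirs="/tmp/data1/yarn/local,/tmp/data2/yarn/local,/tmp/data3/yarn/local"
oldifs=$IFS; IFS=','
for d in $dirs; do
  mkdir -p "$d"          # -p creates parents and ignores existing dirs
done
IFS=$oldifs
ls -d /tmp/data1/yarn/local
```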
    

    7. mapred-env.sh configuration

    export HADOOP_MAPRED_PID_DIR=~/data/hadoop/pids
    

    8. mapred-site.xml configuration

    <!-- jobhistory properties -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>0.0.0.0:10120</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>0.0.0.0:19988</value>
    </property>
    

    9. slaves configuration

    List the DataNodes' IP addresses or hostnames in the slaves file, one per line.

    einvoice247
    einvoice248
    einvoice249
    einvoice250
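A duplicated or mistyped hostname in slaves is a common source of puzzling startup failures, so it can be worth generating and checking the file rather than editing it by hand. A sketch (written to /tmp so it does not touch a real configuration):

```shell
# Generate a slaves file from the DataNode hostnames and check for duplicates.
printf '%s\n' einvoice247 einvoice248 einvoice249 einvoice250 > /tmp/slaves
dups=$(sort /tmp/slaves | uniq -d | wc -l | tr -d ' ')
hosts=$(wc -l < /tmp/slaves | tr -d ' ')
echo "hosts: $hosts, duplicates: $dups"
```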
    

    10. Set the Hadoop environment variables

    vi ~/.bash_profile
    
    export HADOOP_HOME=~/hadoop-2.5.0-cdh5.2.1-och4.0.1
    export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_PREFIX=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_LOG_DIR=${HADOOP_HOME}/logs
    export YARN_PREFIX=${HADOOP_HOME}
    

    Reload the profile to make the variables take effect:

    source ~/.bash_profile
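A quick way to confirm the profile took effect is to check that the Hadoop bin directories are actually on PATH. A sketch using a throwaway HADOOP_HOME (the real one is the install path from step 10):

```shell
# Simulate the profile settings, then verify the bin dir is on PATH.
HADOOP_HOME=/tmp/hadoop-2.5.0-cdh5.2.1-och4.0.1
PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
case ":$PATH:" in
  *":${HADOOP_HOME}/bin:"*) on_path=yes ;;
  *)                        on_path=no  ;;
esac
echo "hadoop bin on PATH: $on_path"
```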
    

    11. Distribute the installation to the other hosts
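The article leaves this step without commands; one common approach is to rsync the unpacked directory to every host in slaves. A sketch that only echoes the commands it would run (hostnames from this article; nothing is actually copied):

```shell
# Echo one rsync command per DataNode host; drop the 'echo' to actually copy.
printf '%s\n' einvoice247 einvoice248 einvoice249 einvoice250 > /tmp/dist-hosts
sent=0
while read -r h; do
  echo rsync -az ~/hadoop-2.5.0-cdh5.2.1-och4.0.1/ "$h":~/hadoop-2.5.0-cdh5.2.1-och4.0.1/
  sent=$((sent + 1))
done < /tmp/dist-hosts
echo "would sync to $sent hosts"
```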

    12. Preparation before the first start

    Note: run the commands in this section only on the first installation; afterwards avoid them, or use them with great care.

    Clean the data directories
    Before starting the JournalNodes and formatting, delete the old data
    directories first; everything except the ZooKeeper data can be removed.
    rm -r ~/data/hadoop/hdfs/name/*
    rm -r ~/data/hadoop/journal/*
    rm -r ~/data/hadoop/pids/*
    Start ZooKeeper
    zkServer.sh start
    # Format ZK and create the namespace; run on one NameNode [einvoice243]
    hdfs zkfc -formatZK
    Start the JournalNodes
    # Deploy an odd number of them, on the same hosts as ZooKeeper
    hadoop-daemon.sh start journalnode
    Note: on the three hosts einvoice243, einvoice244 and einvoice247
    Format the NameNode
    # Run on the active NameNode [einvoice243]
    hdfs namenode -format
    Format the DataNodes
    # Run on every host listed in the slaves file [einvoice247-250]
    hdfs datanode -format
    Start the Hadoop processes
    # Run on the active NameNode [einvoice243]
    hadoop-daemon.sh start namenode
    # Run on the standby NameNode [einvoice244]
    hdfs namenode -bootstrapStandby # only syncs the metadata; does not start the process
    # On both the active and the standby node, start all processes [einvoice243, einvoice244]
    start-all.sh
    # Check compression support: verify the native libraries load correctly
    hadoop org.apache.hadoop.util.NativeLibraryChecker
    

    13. Starting Hadoop

    Note: follow the steps above only for the first start; afterwards, simply run the following on the active or standby node to start or stop Hadoop:

    start-all.sh
    stop-all.sh
    

    14. Verify that Hadoop started successfully

    # Test the map phase
    hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.1.jar randomwriter rand
    # Test the reduce phase
    hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.1.jar sort rand sort-rand
    # Test YARN
    hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.1.jar pi 10 100
    # The tests leave temporary files behind; delete them afterwards
    hadoop fs -rm -r /user/ocetl/*
    
