Flink 1.8 Cluster Setup Complete Guide (4): Fully Distributed Hadoop


Author: MeazZa | Published 2019-07-01 17:46

    With the preparation work from the previous articles complete, we can now set up a fully distributed Hadoop cluster with Kerberos and SASL enabled.

    1. Cluster Environment Preparation

    We have three servers, listed below:

    hostname  ip             role
    master    10.16.195.254  NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
    slave1    10.16.196.1    DataNode, NodeManager
    slave2    10.16.196.5    DataNode, NodeManager
    1.1 Modify the hosts File

    Run the following command on each machine to get its hostname:

    $ hostname
    

    Add each machine's hostname and IP to the /etc/hosts file on all machines. The final /etc/hosts file on every machine should look like this:

    127.0.0.1 localhost.localdomain localhost
    127.0.0.1 localhost4.localdomain4 localhost4
    
    ::1 localhost.localdomain localhost
    ::1 localhost6.localdomain6 localhost6
    
    10.16.195.254 master
    10.16.196.1 slave1
    10.16.196.5 slave2
    

    If you can ping the corresponding IP address by hostname from any of the machines, the configuration is correct.
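
    For example, a quick check from the master node (using the hostnames configured above) might look like this:

    $ ping -c 3 slave1    # should resolve to 10.16.196.1 and get replies
    $ ping -c 3 slave2    # should resolve to 10.16.196.5 and get replies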

    1.2 Configure the JDK Environment
    • Install a JDK 1.8 environment via yum with the following command:
    $ yum install java-1.8.0-openjdk*
    
    • Locate the Java installation directory:
    $ whereis java
    java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/java /usr/share/man/man1/java.1.gz
    $ ls -l /usr/lib/jvm/
    total 0
    lrwxrwxrwx 1 root root  26 Dec 20  2018 java -> /etc/alternatives/java_sdk
    lrwxrwxrwx 1 root root  32 Dec 20  2018 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
    lrwxrwxrwx 1 root root  40 Dec 20  2018 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
    drwxr-xr-x 9 root root 101 Dec 20  2018 java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64
    drwxr-xr-x 9 root root 101 Dec 20  2018 java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64-debug
    lrwxrwxrwx 1 root root  34 Dec 20  2018 java-openjdk -> /etc/alternatives/java_sdk_openjdk
    lrwxrwxrwx 1 root root  21 Dec 20  2018 jre -> /etc/alternatives/jre
    lrwxrwxrwx 1 root root  27 Dec 20  2018 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
    lrwxrwxrwx 1 root root  35 Dec 20  2018 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
    lrwxrwxrwx 1 root root  51 Dec 20  2018 jre-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64 -> java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/jre
    lrwxrwxrwx 1 root root  57 Dec 20  2018 jre-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64-debug -> java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64-debug/jre
    lrwxrwxrwx 1 root root  29 Dec 20  2018 jre-openjdk -> /etc/alternatives/jre_openjdk
    
    • Configure JAVA_HOME and related environment variables.
      Open the /etc/profile file and add the following:
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
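
    After saving /etc/profile, reload it and run a quick sanity check (the exact version string depends on the package installed by yum):

    $ source /etc/profile
    $ echo $JAVA_HOME     # should print /usr/lib/jvm/java-1.8.0
    $ java -version       # should report an OpenJDK 1.8.0 build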
    
    1.3 Configure Passwordless SSH Login

    Generate an SSH public/private key pair on each machine with the following command:

    $ ssh-keygen -t rsa
    

    The generated public key is stored in ~/.ssh/id_rsa.pub. Append the public keys of all machines to the ~/.ssh/authorized_keys file on every machine, and set the permissions of authorized_keys to 600. An example authorized_keys file looks like this:

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD2pELeQD25P/Mu+eKmfwAN7hyrixm243YYiPLn4goFe8q/uI9cUKivYNg14bGCavta8fVE90x4WJysXEjMA7SWk5Ic3jS6gEoFhXQ1F0FISpv0eAamikWHASgQNrqY3KGaEm1dxR8lV3/lc0TWjv9QEO3wCw8zj7l4r8LQL0wIaEZ8NB8ElSRx3yFHl6FZE2XEiu/+j61q9U612WMNXqgvTMS8Z5zDujuSgO4mVSOVTyfkE5baIbeZGGKjdNT/4400KBa5k0Qs+VGBaEZs5FxtsmXqBdG/r6Aef7yZivFPNz0mXqFknp5OAafpe/cfPr3weqmCePbUBVOnDIAQzEfj master
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5kUfv1h9fuWp/3xqEqlDcmrz0Bk2n0+/LLBeShtLpFn+/krF4az6BN5CAFCY5NBgebhfw/9AQSUmyrr9aUXkpi7664QweJsJAne4mxi9/lKkQi+2liV2mBVNly1ax8+tf6P3OKgSSiD+XSVzlr5StIQE9M/Cr67lELHjhV/rvY2ALEQXbZH666SWLL+KPkshLvtpRVqFQKUFPvn2cXBr+YShCBm7DasZcDAGg4XqlxCLaeyI4N+zsrrr/52cGHT/0yJKK42zJyZ2pyVN51rGDwQh0T+6AMEp2YJUo/o+2P9hD/HZTepmnCBef/UyUR6u0xgvBPK/QYvcgziFr/85P slave1
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvUd0rjjGVz2umcWRMt3YHzxQBwIGdNo7QdXZcnILuTPqQ4PsIUTe+ULYrHcHlj+l6Z7XBO5ABd2BKks0Z8PR1eQyjY8yKv+P0LCe/fGKppsXzHvluexEe14aE95yI1aPguxAAqrLZ/NLhoQjoal2RvrGv6d/wLBPOdWx8DO2s2zbI5AuTawOyolSyOcSE5Mrgg3ahiYSs1OcopU8/pex3rOolfZVNbyyOjipL/QXdkcLLXQ0rpD41DzJzzgkNPmaG41rdcqjzFqLpE5O1qdFetfwcg1ZBniR3EdajGyd7jcccqXg2fWC/7+UarC4Dd7Yl9sup7zkExw/QhPiMY8fh slave2
    

    Once this is configured on every machine, you can ssh directly into any of the other machines without a password.
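
    As a shortcut, ssh-copy-id appends the local public key to a remote machine's authorized_keys, which is equivalent to the manual copy described above. A minimal sketch, assuming root logins and the hostnames configured earlier:

    $ for host in master slave1 slave2; do ssh-copy-id root@$host; done
    $ chmod 600 ~/.ssh/authorized_keys
    $ ssh slave1 hostname    # should print "slave1" without asking for a password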

    2. Configure Hadoop

    2.1 Download the Hadoop Package

    On the Hadoop downloads page (http://hadoop.apache.org/releases.html), choose the 2.8.5 binary release and download it into the /data directory on the master node.

    After the download completes, scp the Hadoop tar.gz package to the same directory on the other slave nodes, and extract the archive on all machines.

    $ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
    $ scp hadoop-2.8.5.tar.gz root@slave1:/data
    $ scp hadoop-2.8.5.tar.gz root@slave2:/data
    $ tar -xvf hadoop-2.8.5.tar.gz
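
    The tar command above only extracts the archive locally; one way to extract it on the slave nodes as well, using the passwordless SSH configured earlier:

    $ ssh root@slave1 "cd /data && tar -xf hadoop-2.8.5.tar.gz"
    $ ssh root@slave2 "cd /data && tar -xf hadoop-2.8.5.tar.gz"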
    
    2.2 Configure the Hadoop Environment Variables

    Add the following to the /etc/profile file on all machines:

    export HADOOP_HOME=/data/hadoop-2.8.5
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
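
    Reload the profile and confirm the hadoop command is now on the PATH:

    $ source /etc/profile
    $ hadoop version    # should report Hadoop 2.8.5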
    
    2.3 Configure the Kerberos Principals

    Hadoop typically uses three Kerberos principals: hdfs, yarn, and HTTP. Add them with the following commands:

    $ kadmin.local -q "addprinc -randkey hdfs/master@HADOOP.COM"
    $ kadmin.local -q "addprinc -randkey yarn/master@HADOOP.COM"
    $ kadmin.local -q "addprinc -randkey HTTP/master@HADOOP.COM"
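
    You can confirm that the principals were created by listing them (listprincs is a standard kadmin query):

    $ kadmin.local -q "listprincs" | grep master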
    

    Generate a keytab file for each principal:

    $ kadmin.local -q "xst -k hdfs.keytab hdfs/master@HADOOP.COM"
    $ kadmin.local -q "xst -k yarn.keytab yarn/master@HADOOP.COM"
    $ kadmin.local -q "xst -k HTTP.keytab HTTP/master@HADOOP.COM"
    

    Merge the three keytab files into a single one:

    $ ktutil
    ktutil:  rkt hdfs.keytab
    ktutil:  rkt yarn.keytab
    ktutil:  rkt HTTP.keytab
    ktutil:  wkt hadoop.keytab
    ktutil:  q
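
    To verify that the merged keytab contains all three principals, list its entries with klist:

    $ klist -kt hadoop.keytab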
    
    2.4 Distribute the keytab File and Log In

    Move the merged keytab into the etc/hadoop directory under the Hadoop installation, and scp it to the same directory on the other slave machines:

    $ mv hadoop.keytab /data/hadoop-2.8.5/etc/hadoop/
    $ scp /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab root@slave1:/data/hadoop-2.8.5/etc/hadoop
    $ scp /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab root@slave2:/data/hadoop-2.8.5/etc/hadoop
    

    Configure crontab to run kinit once a day so the Kerberos tickets stay fresh:

    $ crontab -l
    0  0  *  *  *   kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab hdfs/master@HADOOP.COM
    0  0  *  *  *   kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab yarn/master@HADOOP.COM
    0  0  *  *  *   kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab HTTP/master@HADOOP.COM
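
    Before the cron job first runs, obtain a ticket manually on each machine and confirm it with klist (this is the same command the crontab entries execute):

    $ kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab hdfs/master@HADOOP.COM
    $ klist    # should show a ticket for hdfs/master@HADOOP.COM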
    
    2.5 Modify the Hadoop Configuration Files

    The configuration files are located under /data/hadoop-2.8.5/etc/hadoop. Modify the following files:

    • slaves
    master
    slave1
    slave2
    
    • core-site.xml
    <configuration>
       <property>
          <name>fs.defaultFS</name>
          <value>hdfs://master:9000</value>
       </property>
       <property>
          <name>hadoop.security.authentication</name>
          <value>kerberos</value>
       </property>
       <property>
          <name>hadoop.security.authorization</name>
          <value>true</value>
       </property>
       <property>
          <name>fs.permissions.umask-mode</name>
          <value>027</value>
       </property>
    </configuration>
    
    • mapred-site.xml
    <configuration>
       <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
       </property>
    </configuration>
    
    • yarn-site.xml
    <configuration>
       <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>master</value>
       </property>
       <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
       </property>
       <property>
          <name>yarn.resourcemanager.keytab</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>yarn.resourcemanager.principal</name>
          <value>yarn/master@HADOOP.COM</value>
       </property>
       <property>
          <name>yarn.nodemanager.keytab</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>yarn.nodemanager.principal</name>
          <value>yarn/master@HADOOP.COM</value>
       </property>
       <property>
          <name>yarn.nodemanager.resource.memory-mb</name>
          <value>16384</value>
       </property>
       <property>
          <name>yarn.scheduler.minimum-allocation-mb</name>
          <value>1024</value>
       </property>
       <property>
          <name>yarn.scheduler.maximum-allocation-mb</name>
          <value>16384</value>
       </property>
       <property>
          <name>yarn.nodemanager.vmem-check-enabled</name>
          <value>false</value>
       </property>
    </configuration>
    
    • hdfs-site.xml
    <configuration>
       <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>10.16.195.254:50090</value>
       </property>
       <property>
          <name>dfs.replication</name>
          <value>2</value>
       </property>
       <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/data/hadoop/dfs/name</value>
       </property>
       <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/data/hadoop/dfs/data</value>
       </property>
       <property>
          <name>dfs.datanode.max.xcievers</name>
          <value>4096</value>
          <description>max number of files that can be open simultaneously on a DataNode</description>
       </property>
       <property>
          <name>dfs.block.access.token.enable</name>
          <value>true</value>
       </property>
       <property>
          <name>dfs.namenode.keytab.file</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>dfs.namenode.kerberos.principal</name>
          <value>hdfs/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.namenode.kerberos.https.principal</name>
          <value>HTTP/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.datanode.address</name>
          <value>0.0.0.0:1034</value>
       </property>
       <property>
          <name>dfs.datanode.http.address</name>
          <value>0.0.0.0:1036</value>
       </property>
       <property>
          <name>dfs.datanode.keytab.file</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>dfs.datanode.kerberos.principal</name>
          <value>hdfs/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.datanode.kerberos.https.principal</name>
          <value>HTTP/master@HADOOP.COM</value>
       </property>
       <!-- DataNode SASL configuration -->
       <property>
          <name>dfs.http.policy</name>
          <value>HTTPS_ONLY</value>
       </property>
       <property>
          <name>dfs.data.transfer.protection</name>
          <value>integrity</value>
       </property>
       <!-- JournalNode configuration -->
       <property>
          <name>dfs.journalnode.keytab.file</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>dfs.journalnode.kerberos.principal</name>
          <value>hdfs/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
          <value>HTTP/master@HADOOP.COM</value>
       </property>
       <!--webhdfs-->
       <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
       </property>
       <property>
          <name>dfs.web.authentication.kerberos.principal</name>
          <value>HTTP/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.web.authentication.kerberos.keytab</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>dfs.datanode.data.dir.perm</name>
          <value>700</value>
       </property>
       <property>
          <name>dfs.nfs.kerberos.principal</name>
          <value>hdfs/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.nfs.keytab.file</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>dfs.secondary.https.address</name>
          <value>10.16.195.254:50495</value>
       </property>
       <property>
          <name>dfs.secondary.https.port</name>
          <value>50495</value>
       </property>
       <property>
          <name>dfs.secondary.namenode.keytab.file</name>
          <value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
       </property>
       <property>
          <name>dfs.secondary.namenode.kerberos.principal</name>
          <value>hdfs/master@HADOOP.COM</value>
       </property>
       <property>
          <name>dfs.secondary.namenode.kerberos.https.principal</name>
          <value>HTTP/master@HADOOP.COM</value>
       </property>
    </configuration>
    
    • Distribute the Hadoop configuration files to the other machines:
    $ scp /data/hadoop-2.8.5/etc/hadoop/* root@slave1:/data/hadoop-2.8.5/etc/hadoop
    $ scp /data/hadoop-2.8.5/etc/hadoop/* root@slave2:/data/hadoop-2.8.5/etc/hadoop
    
    2.6 Format the NameNode
    $ hdfs namenode -format
    19/06/30 22:23:45 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   user = root
    STARTUP_MSG:   host = master/127.0.0.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.8.5
    ...
    19/06/30 22:23:46 INFO util.ExitUtil: Exiting with status 0
    19/06/30 22:23:46 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at master/127.0.0.1
    

    3. Start the Hadoop Cluster

    3.1 Start the HDFS Cluster
    $ start-dfs.sh
    $ jps
    19282 DataNode
    28324 Jps
    19480 SecondaryNameNode
    18943 NameNode
    

    Visit the NameNode UI at https://10.16.195.254:50470/
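
    As a quick smoke test of HDFS (illustrative; any path will do), create a directory and list the root. With Kerberos enabled this only works while a valid ticket is held, i.e. the kinit step above must have succeeded:

    $ hdfs dfs -mkdir /tmp
    $ hdfs dfs -ls /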

    3.2 Start the YARN Cluster
    $ start-yarn.sh
    $ jps
    21088 NodeManager
    19282 DataNode
    28324 Jps
    19480 SecondaryNameNode
    18943 NameNode
    20959 ResourceManager
    

    Visit the YARN UI at http://10.16.195.254:8088/
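
    To confirm that YARN can actually run jobs, submit the example pi job that ships with the Hadoop distribution (the jar path below assumes the default layout of the 2.8.5 tarball):

    $ hadoop jar /data/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar pi 2 10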

    At this point, the setup of the fully distributed Hadoop cluster is complete.
