
Deploying a Hadoop Cluster with Docker

Author: 独钓寒雪1795 | Published 2020-03-22 22:54
    1. Network Setup and Cluster Deployment Plan
    (figure: network settings and cluster deployment plan)
    2. Docker Containers
    • Pull the image
     docker pull daocloud.io/library/centos:latest
    
    • Create the containers

      • Following the cluster layout, each container needs a fixed IP, so first create a Docker subnet that allows static addressing:
     docker network create --subnet=172.18.0.0/16 netgroup
    
    • Once the subnet exists, containers with fixed IPs can be created:
     docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-master -h cluster-master -p 50070:50070 -p 50075:50075 -p 50080:50080 --net netgroup --ip 172.18.0.2 daocloud.io/library/centos /usr/sbin/init
     docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave1 -h cluster-slave1 -p 8088:8088 -p 8042:8042 -p 8044:8044 --net netgroup --ip 172.18.0.3 daocloud.io/library/centos /usr/sbin/init
     docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave2 -h cluster-slave2 -p 18000:18000 -p 18001:18001 -p 18002:18002 --net netgroup --ip 172.18.0.4 daocloud.io/library/centos /usr/sbin/init
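The three `docker run` commands differ only in the container name, the published ports, and the IP. A small loop can generate them for review (a sketch: the `gen_run_cmds` function name and the name/IP/port triples are just a restatement of the commands above, and the commands are echoed rather than executed):

```shell
# Generate the three docker run commands; each spec is "name:ip:ports".
# Commands are echoed for review, not executed (sketch).
gen_run_cmds() {
    for spec in \
        "cluster-master:172.18.0.2:50070 50075 50080" \
        "cluster-slave1:172.18.0.3:8088 8042 8044" \
        "cluster-slave2:172.18.0.4:18000 18001 18002"
    do
        name=${spec%%:*}; rest=${spec#*:}
        ip=${rest%%:*};   ports=${rest#*:}
        flags=""
        for p in $ports; do flags="$flags -p $p:$p"; done
        echo "docker run -d --privileged -ti" \
             "-v /sys/fs/cgroup:/sys/fs/cgroup" \
             "--name $name -h $name$flags" \
             "--net netgroup --ip $ip" \
             "daocloud.io/library/centos /usr/sbin/init"
    done
}
gen_run_cmds
```

Piping the output into `sh` would execute the commands once they have been checked.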
    
    • Open a shell inside the master container:
    docker exec -it cluster-master /bin/bash
    

    3. Software Installation

    3.1 Install OpenSSH for passwordless login
     [root@cluster-master /]# yum -y install openssh openssh-server openssh-clients
     [root@cluster-master /]#  systemctl restart sshd
    

    Since cluster-master and cluster-slave1 need to reach the other cluster nodes over SSH, the SSH client config must be adjusted:

    [root@cluster-master /]# vi /etc/ssh/ssh_config
    

    Change the original StrictHostKeyChecking ask setting to StrictHostKeyChecking no and save.

    Distribute the master's public key

     [root@cluster-master /]#  ssh-keygen -t rsa
     [root@cluster-master .ssh]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
    

    After the key files are generated, use scp to push the public key to the slave hosts. The containers' root password must also be set; check for the passwd command first and install it if missing:

    [root@cluster-master .ssh]# yum -y install passwd
    [root@cluster-master .ssh]# passwd 
    [root@cluster-master .ssh]# ssh root@cluster-slave1 'mkdir ~/.ssh'
    [root@cluster-master .ssh]# scp ~/.ssh/authorized_keys root@cluster-slave1:~/.ssh
    [root@cluster-master .ssh]# ssh root@cluster-slave2 'mkdir ~/.ssh'
    [root@cluster-master .ssh]# scp ~/.ssh/authorized_keys root@cluster-slave2:~/.ssh
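The per-slave mkdir/scp pairs can be folded into one loop over the slave hostnames (a sketch: the `distribute_key` function name is made up here, and the commands are only echoed so they can be reviewed before running):

```shell
# Echo the key-distribution commands for each slave; drop the echo
# to actually run them (sketch).
distribute_key() {
    for host in cluster-slave1 cluster-slave2; do
        echo "ssh root@$host 'mkdir -p ~/.ssh'"
        echo "scp ~/.ssh/authorized_keys root@$host:~/.ssh/"
    done
}
distribute_key
```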
    

    After distribution, test that passwordless login works (ssh root@cluster-slave1).
    cluster-slave1 also needs to generate and distribute its own key:

    [root@cluster-slave1 .ssh]# ssh-keygen -t rsa
    [root@cluster-slave1 .ssh]# ssh-copy-id cluster-master
    [root@cluster-slave1 .ssh]# ssh-copy-id cluster-slave2
    [root@cluster-slave1 .ssh]# ssh-copy-id cluster-slave1
    
    3.2 Install Ansible
    [root@cluster-master ~]# yum -y install epel-release
    [root@cluster-master ~]# yum -y install ansible
    [root@cluster-master ~]# vi /etc/ansible/hosts
    [cluster]
    cluster-master
    cluster-slave1
    cluster-slave2
    [master]
    cluster-master
    [slaves]
    cluster-slave1
    cluster-slave2
    

    Configure the container hosts file
    Because /etc/hosts is rewritten every time the container starts, edits made directly to it do not survive a restart. To restore the cluster entries after each restart, rewrite the file on login by appending the following to ~/.bashrc:

    [root@cluster-master ~]# vi .bashrc
     :>/etc/hosts
     cat >>/etc/hosts<<EOF
     127.0.0.1   localhost
     172.18.0.2  cluster-master
     172.18.0.3  cluster-slave1
     172.18.0.4  cluster-slave2
     EOF
    
    [root@cluster-master ~]# source ~/.bashrc
    

    After sourcing the file, /etc/hosts contains the required entries:

    [root@cluster-master ~]# cat /etc/hosts
    127.0.0.1   localhost
    172.18.0.2  cluster-master
    172.18.0.3  cluster-slave1
    172.18.0.4  cluster-slave2
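The rewrite trick can also be rehearsed against a scratch file before touching the real /etc/hosts (a sketch; HOSTS below is a temporary stand-in path):

```shell
# Rehearse the hosts rewrite against a temp file instead of /etc/hosts.
HOSTS=$(mktemp)
: > "$HOSTS"                 # truncate, equivalent to ':>/etc/hosts'
cat >> "$HOSTS" <<EOF
127.0.0.1   localhost
172.18.0.2  cluster-master
172.18.0.3  cluster-slave1
172.18.0.4  cluster-slave2
EOF
cat "$HOSTS"
```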
    

    Use ansible to distribute .bashrc to the cluster slaves:

    [root@cluster-master ~]# ansible cluster -m copy -a "src=~/.bashrc dest=~/"
    
    3.3 JDK && Hadoop

    Install the JDK, download Hadoop 2.9.2 into /opt, extract the archive, and create a symlink:

    [root@cluster-master opt]# yum -y install java-1.8.0-openjdk-devel.x86_64
    [root@cluster-master opt]# wget https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
    [root@cluster-master opt]# tar -xzvf hadoop-2.9.2.tar.gz
    [root@cluster-master opt]# ln -s hadoop-2.9.2 hadoop
    

    Every machine needs the which command; Hadoop reports errors without it, so install it if missing:

    [root@cluster-master opt]# yum -y install which
    

    Configure the Java and Hadoop environment variables by editing the ~/.bashrc file:

    #hadoop
    export HADOOP_HOME=/opt/hadoop-2.9.2
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    #java
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el8_1.x86_64
    export PATH=$JAVA_HOME/bin:$PATH
    
     [root@cluster-master opt]# source .bashrc
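A quick sanity check that the variables and PATH additions took effect (a sketch; the JAVA_HOME value below is an assumption, since the exact directory depends on the openjdk build yum installed):

```shell
# Check that HADOOP_HOME/JAVA_HOME are set and their bin dirs are on PATH.
export HADOOP_HOME=/opt/hadoop-2.9.2
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk   # adjust to the real path
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH"

for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin" "$JAVA_HOME/bin"; do
    case ":$PATH:" in
        *":$dir:"*) echo "on PATH: $dir" ;;
        *)          echo "MISSING: $dir" ;;
    esac
done
```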
    

    Edit the configuration files Hadoop needs to run:

    • core-site.xml
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <!-- file system properties -->
        <property>
            <name>fs.default.name</name>
            <value>hdfs://cluster-master:9000</value>
        </property>
        <property>
            <name>fs.trash.interval</name>
            <value>4320</value>
        </property>
    </configuration>
    
    • hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/hadoop/tmp/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.permissions.superusergroup</name>
            <value>staff</value>
        </property>
        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>cluster-slave2:50090</value>
        </property>
    </configuration>
    
    • mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    
    • yarn-site.xml
    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>cluster-slave1</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
    </configuration>
    

    Package hadoop for distribution to the slaves:

    [root@cluster-master opt]# tar -cvf hadoop-dis.tar hadoop hadoop-2.9.2
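The archive includes both the symlink and the real directory, so the hadoop -> hadoop-2.9.2 link survives extraction on the slaves. This behavior can be verified against a throwaway tree (a sketch using a scratch directory in place of /opt):

```shell
# Show that tar carries the symlink alongside the real directory
# (a scratch directory stands in for /opt).
workdir=$(mktemp -d)
cd "$workdir"
mkdir hadoop-2.9.2
ln -s hadoop-2.9.2 hadoop
tar -cf hadoop-dis.tar hadoop hadoop-2.9.2
tar -tf hadoop-dis.tar          # lists both the link and the directory
```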
    

    Use ansible-playbook to distribute .bashrc and hadoop-dis.tar to the slave hosts:

    ---
    - hosts: cluster
      tasks:
        - name: copy .bashrc to slaves
          copy: src=~/.bashrc dest=~/
          notify:
            - exec source
        - name: copy hadoop-dis.tar to slaves
          unarchive: src=/opt/hadoop-dis.tar dest=/opt
      handlers:
        - name: exec source
          shell: source ~/.bashrc
    

    Save the YAML above as hadoop-dis.yaml and run:

    [root@cluster-master ~]# ansible-playbook hadoop-dis.yaml
    

    4. Starting Hadoop

    4.1 Format the NameNode
    [root@cluster-master ~]# hadoop namenode -format
    
    4.2 HDFS: run on cluster-master
    [root@cluster-master ~]# start-dfs.sh
    
    4.3 YARN: run on cluster-slave1
    [root@cluster-slave1 ~]# start-yarn.sh
    
    4.4 Verify the services
    http://<host machine IP>:50070/   HDFS
    http://<host machine IP>:8088/    YARN
    

    Reposted from:
    https://www.cnblogs.com/coolwxb/p/10975352.html
