Setting Up CDH 5.12.0 and Installing Kafka on CentOS 7

Author: 白面葫芦娃92 | Published 2019-03-10 14:05

    I. Cluster setup & MySQL deployment
    1. A three-node big data cluster was provisioned on QingCloud; the machines are named hadoop001, hadoop002, and hadoop003.
    2. MySQL is deployed on hadoop001.
    II. Environment preparation
    CM 5.12.0 download:
    http://archive.cloudera.com/cm5/repo-as-tarball/5.12.0/cm5.12.0-centos7.tar.gz
    http://archive.cloudera.com/cm5/repo-as-tarball/5.12.0/cm5.12.0-centos7.tar.gz.sha1
    CDH 5.12.0 download:
    http://archive.cloudera.com/cdh5/parcels/5.12.0/CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
    http://archive.cloudera.com/cdh5/parcels/5.12.0/CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1
    *** Steps 3-8 must be performed once on every machine in the cluster.
    3. Deploy the JDK under /usr/java/jdk1.8.0_45 and set $JAVA_HOME accordingly; permanently disable SELinux and stop the firewall.

    // Disable SELinux: edit /etc/selinux/config and set SELINUX=disabled
    [root@hadoop001 ~]# cat /etc/selinux/config
    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    #     enforcing - SELinux security policy is enforced.
    #     permissive - SELinux prints warnings instead of enforcing.
    #     disabled - No SELinux policy is loaded.
    SELINUX=disabled
    # SELINUXTYPE= can take one of three values:
    #     targeted - Targeted processes are protected,
    #     minimum - Modification of targeted policy. Only selected processes are protected. 
    #     mls - Multi Level Security protection.
    SELINUXTYPE=targeted 
    [root@hadoop001 ~]# sestatus
    SELinux status:                 disabled
    // Stop the firewall
    [root@hadoop001 ~]# systemctl stop firewalld.service
    [root@hadoop001 ~]# systemctl status firewalld.service  // verify
    ● firewalld.service - firewalld - dynamic firewall daemon
       Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
       Active: inactive (dead)
         Docs: man:firewalld(1) 
    [root@hadoop001 ~]# systemctl disable firewalld.service
    [root@hadoop001 ~]# systemctl list-unit-files | grep firewalld  // verify
    firewalld.service                             disabled 
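The JDK setup mentioned in step 3 is not shown above; a minimal sketch follows. The path /usr/java/jdk1.8.0_45 is the article's example, so adjust it to your own JDK. Taking the profile path as a parameter makes it easy to dry-run against a temp file instead of /etc/profile.

```shell
# Sketch: persist JAVA_HOME for all users. Pass a file path to try it out
# safely; with no argument it appends to /etc/profile (run as root).
setup_java_env() {
  profile="${1:-/etc/profile}"
  cat >> "$profile" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH
EOF
}
```

After appending, run `source /etc/profile` (or log in again) and confirm with `java -version`.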
    

    4. Check the Python version (do not change the system's default Python; if production Python services need a newer version, install it separately).

    [root@hadoop001 ~]# python --version
    Python 2.7.5
    

    5. Configure NTP (Network Time Protocol) to synchronize the time zone and clock.

    [root@hadoop001 ~]# timedatectl set-timezone Asia/Shanghai
    [root@hadoop001 ~]# timedatectl status
          Local time: Fri 2019-02-22 22:17:09 CST
      Universal time: Fri 2019-02-22 14:17:09 UTC
            RTC time: Fri 2019-02-22 14:17:09
           Time zone: Asia/Shanghai (CST, +0800)
         NTP enabled: no
    NTP synchronized: no
     RTC in local TZ: no
          DST active: n/a
    [root@hadoop001 ~]# yum install -y chrony
    [root@hadoop001 ~]# systemctl start chronyd
    [root@hadoop001 ~]# systemctl enable chronyd
    [root@hadoop001 ~]# yum install -y ntpdate
    [root@hadoop001 ~]# systemctl enable ntpd.service
    [root@hadoop001 ~]# timedatectl set-ntp yes
    [root@hadoop001 ~]# timedatectl status
          Local time: Wed 2019-03-06 22:59:01 CST
      Universal time: Wed 2019-03-06 14:59:01 UTC
            RTC time: Wed 2019-03-06 14:59:01
           Time zone: Asia/Shanghai (CST, +0800)
         NTP enabled: yes
    NTP synchronized: yes
     RTC in local TZ: no
          DST active: n/a
    // Run the commands above on all three machines; the ones below only on hadoop001.
    [root@hadoop001 ~]# cp /etc/ntp.conf /etc/ntp.conf.bak
    [root@hadoop001 ~]# cp /etc/sysconfig/ntpd /etc/sysconfig/ntpd.bak
    [root@hadoop001 ~]# echo "restrict 192.168.137.0 mask 255.255.255.0 nomodify notrap" >> /etc/ntp.conf
    [root@hadoop001 ~]# echo "SYNC_HWCLOCK=yes" >> /etc/sysconfig/ntpd
    // Run the commands above only on hadoop001; the ones below on all three machines.
    [root@hadoop001 ~]# systemctl restart ntpd
    [root@hadoop001 ~]# crontab -e
    no crontab for root - using an empty one
    // add this line in the editor, then save and quit:
    */30 * * * * /usr/sbin/ntpdate 192.168.137.2
    crontab: installing new crontab
    
    Without this step the cluster will raise health alerts.
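The same cron entry can also be installed non-interactively, which is handy when scripting all three nodes. 192.168.137.2 is hadoop001's internal IP in this article; substitute your own NTP source.

```shell
# Build the ntpdate cron line for a given time server.
ntpdate_cron_line() {
  echo "*/30 * * * * /usr/sbin/ntpdate $1"
}

# To install it without opening an editor (run as root on each node):
# ( crontab -l 2>/dev/null; ntpdate_cron_line 192.168.137.2 ) | crontab -
```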

    6. Disable transparent huge pages (THP)

    [root@hadoop001 ~]# echo never > /sys/kernel/mm/transparent_hugepage/defrag
    [root@hadoop001 ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
    [root@hadoop001 ~]# echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'>>  /etc/rc.local
    [root@hadoop001 ~]# echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'>>  /etc/rc.local
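A quick sanity check: the kernel marks the active THP setting with brackets, so after the commands above the files should read `... [never]`. A small sketch of that check, with the file path as a parameter so it can be exercised against a copy:

```shell
# Returns success when the THP setting file shows [never] as active.
thp_disabled() {
  grep -q '\[never\]' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Example: thp_disabled && echo "THP is off"
```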
    

    7. Configure swap (how willing the kernel is to use physical disk as memory)

    [root@hadoop001 ~]# echo 'vm.swappiness = 10' >> /etc/sysctl.conf
    [root@hadoop001 ~]# sysctl -p
    vm.swappiness = 10
    

    vm.swappiness ranges from 0 to 100. 0 does not disable swap; it just makes the kernel most reluctant to swap, while 100 makes it most eager.
    For clusters with strict real-time computing requirements, set swappiness to 0: never use disk as memory if it can be avoided, accept that a job may die, and quickly add memory or tune parameters and restart the job.
    For clusters where real-time requirements are loose, set swappiness to around 10 or 30: jobs are not allowed to die, so let them run slowly instead.
    8. Install httpd and start the httpd service

    [root@hadoop001 ~]# yum install -y httpd
    [root@hadoop001 ~]# rpm -qa|grep httpd
    httpd-2.2.15-69.el6.centos.x86_64
    httpd-tools-2.2.15-69.el6.centos.x86_64
    [root@hadoop001 ~]# systemctl list-unit-files | grep httpd
    httpd.service                                 disabled
    [root@hadoop001 ~]# systemctl enable httpd.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/httpd.service to /usr/lib/systemd/system/httpd.service.
    [root@hadoop001 ~]# systemctl list-unit-files | grep httpd
    httpd.service                                 enabled 
    [root@hadoop001 ~]# systemctl start httpd.service
    [root@hadoop001 ~]# systemctl status httpd.service
    ● httpd.service - The Apache HTTP Server
       Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
       Active: active (running) since Wed 2019-03-06 23:18:59 CST; 35s ago
         Docs: man:httpd(8)
               man:apachectl(8)
     Main PID: 3478 (httpd)
       Status: "Total requests: 0; Current requests/sec: 0; Current traffic:   0 B/sec"
       CGroup: /system.slice/httpd.service
               ├─3478 /usr/sbin/httpd -DFOREGROUND
               ├─3479 /usr/sbin/httpd -DFOREGROUND
               ├─3480 /usr/sbin/httpd -DFOREGROUND
               ├─3481 /usr/sbin/httpd -DFOREGROUND
               ├─3482 /usr/sbin/httpd -DFOREGROUND
               └─3483 /usr/sbin/httpd -DFOREGROUND
    
    Mar 06 23:18:58 hadoop001 systemd[1]: Starting The Apache HTTP Server...
    Mar 06 23:18:58 hadoop001 httpd[3478]: AH00558: httpd: Could not reliably d...ge
    Mar 06 23:18:59 hadoop001 systemd[1]: Started The Apache HTTP Server.
    Hint: Some lines were ellipsized, use -l to show in full.
    

    8'. Configure mutual SSH trust among the three machines

    // 8.1  Run ssh-keygen on all three machines
    [root@hadoop001 ~]# ssh-keygen
    // 8.2  On hadoop001, generate the authorized_keys file
    [root@hadoop001 .ssh]# ll
    total 8
    -rw------- 1 root root 1675 Mar  7 15:37 id_rsa
    -rw-r--r-- 1 root root  396 Mar  7 15:37 id_rsa.pub
    [root@hadoop001 .ssh]# cat /root/.ssh/id_rsa.pub>> /root/.ssh/authorized_keys
    [root@hadoop001 .ssh]# ll
    total 12
    -rw-r--r-- 1 root root  396 Mar  7 15:39 authorized_keys
    -rw------- 1 root root 1675 Mar  7 15:37 id_rsa
    -rw-r--r-- 1 root root  396 Mar  7 15:37 id_rsa.pub
    // 8.3  Manually copy the id_rsa.pub contents of the other two machines into hadoop001's authorized_keys file
    [root@hadoop002 .ssh]# more id_rsa.pub
    [root@hadoop003 .ssh]# more id_rsa.pub
    // When copying into authorized_keys, paste into a text editor first and remove any line breaks so each key is one line, otherwise the trust setup is likely to fail.
    // 8.4  Set permissions (on every machine)
    [root@hadoop001 .ssh]# chmod 700 -R ~/.ssh
    [root@hadoop001 .ssh]# chmod 600 ~/.ssh/authorized_keys  // only the first machine has this file at this point
    // 8.5  scp the first machine's authorized_keys to the other two (the first transfer prompts for a password)
    [root@hadoop001 .ssh]# scp authorized_keys root@hadoop002:/root/.ssh
    [root@hadoop001 .ssh]# scp authorized_keys root@hadoop003:/root/.ssh
    // 8.6  Verify: run the commands below on every machine. If you only have to type yes and no password is requested, the three machines trust each other.
    [root@hadoop001 .ssh]# ssh root@hadoop002 date
    [root@hadoop001 .ssh]# ssh root@hadoop003 date
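The manual copy in 8.3 is where line breaks usually sneak in. A small helper that appends a public key and sets the permissions from 8.4 in one step is less error-prone; the paths are parameters so it can be tried against a scratch directory first:

```shell
# Append a public key to an authorized_keys file and fix permissions.
# Usage: add_pubkey /path/to/id_rsa.pub [/root/.ssh/authorized_keys]
add_pubkey() {
  pubkey="$1"; auth="${2:-$HOME/.ssh/authorized_keys}"
  mkdir -p "$(dirname "$auth")"
  cat "$pubkey" >> "$auth"
  chmod 700 "$(dirname "$auth")"   # sshd refuses keys in a group/world-writable dir
  chmod 600 "$auth"
}
```

On systems that have it, `ssh-copy-id root@hadoop002` accomplishes the same thing in one command.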
    

    III. Configure the offline package repository
    **** Steps 9-11 only need to be done on the server machine (hadoop001).
    9. Create the parcels directory

    [root@hadoop001 ~]# cd /var/www/html
    [root@hadoop001 html]# mkdir parcels
    [root@hadoop001 html]# cd parcels
    [root@hadoop001 parcels]# rz  (upload the three files CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel, CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1 and manifest.json, then rename the .sha1 file)
    [root@hadoop001 parcels]# ll
    total 1662604
    -rw-r--r-- 1 root root 1702423659 Feb 23 00:38 CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
    -rw-r--r-- 1 root root         41 Feb 22 22:50 CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1
    -rw-r--r-- 1 root root      72612 Feb 23 09:06 manifest.json
    [root@hadoop001 parcels]# mv CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1 CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha
    [root@hadoop001 parcels]# ll
    total 1662604
    -rw-r--r-- 1 root root 1702423659 Feb 23 00:38 CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
    -rw-r--r-- 1 root root         41 Feb 22 22:50 CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha
    -rw-r--r-- 1 root root      72612 Feb 23 09:06 manifest.json
    

    10. Verify the parcel downloaded intact (we have been bitten in production by skipping this check)

    [root@hadoop001 parcels]# sha1sum CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
    fa704f42b8da8916409c3f52f189629152ba2839  CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
    [root@hadoop001 parcels]# cat CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha
    fa704f42b8da8916409c3f52f189629152ba2839
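The manual comparison above can be scripted so it never gets skipped. This sketch reads the expected hash from the `.sha` file next to the parcel and compares it to the computed SHA-1:

```shell
# Verify a parcel against its companion .sha file.
# Usage: verify_parcel /var/www/html/parcels/CDH-....parcel
verify_parcel() {
  parcel="$1"
  expected=$(awk '{print $1}' "$parcel.sha")
  actual=$(sha1sum "$parcel" | awk '{print $1}')
  if [ "$expected" = "$actual" ]; then
    echo "OK: $parcel"
  else
    echo "MISMATCH: $parcel (expected $expected, got $actual)" >&2
    return 1
  fi
}
```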
    

    11. Download, upload, and extract cloudera-manager-centos7-cm5.12.0_x86_64.tar.gz, then move it into the same directory layout as the official site: cm5/redhat/7/x86_64/

    [root@hadoop001 parcels]# mkdir -p /opt/rpminstall
    [root@hadoop001 parcels]# cd /opt/rpminstall
    [root@hadoop001 rpminstall]# ll
    -rw-r--r-- 1 root root 952281419 Mar  7 14:18 cm5.12.0-centos7.tar.gz
    [root@hadoop001 rpminstall]# tar -zxf cm5.12.0-centos7.tar.gz -C /var/www/html/
    [root@hadoop001 rpminstall]# cd /var/www/html/
    [root@hadoop001 html]# ll
    drwxrwxr-x 3 1106  592 4096 Jul  7  2017 cm
    drwxr-xr-x 2 root root 4096 Feb 23 09:17 parcels
    [root@hadoop001 html]# mkdir -p cm5/redhat/7/x86_64/
    [root@hadoop001 html]# mv cm cm5/redhat/7/x86_64/
    [root@hadoop001 html]# ll
    total 8
    drwxr-xr-x 3 root root 4096 Mar  7 00:01 cm5
    drwxr-xr-x 2 root root 4096 Feb 23 09:17 parcels
    

    *** Step 12: the cloudera-manager.repo file must be configured on every machine.
    12. Configure a local yum repository, so the CDH cluster pulls packages from the local server during installation instead of from Cloudera's site.

    [root@hadoop001 html]# vi /etc/yum.repos.d/cloudera-manager.repo
    [cloudera-manager]
    name = Cloudera Manager, Version 5.12.0
    baseurl = http://192.168.137.2/cm5/redhat/7/x86_64/cm/5/
    gpgcheck = 0
    

    Open the two URLs below in a browser; if both load, the configuration succeeded:
    http://192.168.137.2/parcels/
    http://192.168.137.2/cm5/redhat/7/x86_64/cm/5/
    (replace 192.168.137.2 with the public IP when accessing from outside)

    IV. Install and start the CM services from RPM packages
    13. Install the server RPM packages on the CM host

    [root@hadoop001 ~]# cd /var/www/html/cm5/redhat/7/x86_64/cm/5/RPMS/x86_64
    [root@hadoop001 x86_64]# ll
    total 932400
    -rw-rw-r-- 1 1106 592   9776948 Jul  7  2017 cloudera-manager-agent-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    -rw-rw-r-- 1 1106 592 701119748 Jul  7  2017 cloudera-manager-daemons-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    -rw-rw-r-- 1 1106 592      8720 Jul  7  2017 cloudera-manager-server-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    -rw-rw-r-- 1 1106 592     10620 Jul  7  2017 cloudera-manager-server-db-2-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    -rw-rw-r-- 1 1106 592  30604352 Jul  7  2017 enterprise-debuginfo-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    -rw-rw-r-- 1 1106 592  71204325 Jul  7  2017 jdk-6u31-linux-amd64.rpm
    -rw-rw-r-- 1 1106 592 142039186 Jul  7  2017 oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
    [root@hadoop001 x86_64]# yum install -y cloudera-manager-daemons-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    [root@hadoop001 x86_64]# yum install -y cloudera-manager-server-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
    

    14. Set up mysql-connector-java.jar

    [root@hadoop001 x86_64]# mkdir -p /usr/share/java
    [root@hadoop001 x86_64]# cd /usr/share/java
    [root@hadoop001 x86_64]# rz
    [root@hadoop001 java]# ll
    total 968
    -rw-r--r-- 1 root root 989495 Sep 27 07:05 mysql-connector-java.jar
    

    Note: be sure to strip the version number; the file must be named exactly 'mysql-connector-java.jar'.
    15. Create the cmf user and database in MySQL

    mysql> create database cmf DEFAULT CHARACTER SET utf8;
    Query OK, 1 row affected (0.00 sec)
    mysql> grant all on cmf.* TO 'cmf'@'%' IDENTIFIED BY 'cmf_password';
    Query OK, 0 rows affected (0.00 sec)
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    

    16. Point cloudera-scm-server at MySQL

    [root@hadoop001 java]# cd /etc/cloudera-scm-server/
    [root@hadoop001 cloudera-scm-server]# vi db.properties
    # Copyright (c) 2012 Cloudera, Inc. All rights reserved.
    #
    # This file describes the database connection.
    #
    
    # The database type
    # Currently 'mysql', 'postgresql' and 'oracle' are valid databases.
    com.cloudera.cmf.db.type=mysql
    
    # The database host
    # If a non standard port is needed, use 'hostname:port'
    com.cloudera.cmf.db.host=192.168.137.2:3306
    
    # The database name
    com.cloudera.cmf.db.name=cmf
    
    # The database user
    com.cloudera.cmf.db.user=cmf
    
    # The database user's password
    com.cloudera.cmf.db.password=cmf_password
    
    # The db setup type
    # By default, it is set to INIT
    # If scm-server uses Embedded DB then it is set to EMBEDDED
    # If scm-server uses External DB then it is set to EXTERNAL
    com.cloudera.cmf.db.setupType=EXTERNAL
    

    17. Start the CM server

    [root@hadoop001 cloudera-scm-server]# service cloudera-scm-server start
    Starting cloudera-scm-server (via systemctl):              [  OK  ]
    // Tail the log below to watch in real time whether the CM server starts cleanly
    [root@hadoop001 cloudera-scm-server]# cd /var/log/cloudera-scm-server/
    [root@hadoop001 cloudera-scm-server]# tail -f cloudera-scm-server.log
    2018-10-13 13:13:57,831 INFO WebServerImpl:org.mortbay.log: Started SelectChannelConnector@0.0.0.0:7180
    2018-10-13 13:13:57,831 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
    

    **** A pitfall encountered here: on cloud instances, be sure to set up internal/external port forwarding and open port 7180 in the firewall, otherwise ip:7180 will never load.
    18. Open the web UI and install the CM server and agents
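A simple readiness probe helps distinguish "CM is still starting" from "the port is blocked". This sketch uses bash's /dev/tcp so no extra tools are needed; host and port are parameters:

```shell
# Succeeds once a TCP connection to host:port can be opened.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example: until port_open hadoop001 7180; do sleep 5; done; echo "CM UI is up"
```

If the probe succeeds from the server itself but not from your workstation, the problem is the cloud firewall/port forwarding, not CM.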



    19. A series of cluster settings (the amon database below backs the Activity Monitor)


    mysql> create database amon DEFAULT CHARACTER SET utf8;
    Query OK, 1 row affected (0.01 sec)
    mysql> grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'amon_password';
    Query OK, 0 rows affected (0.00 sec)
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    

    20. Deploy Kafka
    https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_m2t_d45_4r
    http://archive.cloudera.com/kafka/parcels/
    [root@hadoop001 ~]# cd /var/www/html
    [root@hadoop001 html]# mkdir kafka_parcels
    [root@hadoop001 html]# cd kafka_parcels
    [root@hadoop001 kafka_parcels]# rz
    [root@hadoop001 kafka_parcels]# ll
    total 66536
    -rw-r--r-- 1 root root 68116503 Oct 13 17:05 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel
    -rw-r--r-- 1 root root       41 Oct 13 17:04 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha1
    -rw-r--r-- 1 root root     5252 Feb  6  2018 manifest.json
    [root@hadoop001 kafka_parcels]# mv KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha1 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha
    [root@hadoop001 kafka_parcels]# ll
    total 66536
    -rw-r--r-- 1 root root 68116503 Oct 13 17:05 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel
    -rw-r--r-- 1 root root       41 Oct 13 17:04 KAFKA-2.1.1-1.2.1.1.p0.18-el7.parcel.sha
    -rw-r--r-- 1 root root     5252 Feb  6  2018 manifest.json
    

    If something goes wrong during installation, the logs found under /var/log/ on each machine are role logs, and in many cases they do not reveal the root cause. In that situation, look at the stdout and stderr logs instead, which you can reach through the web UI.
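The same stdout/stderr files also live on disk: the CM agent keeps per-role launch output under its process directory (typically /var/run/cloudera-scm-agent/process/<id>-<role>/logs/). A hypothetical helper to surface the most recently touched ones, with the base directory as a parameter so it can be tested against a scratch tree:

```shell
# Print the most recently modified role stdout/stderr logs.
latest_role_logs() {
  base="${1:-/var/run/cloudera-scm-agent/process}"
  ls -t "$base"/*/logs/stderr.log "$base"/*/logs/stdout.log 2>/dev/null | head -4
}
```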



    V. How to stop and start the cluster
    [root@hadoop001 ~]# service cloudera-scm-agent stop
    Stopping cloudera-scm-agent:                               [  OK  ]
    

    // Run the command above on all three machines; the ones below only on hadoop001.

    [root@hadoop001 ~]# service cloudera-scm-server stop
    Stopping cloudera-scm-server:                              [  OK  ]
    [root@hadoop001 ~]# service mysql stop
    Shutting down MySQL..                                      [  OK  ]
    

    (To start, simply reverse the order. Note that MySQL is deployed under the mysqladmin user, so `service mysql start` must be run as mysqladmin.)

    [root@hadoop001 ~]# su - mysqladmin
    [mysqladmin@hadoop001 ~]$ service mysql start
    Starting MySQL                                             [  OK  ]
    [root@hadoop001 ~]# service cloudera-scm-server start
    // Run the commands above only on hadoop001; the one below on all three machines.
    [root@hadoop001 ~]# service cloudera-scm-agent start
    Starting cloudera-scm-agent:                               [  OK  ]
    
