Build an HDFS + Hive Environment by Hand in 30 Minutes


Author: taojy123 | Published 2019-11-16 22:33

    0 Download the software
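
    All of the tarballs except the JDK can be pulled from public archives. The URLs below are a sketch and worth verifying before use; the Oracle JDK archive sits behind a login, so fetch jdk-8u161-linux-x64.tar.gz manually from Oracle's site.

    # Hadoop and Hive from the Apache archive
    wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
    wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
    # MySQL Connector/J (link pattern assumed; see the MySQL archives page if it has moved)
    wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.18.tar.gz
    # jdk-8u161-linux-x64.tar.gz: download manually from Oracle (login required)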

    1 Build the base image

    Once the files above are downloaded, place them all in one directory and create this Dockerfile:

    FROM centos:7
    
    RUN curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.163.com/.help/CentOS7-Base-163.repo
    RUN yum clean all && yum makecache
    
    # Install openssh-server and sudo, and set sshd's UsePAM option to no
    RUN yum install -y openssh-server sudo
    RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
    # Install openssh-clients
    RUN yum install -y openssh-clients
    
    RUN yum install -y vim net-tools which
    
    # Give the root user the password "root" and add it to sudoers
    RUN echo "root:root" | chpasswd
    RUN echo "root   ALL=(ALL)       ALL" >> /etc/sudoers
    # These two lines are special: they are required on CentOS 6, otherwise sshd in the
    # resulting container cannot accept logins (-N '' generates the host keys without a
    # passphrase prompt during the non-interactive build)
    RUN ssh-keygen -t dsa -N '' -f /etc/ssh/ssh_host_dsa_key
    RUN ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsa_key
    
    WORKDIR /usr/local/
    
    COPY jdk-8u161-linux-x64.tar.gz /usr/local/
    RUN tar -zxf jdk-8u161-linux-x64.tar.gz
    RUN mv jdk1.8.0_161 jdk1.8
    ENV JAVA_HOME /usr/local/jdk1.8
    ENV PATH $JAVA_HOME/bin:$PATH
    
    COPY hadoop-3.1.3.tar.gz /usr/local/
    RUN tar -zxf hadoop-3.1.3.tar.gz
    RUN mv hadoop-3.1.3 hadoop
    ENV HADOOP_HOME /usr/local/hadoop
    ENV PATH $HADOOP_HOME/bin:$PATH
    
    COPY apache-hive-3.1.2-bin.tar.gz /usr/local/
    RUN tar -zxf apache-hive-3.1.2-bin.tar.gz
    RUN mv apache-hive-3.1.2-bin hive
    ENV HIVE_HOME /usr/local/hive
    ENV PATH $HIVE_HOME/bin:$PATH
    RUN mkdir -p /home/hadoop/hive/tmp
    
    COPY mysql-connector-java-8.0.18.tar.gz /usr/local/
    RUN tar -zxf mysql-connector-java-8.0.18.tar.gz
    RUN mv mysql-connector-java-8.0.18/mysql-connector-java-8.0.18.jar $HIVE_HOME/lib
    
    WORKDIR /usr/local/hadoop
    
    # Start the sshd service and expose port 22
    RUN mkdir /var/run/sshd
    EXPOSE 22
    CMD ["/usr/sbin/sshd", "-D"]
    

    Confirm the current directory contains the following files:

    • Dockerfile
    • jdk-8u161-linux-x64.tar.gz
    • hadoop-3.1.3.tar.gz
    • apache-hive-3.1.2-bin.tar.gz
    • mysql-connector-java-8.0.18.tar.gz

    Then build the Docker image, naming it centos-hadoop:

    docker build -t centos-hadoop .
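
    Once the build completes, confirm the image is present:

    docker images centos-hadoop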
    

    2 Set up the HDFS environment

    Create a Docker network:

    docker network create --subnet=172.20.10.0/24 hadoop
    

    Create the three node containers:

    docker run --name hadoop0 --hostname hadoop0 --net hadoop --ip 172.20.10.100 -d -P -p 50070:50070 -p 8088:8088 -p 9083:9083 -p 10000:10000 -p 8888:8888 centos-hadoop
    
    docker run --name hadoop1 --hostname hadoop1 --net hadoop --ip 172.20.10.101 -d -P centos-hadoop
    
    docker run --name hadoop2 --hostname hadoop2 --net hadoop --ip 172.20.10.102 -d -P centos-hadoop
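
    A quick check that all three containers are up and holding the expected addresses:

    docker ps --filter name=hadoop
    docker network inspect hadoop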
    

    Set up passwordless SSH login between the nodes:

    docker exec -it hadoop0 bash
    cd  ~
    mkdir .ssh
    cd .ssh
    ssh-keygen -t rsa
    (just press Enter at every prompt)
    ssh-copy-id -i localhost
    ssh-copy-id -i hadoop0
    ssh-copy-id -i hadoop1
    ssh-copy-id -i hadoop2
    (the password is always root)
    exit
    
    docker exec -it hadoop1 bash
    cd  ~
    mkdir .ssh
    cd .ssh
    ssh-keygen -t rsa
    ssh-copy-id -i localhost
    ssh-copy-id -i hadoop0
    ssh-copy-id -i hadoop1
    ssh-copy-id -i hadoop2
    exit
    
    docker exec -it hadoop2 bash
    cd  ~
    mkdir .ssh
    cd .ssh
    ssh-keygen -t rsa
    ssh-copy-id -i localhost
    ssh-copy-id -i hadoop0
    ssh-copy-id -i hadoop1
    ssh-copy-id -i hadoop2
    exit
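
    Before moving on, confirm that passwordless login really works; from hadoop0 each of these should print a hostname without prompting for a password:

    docker exec -it hadoop0 bash
    for h in hadoop0 hadoop1 hadoop2; do ssh $h hostname; done
    exit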
    

    Modify the Hadoop configuration files on hadoop0:

    docker exec -it hadoop0 bash
    cd /usr/local/hadoop/etc/hadoop
    

    Add to hadoop-env.sh:

    export JAVA_HOME=/usr/local/jdk1.8
    

    Add to core-site.xml:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop0:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop/tmp</value>
        </property>
        <property>
            <name>fs.trash.interval</name>
            <value>1440</value>
        </property>
    </configuration>
    

    Add to hdfs-site.xml:

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>0.0.0.0:50070</value>
        </property>
    </configuration>
    

    Add to yarn-site.xml:

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
    </configuration>
    

    Add to mapred-site.xml:

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>yarn.app.mapreduce.am.env</name>
            <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
        </property>
        <property>
            <name>mapreduce.map.env</name>
            <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
        </property>
        <property>
            <name>mapreduce.reduce.env</name>
            <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
        </property>
    </configuration>
    

    Modify the start and stop scripts:

    cd /usr/local/hadoop/sbin
    

    Add at the top of start-dfs.sh and stop-dfs.sh:

    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    

    Add at the top of start-yarn.sh and stop-yarn.sh:

    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root
    

    Confirm the hdfs command is available:

    which hdfs
    

    Format the NameNode:

    hdfs namenode -format
    

    First, try starting Hadoop in pseudo-distributed mode (this step can be skipped):

    cd /usr/local/hadoop
    sbin/start-dfs.sh
    sbin/start-yarn.sh
    

    Verify with jps; the output should look something like:

    $ jps
    1970 ResourceManager
    1330 NameNode
    2099 NodeManager
    1463 DataNode
    2440 Jps
    1678 SecondaryNameNode
    

    Stop the pseudo-distributed Hadoop:

    sbin/stop-dfs.sh
    sbin/stop-yarn.sh
    

    Distributed configuration

    Add to etc/hadoop/yarn-site.xml:

    <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop0</value>
    </property>
    

    Add the worker hosts (which will run DataNode and NodeManager) to etc/hadoop/workers:

    hadoop1
    hadoop2
    

    Copy the configured Hadoop directory to the other nodes:

    scp  -rq /usr/local/hadoop   hadoop1:/usr/local
    scp  -rq /usr/local/hadoop   hadoop2:/usr/local
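
    A quick sanity check that the configuration landed on the workers:

    ssh hadoop1 cat /usr/local/hadoop/etc/hadoop/workers
    ssh hadoop2 cat /usr/local/hadoop/etc/hadoop/workers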
    

    Restart Hadoop as a distributed cluster. Run these on hadoop0; the scripts start and stop the worker daemons over SSH:

    sbin/stop-dfs.sh
    sbin/stop-yarn.sh
    sbin/start-dfs.sh
    sbin/start-yarn.sh
    

    Verify the cluster is healthy

    hadoop0 should show these jps processes:

    $ jps
    4643 Jps
    4073 NameNode
    4216 SecondaryNameNode
    4381 ResourceManager
    

    hadoop1 and hadoop2 should show these jps processes:

    $ jps
    715 NodeManager
    849 Jps
    645 DataNode
    

    Web UIs (replace your.domain with the address of your Docker host):
    http://your.domain:50070
    http://your.domain:8088
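
    From the Docker host itself, a quick reachability check (a sketch that assumes the port mappings given when hadoop0 was created):

    curl -sf http://localhost:50070/ > /dev/null && echo "NameNode UI reachable"
    curl -sf http://localhost:8088/ > /dev/null && echo "YARN UI reachable"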

    Verify file reads and writes:

    cat > a.txt << EOF
    a,1,12.4
    b,20,5.5
    EOF
    hdfs dfs -mkdir /test
    hdfs dfs -put a.txt /test/
    hdfs dfs -ls /test
    hdfs dfs -text /test/a.txt
    

    Verify MapReduce:

    cat > b.txt << EOF
    hello world
    hello hadoop
    EOF
    hdfs dfs -put b.txt /
    cd /usr/local/hadoop/share/hadoop/mapreduce
    hadoop jar hadoop-mapreduce-examples-3.1.3.jar wordcount /b.txt /out
    hdfs dfs -text /out/part-r-00000
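
    For the two-line b.txt above, the expected (tab-separated) output is:

    hadoop  1
    hello   2
    world   1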
    

    With that, the distributed HDFS file system is up!

    3 Set up the Hive environment

    Create the metastore database (run on the Docker host):

    docker run --name mysql -v /var/lib/mysql:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=root -p 3306:3306 --net=hadoop -d mysql:5.7
    docker exec -it mysql bash
    mysql -u root -proot
    create database metastore default character set utf8mb4 collate utf8mb4_unicode_ci;
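    -- still inside the MySQL shell, confirm the database was created
    show databases like 'metastore';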
    

    Create the working directories in HDFS (back inside hadoop0):

    hdfs dfs -mkdir -p /user/hive/warehouse
    hdfs dfs -mkdir -p /user/hive/tmp
    hdfs dfs -mkdir -p /user/hive/log
    hdfs dfs -chmod -R 777 /user/hive/warehouse
    hdfs dfs -chmod -R 777 /user/hive/tmp
    hdfs dfs -chmod -R 777 /user/hive/log
    

    Configure Hive:

    mkdir -p /home/hadoop/hive/tmp
    cd /usr/local/hive/conf
    cp hive-env.sh.template hive-env.sh
    cp hive-default.xml.template hive-site.xml
    cp hive-log4j2.properties.template hive-log4j2.properties
    cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
    

    Add to hive-env.sh:

    export JAVA_HOME=/usr/local/jdk1.8         ## Java path
    export HADOOP_HOME=/usr/local/hadoop       ## Hadoop install path
    export HIVE_HOME=/usr/local/hive           ## Hive install path
    export HIVE_CONF_DIR=/usr/local/hive/conf  ## Hive config path
    

    Modify these properties in hive-site.xml (replace taojy123.com in the JDBC URL with your own MySQL host; the mysql container started above is also reachable as mysql on the hadoop network):

    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://taojy123.com:3306/metastore?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    
    Then replace every occurrence of ${system:java.io.tmpdir} with /home/hadoop/hive/tmp, and every {system:user.name} with {user.name}. In vim:

    :%s/${system:java.io.tmpdir}/\/home\/hadoop\/hive\/tmp/g
    :%s/{system:user.name}/{user.name}/g
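
    Equivalently, outside of vim, a pair of sed one-liners (a sketch; keep a backup of hive-site.xml first):

    cd /usr/local/hive/conf
    cp hive-site.xml hive-site.xml.bak
    sed -i 's#\${system:java\.io\.tmpdir}#/home/hadoop/hive/tmp#g' hive-site.xml
    sed -i 's#{system:user\.name}#{user.name}#g' hive-site.xml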
    

    Initialize Hive:

    schematool -dbType mysql -initSchema
    

    Two errors may come up here:

    1. NoSuchMethodError … checkArgument
       Fix: Hive 3.1.2 bundles a guava jar that is older than Hadoop's, so swap in Hadoop's copy:

    cd /usr/local/hive/lib
    mv guava-19.0.jar guava-19.0.jar.bak
    cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
    
    2. WstxParsingException: Illegal character entity
       Fix: edit the config and remove the offending entity:

    vim /usr/local/hive/conf/hive-site.xml
    (delete the &#8; character on line 3215)
    

    Run the initialization again; this time it should succeed.

    Start the servers:

    nohup hive --service hiveserver2 &
    nohup hive --service metastore &
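
    HiveServer2 can take a minute to come up. A quick smoke test with beeline (assuming the default port 10000 and the root user configured above):

    beeline -u jdbc:hive2://localhost:10000 -n root -e 'show databases;'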
    

    Try creating an external table (it maps onto the /test directory where a.txt was uploaded earlier):

    $ hive
    create external table test
    (name string, num int, score float)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    location '/test';
    

    Query the table:

    select * from test;
    

    If you can see the 2 rows of data, the Hive environment is working!
