Installing and configuring Hadoop and Hive on CentOS


Author: 雨中的单车 | Published 2021-08-31 14:27

    1. Download Hadoop
    https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/

    The latest version at the time of writing is hadoop-3.3.1.
    2. Install the JDK
    rpm -ivh jdk-8u261-linux-x64.rpm
    3. Unpack the Hadoop archive
    tar zxvf hadoop-3.3.1.tar.gz
    4. Point Hadoop at the JDK
    Edit the hadoop-3.3.1/etc/hadoop/hadoop-env.sh file: from the hadoop-3.3.1 directory, run

    vi etc/hadoop/hadoop-env.sh
    

    Add: export JAVA_HOME=/usr/java/jdk1.8.0_261-amd64


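    Before starting any daemons, it is worth checking that the JAVA_HOME written into hadoop-env.sh actually points at a JDK. A minimal sketch — the helper name check_java_home is ours for illustration, not part of Hadoop:

```shell
# Verify that a candidate JAVA_HOME contains an executable java binary
# before writing it into hadoop-env.sh. Illustrative helper only.
check_java_home() {
    local candidate="$1"
    if [ -x "$candidate/bin/java" ]; then
        echo "JAVA_HOME ok: $candidate"
        return 0
    fi
    echo "no executable java under: $candidate" >&2
    return 1
}

# Example (path from the rpm install above):
# check_java_home /usr/java/jdk1.8.0_261-amd64
```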

    5. Pseudo-distributed deployment
    1) Configure the etc/hadoop/core-site.xml file

    vi etc/hadoop/core-site.xml
    

    Add the following:

        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    

    2) Configure the etc/hadoop/hdfs-site.xml file

    vi etc/hadoop/hdfs-site.xml
    

    Add the following:

        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    

    3) Set up passwordless SSH to localhost

     ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     chmod 0600 ~/.ssh/authorized_keys
    
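    The three commands above can be wrapped so that re-running the setup is harmless: the key pair is only generated when missing, and the public key is only appended once. A sketch — setup_ssh_key and its directory parameter are ours, for illustration:

```shell
# Idempotent version of the passwordless-SSH setup above.
setup_ssh_key() {
    local ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    # Generate a key pair only if none exists yet.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -t rsa -P '' -f "$ssh_dir/id_rsa"
    fi
    # Append the public key only if it is not already authorized.
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```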

    4) Format the filesystem

    bin/hdfs namenode -format
    

    5) Running sbin/start-dfs.sh produces the following errors:

    [root@iZ2zeb8tcng37z21t5bk9cZ hadoop-3.3.1]# sbin/start-dfs.sh
    Starting namenodes on [localhost]
    ERROR: Attempting to operate on hdfs namenode as root
    ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
    Starting datanodes
    ERROR: Attempting to operate on hdfs datanode as root
    ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
    Starting secondary namenodes [iZ2zeb8tcng37z21t5bk9cZ]
    ERROR: Attempting to operate on hdfs secondarynamenode as root
    ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
    

    Fix this by adding the following lines in a blank spot near the top of sbin/start-dfs.sh and sbin/stop-dfs.sh:

    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    

    6) Similarly, add the following to sbin/start-yarn.sh and sbin/stop-yarn.sh:

    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root
    
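    Instead of patching each start/stop script, the same daemon-user variables can be declared once in etc/hadoop/hadoop-env.sh, which Hadoop sources on every daemon start. A sketch — the helper and the example path are ours; adjust the path to your install:

```shell
# Append the daemon-user exports to a hadoop-env.sh-style file.
# add_daemon_users is illustrative, not part of Hadoop.
add_daemon_users() {
    local env_file="$1"
    cat >> "$env_file" <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
}

# Example:
# add_daemon_users /root/hadoop-3.3.1/etc/hadoop/hadoop-env.sh
```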

    Start DFS and YARN:

    sbin/start-dfs.sh
    sbin/start-yarn.sh
    

    7) Create the HDFS directories required to run MapReduce jobs:

    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/<username>
    

    8) Copy the input files into the distributed filesystem:

    bin/hdfs dfs -mkdir input
    bin/hdfs dfs -put etc/hadoop/*.xml input
    

    9) Check the output files: copy them from the distributed filesystem to the local filesystem and examine them. (These files come from a MapReduce job writing to an output directory; the official single-node guide runs the bundled grep example at this point, a step this write-up skips.)

    bin/hdfs dfs -get output output
     cat output/*
    

    View the output files directly on the distributed filesystem:

    bin/hdfs dfs -cat output/*
    

    10) Configure YARN on a single node
    Edit the etc/hadoop/mapred-site.xml file

    vi etc/hadoop/mapred-site.xml
    

    Add:

        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>
    

    Edit the etc/hadoop/yarn-site.xml file

    vi etc/hadoop/yarn-site.xml
    

    Add:

        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
        </property>
    

    11) Install and configure Hive
    Unpack the archive:

    tar zxvf apache-hive-2.3.9-bin.tar.gz
    

    Rename the directory:

    mv apache-hive-2.3.9-bin/ hive-2.3.9
    

    12) In the hive-2.3.9/conf directory, rename the template configuration file:

    mv hive-env.sh.template hive-env.sh
    

    13) Edit the hive-env.sh file

    vi hive-env.sh
    

    Add the following:

    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=/root/hadoop-3.3.1

    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=/root/hive-2.3.9/conf
    

    14) Update the environment variables

    vi /etc/profile
    

    Add the following:

    export HIVE_HOME=/root/hive-2.3.9
    export PATH=$PATH:$HIVE_HOME/bin
    # Make Hive's jars visible on the Hadoop classpath
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
    

    Apply the changes:

    source /etc/profile
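    Each re-run of this setup would append duplicate lines to /etc/profile; a small append-only-once helper avoids that. A sketch — append_once is ours, for illustration:

```shell
# Append a line to a file only if it is not already present verbatim.
append_once() {
    local line="$1" file="$2"
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example:
# append_once 'export HIVE_HOME=/root/hive-2.3.9' /etc/profile
# append_once 'export PATH=$PATH:$HIVE_HOME/bin' /etc/profile
```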
    

    15) Install MySQL
    Unpack the bundle:

    tar xvf mysql-8.0.26-1.el8.x86_64.rpm-bundle.tar
    

    Install the packages in the following order:

     rpm -ivh mysql-community-common-8.0.26-1.el8.x86_64.rpm
     rpm -ivh mysql-community-client-plugins-8.0.26-1.el8.x86_64.rpm
     rpm -ivh mysql-community-libs-8.0.26-1.el8.x86_64.rpm
     rpm -ivh mysql-community-client-8.0.26-1.el8.x86_64.rpm
     rpm -ivh mysql-community-server-8.0.26-1.el8.x86_64.rpm
    

    16) Initialize the database

    mysqld --initialize
    

    17) Check the config file and fix ownership
    cat /etc/my.cnf
    It contains the line datadir=/var/lib/mysql, the data directory; this directory must be owned by the mysql user, or the database cannot start. Run

    chown mysql:mysql /var/lib/mysql -R
    

    to change the ownership. This must be done after the database has been initialized (assuming MySQL was installed by a non-mysql user); otherwise starting the database fails.
    18) Start the database
    Start MySQL:

    systemctl start mysqld.service
    

    Stop MySQL:

    systemctl stop mysqld.service
    

    Restart MySQL:

    systemctl restart mysqld.service
    

    Enable MySQL at boot:

    systemctl enable mysqld
    

    19) Look up and change the root password
    Find the generated initial password:

    grep password /var/log/mysqld.log
    

    Change the initial password:

    mysqladmin -uroot -p'Ush&4PGR=0Vj' password Mysql123456
    

    If the initial password contains special characters such as < or &, wrap it in single quotes.
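    Single quotes keep the shell from interpreting those characters — unquoted, & would background the command and < would redirect input — so the generated password reaches mysqladmin byte-for-byte. A quick illustration with the sample password above:

```shell
# Inside single quotes, & and = are passed through literally.
pw='Ush&4PGR=0Vj'
printf '%s\n' "$pw"    # prints Ush&4PGR=0Vj
```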
    20) Allow remote clients to log in to MySQL

    mysql -u root -p
    use mysql
    update user set host='%' where user = 'root';
    select host,user from user;
    

    Restart the database for the change to take effect:

    systemctl restart mysqld.service
    

    21) Provide the MySQL JDBC driver to Hive
    Install the driver package:

    rpm -ivh mysql-connector-java-8.0.26-1.el8.noarch.rpm
    
    The rpm install fails because java-headless is required.
    Install it with the following command:

    yum install java-headless
    

    Install the MySQL driver package again:

    rpm -ivh mysql-connector-java-8.0.26-1.el8.noarch.rpm
    

    Find the path of the driver jar:

    find / -name mysql-connector-java.jar
    

    Copy mysql-connector-java.jar into the hive-2.3.9/lib directory:

    cp /usr/share/java/mysql-connector-java.jar /root/hive-2.3.9/lib
    

    22) Create the configuration file hive-site.xml in the hive-2.3.9/conf directory

    vi hive-site.xml
    

    Add the following:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
          <description>JDBC connect string for a JDBC metastore</description>
        </property>
    
        <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.cj.jdbc.Driver</value>
          <description>Driver class name for a JDBC metastore</description>
        </property>
    
        <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>root</value>
          <description>username to use against metastore database</description>
        </property>
    
        <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>HadoopMysql123456</value>
          <description>password to use against metastore database</description>
        </property>
        
    <!-- Show column headers when querying tables -->
        <property>
          <name>hive.cli.print.header</name>
          <value>true</value>
        </property>
    
    <!-- Show the current database in the CLI prompt -->
        <property>
          <name>hive.cli.print.current.db</name>
          <value>true</value>
        </property>
    </configuration>
    

    Initialize the metastore schema and start the metastore service:

    schematool -dbType mysql -initSchema
    hive --service metastore &
    
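    The backgrounded metastore can take several seconds to start accepting connections. A generic retry helper lets follow-up commands wait for it — retry and the port probe below are ours, not part of Hive; 9083 is the metastore's default port:

```shell
# Retry a command up to N times with a one-second pause between attempts.
retry() {
    local attempts="$1"
    shift
    local i
    for i in $(seq 1 "$attempts"); do
        "$@" && return 0
        sleep 1
    done
    return 1
}

# Example: wait for the metastore port before starting the Hive CLI.
# retry 30 bash -c 'exec 3<>/dev/tcp/localhost/9083'
```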

    23) Edit the Hadoop configuration file etc/hadoop/core-site.xml and add the following properties

    vi etc/hadoop/core-site.xml
    

    Add the following:

    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    

    Restart HDFS and YARN for the change to take effect:

    sbin/stop-dfs.sh
    sbin/stop-yarn.sh
    sbin/start-dfs.sh
    sbin/start-yarn.sh
    

    24) Create a database and set permissions
    Log in to Hive and create the chinese_consul database:

    hive
    create database if not exists chinese_consul;
    quit;
    

    Set permissions on its warehouse directory:

    bin/hadoop fs -chmod -R 777 /user/hive/warehouse/chinese_consul.db
    

    Inserting data through a frontend client then fails with the following error:

    org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp/hadoop-yarn":root:supergroup:drwx------
    

    This is caused by insufficient permissions.
    Run the following to fix it:

    hadoop fs -chown hadoop:hadoop /tmp/hadoop-yarn
    hadoop fs -chmod -R 777 /tmp/hadoop-yarn
    

    Known issues
    Exiting with status 1: java.io.IOException: NameNode is not formatted.
    When this happens, port 9000 is not open.
    Fix: re-format the filesystem with
    bin/hdfs namenode -format
    To change the ResourceManager web UI address (http://localhost:8088 by default),
    edit the hadoop-3.3.1/etc/hadoop/yarn-site.xml file and add

        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>${yarn.resourcemanager.hostname}:8088</value>
        </property>
    

    Fixing garbled comment text in Hive tables
    Log in to the MySQL database:

    mysql -u root -p
    

    Switch to the metastore database:

    use metastore;
    

    Run the following statements:

    alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
    alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
    alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    

    Then re-create the tables.
    Changing the ResourceManager's default web port: the default, 8088, is a frequent target of cryptomining attacks.
    Edit the yarn-site.xml file:

    vi etc/hadoop/yarn-site.xml
    

    Add the following:

        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>${yarn.resourcemanager.hostname}:8888</value>
        </property>
    

    8888 is the new port.
    Restart HDFS and YARN:
    sbin/stop-dfs.sh
    sbin/stop-yarn.sh
    sbin/start-dfs.sh
    sbin/start-yarn.sh
    Reference:
    https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

    Source: https://www.haomeiwen.com/subject/bpjziltx.html