Installing and Configuring Hive

Author: NikolasNull | Published 2019-08-22 14:33

    I. Installing and configuring Hive

    Before installing Hive, check the official compatibility matrix between Hive and Hadoop versions. The versions used here are:

    • hadoop-2.7.4
    • hive-2.3.5

    1. Edit Hive's environment variables
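    The original post gives no snippet for this step; a minimal sketch of the usual shell-profile entries, assuming the install path used later in this article (/home/hadoop/hive-235):

```shell
# Hypothetical ~/.bashrc entries; adjust the paths to your installation.
export HIVE_HOME=/home/hadoop/hive-235
export PATH=$PATH:$HIVE_HOME/bin
```

    After editing, run `source ~/.bashrc` so the current shell picks up the changes.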

    2. Edit the hive-env.sh file

    # The heap size of the jvm started by hive shell script can be controlled via:
    #
    export HADOOP_HEAPSIZE=2048
    #
    # Larger heap size may be required when running queries over large number of files or partitions. 
    # By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
    # appropriate for hive server.
    
    # export JAVA_HOME=/usr/local/java
    
    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=/home/hadoop/hadoop-2.7.4/
    
    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=/home/hadoop/hive-235/conf
    
    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
    # export HIVE_AUX_JARS_PATH=
    export HIVE_AUX_JARS_PATH=/home/hadoop/hive-235/lib
    

    3. Edit the hive-site.xml file

    <configuration>
        <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
            <description>
                Enforces metastore schema version consistency. When enabled, Hive verifies that the schema version recorded in the metastore matches the version in the Hive jars, and automatic schema migration is disabled: the user must upgrade Hive and migrate the schema manually. When disabled, only a warning is issued on a version mismatch. Defaults to false.
            </description>
        </property>
    
        <property>
            <name>datanucleus.schema.autoCreateAll</name>
            <value>true</value>
        </property>
        <property>
            <name>hive.auto.convert.join</name>
            <value>false</value>
            <description>An optimization that converts a common join into a map join based on the size of the input files; set to false here to disable it.</description>
        </property>
    
        <property>
            <name>hive.server2.enable.impersonation</name>
            <description>Enable user impersonation for HiveServer2</description>
            <value>true</value>
        </property>
    
        <property>
            <name>hive.server2.thrift.port</name>
            <value>10000</value>
        </property>
        <property>
            <name>hive.server2.thrift.bind.host</name>
            <value>node1</value>
        </property>
    
        <property>
          <name>hive.exec.scratchdir</name>
          <value>/user/hive/tmp</value>
          <description>Scratch directory where Hive stores the execution plans for the different map/reduce stages of a query, as well as intermediate output. The default is under /tmp; in practice this is often separated by group, with each group keeping its own tmp directory.</description>
        </property>
    
        <property>
          <name>hive.querylog.location</name>
          <value>/user/hive/log/hadoop</value>
          <description>Location of Hive run time structured log file</description>
        </property>
        
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
            <description>Hive warehouse directory on HDFS, where managed table data is stored</description>
        </property>
        
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://node6:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
            <description>JDBC URL of the MySQL database used as the Hive metastore</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>JDBC driver class for MySQL</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>leo</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>Yyf5211314!</value>
        </property>
        
        <property>
            <name>hive.support.concurrency</name>
            <value>false</value>
            <description>Whether Hive supports concurrency. Defaults to false; enabling read/write locks requires a running ZooKeeper.</description>
        </property>
        
        <property>
            <name>hive.enforce.bucketing</name>
            <value>false</value>
            <description>Whether bucketing is enforced. Defaults to false; when enabled, writes to a bucketed table are bucketed accordingly.</description>
        </property>
        <property>
            <name>hive.exec.dynamic.partition.mode</name>
            <value>nonstrict</value>
            <description>Defaults to strict. In strict mode, using dynamic partitions requires at least one static partition to be specified; the remaining partitions may then be dynamic.</description>
        </property>
        <property>
            <name>hive.txn.manager</name>
            <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
        </property>
        <property>
            <name>hive.compactor.initiator.on</name>
            <value>true</value>
        </property>
        <property>
            <name>hive.compactor.worker.threads</name>
            <value>10</value>
        </property>
    
        <property>
            <name>hive.exec.max.dynamic.partitions</name>
            <value>100000</value>
            <description>Upper limit on the total number of dynamic partitions; the default is 1000.</description>
        </property>
        <property>
            <name>hive.exec.max.dynamic.partitions.pernode</name>
            <value>100000</value>
            <description>Maximum number of dynamic partitions each mapper/reducer node may create; the default is 100.</description>
        </property>
    
        <property>
            <name>hive.exec.parallel.thread.number</name>
            <value>8</value>
            <description>Controls the maximum number of jobs that may run in parallel for a single SQL statement. The default is 8, i.e. at most 8 jobs run at the same time.</description>
        </property>
    </configuration>
    
    • Create Hive's local temporary directory
    mkdir -p /home/hadoop/hive-data/tmp/
    

    Then make the following changes in hive-site.xml:

    Replace ${system:java.io.tmpdir} with /home/hadoop/hive-data/tmp/

    Replace ${system:user.name} with ${user.name}
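    The two substitutions can be applied in one pass with sed (paths assumed from this article's layout; keep a backup before editing in place):

```shell
cd /home/hadoop/hive-235/conf
cp hive-site.xml hive-site.xml.bak   # backup before in-place edits
sed -i 's#${system:java.io.tmpdir}#/home/hadoop/hive-data/tmp#g' hive-site.xml
sed -i 's#${system:user.name}#${user.name}#g' hive-site.xml
```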

    • Create Hive's directories on HDFS
    # warehouse (table data) directory
    hadoop fs -mkdir -p /user/hive/warehouse
    
    # scratch/temporary directory
    hadoop fs -mkdir -p /user/hive/tmp
    
    # query log directory
    hadoop fs -mkdir -p /user/hive/log
    
    hadoop fs -chmod -R 777 /user/hive/warehouse  
    hadoop fs -chmod -R 777 /user/hive/tmp  
    hadoop fs -chmod -R 777 /user/hive/log
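    A quick sanity check that the three directories exist with the widened permissions:

```shell
# Each directory should list as drwxrwxrwx after the chmod above.
hadoop fs -ls /user/hive
```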
    

    4. Download or copy the MySQL JDBC driver jar into Hive's lib directory

    cd  /home/hadoop/hive-235/lib/
    wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar
    

    5. Initialize the hive database in MySQL

    cd  /home/hadoop/hive-235/bin/
    ./schematool -initSchema -dbType mysql
    
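    To confirm that schematool created the metastore schema, you can list the tables in the new hive database (host and credentials assumed from the hive-site.xml above):

```shell
# List the metastore tables created by schematool (password prompted interactively).
mysql -h node6 -u leo -p -e 'USE hive; SHOW TABLES;'
```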

    6. A quick test in the Hive shell

    hive> create database test;
    OK
    Time taken: 0.175 seconds
    hive> create table test_tab (name string,age int);
    OK
    Time taken: 0.82 seconds
    hive> insert into test_tab values('yyf',23);
    hive> select * from test_tab;
    OK
    yyf 23
    Time taken: 0.167 seconds, Fetched: 1 row(s)
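    Since the hive-site.xml above binds HiveServer2 to node1:10000, you can also test a remote connection with beeline (the hostname and user name here are assumptions from this article's setup):

```shell
# Connect through HiveServer2 via JDBC and run a statement; -n passes the user name.
beeline -u jdbc:hive2://node1:10000 -n hadoop -e 'show databases;'
```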
    

    Summary

    The above is a brief record of installing and configuring Hive. For more detailed configuration, please refer to the official documentation. If there are any mistakes in these notes, corrections are welcome.
