I. Installing and Configuring Hive
Before installing Hive, check the Hive/Hadoop version compatibility matrix on the official site. The versions used here are:
- hadoop-2.7.4
- hive-2.3.5
1. Edit Hive's environment variables
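The post does not list the variables themselves; a minimal sketch for ~/.bashrc, assuming the install paths used in the rest of this post:

```shell
# Minimal environment setup (paths assumed to match the install locations
# used below); append these lines to ~/.bashrc, then run: source ~/.bashrc
export HIVE_HOME=/home/hadoop/hive-235
export PATH=$PATH:$HIVE_HOME/bin
```

After sourcing, the `hive`, `beeline`, and `schematool` scripts under $HIVE_HOME/bin are available from any directory.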
2. Edit the hive-env.sh file
# The heap size of the JVM started by the hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=2048
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server.
# export JAVA_HOME=/usr/local/java
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/hadoop-2.7.4/
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/hive-235/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
export HIVE_AUX_JARS_PATH=/home/hadoop/hive-235/lib
3. Edit the hive-site.xml file
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforces metastore schema consistency. When enabled, Hive verifies that the schema version stored in the metastore matches the version in the Hive jars, and disables automatic schema migration, so the user must upgrade Hive and migrate the schema manually. When disabled, only a warning is issued on a version mismatch. The default is false (disabled).
</description>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>false</value>
<description>An optimization that converts a common join into a map join based on input file size; disabled (false) by default.</description>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<description>Enable user impersonation for HiveServer2</description>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>node1</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
<description>Directory in which Hive stores the execution plans for the different map/reduce stages, as well as intermediate output. The default is /tmp/&lt;user.name&gt;/hive; in practice this is usually separated per group, with each group creating its own tmp directory.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/user/hive/log/hadoop</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>Hive warehouse directory, where managed table data is stored</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://node6:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>JDBC URL for the MySQL metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>leo</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Yyf5211314!</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>false</value>
<description>Whether Hive supports concurrency; default false. Read/write lock support requires a running ZooKeeper.</description>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>false</value>
<description>Whether bucketing is enforced; default false. When enabled, writes to a bucketed table honor the bucketing.</description>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
<description>Default strict. In strict mode, dynamic partitioning requires at least one static partition column to be specified; the remaining partitions may be dynamic.</description>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>10</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>100000</value>
<description>Upper limit on the total number of dynamic partitions; default 1000.</description>
</property>
<property>
<name>hive.exec.max.dynamic.partitions.pernode</name>
<value>100000</value>
<description>Maximum number of dynamic partitions each mapper/reducer node may create; default 100.</description>
</property>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
<description>Controls the maximum number of jobs that can run in parallel for a single SQL statement; the default is 8, allowing up to 8 concurrent jobs.</description>
</property>
</configuration>
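With hive.server2.thrift.bind.host and hive.server2.thrift.port set as above, clients can reach HiveServer2 through Beeline once it is running. A sketch, assuming the install path from this post and that `hadoop` is the connecting OS user (impersonation is enabled above; the user name here is an assumption):

```shell
# Start HiveServer2 in the background (ships with Hive under bin/)
/home/hadoop/hive-235/bin/hiveserver2 &
# Connect with Beeline using the host/port from hive-site.xml;
# -n passes the OS user name (assumed here to be hadoop)
/home/hadoop/hive-235/bin/beeline -u jdbc:hive2://node1:10000 -n hadoop
```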
- Create Hive's local temporary directory
mkdir -p /home/hadoop/hive-data/tmp/
Then, in hive-site.xml, replace:
every occurrence of ${system:java.io.tmpdir} with /home/hadoop/hive-data/tmp/
every occurrence of ${system:user.name} with ${user.name}
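The placeholder replacement can be done with sed; a sketch demonstrated on a sample file (on a real install, point the same command at /home/hadoop/hive-235/conf/hive-site.xml instead):

```shell
# Create a one-line sample containing the Hive template placeholders
cat > /tmp/hive-site-sample.xml <<'EOF'
<value>${system:java.io.tmpdir}/${system:user.name}</value>
EOF
# Replace ${system:java.io.tmpdir} with the local tmp path, and
# ${system:user.name} with ${user.name}, in place
sed -i \
  -e 's#${system:java.io.tmpdir}#/home/hadoop/hive-data/tmp#g' \
  -e 's#${system:user.name}#${user.name}#g' \
  /tmp/hive-site-sample.xml
cat /tmp/hive-site-sample.xml   # now reads /home/hadoop/hive-data/tmp/${user.name}
```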
- Create Hive's HDFS directories
# warehouse (table data) directory
hadoop fs -mkdir -p /user/hive/warehouse
# scratch/temporary directory
hadoop fs -mkdir -p /user/hive/tmp
# query log directory
hadoop fs -mkdir -p /user/hive/log
hadoop fs -chmod -R 777 /user/hive/warehouse
hadoop fs -chmod -R 777 /user/hive/tmp
hadoop fs -chmod -R 777 /user/hive/log
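A quick way to confirm the directories and permissions took effect (requires a running HDFS):

```shell
# Recursively list the newly created directories and their modes
hadoop fs -ls -R /user/hive
```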
4. Download or copy the MySQL JDBC driver jar into Hive's lib directory
cd /home/hadoop/hive-235/lib/
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar
5. Initialize the Hive metastore database in MySQL
cd /home/hadoop/hive-235/bin/
./schematool -initSchema -dbType mysql
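If initialization succeeds, the hive database on node6 should now contain the metastore tables (DBS, TBLS, VERSION, and so on). A quick check, assuming the MySQL credentials from hive-site.xml above:

```shell
# Prompts for the password configured in javax.jdo.option.ConnectionPassword
mysql -h node6 -u leo -p -e 'USE hive; SHOW TABLES;'
```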

6. Quick test: enter the Hive shell
hive> create database test;
OK
Time taken: 0.175 seconds
hive> create table test_tab (name string,age int);
OK
Time taken: 0.82 seconds
hive> insert into test_tab values('yyf',23);
hive> select * from test_tab;
OK
yyf 23
Time taken: 0.167 seconds, Fetched: 1 row(s)
Summary
The above briefly records the installation and configuration of Hive. For more detailed configuration options, please refer to the official documentation. If there are any mistakes above, corrections are welcome.