1. Installation
1.1 Installing MySQL
MySQL installation is described here: https://www.jianshu.com/p/d6a275ea554d
Log in to MySQL, create the hive database, and grant access to a hive/hive account:
mysql> CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> exit;
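Before moving on, it is worth checking that the new account can actually log in (a quick sanity check, assuming MySQL is running on the local machine):
# should succeed and list the hive database
mysql -u hive -phive -e 'SHOW DATABASES;'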
1.2 Installing Hive
1. Unpack the tarball
yay@yay-ThinkPad-T470-W10DG:~/下载$ tar -zxvf apache-hive-2.3.7-bin.tar.gz -C ~/software
yay@yay-ThinkPad-T470-W10DG:~/下载$ cd ../software
yay@yay-ThinkPad-T470-W10DG:~/software$ mv apache-hive-2.3.7-bin hive237
Edit /etc/profile and append:
#hive
export HIVE_HOME=/home/yay/software/hive237
export PATH=$HIVE_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:/home/yay/software/hive237/lib/*:.
export HADOOP_USER_CLASSPATH_FIRST=true
# The following line is important; without it Hive is prone to the error: FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'
export HIVE_AUX_JARS_PATH=/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar
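After saving /etc/profile, reload it in the current shell so the new variables take effect, and confirm that the hive launcher resolves from the PATH:
source /etc/profile
# should print /home/yay/software/hive237/bin/hive
which hive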
2. Configuration
yay@yay-ThinkPad-T470-W10DG:~/software$ cd hive237/conf
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-env.sh.template hive-env.sh
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-default.xml.template hive-site.xml
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-log4j2.properties.template hive-log4j2.properties
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
After starting HDFS and YARN (start-dfs.sh, start-yarn.sh), create the warehouse, temp, and log directories on HDFS:
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/warehouse
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/tmp
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/log
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/warehouse
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/tmp
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/log
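The directories and their permissions can be verified with a recursive listing (the group write bit should appear in the mode string):
hdfs dfs -ls -R /usr/hive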
Edit conf/hive-env.sh:
export JAVA_HOME=/home/yay/software/java/jdk1.8.0_191
export HIVE_HOME=/home/yay/software/hive237
export HADOOP_HOME=/home/yay/software/hadoop-2.6.0
export HIVE_CONF_DIR=/home/yay/software/hive237/conf
Edit /home/yay/software/hive237/conf/hive-site.xml and change the following properties:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- modified -->
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/yay/hivedata/local/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/yay/hivedata/local/hive/hive-downloaded-addDir/</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<!-- modified -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<!-- modified -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<!-- modified -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<!-- modified -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<!-- modified -->
<property>
<name>hive.querylog.location</name>
<value>/home/yay/hivedata/local/hive/querylog-location-addDir/</value>
<description>Location of Hive run time structured log file</description>
</property>
<!-- modified -->
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/yay/hivedata/local/hive/hive-logging-operation-log-addDir</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
</configuration>
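Depending on permissions, Hive may not create the local scratch and log directories referenced above on its own; creating them up front is a harmless precaution (same paths as in hive-site.xml):
mkdir -p /home/yay/hivedata/local/hive/hive-downloaded-addDir
mkdir -p /home/yay/hivedata/local/hive/querylog-location-addDir
mkdir -p /home/yay/hivedata/local/hive/hive-logging-operation-log-addDir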
Download https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz, extract it, and copy the mysql-connector-java-5.1.49-bin.jar inside into /home/yay/software/hive237/lib.
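One way to do this from the command line (the directory layout inside the archive is assumed to follow the usual Connector/J 5.1.x naming):
wget https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz
tar -zxvf mysql-connector-java-5.1.49.tar.gz
cp mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar /home/yay/software/hive237/lib/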
Initialize the metastore database:
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/bin$ ./schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/yay/software/hive237/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/yay/software/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Sat Jul 25 22:54:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Sat Jul 25 22:54:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Sat Jul 25 22:55:02 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/bin$
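If initialization succeeded, the metastore tables (DBS, TBLS, COLUMNS_V2 and so on) now exist in the hive database and can be listed directly from MySQL:
mysql -u hive -phive hive -e 'SHOW TABLES;'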
2. Interoperating with Elasticsearch
2.1 Importing data from a CSV on HDFS into Elasticsearch
Create an external table over the CSV files. Note that the first attempt below fails because the SerDe class name is misspelled (OpenCSVSerds); the corrected statement follows:
hive> create external table source (id STRING, caseNumber STRING, eventDate DATE,block STRING,iucr STRING, primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,lat DOUBLE,lon DOUBLE) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerds' stored as TEXTFILE location '/ch07';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.OpenCSVSerds
hive> create external table source (id STRING, caseNumber STRING, eventDate DATE,block STRING,iucr STRING, primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,lat DOUBLE,lon DOUBLE) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' stored as TEXTFILE location '/ch07';
OK
Time taken: 1.676 seconds
hive>
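The source table simply reads whatever CSV files already sit under /ch07 on HDFS. If the data has not been uploaded yet, something like the following does it (crimes.csv is a placeholder for the actual CSV file name):
hdfs dfs -mkdir -p /ch07
hdfs dfs -put crimes.csv /ch07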
Next, create an external target table backed by Elasticsearch:
hive> create external table crimes (id STRING, caseNumber STRING, eventDate DATE,block STRING,iucr STRING, primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,geoLocation STRUCT<lat:DOUBLE, lon:DOUBLE>) stored BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='esh_hive/crimes');
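By default elasticsearch-hadoop connects to Elasticsearch on localhost:9200, so the local ES instance must be running before the insert below. If the cluster lives elsewhere, the target can be set through additional table properties, for example (host and port here are illustrative):
TBLPROPERTIES('es.resource'='esh_hive/crimes', 'es.nodes'='localhost', 'es.port'='9200')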
Insert the rows read through source into the crimes table. The first attempt below fails; the corrected statement follows the error:
hive> insert overwrite table crimes select s.id , s.caseNumber , s.eventDate ,s.block ,s.iucr ,s.primaryType ,s.description ,s.location ,s.arrest ,s.domestic,named_struct('lat',cast(a.lat as double),"lon",cast(a.lon as double)) from source s;
FAILED: SemanticException [Error 10004]: Line 1:177 Invalid table alias or column reference 'a': (possible column names are: id, casenumber, eventdate, block, iucr, primarytype, description, location, arrest, domestic, lat, lon)
hive>
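The error is caused by the named_struct call referencing an alias a that does not exist; the only table alias in the query is s. With the references corrected, the statement should look like this:
insert overwrite table crimes select s.id, s.caseNumber, s.eventDate, s.block, s.iucr, s.primaryType, s.description, s.location, s.arrest, s.domestic, named_struct('lat', cast(s.lat as double), 'lon', cast(s.lon as double)) from source s;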
Note: if the insert statement fails with the following error:
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar)'
I tried a number of things without finding the root cause, so as a workaround I created the corresponding directory on HDFS and uploaded the jar there:
yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -mkdir -p /home/yay/software/elasticsearch-hadoop-2.1.1/dist/
yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -put /home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar /home/yay/software/elasticsearch-hadoop-2.1.1/dist
After that, the insert ran successfully.
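Assuming Elasticsearch is listening on localhost:9200, the imported documents can be spot-checked with curl (the index/type name comes from the es.resource setting above):
curl 'http://localhost:9200/esh_hive/crimes/_count?pretty'
curl 'http://localhost:9200/esh_hive/crimes/_search?size=1&pretty'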