Hive

作者: 逸章 | 来源:发表于2020-07-25 15:22 被阅读0次

一、安装

1.1、安装Mysql

https://www.jianshu.com/p/d6a275ea554d这里有介绍
进入mysql,增加hive/hive用户:

mysql> CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit;

1.2、安装HIVE

1 解压缩

yay@yay-ThinkPad-T470-W10DG:~/下载$ tar -zxvf apache-hive-2.3.7-bin.tar.gz -C ~/software

yay@yay-ThinkPad-T470-W10DG:~/下载$ cd ../software
yay@yay-ThinkPad-T470-W10DG:~/software$ mv apache-hive-2.3.7-bin hive237

编辑/etc/profile

#hive
export HIVE_HOME=/home/yay/software/hive237
export PATH=$HIVE_HOME/bin:$PATH

export CLASSPATH=$CLASSPATH:/home/yay/software/hive237/lib/*:.
export HADOOP_USER_CLASSPATH_FIRST=true

//下面这行很重要,否则在hive中易出现FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'这个错误
export HIVE_AUX_JARS_PATH=/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar

2 配置

yay@yay-ThinkPad-T470-W10DG:~/software$ cd hive237/conf
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-env.sh.template hive-env.sh
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-default.xml.template hive-site.xml
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-log4j2.properties.template hive-log4j2.properties
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties

启动hdfs后配置下面内容(start-dfs.sh start-yarn.sh)

yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/warehouse
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/tmp
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/log
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/warehouse
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/tmp
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/log

编辑conf/hive-env.sh:

export JAVA_HOME=/home/yay/software/java/jdk1.8.0_191
export HIVE_HOME=/home/yay/software/hive237
export HADOOP_HOME=/home/yay/software/hadoop-2.6.0
export HIVE_CONF_DIR=/home/yay/software/hive237/conf

编辑/home/yay/software/hive237/conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>  
  <!--修改-->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/yay/hivedata/local/hive</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/yay/hivedata/local/hive/hive-downloaded-addDir/</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>

   <!--修改-->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>   

  <!--修改-->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>

    <!--修改-->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>

  <!--修改-->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>  
  

  <!--修改-->
  <property>
    <name>hive.querylog.location</name>
    <value>/home/yay/hivedata/local/hive/querylog-location-addDir/</value>
    <description>Location of Hive run time structured log file</description>
  </property>    

  <!--修改-->
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/yay/hivedata/local/hive/hive-logging-operation-log-addDir</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>  
 
</configuration>

下载https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz ,解压后把里面的mysql-connector-java-5.1.49-bin.jar拷贝到/home/yay/software/hive237/lib

数据库初始化:

yay@yay-ThinkPad-T470-W10DG:~/software/hive237/bin$ ./schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/yay/software/hive237/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/yay/software/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:mysql://localhost:3306/hive
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:   hive
Sat Jul 25 22:54:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Sat Jul 25 22:54:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Sat Jul 25 22:55:02 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
yay@yay-ThinkPad-T470-W10DG:~/software/hive237/bin$ 

二、和Elasticsearch的互操作

2.1 从HDFS上的CSV向ElasticSearch导入数据

为CSV文件创建外部表

hive> create external table source (id STRING, caseNumber STRING, eventDate  DATE,block STRING,iucr STRING,  primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,lat DOUBLE,lon DOUBLE)  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerds' stored as TEXTFILE location '/ch07';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.OpenCSVSerds
hive> create external table source (id STRING, caseNumber STRING, eventDate  DATE,block STRING,iucr STRING,  primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,lat DOUBLE,lon DOUBLE)  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' stored as TEXTFILE location '/ch07';
OK
Time taken: 1.676 seconds
hive> 

再创建一个外部目标表:

hive>  create external table crimes (id STRING, caseNumber STRING, eventDate  DATE,block STRING,iucr STRING,      primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,geoLocation STRUCT<lat:DOUBLE, lon:DOUBLE>) stored BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='esh_hive/crimes')

把source获取的数据插入crimes表:

hive> insert overwrite table crimes select s.id , s.caseNumber , s.eventDate  ,s.block ,s.iucr ,s.primaryType ,s.description ,s.location ,s.arrest ,s.domestic,named_struct('lat',cast(a.lat as double),"lon",cast(a.lon as double)) from source s;
FAILED: SemanticException [Error 10004]: Line 1:177 Invalid table alias or column reference 'a': (possible column names are: id, casenumber, eventdate, block, iucr, primarytype, description, location, arrest, domestic, lat, lon)
hive> 

注意:如果上面的insert语句执行时候出现如下错误提示:Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar)'
我实验了不少方法一直不知道引发原因,只好创建了这样一个目录:

yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -mkdir -p /home/yay/software/elasticsearch-hadoop-2.1.1/dist/
yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -put /home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar /home/yay/software/elasticsearch-hadoop-2.1.1/dist
yay@yay-ThinkPad-T470-W10DG:~$ 

然后执行成功

图片.png 图片.png

相关文章

  • 数据仓库Hive

    Hive产生背景 Hive概述 HIve体系架构 Hive部署架构 Hive和RDBMS区别 Hive部署以及快速...

  • 数据查询-Hive基础

    outline 什么是Hive 为什么需要Hive Hive的架构 Hive的常用操作 什么是Hive Hive由...

  • 大数据知识 | hive初识

    hive简介 hive架构 hive是什么 官网这样说:https://hive.apache.org/ hive...

  • Hive | Hive 安装详解

    一、Hive 介绍 二、准备工作 三、Hive下载 四、Hive 安装 五、Hive 启动 一、Hive 介绍 H...

  • Hive日常使用

    hive 创建表: hive 执行: =========================hive 调用Python...

  • Hive常用的几种交互操作

    查看hive下的交互命令方式 -help(hive 外) 命令:bin/hive -helpusage: hive...

  • 【Hive】

    Hive的安装 Hive官网地址 http://hive.apache.org/[http://hive.apac...

  • Hive进阶

    hive配置,命令 hive查询显示列名 hive默认分隔符 \001 hive命令行中查看当前hive环境变量 ...

  • Hive 入门

    Hive官网 Hive概述 Hive 的底层执行引擎有 :MapReduce,Tez,Spark- Hive on...

  • 大数据开发之Hive优化篇2-Hive的explain命令

    备注:Hive 版本 2.1.1 一.Hive explain命令概述 Hive的explain命令用来看Hive...

网友评论

      本文标题:Hive

      本文链接:https://www.haomeiwen.com/subject/jkxmlktx.html