Hive

Author: 逸章 | Published 2020-07-25 15:22

    1. Installation

    1.1 Installing MySQL

    Installing MySQL itself is covered at https://www.jianshu.com/p/d6a275ea554d .
    Log in to MySQL, create the hive database, and grant the hive/hive user full access to it:

    mysql> CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    Query OK, 1 row affected (0.00 sec)
    
    mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> exit;
    

    1.2 Installing Hive

    1. Extract the archive

    yay@yay-ThinkPad-T470-W10DG:~/下载$ tar -zxvf apache-hive-2.3.7-bin.tar.gz -C ~/software
    
    yay@yay-ThinkPad-T470-W10DG:~/下载$ cd ../software
    yay@yay-ThinkPad-T470-W10DG:~/software$ mv apache-hive-2.3.7-bin hive237
    

    Edit /etc/profile:

    #hive
    export HIVE_HOME=/home/yay/software/hive237
    export PATH=$HIVE_HOME/bin:$PATH
    
    export CLASSPATH=$CLASSPATH:/home/yay/software/hive237/lib/*:.
    export HADOOP_USER_CLASSPATH_FIRST=true
    
    # This line is important; without it, Hive fails with: FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    export HIVE_AUX_JARS_PATH=/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar
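Before launching Hive, it is worth confirming that the jar named in HIVE_AUX_JARS_PATH actually exists on the local filesystem, since a missing file only surfaces later as the class-not-found error above. A minimal sketch; the `check_aux_jar` helper is my own, not part of Hive:

```shell
# Helper (hypothetical): report whether an aux-jar path points at a real file.
check_aux_jar() {
  if [ -f "$1" ]; then echo "present"; else echo "missing"; fi
}

# Check whatever HIVE_AUX_JARS_PATH is set to in the current shell
# (falling back to this article's path if it is unset).
check_aux_jar "${HIVE_AUX_JARS_PATH:-/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar}"
```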
    

    2. Configure

    yay@yay-ThinkPad-T470-W10DG:~/software$ cd hive237/conf
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-env.sh.template hive-env.sh
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-default.xml.template hive-site.xml
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-log4j2.properties.template hive-log4j2.properties
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
    

    With HDFS and YARN running (start-dfs.sh, start-yarn.sh), create the Hive directories on HDFS. Note that Hive's default hive.metastore.warehouse.dir is /user/hive/warehouse; the /usr/hive paths below only take effect if you point that property at them in hive-site.xml.

    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/warehouse
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/tmp
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -mkdir -p /usr/hive/log
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/warehouse
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/tmp
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/conf$ hdfs dfs -chmod g+w /usr/hive/log
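The six commands above can be collapsed into a loop. Sketched here as a dry run that only prints each command so you can eyeball the paths first (pipe the output to `sh` to actually execute them); `hive_hdfs_dirs` is my own helper name:

```shell
# Dry run: print one mkdir and one chmod per Hive directory on HDFS.
hive_hdfs_dirs() {
  for d in warehouse tmp log; do
    printf 'hdfs dfs -mkdir -p /usr/hive/%s\n' "$d"
    printf 'hdfs dfs -chmod g+w /usr/hive/%s\n' "$d"
  done
}
hive_hdfs_dirs
```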
    
    

    Edit conf/hive-env.sh:

    export JAVA_HOME=/home/yay/software/java/jdk1.8.0_191
    export HIVE_HOME=/home/yay/software/hive237
    export HADOOP_HOME=/home/yay/software/hadoop-2.6.0
    export HIVE_CONF_DIR=/home/yay/software/hive237/conf
    

    Edit /home/yay/software/hive237/conf/hive-site.xml:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <!-- modified -->
      <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/home/yay/hivedata/local/hive</value>
        <description>Local scratch space for Hive jobs</description>
      </property>
      <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/home/yay/hivedata/local/hive/hive-downloaded-addDir/</value>
        <description>Temporary local directory for added resources in the remote file system.</description>
      </property>

      <!-- modified -->
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
      </property>

      <!-- modified -->
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>Password to use against metastore database</description>
      </property>

      <!-- modified -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive</value>
        <description>JDBC connect string for a JDBC metastore</description>
      </property>

      <!-- modified -->
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
      </property>

      <!-- modified -->
      <property>
        <name>hive.querylog.location</name>
        <value>/home/yay/hivedata/local/hive/querylog-location-addDir/</value>
        <description>Location of Hive run time structured log file</description>
      </property>

      <!-- modified -->
      <property>
        <name>hive.server2.logging.operation.log.location</name>
        <value>/home/yay/hivedata/local/hive/hive-logging-operation-log-addDir</value>
        <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
      </property>

    </configuration>
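With this many hand-edited properties, a quick grep helper saves reopening the file to double-check a value. A minimal sketch: `get_prop` is my own helper, and the heredoc writes a miniature sample file so the snippet is self-contained; point SITE at your real $HIVE_CONF_DIR/hive-site.xml instead.

```shell
# Self-contained sample config (stands in for the real hive-site.xml).
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
  </property>
</configuration>
EOF

# get_prop <name> <file>: print the <value> on the line after a given <name>.
get_prop() {
  grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

get_prop javax.jdo.option.ConnectionURL "$SITE"
```

This relies on each property being formatted with `<name>` and `<value>` on consecutive lines, which is how the templates shipped with Hive are laid out.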
    

    Download https://mirrors.tuna.tsinghua.edu.cn/mysql/downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz , extract it, and copy the mysql-connector-java-5.1.49-bin.jar inside it to /home/yay/software/hive237/lib.

    Initialize the metastore database:

    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/bin$ ./schematool -initSchema -dbType mysql
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/yay/software/hive237/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/yay/software/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:    jdbc:mysql://localhost:3306/hive
    Metastore Connection Driver :    com.mysql.jdbc.Driver
    Metastore connection User:   hive
    Sat Jul 25 22:54:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    Starting metastore schema initialization to 2.3.0
    Initialization script hive-schema-2.3.0.mysql.sql
    Sat Jul 25 22:54:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    Initialization script completed
    Sat Jul 25 22:55:02 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
    schemaTool completed
    yay@yay-ThinkPad-T470-W10DG:~/software/hive237/bin$ 
    
    

    2. Interoperating with Elasticsearch

    2.1 Importing data from CSV on HDFS into Elasticsearch

    Create an external table over the CSV file. Watch the SerDe class name: the first attempt below fails because of a typo (OpenCSVSerds instead of OpenCSVSerde):

    hive> create external table source (id STRING, caseNumber STRING, eventDate  DATE,block STRING,iucr STRING,  primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,lat DOUBLE,lon DOUBLE)  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerds' stored as TEXTFILE location '/ch07';
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.OpenCSVSerds
    hive> create external table source (id STRING, caseNumber STRING, eventDate  DATE,block STRING,iucr STRING,  primaryType STRING,description STRING,location STRING,arrest BOOLEAN,domestic BOOLEAN,lat DOUBLE,lon DOUBLE)  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' stored as TEXTFILE location '/ch07';
    OK
    Time taken: 1.676 seconds
    hive> 
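For reference, each line of the CSV has to match the column order declared above. A fabricated sample row (the values are made up for illustration), with the two trailing fields being the lat/lon that the insert later folds into a struct:

```shell
# A fabricated sample row in the declared column order (id, caseNumber, eventDate,
# block, iucr, primaryType, description, location, arrest, domestic, lat, lon).
row='10000001,HY100001,2015-01-01,033XX W 66TH ST,0486,BATTERY,SIMPLE,APARTMENT,false,true,41.77,-87.70'

# Columns 11 and 12 are the coordinates:
echo "$row" | awk -F',' '{print $11, $12}'
```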
    
    

    Next, create the external target table backed by Elasticsearch. es.resource names the target index/type; since es.nodes is not set, the connector defaults to an Elasticsearch instance on localhost:9200:

    hive> create external table crimes (id STRING, caseNumber STRING, eventDate DATE, block STRING, iucr STRING, primaryType STRING, description STRING, location STRING, arrest BOOLEAN, domestic BOOLEAN, geoLocation STRUCT<lat:DOUBLE, lon:DOUBLE>) stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='esh_hive/crimes');
    

    Insert the rows read through source into crimes:

    hive> insert overwrite table crimes select s.id , s.caseNumber , s.eventDate  ,s.block ,s.iucr ,s.primaryType ,s.description ,s.location ,s.arrest ,s.domestic,named_struct('lat',cast(a.lat as double),"lon",cast(a.lon as double)) from source s;
    FAILED: SemanticException [Error 10004]: Line 1:177 Invalid table alias or column reference 'a': (possible column names are: id, casenumber, eventdate, block, iucr, primarytype, description, location, arrest, domestic, lat, lon)
    hive> 

    The only alias in the query is s, but named_struct references a. Using s for the struct fields as well fixes the error:

    hive> insert overwrite table crimes select s.id, s.caseNumber, s.eventDate, s.block, s.iucr, s.primaryType, s.description, s.location, s.arrest, s.domestic, named_struct('lat', cast(s.lat as double), 'lon', cast(s.lon as double)) from source s;
    

    Note: the insert may fail with the error Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar)'.
    I tried quite a few things without pinning down the root cause (apparently the local HIVE_AUX_JARS_PATH is resolved against the default filesystem, HDFS, at job submission), so I simply recreated the same path on HDFS and uploaded the jar there:

    yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -mkdir -p /home/yay/software/elasticsearch-hadoop-2.1.1/dist/
    yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -put /home/yay/software/elasticsearch-hadoop-2.1.1/dist/elasticsearch-hadoop-2.1.1.jar /home/yay/software/elasticsearch-hadoop-2.1.1/dist
    yay@yay-ThinkPad-T470-W10DG:~$ 
    

    After that, the insert succeeds.



          Source: https://www.haomeiwen.com/subject/jkxmlktx.html