Hive Remote Mode

Author: 金刚_30bf | Published 2018-05-22 20:15

    Version: 2.3.3

    1. Configure the MySQL database:
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.30.16.201:3306/hivemetaremote?createDatabaseIfNotExist=true</value>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive123</value>
      </property>
    <!-- Enable schema version verification -->
     <property>
       <name>hive.metastore.schema.verification</name>
       <value>true</value>
     </property>
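Before the metastore can use this configuration, the database user referenced above has to exist on the MySQL side. A minimal one-time setup sketch, assuming root access to the MySQL instance at 10.30.16.201 (the database name, user, and password simply mirror the hive-site.xml values above):

```shell
# Hypothetical one-time setup on the MySQL server; values mirror hive-site.xml
mysql -h 10.30.16.201 -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hivemetaremote;
CREATE USER IF NOT EXISTS 'hiveuser'@'%' IDENTIFIED BY 'hive123';
GRANT ALL PRIVILEGES ON hivemetaremote.* TO 'hiveuser'@'%';
FLUSH PRIVILEGES;
SQL
```

Note that `createDatabaseIfNotExist=true` in the JDBC URL can create the database itself, but the user and its grants must exist beforehand.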
    
    2. Configure the metastore Thrift service:
    <property>
     <name>hive.metastore.uris</name>
     <value>thrift://node203.hmbank.com:9083</value>
     <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    
    

    3. Enable concurrent execution:

    <property>
      <name>hive.support.concurrency</name>
      <description>Enable Hive's Table Lock Manager Service</description>
      <value>true</value>
    </property>
    
    4. Configure HiveServer2:
      <property>
        <name>hive.server2.authentication</name>
        <value>NONE</value>
      </property>
      <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node203.hmbank.com</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
      </property>
    
      <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
        <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
      </property>
    
      <property>
        <name>hive.server2.thrift.http.port</name>
        <value>10001</value>
        <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
      </property>
    
      <property>
        <name>hive.server2.thrift.client.user</name>
        <value>hadoop</value>
        <description>Username to use against thrift client</description>
      </property>
      <property>
        <name>hive.server2.thrift.client.password</name>
        <value>hadoop</value>
        <description>Password to use against thrift client</description>
      </property>
    
    
    5. Initialize the metastore with schematool:
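The initialization step runs as below; `-dbType mysql` matches the MySQL-backed metastore configured above (a sketch, assuming Hive's bin directory is on PATH):

```shell
# Create the metastore schema in the MySQL database from hive-site.xml
schematool -dbType mysql -initSchema
# Optionally verify the schema version afterwards
schematool -dbType mysql -info
```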
    6. Start the services:
    1. Start the metastore first:
          hive --service metastore  
    2. Then start HiveServer2:
        hiveserver2 
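In practice both services are usually started in the background with logs redirected, for example (a sketch; the log paths are illustrative):

```shell
# Run both services in the background; log locations are arbitrary choices
nohup hive --service metastore    > /tmp/metastore.log   2>&1 &
nohup hive --service hiveserver2  > /tmp/hiveserver2.log 2>&1 &
```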
    
    7. Verify with beeline:
    -bash-4.1$ beeline 
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/lib/apacheori/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Beeline version 2.3.3 by Apache Hive
    
    beeline> !connect jdbc:hive2://localhost:10000
    Connecting to jdbc:hive2://localhost:10000
    Enter username for jdbc:hive2://localhost:10000: hive
    Enter password for jdbc:hive2://localhost:10000: 
    Connected to: Apache Hive (version 2.3.3)
    Driver: Hive JDBC (version 2.3.3)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://localhost:10000> show databases;
    +----------------+
    | database_name  |
    +----------------+
    | default        |
    +----------------+
    1 row selected (1.334 seconds)
    
    

    A possible error:

    beeline> !connect jdbc:hive2://localhost:10000
    Connecting to jdbc:hive2://localhost:10000
    Enter username for jdbc:hive2://localhost:10000: hadoop
    Enter password for jdbc:hive2://localhost:10000: ******
    18/05/22 14:33:21 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
    Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session:
     java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
     User: root is not allowed to impersonate hadoop (state=08S01,code=0)
    
    

    Cause of the error: the Hive service was started as the root user, so when Hive interacts with HDFS it does so as root, while the user entered at connect time was hadoop. From HDFS's point of view, root is not allowed to impersonate the hadoop user.
    Seeing this, would entering root as the user at !connect time work? In practice that fails as well:

    User: root is not allowed to impersonate root(state=08S01,code=0)
    

    Whichever user we connect as, the same error is reported.

    Reason: HDFS does not trust the root user as a proxy.
    Modify the HDFS configuration file core-site.xml:

   <!-- Configure root as a proxy user -->
       <property>
          <name>hadoop.proxyuser.root.groups</name>
          <value>*</value>
       </property>
    
       <property>
          <name>hadoop.proxyuser.root.hosts</name>
          <value>*</value>
       </property>
    
    

    The configuration above means: jobs submitted by users on any host may be executed with root acting as their proxy, so the user shown in HDFS is the user who submitted the job.
    See: Hadoop's proxy user (impersonation) mechanism.

    Also, since the authentication mode is configured as NONE, after entering a username no password is needed to connect.
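After editing core-site.xml, the proxy-user settings have to be picked up by the NameNode (and ResourceManager). Restarting the daemons works; alternatively they can usually be refreshed without a restart (a sketch, assuming HDFS/YARN admin privileges):

```shell
# Reload proxy-user (superuser group) configuration without a full restart
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin  -refreshSuperUserGroupsConfiguration
```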

    8. Create a table with beeline:
    create table test3(sid int , sname string);
    
    9. Insert data with beeline:
    0: jdbc:hive2://localhost:10000> insert into test3 values(1, 'xx');
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. 
    Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    
    

    The warning says that Hive on MapReduce is deprecated in Hive 2 and may not be available in future versions, so the statement may not succeed; it suggests using a different execution engine (Spark or Tez) or staying on a Hive 1.x release.

    (The insert did not succeed!)
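Following the warning's advice, the execution engine can be switched per session instead of relying on MapReduce, assuming Tez is installed on the cluster (a sketch):

```sql
-- In beeline: switch this session to Tez before retrying the insert
set hive.execution.engine=tez;
insert into test3 values(1, 'xx');
```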

    10. Using the Hive CLI:
    hive> insert into test2 values( 1, 'xx', 3.01);
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20180522200745_a5738c95-2cc0-47e2-b3b1-8a7ac496a701
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1525767620603_0046, Tracking URL = http://node203.hmbank.com:54315/proxy/application_1525767620603_0046/
    Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1525767620603_0046
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2018-05-22 20:07:54,306 Stage-1 map = 0%,  reduce = 0%
    2018-05-22 20:08:00,724 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
    MapReduce Total cumulative CPU time: 1 seconds 720 msec
    Ended Job = job_1525767620603_0046
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://hmcluster/user/hive/warehouse/hivecluster.db/test2/.hive-staging_hive_2018-05-22_20-07-45_383_6231255496761759560-1/-ext-10000
    Loading data to table hivecluster.test2
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 1.72 sec   HDFS Read: 4510 HDFS Write: 83 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 720 msec
    OK
    Time taken: 17.491 seconds
    
    

    The Hive CLI prints the same warning that Hive on MapReduce is deprecated in 2.x and suggests Spark, Tez, or Hive 1.x,
    but here the insert operation completes successfully.

    11. Usernames shown in the HDFS file system when using Hive:


      (Screenshot: HDFS listing of the Hive warehouse directory)
    • With beeline, the file owner is the username entered at connect time.
    • With the hive CLI, the file owner is the operating-system user running the CLI.
    • As the listing shows, files created by Hive carry rwxrwxrwx permissions, so operations can be performed as different users.
    12. Differences between remote mode and the local and embedded modes:
    • In remote mode you must first create a database and then use it before you can operate on tables.
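For example, a fresh beeline session in remote mode typically starts like this (a sketch; the database name is illustrative):

```sql
-- Create and select a database before any table DDL
create database if not exists demo_db;
use demo_db;
create table t1(id int, name string);
```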
