Hive on Spark configuration

Author: 雨中星辰0 | Published: 2018-12-01 16:08

    Author: 刘权

    Date: 2016-08-18

    Background

    1. Software versions


    • hive : hive-0.13.1
    • spark : spark-1.2.0-bin-hadoop2.3
    • hadoop : hadoop-2.2.0
    • phoenix : phoenix-4.4.0-HBase-0.98
    • hbase : hbase-0.98.0-hadoop2

    2. Prerequisites


    1. Make sure HDFS is available.
    2. Make sure Hive is available.

    3. Configure Spark

    3.1 Set SPARK_CLASSPATH

    vi $SPARK_HOME/conf/spark-env.sh

    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/yarn/spark/lib/mysql-connector-java-5.1.17.jar:/home/yarn/hadoop-2.2.0/lib/*:/home/yarn/hbase-0.98.0-hadoop2/lib/hbase-protocol-0.98.0-hadoop2.jar
    

    3.2 Copy hdfs-site.xml and hive-site.xml

    cp $HADOOP_HOME/hdfs-site.xml  $SPARK_HOME/conf/
    cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
    

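    4. Common problems

    4.1 Hive reports an error when connecting to the metastore

    Check the character set of the Hive metastore database in MySQL. Hive requires the metastore database to use latin1.
    If it uses a different character set, change it with the following commands: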

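    alter database hive character set latin1;
    -- ALTER TABLE does not accept a wildcard such as hive.*; run it once per existing table:
    ALTER TABLE hive.<table_name> DEFAULT CHARACTER SET latin1;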
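
MySQL's ALTER TABLE statement operates on one named table at a time, so the table-level change has to be issued once per metastore table. The sketch below generates those statements from a list of table names read from standard input. `gen_latin1_alters` is a hypothetical helper name, and the schema name defaults to `hive` as used in this article.

```shell
#!/bin/sh
# Sketch: emit one ALTER TABLE statement per table name read from stdin.
# gen_latin1_alters is a hypothetical helper, not part of Hive or MySQL.
#   $1  schema name (assumed to be "hive" when omitted)
gen_latin1_alters() {
    schema="${1:-hive}"
    while IFS= read -r table; do
        # Skip blank lines in the input
        [ -n "$table" ] || continue
        printf 'ALTER TABLE %s.%s DEFAULT CHARACTER SET latin1;\n' "$schema" "$table"
    done
}
```

One possible way to feed it, assuming the mysql client can log in without further options, is `mysql -N -e 'SHOW TABLES' hive | gen_latin1_alters hive`. Review the generated statements before running them against the metastore.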
    

    4.2 Note: set only one of spark.driver.extraClassPath and SPARK_CLASSPATH; the two settings must not be present at the same time.
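
A quick way to check this before launching is to look for both settings in the configuration files. The following is a minimal sketch: `check_classpath_settings` is a hypothetical helper, and it assumes the settings live in spark-env.sh and spark-defaults.conf. It does not see values passed on the command line.

```shell
#!/bin/sh
# Sketch: report whether both classpath mechanisms are configured at once.
# Arguments (file locations are assumptions; adjust to your installation):
#   $1  path to spark-env.sh
#   $2  path to spark-defaults.conf
# Prints "conflict" and returns 1 when both settings are found, else prints "ok".
check_classpath_settings() {
    env_file="$1"
    defaults_file="$2"
    has_env=no
    has_conf=no
    # An uncommented line that assigns SPARK_CLASSPATH (with or without "export")
    if [ -f "$env_file" ] &&
       grep -Eq '^[[:space:]]*(export[[:space:]]+)?SPARK_CLASSPATH=' "$env_file"; then
        has_env=yes
    fi
    # An uncommented spark.driver.extraClassPath entry
    if [ -f "$defaults_file" ] &&
       grep -Eq '^[[:space:]]*spark\.driver\.extraClassPath([[:space:]]|=)' "$defaults_file"; then
        has_conf=yes
    fi
    if [ "$has_env" = yes ] && [ "$has_conf" = yes ]; then
        echo "conflict"
        return 1
    fi
    echo "ok"
}
```

For example: `check_classpath_settings "$SPARK_HOME/conf/spark-env.sh" "$SPARK_HOME/conf/spark-defaults.conf"`.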

    4.3. IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString

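    Cause:

    This problem is caused by an optimization introduced in HBASE-9867, which inadvertently added a class loader dependency.
    It affects jobs submitted with the -libjars option as well as jobs packaged as a fat jar.
    
    Fat jar mode relies on a special Hadoop feature:
    Hadoop reads every library JAR found under the /lib directory inside the job jar,
    so the jars a job depends on at run time are placed in the jar's lib directory.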
    

    Solution:

    Add hbase-protocol-0.98.0-hadoop2.jar to SPARK_CLASSPATH.
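
To confirm that the jar really is on the classpath, the colon-separated value can be scanned entry by entry. This is only a sketch: `classpath_has_jar` is a hypothetical helper that compares the file name of each entry against a shell glob, and it does not check that the file exists on disk.

```shell
#!/bin/sh
# Sketch: succeed if any entry of a colon-separated classpath has a file name
# matching the given glob pattern. classpath_has_jar is a hypothetical helper.
#   $1  classpath string, e.g. "$SPARK_CLASSPATH"
#   $2  glob pattern for the jar file name, e.g. 'hbase-protocol-*.jar'
# The body runs in a subshell so the IFS and globbing changes do not leak out.
classpath_has_jar() (
    IFS=:
    set -f   # keep entries such as /some/lib/* literal while splitting
    for entry in $1; do
        case "${entry##*/}" in
            $2) exit 0 ;;
        esac
    done
    exit 1
)
```

For example: `classpath_has_jar "$SPARK_CLASSPATH" 'hbase-protocol-*.jar' || echo "hbase-protocol jar missing"`.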
    

    4.4. MetaException(message:java.lang.ClassNotFoundException Class org.apache.phoenix.hive.PhoenixSerde

    Attempts:

    1. Add phoenix-hive-4.2.2-jar-with-dependencies.jar to SPARK_CLASSPATH. Result: failed.
    2. Reference phoenix-hive-4.2.2-jar-with-dependencies.jar in $SPARK_HOME/conf/hive-site.xml. Result: failed.

    Final solution:

    Pass the following option when launching spark-sql or spark-shell:
    --jars /home/yarn/hive/lib/phoenix-hive-4.2.2-jar-with-dependencies.jar
    
    For example:
    
    spark-sql --master spark://big147:7077 --executor-memory 20G --total-executor-cores 2  --jars /home/yarn/hive/lib/phoenix-hive-4.2.2-jar-with-dependencies.jar
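
If more than one jar has to be shipped this way, note that --jars takes a comma-separated list, unlike the colon-separated SPARK_CLASSPATH. A small sketch for building that list follows; `join_jars` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: join the given jar paths with commas, the separator --jars expects.
# join_jars is a hypothetical helper; the body runs in a subshell so that the
# IFS change stays local. "$*" joins the arguments with the first IFS character.
join_jars() (
    IFS=,
    printf '%s\n' "$*"
)
```

For example, with `/path/to/other.jar` standing in for a second dependency: `spark-sql --master spark://big147:7077 --jars "$(join_jars /home/yarn/hive/lib/phoenix-hive-4.2.2-jar-with-dependencies.jar /path/to/other.jar)"`.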
    

    4.5 MySQL driver problem

    Error details:

    Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
            at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
            at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:101)
            at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
            at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
            at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
            at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
            at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
            at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
            at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
            ... 9 more
    Caused by: java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
            at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
            ... 14 more
    Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
    NestedThrowables:
    java.lang.reflect.InvocationTargetException
            at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
            at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
            at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
            at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
            at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
            at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
            at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
            at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
            at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
            at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
            at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
            at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
            at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
            at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
            at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
            at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
            ... 19 more
    Caused by: java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
            at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
            at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
            at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
            at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
            at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
            at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
            at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
            at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
            at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
            at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
            ... 48 more
    Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
            at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259)
            at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
            at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
            ... 66 more
    Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
            at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
            at org.datanucleus.store.rdbms.connectionpool.DBCPBuiltinConnectionPoolFactory.createConnectionPool(DBCPBuiltinConnectionPoolFactory.java:49)
            at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
            ... 68 more
    

    Solution:

    Option 1.
    Launch spark-sql or spark-shell with the --driver-class-path option set to the location of the driver jar:

     spark-shell --master spark://big147:7077 --executor-memory 20G --total-executor-cores 2 --driver-class-path /home/yarn/hive/lib/mysql-connector-java-5.1.26.jar
    

    Option 2.
    Add the driver jar to SPARK_CLASSPATH, as follows:

    vi $SPARK_HOME/conf/spark-env.sh
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/yarn/hadoop-2.2.0/lib/*:/home/yarn/hbase-0.98.0-hadoop2/lib/hbase-protocol-0.98.0-hadoop2.jar:/home/yarn/spark/lib/mysql-connector-java-5.1.17.jar
    
    
