Ambari 2.7.8: Hive on Spark configuration problems and fixes

Author: 圆企鹅i | Published 2024-03-06 15:55


    Since Ambari's default execution engine is Tez, it is recommended to simply use Tez.

    The Hive on Spark bundled with Ambari is not particularly well integrated; this post records the problems I ran into and how they were resolved.

    Versions

    Hadoop 3.1.1

    Hive 3.1.0

    Spark2 2.3.0
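
    These versions can be double-checked on a cluster node; a quick sketch (the exact output values depend on your installation):

    # Confirm the stack and component versions installed on this node
    hdp-select versions
    hadoop version
    hive --version
    spark-submit --version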

    Combined fixes

    # Fix Hive erroring out as soon as the Spark engine is used: copy the Spark jars
    # into Hive's lib directory so Hive can load the Spark engine
    cp /usr/hdp/current/spark2-client/jars/spark-core_*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/scala-library*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/spark-network-common*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/spark-unsafe*.jar /usr/hdp/current/hive-server2-hive/lib/
    
    cp /usr/hdp/current/spark2-client/jars/scala-reflect-*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/spark-launcher*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/spark-yarn*.jar /usr/hdp/current/hive-server2-hive/lib/
    
    # Custom hive-site property (without this you would have to change the Spark lib
    # directory on every machine, which is even more trouble)
    spark.yarn.jars=hdfs://dwh-test01:8020/spark-jars/*
    
    # Upload the Spark jars to HDFS
    sudo -u hdfs hdfs dfs -mkdir /spark-jars
    sudo -u hdfs hdfs dfs -chmod 777 /spark-jars
    hadoop fs -put /usr/hdp/current/spark2-client/jars/*.jar /spark-jars/
    
    # Run hdp-select versions and put the result into Custom mapred-site. Some posts
    # online put it in the yarn/hive/spark configs instead, but none of those took
    # effect for me.
    hdp.version=3.1.5.0-152
    
    # Remove the wrong Hive jars bundled with Spark
    hadoop fs -rm /spark-jars/hive*.jar
    hadoop fs -rm /spark-jars/spark-hive*.jar
    # Note: this jar breaks the INSERT syntax of Hive on YARN
    hadoop fs -rm /spark-jars/hive-exec-1.21.2.3.1.5.0-152.jar
    # Note: this jar breaks the GROUP BY syntax of Hive on YARN
    hadoop fs -rm /spark-jars/orc-core-1.4.4-nohive.jar
    
    # Uncheck in hive-site (otherwise it breaks the JOIN ON syntax of Hive on YARN)
    hive.mapjoin.optimized.hashtable=false;
    
    # TODO: make Spark the default engine (does not take effect; for now it has to be
    # set manually in SQL)
    hive.execution.engine=spark
    # Advanced hive-interactive-site: remove the hive.execution.engine restriction
    Restricted session configs=hive.execution.mode
    
    # Test: run these manually in a SQL session
    set hive.execution.engine=spark;
    set hive.mapjoin.optimized.hashtable=false;
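
    One way to run the manual test end to end is through beeline; this is only a sketch, and the JDBC URL, user and table name below are placeholders for your environment:

    # Hypothetical smoke test via beeline (URL, user and table are placeholders)
    beeline -u "jdbc:hive2://dwh-test01:10000" -n hive -e "
      set hive.execution.engine=spark;
      set hive.mapjoin.optimized.hashtable=false;
      select count(*) from some_test_table;"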
    
    

    Error while processing statement: FAILED: Execution Error, return code 3

    [42000][3] Error while processing statement: FAILED: 
    Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
    Spark job failed during runtime. Please check stacktrace for the root cause.
    

    Solution

    1. If the job already made it onto YARN, find the Spark history / YARN application logs and dig further from there (as sketched below).
    2. If Hive cannot even get the job onto YARN, check the default log path of the Ambari-managed service, /var/log/hive/.., and investigate from there.
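
    For step 1, the aggregated YARN application logs usually contain the real failure from inside the Spark driver; a minimal sketch (the application ID is a placeholder for your own failed job):

    # List recently failed/killed applications, then pull the logs of the failed one
    yarn application -list -appStates FAILED,KILLED
    yarn logs -applicationId application_XXXXXXXXXXXXX_NNNN | less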

    Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'

    2024-03-05T18:02:35,272 ERROR [HiveServer2-Background-Pool: Thread-177]: operation.Operation (:()) - Error running hive query: 
    2024-03-05T18:07:09,694 ERROR [HiveServer2-Background-Pool: Thread-104]: client.SparkClientImpl (:()) - Error while waiting for client to connect.
    2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
    2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b1d48791-b28a-446c-9900-2dc48e2c751a)'
    2024-03-05T18:07:09,715 ERROR [HiveServer2-Background-Pool: Thread-104]: ql.Driver (:()) - FAILED: command has been interrupted: during query execution: 
    2024-03-05T18:09:07,775 ERROR [HiveServer2-Background-Pool: Thread-136]: client.SparkClientImpl (:()) - Timed out waiting for client to connect.
    2024-03-05T18:09:07,779 ERROR [HiveServer2-Background-Pool: Thread-136]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f)'
    2024-03-05T18:09:07,779 ERROR [HiveServer2-Background-Pool: Thread-136]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f)'
    2024-03-05T18:09:07,780 ERROR [HiveServer2-Background-Pool: Thread-136]: ql.Driver (:()) - FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 7e9ae27f-5a9a-4200-b8de-b6fff293612f
    2024-03-05T18:09:07,792 ERROR [HiveServer2-Background-Pool: Thread-136]: operation.Operation (:()) - Error running hive query: 
    2024-03-05T18:21:49,052 ERROR [HiveServer2-Background-Pool: Thread-109]: client.SparkClientImpl (:()) - Timed out waiting for client to connect.
    2024-03-05T18:21:49,072 ERROR [HiveServer2-Background-Pool: Thread-109]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b62a3365-ca4a-4ac8-aece-0a8db5a90cdf)'
    2024-03-05T18:21:49,072 ERROR [HiveServer2-Background-Pool: Thread-109]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session b62a3365-ca4a-4ac8-aece-0a8db5a90cdf)'
    

    Solution

    # Fix Hive erroring out as soon as the Spark engine is used: copy the Spark jars
    # into Hive's lib directory so Hive can load the Spark engine
    cp /usr/hdp/current/spark2-client/jars/spark-core_*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/scala-library*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/spark-network-common*.jar /usr/hdp/current/hive-server2-hive/lib/
    cp /usr/hdp/current/spark2-client/jars/spark-unsafe*.jar /usr/hdp/current/hive-server2-hive/lib/
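
    After copying, a quick check that HiveServer2's lib directory now contains the Spark and Scala jars (paths as in the commands above), followed by a HiveServer2 restart from Ambari:

    # The copied Spark/Scala jars should now show up in Hive's lib directory
    ls /usr/hdp/current/hive-server2-hive/lib/ | grep -E 'spark-|scala-'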
    

    java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS

    24/03/06 17:54:33 ERROR ApplicationMaster: Uncaught exception: 
    org.apache.spark.SparkException: Exception thrown in awaitResult: 
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
    Caused by: java.util.concurrent.ExecutionException: Boxed Error
        at scala.concurrent.impl.Promise$.resolver(Promise.scala:59)
        at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:51)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
        at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
        at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:157)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:739)
    Caused by: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS
        at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:48)
        at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:138)
        at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:536)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
    24/03/06 17:54:33 INFO ApplicationMaster: Deleting staging directory hdfs://HA-Namespace/user/hive/.sparkStaging/application_1709716929078_0004
    24/03/06 17:54:33 INFO ShutdownHookManager: Shutdown hook called
    

    Solution

    ## In the Ambari web UI, add this Custom hive-site property so Hive on Spark uses the jars on HDFS
    spark.yarn.jars=hdfs://dwh-test01:8020/spark-jars/*
    
    # Upload the Spark2 jars to HDFS
    hadoop fs -put /usr/hdp/current/spark2-client/jars/*.jar /spark-jars/
    
    # Remove the wrong Hive jars bundled with Spark2
    hadoop fs -rm /spark-jars/hive*.jar
    hadoop fs -rm /spark-jars/spark-hive*.jar
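
    A quick sanity check that no Hive-related jars remain in the HDFS jar directory (a sketch using the path from above; it should print nothing once the conflicting jars are gone):

    hadoop fs -ls /spark-jars/ | grep -i hive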
    

    java.lang.NoSuchMethodError: org.apache.orc.OrcFile

    24/03/07 12:00:33 ERROR RemoteDriver: Failed to run job 3ecd93be-704b-4f42-aa50-6fa7aec5d9cd
    java.lang.NoSuchMethodError: org.apache.orc.OrcFile$ReaderOptions.useUTCTimestamp(Z)Lorg/apache/orc/OrcFile$ReaderOptions;
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$ReaderOptions.useUTCTimestamp(OrcFile.java:94)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile$ReaderOptions.<init>(OrcFile.java:70)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.readerOptions(OrcFile.java:100)
        at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormatFile(AcidUtils.java:2344)
        at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormat(AcidUtils.java:2339)
        at org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:1037)
        at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:1028)
        at org.apache.hadoop.hive.ql.io.AcidUtils.getChildState(AcidUtils.java:1347)
        at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1163)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.processForWriteIds(HiveInputFormat.java:641)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.processPathsForMmRead(HiveInputFormat.java:605)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:495)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:789)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:552)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267)
        at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65)
        at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:215)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
        at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:114)
        at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:359)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
    

    Solution

    The fix is the same as in the previous step.

    Root cause: the wrong Hive-related jars bundled with Spark2 create a jar conflict.
    Deleting orc-core-1.4.4-nohive.jar from HDFS is enough (provided spark.yarn.jars has already been configured in hive-site).
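
    A minimal sketch of the removal and the follow-up check (the jar name is the one shipped with this cluster's Spark2; yours may carry a different version suffix):

    # Remove the conflicting ORC jar from the shared HDFS jar directory
    hadoop fs -rm /spark-jars/orc-core-1.4.4-nohive.jar
    # Should print nothing afterwards
    hadoop fs -ls /spark-jars/ | grep nohive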
    

    org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException

    java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer

    24/03/07 10:31:32 INFO DAGScheduler: ResultStage 9 (Map 1) failed in 0.178 s due to Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 27, dwh-test03, executor 2): java.lang.IllegalStateException: Hit error while closing operators - failing tree: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:203)
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
        at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:96)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
        at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
        at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2190)
        at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2190)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerStringOperator.process(VectorMapJoinInnerStringOperator.java:384)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:136)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:180)
        ... 15 more
    Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.commonSetup(VectorMapJoinInnerGenerateResultOperator.java:119)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerStringOperator.process(VectorMapJoinInnerStringOperator.java:109)
        ... 27 more
    
    24/03/07 10:31:32 WARN TaskSetManager: Lost task 1.0 in stage 9.0 (TID 22, dwh-test02, executor 1): java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:124)
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
    Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer cannot be cast to org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinTableContainer
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.setUpHashTable(VectorMapJoinCommonOperator.java:493)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.completeInitializationOp(VectorMapJoinCommonOperator.java:462)
        at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:399)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:115)
    

    Solution

    The failing code is in the Hive source at: src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java

    private void setUpHashTable() {
    
      HashTableImplementationType hashTableImplementationType = vectorDesc.getHashTableImplementationType();
      switch (vectorDesc.getHashTableImplementationType()) {
      case OPTIMIZED:
        {
          // Create our vector map join optimized hash table variation *above* the
          // map join table container.
          vectorMapJoinHashTable = VectorMapJoinOptimizedCreateHashTable.createHashTable(conf,
                  mapJoinTables[posSingleVectorMapJoinSmallTable]);
        }
        break;
    
      case FAST:
        {
          // Get our vector map join fast hash table variation from the
          // vector map join table container.
          VectorMapJoinTableContainer vectorMapJoinTableContainer =
                  (VectorMapJoinTableContainer) mapJoinTables[posSingleVectorMapJoinSmallTable];
          vectorMapJoinHashTable = vectorMapJoinTableContainer.vectorMapJoinHashTable();
        }
        break;
      default:
        throw new RuntimeException("Unknown vector map join hash table implementation type " + hashTableImplementationType.name());
      }
      LOG.info("Using " + vectorMapJoinHashTable.getClass().getSimpleName() + " from " + this.getClass().getSimpleName());
    }

    The code in the FAST case is where the failure happens; it turns out the FAST path can be avoided by changing a configuration setting:

    set hive.mapjoin.optimized.hashtable=false;
    

    Checking the parameters in the Ambari web UI, this property turns out to be exposed there. Its description reads:

    hive.mapjoin.optimized.hashtable
    Whether Hive should use memory-optimized hash table for MapJoin. Only works on Tez,
    because memory-optimized hashtable cannot be serialized.

    So it is meant specifically for the Tez engine; simply uncheck it and restart Hive.
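
    After unchecking the option and restarting Hive, the effective value can be confirmed from a client session; a sketch (the JDBC URL and user are placeholders):

    # "set <property>;" with no value prints the current setting
    beeline -u "jdbc:hive2://dwh-test01:10000" -n hive -e "set hive.mapjoin.optimized.hashtable;"
    # expected output includes: hive.mapjoin.optimized.hashtable=false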

    References

    Hive on Spark official documentation

    It offers some guidance, but does not fully solve these problems.
