CDH 5.13: HiveMetaStore problem caused by Sentry synchronization

Author: gregocean | Published 2019-06-11 17:44

Solution:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html

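I. Symptoms

The Hive Metastore server operates on the metadata database from multiple threads. This deadlocks the database tables and causes the service to stall.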

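II. Troubleshooting

1. Daily monitoring of the Hive Metastore service shows error messages during certain periods (Caused by: org.apache.thrift.transport.TTransportException). They are produced while Sentry synchronizes Hive Metastore information, but the condition recovers on its own quickly.

2. When the Hive Metastore service exchanges data with its database under high concurrency, the MySQL tables run into lock contention (Lock wait timeout exceeded; try restarting transaction), and the table locks are not released in time.

3. Because of the Hive Metastore deadlock, queries become slow or stall under highly concurrent write workloads when Sentry and Hive are used together.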

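Metastore log analysis and Airflow scheduled-task concurrency analysis

a) Extract the logs from the failing node and collect the key error messages

b) Check task concurrency and the number of TCP connections opened while concurrent tasks run

c) Read the Hive Metastore source code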
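Step b) comes down to finding how many tasks overlap at the busiest moment. A minimal sketch of that calculation follows, assuming the start and end times of each task run have already been exported from the scheduler. The task windows are made up for illustration and are not taken from the incident.

```python
from datetime import datetime


def peak_concurrency(intervals):
    """Return the largest number of intervals that overlap at any instant.

    `intervals` is an iterable of (start, end) pairs. A task that ends at the
    same instant another one starts is not counted as overlapping with it.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a task begins
        events.append((end, -1))    # a task ends
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so an ending task is released before a new one is counted.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical task windows, for illustration only.
FMT = "%Y-%m-%d %H:%M:%S"
tasks = [
    ("2019-05-05 01:17:00", "2019-05-05 01:19:00"),
    ("2019-05-05 01:17:30", "2019-05-05 01:18:30"),
    ("2019-05-05 01:18:00", "2019-05-05 01:20:00"),
    ("2019-05-05 01:19:00", "2019-05-05 01:21:00"),
]
windows = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for s, e in tasks]
print(peak_concurrency(windows))
```

Comparing the peak against the timestamps of the metastore errors shows whether the lock timeouts line up with bursts of concurrent tasks.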

Key entries from the failing node: the error is raised by the update to notification_sequence in the HMS database.

2019-05-05 01:18:11,363 ERROR DataNucleus.Datastore.Persist: [pool-5-thread-178]: Update of object "org.apache.hadoop.hive.metastore.model.MNotificationNextId@58252dbf" using statement "UPDATE `NOTIFICATION_SEQUENCE` SET `NEXT_EVENT_ID`=? WHERE `NNI_ID`=?" failed : com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
    at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1998)
    at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
    at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
    at org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
    at org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:374)
    at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateTable(RDBMSPersistenceHandler.java:417)
    at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObject(RDBMSPersistenceHandler.java:390)
    at org.datanucleus.state.JDOStateManager.flush(JDOStateManager.java:5027)
    at org.datanucleus.flush.FlushOrdered.execute(FlushOrdered.java:106)
    at org.datanucleus.ExecutionContextImpl.flushInternal(ExecutionContextImpl.java:4119)
    at org.datanucleus.ExecutionContextThreadedImpl.flushInternal(ExecutionContextThreadedImpl.java:450)
    at org.datanucleus.store.query.Query.prepareDatastore(Query.java:1575)
    at org.datanucleus.store.query.Query.executeQuery(Query.java:1760)
    at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
    at org.datanucleus.store.query.Query.execute(Query.java:1654)
    at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
    at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:7754)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
    at com.sun.proxy.$Proxy7.addNotificationEvent(Unknown Source)
    at org.apache.hive.hcatalog.listener.DbNotificationListener.enqueue(DbNotificationListener.java:335)
    at org.apache.hive.hcatalog.listener.DbNotificationListener.onDropPartition(DbNotificationListener.java:183)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$9.notify(MetaStoreListenerNotifier.java:93)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:167)
    at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:197)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3134)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
 
2019-05-05 01:18:11,368 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-178]: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3175)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
 
2019-05-05 01:18:11,368 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-178]: </PERFLOG method=drop_partitions_req start=1556990240187 end=1556990291368 duration=51181 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=168 retryCount=-1 error=true>
2019-05-05 01:18:11,368 ERROR org.apache.thrift.server.TThreadPoolServer: [pool-5-thread-178]: Error occurred during processing of message.
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:3175)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy10.drop_partitions_req(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10650)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10634)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:752)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:747)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
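Pulling the key entries out of a large metastore log by hand is tedious. The sketch below shows one way to do it, assuming the log keeps the line layout seen above (timestamp, level, logger, thread in brackets, message). The regular expressions and the `scan` helper are illustrative and are not part of Hive.

```python
import re
from collections import Counter

# Start of a log entry: timestamp, level, logger, [thread], message.
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+): \[(?P<thread>[^\]]+)\]: (?P<msg>.*)$"
)
STATEMENT = re.compile(r'using statement "([^"]+)"')
PERFLOG_END = re.compile(r"</PERFLOG method=(\S+) .*?duration=(\d+) .*?error=(\w+)>")


def scan(lines):
    """Count lock-wait timeouts per SQL statement and list failed HMS calls."""
    lock_timeouts = Counter()
    failed_calls = []
    for line in lines:
        entry = ENTRY.match(line)
        if entry is None:
            continue  # stack frames and blank lines
        msg = entry.group("msg")
        if "Lock wait timeout exceeded" in msg:
            stmt = STATEMENT.search(msg)
            lock_timeouts[stmt.group(1) if stmt else "<unknown>"] += 1
        perf = PERFLOG_END.search(msg)
        if perf and perf.group(3) == "true":
            failed_calls.append(
                (entry.group("ts"), entry.group("thread"), perf.group(1), int(perf.group(2)))
            )
    return lock_timeouts, failed_calls


# Three lines copied from the log above.
SAMPLE = """
2019-05-05 01:18:11,363 ERROR DataNucleus.Datastore.Persist: [pool-5-thread-178]: Update of object "org.apache.hadoop.hive.metastore.model.MNotificationNextId@58252dbf" using statement "UPDATE `NOTIFICATION_SEQUENCE` SET `NEXT_EVENT_ID`=? WHERE `NNI_ID`=?" failed : com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
2019-05-05 01:18:11,368 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-178]: </PERFLOG method=drop_partitions_req start=1556990240187 end=1556990291368 duration=51181 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=168 retryCount=-1 error=true>
""".strip().splitlines()

timeouts, failed = scan(SAMPLE)
for statement, count in timeouts.items():
    print(count, statement)
for ts, thread, method, duration_ms in failed:
    print(ts, thread, method, duration_ms)
```

For the sample, the failed drop_partitions_req call reports a duration of 51181 ms. That is just over 50 seconds, the MySQL default for innodb_lock_wait_timeout, so the call likely spent almost all of its time waiting on the NOTIFICATION_SEQUENCE row lock before the timeout fired.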

Alibaba Cloud RDS: TCP connection and slow-log analysis

a) [Figure: image.png]

b) A review of the RDS slow log found no entries that correspond to the table locks

III. Solutions

a) Increase the lock wait timeout (in seconds) that MySQL uses for the InnoDB engine:

SET GLOBAL innodb_lock_wait_timeout = 180;

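b) The affected versions are CDH 5.13.0, 5.13.1, 5.13.2, 5.14.0, and 5.14.1, on platforms that run Hive together with Sentry. The following mitigation can be applied:

In Cloudera Manager, go to Clusters > Hive service > Configuration and search for "Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml". Click the plus sign (+) twice, then add the following values:
Set hive.metastore.transactional.event.listeners to an empty value (null)
Set hive.metastore.event.listeners to org.apache.hive.hcatalog.listener.DbNotificationListener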

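Applying this workaround can cause some DDL queries to fail unexpectedly with a "No valid privileges" error, for example a query that drops a table and then creates a new table with the same name. Re-run such queries to clear the error.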

c) Both options above are temporary mitigations. The complete fix is to upgrade CDH to a release that includes the patch.
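Until the upgrade is done, jobs that issue DDL under the workaround in b) may need the "re-run the query" advice built in. The sketch below shows one way to do that. It assumes the caller supplies a `run_query` function (hypothetical, standing in for whatever Hive client is in use) and that the failure can be recognized by the text "No valid privileges" in the exception message; both are assumptions for illustration.

```python
import time


def rerun_on_privilege_error(run_query, sql, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Run `sql`, re-running it when it fails with a 'No valid privileges' error.

    `run_query` is a caller-supplied callable (hypothetical here). Any other
    error is raised immediately without a retry.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except Exception as exc:
            if "No valid privileges" not in str(exc):
                raise  # unrelated failure: do not mask it
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # short pause before the re-run
    raise last_error


# Demo with a stand-in executor that fails once and then succeeds.
calls = []


def fake_run_query(sql):
    calls.append(sql)
    if len(calls) == 1:
        raise RuntimeError("Error: No valid privileges")
    return "OK"


result = rerun_on_privilege_error(
    fake_run_query, "CREATE TABLE t (id INT)", sleep=lambda seconds: None
)
print(result, len(calls))
```

Retrying only on the privilege message keeps genuine failures, such as syntax errors, visible on the first attempt.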

Reference: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html#tsb_2018_294
