HBase region server hang issue:
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server : Replay of WAL required. Forcing server shutdown?
Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after ms for ringBufferSequence=, WAL system stuck?
Troubleshooting approach:
I: MemStore flush trigger conditions
HBase triggers a flush in the situations listed below. Note that the smallest flush unit is the HRegion, not an individual MemStore, so if an HRegion contains many MemStores, every flush is necessarily expensive; this is why it is recommended to keep the number of ColumnFamilies small when designing tables. A configuration sketch covering the parameters mentioned below follows this list.
MemStore-level limit: when any single MemStore in a Region reaches its upper limit (hbase.hregion.memstore.flush.size, default 128 MB), a MemStore flush is triggered.
Region-level limit: when the combined size of all MemStores in a Region reaches the upper limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, default 2 * 128 MB = 256 MB), a MemStore flush is triggered.
Region Server-level limit: when the combined size of all MemStores on a Region Server reaches the upper limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, default 40% of the JVM heap), some MemStores are flushed. Flushing proceeds from the largest MemStore to the smallest: the Region with the largest MemStore is flushed first, then the next largest, until total MemStore usage drops below the lower threshold (hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize, default 38% of the JVM heap).
HLog limit: when the number of HLog (WAL) files on a Region Server reaches the upper limit (configured via hbase.regionserver.maxlogs), the system selects the Region or Regions referenced by the oldest HLog and flushes them.
Periodic flush: by default HBase flushes each MemStore once an hour, ensuring that no MemStore goes too long without being persisted. To avoid every MemStore flushing at the same moment, the periodic flush is scheduled with a random delay of roughly 20,000 ms.
Manual flush: a user can run the shell command flush 'tablename' or flush 'region name' to flush a single table or a single Region.
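As a concrete reference for the limits above, here is a minimal hbase-site.xml sketch listing the parameters mentioned in this section. The values shown are the defaults quoted above (the hbase.regionserver.maxlogs value is only an illustrative placeholder, since no value is given above); tune them for your heap size and write load, and note that some of these property names change between HBase versions.

<!-- hbase-site.xml: flush-related limits discussed above (defaults shown; adjust per cluster) -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>  <!-- 128 MB per-MemStore flush threshold -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value>  <!-- Region-level limit = multiplier * flush.size -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>  <!-- RegionServer-level upper bound, as a fraction of the JVM heap -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.38</value>  <!-- forced flushing stops once total usage falls below this -->
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value>  <!-- WAL-count trigger; illustrative value, not specified above -->
</property>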
II: HBase source code (the flushRegion method in MemStoreFlusher):
/**
 * Flush a region.
 * @param region Region to flush.
 * @param emergencyFlush Set if we are being force flushed. If true the region
 *          needs to be removed from the flush queue. If false, when we were called
 *          from the main flusher run loop and we got the entry to flush by calling
 *          poll on the flush queue (which removed it).
 * @param forceFlushAllStores whether we want to flush all store.
 * @return true if the region was successfully flushed, false otherwise. If
 *          false, there will be accompanying log messages explaining why the region
 *          was not flushed.
 */
private boolean flushRegion(HRegion region, boolean emergencyFlush, boolean forceFlushAllStores,
    FlushLifeCycleTracker tracker) {
  synchronized (this.regionsInQueue) {
    FlushRegionEntry fqe = this.regionsInQueue.remove(region);
    // Use the start time of the FlushRegionEntry if available
    if (fqe != null && emergencyFlush) {
      // Need to remove from region from delay queue. When NOT an
      // emergencyFlush, then item was removed via a flushQueue.poll.
      flushQueue.remove(fqe);
    }
  }
  tracker.beforeExecution();
  lock.readLock().lock();
  try {
    notifyFlushRequest(region, emergencyFlush);
    FlushResult flushResult = region.flushcache(forceFlushAllStores, false, tracker);
    boolean shouldCompact = flushResult.isCompactionNeeded();
    // We just want to check the size
    boolean shouldSplit = region.checkSplit() != null;
    if (shouldSplit) {
      this.server.compactSplitThread.requestSplit(region);
    } else if (shouldCompact) {
      server.compactSplitThread.requestSystemCompaction(region, Thread.currentThread().getName());
    }
  } catch (DroppedSnapshotException ex) {
    // Cache flush can fail in a few places. If it fails in a critical
    // section, we get a DroppedSnapshotException and a replay of wal
    // is required. Currently the only way to do this is a restart of
    // the server. Abort because hdfs is probably bad (HBASE-644 is a case
    // where hdfs was bad but passed the hdfs check).
    server.abort("Replay of WAL required. Forcing server shutdown", ex);
    return false;
  } catch (IOException ex) {
    ex = ex instanceof RemoteException ? ((RemoteException) ex).unwrapRemoteException() : ex;
    LOG.error(
      "Cache flush failed"
          + (region != null
              ? (" for region " + Bytes.toStringBinary(region.getRegionInfo().getRegionName()))
              : ""),
      ex);
    if (!server.checkFileSystem()) {
      return false;
    }
  } finally {
    lock.readLock().unlock();
    wakeUpIfBlocking();
    tracker.afterExecution();
  }
  return true;
}
III: The failing code path. A WAL sync that times out while the flush is in its critical section is wrapped in a DroppedSnapshotException, so the TimeoutIOException from the log above ends up in the catch block below, which aborts the region server with exactly the message seen at the top of this article:
catch (DroppedSnapshotException ex) {
  // Cache flush can fail in a few places. If it fails in a critical
  // section, we get a DroppedSnapshotException and a replay of wal
  // is required. Currently the only way to do this is a restart of
  // the server. Abort because hdfs is probably bad (HBASE-644 is a case
  // where hdfs was bad but passed the hdfs check).
  server.abort("Replay of WAL required. Forcing server shutdown", ex);
  return false;
}
IV: Remedies:
1. Tune the MemStore size limits and the HLog count (hbase.regionserver.maxlogs) described in section I so that flushes keep up with the write load.
2. Check HDFS health and repair any problems; the code comment above notes that this abort usually indicates the underlying HDFS is bad.
3. Restart the aborted region server (a command sketch for steps 2 and 3 follows this list).
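A minimal command sketch for steps 2 and 3, assuming a standard HBase-on-HDFS deployment with shell access on the affected node; paths, the HBase root directory, and service-management commands vary between distributions, so treat this as illustrative rather than prescriptive.

# Step 2: check HDFS health; look for dead DataNodes and missing/corrupt blocks
hdfs dfsadmin -report
hdfs fsck /hbase -files -blocks   # assumes the default HBase root directory /hbase

# Step 3: restart the aborted region server (run on that node)
hbase-daemon.sh stop regionserver
hbase-daemon.sh start regionserver

# Verify that regions are back online and consistent
hbase hbck   # HBase 1.x; HBase 2.x uses the separate HBCK2 tool for repairs

If the WAL sync timeouts return after the restart, the write path to HDFS (slow or failing DataNodes, disk or network problems on the WAL pipeline) is the more likely root cause than MemStore sizing alone.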