hbase DroppedSnapshotException

Author: sunTengSt | Published 2018-10-26 17:41

    The problem: an HBase region server hangs and then aborts with:

     FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server : Replay of WAL required. Forcing server shutdown
     Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after  ms for ringBufferSequence=, WAL system stuck?
    

    Troubleshooting approach:

    1. Memstore flush trigger conditions

    HBase triggers a flush in the situations listed below. Note that the minimum flush unit is an HRegion, not an individual MemStore, so a region with many memstores pays a large cost on every flush. This is why it is recommended to keep the number of ColumnFamilies small when designing tables.

    Memstore-level limit: when any single MemStore in a region reaches its size limit (hbase.hregion.memstore.flush.size, default 128 MB), a memstore flush is triggered.

    Region-level limit: when the total size of all memstores in a region reaches the limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, by default 2 * 128 MB = 256 MB), a memstore flush is triggered.

    Region-server-level limit: when the total size of all memstores on a region server reaches the limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, by default 40% of the JVM heap), some memstores are flushed. Flushing proceeds from the largest memstore down: the region with the largest memstore is flushed first, then the next largest, and so on, until total memstore usage drops below the threshold (hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize, by default 38% of the JVM heap).
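    The "largest memstore first" ordering in the region-server-level trigger can be sketched as follows. This is a toy illustration, not HBase code: the class `FlushOrderSketch`, the method `flushOrder`, and the byte sizes are all invented for the example.

```java
import java.util.*;

// Toy sketch (not HBase code): illustrates the region-server-level flush
// ordering described above -- flush the region with the largest memstore
// first, repeating until total usage drops below the lower watermark.
public class FlushOrderSketch {

    // Returns region names in the order they would be flushed.
    static List<String> flushOrder(Map<String, Long> memstoreSizes,
                                   long lowerLimitBytes) {
        List<Map.Entry<String, Long>> regions =
            new ArrayList<>(memstoreSizes.entrySet());
        // Largest memstore first, mirroring the flush order described above.
        regions.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        long total = 0;
        for (Map.Entry<String, Long> r : regions) {
            total += r.getValue();
        }

        List<String> order = new ArrayList<>();
        for (Map.Entry<String, Long> r : regions) {
            if (total <= lowerLimitBytes) {
                break; // below the lower watermark: stop flushing
            }
            order.add(r.getKey());
            total -= r.getValue(); // flushing frees this region's memstore
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("regionA", 300L);
        sizes.put("regionB", 100L);
        sizes.put("regionC", 200L);
        // Total 600; with lower limit 250 we flush regionA (600 -> 300),
        // then regionC (300 -> 100), and stop.
        System.out.println(flushOrder(sizes, 250L)); // [regionA, regionC]
    }
}
```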

    HLog limit: when the number of HLogs (WAL files) on a region server reaches the limit (configurable via hbase.regionserver.maxlogs), the system picks the oldest HLog and flushes the region or regions whose edits it contains.

    Periodic flush: by default HBase flushes each memstore once an hour, ensuring that no memstore goes unpersisted for too long. To avoid every memstore flushing at the same moment, the periodic flush is jittered by a random delay of roughly 20,000 ms.

    Manual flush: a user can flush a single table or region from the HBase shell with flush 'tablename' or flush 'region name'.
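    A hedged sketch of how the thresholds above map to hbase-site.xml: the values shown are the defaults cited in this article, and the property names are from older HBase releases (some were renamed later, e.g. hbase.regionserver.global.memstore.size), so verify them against your version.

```xml
<!-- hbase-site.xml: flush-related thresholds discussed above.
     Values are the defaults cited in this article; the maxlogs value
     is an example, not a quoted default. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB: per-memstore flush trigger -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value> <!-- region-level limit = multiplier * flush.size -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- RS-wide flush trigger, fraction of heap -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.38</value> <!-- flush until usage drops below this -->
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value> <!-- WAL-count flush trigger (example value) -->
</property>
```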

    2. The relevant HBase source:

    /**
     * Flush a region.
     * @param region Region to flush.
     * @param emergencyFlush Set if we are being force flushed. If true the region
     * needs to be removed from the flush queue. If false, when we were called
     * from the main flusher run loop and we got the entry to flush by calling
     * poll on the flush queue (which removed it).
     * @param forceFlushAllStores whether we want to flush all stores.
     * @return true if the region was successfully flushed, false otherwise. If
     * false, there will be accompanying log messages explaining why the region was
     * not flushed.
     */
    private boolean flushRegion(HRegion region, boolean emergencyFlush, boolean forceFlushAllStores,
        FlushLifeCycleTracker tracker) {
      synchronized (this.regionsInQueue) {
        FlushRegionEntry fqe = this.regionsInQueue.remove(region);
        // Use the start time of the FlushRegionEntry if available
        if (fqe != null && emergencyFlush) {
          // Need to remove from region from delay queue. When NOT an
          // emergencyFlush, then item was removed via a flushQueue.poll.
          flushQueue.remove(fqe);
        }
      }
      tracker.beforeExecution();
      lock.readLock().lock();
      try {
        notifyFlushRequest(region, emergencyFlush);
        FlushResult flushResult = region.flushcache(forceFlushAllStores, false, tracker);
        boolean shouldCompact = flushResult.isCompactionNeeded();
        // We just want to check the size
        boolean shouldSplit = region.checkSplit() != null;
        if (shouldSplit) {
          this.server.compactSplitThread.requestSplit(region);
        } else if (shouldCompact) {
          server.compactSplitThread.requestSystemCompaction(region, Thread.currentThread().getName());
        }
      } catch (DroppedSnapshotException ex) {
        // Cache flush can fail in a few places. If it fails in a critical
        // section, we get a DroppedSnapshotException and a replay of wal
        // is required. Currently the only way to do this is a restart of
        // the server. Abort because hdfs is probably bad (HBASE-644 is a case
        // where hdfs was bad but passed the hdfs check).
        server.abort("Replay of WAL required. Forcing server shutdown", ex);
        return false;
      } catch (IOException ex) {
        ex = ex instanceof RemoteException ? ((RemoteException) ex).unwrapRemoteException() : ex;
        LOG.error(
          "Cache flush failed"
              + (region != null ? (" for region " +
                  Bytes.toStringBinary(region.getRegionInfo().getRegionName()))
                : ""), ex);
        if (!server.checkFileSystem()) {
          return false;
        }
      } finally {
        lock.readLock().unlock();
        wakeUpIfBlocking();
        tracker.afterExecution();
      }
      return true;
    }

    3. The failing branch, where the abort happens:

    catch (DroppedSnapshotException ex) {
      // Cache flush can fail in a few places. If it fails in a critical
      // section, we get a DroppedSnapshotException and a replay of wal
      // is required. Currently the only way to do this is a restart of
      // the server. Abort because hdfs is probably bad (HBASE-644 is a case
      // where hdfs was bad but passed the hdfs check).
      server.abort("Replay of WAL required. Forcing server shutdown", ex);
      return false;
    }
    

    4. Solutions:

    1. Tune the memstore size settings and the HLog count limit (hbase.regionserver.maxlogs).

    2. Check HDFS health and repair any problems.

    3. Restart the region server.
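    For steps 2 and 3, a minimal operational sketch, assuming a standard Hadoop/HBase installation with the `hdfs` and `hbase-daemon.sh` scripts on the PATH; run these on the affected host:

```shell
# Step 2: check HDFS health; look for corrupt or missing blocks.
hdfs fsck /

# List any corrupt files explicitly so they can be dealt with.
hdfs fsck / -list-corruptfileblocks

# Step 3: restart the affected region server.
hbase-daemon.sh stop regionserver
hbase-daemon.sh start regionserver
```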
