HBase snapshot source code analysis

Author: sunTengSt | Published: 2017-11-24 15:39

    On-disk layout used by snapshot operations:

    /hbase/.snapshots
           /.tmp                <---- working directory
           /[snapshot name]     <----- completed snapshot
    

    Layout of a snapshot once it has completed:

         /hbase/.snapshots/[snapshot name]
                    .snapshotinfo          <--- Description of the snapshot
                    .tableinfo             <--- Copy of the tableinfo
                    /.logs
                        /[server_name]
                            /... [log files]
                        ...
                    /[region name]         <--- All the region's information
                        .regioninfo        <--- Copy of the HRegionInfo
                        /[column family name]
                            /[hfile name]  <--- name of the hfile in the real region
                            ...
                        ...
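
    The layout above can be expressed as a few path helpers. This is a minimal sketch using only java.nio.file; the class SnapshotLayout and its methods are hypothetical and are not HBase code. The directory names come from the listing, and placing each in-progress snapshot in its own subdirectory under .tmp is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that mirrors the directory listing above; not HBase code. */
final class SnapshotLayout {
    static final String SNAPSHOT_DIR = ".snapshots"; // name taken from the listing
    static final String TMP_DIR = ".tmp";            // working directory

    private SnapshotLayout() {}

    /** Assumed location of an in-progress snapshot: [root]/.snapshots/.tmp/[name]. */
    static Path workingDir(Path root, String snapshotName) {
        return root.resolve(SNAPSHOT_DIR).resolve(TMP_DIR).resolve(snapshotName);
    }

    /** Location of a completed snapshot: [root]/.snapshots/[name]. */
    static Path completedDir(Path root, String snapshotName) {
        return root.resolve(SNAPSHOT_DIR).resolve(snapshotName);
    }

    /** Entry for one hfile: [completed dir]/[region]/[family]/[hfile]. */
    static Path hfileEntry(Path root, String snapshotName, String region,
                           String family, String hfile) {
        return completedDir(root, snapshotName).resolve(region).resolve(family).resolve(hfile);
    }

    public static void main(String[] args) {
        Path root = Paths.get("/hbase");
        System.out.println(workingDir(root, "snap1"));
        System.out.println(completedDir(root, "snap1"));
        System.out.println(hfileEntry(root, "snap1", "region-a", "cf", "hfile-1"));
    }
}
```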
    

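    Basic steps of a snapshot:

    1. Take a lock before starting, so that delete and add operations are not allowed while the snapshot runs;

    2. Create the designated directory on HDFS and write the relevant information into it;

    3. Flush the data in the memstore to HFiles;

    4. Create reference pointers to the HFiles.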
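
    The four steps can be illustrated with a toy in-memory model. This is only a sketch of the idea that a snapshot records references to immutable files instead of copying data; the class ToyTable and everything in it is hypothetical, and a synchronized method stands in for the lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of the four steps; none of these names exist in HBase. */
final class ToyTable {
    private final List<String> memstore = new ArrayList<>();
    private final Map<String, List<String>> hfiles = new LinkedHashMap<>();    // never modified once written
    private final Map<String, List<String>> snapshots = new LinkedHashMap<>(); // snapshot name -> hfile names
    private int nextFileId = 0;

    synchronized void put(String row) {
        memstore.add(row);
    }

    /** Moves buffered rows into a new immutable "hfile". */
    private void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        hfiles.put("hfile-" + nextFileId++, new ArrayList<>(memstore));
        memstore.clear();
    }

    /** synchronized plays the role of step 1 (the lock). */
    synchronized List<String> snapshot(String name) {
        flush();                                              // step 3: memstore -> hfile
        List<String> refs = Collections.unmodifiableList(
                new ArrayList<>(hfiles.keySet()));            // step 4: references, no data copied
        snapshots.put(name, refs);                            // step 2: record the snapshot metadata
        return refs;
    }

    /** Rows visible through a snapshot, resolved via its hfile references. */
    synchronized List<String> read(String name) {
        List<String> refs = snapshots.get(name);
        if (refs == null) {
            throw new IllegalArgumentException("Unknown snapshot: " + name);
        }
        List<String> rows = new ArrayList<>();
        for (String ref : refs) {
            rows.addAll(hfiles.get(ref));
        }
        return rows;
    }
}
```

    Rows written after snapshot() returns land in a later file, so an existing snapshot keeps seeing exactly the files it referenced.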

    The overall code flow is outlined below.

    HBaseAdmin initiates the snapshot:

        public void snapshot(final String snapshotName, final TableName tableName, SnapshotDescription.Type type) throws IOException, SnapshotCreationException, IllegalArgumentException {
            SnapshotDescription.Builder builder = SnapshotDescription.newBuilder();
            builder.setTable(tableName.getNameAsString());
            builder.setName(snapshotName);
            builder.setType(type);
            snapshot(builder.build());
        }
    

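    Take the snapshot and wait for the server to complete it (blocking). Only one snapshot should be taken at a time per HBase instance, otherwise the results may be undefined (you can tell multiple HBase clusters to snapshot at the same time, but a single cluster can run only one at a time).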

        public void snapshot(SnapshotDescription snapshot) throws IOException, SnapshotCreationException, IllegalArgumentException {
            // actually take the snapshot
            SnapshotResponse response = takeSnapshotAsync(snapshot);
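
    A blocking call built on top of an asynchronous request usually comes down to polling until the server reports completion. The following is a minimal sketch of that pattern, not the HBaseAdmin implementation; the class BlockingWaitSketch, the isDone callback and both timing parameters are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; it does not reproduce HBaseAdmin's logic. */
final class BlockingWaitSketch {
    private BlockingWaitSketch() {}

    /**
     * Polls isDone until it returns true or maxWaitMillis has elapsed.
     * Returns true when the operation finished, false on timeout or interruption.
     */
    static boolean waitUntilDone(BooleanSupplier isDone, long maxWaitMillis, long pauseMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the server did not finish in time
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```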
    

    MasterRpcServices: triggers the snapshot asynchronously:

            master.snapshotManager.takeSnapshot(snapshot);
    

    SnapshotManager class: how the snapshot is taken depends on the table state, disabled or enabled:

        if (assignmentMgr.getTableStateManager().isTableState(snapshotTable, ZooKeeperProtos.Table.State.ENABLED)) {
            LOG.debug("Table enabled, starting distributed snapshot.");
            snapshotEnabledTable(snapshot);
            LOG.debug("Started snapshot: " + ClientSnapshotDescriptionUtils.toString(snapshot));
        }
        // For disabled table, snapshot is created by the master
        else if (assignmentMgr.getTableStateManager().isTableState(snapshotTable, ZooKeeperProtos.Table.State.DISABLED)) {
            LOG.debug("Table is disabled, running snapshot entirely on master.");
            snapshotDisabledTable(snapshot);
            LOG.debug("Started snapshot: " + ClientSnapshotDescriptionUtils.toString(snapshot));
        }

        private synchronized void snapshotEnabledTable(SnapshotDescription snapshot) throws HBaseSnapshotException {
            // setup the snapshot
            prepareToTakeSnapshot(snapshot);

            // Take the snapshot of the enabled table
            EnabledTableSnapshotHandler handler = new EnabledTableSnapshotHandler(snapshot, master, this);
            snapshotTable(snapshot, handler);
        }
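
    The dispatch on table state can be summarized in a small sketch. The enum, the class SnapshotDispatchSketch and the returned labels are hypothetical; rejecting tables that are neither fully enabled nor fully disabled is an assumption, since the excerpt above stops after the two branches.

```java
/** Hypothetical summary of the state-based dispatch; not HBase code. */
final class SnapshotDispatchSketch {
    enum TableState { ENABLED, DISABLED, ENABLING, DISABLING }

    private SnapshotDispatchSketch() {}

    /** Returns a label for the strategy that would be used for the given table state. */
    static String chooseStrategy(TableState state) {
        switch (state) {
            case ENABLED:
                return "distributed";  // region servers take part in the snapshot
            case DISABLED:
                return "master-only";  // the master works directly on the files
            default:
                // Assumption: a table in transition cannot be snapshotted.
                throw new IllegalStateException("Table is " + state + ", cannot snapshot");
        }
    }
}
```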
    

    Taking the snapshot of a table in the enabled state:

            // setup the snapshot (preparation)
            prepareToTakeSnapshot(snapshot);

            // Take the snapshot of the enabled table
            EnabledTableSnapshotHandler handler = new EnabledTableSnapshotHandler(snapshot, master, this);
            // start executing the snapshot
            snapshotTable(snapshot, handler);
    

    Setup before the snapshot starts: check whether a snapshot is already running, and whether a restore request is in progress for the same table.

            // make sure we aren't already running a snapshot 
            if (isTakingSnapshot(snapshot)) {
                SnapshotSentinel handler = this.snapshotHandlers.get(snapshotTable);
                throw new SnapshotCreationException("Rejected taking " + ClientSnapshotDescriptionUtils.toString(snapshot) + " because we are already running another snapshot " + (handler != null ? ("on the same table " + ClientSnapshotDescriptionUtils.toString(handler.getSnapshot())) : "with the same name"), snapshot);
            }
    
            // make sure we aren't running a restore on the same table
            if (isRestoringTable(snapshotTable)) {
                SnapshotSentinel handler = restoreHandlers.get(snapshotTable);
                throw new SnapshotCreationException("Rejected taking " + ClientSnapshotDescriptionUtils.toString(snapshot) + " because we are already have a restore in progress on the same snapshot " + ClientSnapshotDescriptionUtils.toString(handler.getSnapshot()), snapshot);
            }
    
            try {
                // delete the working directory, since we aren't running the snapshot. Likely leftovers
                // from a failed attempt.
                fs.delete(workingDir, true);
    
                // recreate the working directory for the snapshot
                if (!fs.mkdirs(workingDir)) {
                    throw new SnapshotCreationException("Couldn't create working directory (" + workingDir + ") for snapshot", snapshot);
                }
    

    Once setup is complete, the snapshot is run by the specified handler:

                handler.prepare();
                this.executorService.submit(handler);
                this.snapshotHandlers.put(TableName.valueOf(snapshot.getTable()), handler);
                ...
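
    The two rejection checks and the handler registration amount to a guarded registry keyed by table. Here is a minimal sketch under that reading; the class SnapshotGuardSketch and its methods are hypothetical, plain strings replace the handler objects, and submitting the handler to an executor is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of running snapshots and restores; not HBase code. */
final class SnapshotGuardSketch {
    private final Map<String, String> snapshotHandlers = new HashMap<>(); // table -> snapshot name
    private final Map<String, String> restoreHandlers = new HashMap<>();  // table -> snapshot name

    /** Registers a snapshot, or rejects it if it conflicts with work already running. */
    synchronized void beginSnapshot(String table, String snapshotName) {
        if (snapshotHandlers.containsKey(table)) {
            throw new IllegalStateException("Already running another snapshot on table " + table);
        }
        if (snapshotHandlers.containsValue(snapshotName)) {
            throw new IllegalStateException("Already running a snapshot named " + snapshotName);
        }
        if (restoreHandlers.containsKey(table)) {
            throw new IllegalStateException("A restore is in progress on table " + table);
        }
        snapshotHandlers.put(table, snapshotName);
    }

    synchronized void finishSnapshot(String table) {
        snapshotHandlers.remove(table);
    }

    synchronized void beginRestore(String table, String snapshotName) {
        restoreHandlers.put(table, snapshotName);
    }
}
```

    Keeping the check and the registration inside one synchronized method closes the window in which two requests for the same table could both pass the check.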
    

    TakeSnapshotHandler is where the snapshot is actually processed:

    1. Write the snapshot description to the .snapshotinfo file:

        FsPermission perms = FSUtils.getFilePermissions(fs, fs.getConf(), HConstants.DATA_FILE_UMASK_KEY);
        Path snapshotInfo = new Path(workingDir, SnapshotDescriptionUtils.SNAPSHOTINFO_FILE);
        try {
            FSDataOutputStream out = FSUtils.create(fs, snapshotInfo, perms, true);
            try {
                snapshot.writeTo(out);
            } finally {
                out.close();
            }
        }
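
    The create, write and always-close pattern used for the description file can be written more compactly with try-with-resources. This sketch works on the local filesystem through java.nio.file rather than HDFS; the class SnapshotInfoWriterSketch is hypothetical and a raw byte array stands in for the serialized description.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem analogue of writing the description file. */
final class SnapshotInfoWriterSketch {
    static final String SNAPSHOTINFO_FILE = ".snapshotinfo";

    private SnapshotInfoWriterSketch() {}

    /** Writes the description into workingDir, overwriting any existing file. */
    static Path writeSnapshotInfo(Path workingDir, byte[] description) {
        Path info = workingDir.resolve(SNAPSHOTINFO_FILE);
        try {
            Files.createDirectories(workingDir);
            // The stream is closed even if write() throws, like the finally block.
            try (OutputStream out = Files.newOutputStream(info)) {
                out.write(description);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not write " + info, e);
        }
        return info;
    }

    /** Writes a description into a temporary directory and reads it back. */
    static String roundTrip(String description) {
        try {
            Path workingDir = Files.createTempDirectory("snapshot-sketch").resolve(".tmp").resolve("s1");
            Path info = writeSnapshotInfo(workingDir, description.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(info), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```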
    

    2. Copy the table descriptor:

        snapshotManifest.addTableDescriptor(this.htd);
    

    3. Get the table's regions and the region servers hosting them:

        List<Pair<HRegionInfo, ServerName>> regionsAndLocations;
        if (TableName.META_TABLE_NAME.equals(snapshotTable)) {
            regionsAndLocations = new MetaTableLocator().getMetaRegionsAndLocations(server.getZooKeeper());
        } else {
            regionsAndLocations = MetaTableAccessor.getTableRegionsAndLocations(server.getZooKeeper(), server.getConnection(), snapshotTable, false);
        }
    

    4. Run the snapshot, using the region and location information obtained above:

        // run the snapshot
        snapshotRegions(regionsAndLocations);

    Start the snapshot procedure on the region servers; this is where the details of the snapshot operation are carried out:

        // start the snapshot on the RS
        Procedure proc = coordinator.startProcedure(this.monitor, this.snapshot.getName(), this.snapshot.toByteArray(),
            Lists.newArrayList(regionServers));
        if (proc == null) {
            String msg = "Failed to submit distributed procedure for snapshot '" + snapshot.getName() + "'";
            LOG.error(msg);
            throw new HBaseSnapshotException(msg);
        }
    

    Wait for the snapshot to complete:

        proc.waitForCompleted();
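
    Starting a procedure on several members and then blocking until all of them have finished can be sketched with a CountDownLatch. This is a single-JVM illustration only: the class ProcedureSketch is hypothetical, threads stand in for region servers, and none of HBase's cross-server coordination is modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical single-JVM stand-in for a distributed procedure. */
final class ProcedureSketch {
    private final CountDownLatch completed;
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    private ProcedureSketch(int members) {
        this.completed = new CountDownLatch(members);
    }

    /** Runs work once per server on the pool and returns a handle to wait on. */
    static ProcedureSketch start(ExecutorService pool, List<String> servers, Consumer<String> work) {
        ProcedureSketch proc = new ProcedureSketch(servers.size());
        for (String server : servers) {
            pool.execute(() -> {
                try {
                    work.accept(server);
                } catch (Throwable t) {
                    proc.failure.compareAndSet(null, t); // remember the first failure
                } finally {
                    proc.completed.countDown();
                }
            });
        }
        return proc;
    }

    /** Blocks until every member has finished; throws if any member failed. */
    void waitForCompleted() {
        try {
            completed.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("Procedure failed on a member", t);
        }
    }

    /** Runs a three-member procedure and returns how many members finished. */
    static int runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Set<String> finished = ConcurrentHashMap.newKeySet();
            ProcedureSketch proc = start(pool, Arrays.asList("rs1", "rs2", "rs3"), finished::add);
            proc.waitForCompleted();
            return finished.size();
        } finally {
            pool.shutdown();
        }
    }
}
```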
    

    Treat offline regions as disabled:

        // Take the offline regions as disabled
        for (Pair<HRegionInfo, ServerName> region : regions) {
            HRegionInfo regionInfo = region.getFirst();
            if (regionInfo.isOffline() && (regionInfo.isSplit() || regionInfo.isSplitParent())) {
                LOG.info("Take disabled snapshot of offline region=" + regionInfo);
                snapshotDisabledRegion(regionInfo);
            }
        }
    

    5. Collect the region information and server names that are used to verify the snapshot:

        // extract each pair to separate lists
        Set<String> serverNames = new HashSet<String>();
        for (Pair<HRegionInfo, ServerName> p : regionsAndLocations) {
            if (p != null && p.getFirst() != null && p.getSecond() != null) {
                HRegionInfo hri = p.getFirst();
                if (hri.isOffline() && (hri.isSplit() || hri.isSplitParent()))
                    continue;
                serverNames.add(p.getSecond().toString());
            }
        }
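
    The filtering rule in this loop is easy to check in isolation. The sketch below uses a plain value class instead of HBase's types; ServerNameCollector, Region and their fields are all hypothetical, and a single split flag stands in for the isSplit() or isSplitParent() test.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-alone version of the server-name collection loop. */
final class ServerNameCollector {
    /** Minimal stand-in for a region/location pair. */
    static final class Region {
        final String name;
        final String server;   // may be null when the region has no location
        final boolean offline;
        final boolean split;

        Region(String name, String server, boolean offline, boolean split) {
            this.name = name;
            this.server = server;
            this.offline = offline;
            this.split = split;
        }
    }

    private ServerNameCollector() {}

    /** Collects the distinct servers hosting regions that took part in the snapshot. */
    static Set<String> serversToVerify(List<Region> regions) {
        Set<String> serverNames = new TreeSet<>();
        for (Region r : regions) {
            if (r == null || r.server == null) {
                continue; // incomplete entry, nothing to record
            }
            if (r.offline && r.split) {
                continue; // offline split regions were handled as disabled regions
            }
            serverNames.add(r.server);
        }
        return serverNames;
    }
}
```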
    

    6. Flush the in-memory state and write the snapshot manifest to the snapshot directory:

        // flush the in-memory state, and write the single manifest
        status.setStatus("Consolidate snapshot: " + snapshot.getName());
        snapshotManifest.consolidate();
    

    7. Verify that the snapshot is valid:

        // verify the snapshot is valid
        status.setStatus("Verifying snapshot: " + snapshot.getName());
        verifier.verifySnapshot(this.workingDir, serverNames);
    

    8. Complete the snapshot by moving it from the working directory to its final directory:

        // complete the snapshot, atomically moving from tmp to .snapshot dir.
        completeSnapshot(this.snapshotDir, this.workingDir, this.fs);
        msg = "Snapshot " + snapshot.getName() + " of table " + snapshotTable + " completed";
        status.markComplete(msg);
        LOG.info(msg);
        metricsSnapshot.addSnapshot(status.getCompletionTimestamp() - status.getStartTime());
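
    Publishing a finished snapshot with one atomic move means readers see either no snapshot or a complete one. The sketch below shows the same idea on the local filesystem with Files.move and ATOMIC_MOVE; it is not the HBase completeSnapshot method, the class CompleteSnapshotSketch is hypothetical, and both directories are assumed to be on the same filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical local-filesystem analogue of publishing a finished snapshot. */
final class CompleteSnapshotSketch {
    private CompleteSnapshotSketch() {}

    /** Moves workingDir to snapshotDir in one step; fails rather than copying. */
    static Path completeSnapshot(Path snapshotDir, Path workingDir) {
        try {
            Path parent = snapshotDir.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            return Files.move(workingDir, snapshotDir, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(
                    "Failed to move " + workingDir + " to " + snapshotDir, e);
        }
    }

    /** Builds a working directory in a temp location, publishes it, and checks the result. */
    static boolean runDemo() {
        try {
            Path root = Files.createTempDirectory("snapshot-sketch");
            Path workingDir = root.resolve(".snapshots").resolve(".tmp").resolve("s1");
            Path snapshotDir = root.resolve(".snapshots").resolve("s1");
            Files.createDirectories(workingDir);
            Files.write(workingDir.resolve(".snapshotinfo"),
                    "name=s1".getBytes(StandardCharsets.UTF_8));
            completeSnapshot(snapshotDir, workingDir);
            return Files.exists(snapshotDir.resolve(".snapshotinfo"))
                    && Files.notExists(workingDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```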
