HDFS append: AlreadyBeingCreatedException


Author: aaron1993 | Published 2017-08-08 20:20

AlreadyBeingCreatedException thrown when calling HDFS append

First, the exception that was thrown:

    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /binlogsync_test/binlog/mock/test/test_1502173606572 for DFSClient_NONMAPREDUCE_-70835360_1 on 127.0.0.1 because DFSClient_NONMAPREDUCE_-70835360_1 is already the current lease holder.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2863)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2664)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2962)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2927)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:652)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    

Let me first explain the situation in which I hit this exception.

I wrote a test case that simulates the following: after writing an incomplete record to an HDFS file, roll back to the most recent complete record according to a previously written ack file, then continue writing. The file operations are invoked in this order:

1. FileSystem#create to open the file.
2. FSDataOutputStream#write: write several incomplete records and call hsync to make sure they reach disk.
3. rollBack: roll back to the most recent complete record according to the ack file; during this step FSDataOutputStream#close is called to close the stream, and then FileSystem#truncate truncates the file.
4. Reopen the file in append mode, write several complete records, hsync them to disk, and update the ack file. (The exception above is thrown at this step.)
5. Assert that the file content equals the expected content.
    

All of the above runs in a single thread and uses the same FileSystem instance, so the DFS client is the same one throughout. I point this out because HDFS lease management identifies a lease by the DFS client and the inode id.
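A minimal sketch of that sequence is below (the path, the record contents, and the hard-coded rollback offset are stand-ins for the real test fixtures and ack-file handling); the append in step 4 is where the exception above is raised.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendAfterTruncateRepro {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);       // one FileSystem instance -> one DFS client
            Path path = new Path("/tmp/append_repro");  // hypothetical test path

            // 1 + 2: create the file, write an incomplete record, hsync to force it to disk
            FSDataOutputStream out = fs.create(path);
            out.write("incomplete-record".getBytes("UTF-8"));
            out.hsync();

            // 3: rollback -- close the stream, then truncate back to the last complete record
            out.close();
            long lastCompleteOffset = 0L;               // would normally be read from the ack file
            fs.truncate(path, lastCompleteOffset);      // truncate registers a new lease for this client

            // 4: append with the same client while that lease is still held
            //    -> AlreadyBeingCreatedException ("... is already the current lease holder")
            FSDataOutputStream appendOut = fs.append(path);
            appendOut.write("complete-record\n".getBytes("UTF-8"));
            appendOut.hsync();
            appendOut.close();

            fs.close();
        }
    }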

Cause of the exception

This exception is caused by the lease mechanism and is thrown by the RPC server on the NameNode (implemented by NameNodeRpcServer).

HDFS uses the lease mechanism to guarantee that at any given moment only one client is writing a file. When the client calls append, create, or FileSystem#truncate, the RPC server adds a new lease. Once the current client has obtained the lease through create, it must not call append as well. Take a look at the code around the throw site:

    // The RPC server calls this method when handling both create and append.
    boolean recoverLeaseInternal(RecoverLeaseOp op, INodesInPath iip,
          String src, String holder, String clientMachine, boolean force)
          throws IOException {
        assert hasWriteLock();
        INodeFile file = iip.getLastINode().asFile();
        // isUnderConstruction(): the UnderConstruction state means the file is
        // currently being written by some client (append, truncate, write).
        if (file.isUnderConstruction()) {
          // holder is the lease owner, i.e. the name of the client trying to write
          // the file; it corresponds to the name field of DFSClient.
          Lease lease = leaseManager.getLease(holder);
    
          if (!force && lease != null) {
            /* Get the lease on the file being written. If it is the same lease that
             * holder already owns, the file was previously opened for writing by this
             * same client, and that earlier write request granted it the lease; calling
             * another write operation now throws this exception (the one at the top of
             * this post). This is understandable: even the same client should not write
             * a file through multiple write interfaces at the same time, since that
             * would obviously corrupt the written content.
             */
            Lease leaseFile = leaseManager.getLease(file);
            if (leaseFile != null && leaseFile.equals(lease)) {
              // We found the lease for this file but the original
              // holder is trying to obtain it again.
              throw new AlreadyBeingCreatedException(
                  op.getExceptionMessage(src, holder, clientMachine,
                      holder + " is already the current lease holder."));
            }
          }
          //
          // Find the original holder.
          //
          FileUnderConstructionFeature uc = file.getFileUnderConstructionFeature();
          String clientName = uc.getClientName();
          lease = leaseManager.getLease(clientName);
          // The other case: the client recorded as writing the file holds no lease
          // (it may have expired), so it cannot write the file.
          if (lease == null) {
            throw new AlreadyBeingCreatedException(
                op.getExceptionMessage(src, holder, clientMachine,
                    "the file is under construction but no leases found."));
          }
          if (force) {
            // close now: no need to wait for soft lease expiration and 
            // close only the file src
            LOG.info("recoverLease: " + lease + ", src=" + src +
              " from client " + clientName);
            return internalReleaseLease(lease, src, iip, holder);
          } else {
            assert lease.getHolder().equals(clientName) :
              "Current lease holder " + lease.getHolder() +
              " does not match file creator " + clientName;
            //
            // If the original holder has not renewed in the last SOFTLIMIT 
            // period, then start lease recovery.
            //
            if (lease.expiredSoftLimit()) {
              LOG.info("startFile: recover " + lease + ", src=" + src + " client "
                  + clientName);
              if (internalReleaseLease(lease, src, iip, null)) {
                return true;
              } else {
                throw new RecoveryInProgressException(
                    op.getExceptionMessage(src, holder, clientMachine,
                        "lease recovery is in progress. Try again later."));
              }
            } else {
              final BlockInfo lastBlock = file.getLastBlock();
              if (lastBlock != null
                  && lastBlock.getBlockUCState() == BlockUCState.UNDER_RECOVERY) {
                throw new RecoveryInProgressException(
                    op.getExceptionMessage(src, holder, clientMachine,
                        "another recovery is in progress by "
                            + clientName + " on " + uc.getClientMachine()));
              } else {
                throw new AlreadyBeingCreatedException(
                    op.getExceptionMessage(src, holder, clientMachine,
                        "this file lease is currently owned by "
                            + clientName + " on " + uc.getClientMachine()));
              }
            }
          }
        } else {
          return true;
        }
      }
    

Now back to my call chain:

FileSystem#create -> acquires the lease

FSDataOutputStream#close -> the lease is released

FileSystem#truncate -> acquires the lease again

FileSystem#append -> with the lease already held, trying to write through yet another interface throws the exception.

There are exceptions, however; for example, the following call patterns do not trigger this error:

1. truncate(path, 1) -> truncate(path, 1). Two consecutive truncates to the same length never reach the recoverLeaseInternal call, because once the file is found to already be that length the call returns directly without truncating.
2. create(path, true). Passing true means overwrite the file if it exists. Even if a client obtained the lease earlier, the overwrite deletes the old file and clears the lease along with it, so no error is reported. (A sketch of both patterns follows this list.)
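
A quick sketch of those two patterns, reusing the fs and path variables from the earlier sketch (the comments restate the behavior described above):

    // Truncating twice to the same length: the second call sees the file is already
    // that length and returns without truncating, so no lease conflict arises.
    fs.truncate(path, 1);
    fs.truncate(path, 1);

    // create(path, true): overwrite deletes the existing file together with its lease
    // before creating a new one, so no AlreadyBeingCreatedException is thrown.
    FSDataOutputStream overwriteOut = fs.create(path, true);
    overwriteOut.close();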

Follow-up: even though I now knew that truncate was what caused the append to fail, I did not know how to remove the lease. The final workaround was, somewhat surprisingly, to simply wait for the lease to expire, since the truncate operation does not keep renewing the lease.
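
A minimal sketch of that workaround, assuming it is acceptable to just retry append until the stale lease lapses (the retry count and sleep interval are arbitrary; DistributedFileSystem#recoverLease is another way to ask the NameNode to release a lease, but it was not used here):

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException;
    import org.apache.hadoop.ipc.RemoteException;

    public class AppendRetry {
        // Hypothetical helper: retry append until the stale lease left by truncate expires.
        public static FSDataOutputStream appendWithRetry(FileSystem fs, Path path)
                throws IOException, InterruptedException {
            // The lease soft limit defaults to 60 seconds, so a handful of retries with a
            // generous sleep is normally enough for the NameNode to release the old lease.
            for (int attempt = 0; attempt < 10; attempt++) {
                try {
                    return fs.append(path);
                } catch (RemoteException e) {
                    if (!AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
                        throw e;                 // a different failure: do not retry
                    }
                    Thread.sleep(10_000L);       // wait for the lease to lapse, then try again
                }
            }
            throw new IOException("lease on " + path + " was not released in time");
        }
    }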
