HBase Data Import: Bulkload Issue

Author: 大闪电啊 | Published 2018-12-14 11:10

To import the data with bulkload, run the following command:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/pingz/data/$logDate $tableName
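
For reference, the same load can also be triggered from Java rather than the shell; a minimal sketch, assuming the HBase 1.x client API (the class name and argument handling are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadRunner {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Path hfileDir = new Path(args[0]);                // e.g. /user/pingz/data/<logDate>
        TableName tableName = TableName.valueOf(args[1]); // the $tableName from the script
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName)) {
          // Same code path as the command-line invocation above.
          new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
        }
      }
    }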
    

The load turned out to be very slow: about half an hour for roughly 300 GB of data. The cause is that the generated HFiles are not moved into place directly but copied, which comes down to the cluster using HDFS federation (viewfs).

For details, see the issue filed by 强哥: https://issues.apache.org/jira/browse/HBASE-17429?jql=project%20%3D%20HBASE%20AND%20text%20~%20viewfs

The load eventually calls HRegionFileSystem's bulkLoadStoreFile method:

    Path bulkLoadStoreFile(final String familyName, Path srcPath, long seqNum)
          throws IOException {
        // Copy the file if it's on another filesystem
        FileSystem srcFs = srcPath.getFileSystem(conf);
        FileSystem desFs = fs instanceof HFileSystem ? ((HFileSystem)fs).getBackingFs() : fs;
    
        // We can't compare FileSystem instances as equals() includes UGI instance
        // as part of the comparison and won't work when doing SecureBulkLoad
        // TODO deal with viewFS
        if (!FSHDFSUtils.isSameHdfs(conf, srcFs, desFs)) {
          LOG.info("Bulk-load file " + srcPath + " is on different filesystem than " +
              "the destination store. Copying file over to destination filesystem.");
          Path tmpPath = createTempName();
          FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
          LOG.info("Copied " + srcPath + " to temporary path on destination filesystem: " + tmpPath);
          srcPath = tmpPath;
        }
        return commitStoreFile(familyName, srcPath, seqNum, true);
      }
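
The isSameHdfs check fails here because, through the viewfs mount table, the source and destination paths resolve to different namenodes. A rough way to confirm the resolution from a client is sketched below; the paths and class name are illustrative, not part of the original post:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ResolveCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path src = new Path("/user/pingz/data"); // HFile staging dir, mounted on XXNameNode2
        Path dst = new Path("/hbase");           // HBase root, mounted on XXNameNode
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = dst.getFileSystem(conf);
        // resolvePath() follows the viewfs mount table down to the concrete hdfs:// URI;
        // different authorities mean the bulkload will copy instead of move.
        System.out.println("src -> " + srcFs.resolvePath(src).toUri());
        System.out.println("dst -> " + dstFs.resolvePath(dst).toUri());
      }
    }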
    

The job's output directory sits under /user, which is mounted to XXNameNode2, while the HBase directory is mounted to XXNameNode.

One fix is to upgrade HBase or apply the patch from the issue above; the other is to create a new output directory mounted into the same namespace as the HBase directory, as in the mount table below:

        <property>
           <name>fs.viewfs.mounttable.XX.link./user</name>
           <value>hdfs://XXNameNode2:8020/user</value>
        </property>
    
        <property>
           <name>fs.viewfs.mounttable.XX.link./hbase</name>
           <value>hdfs://XXNameNode:8020/hbase</value>
        </property>
    
        <property>
           <name>fs.viewfs.mounttable.XX.link./bulkload</name>
           <value>hdfs://XXNameNode:8020/bulkload</value>
        </property>
    
    

and update the load script accordingly:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://XXNameNode:8020/bulkload/appdata/$logDate $tableName
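
Note that the job producing the HFiles also has to write its output under the new /bulkload mount. A sketch of the output-side job setup, assuming the HFiles are generated with HFileOutputFormat2 (the job and class names are illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HFileJobSetup {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "hfile-gen");
        // Mapper/input setup omitted; only the output side matters for this fix.
        TableName tableName = TableName.valueOf(args[0]);
        try (Connection conn = ConnectionFactory.createConnection(job.getConfiguration());
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName)) {
          HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
        }
        // Write the HFiles onto the same namenode as /hbase so the load can move them.
        FileOutputFormat.setOutputPath(job,
            new Path("hdfs://XXNameNode:8020/bulkload/appdata/" + args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }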
    
