To import data with bulkload, run the following command:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/pingz/data/$logDate $tableName
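For reference, the same load can also be driven through the Java client API instead of the shell. Below is a minimal sketch, assuming an HBase 1.x client on the classpath; the class name and argument handling are illustrative and not part of the original setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path hfileDir = new Path(args[0]);                 // e.g. /user/pingz/data/$logDate
    TableName tableName = TableName.valueOf(args[1]);  // e.g. $tableName

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin();
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // Assigns each HFile to the region owning its key range and asks
      // the region servers to adopt the files.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
    }
  }
}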
The load turned out to be very slow: about half an hour for roughly 300 GB of data. The reason is that the generated HFiles were not moved into place directly but copied, which is mainly due to the cluster using HDFS federation (viewfs).
For details, see the issue filed by Qiang: https://issues.apache.org/jira/browse/HBASE-17429?jql=project%20%3D%20HBASE%20AND%20text%20~%20viewfs
The load ultimately calls HRegionFileSystem's bulkLoadStoreFile method:
Path bulkLoadStoreFile(final String familyName, Path srcPath, long seqNum)
    throws IOException {
  // Copy the file if it's on another filesystem
  FileSystem srcFs = srcPath.getFileSystem(conf);
  FileSystem desFs = fs instanceof HFileSystem ? ((HFileSystem)fs).getBackingFs() : fs;
  // We can't compare FileSystem instances as equals() includes UGI instance
  // as part of the comparison and won't work when doing SecureBulkLoad
  // TODO deal with viewFS
  if (!FSHDFSUtils.isSameHdfs(conf, srcFs, desFs)) {
    LOG.info("Bulk-load file " + srcPath + " is on different filesystem than " +
        "the destination store. Copying file over to destination filesystem.");
    Path tmpPath = createTempName();
    FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
    LOG.info("Copied " + srcPath + " to temporary path on destination filesystem: " + tmpPath);
    srcPath = tmpPath;
  }
  return commitStoreFile(familyName, srcPath, seqNum, true);
}
The directory with the generated data lives under /user, which is mounted on XXNameNode2, while the HBase directory is mounted on XXNameNode, so isSameHdfs treats source and destination as different filesystems (note the "TODO deal with viewFS" above) and falls into the copy branch.
The fix is either to upgrade HBase (or apply a patch), or to create a new directory mounted into the same namespace as HBase:
<property>
  <name>fs.viewfs.mounttable.XX.link./user</name>
  <value>hdfs://XXNameNode2:8020/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.XX.link./hbase</name>
  <value>hdfs://XXNameNode:8020/hbase</value>
</property>
<property>
  <name>fs.viewfs.mounttable.XX.link./bulkload</name>
  <value>hdfs://XXNameNode:8020/bulkload</value>
</property>
At the same time, change the script to load from the new location via its hdfs:// address, so the source and the HBase root directory are on the same HDFS and the HFiles can be moved instead of copied:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://XXNameNode:8020/bulkload/appdata/$logDate $tableName
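After updating the mount table and the script, one way to sanity-check the setup is to confirm that the bulkload directory and the HBase root now resolve to the same underlying namespace. Below is a minimal sketch, assuming the viewfs mount table above is in the client configuration and that both directories already exist; the class name and paths are illustrative:

import java.io.IOException;
import java.util.Objects;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class SameNamespaceCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    Path bulkloadDir = new Path("viewfs://XX/bulkload/appdata"); // new mount point
    Path hbaseRoot   = new Path("viewfs://XX/hbase");            // HBase root mount

    // resolvePath() follows the viewfs mount table and returns the
    // hdfs://<namenode>:8020/... path behind each mount point
    // (the paths must exist, otherwise it throws FileNotFoundException).
    Path resolvedSrc = bulkloadDir.getFileSystem(conf).resolvePath(bulkloadDir);
    Path resolvedDst = hbaseRoot.getFileSystem(conf).resolvePath(hbaseRoot);

    System.out.println(bulkloadDir + " -> " + resolvedSrc);
    System.out.println(hbaseRoot + " -> " + resolvedDst);

    // Same authority (namenode:port) means both sit in the same namespace,
    // so the bulkload can rename the HFiles instead of copying them.
    boolean sameNamespace = Objects.equals(
        resolvedSrc.toUri().getAuthority(), resolvedDst.toUri().getAuthority());
    System.out.println("same namespace: " + sameNamespace);
  }
}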