HBase Architecture

Order in which data is read

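The client first asks ZooKeeper where the meta table is located, then reads the meta table to find which region (on which RegionServer) holds the requested table and rowkey.
The meta table that ships with HBase (hbase:meta, stored on HDFS under data/hbase/meta) holds this system-level metadata.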
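The lookup described above amounts to finding the region whose start key is the greatest key not exceeding the rowkey, because each region covers a contiguous, sorted rowkey range. The following is a minimal sketch under that assumption; `MetaLookup` and the server names `rs1`/`rs2` are hypothetical, and the in-memory `TreeMap` only stands in for hbase:meta (it is not the HBase client API).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for hbase:meta: region start key -> hosting RegionServer.
// Real meta rows also carry the table name, end key and region id; only the lookup is modelled.
class MetaLookup {
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // A region covers [startKey, nextStartKey), so the owning region is the one
    // with the greatest start key <= rowKey (the first region starts at "").
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regionStartToServer.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        MetaLookup meta = new MetaLookup();
        meta.addRegion("", "rs1");
        meta.addRegion("10003", "rs2");
        System.out.println(meta.locate("10002")); // rs1
        System.out.println(meta.locate("10004")); // rs2
    }
}
```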
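Store:
- Reads check the MemStore first, then fetch data from the StoreFiles (a lightweight wrapper around HFile, where the data is actually stored)
- HLog (the storage format of HBase's write-ahead log) is comparable to a SequenceFile in Hadoop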
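A simplified sketch of that read order for a single store, assuming only the latest value of each row is wanted. `StoreReadPath` is a hypothetical class; real HBase merges MemStore and StoreFile scanners by timestamp instead of stopping at the first hit, which coincides with this shortcut as long as clients do not set explicit timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one store's read path: MemStore first, then StoreFiles
// from newest to oldest; the first source that contains the row wins.
class StoreReadPath {
    final TreeMap<String, String> memstore = new TreeMap<>();
    // index 0 = most recently flushed StoreFile
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    String get(String rowKey) {
        String value = memstore.get(rowKey);
        if (value != null) {
            return value;                  // freshest data is still in memory
        }
        for (TreeMap<String, String> file : storeFiles) {
            value = file.get(rowKey);
            if (value != null) {
                return value;              // newest StoreFile containing the row wins
            }
        }
        return null;                       // row not present in this store
    }

    public static void main(String[] args) {
        StoreReadPath store = new StoreReadPath();
        TreeMap<String, String> older = new TreeMap<>();
        older.put("10001", "v1");
        older.put("10002", "old");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("10002", "new");
        store.storeFiles.add(newer);
        store.storeFiles.add(older);
        store.memstore.put("10003", "mem");
        System.out.println(store.get("10002")); // new
        System.out.println(store.get("10003")); // mem
        System.out.println(store.get("10001")); // v1
    }
}
```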
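RegionServer
- Manages a set of HRegion objects internally; each HRegion contains several HStores, and each HStore stores one column family of the table
- The HStore is the core of HBase storage and has two parts, the MemStore and the StoreFiles. Data written by users goes into the MemStore first; when the MemStore is full, it is flushed to a new StoreFile (implemented on top of HFile)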
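The RegionServer -> HRegion -> HStore hierarchy and the flush-on-full behaviour can be modelled in a few classes. This is a hypothetical sketch: `HRegionModel` and `HStoreModel` are invented names, one value per row key is kept (qualifiers are ignored), and "full" is a cell count, whereas real HBase flushes based on MemStore size (configured by `hbase.hregion.memstore.flush.size`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one HRegion: one HStore per column family.
class HRegionModel {
    // column family name -> the HStore that holds that family's data
    final Map<String, HStoreModel> stores = new HashMap<>();

    HRegionModel(int flushThreshold, String... families) {
        for (String family : families) {
            stores.put(family, new HStoreModel(flushThreshold));
        }
    }

    // Each write is routed to the HStore of its column family
    void put(String rowKey, String family, String value) {
        HStoreModel store = stores.get(family);
        if (store == null) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        store.put(rowKey, value);
    }

    public static void main(String[] args) {
        HRegionModel region = new HRegionModel(2, "info", "address");
        region.put("10001", "info", "a");
        region.put("10002", "info", "b"); // second cell fills the MemStore -> flush
        System.out.println(region.stores.get("info").storeFiles.size());    // 1
        System.out.println(region.stores.get("address").storeFiles.size()); // 0
    }
}

// Hypothetical HStore: writes go to the MemStore; once it holds `flushThreshold`
// cells it is written out as a new immutable StoreFile and emptied.
class HStoreModel {
    final TreeMap<String, String> memstore = new TreeMap<>();
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>();
    private final int flushThreshold;

    HStoreModel(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        memstore.put(rowKey, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        storeFiles.add(new TreeMap<>(memstore)); // sorted snapshot becomes a StoreFile
        memstore.clear();
    }
}
```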
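MemStore & StoreFile
- Write path: the client writes into the MemStore -> when it is full, it is flushed into a StoreFile -> once the number of StoreFiles reaches a threshold, a compaction merges several StoreFiles into one, merging versions and purging deleted data at the same time -> once a single StoreFile grows beyond a certain size, a split divides the current region into two, relieving the load on the original region
- HBase only ever appends data: updates and deletes are written as new cells (deletes as delete markers), and the old data is physically removed during compaction. This append-only design is what keeps I/O performance high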
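A compaction is essentially a sorted merge that keeps the newest data and drops deleted rows. The sketch below simplifies under stated assumptions: the hypothetical `Compaction.compact` keeps only one version per row (as with a column family configured with `VERSIONS => 1`) and treats a delete marker as removing the whole row, whereas real HBase keeps as many versions as configured and tracks deletes per cell, column and family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified compaction: merges several StoreFiles into one,
// keeping only the newest cell per row (one version) and physically dropping
// rows whose newest cell is a delete marker.
class Compaction {
    record Cell(String rowKey, long timestamp, String value, boolean isDelete) {}

    static List<Cell> compact(List<List<Cell>> storeFiles) {
        // version merge: remember the cell with the highest timestamp per row
        TreeMap<String, Cell> newest = new TreeMap<>();
        for (List<Cell> file : storeFiles) {
            for (Cell c : file) {
                Cell seen = newest.get(c.rowKey());
                if (seen == null || c.timestamp() > seen.timestamp()) {
                    newest.put(c.rowKey(), c);
                }
            }
        }
        // data deletion: delete markers and the data they mask are not written out
        List<Cell> merged = new ArrayList<>();
        for (Cell c : newest.values()) {
            if (!c.isDelete()) {
                merged.add(c);
            }
        }
        return merged; // sorted by row key, like the single resulting StoreFile
    }

    public static void main(String[] args) {
        List<List<Cell>> files = List.of(
                List.of(new Cell("10001", 1L, "a", false), new Cell("10002", 1L, "x", false)),
                List.of(new Cell("10001", 2L, "b", false), new Cell("10002", 2L, null, true)));
        for (Cell c : compact(files)) {
            System.out.println(c.rowKey() + " -> " + c.value()); // prints only: 10001 -> b
        }
    }
}
```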
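HLog file structure
- Similar to the binlog in MySQL: edits are appended to the log before they are applied, so they can be replayed after a failure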
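The key property shared with a binlog is write-ahead ordering: an edit is appended to the log before it is applied in memory, so a lost MemStore can be rebuilt by replaying the log. A minimal sketch, with the hypothetical `WalStore` keeping its log in a list instead of an HLog file on HDFS:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical write-ahead-log model: every edit is logged before it touches
// the MemStore, so the MemStore can be reconstructed after a crash.
class WalStore {
    final List<String[]> log = new ArrayList<>();        // append-only: {rowKey, value}
    TreeMap<String, String> memstore = new TreeMap<>();

    void put(String rowKey, String value) {
        log.add(new String[] {rowKey, value});   // 1. persist the edit to the log
        memstore.put(rowKey, value);             // 2. then apply it in memory
    }

    // Simulates a RegionServer crash: volatile MemStore contents are lost
    void crash() {
        memstore = new TreeMap<>();
    }

    // Recovery replays the log in order; later edits overwrite earlier ones
    void replay() {
        for (String[] edit : log) {
            memstore.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        WalStore store = new WalStore();
        store.put("10001", "a");
        store.crash();
        store.replay();
        System.out.println(store.memstore.get("10001")); // a
    }
}
```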

HBase Architecture in Detail

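Client
- The entry point for accessing the whole HBase cluster
- Communicates with the HMaster and the HRegionServers through HBase's RPC mechanism
- Talks to the HMaster for administrative operations
- Talks to the HRegionServers for read and write operations
- Contains the interfaces for accessing HBase and maintains a cache (of region locations) to speed up access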
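The client-side cache can be pictured as a map that is consulted before hbase:meta and filled only on a miss. This is a hypothetical sketch: `LocationCache` caches per row key and takes the meta lookup as a plain function, while the real client caches locations per region key range and evicts them when a request reaches the wrong server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side location cache: hbase:meta is only consulted on a miss.
class LocationCache {
    private final Map<String, String> cache = new HashMap<>();   // row key -> server
    private final Function<String, String> metaLookup;           // slow path: ask hbase:meta
    int metaLookups = 0;                                          // how often the slow path ran

    LocationCache(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    String locate(String rowKey) {
        return cache.computeIfAbsent(rowKey, key -> {
            metaLookups++;               // runs only on a cache miss
            return metaLookup.apply(key);
        });
    }

    // Drop a stale entry, e.g. after the region moved to another server
    void invalidate(String rowKey) {
        cache.remove(rowKey);
    }

    public static void main(String[] args) {
        LocationCache cache = new LocationCache(row -> "rs1");
        cache.locate("10001");
        cache.locate("10001");
        System.out.println(cache.metaLookups); // 1
    }
}
```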
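ZooKeeper
- Guarantees that only one HMaster is active in the cluster at any time
- Stores the addressing entry point for all HRegions
- Monitors HRegionServers going online and offline in real time and notifies the HMaster
- Stores HBase schema and metadata information
- Stores the address of the meta table and of the HMaster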
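"Only one active HMaster" relies on ZooKeeper's atomic creation of an ephemeral znode: the first master to create it becomes active, and the node vanishes when that master's session ends. A minimal in-process sketch of the idea; `MasterElection` is hypothetical, a synchronized field replaces the znode, and real backup masters set a ZooKeeper watch and retry when the znode is deleted.

```java
// Hypothetical model of active-master election through a single "znode".
class MasterElection {
    private String masterZnode; // null = no active master

    // Atomic "create if absent": only the first candidate succeeds
    synchronized boolean tryBecomeActive(String candidate) {
        if (masterZnode != null) {
            return false;
        }
        masterZnode = candidate;
        return true;
    }

    // An ephemeral znode disappears when its owner's session ends
    synchronized void sessionExpired(String candidate) {
        if (candidate.equals(masterZnode)) {
            masterZnode = null;
        }
    }

    synchronized String activeMaster() {
        return masterZnode;
    }

    public static void main(String[] args) {
        MasterElection zk = new MasterElection();
        System.out.println(zk.tryBecomeActive("master-a")); // true
        System.out.println(zk.tryBecomeActive("master-b")); // false, stays a backup
        zk.sessionExpired("master-a");
        System.out.println(zk.tryBecomeActive("master-b")); // true, failover
    }
}
```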
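Master node: HMaster
- No single point of failure: several can be started, but thanks to ZooKeeper only one is active at a time
- Handles users' table-level operations (creating, deleting, modifying and querying tables)
- Balances load across HRegionServers by adjusting the distribution of regions
- Assigns the new regions after a region split
- Migrates the regions of an HRegionServer to other servers when that server goes down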
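Moving the regions of a failed HRegionServer can be sketched as a greedy reassignment to the least-loaded live server. The hypothetical `RegionReassigner` below only counts regions per server; HBase's actual balancer and assignment logic consider more factors, so this illustrates the idea rather than the real algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical greedy reassignment: every region of the dead server goes to
// the live server that currently hosts the fewest regions.
class RegionReassigner {
    static void reassign(Map<String, List<String>> assignments, String deadServer) {
        List<String> orphans = assignments.remove(deadServer);
        if (orphans == null) {
            return; // unknown server, nothing to move
        }
        if (assignments.isEmpty()) {
            throw new IllegalStateException("no live RegionServers left");
        }
        for (String region : orphans) {
            String target = null;
            for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
                if (target == null || e.getValue().size() < assignments.get(target).size()) {
                    target = e.getKey();
                }
            }
            assignments.get(target).add(region);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignments = new HashMap<>();
        assignments.put("rs1", new ArrayList<>());
        assignments.put("rs2", new ArrayList<>(List.of("r-b", "r-c", "r-x")));
        assignments.put("rs3", new ArrayList<>(List.of("r-d", "r-e")));
        reassign(assignments, "rs3");                 // rs3 went down
        System.out.println(assignments.get("rs1"));   // [r-d, r-e]
    }
}
```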
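Worker node: HRegionServer
- Maintains HRegions, serves requests for them, and reads and writes data on HDFS
- Splits HRegions that grow too large at runtime
- Clients do not need the master to access data: addressing goes through ZooKeeper and the HRegionServers, and reads and writes go to the HRegionServers. The HMaster only maintains metadata about tables and HRegions, so its load is very low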
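Splitting an oversized region means cutting its sorted rowkey range at a midpoint into two daughter regions. A hypothetical sketch: `RegionSplitter` measures size in rows and splits at the middle row key, whereas real HBase decides based on store file size (see `hbase.hregion.max.filesize` and the configured split policy).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical region split: a region with more than `maxRows` rows is cut at its
// middle row key; the lower daughter gets [start, splitKey), the upper [splitKey, end).
class RegionSplitter {
    static List<TreeSet<String>> maybeSplit(TreeSet<String> regionRows, int maxRows) {
        List<TreeSet<String>> result = new ArrayList<>();
        if (regionRows.size() <= maxRows) {
            result.add(regionRows);                // small enough, no split
            return result;
        }
        // walk to the middle key; it becomes the first row of the upper daughter
        String splitKey = null;
        int i = 0;
        for (String row : regionRows) {
            if (i++ == regionRows.size() / 2) {
                splitKey = row;
                break;
            }
        }
        result.add(new TreeSet<>(regionRows.headSet(splitKey)));       // [start, splitKey)
        result.add(new TreeSet<>(regionRows.tailSet(splitKey, true))); // [splitKey, end)
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(List.of("10001", "10002", "10003", "10004", "10005"));
        System.out.println(maybeSplit(rows, 4)); // [[10001, 10002], [10003, 10004, 10005]]
    }
}
```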

Using the HBase Java API

Open Eclipse locally and first add the following dependencies to pom.xml:

<!-- define hbase.version in <properties> to match the HBase version of your cluster -->
<dependency>
   <groupId>org.apache.hbase</groupId>
   <artifactId>hbase-server</artifactId>
   <version>${hbase.version}</version>
</dependency>

<dependency>
   <groupId>org.apache.hbase</groupId>
   <artifactId>hbase-client</artifactId>
   <version>${hbase.version}</version>
</dependency>

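Create a package named Hbase and a class HBaseOperation in it.
Copy the following configuration files from the Hadoop and HBase installations into the source directory, then refresh the project: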

hdfs-site.xml
core-site.xml
hbase-site.xml

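import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseOperation {

    // Opens a table by name. HBaseConfiguration.create() reads hbase-site.xml
    // (and the Hadoop files) from the classpath.
    // Note: new HTable(...) and Put.add(...) are the HBase 0.9x/1.x API;
    // they are deprecated in 1.x and no longer available in 2.x.
    public static HTable getHTableByTableName(String tableName) throws IOException {
        Configuration configuration = HBaseConfiguration.create();
        return new HTable(configuration, tableName);
    }

    public static void main(String[] args) throws IOException {
        String tableName = "user";
        HTable table = getHTableByTableName(tableName);
        try {
            // put: write info:name = "gallin" into row "10004"
            Put put = new Put(Bytes.toBytes("10004"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("gallin"));
            table.put(put);

            // get: read row "10002" and print each cell as family:qualifier->value
            Get get = new Get(Bytes.toBytes("10002"));
            Result result = table.get(get);
            for (Cell cell : result.rawCells()) {
                System.out.println(
                        Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                        + Bytes.toString(CellUtil.cloneQualifier(cell)) + "->"
                        + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        } finally {
            table.close();
        }
    }
}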

The code above covers the two basic operations, put (insert) and get (query), and is worth memorizing.
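Conceptually, put and get work on a sorted, nested map: row key -> column family -> qualifier -> value. The following hypothetical `LogicalTable` models that layout in plain Java (one version per cell, no timestamps) and produces the same `family:qualifier->value` lines that the get loop prints, which can help when reasoning about results without a running cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of an HBase table's logical layout,
// sorted at every level the way HBase sorts rows, families and qualifiers.
class LogicalTable {
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns the cells of one row formatted as "family:qualifier->value"
    List<String> get(String row) {
        List<String> cells = new ArrayList<>();
        TreeMap<String, TreeMap<String, String>> families = rows.get(row);
        if (families == null) {
            return cells;                       // missing row -> empty result
        }
        for (Map.Entry<String, TreeMap<String, String>> f : families.entrySet()) {
            for (Map.Entry<String, String> q : f.getValue().entrySet()) {
                cells.add(f.getKey() + ":" + q.getKey() + "->" + q.getValue());
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        LogicalTable user = new LogicalTable();
        user.put("10004", "info", "name", "gallin");
        System.out.println(user.get("10004")); // [info:name->gallin]
    }
}
```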
Scan

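    // scan: an alternative main method for HBaseOperation (use it in place of the put/get main).
    // Needs imports for HTable, Result, Scan, ResultScanner, Bytes (org.apache.hadoop.hbase...)
    // and org.apache.hadoop.io.IOUtils.
    public static void main(String[] args) throws Exception {
        String tableName = "user";
        HTable table = null;
        ResultScanner resultScanner = null;
        try {
            table = getHTableByTableName(tableName);
            Scan scan = new Scan();
            // Range: start row is inclusive, stop row is exclusive
            scan.setStartRow(Bytes.toBytes("10001"));
            scan.setStopRow(Bytes.toBytes("10003"));
            // scan.setFilter(filter); // optional; filters are more involved and can slow the scan down
            resultScanner = table.getScanner(scan);
            for (Result result : resultScanner) {
                System.out.println(Bytes.toString(result.getRow()));
                System.out.println(result);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(resultScanner);
            IOUtils.closeStream(table);
        }
    }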
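The row range of the scan is half-open: `setStartRow` is inclusive and `setStopRow` is exclusive, so the scan above covers row keys from 10001 up to, but not including, 10003. A small sketch of that behaviour on a sorted map; `ScanRange` is hypothetical, and it compares keys as Java strings, which matches HBase's byte-wise ordering for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a Scan's row range over a sorted table: [startRow, stopRow).
class ScanRange {
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        // start inclusive, stop exclusive, in sorted row-key order
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String row : List.of("10001", "10002", "10003", "10004")) {
            table.put(row, "v");
        }
        System.out.println(scan(table, "10001", "10003")); // [10001, 10002]
    }
}
```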