参考 HBase: Designed for Distribution, Scale, and Speed
1. RDBMS难以扩展,分布式的join操作使得事务操作困难。但是hbase将需要访问的数据都放到一起,不需要做join操作,所以可以横向扩展
HBase was designed to scale due to the fact that data that is accessed together is stored together
2. hbase是google bigtable的实现
HBase is actually an implementation of the BigTable storage architecture, which is a distributed storage system developed by Google that’s used to manage structured data that is designed to scale to a very large size.
3. column families分别存储在在不同的存储文件中,读取数据的时候是分开来访问的
Column families are stored in separate files, which can be accessed separately.
4. hbase里数据存储的logical model(数据存储在hbase表的cell 里,如果value是有值的话,每一个cell里都包含rowkey,column family,column name,timestamp和value)注:coordinates意思是坐标
5. logical model和physical storage的关系
6. 表里存储的数据是稀疏的,即列里面的对应的value是没有值的话,数据是不存储在数据文件中的,而且每个有值的数据是有版本(version)的。put操作是数据的创建和更新操作,会为数据增加一个verison,删除操作会为数据增加一个tombstone marker,当数据被查询的时候,tombstone marker会阻止数据的返回
网友评论