HBase Dependencies
Only the fully distributed deployment is discussed here.
HDFS (not strictly required; HBase can run on Amazon S3, for example), a JDK, and ZooKeeper (note that HBase 2.0 has been trying to reduce the dependency on ZooKeeper, moving some of the information tracking currently done by ZooKeeper into the HMaster).
Every node that runs HBase needs to run HDFS, but not every node that runs HDFS has to run HBase. Running HBase on all HDFS nodes is still recommended, though, otherwise the cluster can end up unbalanced.
I. HBase Principles
Table Format
HBase has two kinds of tables: system tables and user tables.
- System tables are used internally by HBase to keep track of meta information such as the tables' access control lists (ACLs), metadata for tables and regions, namespaces, and so on. It is not a big problem if you are not familiar with these details.
- User tables are what you will create for your use cases. They will belong to the default namespace unless you create and use a specific one.
To align on terminology:
- A row is formed of several columns, all of them indexed by the same row key.
- A column together with a row key identifies a cell. The same cell can have multiple versions, distinguished only by their timestamps.
- A cell can also be called a KeyValue pair. A row is formed of a group of cells.
Tip: columns can be created dynamically when data is inserted.
- For fast access, keys and columns are stored in alphabetical order, both in the table on disk and in memory.
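The terminology above can be sketched in a few lines of Python (names are illustrative; real HBase stores cells in a compact binary format, not as Python objects):

```python
from collections import namedtuple

# A cell (KeyValue pair): one column of one row at one timestamp.
Cell = namedtuple("Cell", ["row_key", "column", "timestamp", "value"])

cells = [
    Cell("user1", "info:name", 2, "Alice"),
    Cell("user1", "info:name", 1, "Alyce"),    # older version of the same cell
    Cell("user1", "photo:jpg", 1, "<bytes>"),  # columns can appear dynamically
    Cell("user2", "info:name", 1, "Bob"),
]

# Keys and columns are kept sorted; the newest version of a cell comes first.
cells.sort(key=lambda c: (c.row_key, c.column, -c.timestamp))

def get_row(cells, row_key):
    """A row is simply the group of cells sharing the same row key."""
    return [c for c in cells if c.row_key == row_key]

print([c.column for c in get_row(cells, "user1")])
# the row "user1" is made of three cells across two column families
```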
Table Storage
(figure omitted) Note that the blocks discussed in the figure above are not the same as HDFS blocks: a single HDFS block can contain multiple HFile blocks.
Each region has a start key and an end key that define its boundaries. This information is stored within the region itself and also in the hbase:meta table (or .META., for versions of HBase prior to 0.96). When a region grows too large, it is split; regions can also be merged when needed.
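How a row key maps onto these boundaries can be sketched as follows (a simplified stand-in for the hbase:meta lookup; region names and keys are illustrative):

```python
import bisect

# Each region covers [start_key, end_key); "" means unbounded.
# A simplified stand-in for the hbase:meta table: sorted region boundaries.
regions = [
    ("", "g", "region-1"),
    ("g", "p", "region-2"),
    ("p", "", "region-3"),
]

def find_region(regions, row_key):
    """Return the region whose [start, end) key range contains row_key."""
    starts = [start for start, _end, _name in regions]
    idx = bisect.bisect_right(starts, row_key) - 1
    start, end, name = regions[idx]
    assert start <= row_key and (end == "" or row_key < end)
    return name

print(find_region(regions, "alice"))  # region-1
print(find_region(regions, "moe"))    # region-2
```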
Within a single region, different column families store their data in different files. For example, to store both a person's profile text and their picture, use two column families: compress the one holding the profile, but not the one holding the picture. Similarly, if one kind of data is write-heavy but rarely read while another is read-heavy but rarely written, separate them into two column families. Data with the same format and access pattern should be placed in the same column family.
HFiles
- HFiles are created when memstores become full and must be flushed to disk. Over time, HFiles get compacted into bigger files. HFiles are composed of different types of blocks (e.g., index blocks and data blocks). They are stored in HDFS, which provides the benefits of Hadoop persistence and replication.
Block
Larger blocks produce fewer index values and favor sequential access to the table, while smaller blocks produce more index values and favor random access.
The block types are:
- Data blocks--store the user's KeyValue data.
- Index blocks--used by HBase to quickly jump to the right location in an HFile when looking for a specific row.
- Bloom filter blocks--store Bloom filter index information. A Bloom filter, proposed by Burton Howard Bloom in 1970, is a bit-vector data structure with very good space and time efficiency, used to test whether an element is a member of a set.
- Trailer block--records basic information about the HFile and the offsets of its various sections.
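A toy Bloom filter illustrating the idea (not HBase's actual implementation; sizes and hashing are illustrative):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-bit array.
    It may return false positives, but never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"))  # True: the element was added
# A False answer lets HBase skip reading the HFile entirely.
```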
Cells
Each column is stored individually instead of the entire row being stored as a unit. Because those values can be inserted at different times, they might end up in different files in HDFS.
HBase has a key compression mechanism: at a high level, only the delta between the current key and the previous key is stored. This compression saves space for all tables, but rebuilding the current key from the previous one comes at a small cost. Because columns are stored separately, many cells in a wide table share the same key, so shrinking the keys can save a lot of space. This feature is called data block encoding.
Data block encoding
This is an HBase feature in which keys are encoded and compressed based on the previous key. One of the encoding options (FAST_DIFF) asks HBase to store only the difference between the current key and the previous one. HBase stores each cell individually, with its key and value. When a row has many cells, much space can be consumed by writing the same key for each cell. Therefore, activating the data block encoding can allow important space savings. It is almost always helpful to activate data block encoding, so if you are not sure, activate FAST_DIFF.
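The idea behind FAST_DIFF can be sketched as simple prefix-delta encoding (illustrative only; the real FAST_DIFF block format is more involved):

```python
def delta_encode(sorted_keys):
    """Store each key as (shared-prefix length with previous key, suffix)."""
    encoded, prev = [], ""
    for key in sorted_keys:
        common = 0
        while common < min(len(prev), len(key)) and prev[common] == key[common]:
            common += 1
        encoded.append((common, key[common:]))
        prev = key
    return encoded

def delta_decode(encoded):
    """Rebuild each key from the previous one plus the stored delta."""
    keys, prev = [], ""
    for common, suffix in encoded:
        key = prev[:common] + suffix
        keys.append(key)
        prev = key
    return keys

# Cells of a wide row share most of their key bytes.
keys = ["user123:info:email", "user123:info:name", "user123:photo:jpg"]
enc = delta_encode(keys)
print(enc)  # most of each key collapses into a small delta
assert delta_decode(enc) == keys
```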
Internal Table Operations
HBase scales horizontally through three mechanisms: compactions, splits (the opposite of compactions), and load balancing.
Compaction
When a memstore becomes full, it is flushed to disk, which over time produces many small files in HDFS. When certain conditions are met, HBase selects some of these files and compacts them together into one bigger file.
There are two types of compaction:
- Minor compaction--HBase selects only some of the HFiles to compact. By default, a compaction is triggered when the current region has three or more HFiles.
- Major compaction--all the files are elected to be compacted together. Because all the data has to be read and rewritten, major compactions are I/O-intensive and can significantly impact cluster response times and SLAs. It is therefore recommended to disable automatic major compactions and instead trigger them manually, or via cron, at a suitable time.
If you really have a very big cluster with many tables and regions, it is recommended to implement a process that checks the number of files per region and the age of the oldest one, and triggers compactions at the region level only if there are more files than you want or if the oldest one (even if there is just one file) is older than a configured period (a week is a good starting point).
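That per-region check can be sketched like this (the thresholds and function name are illustrative, not an HBase API):

```python
def should_compact(file_ages_sec, max_files=10, max_age_sec=7 * 24 * 3600):
    """Trigger a region-level compaction if the region has more HFiles
    than wanted, or if its oldest HFile is older than the configured
    period (a week as a starting point), even with just one file."""
    if len(file_ages_sec) > max_files:
        return True
    if file_ages_sec and max(file_ages_sec) > max_age_sec:
        return True
    return False

day = 24 * 3600
print(should_compact([2 * day, 5 * day]))  # False: few, young files
print(should_compact([2 * day] * 12))      # True: too many files
print(should_compact([9 * day]))           # True: a single but old file
```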
Splits (Auto-Sharding)
Split operations are the opposite of compactions.
During a compaction, if not too many values are dropped, the output is one big file; and the bigger the input files, the longer they take to parse.
When one of the column families of a region (a region can contain multiple column families) reaches the configured maximum size (10 GB by default in HBase 0.94), HBase triggers a split of the given region into two new regions to improve balancing of the load. (In other words, even if every other column family is small, one oversized column family causes the whole region to be split.)
Keep in mind that HBase will split all the column families. Even if only your first column family reached the 10 GB threshold while the second one contains only a few rows or kilobytes, both of them will be split.
Different columns of the same row are never split into different regions. So be careful if you have very many columns, or very large ones: if a single row grows bigger than the maximum configured size, HBase cannot split it.
Splits have a cost. After a region is split, it loses data locality until the next compaction, which hurts read performance, because the client will reach the RegionServer hosting the region, but from there the data has to be queried over the network to serve the request. Also, the more regions you have, the more pressure you put on the master, the hbase:meta table, and the RegionServers.
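A sketch of the split decision described above (illustrative; real HBase picks the split point from the midpoint of the biggest store file, not from an in-memory row list):

```python
def maybe_split(region_start, region_end, family_sizes_gb, row_keys, max_gb=10):
    """Split a region into two daughters when ANY column family exceeds
    the threshold; all families are split together, at a middle row key,
    so a single row always stays within one region."""
    if max(family_sizes_gb.values()) < max_gb:
        return None
    sorted_rows = sorted(row_keys)
    split_key = sorted_rows[len(sorted_rows) // 2]
    return (region_start, split_key), (split_key, region_end)

daughters = maybe_split(
    "", "",
    {"profile": 0.2, "photo": 12.0},      # only "photo" is oversized...
    ["row-a", "row-b", "row-c", "row-d"],
)
print(daughters)  # ...but the whole region, all families, is split
```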
Balancing
Regions split, servers might fail, and new servers might join the cluster, so at some point the load may no longer be well distributed across all your RegionServers. To help maintain a good distribution on the cluster, every five minutes (the default configured schedule), the HBase Master runs a load balancer to ensure that all the RegionServers are managing and serving a similar number of regions.
(figure omitted) When a region is moved by the balancer from one server to a new one, it will be unavailable for a few milliseconds, and it will lose its data locality until it gets major compacted.
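The balancer's goal can be sketched as evening out region counts across RegionServers (a greedy illustration, not HBase's actual StochasticLoadBalancer):

```python
def balance(assignment):
    """Move regions from the most-loaded to the least-loaded server
    until all servers manage a similar number of regions."""
    moves = []
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        region = assignment[busiest].pop()
        assignment[idlest].append(region)
        moves.append((region, busiest, idlest))

servers = {"rs1": ["r1", "r2", "r3", "r4"], "rs2": ["r5"], "rs3": []}
moves = balance(servers)
print({s: len(r) for s, r in servers.items()})
# every server now serves a similar number of regions
```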
HBase Roles
HMaster
A cluster can have multiple HMasters. Unlike a RegionServer, the HMaster does not carry much load, so you can install it on a machine with little memory and few cores.
The disks of RegionServer machines generally do not need RAID or dual power supplies, but building more reliable HBase Masters is worthwhile: running HBase Masters (and other master services such as NameNodes, ZooKeeper, etc.) on robust hardware with the OS on RAID drives, dual power supplies, etc. is highly recommended.
A cluster can survive without a master server as long as there is no RegionServer failing nor regions splitting.
RegionServer
Note: reads and writes of HBase data do not have to go through the HMaster every time; most of the time the client talks directly to a RegionServer.
When a client tries to read data from HBase for the first time, it will first go to ZooKeeper to find the master server and locate the hbase:meta region where it will locate the region and RegionServer it is looking for. In subsequent calls from the same client to the same region, all those extra calls are skipped, and the client will talk directly with the related RegionServer. This is why it is important, when possible, to reuse the same client for multiple operations.
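That client-side behavior can be sketched as a location cache (hypothetical names; the real client caches hbase:meta lookups per region range rather than per row):

```python
class Client:
    """On the first read, locate the region via ZooKeeper and hbase:meta;
    on subsequent calls, talk to the cached RegionServer directly."""

    def __init__(self, meta_lookup):
        self.meta_lookup = meta_lookup  # the expensive ZooKeeper/meta path
        self.location_cache = {}
        self.meta_calls = 0

    def locate(self, table, row_key):
        key = (table, row_key)
        if key not in self.location_cache:
            self.meta_calls += 1
            self.location_cache[key] = self.meta_lookup(table, row_key)
        return self.location_cache[key]

client = Client(lambda table, row: "regionserver-7")
client.locate("users", "row-1")
client.locate("users", "row-1")  # served from the cache
print(client.meta_calls)         # only one meta lookup was needed
```

This is why reusing the same client for multiple operations matters: a fresh client starts with an empty cache and pays the lookup cost again.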
Technically, multiple RegionServers can run on one physical host, but it is usually recommended to deploy at most one RegionServer per physical host.
II. HBase Ecosystem
1. Monitoring Tools
Hadoop/HBase is configured through XML files, so you can install it by hand, but automated configuration management tools such as Puppet or Chef are generally used instead, together with monitoring tools such as Ganglia or Cacti.
That said, within the Hadoop ecosystem there are two tools that can help deploy HBase clusters: Cloudera Manager and Apache Ambari. Both can deploy, monitor, and manage the entire Hadoop suite.
2. SQL
Most tools in the Hadoop market that provide SQL capabilities are primarily aimed at business intelligence (BI).
2.1 Phoenix
Even with only a couple of years in the Apache Foundation, Phoenix has seen a nice adoption rate and is quickly becoming the de facto tool for SQL queries. Phoenix's main competitors are Hive and Impala. (Impala is a query system led by Cloudera; it provides SQL semantics and can query PB-scale data stored in HDFS and HBase. Hive also provides SQL semantics, but because Hive executes on the MapReduce engine underneath, it remains a batch process and struggles to deliver interactive queries. By contrast, Impala's biggest selling point is its speed.)
Phoenix has been able to establish itself as a superior tool through tighter integration by leveraging HBase coprocessors, range scans, and custom filters. Hive and Impala were both built for full file scans in HDFS, which can greatly impact performance because HBase was designed for single point gets and range scans.
Finally, Hive and Apache Impala are storage engines both designed to run full table or partitioned scans against HDFS. Hive and Impala both have HBase storage handlers allowing them to connect to HBase and perform SQL queries. These systems tend to pull more data than the other systems, which will greatly increase query times. Hive or Impala make sense when a small set of reference data lives in HBase, or when the queries are not bound by SLAs.