03 Miscellaneous Notes on HBase Configuration

Author: 逸章 | Published: 2020-02-07 16:58

--If a backup Master exists, make the primary Master fail fast
If the primary Master loses its connection with ZooKeeper, it falls into a loop where it keeps trying to reconnect. Disable this behavior if you are running more than one Master, i.e. you have a backup Master. Otherwise the dying Master may continue to receive RPCs even though another Master has assumed the role of primary, and there is no reason to let it keep receiving them.
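As a minimal sketch, the snippet below renders the hbase-site.xml entry that would turn the reconnect loop off. It assumes the relevant property is `fail.fast.expired.active.master` (the name listed in hbase-default.xml; verify it against your HBase version). `property_xml` is a hypothetical helper written for this illustration, not part of HBase.

```python
import xml.etree.ElementTree as ET


def property_xml(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Assumed property name; "true" asks an expired active Master to abort
# instead of trying to recover its ZooKeeper session.
fail_fast = property_xml("fail.fast.expired.active.master", "true")
print(fail_fast)
```

The printed element would go inside the `<configuration>` block of hbase-site.xml on the Master hosts.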

--ZooKeeper Configuration
zookeeper.session.timeout defaults to three minutes (the value itself is specified in milliseconds). This means that if a server crashes, it will be three minutes before the Master notices the crash and starts recovery.

--HDFS Configurations
dfs.datanode.failed.volumes.tolerated: according to the hdfs-default.xml description, this is the "...number of volumes that are allowed to fail before a DataNode stops offering service. By default any volume failure will cause a datanode to shutdown". You might want to set this to about half the number of your available disks.
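The "about half the disks" rule can be sketched as a small calculation. This is only an illustration: it assumes the DataNode volumes are listed as a comma-separated `dfs.datanode.data.dir` value, and the 12-disk layout and paths are hypothetical.

```python
def suggested_failed_volumes_tolerated(data_dirs: str) -> int:
    """Return roughly half the number of configured DataNode volumes.

    data_dirs is a comma-separated dfs.datanode.data.dir value.
    """
    volumes = [d.strip() for d in data_dirs.split(",") if d.strip()]
    return len(volumes) // 2


# Hypothetical 12-disk DataNode layout, for illustration only.
example_dirs = ",".join(f"/data/{i}/dfs/dn" for i in range(1, 13))
tolerated = suggested_failed_volumes_tolerated(example_dirs)
print(f"dfs.datanode.failed.volumes.tolerated = {tolerated}")
```

Integer division keeps the result strictly below the number of volumes, so a single-disk node still gets 0.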

--hbase.regionserver.handler.count
This setting defines the number of threads that are kept open to answer incoming requests to user tables. The rule of thumb is to keep this number low when the payload per request approaches the MB range (big puts, scans using a large cache) and high when the payload is small (gets, small puts, ICVs, deletes).
Setting this value too high is dangerous, because the aggregate size of all the puts currently in flight in a RegionServer may put too much pressure on its memory, or even trigger an OutOfMemoryError. A RegionServer running low on memory will trigger its JVM's garbage collector more frequently, up to a point where GC pauses become noticeable.
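To make the memory argument concrete, here is a rough worst-case estimate, assuming every handler thread holds one request payload at the same time. The handler count and payload sizes are illustrative numbers, not recommendations, and `in_flight_bytes` is a hypothetical helper.

```python
def in_flight_bytes(handler_count: int, payload_bytes: int) -> int:
    """Worst case: every handler thread is holding one request payload."""
    return handler_count * payload_bytes


KIB = 1024
MIB = 1024 * KIB

# Same handler count, very different memory pressure.
small = in_flight_bytes(handler_count=100, payload_bytes=10 * KIB)
large = in_flight_bytes(handler_count=100, payload_bytes=2 * MIB)

print(f"small payloads: {small / MIB:.2f} MiB in flight")
print(f"large payloads: {large / MIB:.2f} MiB in flight")
```

With small payloads a high handler count costs little memory, while MB-sized payloads multiply quickly, which is why the rule of thumb lowers the count as payloads grow.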

--ColumnFamily Compression
You should consider enabling ColumnFamily compression. There are several options that are near-frictionless and in almost all cases boost performance by reducing the size of StoreFiles and thus reducing I/O.

--Configuring the size and number of WAL files
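HBase uses the WAL to recover MemStore data that has not been flushed to disk in case of a RegionServer failure.
Some background on the MemStore: it is a very important component of HBase. It acts as the HBase write cache and holds the most recent updates to the data, and it is a key part of how HBase achieves high-performance random reads and writes. Because it is an in-memory cache, reads check the MemStore first; by the principle of locality, recently written data is more likely to be accessed. Note that each column family of an HBase table maintains its own MemStore. When certain conditions are met, the MemStore is flushed, which creates a new HFile in the file system. The smallest unit of a flush is the Region, not an individual MemStore.
WAL files should be configured to be slightly smaller than an HDFS block (by default an HDFS block is 64 MB and a WAL file is ~60 MB).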
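The heading also mentions the number of WAL files. One way to pick a starting point, sketched below, is to allow enough WAL files to hold roughly as much data as all MemStores when they are close to full. This is an assumption-laden estimate: the 16 GB heap and the 0.4 MemStore fraction are illustrative inputs, and the function name is hypothetical. The resulting count would typically be applied through a setting such as `hbase.regionserver.maxlogs` (check the name against your version).

```python
def wal_file_count_starting_point(
    heap_gb: float, memstore_fraction: float, wal_file_mb: float
) -> int:
    """Estimate how many WAL files cover all MemStores when nearly full."""
    memstore_mb = heap_gb * 1024 * memstore_fraction
    return round(memstore_mb / wal_file_mb)


# Illustrative inputs: 16 GB RegionServer heap, 40% of heap for MemStores,
# and the ~60 MB WAL file size mentioned above.
count = wal_file_count_starting_point(
    heap_gb=16, memstore_fraction=0.4, wal_file_mb=60
)
print(f"starting point for WAL file count: ~{count}")
```

Too few WAL files force premature flushes, and too many lengthen replay during recovery, so treat the result as a first guess to tune from.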

--Managed Splitting
HBase generally handles splitting your regions. A simplistic view of splitting is that when a region grows to hbase.hregion.max.filesize, it is split. For most use patterns, most of the time, you should use automatic splitting.
Of course, you can also choose to manage splitting yourself.

--Managed Compactions
By default, major compactions are scheduled to run once in a 7-day period. Prior to HBase 0.96.x, major compactions were scheduled to happen once per day by default.
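As a small sketch of how these schedules translate into configuration values, the snippet below converts the periods to milliseconds. It assumes the schedule is controlled by `hbase.hregion.majorcompaction`, expressed in milliseconds, and that a value of 0 disables time-based major compactions so they can be triggered manually; confirm both against the documentation for your release. `period_millis` is a hypothetical helper.

```python
from datetime import timedelta


def period_millis(period: timedelta) -> int:
    """Convert a period to whole milliseconds using exact integer math."""
    return period // timedelta(milliseconds=1)


weekly_ms = period_millis(timedelta(days=7))  # the 7-day default described above
daily_ms = period_millis(timedelta(days=1))   # the pre-0.96.x daily schedule
disabled_ms = 0                               # assumed: 0 turns the schedule off

print(f"hbase.hregion.majorcompaction = {weekly_ms}")
```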

--Balancer
The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via hbase.balancer.period and defaults to 300000 (5 minutes).

--Better Mean Time to Recover (MTTR)
This covers configurations that make servers come back faster after a failure.

--JMX
JMX (Java Management Extensions) provides built-in instrumentation that enables you to monitor and manage the Java VM. Currently it supports the Master and RegionServer Java VMs.