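The contents of the conf directory must be kept in sync (identical) on every node in the cluster.
Pay attention to the limits on the number of open files and processes on Linux (Limits on Number of Files and Processes):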
1. Apache HBase is a database, and it needs to be able to open a large number of files at once. Many Linux distributions limit the number of files a single user can open to 1024 (older versions of OS X limit it to 256). Log in as the user that will run HBase and run ulimit -n to check the current limit.
Each ColumnFamily has at least one StoreFile, and possibly more than six StoreFiles if the region is under load. The number of open files required depends upon the number of ColumnFamilies and the number of regions. The following is a rough formula for calculating the potential number of open files on a RegionServer:
(ColumnFamilies per region) x (StoreFiles per ColumnFamily) x (regions per RegionServer)
For example, assuming that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors, not counting open JAR files, configuration files, and others.
Opening a file does not take many resources, so the risk of allowing a user to open too many files is small.
2. Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the ulimit -u command. This should not be confused with the nproc command, which reports the number of processing units available to the current process. Under load, a ulimit -u that is too low can cause OutOfMemoryError exceptions.
Note that both of these settings are operating-system settings; they are not configured in HBase.
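The rough estimate above can be compared with the limit of the current shell. This is a minimal sketch, assuming a POSIX shell; the helper name estimate_open_files and the sample numbers are illustrative and not part of HBase.

```shell
#!/bin/sh
# Rough estimate of the StoreFile descriptors a RegionServer may hold open:
# (ColumnFamilies per region) x (StoreFiles per ColumnFamily) x (regions per RegionServer).
# estimate_open_files is a hypothetical helper, not an HBase tool.
estimate_open_files() {
    echo $(( $1 * $2 * $3 ))
}

needed=$(estimate_open_files 3 3 100)   # the example figures from the text
limit=$(ulimit -n)                      # soft open-file limit of this shell

echo "estimated StoreFile descriptors: $needed"
echo "current ulimit -n: $limit"

# "unlimited" is a legal value, so only compare when the limit is numeric.
case "$limit" in
    *[!0-9]*|'') echo "limit is not numeric; nothing to compare" ;;
    *) if [ "$limit" -lt "$needed" ]; then
           echo "ulimit -n is below the estimate"
       fi ;;
esac
```

Run it as the user that will start HBase, since limits are per user and per session.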
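Also note that, in practice, an HDFS DataNode has an upper bound on the number of files it can serve at any one time. If that bound is too low, odd-looking errors can appear once the number of files grows. Set it in Hadoop's conf/hdfs-site.xml to at least the following value: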
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>4096</value>
</property>
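To confirm which value a node actually has, the property can be read back from the file. This is a rough sketch, assuming each name and value element sits on its own line as in the snippet above; get_property is a hypothetical helper that scans text and is not a real XML parser.

```shell
#!/bin/sh
# get_property FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop-style XML file.
# Assumes one element per line; commented-out elements such as <!--value> are not matched.
get_property() {
    awk -v name="$2" '
        /<name>/ {
            line = $0
            sub(/^.*<name>/, "", line); sub(/<\/name>.*$/, "", line)
            current = line
            next
        }
        /<value>/ && current == name {
            line = $0
            sub(/^.*<value>/, "", line); sub(/<\/value>.*$/, "", line)
            print line
            exit
        }
    ' "$1"
}
```

For example, get_property conf/hdfs-site.xml dfs.datanode.max.transfer.threads prints the configured value, or nothing if the property is absent.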
HBase has two run modes: standalone and distributed. Out of the box, HBase runs in standalone mode. Whichever mode you use, you must at least edit the Java setting in conf/hbase-env.sh to tell HBase which Java to use.
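The Java setting in conf/hbase-env.sh is normally a single export JAVA_HOME line. The sketch below is one way to find a value for it, assuming java is on the PATH, the JDK uses the usual bin/java layout, and readlink -f (GNU coreutils) is available; java_home_from is a hypothetical helper.

```shell
#!/bin/sh
# java_home_from PATH_TO_JAVA_BINARY
# Strips the trailing /bin/java to get the directory JAVA_HOME should point at.
java_home_from() {
    dirname "$(dirname "$1")"
}

# Resolve symlinks such as /usr/bin/java before stripping the suffix.
if command -v java >/dev/null 2>&1; then
    java_bin=$(readlink -f "$(command -v java)")
    echo "export JAVA_HOME=$(java_home_from "$java_bin")"
else
    echo "java not found on PATH"
fi
```

The printed export line can be pasted into conf/hbase-env.sh after checking that the directory really is the intended JDK.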
1. HBase standalone instance installation
In this mode a single machine is enough. HBase does not use HDFS and uses only the local filesystem; all HBase daemons and a local ZooKeeper run in one JVM.
1.1 Install
tar -zxvf hbase-1.4.12-bin.tar.gz
1.2 Configure
A. Configure hbase-env.sh
B. Configure /home/yay/software/hbase-1.4.12/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<!-- If you use HDFS, set hbase.rootdir to a directory on that instance, e.g. hdfs://namenode.example.org:8020/hbase -->
<name>hbase.rootdir</name>
<value>file:///home/yay/hbasedata</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/yay/zookeeperdataforhbase</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync).
Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and
inconsistent system state in the event of process and/or node failures. If
HBase is complaining of an inability to use hsync or hflush it's most
likely not a false positive.
</description>
</property>
</configuration>
1.3 Start HBase
If you manage your own ZooKeeper, start it manually and make sure it is running; otherwise HBase starts its bundled ZooKeeper during its own startup.
1.4 Start the HBase shell
Create a table
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$ start-hbase.sh
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-master-yay-ThinkPad-T470.out
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.12, r6ae4a77408ad35d6a7a4e5cebfd401fc4b72b5ec, Sun Nov 24 13:25:41 CST 2019
hbase(main):001:0> create 'test','cf'
0 row(s) in 1.5780 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0270 seconds
=> ["test"]
hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP
_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMP
RESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '6553
6', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0340 seconds
hbase(main):004:0>
Insert data
hbase(main):004:0> put 'test','row1','cf:a','value1'
0 row(s) in 0.0660 seconds
hbase(main):005:0> put 'test','row2','cf:b','value2'
0 row(s) in 0.0100 seconds
hbase(main):006:0> put 'test','row3','cf:c','value3'
0 row(s) in 0.0030 seconds
hbase(main):007:0>
Read data
Use the scan command to fetch all the data at once
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1580477510192, value=value1
row2 column=cf:b, timestamp=1580477534590, value=value2
row3 column=cf:c, timestamp=1580477545244, value=value3
3 row(s) in 0.0170 seconds
hbase(main):008:0>
Get a single row
hbase(main):008:0> get 'test','row1'
COLUMN CELL
cf:a timestamp=1580477510192, value=value1
1 row(s) in 0.0220 seconds
hbase(main):009:0>
Count the rows in the table
hbase(main):017:0> count 'test'
3 row(s) in 0.0860 seconds
=> 3
Delete a row from the table (deleteall)
hbase(main):034:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1581082901839, value=value1
row2 column=cf:b, timestamp=1581082901854, value=value2
row3 column=cf:c, timestamp=1581082901866, value=value3
row4 column=cf:d, timestamp=1581082901870, value=value4
4 row(s) in 0.0390 seconds
hbase(main):036:0> deleteall 'test','row2'
0 row(s) in 0.0410 seconds
hbase(main):037:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1581082901839, value=value1
row3 column=cf:c, timestamp=1581082901866, value=value3
row4 column=cf:d, timestamp=1581082901870, value=value4
3 row(s) in 0.0150 seconds
hbase(main):038:0>
Disable and drop a table
A table must be disabled before it can be dropped
hbase(main):011:0> disable 'test'
0 row(s) in 2.2920 seconds
hbase(main):012:0> enable 'test'
0 row(s) in 1.2980 seconds
hbase(main):013:0> drop 'test'
ERROR: Table test is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
hbase> drop 't1'
hbase> drop 'ns1:t1'
hbase(main):014:0> disable 'test'
0 row(s) in 2.2420 seconds
hbase(main):015:0> drop 'test'
0 row(s) in 1.2570 seconds
hbase(main):016:0>
Exit and stop HBase
hbase(main):016:0> quit
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$ stop-hbase.sh
stopping hbase.....................
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$ jps
3804 Jps
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$
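The interactive session above can also be replayed from a file, since hbase shell accepts the path of a command file as an argument. This is a sketch under that assumption; the file name and the write_smoke_test helper are made up for illustration, and the commands are the ones used in the walkthrough.

```shell
#!/bin/sh
# write_smoke_test FILE
# Writes a subset of the shell commands from the walkthrough into FILE.
# The final "exit" makes the HBase shell quit when the file has been processed.
write_smoke_test() {
    cat > "$1" <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
get 'test', 'row1'
disable 'test'
drop 'test'
exit
EOF
}

script=${TMPDIR:-/tmp}/hbase_smoke_test.txt
write_smoke_test "$script"

# Only run the file when an hbase launcher is actually available.
if command -v hbase >/dev/null 2>&1; then
    hbase shell "$script"
else
    echo "hbase not on PATH; commands written to $script"
fi
```

Because the script drops the table it creates, it is only suitable for a throwaway table name.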
Distributed
Distributed mode comes in two forms. In the first, HBase is distributed but all daemons run on a single node, a.k.a. pseudo-distributed. In the second, fully-distributed, the daemons are spread across all nodes in the cluster. The pseudo-distributed vs. fully-distributed nomenclature comes from Hadoop.
2. HBase pseudo-distributed local install
Pseudo-distributed mode can run against the local filesystem or against an HDFS instance.
Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process; in standalone mode all daemons run in one JVM process/instance. By default, unless you configure the hbase.rootdir property, your data is still stored in /tmp/. In the example below the data is stored in HDFS (you can also skip the HDFS configuration and keep storing data on the local filesystem).
2.1 Configure hbase-site.xml
The main task is to set two properties, hbase.rootdir and hbase.cluster.distributed.
<configuration>
<property>
<name>hbase.rootdir</name>
<!--value>file:///home/yay/temp/hbasetemp/hbase</value-->
<!-- There is no need to create the directory with hdfs dfs -mkdir /hbase; HBase does this itself -->
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/yay/zookeepertemp/tempforhbase</value>
</property>
<!-- Set the HMaster RPC port. Because HA mode is used, only the port is given here; the hostname is not needed. -->
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>
<!-- Set the HMaster HTTP web console port -->
<property>
<name>hbase.master.info.port</name>
<value>16010</value>
</property>
<!--directs HBase to run in distributed mode, with one JVM instance per daemon-->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
2.2 Start HDFS
yay@yay-ThinkPad-T470:~$ hdfs namenode -format
...
yay@yay-ThinkPad-T470:~/software/hadoop/hadoop-3.2.1/bin$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [yay-ThinkPad-T470]
yay@yay-ThinkPad-T470:~/software/hadoop/hadoop-3.2.1/bin$ jps
10400 SecondaryNameNode
10529 Jps
10125 DataNode
9934 NameNode
2.3 Start HBase
You can then check the HBase directory that was created (hadoop fs and hdfs dfs have the same effect).
2.4 Test creating a table
yay@yay-ThinkPad-T470:~/software/hadoop/hadoop-3.2.1/bin$ hbase shell
...
hbase(main):001:0> create 'hello','fc'
0 row(s) in 2.6600 seconds
=> Hbase::Table - hello
hbase(main):002:0> put 'hello','rowkey1','fc:a','first row a'
0 row(s) in 0.0930 seconds
hbase(main):003:0> put 'hello','rowkey2','fc:b','first row b'
0 row(s) in 0.0110 seconds
hbase(main):006:0> scan 'hello'
ROW COLUMN+CELL
rowkey1 column=fc:a, timestamp=1580610536845, value=first row a
rowkey2 column=fc:b, timestamp=1580610547546, value=first row b
2 row(s) in 0.0350 seconds
hbase(main):007:0> get 'hello','rowkey2'
COLUMN CELL
fc:b timestamp=1580610547546, value=first row b
1 row(s) in 0.0250 seconds
Look at the hello table as stored on HDFS:
yay@yay-ThinkPad-T470:~/software$ hdfs dfs -ls /hbase/data
Found 2 items
drwxr-xr-x - yay supergroup 0 2020-02-02 10:28 /hbase/data/default
drwxr-xr-x - yay supergroup 0 2020-02-02 10:05 /hbase/data/hbase
yay@yay-ThinkPad-T470:~/software$ hdfs dfs -ls /hbase/data/default
Found 1 items
drwxr-xr-x - yay supergroup 0 2020-02-02 10:28 /hbase/data/default/hello
yay@yay-ThinkPad-T470:~/software$ hdfs dfs -ls /hbase/data/default/hello
Appendix: a brief look at the hdfs get and put commands:
cat: hello: No such file or directory
yay@yay-ThinkPad-T470-W10DG:~$ echo "hello" > a.txt
yay@yay-ThinkPad-T470-W10DG:~$ dir
a.txt examples.desktop software 公共的 图片 音乐
calibre\ 书库 nouse temp 模板 文档 桌面
eclipse-workspace scalaproject zookeepertemp 视频 下载
yay@yay-ThinkPad-T470-W10DG:~$ cat a.txt
hello
yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -put a.txt /hello
2020-02-06 17:55:17,561 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -ls /hello
Found 1 items
-rw-r--r-- 1 yay supergroup 6 2020-02-06 17:55 /hello/a.txt
yay@yay-ThinkPad-T470-W10DG:~$ rm a.txt
yay@yay-ThinkPad-T470-W10DG:~$ hdfs dfs -get /hello/a.txt .
2020-02-06 17:58:12,433 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
yay@yay-ThinkPad-T470-W10DG:~$ dir
a.txt examples.desktop software 公共的 图片 音乐
calibre\ 书库 nouse temp 模板 文档 桌面
eclipse-workspace scalaproject zookeepertemp 视频 下载
yay@yay-ThinkPad-T470-W10DG:~$ cat a.txt
hello
2.5 Starting and stopping multiple backup HBase Master (HMaster) servers
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
Each HMaster uses three ports (16010, 16020, and 16030 by default). When backup HMasters are created with local-master-backup.sh, each argument is a port offset; with offset 2, for example, the backup HMaster uses ports 16012, 16022, and 16032.
Here we start three backup HMasters with offsets 2, 3, and 5 (that is, ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035):
yay@yay-ThinkPad-T470:~/software$ local-master-backup.sh start 2 3 5
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-2-master-yay-ThinkPad-T470.out
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-3-master-yay-ThinkPad-T470.out
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-5-master-yay-ThinkPad-T470.out
yay@yay-ThinkPad-T470:~/software$ jps
10400 SecondaryNameNode
15858 Jps
15284 HMaster
15428 HMaster
11476 HMaster
11628 HRegionServer
10125 DataNode
11406 HQuorumPeer
15582 HMaster
9934 NameNode
yay@yay-ThinkPad-T470:~/software$ local-master-backup.sh stop 2 3 5
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-2-master-yay-ThinkPad-T470.out
stopping master.
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-3-master-yay-ThinkPad-T470.out
stopping master.
running master, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-5-master-yay-ThinkPad-T470.out
stopping master.
yay@yay-ThinkPad-T470:~/software$ jps
10400 SecondaryNameNode
16033 Jps
11476 HMaster
11628 HRegionServer
10125 DataNode
11406 HQuorumPeer
9934 NameNode
yay@yay-ThinkPad-T470:~/software$
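The offset arithmetic for backup HMasters can be spelled out as follows. This is a small sketch that takes the default ports 16010, 16020 and 16030 from the text as given; backup_master_ports is a hypothetical helper, not part of HBase.

```shell
#!/bin/sh
# backup_master_ports OFFSET
# Prints the three ports a backup HMaster started with that offset would use,
# taking 16010, 16020 and 16030 as the defaults stated in the text.
backup_master_ports() {
    echo "$((16010 + $1)) $((16020 + $1)) $((16030 + $1))"
}

for offset in 2 3 5; do
    echo "offset $offset -> $(backup_master_ports "$offset")"
done
```

Listing the ports before starting the backups makes it easy to check that none of them is already taken on the host.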
2.6 Starting and stopping additional RegionServers
Each RegionServer requires two ports, 16020 and 16030 by default. Additional RegionServers, however, use base ports 16200 and 16300. You can run 99 additional RegionServers that are not an HMaster or backup HMaster on a server.
yay@yay-ThinkPad-T470:~/software$ local-regionservers.sh start 2 3 4 5
running regionserver, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-2-regionserver-yay-ThinkPad-T470.out
running regionserver, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-3-regionserver-yay-ThinkPad-T470.out
running regionserver, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-4-regionserver-yay-ThinkPad-T470.out
running regionserver, logging to /home/yay/software/hbase-1.4.12/logs/hbase-yay-5-regionserver-yay-ThinkPad-T470.out
yay@yay-ThinkPad-T470:~/software$ jps
10400 SecondaryNameNode
16449 HRegionServer
16164 HRegionServer
11476 HMaster
16885 Jps
16652 HRegionServer
11628 HRegionServer
16301 HRegionServer
10125 DataNode
11406 HQuorumPeer
9934 NameNode
yay@yay-ThinkPad-T470:~/software$ local-regionservers.sh stop 2 3 4 5
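For the additional RegionServers the same kind of arithmetic applies to the base ports. The sketch below assumes that each offset is added to both base ports (16200 and 16300, as stated above) and that valid offsets run from 1 to 99; regionserver_ports is a hypothetical helper.

```shell
#!/bin/sh
# regionserver_ports OFFSET
# Prints "rpc_port info_port" for an additional RegionServer.
# Assumption: offsets 1..99 are added to the base ports 16200 and 16300.
regionserver_ports() {
    if [ "$1" -lt 1 ] || [ "$1" -gt 99 ]; then
        echo "offset must be between 1 and 99" >&2
        return 1
    fi
    echo "$((16200 + $1)) $((16300 + $1))"
}

for offset in 2 3 4 5; do
    echo "offset $offset -> $(regionserver_ports "$offset")"
done
```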
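3. Fully distributed
Fully-distributed mode can only run on top of HDFS.
Apart from the primary server, such a cluster is typically configured with multiple nodes acting as RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers.
You can use ifconfig -a to look up the local machine's IP address.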
3.1 Configure Passwordless SSH Access
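Goal: make sure the machines in the cluster can reach one another directly without a password (all of the machines in my experiment use the same username, yay).
Generate an SSH key pair: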
yay@yay-ThinkPad-T470:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/yay/.ssh/id_rsa):
/home/yay/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/yay/.ssh/id_rsa.
Your public key has been saved in /home/yay/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:...
The key's randomart image is:
...
yay@yay-ThinkPad-T470:~$ cd .ssh
yay@yay-ThinkPad-T470:~/.ssh$ dir
authorized_keys id_rsa id_rsa.pub known_hosts
yay@yay-ThinkPad-T470:~/.ssh$ cat id_rsa.pub >> ~/.ssh/authorized_keys
yay@yay-ThinkPad-T470:~/.ssh$
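Appending with cat id_rsa.pub >> authorized_keys adds a duplicate entry every time it is repeated. The sketch below is an idempotent variant, assuming the public key file holds a single line; authorize_key is a hypothetical helper.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Appends the public key only if that exact line is not already present,
# then restricts the file to its owner, since sshd can reject an
# authorized_keys file that other users are able to write.
authorize_key() {
    touch "$2"
    if ! grep -qxF -e "$(cat "$1")" "$2"; then
        cat "$1" >> "$2"
    fi
    chmod 600 "$2"
}
```

Typical use would be authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys. For installing the key on another host, the ssh-copy-id tool shipped with OpenSSH does the equivalent without copying the private key.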
Copy the .ssh directory, which contains authorized_keys, to another host in the cluster (here the other machine is 192.168.1.16):
yay@yay-ThinkPad-T470:~$ cd ~
yay@yay-ThinkPad-T470:~$ scp -r .ssh yay@192.168.1.16:/
yay@192.168.1.16's password:
scp: /.ssh: Permission denied
yay@yay-ThinkPad-T470:~$ scp -r .ssh yay@192.168.1.16:/home/yay
yay@192.168.1.16's password:
id_rsa.pub 100% 403 27.1KB/s 00:00
authorized_keys 100% 806 94.7KB/s 00:00
known_hosts 100% 666 70.3KB/s 00:00
id_rsa 100% 1675 92.6KB/s 00:00
yay@yay-ThinkPad-T470:~$
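From this machine, verify that ssh to the other cluster host works without a password (likewise, check that ssh from the other machine back to this one is passwordless):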
yay@yay-ThinkPad-T470:~$ ssh 192.168.1.16
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-54-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
* Canonical Livepatch is available for installation.
- Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
630 packages can be updated.
329 updates are security updates.
Last login: Mon Feb 3 11:27:03 2020 from 192.168.1.43
yay@yay-ThinkPad-T470-W10DG:~$
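3.2 HBase configuration
3.2.1 Configuration on the HMaster
Here 192.168.1.43 acts only as the HMaster (it also needs to run ZooKeeper) and its hostname is yay-ThinkPad-T470; 192.168.1.16 acts as a RegionServer (it also needs to run ZooKeeper) and its hostname is yay-ThinkPad-T470-W10DG.
Check the hostname of the other cluster host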
yay@yay-ThinkPad-T470:~$ ssh 192.168.1.16
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-54-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
* Canonical Livepatch is available for installation.
- Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
630 packages can be updated.
329 updates are security updates.
Last login: Mon Feb 3 11:42:00 2020 from 192.168.1.43
yay@yay-ThinkPad-T470-W10DG:~$ hostname
yay-ThinkPad-T470-W10DG
yay@yay-ThinkPad-T470-W10DG:~$
Configure conf/regionservers:
All hosts listed in conf/regionservers file will have their RegionServer processes started and stopped when the master server starts or stops.
Next, configure ZooKeeper by adding the following to hbase-site.xml:
<property>
<name>hbase.zookeeper.quorum</name>
<value>192.168.1.43,192.168.1.16</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/yay/zookeepertemp/tempforhbase</value>
</property>
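A ZooKeeper ensemble stays available only while a majority of its peers are up, so it is worth checking what a given quorum list can tolerate. This is a small sketch of that arithmetic; quorum_size and majority are hypothetical helpers, and the quorum string is the value used above.

```shell
#!/bin/sh
# quorum_size "host1,host2,..."
# Counts the comma-separated entries of an hbase.zookeeper.quorum value.
quorum_size() {
    echo "$1" | tr ',' '\n' | grep -c .
}

# majority N: number of peers that must be up for an N-peer ensemble to have a majority.
majority() {
    echo $(( $1 / 2 + 1 ))
}

quorum="192.168.1.43,192.168.1.16"   # the value used above
n=$(quorum_size "$quorum")
m=$(majority "$n")
echo "$n peers, majority is $m, tolerates $(( n - m )) failure(s)"
```

With two peers the majority is also two, so losing either host stops the ensemble; this is why an odd number of peers (three, for instance) is the usual choice outside of experiments.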
Next, edit hbase-env.sh.
3.2.2 Copy the HMaster's conf directory to the other cluster hosts
Copy the conf directory to the other cluster nodes:
yay@yay-ThinkPad-T470:~$ cd software
yay@yay-ThinkPad-T470:~/software$ cd hbase-1.4.12
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$ scp -r conf yay@192.168.1.16:/home/yay/software/hbase-1.4.12
hbase-policy.xml 100% 2257 72.1KB/s 00:00
log4j-hbtop.properties 100% 1169 172.9KB/s 00:00
hbase-site.xml 100% 2039 259.8KB/s 00:00
log4j.properties 100% 4949 380.7KB/s 00:00
regionservers 100% 25 5.4KB/s 00:00
hadoop-metrics2-hbase.properties 100% 1811 249.1KB/s 00:00
hbase-env.cmd 100% 4668 531.1KB/s 00:00
hbase-env.sh 100% 7644 610.3KB/s 00:00
yay@yay-ThinkPad-T470:~/software/hbase-1.4.12$
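Since every node's conf directory has to stay identical, a single digest per directory gives a quick way to compare copies. This is a sketch assuming GNU coreutils (sha256sum) and file names without spaces; conf_digest is a hypothetical helper.

```shell
#!/bin/sh
# conf_digest DIR
# Prints one SHA-256 digest summarising every regular file under DIR
# (relative path plus content), so two copies can be compared as one string.
conf_digest() {
    ( cd "$1" && find . -type f | LC_ALL=C sort | xargs sha256sum | sha256sum | cut -d' ' -f1 )
}
```

Running conf_digest conf inside the HBase directory on each node and comparing the printed strings shows whether a copy has drifted; any difference in file names or content changes the digest.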
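3.3 Start the HBase environment
If hbase-env.sh is not modified, an error is reported and the HMaster fails to start.
Check the HMaster web UI. Note that the HQuorumPeer process is in fact a ZooKeeper instance.
Note that the local filesystem also works in place of HDFS: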
<value>file:///home/yay/hbasedata2</value>
Check the RegionServer