1. Check prerequisites and choose suitable versions
The component versions used in this experiment are hadoop2.7.3, hive1.2.2, scala2.11.8, spark2.1.1 and zeppelin0.7.0. First check the compatibility between HBase and these components and pick the most suitable version (the version compatibility analysis is given in the appendix). Based on that analysis, hbase1.2.6 and zookeeper3.4.10 were chosen as the new components for the cluster.
2. Install ZooKeeper
Use WinSCP to upload the downloaded tar package to the download folder on the master node.
#Run on the master node
cd ~/download/
ls
chmod +x zookeeper-3.4.10.tar.gz
tar -zxf zookeeper-3.4.10.tar.gz -C /usr/local/
cd /usr/local/
mv zookeeper-3.4.10/ ./zookeeper
cd /usr/local/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
#Modify the following entry
dataDir=/usr/local/zookeeper/data
#Add the following entries
server.1=192.168.1.40:2887:3887
server.2=192.168.1.141:2888:3888
cd /usr/local/zookeeper/
mkdir data
cd data/
vim myid
Write 1 into the file and save it.
scp -r /usr/local/zookeeper root@data:/usr/local/
#Run on the data node
vim /usr/local/zookeeper/data/myid
Change the 1 to 2.
#Run on all nodes
/usr/local/zookeeper/bin/zkServer.sh start
Use /usr/local/zookeeper/bin/zkServer.sh status to check the startup state; output showing Mode: follower or Mode: leader means ZooKeeper started successfully.
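As an extra sanity check, the ensemble can also be probed from the ZooKeeper Java client. This is only a minimal sketch: it assumes the zookeeper-3.4.10 jar is on the classpath, the class name is illustrative, and the connect string lists the two nodes configured above on the default client port 2181.

import org.apache.zookeeper.ZooKeeper;

public class ZkCheck {
    public static void main(String[] args) throws Exception {
        // Connect string lists the two ensemble members from zoo.cfg (default client port 2181)
        ZooKeeper zk = new ZooKeeper("192.168.1.40:2181,192.168.1.141:2181", 30000, event -> { });
        // Give the session a moment to establish
        Thread.sleep(2000);
        // Prints CONNECTED when the ensemble is reachable
        System.out.println(zk.getState());
        zk.close();
    }
}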
3. Install HBase
An HBase cluster depends on a ZooKeeper ensemble. All nodes in the HBase cluster, as well as any clients that access HBase, must be able to reach that ensemble. HBase ships with a bundled ZooKeeper, but so that other applications can also use ZooKeeper, it is better to run a separately installed ZooKeeper ensemble.
In addition, a ZooKeeper ensemble is usually configured with an odd number of nodes. The Hadoop cluster, the ZooKeeper ensemble and the HBase cluster are three mutually independent clusters; they do not need to be deployed on the same physical nodes and communicate with each other over the network.
As before, first put the tar package into the download folder on the master node.
#Run on the master node
cd ~/download/
chmod +x hbase-1.2.6-bin.tar.gz
tar -zxf hbase-1.2.6-bin.tar.gz -C /usr/local/
cd /usr/local/
mv hbase-1.2.6/ ./hbase
vim /usr/local/hbase/conf/hbase-site.xml
#Add the following configuration
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.1.40:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>192.168.1.40:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,data</value>
</property>
vim /usr/local/hbase/conf/hbase-env.sh
#Add the following settings
export JAVA_HOME=/usr/local/java
export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop
export HBASE_MANAGES_ZK=false
vim /usr/local/hbase/conf/regionservers
#Remove the original localhost entry
data
scp -r /usr/local/hbase root@data:/usr/local/
#Start HBase
/usr/local/hbase/bin/start-hbase.sh
#If the following warning appears, apply the fix below
64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
vim /usr/local/hbase/conf/hbase-env.sh
Find the following two lines and comment them out:
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#Run jps; HMaster on the master node and HRegionServer on the worker nodes indicate success
jps
11620 HMaster
10202 SecondaryNameNode
10362 ResourceManager
10717 QuorumPeerMain
11821 Jps
10014 NameNode
4. Basic HBase operations
Enter the HBase shell
/usr/local/hbase/bin/hbase shell
Create a table
Syntax:
create '<table name>','<column family>'
table name: name of the table
column family: name of the column family
hbase(main):001:0> create 'emp', 'personal data', 'professional data'
0 row(s) in 2.7680 seconds
=> Hbase::Table - emp
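The same table can also be created programmatically with the HBase Java client. Below is a minimal sketch, assuming the HBase 1.2.6 client jars and the cluster's hbase-site.xml are on the classpath (the class name is only illustrative). One way to compile and run it is with the classpath printed by /usr/local/hbase/bin/hbase classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTable {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, rootdir, ...)
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            // Describe the 'emp' table with its two column families
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("emp"));
            desc.addFamily(new HColumnDescriptor("personal data"));
            desc.addFamily(new HColumnDescriptor("professional data"));
            admin.createTable(desc);
            System.out.println("Table created");
        }
    }
}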
List tables
The list command
hbase(main):002:0> list
TABLE
emp
1 row(s) in 0.0180 seconds
=> ["emp"]
You can see the table we just created.
Disable and enable a table
The disable and enable commands
Syntax:
disable/enable 'table name'
hbase(main):003:0> disable 'emp'
0 row(s) in 2.3530 seconds
hbase(main):004:0> enable 'emp'
0 row(s) in 1.2890 seconds
View and modify the table schema
Syntax:
describe 'table name'
With alter you can set and remove table-scope attributes such as MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE and DEFERRED_LOG_FLUSH; a Java sketch of such a change is given after the describe output below.
hbase(main):005:0> describe 'emp'
Table emp is ENABLED
emp
COLUMN FAMILIES DESCRIPTION
{NAME => 'personal data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', K
EEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSI
ON => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATI
ON_SCOPE => '0'}
{NAME => 'professional data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false
', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPR
ESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLI
CATION_SCOPE => '0'}
2 row(s) in 0.0460 seconds
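As referenced above, the table-scope attributes that alter manipulates can also be changed through the Admin API. A minimal sketch that raises MAX_FILESIZE, under the same classpath assumptions as the earlier example (class name illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AlterTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            TableName emp = TableName.valueOf("emp");
            // Fetch the current descriptor, change a table-scope attribute, and apply it
            HTableDescriptor desc = admin.getTableDescriptor(emp);
            desc.setMaxFileSize(256 * 1024 * 1024L);   // MAX_FILESIZE = 256 MB
            // modifyTable is an asynchronous operation in this API version
            admin.modifyTable(emp, desc);
        }
    }
}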
Check whether a table exists
Syntax:
exists 'table name'
hbase(main):006:0> exists 'emp'
Table emp does exist
0 row(s) in 0.0260 seconds
Drop a table
A table must be disabled before it can be dropped.
Syntax:
drop 'table name'
drop_all '<regex matching the tables to drop>'
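Programmatically, the same disable-then-drop sequence looks roughly like this (a sketch against the HBase 1.2 Admin API, same classpath assumptions as before):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DropTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Admin admin = connection.getAdmin()) {
            TableName emp = TableName.valueOf("emp");
            // A table must be disabled before it can be deleted
            if (!admin.isTableDisabled(emp)) {
                admin.disableTable(emp);
            }
            admin.deleteTable(emp);
        }
    }
}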
Insert data
- the put command
- add() - a method of the Put class
- put() - a method of the HTable class (a Java sketch is given after the shell examples below)
Syntax:
put '<table name>','row1','<colfamily:colname>','<value>'
hbase(main):007:0> put 'emp','1','personal data:name','raju'
0 row(s) in 0.1210 seconds
hbase(main):008:0> put 'emp','1','personal data:city','hyderabad'
0 row(s) in 0.0270 seconds
hbase(main):014:0> put 'emp','1','professional data:designation','manager'
0 row(s) in 0.0170 seconds
hbase(main):015:0> put 'emp','1','professional data:salary','50000'
0 row(s) in 0.0400 seconds
hbase(main):016:0> scan 'emp'
ROW COLUMN+CELL
1 column=personal data:city, timestamp=1499440363409, value=hydera
bad
1 column=personal data:name, timestamp=1499440353624, value=raju
1 column=professional data:designation, timestamp=1499440471736, v
alue=manager
1 column=professional data:salary, timestamp=1499440484284, value=
50000
1 row(s) in 0.0330 seconds
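The put() route from the Java client, mentioned in the bullet list above, might look like the following minimal sketch (Put.addColumn is the current replacement for the older add() method; same classpath assumptions and illustrative class name as before):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertData {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            // Row key '1', columns in the 'personal data' family
            Put put = new Put(Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("name"), Bytes.toBytes("raju"));
            put.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("city"), Bytes.toBytes("hyderabad"));
            table.put(put);
        }
    }
}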
Update data
Syntax:
put 'table name','row','Column family:column name','new value'
hbase(main):017:0> put 'emp','1','professional data:salary','40000'
0 row(s) in 0.0190 seconds
hbase(main):018:0> scan 'emp'
ROW COLUMN+CELL
1 column=personal data:city, timestamp=1499440363409, value=hydera
bad
1 column=personal data:name, timestamp=1499440353624, value=raju
1 column=professional data:designation, timestamp=1499440471736, v
alue=manager
1 column=professional data:salary, timestamp=1499441004178, value=
40000
1 row(s) in 0.0300 seconds
Read data
Syntax:
Read a row: get '<table name>','row1'
Read a specific column: get 'table name', 'rowid', {COLUMN => 'column family:column name'}
hbase(main):007:0> get 'emp', '1', {COLUMN=>'personal data:name'}
COLUMN CELL
personal data:name timestamp=1499440353624, value=raju
1 row(s) in 0.0090 seconds
Delete data
Syntax:
delete '<table name>', '<row>', '<column name>', '<time stamp>'
Delete all cells in a row:
deleteall '<table name>', '<row>'
hbase(main):008:0> delete 'emp', '1', 'personal data:city'
0 row(s) in 0.0690 seconds
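The same single-cell delete through the Java client, as a sketch (Delete.addColumn removes only the newest version of the named cell; addColumns would remove all versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteData {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            // Delete the 'personal data:city' cell of row '1'
            Delete delete = new Delete(Bytes.toBytes("1"));
            delete.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("city"));
            table.delete(delete);
        }
    }
}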
Scan table data
Syntax:
scan 'table name'
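A full-table scan from Java, roughly equivalent to the scan command above (sketch, same classpath assumptions and illustrative class name as the earlier examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("emp"));
             ResultScanner scanner = table.getScanner(new Scan())) {
            // Print the 'personal data:name' cell of every row
            for (Result result : scanner) {
                byte[] name = result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(result.getRow()) + " -> " + Bytes.toString(name));
            }
        }
    }
}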
HBase data export and import
Syntax:
hbase org.apache.hadoop.hbase.mapreduce.Export tablename hdfspath
hbase org.apache.hadoop.hbase.mapreduce.Import tablename hdfspath
[root@master hadoop]# /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<familyName>
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
For performance consider the following properties:
-Dhbase.client.scanner.caching=100
-Dmapreduce.map.speculative=false
-Dmapreduce.reduce.speculative=false
For tables with very wide rows consider setting the batch size as below:
-Dhbase.export.scanner.batch=10
This shows the command's options. Running the following command exports the emp table to the HDFS file system:
/usr/local/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Export emp /user/root/hbase/output
In my case, however, the job got stuck in the MapReduce stage with the following error:
InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
Modifying yarn-site.xml as suggested by various online posts still did not resolve it, although the MapReduce job for this command had already been submitted.
Below is an example program that reads data from HBase with the Java client. Note that this particular example assumes a table 'emp' with a column family named 'personal' and a row key 'row1', which differs slightly from the emp table created above ('personal data' and row key '1').
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RetriveData {
    public static void main(String[] args) throws IOException, Exception {
        // Instantiating Configuration class
        Configuration config = HBaseConfiguration.create();
        // Instantiating HTable class
        HTable table = new HTable(config, "emp");
        // Instantiating Get class
        Get g = new Get(Bytes.toBytes("row1"));
        // Reading the data
        Result result = table.get(g);
        // Reading values from Result class object
        byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        byte[] value1 = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("city"));
        // Printing the values
        String name = Bytes.toString(value);
        String city = Bytes.toString(value1);
        System.out.println("name: " + name + " city: " + city);
    }
}
After compiling and running it (with the HBase client jars on the classpath), you get a result like the following:
$javac RetriveData.java
$java RetriveData
name: Raju city: Delhi