
A First Look at Using HBase with Phoenix

Author: 白面葫芦娃92 | Published 2019-03-17 00:22

    HBase shell command help:

    [root@hadoop001 bin]# hbase shell
    Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
    19/03/16 22:48:32 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:38:21 PDT 2017
    
    hbase(main):001:0> help
    HBase Shell, version 1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:38:21 PDT 2017
    Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
    Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
    
    COMMAND GROUPS:
      Group name: general
      Commands: status, table_help, version, whoami
    
      Group name: ddl
      Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
    
      Group name: namespace
      Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
    
      Group name: dml
      Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
    
      Group name: tools
      Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_mob, compact_rs, flush, major_compact, major_compact_mob, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
    
      Group name: replication
      Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
    
      Group name: snapshots
      Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
    
      Group name: configuration
      Commands: update_all_config, update_config
    
      Group name: quotas
      Commands: list_quotas, set_quota
    
      Group name: security
      Commands: grant, list_security_capabilities, revoke, user_permission
    
      Group name: procedures
      Commands: abort_procedure, list_procedures
    
      Group name: visibility labels
      Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
    
      Group name: rsgroup
      Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_servers_rsgroup, move_tables_rsgroup, remove_rsgroup
    
    SHELL USAGE:
    Quote all names in HBase Shell such as table and column names.  Commas delimit
    command parameters.  Type <RETURN> after entering a command to run it.
    Dictionaries of configuration used in the creation and alteration of tables are
    Ruby Hashes. They look like this:
    
      {'key1' => 'value1', 'key2' => 'value2', ...}
    
    and are opened and closed with curley-braces.  Key/values are delimited by the
    '=>' character combination.  Usually keys are predefined constants such as
    NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
    'Object.constants' to see a (messy) list of all constants in the environment.
    
    If you are using binary keys or values and need to enter them in the shell, use
    double-quote'd hexadecimal representation. For example:
    
      hbase> get 't1', "key\x03\x3f\xcd"
      hbase> get 't1', "key\003\023\011"
      hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
    
    The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
    For more on the HBase Shell, see http://hbase.apache.org/book.html
    

    Help for the HBase create and put commands:

    hbase(main):002:0> create
    
    ERROR: wrong number of arguments (0 for 1)
    
    Creates a table. Pass a table name, and a set of column family
    specifications (at least one), and, optionally, table configuration.
    Column specification can be a simple string (name), or a dictionary
    (dictionaries are described below in main help output), necessarily 
    including NAME attribute. 
    Examples:
    
    Create a table with namespace=ns1 and table qualifier=t1
      hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
    
    Create a table with namespace=default and table qualifier=t1
      hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
      hbase> # The above in shorthand would be the following:
      hbase> create 't1', 'f1', 'f2', 'f3'
      hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
      hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
      hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}
    
    Table configuration options can be put at the end.
    Examples:
    
      hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
      hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
      hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
      hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
      hbase> # Optionally pre-split the table into NUMREGIONS, using
      hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
      hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
      hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
      hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
    
    You can also keep around a reference to the created table:
    
      hbase> t1 = create 't1', 'f1'
    
    Which gives you a reference to the table named 't1', on which you can then
    call methods.
    
    hbase(main):006:0> put
    
    ERROR: wrong number of arguments (0 for 4)
    
    Put a cell 'value' at specified table/row/column and optionally
    timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'
    at row 'r1' under column 'c1' marked with the time 'ts1', do:
    
      hbase> put 'ns1:t1', 'r1', 'c1', 'value'
      hbase> put 't1', 'r1', 'c1', 'value'
      hbase> put 't1', 'r1', 'c1', 'value', ts1
      hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
      hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
      hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    
    The same commands also can be run on a table reference. Suppose you had a reference
    t to table 't1', the corresponding command would be:
    
      hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
    
    
    hbase(main):004:0> create 't_hbase1','info'
    0 row(s) in 1.5790 seconds
    
    => Hbase::Table - t_hbase1
    hbase(main):005:0> desc 't_hbase1'
    Table t_hbase1 is ENABLED                                                       
    t_hbase1                                                                        
    COLUMN FAMILIES DESCRIPTION                                                     
    {NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KE
    EP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', CO
    MPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65
    536', REPLICATION_SCOPE => '0'}                                                 
    1 row(s) in 0.1260 seconds
    
    hbase(main):007:0> put 't_hbase1','1','info:id','1'
    0 row(s) in 0.1320 seconds
    
    hbase(main):008:0> put 't_hbase1','1','info:salary','1'
    0 row(s) in 0.0170 seconds
    
    hbase(main):009:0> put 't_hbase1','1','info:start_date','2019-03-18'
    0 row(s) in 0.0230 seconds
    
    hbase(main):010:0> put 't_hbase1','1','info:end_date','2019-03-18'
    0 row(s) in 0.0190 seconds
    
    hbase(main):011:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
    1 row(s) in 0.0640 seconds
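
    To query this data from Phoenix, the existing HBase table first has to be mapped into Phoenix. In the session below the table and column names are double-quoted so they keep their lower case and line up exactly with the HBase table name and the info column qualifiers (Phoenix folds unquoted identifiers to upper case). A read-only alternative to CREATE TABLE is a Phoenix view over the same HBase table; a minimal sketch, assuming the same schema as above:

      -- Read-only mapping of the existing HBase table as a Phoenix view;
      -- each column must be spelled exactly as its HBase family:qualifier.
      CREATE VIEW "t_hbase1" (
        "ROW"                VARCHAR PRIMARY KEY,
        "info"."start_date"  VARCHAR,
        "info"."end_date"    VARCHAR,
        "info"."id"          VARCHAR,
        "info"."salary"      VARCHAR
      );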
    
    [root@hadoop001 bin]# ./sqlline.py 192.168.137.2:2181
    0: jdbc:phoenix:192.168.137.2:2181> create table "t_hbase1"(  "ROW" varchar primary key,  "info"."start_date" varchar , "info"."end_date" varchar ,  "info"."id" varchar ,  "info"."salary" varchar);
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    +------+-------------+-------------+-----+---------+
    1 row selected (0.091 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> UPSERT INTO "t_hbase1" VALUES('2','2019-04-18','2019-04-18','2','2');
    t_hbase1
    1 row affected (0.052 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    t_hbase1
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    +------+-------------+-------------+-----+---------+
    2 rows selected (0.059 seconds)
    
    hbase(main):012:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
    2 row(s) in 0.0300 seconds
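
    The extra info:_0 cell in the scan above is the "empty" key value column that Phoenix maintains for the rows it manages (so a row still has at least one cell even if every mapped column is NULL); it is Phoenix bookkeeping, not user data. How Phoenix has mapped the table can be checked in its system catalog; a small sketch, assuming the default SYSTEM.CATALOG layout (column sets can differ slightly between Phoenix versions):

      -- List the Phoenix column definitions recorded for the mapped table
      SELECT COLUMN_NAME, COLUMN_FAMILY, DATA_TYPE
      FROM SYSTEM.CATALOG
      WHERE TABLE_NAME = 't_hbase1' AND COLUMN_NAME IS NOT NULL;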
    
    hbase(main):002:0> put 't_hbase1','3','info:id','3'
    0 row(s) in 0.0990 seconds
    
    hbase(main):002:0> put 't_hbase1','3','info:salary','3'
    0 row(s) in 0.0990 seconds
    
    hbase(main):003:0> put 't_hbase1','3','info:start_date','2019-05-18'
    0 row(s) in 0.0130 seconds
    
    hbase(main):004:0> put 't_hbase1','3','info:end_date','2019-05-18'
    0 row(s) in 0.0190 seconds
    
    hbase(main):005:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
     3                    column=info:end_date, timestamp=1552749225788, value=2019-
                          05-18                                                     
     3                    column=info:id, timestamp=1552749074811, value=3          
     3                    column=info:salary, timestamp=1552749201553, value=3      
     3                    column=info:start_date, timestamp=1552749215234, value=201
                          9-05-18                                                   
    3 row(s) in 0.0410 seconds
    
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    | 3    | 2019-05-18  | 2019-05-18  | 3   | 3       |
    +------+-------------+-------------+-----+---------+
    3 rows selected (0.061 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> delete from "t_hbase1" where "id"='3';
    t_hbase1
    1 row affected (0.028 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    t_hbase1
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    +------+-------------+-------------+-----+---------+
    2 rows selected (0.048 seconds)
    
    hbase(main):001:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
    2 row(s) in 0.3110 seconds
    

    Creating a secondary index

    0: jdbc:phoenix:localhost:2181:/hbase> CREATE INDEX THBASE1_INDEX_ID ON "t_hbase1"("info"."id");
    0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
    +----------+-------+
    | info:id  | :ROW  |
    +----------+-------+
    | 1        | 1     |
    | 2        | 2     |
    +----------+-------+
    2 rows selected (0.072 seconds)
    
    [root@hadoop001 ~]# hadoop fs -ls /hbase/data/default
    Found 4 items
    drwxr-xr-x   - hbase hbase          0 2019-03-10 16:19 /hbase/data/default/SYSTEM.MUTEX
    drwxr-xr-x   - hbase hbase          0 2019-03-16 23:51 /hbase/data/default/THBASE1_INDEX_ID
    drwxr-xr-x   - hbase hbase          0 2019-03-10 16:19 /hbase/data/default/USER
    drwxr-xr-x   - hbase hbase          0 2019-03-16 22:59 /hbase/data/default/t_hbase1
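
    The index now also exists as an ordinary HBase table under /hbase/data/default. Whether a query actually uses it can be checked with EXPLAIN; as a rough sketch (the exact plan text depends on the Phoenix version), a query that touches only the indexed column and the row key should be served from the index, while one that needs an uncovered column falls back to the data table:

      -- Expected to scan the index table THBASE1_INDEX_ID (only indexed columns referenced)
      EXPLAIN SELECT "id" FROM "t_hbase1" WHERE "id" = '2';

      -- Expected to scan t_hbase1 itself ("salary" is not covered by the index)
      EXPLAIN SELECT "salary" FROM "t_hbase1" WHERE "id" = '2';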
    

    With the index table in place, insert data directly through the HBase shell:

    hbase(main):002:0> put 't_hbase1','3','info:id','3'
    0 row(s) in 0.0990 seconds
    
    hbase(main):003:0> put 't_hbase1','3','info:salary','3'
    0 row(s) in 0.0210 seconds
    
    hbase(main):004:0> put 't_hbase1','3','info:start_date','2019-09-18'
    0 row(s) in 0.0180 seconds
    
    hbase(main):005:0> put 't_hbase1','3','info:end_date','2019-09-18'
    0 row(s) in 0.0210 seconds
    
    // the data visible through Phoenix is updated
    0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    | 3    | 2019-09-18  | 2019-09-18  | 3   | 3       |
    +------+-------------+-------------+-----+---------+
    3 rows selected (0.096 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase>  select * from "t_hbase1" where "id"='3'; 
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 3    | 2019-09-18  | 2019-09-18  | 3   | 3       |
    +------+-------------+-------------+-----+---------+
    1 row selected (0.084 seconds)
    // but the index table is not updated: count(*) can be answered from the (now stale) index and returns 2, while the select * above cannot use the index and scans the data table, which is why it still finds row 3
    0: jdbc:phoenix:localhost:2181:/hbase> select count(*) from "t_hbase1";
    +-----------+
    | COUNT(1)  |
    +-----------+
    | 2         |
    +-----------+
    1 row selected (0.029 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
    +----------+-------+
    | info:id  | :ROW  |
    +----------+-------+
    | 1        | 1     |
    | 2        | 2     |
    +----------+-------+
    2 rows selected (0.025 seconds)
    // now try inserting a row directly through Phoenix
    0: jdbc:phoenix:localhost:2181:/hbase> UPSERT INTO "t_hbase1" VALUES('4','2019-10-18','2019-10-18','4','4');
    t_hbase1
    1 row affected (0.17 seconds)
    // this time both the data and the index are updated
    0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
    +----------+-------+
    | info:id  | :ROW  |
    +----------+-------+
    | 1        | 1     |
    | 2        | 2     |
    | 4        | 4     |
    +----------+-------+
    3 rows selected (0.024 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase> select count(*) from "t_hbase1";
    +-----------+
    | COUNT(1)  |
    +-----------+
    | 3         |
    +-----------+
    1 row selected (0.029 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    | 3    | 2019-09-18  | 2019-09-18  | 3   | 3       |
    | 4    | 2019-10-18  | 2019-10-18  | 4   | 4       |
    +------+-------------+-------------+-----+---------+
    4 rows selected (0.086 seconds)
    

    The data table and the index table in HBase are both updated as well:

    hbase(main):009:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
     3                    column=info:end_date, timestamp=1552751714137, value=2019-
                          09-18                                                     
     3                    column=info:id, timestamp=1552751696636, value=3          
     3                    column=info:salary, timestamp=1552751700685, value=3      
     3                    column=info:start_date, timestamp=1552751709072, value=201
                          9-09-18                                                   
     4                    column=info:_0, timestamp=1552751877072, value=x          
     4                    column=info:end_date, timestamp=1552751877072, value=2019-
                          10-18                                                     
     4                    column=info:id, timestamp=1552751877072, value=4          
     4                    column=info:salary, timestamp=1552751877072, value=4      
     4                    column=info:start_date, timestamp=1552751877072, value=201
                          9-10-18                                                   
    4 row(s) in 0.0510 seconds
    hbase(main):008:0> scan 'THBASE1_INDEX_ID'
    ROW                   COLUMN+CELL                                               
     1\x001               column=0:_0, timestamp=1552751474487, value=x             
     2\x002               column=0:_0, timestamp=1552751474487, value=x             
     4\x004               column=0:_0, timestamp=1552751877072, value=_0            
    3 row(s) in 0.0220 seconds
    

    The resolution of this issue is discussed in the following post:
    https://blog.csdn.net/u012551524/article/details/79575632
    From the results above: once a secondary index has been created on t_hbase1, data inserted through the HBase shell is not propagated to the index table.
    Data inserted through the Phoenix shell, on the other hand, is synchronized to the index table automatically; the index updates are driven by Phoenix coprocessors.
    In short, if the secondary indexes created on a Phoenix table need to stay consistent, all writes to the source table should go through the Phoenix client; a stale index can also be repaired after the fact, as sketched below. If there is no secondary index to maintain, it does not matter how the data is written.
    Reference: https://blog.csdn.net/u013850277/article/details/80978858
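
    If rows have already been written behind Phoenix's back, as with row 3 above, the stale index can be brought back in line by rebuilding it; a sketch, assuming a Phoenix version that supports ALTER INDEX ... REBUILD:

      -- Repopulate the index from the data table so it also covers rows written directly to HBase
      ALTER INDEX THBASE1_INDEX_ID ON "t_hbase1" REBUILD;

    A rebuild rescans the whole data table, so on a large table it is a repair step, not a substitute for writing through the Phoenix client in the first place.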
