
A First Look at Using HBase with Phoenix

Author: 白面葫芦娃92 | Published 2019-03-17 00:22

    HBase shell command help:

    [root@hadoop001 bin]# hbase shell
    Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
    19/03/16 22:48:32 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:38:21 PDT 2017
    
    hbase(main):001:0> help
    HBase Shell, version 1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:38:21 PDT 2017
    Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
    Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
    
    COMMAND GROUPS:
      Group name: general
      Commands: status, table_help, version, whoami
    
      Group name: ddl
      Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
    
      Group name: namespace
      Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
    
      Group name: dml
      Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
    
      Group name: tools
      Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_mob, compact_rs, flush, major_compact, major_compact_mob, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
    
      Group name: replication
      Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
    
      Group name: snapshots
      Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
    
      Group name: configuration
      Commands: update_all_config, update_config
    
      Group name: quotas
      Commands: list_quotas, set_quota
    
      Group name: security
      Commands: grant, list_security_capabilities, revoke, user_permission
    
      Group name: procedures
      Commands: abort_procedure, list_procedures
    
      Group name: visibility labels
      Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
    
      Group name: rsgroup
      Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_servers_rsgroup, move_tables_rsgroup, remove_rsgroup
    
    SHELL USAGE:
    Quote all names in HBase Shell such as table and column names.  Commas delimit
    command parameters.  Type <RETURN> after entering a command to run it.
    Dictionaries of configuration used in the creation and alteration of tables are
    Ruby Hashes. They look like this:
    
      {'key1' => 'value1', 'key2' => 'value2', ...}
    
    and are opened and closed with curley-braces.  Key/values are delimited by the
    '=>' character combination.  Usually keys are predefined constants such as
    NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
    'Object.constants' to see a (messy) list of all constants in the environment.
    
    If you are using binary keys or values and need to enter them in the shell, use
    double-quote'd hexadecimal representation. For example:
    
      hbase> get 't1', "key\x03\x3f\xcd"
      hbase> get 't1', "key\003\023\011"
      hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
    
    The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
    For more on the HBase Shell, see http://hbase.apache.org/book.html
    

    Help for the HBase create and put commands:

    hbase(main):002:0> create
    
    ERROR: wrong number of arguments (0 for 1)
    
    Creates a table. Pass a table name, and a set of column family
    specifications (at least one), and, optionally, table configuration.
    Column specification can be a simple string (name), or a dictionary
    (dictionaries are described below in main help output), necessarily 
    including NAME attribute. 
    Examples:
    
    Create a table with namespace=ns1 and table qualifier=t1
      hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
    
    Create a table with namespace=default and table qualifier=t1
      hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
      hbase> # The above in shorthand would be the following:
      hbase> create 't1', 'f1', 'f2', 'f3'
      hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
      hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
      hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}
    
    Table configuration options can be put at the end.
    Examples:
    
      hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
      hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
      hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
      hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
      hbase> # Optionally pre-split the table into NUMREGIONS, using
      hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
      hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
      hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
      hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}
    
    You can also keep around a reference to the created table:
    
      hbase> t1 = create 't1', 'f1'
    
    Which gives you a reference to the table named 't1', on which you can then
    call methods.
    
    hbase(main):006:0> put
    
    ERROR: wrong number of arguments (0 for 4)
    
    Put a cell 'value' at specified table/row/column and optionally
    timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'
    at row 'r1' under column 'c1' marked with the time 'ts1', do:
    
      hbase> put 'ns1:t1', 'r1', 'c1', 'value'
      hbase> put 't1', 'r1', 'c1', 'value'
      hbase> put 't1', 'r1', 'c1', 'value', ts1
      hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
      hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
      hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
    
    The same commands also can be run on a table reference. Suppose you had a reference
    t to table 't1', the corresponding command would be:
    
      hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
    
    
    hbase(main):004:0> create 't_hbase1','info'
    0 row(s) in 1.5790 seconds
    
    => Hbase::Table - t_hbase1
    hbase(main):005:0> desc 't_hbase1'
    Table t_hbase1 is ENABLED                                                       
    t_hbase1                                                                        
    COLUMN FAMILIES DESCRIPTION                                                     
    {NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KE
    EP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', CO
    MPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65
    536', REPLICATION_SCOPE => '0'}                                                 
    1 row(s) in 0.1260 seconds
    
    hbase(main):007:0> put 't_hbase1','1','info:id','1'
    0 row(s) in 0.1320 seconds
    
    hbase(main):008:0> put 't_hbase1','1','info:salary','1'
    0 row(s) in 0.0170 seconds
    
    hbase(main):009:0> put 't_hbase1','1','info:start_date','2019-03-18'
    0 row(s) in 0.0230 seconds
    
    hbase(main):010:0> put 't_hbase1','1','info:end_date','2019-03-18'
    0 row(s) in 0.0190 seconds
    
    hbase(main):011:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
    1 row(s) in 0.0640 seconds
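
    To query this data from Phoenix, the existing HBase table first has to be mapped into Phoenix. In the session below the table and column names are double-quoted so they keep their lower case and line up exactly with the HBase table name and the info column qualifiers (Phoenix folds unquoted identifiers to upper case). A read-only alternative to CREATE TABLE is a Phoenix view over the same HBase table; a minimal sketch, assuming the same schema as above:

      -- Read-only mapping of the existing HBase table as a Phoenix view;
      -- each column must be spelled exactly as its HBase family:qualifier.
      CREATE VIEW "t_hbase1" (
        "ROW"                VARCHAR PRIMARY KEY,
        "info"."start_date"  VARCHAR,
        "info"."end_date"    VARCHAR,
        "info"."id"          VARCHAR,
        "info"."salary"      VARCHAR
      );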
    
    [root@hadoop001 bin]# ./sqlline.py 192.168.137.2:2181
    0: jdbc:phoenix:192.168.137.2:2181> create table "t_hbase1"(  "ROW" varchar primary key,  "info"."start_date" varchar , "info"."end_date" varchar ,  "info"."id" varchar ,  "info"."salary" varchar);
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    +------+-------------+-------------+-----+---------+
    1 row selected (0.091 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> UPSERT INTO "t_hbase1" VALUES('2','2019-04-18','2019-04-18','2','2');
    t_hbase1
    1 row affected (0.052 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    t_hbase1
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    +------+-------------+-------------+-----+---------+
    2 rows selected (0.059 seconds)
    
    hbase(main):012:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
    2 row(s) in 0.0300 seconds
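
    The extra info:_0 cell in the scan above is the "empty" key value column that Phoenix maintains for the rows it manages (so a row still has at least one cell even if every mapped column is NULL); it is Phoenix bookkeeping, not user data. How Phoenix has mapped the table can be checked in its system catalog; a small sketch, assuming the default SYSTEM.CATALOG layout (column sets can differ slightly between Phoenix versions):

      -- List the Phoenix column definitions recorded for the mapped table
      SELECT COLUMN_NAME, COLUMN_FAMILY, DATA_TYPE
      FROM SYSTEM.CATALOG
      WHERE TABLE_NAME = 't_hbase1' AND COLUMN_NAME IS NOT NULL;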
    
    hbase(main):002:0> put 't_hbase1','3','info:id','3'
    0 row(s) in 0.0990 seconds
    
    hbase(main):002:0> put 't_hbase1','3','info:salary','3'
    0 row(s) in 0.0990 seconds
    
    hbase(main):003:0> put 't_hbase1','3','info:start_date','2019-05-18'
    0 row(s) in 0.0130 seconds
    
    hbase(main):004:0> put 't_hbase1','3','info:end_date','2019-05-18'
    0 row(s) in 0.0190 seconds
    
    hbase(main):005:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
     3                    column=info:end_date, timestamp=1552749225788, value=2019-
                          05-18                                                     
     3                    column=info:id, timestamp=1552749074811, value=3          
     3                    column=info:salary, timestamp=1552749201553, value=3      
     3                    column=info:start_date, timestamp=1552749215234, value=201
                          9-05-18                                                   
    3 row(s) in 0.0410 seconds
    
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    | 3    | 2019-05-18  | 2019-05-18  | 3   | 3       |
    +------+-------------+-------------+-----+---------+
    3 rows selected (0.061 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> delete from "t_hbase1" where "id"='3';
    t_hbase1
    1 row affected (0.028 seconds)
    0: jdbc:phoenix:192.168.137.2:2181> select * from "t_hbase1";
    t_hbase1
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    +------+-------------+-------------+-----+---------+
    2 rows selected (0.048 seconds)
    
    hbase(main):001:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
    2 row(s) in 0.3110 seconds
    

    Creating a secondary index

    0: jdbc:phoenix:localhost:2181:/hbase> CREATE INDEX THBASE1_INDEX_ID ON "t_hbase1"("info"."id");
    0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
    +----------+-------+
    | info:id  | :ROW  |
    +----------+-------+
    | 1        | 1     |
    | 2        | 2     |
    +----------+-------+
    2 rows selected (0.072 seconds)
    
    [root@hadoop001 ~]# hadoop fs -ls /hbase/data/default
    Found 4 items
    drwxr-xr-x   - hbase hbase          0 2019-03-10 16:19 /hbase/data/default/SYSTEM.MUTEX
    drwxr-xr-x   - hbase hbase          0 2019-03-16 23:51 /hbase/data/default/THBASE1_INDEX_ID
    drwxr-xr-x   - hbase hbase          0 2019-03-10 16:19 /hbase/data/default/USER
    drwxr-xr-x   - hbase hbase          0 2019-03-16 22:59 /hbase/data/default/t_hbase1
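
    The index now also exists as an ordinary HBase table under /hbase/data/default. Whether a query actually uses it can be checked with EXPLAIN; as a rough sketch (the exact plan text depends on the Phoenix version), a query that touches only the indexed column and the row key should be served from the index, while one that needs an uncovered column falls back to the data table:

      -- Expected to scan the index table THBASE1_INDEX_ID (only indexed columns referenced)
      EXPLAIN SELECT "id" FROM "t_hbase1" WHERE "id" = '2';

      -- Expected to scan t_hbase1 itself ("salary" is not covered by the index)
      EXPLAIN SELECT "salary" FROM "t_hbase1" WHERE "id" = '2';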
    

    With the index table in place, insert data directly through the HBase shell:

    hbase(main):002:0> put 't_hbase1','3','info:id','3'
    0 row(s) in 0.0990 seconds
    
    hbase(main):003:0> put 't_hbase1','3','info:salary','3'
    0 row(s) in 0.0210 seconds
    
    hbase(main):004:0> put 't_hbase1','3','info:start_date','2019-09-18'
    0 row(s) in 0.0180 seconds
    
    hbase(main):005:0> put 't_hbase1','3','info:end_date','2019-09-18'
    0 row(s) in 0.0210 seconds
    
    // the data visible through Phoenix is updated
    0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    | 3    | 2019-09-18  | 2019-09-18  | 3   | 3       |
    +------+-------------+-------------+-----+---------+
    3 rows selected (0.096 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase>  select * from "t_hbase1" where "id"='3'; 
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 3    | 2019-09-18  | 2019-09-18  | 3   | 3       |
    +------+-------------+-------------+-----+---------+
    1 row selected (0.084 seconds)
    // but the index table is not updated: count(*) can be answered from the (now stale) index and returns 2, while the select * above cannot use the index and scans the data table, which is why it still finds row 3
    0: jdbc:phoenix:localhost:2181:/hbase> select count(*) from "t_hbase1";
    +-----------+
    | COUNT(1)  |
    +-----------+
    | 2         |
    +-----------+
    1 row selected (0.029 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
    +----------+-------+
    | info:id  | :ROW  |
    +----------+-------+
    | 1        | 1     |
    | 2        | 2     |
    +----------+-------+
    2 rows selected (0.025 seconds)
    // now try inserting a row directly through Phoenix
    0: jdbc:phoenix:localhost:2181:/hbase> UPSERT INTO "t_hbase1" VALUES('4','2019-10-18','2019-10-18','4','4');
    t_hbase1
    1 row affected (0.17 seconds)
    // this time both the data and the index are updated
    0: jdbc:phoenix:localhost:2181:/hbase> select * from THBASE1_INDEX_ID;
    +----------+-------+
    | info:id  | :ROW  |
    +----------+-------+
    | 1        | 1     |
    | 2        | 2     |
    | 4        | 4     |
    +----------+-------+
    3 rows selected (0.024 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase> select count(*) from "t_hbase1";
    +-----------+
    | COUNT(1)  |
    +-----------+
    | 3         |
    +-----------+
    1 row selected (0.029 seconds)
    0: jdbc:phoenix:localhost:2181:/hbase> select * from "t_hbase1";
    +------+-------------+-------------+-----+---------+
    | ROW  | start_date  |  end_date   | id  | salary  |
    +------+-------------+-------------+-----+---------+
    | 1    | 2019-03-18  | 2019-03-18  | 1   | 1       |
    | 2    | 2019-04-18  | 2019-04-18  | 2   | 2       |
    | 3    | 2019-09-18  | 2019-09-18  | 3   | 3       |
    | 4    | 2019-10-18  | 2019-10-18  | 4   | 4       |
    +------+-------------+-------------+-----+---------+
    4 rows selected (0.086 seconds)
    

    The data table and the index table in HBase are both updated as well:

    hbase(main):009:0> scan 't_hbase1'
    ROW                   COLUMN+CELL                                               
     1                    column=info:_0, timestamp=1552748561766, value=           
     1                    column=info:end_date, timestamp=1552748561766, value=2019-
                          03-18                                                     
     1                    column=info:id, timestamp=1552748507732, value=1          
     1                    column=info:salary, timestamp=1552748515347, value=1      
     1                    column=info:start_date, timestamp=1552748545143, value=201
                          9-03-18                                                   
     2                    column=info:_0, timestamp=1552748985363, value=x          
     2                    column=info:end_date, timestamp=1552748985363, value=2019-
                          04-18                                                     
     2                    column=info:id, timestamp=1552748985363, value=2          
     2                    column=info:salary, timestamp=1552748985363, value=2      
     2                    column=info:start_date, timestamp=1552748985363, value=201
                          9-04-18                                                   
     3                    column=info:end_date, timestamp=1552751714137, value=2019-
                          09-18                                                     
     3                    column=info:id, timestamp=1552751696636, value=3          
     3                    column=info:salary, timestamp=1552751700685, value=3      
     3                    column=info:start_date, timestamp=1552751709072, value=201
                          9-09-18                                                   
     4                    column=info:_0, timestamp=1552751877072, value=x          
     4                    column=info:end_date, timestamp=1552751877072, value=2019-
                          10-18                                                     
     4                    column=info:id, timestamp=1552751877072, value=4          
     4                    column=info:salary, timestamp=1552751877072, value=4      
     4                    column=info:start_date, timestamp=1552751877072, value=201
                          9-10-18                                                   
    4 row(s) in 0.0510 seconds
    hbase(main):008:0> scan 'THBASE1_INDEX_ID'
    ROW                   COLUMN+CELL                                               
     1\x001               column=0:_0, timestamp=1552751474487, value=x             
     2\x002               column=0:_0, timestamp=1552751474487, value=x             
     4\x004               column=0:_0, timestamp=1552751877072, value=_0            
    3 row(s) in 0.0220 seconds
    

    The resolution of this issue is discussed in the following post:
    https://blog.csdn.net/u012551524/article/details/79575632
    From the results above: once a secondary index has been created on t_hbase1, data inserted through the HBase shell is not propagated to the index table.
    Data inserted through the Phoenix shell, on the other hand, is synchronized to the index table automatically; the index updates are driven by Phoenix coprocessors.
    In short, if the secondary indexes created on a Phoenix table need to stay consistent, all writes to the source table should go through the Phoenix client; a stale index can also be repaired after the fact, as sketched below. If there is no secondary index to maintain, it does not matter how the data is written.
    Reference: https://blog.csdn.net/u013850277/article/details/80978858
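
    If rows have already been written behind Phoenix's back, as with row 3 above, the stale index can be brought back in line by rebuilding it; a sketch, assuming a Phoenix version that supports ALTER INDEX ... REBUILD:

      -- Repopulate the index from the data table so it also covers rows written directly to HBase
      ALTER INDEX THBASE1_INDEX_ID ON "t_hbase1" REBUILD;

    A rebuild rescans the whole data table, so on a large table it is a repair step, not a substitute for writing through the Phoenix client in the first place.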
