Getting Started with HBase in Practice


Author: 肥兔子爱豆畜子 | Published: 2022-01-19 17:10
    Overview

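    This article installs a standalone (single-node) HBase. In standalone mode HBase stores its data directly on the local filesystem, so there is no need to set up HDFS. HBase ships hbase-client for operating on the database, but here I use Apache Phoenix instead, which lets you read and write HBase with SQL. After setting up HBase and installing the Phoenix plugin, we build a CRUD example against HBase on top of Spring JDBC and the Phoenix client.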

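    Phoenix consists of a client part and a server part. It adds a SQL translation layer on top of HBase and supports the JDBC protocol. The client sends SQL through Phoenix to its server side, which runs inside HBase as a plugin (a set of coprocessors). The SQL is turned into HBase operations that HBase then executes.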

    Introduction to HBase

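    HBase has become a common default storage choice in the big-data era. It suits massive data sets, such as user-behavior data, underlying storage for other big-data platforms, and the data behind reports and dashboards.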

    Installation and Setup

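    A small rant: setting up even a beginner HBase environment was a real ordeal, and it nearly made me give up.
    The combination hbase-2.3.7 + phoenix-hbase-2.3-5.1.2 simply would not work. HBase itself was fine: I could log in with the shell and run operations. Phoenix, however, hung while sqlline.py was connecting. After that, HBase either got stuck with "Region in transition" or reported "ConnectionLoss for /hbase/hbaseid". The only way out was to delete the data directory and restart.
    In the end I fell back on a combination others had reported success with online, hbase-2.2.4 + phoenix-hbase-2.0-5.0.0, and that worked.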

    In conf/hbase-env.sh, set the JAVA_HOME environment variable:

    export JAVA_HOME=/usr/java/jdk1.8.0_131/
    

    Edit conf/hbase-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <configuration>
    
    <!-- Directory where HBase stores its data (local filesystem in standalone mode) -->
      <property>
        <name>hbase.rootdir</name>
        <value>file:///home/hbase-2.2.4/hbase</value>
      </property>
      <!-- ZooKeeper data directory -->
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/hbase-2.2.4/zookeeper</value>
      </property>
      <!-- Bind the Master and RegionServer RPC to all interfaces so remote clients can connect -->
      <property>
        <name>hbase.master.ipc.address</name>
        <value>0.0.0.0</value>
      </property>
      <property>
        <name>hbase.regionserver.ipc.address</name>
        <value>0.0.0.0</value>
      </property>
    
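      <!-- The local filesystem lacks the hflush/hsync stream capabilities HBase 2.x checks for, so disable the check -->
      <property>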
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
      </property>
    </configuration>
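    A typo in hbase-site.xml usually shows up only as a confusing startup failure, so it can help to check what the file actually says. The following is a minimal sketch. HbaseSiteCheck is a hypothetical class, not part of HBase. It uses only the JDK's DOM parser to list the name/value pairs and to confirm that hbase.rootdir starts with file://, which means standalone mode is using the local filesystem. It assumes the standard Hadoop <configuration><property><name/><value/></property> layout shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HbaseSiteCheck {

    /** Collects every <property><name>..</name><value>..</value></property> pair, in file order. */
    static Map<String, String> readProperties(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a <property> without <name>/<value> ends up here
            throw new IllegalStateException("Cannot parse hbase-site.xml", e);
        }
    }

    static Map<String, String> readProperties(String xml) {
        return readProperties(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** A file:// rootdir means data goes to the local filesystem rather than HDFS. */
    static boolean isLocalRootDir(Map<String, String> props) {
        return props.getOrDefault("hbase.rootdir", "").startsWith("file://");
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "conf/hbase-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            Map<String, String> props = readProperties(in);
            props.forEach((k, v) -> System.out.println(k + " = " + v));
            System.out.println("local filesystem rootdir: " + isLocalRootDir(props));
        }
    }
}
```

    Run it from the HBase directory, or pass the path of hbase-site.xml as the first argument.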
    

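    Now try starting it: ./bin/start-hbase.sh
    Then run hbase shell to enter the command line. There, list shows the tables, create 'test', 'cf' creates table test with column family cf, and describe 'test' shows its definition. The session below also uses put, scan and get to write and read rows: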

    hbase(main):001:0> list
    TABLE                                                                                                                                                   
    0 row(s)
    Took 1.1635 seconds                                                                                                                                     
    => []
    
    
    hbase(main):013:0* create 'test', 'cf'
    Created table test
    Took 0.7584 seconds                                                                                                                                     
    => Hbase::Table - test
    hbase(main):014:0> list
    TABLE                                                                                                                                                   
    test                                                                                                                                                    
    1 row(s)
    Took 0.0299 seconds                                                                                                                                     
    => ["test"]
    
    
    hbase(main):019:0* describe 'test'
    Table test is ENABLED                                                                                                                                   
    test                                                                                                                                                    
    COLUMN FAMILIES DESCRIPTION                                                                                                                             
    {NAME => 'cf', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION =>
     'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                   
    
    1 row(s)
    Quota is disabled
    Took 0.3620 seconds 
    
    
    hbase(main):020:0> put 'test', 'row1', 'cf:a', 'value1'
    Took 0.1434 seconds                                                                                                                                     
    hbase(main):021:0> put 'test', 'row2', 'cf:b', 'value2'
    Took 0.0289 seconds                                                                                                                                     
    hbase(main):022:0> put 'test', 'row3', 'cf:c', 'value3'
    Took 0.0142 seconds                                                                                                                                     
    hbase(main):023:0> scan 'test'
    ROW                                     COLUMN+CELL                                                                                                     
     row1                                   column=cf:a, timestamp=2022-01-18T14:16:36.606, value=value1                                                    
     row2                                   column=cf:b, timestamp=2022-01-18T14:16:49.123, value=value2                                                    
     row3                                   column=cf:c, timestamp=2022-01-18T14:16:59.043, value=value3                                                    
    3 row(s)
    Took 0.0911 seconds
    
    hbase(main):025:0* get 'test', 'row1'
    COLUMN                                  CELL                                                                                                            
     cf:a                                   timestamp=2022-01-18T14:16:36.606, value=value1                                                                 
    1 row(s)
    Took 0.0510 seconds
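    The shell session above exercises HBase's logical data model: a table is a set of rows sorted by rowkey, and each row holds cells addressed as family:qualifier. The same put/get/scan behavior can be sketched with nested sorted maps. This is a rough mental model only. ToyHBaseTable is a hypothetical in-memory class, not the hbase-client API, and it ignores timestamps, versions and regions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model only: rowkey -> ("family:qualifier" -> latest value), both levels kept sorted. */
public class ToyHBaseTable {
    // HBase sorts rowkeys by raw bytes; String order matches that for plain ASCII keys like these.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Like the shell's put: writes one cell, overwriting the previous value (VERSIONS => 1). */
    public void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    /** Like the shell's get: all cells of one row, or an empty map if the row does not exist. */
    public Map<String, String> get(String row) {
        return rows.getOrDefault(row, new TreeMap<>());
    }

    /** Like scan: walks rows in rowkey order, and each row's cells in column order. */
    public List<String> scan() {
        List<String> out = new ArrayList<>();
        rows.forEach((row, cells) -> cells.forEach((col, val) ->
                out.add(row + " column=" + col + ", value=" + val)));
        return out;
    }

    public static void main(String[] args) {
        ToyHBaseTable test = new ToyHBaseTable();
        test.put("row3", "cf:c", "value3"); // written out of order on purpose
        test.put("row1", "cf:a", "value1");
        test.put("row2", "cf:b", "value2");
        test.scan().forEach(System.out::println);
        System.out.println(test.get("row1"));
    }
}
```

    The scan returns row1, row2 and row3 in that order even though row3 was written first, because rows are kept sorted by rowkey. That ordering is what makes rowkey design so important for range scans.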
    

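    Disable a table, enable it again, and drop it. A table must be disabled before it can be dropped, so it is disabled a second time before drop:

    disable 'test'
    enable 'test'
    disable 'test'
    drop 'test'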
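    The order matters because HBase only drops a table that is disabled. The tiny state machine below is a hypothetical illustration of that rule, not HBase's Admin API. In the real shell, the out-of-order step reports an error rather than throwing a Java exception.

```java
/** Hypothetical illustration of HBase's table lifecycle rules; not the HBase Admin API. */
public class ToyTableLifecycle {
    enum State { ENABLED, DISABLED, DROPPED }

    private State state = State.ENABLED; // create leaves a new table enabled

    void disable() { require(State.ENABLED, "disable"); state = State.DISABLED; }
    void enable()  { require(State.DISABLED, "enable"); state = State.ENABLED; }
    void drop()    { require(State.DISABLED, "drop"); state = State.DROPPED; }
    State state()  { return state; }

    private void require(State expected, String op) {
        if (state != expected) {
            throw new IllegalStateException(op + " needs a " + expected + " table, but it is " + state);
        }
    }

    public static void main(String[] args) {
        ToyTableLifecycle t = new ToyTableLifecycle();
        t.disable();
        t.enable();
        try {
            t.drop(); // rejected: the table was re-enabled
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        t.disable();
        t.drop();
        System.out.println(t.state());
    }
}
```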
    

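    Next, install Phoenix:
    1. Copy the jar files from the Phoenix package (at minimum the phoenix-*-server.jar) into HBase's lib directory, then restart HBase so they are loaded.
    2. Copy hbase-site.xml into Phoenix's bin directory; the local Phoenix client needs it later.
    3. Add environment variables:
    vim /etc/profile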

    # For Phoenix
    # Adjust PHOENIX_HOME to the Phoenix package actually unpacked (the working setup used 5.0.0)
    export PHOENIX_HOME=/usr/phoenix-hbase-2.3-5.1.2-bin
    export PHOENIX_CLASSPATH=$PHOENIX_HOME
    export PATH=$PHOENIX_HOME/bin:$PATH
    

    Run source /etc/profile to apply the changes.
    Then verify the setup with Phoenix's bundled sqlline.py localhost:2181:

    [root@VM_0_11_centos bin]# ./sqlline.py localhost:2181
    Setting property: [incremental, false]
    Setting property: [isolation, TRANSACTION_READ_COMMITTED]
    issuing: !connect jdbc:phoenix:localhost:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
    Connecting to jdbc:phoenix:localhost:2181
    22/01/18 17:14:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Connected to: Phoenix (version 5.0)
    Driver: PhoenixEmbeddedDriver (version 5.0)
    Autocommit status: true
    Transaction isolation: TRANSACTION_READ_COMMITTED
    Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
    133/133 (100%) Done
    Done
    sqlline version 1.2.0
    0: jdbc:phoenix:localhost:2181> !table
    +------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
    | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMU |
    +------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
    |            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true |
    |            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    +------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
    

    Try some table operations:

    0: jdbc:phoenix:localhost:2181> create table if not exists "staff"(
    . . . . . . . . . . . . . . . > id varchar primary key,
    . . . . . . . . . . . . . . . > name varchar,
    . . . . . . . . . . . . . . . > age varchar);
    No rows affected (1.28 seconds)
    
    0: jdbc:phoenix:localhost:2181> !table
    +------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
    | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | IMMU |
    +------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
    |            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | true |
    |            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | fals |
    |            |              | staff       | TABLE         |          |            |                            |                 |              | fals |
    +------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+------+
    
    Spring Boot Integration

    I used the org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0 dependency. Its SLF4J binding conflicts with Spring Boot's, so it is excluded:

    plugins {
        id 'org.springframework.boot' version '2.1.13.RELEASE'
        id 'io.spring.dependency-management' version '1.0.9.RELEASE'
        id 'java'
    }
    
    version = '0.0.1-SNAPSHOT'
    sourceCompatibility = '1.8'
     
    repositories {
        mavenLocal()
        // Aliyun mirror; newer Gradle versions reject plain-http repositories
        maven { url 'https://maven.aliyun.com/nexus/content/groups/public/' }
        //mavenCentral()
    }
    
    dependencies {
        implementation 'org.springframework.boot:spring-boot-starter-web'
        implementation 'org.springframework.boot:spring-boot-starter-jdbc'
        testImplementation 'org.springframework.boot:spring-boot-starter-test'
        // Lombok is compile-time only; keep both entries on the same version
        compileOnly 'org.projectlombok:lombok:1.18.22'
        annotationProcessor 'org.projectlombok:lombok:1.18.22'
        implementation 'com.alibaba:fastjson:1.2.73'
        
        // Exclude the org.slf4j artifacts phoenix-core pulls in, including a binding that clashes with Spring Boot's Logback
        implementation('org.apache.phoenix:phoenix-core:5.0.0-HBase-2.0') {
            exclude group: 'org.slf4j'
        }
    }
    

    Phoenix supports JDBC, so I chose Spring JDBC (JdbcTemplate) to create, read, update and delete HBase data through Phoenix.

    Data source configuration (application.properties):

    server.port=8080
    spring.application.name=hbase-test
    
    spring.datasource.driver-class-name=org.apache.phoenix.jdbc.PhoenixDriver
    spring.datasource.name=phoenixDataSource
    spring.datasource.url=jdbc:phoenix:122.xx.xxx.187:2181
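    The URL follows Phoenix's thick-client format, jdbc:phoenix:<ZooKeeper quorum>[:<port>[:<root znode>]]. The client therefore connects to ZooKeeper (port 2181 here) rather than directly to an HBase server, which is why the RegionServer hostname problem described later can occur. Below is a minimal sketch of assembling the URL with a hypothetical helper, PhoenixUrl. The quorum may list several comma-separated hosts, and the optional root znode defaults to /hbase:

```java
/** Hypothetical helper that assembles a Phoenix thick-client JDBC URL. */
public class PhoenixUrl {

    static String build(String zkQuorum, int port, String rootZnode) {
        if (zkQuorum == null || zkQuorum.isEmpty()) {
            throw new IllegalArgumentException("ZooKeeper quorum is required");
        }
        StringBuilder url = new StringBuilder("jdbc:phoenix:").append(zkQuorum).append(':').append(port);
        if (rootZnode != null && !rootZnode.isEmpty()) {
            url.append(':').append(rootZnode); // only needed when HBase uses a non-default znode parent
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("localhost", 2181, null));        // the URL sqlline.py connected to
        System.out.println(build("zk1,zk2,zk3", 2181, "/hbase"));  // a multi-host quorum with explicit znode
    }
}
```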
    

    Demo code:

    Create the custemuser table when the application starts:

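    import lombok.extern.slf4j.Slf4j;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.ApplicationArguments;
    import org.springframework.boot.ApplicationRunner;
    import org.springframework.dao.DataAccessException;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Component;
    
    @Slf4j
    @Component
    public class SystemInitRunner implements ApplicationRunner {
        
        @Autowired
        private JdbcTemplate jdbcTemplate;
    
        @Override
        public void run(ApplicationArguments args) throws Exception {
            log.info("Application starting...");
            initHBaseTables();
        }
        
        public void initHBaseTables() {
            // Quoted names keep their lower case; "basic"."name" puts the column in column family basic
            String sql = "CREATE TABLE IF NOT EXISTS \"custemuser\" ("
                    + "\"uid\" VARCHAR PRIMARY KEY,"
                    + "\"basic\".\"name\" VARCHAR,"
                    + "\"basic\".\"mobile\" VARCHAR)";
            
            log.info("Executing HBase DDL: {}", sql);
            
            try {
                jdbcTemplate.execute(sql);
                log.info("HBase table custemuser created (or already exists)");
            } catch (DataAccessException e) {
                log.error("Failed to create HBase table custemuser: {}", e.getMessage());
                // Wrap the original exception so its stack trace is not lost
                throw new IllegalStateException("Failed to create HBase table custemuser", e);
            }
        }
    }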
    

    Endpoints for adding and querying users in the custemuser table:

    import com.alibaba.fastjson.JSON;
    import lombok.extern.slf4j.Slf4j;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;
    
    @Slf4j
    @RestController
    @RequestMapping("/hbase")
    public class HBaseTestController {
        
        @Autowired
        private JdbcTemplate jdbcTemplate;
        
        @PostMapping("/addUser")
        public void addUser(@RequestBody CustemUser user) {
            // Phoenix has no INSERT/UPDATE: UPSERT inserts the row, or overwrites it if the key exists
            String sql = "UPSERT INTO \"custemuser\" VALUES (?, ?, ?)";
            
            int ret = jdbcTemplate.update(sql, ps -> {
                ps.setString(1, user.getUid());
                ps.setString(2, user.getName());
                ps.setString(3, user.getMobile());
            });
            
            log.info("Upserted into HBase table custemuser, rows affected: {}", ret);
        }
        
        @GetMapping("/getUserByMobile")
        public CustemUser getUserByMobile(@RequestParam String mobile) {
            String sql = "SELECT * FROM \"custemuser\" WHERE \"basic\".\"mobile\" = ?";
            
            // queryForObject throws EmptyResultDataAccessException when no row matches
            CustemUser user = jdbcTemplate.queryForObject(sql, (rs, rowNum) -> {
                CustemUser u = new CustemUser();
                u.setUid(rs.getString(1));    // SELECT * returns columns in DDL order: uid, name, mobile
                u.setName(rs.getString(2));
                u.setMobile(rs.getString(3));
                return u;
            }, mobile);
            
            log.info("HBase user query result: {}", JSON.toJSONString(user));
            return user;
        }
    }
    

    DTO class:

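    import lombok.Getter;
    import lombok.NoArgsConstructor;
    import lombok.Setter;
    import lombok.ToString;
    
    @Setter
    @Getter
    @NoArgsConstructor
    @ToString
    public class CustemUser {
        private String uid;
        private String name;
        private String mobile;
    }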
    

    Testing with Postman:

    POST http://localhost:8080/hbase/addUser

    requestBody:

    {  
      "uid":"1001",
      "name":"肥兔子爱豆畜子",
      "mobile":"137xxxx8612"
    }
    

    GET http://localhost:8080/hbase/getUserByMobile?mobile=137xxxx8612

    Response:

    {
        "uid": "1001",
        "name": "肥兔子爱豆畜子",
        "mobile": "137xxxx8612"
    }
    
    Phoenix SQL Syntax

    Let's look at the stored record directly with the hbase shell:

    hbase(main):013:0> scan "custem_user"
    ROW                                     COLUMN+CELL                                                                                                     
     1001                                   column=0:\x00\x00\x00\x00, timestamp=1642576351561, value=x                                                     
     1001                                   column=0:\x80\x0B, timestamp=1642576351561, value=\xE8\x82\xA5\xE5\x85\x94\xE5\xAD\x90\xE7\x88\xB1\xE8\xB1\x86\x
                                            E7\x95\x9C\xE5\xAD\x90                                                                                          
     1001                                   column=0:\x80\x0C, timestamp=1642576351561, value=137xxxx8612                                                   
    1 row(s)
    Took 0.0814 seconds 
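    The shell prints bytes that are not printable ASCII as \xHH escapes (it formats cell values the way HBase's Bytes.toStringBinary does), so the name value above is really UTF-8 text. Below is a small sketch that turns such output back into a string, assuming that escaping scheme. ShellBytes is a hypothetical helper. Note that the terminal wrapped the value across two lines, so the pieces must be joined first:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: undoes the hbase shell's \xHH escaping of cell bytes. */
public class ShellBytes {

    /** "\xHH" becomes one byte; any other character is taken as a single ASCII byte. */
    static byte[] fromStringBinary(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xHH" part of the escape
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    static String decodeUtf8(String shellValue) {
        return new String(fromStringBinary(shellValue), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The name cell from the scan above, with the wrapped terminal line joined back together
        String name = "\\xE8\\x82\\xA5\\xE5\\x85\\x94\\xE5\\xAD\\x90\\xE7\\x88\\xB1"
                + "\\xE8\\xB1\\x86\\xE7\\x95\\x9C\\xE5\\xAD\\x90";
        System.out.println(decodeUtf8(name));
        System.out.println(decodeUtf8("137xxxx8612")); // printable ASCII passes through unchanged
    }
}
```

    Decoding the joined value yields 肥兔子爱豆畜子, the name posted through the addUser endpoint.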
    

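    The rowkey is the table's primary key, uid. The name and mobile columns were stored under Phoenix's encoded column qualifiers \x80\x0B and \x80\x0C, next to Phoenix's empty marker column \x00\x00\x00\x00, and all of them landed in column family 0. That is Phoenix's default column family, used because no column family was specified when the table was created. Specify the family in the CREATE statement instead: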

    CREATE TABLE IF NOT EXISTS "custem_user" (
        "uid" VARCHAR primary key,
        "basic"."name" VARCHAR,
        "basic"."mobile" VARCHAR)
    

    This puts name and mobile into the basic column family.
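    To make the difference concrete, here is a toy model of where each column ends up. ToyPhoenixMapping is hypothetical and is not how Phoenix is implemented. The primary key becomes the rowkey, a column written as family.qualifier goes into that family, and a column without a family goes into the default family 0. The model deliberately ignores Phoenix's empty marker cell and its encoded column qualifiers such as \x80\x0B:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of Phoenix's row-to-cell layout; not Phoenix code. */
public class ToyPhoenixMapping {

    /** columns maps "name" or "family.name" to a value; returns "rowkey family:qualifier=value" strings. */
    static List<String> toCells(String rowKey, Map<String, String> columns) {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String col = e.getKey();
            int dot = col.indexOf('.');
            String family = dot < 0 ? "0" : col.substring(0, dot);   // "0" = Phoenix's default family
            String qualifier = dot < 0 ? col : col.substring(dot + 1);
            cells.add(rowKey + " " + family + ":" + qualifier + "=" + e.getValue());
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> noFamily = new LinkedHashMap<>();
        noFamily.put("name", "liny");
        noFamily.put("mobile", "13789388372");
        Map<String, String> withFamily = new LinkedHashMap<>();
        withFamily.put("basic.name", "liny");
        withFamily.put("basic.mobile", "13789388372");
        System.out.println(toCells("123", noFamily));   // [123 0:name=liny, 123 0:mobile=13789388372]
        System.out.println(toCells("123", withFamily)); // [123 basic:name=liny, 123 basic:mobile=13789388372]
    }
}
```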
    During development, it helps to test the SQL in a tool such as DBeaver before writing it into Java code:

    CREATE TABLE IF NOT EXISTS "test" (
    "uid" VARCHAR primary key,
    "basic"."name" VARCHAR,
    "basic"."mobile" VARCHAR
    );
    UPSERT INTO "test" values('123','liny','13789388372');
    UPSERT INTO "test" values('456','douchuzi','13429586338');
    
    SELECT * FROM "test" WHERE "basic"."mobile" = '13789388372'; 
    SELECT * FROM "test" WHERE "mobile" = '13429586338'; 
    

    Both of the queries above work.
    The following, however, does not:

    SELECT * FROM "test" WHERE mobile = '13429586338'; 
    

    Error: SQL Error [504] [42703]: ERROR 504 (42703): Undefined column. columnName=test.MOBILE

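    Phoenix upper-cases unquoted table and column names, while names in double quotes are case-sensitive and kept exactly as written. We created the mobile column (in the basic column family of table test) with a quoted, lower-case name. The WHERE clause, however, uses mobile without quotes, so Phoenix looks for MOBILE instead, as the error message (test.MOBILE) shows.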
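    The rule can be written down as a small sketch. PhoenixIdentifiers is a hypothetical helper, not Phoenix code. Unquoted identifiers are upper-cased before lookup, and quoted ones are used verbatim. Quoting every name when building SQL strings in Java, as the demo code does with \"custemuser\", avoids the mismatch. The helper assumes names contain no double quotes themselves.

```java
import java.util.Locale;

/** Hypothetical sketch of Phoenix's identifier case rule. */
public class PhoenixIdentifiers {

    /** Unquoted names are upper-cased; double-quoted names keep their exact case (quotes stripped). */
    static String normalize(String identifier) {
        if (identifier.length() >= 2 && identifier.startsWith("\"") && identifier.endsWith("\"")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase(Locale.ROOT);
    }

    /** Wraps a name in double quotes so Phoenix keeps its case (assumes no embedded quotes). */
    static String quote(String name) {
        return "\"" + name + "\"";
    }

    public static void main(String[] args) {
        System.out.println(normalize("mobile"));     // MOBILE: not a column of "test", hence ERROR 504
        System.out.println(normalize("\"mobile\"")); // mobile: the column created by the DDL
        System.out.println("SELECT * FROM " + quote("test") + " WHERE " + quote("mobile") + " = ?");
    }
}
```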

    Problems encountered in practice:

    Pitfall 1:

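    On my Windows development machine, the application failed at startup with "HADOOP_HOME and hadoop.home.dir are unset". The fix is to download the winutils binaries for your Hadoop version from steveloughran/winutils: Windows binaries for Hadoop versions (github.com) and set the environment variable. The dependency tree shows Hadoop 3.0, so point HADOOP_HOME at the 3.0 directory from that repository and restart the IDE.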
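    A quick pre-flight check can save an IDE restart cycle. The sketch below uses WinutilsCheck, a hypothetical class. As far as I know, it mirrors how Hadoop's Shell class locates its home directory: the hadoop.home.dir system property first, then the HADOOP_HOME environment variable, with winutils.exe expected under bin. Calling System.setProperty("hadoop.home.dir", "<path>") at the start of main, before SpringApplication.run, is a commonly used alternative to the environment variable. Treat it as an assumption to verify against your Hadoop version.

```java
import java.io.File;

/** Hypothetical pre-flight check; not part of Hadoop. */
public class WinutilsCheck {

    /** The hadoop.home.dir system property takes precedence over the HADOOP_HOME environment variable. */
    static String resolveHadoopHome(String sysProp, String envVar) {
        return sysProp != null ? sysProp : envVar;
    }

    /** True if <hadoopHome>\bin\winutils.exe exists as a regular file. */
    static boolean hasWinutils(String hadoopHome) {
        return hadoopHome != null && new File(new File(hadoopHome, "bin"), "winutils.exe").isFile();
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
        System.out.println("Hadoop home: " + home + ", winutils.exe found: " + hasWinutils(home));
    }
}
```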

    Pitfall 2:

    Once the application was running, creating a table through Phoenix failed with: Can not resolve VM_0_11_centos, please check your network java.net.UnknownHostException: VM_0_11_centos

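    The log shows that hbase-client failed to connect. VM_0_11_centos is the hostname of the server running my remote HBase. According to the documentation I read, an HBase RegionServer registers its hostname in ZooKeeper at startup, not its IP address, so the client must be able to resolve that name. Add the line 122.xx.xxx.187 VM_0_11_centos to the client's local hosts file (C:\Windows\System32\drivers\etc\hosts on Windows). Then refresh the Windows DNS cache: /displaydns shows the cached entries and /flushdns clears them: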
    ipconfig /displaydns
    ipconfig /flushdns
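    After editing the hosts file, a plain InetAddress lookup is a quick way to confirm that the JVM can resolve the RegionServer name. Below is a minimal sketch with a hypothetical class, HostCheck. Run it in a fresh JVM, since the JVM may cache failed lookups for a short time:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: can this machine resolve the hostname HBase registered in ZooKeeper? */
public class HostCheck {

    static boolean canResolve(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " -> " + addr.getHostAddress());
            return true;
        } catch (UnknownHostException e) {
            System.out.println(host + " cannot be resolved: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the RegionServer hostname from the error above; pass another name as an argument
        canResolve(args.length > 0 ? args[0] : "VM_0_11_centos");
    }
}
```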

    Next Steps

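    HBase internals, including its architecture and cluster setup.
    The underlying LSM-tree storage structure and the read and write paths.
    The performance characteristics and use cases that follow from the storage structure and architecture: massive data storage, high-throughput random writes, and fairly fast random reads.
    How the cluster handles service failures, cluster tools, the surrounding ecosystem, performance tuning, and best practices.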

    References:

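    Environment setup and Phoenix integration:
    SpringBoot - Operating HBase with Phoenix, Tutorial 2 (using JdbcTemplate) (hangge.com), part of a series

    Basic concepts:
    I Finally Understand HBase, and It Wasn't Easy... - Zhihu (zhihu.com)

    Getting Started with HBase: This One Article Is All You Need - Jianshu (jianshu.com)

    HBase: Is Reading or Writing Faster? - Jianshu (jianshu.com)

    Architecture and applications:
    Cloud Database HBase: Big Data Storage for the Cloud Era - Alibaba Cloud (aliyun.com)

    The Evolution and Best Practices of Database and Table Sharding - Jianshu (jianshu.com)

    HBase in Practice | From MySQL to HBase: The Evolution of a Data Storage Migration - Alibaba Cloud Developer Community (aliyun.com)

    Quickly Building a Massive Order Storage System on HBase - Alibaba Cloud Developer Community (aliyun.com)

    Book:
    HBase in Action (Chinese edition: 《HBase实战》)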


        Article link: https://www.haomeiwen.com/subject/gsirhrtx.html