HBase from Beginner to Expert 8: HBase Java API Filters…

Author: 金字塔下的小蜗牛 | Published: 2020-04-02 09:48

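Every database needs filters when querying data, and HBase, an open-source distributed non-relational database, is no exception: in big-data scenarios it is impractical to pull back all the data for every query. Filters are evaluated on the server side, so only the data we actually want is returned to the client instead of the full result set. This section shows how to use filters through the HBase Java API.

Note: when working with HBase from Eclipse on Windows, the hostname of the HBase node must not be localhost!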

0. Set up the local development environment

(0) Add the HBase hostname to the local hosts file

Edit the file C:\Windows\System32\drivers\etc\hosts and add the following line:

192.168.126.110 bigdata

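(1) Download the JARs that HBase depends on

Use an FTP tool such as WinSCP to download all the JARs in the lib directory under the HBase installation directory to a local directory such as E:/hbaselibs.

(2) Create the HBaseTest project

Open the Eclipse IDE, choose "File" -> "New" -> "Java Project", enter HBaseTest as the project name, and click "Finish".

(3) Add the dependency JARs to the project

Right-click the HBaseTest project, choose "New" -> "Folder", enter "hbaselibs" as the folder name, and click "Finish". Copy all the JARs from E:/hbaselibs into the hbaselibs folder of the project. Expand the hbaselibs folder, select all the JARs, right-click, and choose "Build Path" -> "Add to Build Path".

(4) Create the Demo package

Right-click the src folder under the HBaseTest project, choose "New" -> "Package", enter "Demo" as the package name, and click "Finish". All the test code below is written in this Demo package.

(5) Create the HBaseFilterTest.java class

Right-click Demo, choose "New" -> "Class", enter "HBaseFilterTest" as the class name, and click "Finish".

(6) Import the packages used by the examples below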

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.MultipleColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

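(7) Run the following code to load the test data. Run CreateTable first and then PutData, because JUnit does not guarantee the order in which the test methods of a class execute.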

public class DataInit {
    @Test
    public void CreateTable() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table already exists!");
        } else {
            // Table name
            HTableDescriptor ht = new HTableDescriptor(TableName.valueOf("emp"));
            // Column family name
            HColumnDescriptor hc = new HColumnDescriptor("empinfo");
            // Add the column family to the table
            ht.addFamily(hc);
            // Create the table
            admin.createTable(ht);
            System.out.println("create table Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }

    @Test
    public void PutData() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table does not exist!");
        } else {
            // Table client
            Table table = conn.getTable(TableName.valueOf("emp"));

            // Test data: row key (employee number), ename, sal
            String[][] rows = {
                {"7369", "SMITH", "800"},
                {"7499", "ALLEN", "1600"},
                {"7521", "WARD", "1250"},
                {"7566", "JONES", "2975"},
                {"7654", "MARTIN", "1250"},
                {"7698", "BLAKE", "2850"},
                {"7782", "CLARK", "2450"},
                {"7788", "SCOTT", "3000"},
                {"7839", "KING", "5000"},
                {"7844", "TURNER", "1500"},
                {"7876", "ADAMS", "1100"},
                {"7900", "JAMES", "950"},
                {"7902", "FORD", "3000"},
                {"7934", "MILLER", "1300"}
            };

            // Build one Put per row that carries both columns
            byte[] family = Bytes.toBytes("empinfo");
            List<Put> list = new ArrayList<Put>();
            for (String[] row : rows) {
                Put put = new Put(Bytes.toBytes(row[0]));
                put.addColumn(family, Bytes.toBytes("ename"), Bytes.toBytes(row[1]));
                put.addColumn(family, Bytes.toBytes("sal"), Bytes.toBytes(row[2]));
                list.add(put);
            }

            // Insert the data
            table.put(list);
            table.close();
            System.out.println("Put data Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }
}

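1. Single-column value filter

Using a single-column value filter takes the following steps:

  1. Specify the table to query: Table table = conn.getTable(TableName.valueOf("emp"));
  2. Create the Scan: Scan scanner = new Scan();
  3. Create the single-column value filter: SingleColumnValueFilter(column family, column, comparison operator, value to compare)
  4. Add the filter to the Scan: scanner.setFilter(filter);
  5. Run the query: ResultScanner result = table.getScanner(scanner);
  6. Print the results: r.getValue(column family, column)
  7. Close the table: table.close()

Query the names of the employees in the emp table whose salary equals 3000: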

public class HBaseFilterTest {
    @Test
    public void testSingleColumnValueFilter() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table does not exist!");
        } else {
            // Specify the table to query
            Table table = conn.getTable(TableName.valueOf("emp"));
            // Create the Scan
            Scan scanner = new Scan();
            // Create the single-column value filter
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("empinfo"), // column family
                    Bytes.toBytes("sal"),     // column
                    CompareOp.EQUAL,          // comparison operator
                    Bytes.toBytes("3000"));   // value to compare
            // Add the filter to the Scan
            scanner.setFilter(filter);
            // Run the query
            ResultScanner result = table.getScanner(scanner);
            // Print the name from each matching row
            for (Result r : result) {
                System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("empinfo"),
                        Bytes.toBytes("ename"))));
            }
            // Release the scanner, then close the table
            result.close();
            table.close();
            System.out.println("Get data Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }
}

Running the JUnit test prints:

SCOTT
FORD
Get data Success!
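
One caveat is worth keeping in mind here. The test data stores sal as the string "3000", and a SingleColumnValueFilter built from a byte[] value compares raw bytes rather than numbers. EQUAL is unaffected, but a range operator such as GREATER would follow byte order, under which "800" sorts after "3000". The plain-Java sketch below has no HBase dependency and only mimics an unsigned lexicographic byte comparison to show the effect; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {
    // Unsigned lexicographic comparison of two byte arrays
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal, so the shorter array sorts first
        return a.length - b.length;
    }

    // Compare two values the way they would compare once stored as string bytes
    static int compareAsStoredStrings(String left, String right) {
        return compareBytes(left.getBytes(StandardCharsets.UTF_8),
                right.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // As stored bytes, "800" is greater than "3000" because '8' > '3'
        System.out.println(compareAsStoredStrings("800", "3000") > 0);
        // As numbers, 800 is of course smaller than 3000
        System.out.println(Integer.compare(800, 3000) > 0);
        // Zero-padding to a fixed width restores the numeric order
        System.out.println(compareAsStoredStrings("0800", "3000") < 0);
    }
}
```

If range queries on sal are needed, storing fixed-width values (for example zero-padded strings) is one common way to make byte order agree with numeric order.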

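2. Column prefix filter

Using a column prefix filter takes the following steps:

  1. Specify the table to query: Table table = conn.getTable(TableName.valueOf("emp"));
  2. Create the Scan: Scan scanner = new Scan();
  3. Create the column prefix filter: ColumnPrefixFilter filter = new ColumnPrefixFilter(column name prefix);
  4. Add the filter to the Scan: scanner.setFilter(filter);
  5. Run the query: ResultScanner result = table.getScanner(scanner);
  6. Print the results: r.getValue(column family, column);
  7. Close the table: table.close();

Query the names of all employees in the emp table: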

public class HBaseFilterTest {
    @Test
    public void testColumnPrefixFilter() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table does not exist!");
        } else {
            // Specify the table to query
            Table table = conn.getTable(TableName.valueOf("emp"));
            // Create the Scan
            Scan scanner = new Scan();
            // Create the column prefix filter
            ColumnPrefixFilter filter = new ColumnPrefixFilter(Bytes.toBytes("ename")); // column name prefix
            // Add the filter to the Scan
            scanner.setFilter(filter);
            // Run the query
            ResultScanner result = table.getScanner(scanner);
            // Print the name from each row
            for (Result r : result) {
                System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("empinfo"),
                        Bytes.toBytes("ename"))));
            }
            // Release the scanner, then close the table
            result.close();
            table.close();
            System.out.println("Get data Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }
}

Running the JUnit test prints:

SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
Get data Success!
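
Keep in mind that ColumnPrefixFilter matches on a prefix of the column name, not on the full name, so the prefix "ename" would also select a column such as "ename_cn" if the table had one. The plain-Java sketch below models that matching rule on byte arrays. The extra column name and the helper methods are hypothetical and serve only as an illustration; this is not the HBase implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnPrefixDemo {
    // True when the column name starts with the given prefix bytes
    static boolean hasPrefix(byte[] qualifier, byte[] prefix) {
        if (qualifier.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (qualifier[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Keep only the column names that start with the prefix
    static List<String> select(List<String> qualifiers, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<String>();
        for (String q : qualifiers) {
            if (hasPrefix(q.getBytes(StandardCharsets.UTF_8), p)) {
                kept.add(q);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "ename_cn" is a hypothetical column that is not part of the emp test data
        List<String> qualifiers = Arrays.asList("ename", "ename_cn", "sal");
        System.out.println(select(qualifiers, "ename")); // [ename, ename_cn]
        System.out.println(select(qualifiers, "sal"));   // [sal]
    }
}
```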

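3. Multiple column prefix filter

Using a multiple column prefix filter takes the following steps:

  1. Specify the table to query: Table table = conn.getTable(TableName.valueOf("emp"));
  2. Create the Scan: Scan scanner = new Scan();
  3. Create the multiple column prefix filter: MultipleColumnPrefixFilter filter = new MultipleColumnPrefixFilter(array of column name prefixes);
  4. Add the filter to the Scan: scanner.setFilter(filter);
  5. Run the query: ResultScanner result = table.getScanner(scanner);
  6. Print the results: r.getValue(column family, column);
  7. Close the table: table.close();

Query the names and salaries of all employees in the emp table: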

public class HBaseFilterTest {
    @Test
    public void testMultipleColumnPrefixFilter() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table does not exist!");
        } else {
            // Specify the table to query
            Table table = conn.getTable(TableName.valueOf("emp"));
            // Create the Scan
            Scan scanner = new Scan();
            // Put the column name prefixes to query into an array
            byte[][] prefixes = new byte[][]{Bytes.toBytes("ename"), Bytes.toBytes("sal")};
            // Create the multiple column prefix filter
            MultipleColumnPrefixFilter filter = new MultipleColumnPrefixFilter(prefixes); // prefix array
            // Add the filter to the Scan
            scanner.setFilter(filter);
            // Run the query
            ResultScanner result = table.getScanner(scanner);
            // Print the name and salary from each row
            for (Result r : result) {
                String ename = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
                String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
                System.out.println(ename + "\t" + sal);
            }
            // Release the scanner, then close the table
            result.close();
            table.close();
            System.out.println("Get data Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }
}

Running the JUnit test prints:

SMITH 800
ALLEN 1600
WARD 1250
JONES 2975
MARTIN 1250
BLAKE 2850
CLARK 2450
SCOTT 3000
KING 5000
TURNER 1500
ADAMS 1100
JAMES 950
FORD 3000
MILLER 1300
Get data Success!

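4. Rowkey filter

Using a Rowkey filter takes the following steps:

  1. Specify the table to query: Table table = conn.getTable(TableName.valueOf("emp"));
  2. Create the Scan: Scan scanner = new Scan();
  3. Create the row filter: RowFilter filter = new RowFilter(comparison operator, comparator for the Rowkey);
  4. Add the filter to the Scan: scanner.setFilter(filter);
  5. Run the query: ResultScanner result = table.getScanner(scanner);
  6. Print the results: r.getValue(column family, column)
  7. Close the table: table.close()

Query the name and salary of the employee in the emp table whose Rowkey is 7839: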

public class HBaseFilterTest {
    @Test
    public void testRowFilter() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table does not exist!");
        } else {
            // Specify the table to query
            Table table = conn.getTable(TableName.valueOf("emp"));
            // Create the Scan
            Scan scanner = new Scan();
            // Create the row filter
            RowFilter filter = new RowFilter(
                    CompareOp.EQUAL,                    // comparison operator
                    new RegexStringComparator("7839")); // regular expression for the Rowkey
            // Add the filter to the Scan
            scanner.setFilter(filter);
            // Run the query
            ResultScanner result = table.getScanner(scanner);
            // Print the name and salary from each matching row
            for (Result r : result) {
                String ename = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
                String sal = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("sal")));
                System.out.println(ename + "\t" + sal);
            }
            // Release the scanner, then close the table
            result.close();
            table.close();
            System.out.println("Get data Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }
}

Running the JUnit test prints:

KING 5000
Get data Success!
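
This returns a single row because no other Rowkey in the test data contains 7839. As an assumption about the comparator's behaviour (worth verifying against your HBase version), RegexStringComparator reports a match when the pattern is found anywhere in the Rowkey, much like Matcher.find() in java.util.regex, so an unanchored "7839" would presumably also match a Rowkey such as "17839". The sketch below uses only java.util.regex to show the difference between an unanchored and an anchored pattern; the extra Rowkeys and the helper method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RowKeyRegexDemo {
    // Return the row keys in which the pattern is found anywhere (find semantics)
    static List<String> matching(List<String> rowKeys, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<String>();
        for (String key : rowKeys) {
            if (pattern.matcher(key).find()) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // "17839" and "78390" are hypothetical keys that are not in the emp test data
        List<String> rowKeys = Arrays.asList("7839", "17839", "78390", "7844");
        System.out.println(matching(rowKeys, "7839"));   // [7839, 17839, 78390]
        System.out.println(matching(rowKeys, "^7839$")); // [7839]
    }
}
```

If an exact match is what we want, anchoring the pattern as "^7839$" or using a BinaryComparator built from Bytes.toBytes("7839") with CompareOp.EQUAL are the usual options.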

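5. Combining multiple filters

Combining multiple filters takes the following steps:

  1. Specify the table to query: Table table = conn.getTable(TableName.valueOf("emp"));
  2. Create the Scan: Scan scanner = new Scan();
  3. Create the individual filters: filter1, filter2, ...
  4. Create the filter list: FilterList list = new FilterList(operator that combines the filters);
  5. Add the filters to the filter list: list.addFilter(filter1); list.addFilter(filter2);
  6. Add the filter list to the Scan: scanner.setFilter(list);
  7. Run the query: ResultScanner result = table.getScanner(scanner);
  8. Print the results: r.getValue(column family, column);
  9. Close the table: table.close();

Query the name of the employee in the emp table whose Rowkey is 7839: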

public class HBaseFilterTest {
    @Test
    public void testMultipleFilter() throws Exception {
        // Point at the local Hadoop directory to suppress the warning on Windows
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
        // Configuration
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper address
        conf.set("hbase.zookeeper.quorum", "192.168.126.110");
        // Create the connection
        Connection conn = ConnectionFactory.createConnection(conf);
        // Create the admin client
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(TableName.valueOf("emp"))) {
            System.out.println("table does not exist!");
        } else {
            // Specify the table to query
            Table table = conn.getTable(TableName.valueOf("emp"));
            // Create the Scan
            Scan scanner = new Scan();
            // Create the row filter
            RowFilter filter1 = new RowFilter(
                    CompareOp.EQUAL,                    // comparison operator
                    new RegexStringComparator("7839")); // regular expression for the Rowkey
            // Create the column prefix filter
            ColumnPrefixFilter filter2 = new ColumnPrefixFilter(Bytes.toBytes("ename")); // column name prefix
            // Create the filter list; MUST_PASS_ALL requires every filter to pass
            FilterList list = new FilterList(Operator.MUST_PASS_ALL);
            list.addFilter(filter1);
            list.addFilter(filter2);
            // Add the filter list to the Scan
            scanner.setFilter(list);
            // Run the query
            ResultScanner result = table.getScanner(scanner);
            // Print the name from each matching row
            for (Result r : result) {
                String ename = Bytes.toString(r.getValue(Bytes.toBytes("empinfo"), Bytes.toBytes("ename")));
                System.out.println(ename);
            }
            // Release the scanner, then close the table
            result.close();
            table.close();
            System.out.println("Get data Success!");
        }
        // Close the admin client and the connection
        admin.close();
        conn.close();
    }
}

Running the JUnit test prints:

KING
Get data Success!
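
To make the effect of the combining operator more concrete, here is a plain-Java sketch that models each filter as a predicate over (Rowkey, column) entries and combines them with AND, in the spirit of MUST_PASS_ALL, or with OR, in the spirit of MUST_PASS_ONE. It is a simplified mental model with made-up names (Entry, select, the two predicates), not a description of how HBase evaluates a FilterList internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FilterListModel {
    // Hypothetical stand-in for one stored value: Rowkey plus column name
    static final class Entry {
        final String row;
        final String column;

        Entry(String row, String column) {
            this.row = row;
            this.column = column;
        }

        @Override
        public String toString() {
            return row + ":" + column;
        }
    }

    // Stand-ins for the row filter and the column prefix filter
    static final Predicate<Entry> ROW_IS_7839 = e -> e.row.equals("7839");
    static final Predicate<Entry> COLUMN_STARTS_WITH_ENAME = e -> e.column.startsWith("ename");

    // A small subset of the emp data: two rows with two columns each
    static List<Entry> sample() {
        return Arrays.asList(
                new Entry("7788", "ename"), new Entry("7788", "sal"),
                new Entry("7839", "ename"), new Entry("7839", "sal"));
    }

    // Keep the entries accepted by the combined predicate
    static List<String> select(List<Entry> entries, Predicate<Entry> filter) {
        List<String> kept = new ArrayList<String>();
        for (Entry e : entries) {
            if (filter.test(e)) {
                kept.add(e.toString());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // AND: both conditions must hold
        System.out.println(select(sample(), ROW_IS_7839.and(COLUMN_STARTS_WITH_ENAME)));
        // prints [7839:ename]
        // OR: either condition is enough
        System.out.println(select(sample(), ROW_IS_7839.or(COLUMN_STARTS_WITH_ENAME)));
        // prints [7788:ename, 7839:ename, 7839:sal]
    }
}
```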

Article link: https://www.haomeiwen.com/subject/cdocdhtx.html