Implementing Complex Big-Data Search with Elasticsearch


Author: 杭宇_8ba6 | Published 2019-04-26 17:53

    what who

    Elasticsearch is not just Lucene plus full-text search. It is also:
    • A distributed, real-time document store in which every field is indexed and searchable
    • A distributed, real-time analytics search engine
    • Able to scale out to hundreds of servers and handle petabytes of structured or unstructured data

    • It has a few notable characteristics
      First: it stores JSON, i.e. it is a document store
      Second: it uses an inverted index
      Third: it has no transactions

    • It also has some drawbacks
      First: writes take roughly 1-2 seconds before they are flushed and become searchable
      Second: the mapping definition cannot be changed freely; even changing a single field type means rebuilding the whole index
      There is a workaround, though: use an index alias. When a change is needed, build a new index and point the alias at it; once the new index is fully populated, drop the old index in one step so the alias serves only the new one, giving a smooth transition (a minimal sketch of this alias swap appears at the end of this list).

    • A few basic concepts are worth mentioning
      Suppose the first thing we want to do is store employee data, with each document representing one employee. In Elasticsearch, the act of storing data is called indexing, but before indexing we need to decide where the data should live.
      In Elasticsearch a document belongs to a type, and types live inside an index. A simple comparison with a traditional relational database looks like this:

    Relational DB -> Databases -> Tables -> Rows -> Columns
    Elasticsearch -> Indices -> Types -> Documents -> Fields

    An Elasticsearch cluster can contain multiple indices (databases).
    Each index can contain multiple types (tables).
    Each type contains multiple documents (rows), each of which is a JSON document.
    Each document then contains multiple fields (columns), each of which is a property of that JSON document.

    The word index has more than one meaning in Elasticsearch. As a noun, an index is like a database in a traditional relational system: it is the place where related documents are stored. The plural of index is indices or indexes.
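    As a concrete illustration of the alias workaround mentioned above, here is a minimal sketch using the 5.x Java Transport Client (the same client style used later in this article). The index names book_index_v1 and book_index_v2 are illustrative assumptions, not part of the original setup; the method only assumes an already-built Client:

    import org.elasticsearch.client.Client;

    public class AliasSwapSketch {

        public static void switchToNewIndex(Client client) {
            String alias    = "book_index_alias";
            String oldIndex = "book_index_v1";   // index currently behind the alias
            String newIndex = "book_index_v2";   // rebuilt with the new mapping and fully re-populated

            // Atomically repoint the alias from the old index to the new one,
            // so readers never observe a half-migrated state.
            client.admin().indices().prepareAliases()
                  .addAlias(newIndex, alias)
                  .removeAlias(oldIndex, alias)
                  .get();

            // Once the switch has been verified, the old index can be dropped.
            client.admin().indices().prepareDelete(oldIndex).get();
        }
    }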

    where when

    • When should you use ES?

    Search, log analytics (ELK), and so on.

    Our own business scenario: order volume is huge, so orders are stored in sharded databases and tables, with openId as the sharding key. That covers every front-end query (every request carries an openId, so lookups are per user). Back-office operations, however, need to see all orders, not just one user's, and they query by all kinds of dimensions. Since the order data lands in different databases and tables, scanning the DBs and then paginating is clearly impractical, and this is exactly the kind of scenario where Elasticsearch fits best~

    • How does it compare with Solr from the Apache ecosystem?
      (Figure: comparison of Elasticsearch and Solr, solr.png)
    Summary:
    1. For pure search over existing data, Solr is faster.
    2. When indexes are being built in real time, Solr suffers I/O blocking and query performance drops, while Elasticsearch has a clear advantage.
    3. As data volume grows, Solr's search efficiency degrades, while Elasticsearch shows no obvious change.
    4. Solr's architecture is not suited to real-time search applications.
    5. Solr supports more data formats, while Elasticsearch only accepts JSON.
    6. Solr performs better than Elasticsearch in traditional search applications, but is clearly less efficient for real-time search.
    7. Solr is a solid solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.

    how

    • ES versions iterate very quickly, so let's look at the ES API stack

    First: study Elasticsearch: The Definitive Guide ([Elasticsearch权威指南]).
    Second: which version should you use?

    From 1.7 to 2.X the client initialization changed once, and from 2.X to 5.X it changed again. There is now a 6.X series and the latest release is already 7.X, but 5.X is the recommendation here!
    Note: 2.x data can be migrated directly to 5.x, and 5.X data can be migrated directly to 6.x, but 2.x data cannot be migrated directly to 6.x.

    ES 2.x
    Pros:

    1. Java stack: spring-boot-starter-data-elasticsearch supports in-memory startup, so unit tests work out of the box
    2. The mainstream version currently running in production, and fairly stable
      Cons:
    1. The version is old, lacks the newer features, and performs worse than 5.x
    2. Later upgrades and data migration are troublesome
    3. The versions of the surrounding tools are messy; you have to look up the matching Kibana (and other tool) versions yourself

    ES 5.x
    Pros:

    1. Relatively new and faster: the official claim is a 25% to 80% improvement in indexing throughput, with new data structures for numeric and geo fields bringing a big performance boost; search was reworked in 5.x and aggregation capability improved significantly
    2. The surrounding tooling is complete and the version numbers are friendlier: in the 5.x era Elastic unified the version numbers across the ELK stack
    3. Upgrading to 6.x is also fairly easy
      Cons:
    1. In-memory mode is officially no longer supported and the Node Client is deprecated. If you still want in-memory unit tests, you have to manually pin the ES version and the spring-data-elasticsearch version, enable the HTTP access switch, and so on, and access the node through the REST API instead

    Third: how do you use the client?

    On the Java stack there are currently three choices: Node Client, Transport Client, and the REST API.
    Note that the Node Client is officially marked as deprecated, and the Transport Client will no longer be supported starting from 7.x;
    everything eventually converges on the REST API in 7.x. At the moment the Transport Client is the most widely used; the REST API has the best compatibility; the Node Client is not recommended unless you are running unit tests in in-memory mode.
    This article uses the Transport Client.

    Elasticsearch 2.X client initialization:

     public static Client getClient() throws UnknownHostException {
            String clusterName = "elasticsearch";
            List<String> clusterNodes = Arrays.asList("http://172.16.0.29:9300");
            Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();  
            TransportClient client = TransportClient.builder().settings(settings).build();
            for (String node : clusterNodes) {
                URI host = URI.create(node);
                client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
            }
            return client;
        }
    

    Elasticsearch 5.X client initialization:

    public static Client getClient() throws UnknownHostException {
            String clusterName = "shopmall-es";
            List<String> clusterNodes = Arrays.asList("http://172.16.32.69:9300","http://172.16.32.48:9300");
            Settings settings = Settings.builder().put("cluster.name", clusterName).build();
            TransportClient client = new PreBuiltTransportClient(settings);
            for (String node : clusterNodes) {
                URI host = URI.create(node);
                client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
            }
            return client;
        }
    • Time to write some code. First, pull in the required dependencies
     <dependency>
                <groupId>org.elasticsearch.client</groupId>
                <artifactId>transport</artifactId>
                <version>5.3.2</version>
            </dependency>
            <dependency>
                <groupId>org.elasticsearch</groupId>
                <artifactId>elasticsearch</artifactId>
                <version>5.3.2</version>
            </dependency>
    
            <dependency>
                <groupId>com.google.code.gson</groupId>
                <artifactId>gson</artifactId>
                <version>2.8.2</version>
            </dependency>
    
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>4.12</version>
                <scope>test</scope>
            </dependency>
    
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-api</artifactId>
                <version>2.11.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-core</artifactId>
                <version>2.11.1</version>
            </dependency>
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class Book {
        public static SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        private String id;
        private String title;
        private List<String> authors;
        private String summary;
        private String publish_date;
        private Integer num_reviews;
        private String publisher;
    
        public Book(String id, String title, List<String> authors, String summary, String publish_date, Integer num_reviews, String publisher) {
            this.id = id;
            this.title = title;
            this.authors = authors;
            this.summary = summary;
            this.publish_date = publish_date;
            this.num_reviews = num_reviews;
            this.publisher = publisher;
        }
    
        public static SimpleDateFormat getSimpleDateFormat() {
            return simpleDateFormat;
        }
    
        public static void setSimpleDateFormat(SimpleDateFormat simpleDateFormat) {
            Book.simpleDateFormat = simpleDateFormat;
        }
    
        public String getId() {
            return id;
        }
    
        public void setId(String id) {
            this.id = id;
        }
    
        public String getTitle() {
            return title;
        }
    
        public void setTitle(String title) {
            this.title = title;
        }
    
        public List<String> getAuthors() {
            return authors;
        }
    
        public void setAuthors(List<String> authors) {
            this.authors = authors;
        }
    
        public String getSummary() {
            return summary;
        }
    
        public void setSummary(String summary) {
            this.summary = summary;
        }
    
        public String getPublish_date() {
            return publish_date;
        }
    
        public void setPublish_date(String publish_date) {
            this.publish_date = publish_date;
        }
    
        public Integer getNum_reviews() {
            return num_reviews;
        }
    
        public void setNum_reviews(Integer num_reviews) {
            this.num_reviews = num_reviews;
        }
    
        public String getPublisher() {
            return publisher;
        }
    
        public void setPublisher(String publisher) {
            this.publisher = publisher;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class DataUtil {
        public static SimpleDateFormat dateFormater = new SimpleDateFormat("yyyy-MM-dd");
    
        /**
         * Simulate fetching the data
         */
        public static List<Book> batchData() {
            List<Book> list = new LinkedList<>();
            Book book1 = new Book("1", "Elasticsearch: The Definitive Guide", Arrays.asList("clinton gormley", "zachary tong"),
                    "A distibuted real-time search and analytics engine", "2015-02-07", 20, "oreilly");
            Book book2 = new Book("2", "Taming Text: How to Find, Organize, and Manipulate It", Arrays.asList("grant ingersoll", "thomas morton", "drew farris"),
                    "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
                    "2013-01-24", 12, "manning");
            Book book3 = new Book("3", "Elasticsearch in Action", Arrays.asList("radu gheorge", "matthew lee hinman", "roy russo"),
                    "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
                    "2015-12-03", 18, "manning");
            Book book4 = new Book("4", "Solr in Action", Arrays.asList("trey grainger", "timothy potter"), "Comprehensive guide to implementing a scalable search engine using Apache Solr",
                    "2014-04-05", 23, "manning");
    
            list.add(book1);
            list.add(book2);
            list.add(book3);
            list.add(book4);
    
            return list;
        }
    
        public static Date parseDate(String dateStr) {
            try {
                return dateFormater.parse(dateStr);
            } catch (ParseException e) {
            }
            return null;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class Constants {
    
        // Field names
    
        public static String ID = "id";
        public static String TITLE = "title";
        public static String AUTHORS = "authors";
        public static String SUMMARY = "summary";
        public static String PUBLISHDATE = "publish_date";
        public static String PUBLISHER = "publisher";
        public static String NUM_REVIEWS = "num_reviews";
    
        // Fields to return from _source (source filtering)
    
        public static String[] fetchFieldsTSPD = {ID, TITLE, SUMMARY, PUBLISHDATE};
        public static String[] fetchFieldsTA = {ID, TITLE, AUTHORS};
    
    
        // Highlighting
    
        public static HighlightBuilder highlightS = new HighlightBuilder().field(SUMMARY);
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/24
     * @Description
     * @Version:1.0
     */
    public class Response<T> {
    
        private ResponseCode responseCode;
    
        private T data;
    
        public Response(ResponseCode responseCode, T data) {
            this.responseCode = responseCode;
            this.data = data;
        }
    
        public Response(ResponseCode responseCode) {
            this.responseCode = responseCode;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/24
     * @Description
     * @Version:1.0
     */
    public enum ResponseCode {
    
        ESTIMEOUT(1, "timeout"),

        FAILEDSHARDS(2, "shard execution failed"),

        OK(0, "success");
    
        private Integer code;
    
        private String desc;
    
        ResponseCode(Integer code, String desc) {
            this.code = code;
            this.desc = desc;
        }
    
        public Integer getCode() {
            return code;
        }
    
        public void setCode(Integer code) {
            this.code = code;
        }
    
        public String getDesc() {
            return desc;
        }
    
        public void setDesc(String desc) {
            this.desc = desc;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class CommonQueryUtils {
    
        public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create();
    
        /**
         * Parse the data returned by ES and wrap it
         */
        public static List<Book> parseResponse(SearchResponse searchResponse) {
            List<Book> list = new LinkedList<>();
            // print the total hit count
            System.out.println("parseResponse count is "+searchResponse.getHits().getTotalHits());
    
            for (SearchHit hit : searchResponse.getHits().getHits()) {
                // parse directly with Gson
                Book book = gson.fromJson(hit.getSourceAsString(), Book.class);
    
                list.add(book);
            }
            return list;
        }
    
        /**
         * After parsing the hits, build the Response object
         */
        public static Response<List<Book>> buildResponse(SearchResponse searchResponse) {
            // timeout handling
            if (searchResponse.isTimedOut()) {
                return new Response<>(ResponseCode.ESTIMEOUT);
            }
            // parse the data returned by ES
            List<Book> list = parseResponse(searchResponse);
            // some shards failed
            if (searchResponse.getFailedShards() > 0) {
                return new Response<>(ResponseCode.FAILEDSHARDS, list);
            }
            return new Response<>(ResponseCode.OK, list);
        }
    }
    
    Take a short break~
    • Now the key logic begins
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class EsConfig {
    
        //HTTP uses port 9200; the Java transport API uses 9300
        private static String clusterNodes = "127.0.0.1:9300";
    
        //the cluster name must be configured in elasticsearch.yml beforehand
        private static String clusterName = "es-book-test";
    
        public static Client client() {
            Settings settings = Settings.builder().put("cluster.name", clusterName)
                                        .put("client.transport.sniff", true).build();
    
            TransportClient client = null;
            try {
                 client = new PreBuiltTransportClient(settings);
                if (clusterNodes != null && !"".equals(clusterNodes)) {
                    for (String node : clusterNodes.split(",")) {
                        String[] nodeInfo = node.split(":");
                        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(nodeInfo[0]), Integer.parseInt(nodeInfo[1])));
                    }
                }
            } catch (Exception e) {
                System.out.println("e"+e);
            }
    
            return client;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class DDLAndBulk {
    
        private static String bookIndex = "book_index";
    
        private static String bookIndexAlias = "book_index_alias";
    
        private static String bookType = "book_type";
    
        public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create();
    
        /**
         * Create the index and configure its settings and mappings
         */
        public static void createIndex() {
            int settingShards = 1;
            int settingReplicas = 0;
    
            Client client = EsConfig.client();
            // check whether the index already exists; delete it if it does
            IndicesExistsResponse indicesExistsResponse = client.admin().indices().prepareExists(bookIndex).get();
    
            if (indicesExistsResponse.isExists()) {
                System.out.println("索引 " + bookIndex + " 存在!");
                // 删除索引,防止报异常  ResourceAlreadyExistsException[index [bookdb_index/yL05ZfXFQ4GjgOEM5x8tFQ] already exists
                DeleteIndexResponse deleteResponse = client.admin().indices().prepareDelete(bookIndex).get();
                if (deleteResponse.isAcknowledged()){
                    System.out.println("索引" + bookIndex + "已删除");
                }else {
                    System.out.println("索引" + bookIndex + "删除失败");
                }
    
    
            } else {
                System.out.println("索引 " + bookIndex + " 不存在!");
            }
    
            // Step 1: create the index with its settings
            CreateIndexResponse response = client.admin().indices().prepareCreate(bookIndex)
                                                 .setSettings(Settings.builder()
                                                                      .put("index.number_of_shards", settingShards)
                                                                      .put("index.number_of_replicas", settingReplicas))
                                                 .get();
    
            // inspect the result
            GetSettingsResponse getSettingsResponse = client.admin().indices()
                                                            .prepareGetSettings(bookIndex).get();
            System.out.println("Index settings:");
            for (ObjectObjectCursor<String, Settings> cursor : getSettingsResponse.getIndexToSettings()) {
                String index = cursor.key;
                Settings settings = cursor.value;
                Integer shards = settings.getAsInt("index.number_of_shards", null);
                Integer replicas = settings.getAsInt("index.number_of_replicas", null);
                System.out.println("index:" + index + ", shards:" + shards + ", replicas:" + replicas);
            }
        }
    
        /**
         * Bulk: insert the data in batches
         */
        public static void bulk() {
            List<Book> list = DataUtil.batchData();
    
            Client client = EsConfig.client();
    
            BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
    
            //Step 2: create the type and its mapping. This can actually be skipped; if no mapping is set, ES infers the field types from the source data
            if (!client.admin().indices().prepareTypesExists(bookIndex).setTypes(bookType).get().isExists()){
                client.admin().indices().preparePutMapping(bookIndex).setType(bookType).setSource(readFileTOString("es-book-mapping.json")).get()
                           .isAcknowledged();
                //Between steps 2 and 3 you can add one small extra step that lets the mapping evolve later: create an index alias
                createAlias(bookIndex, bookIndexAlias);
            }
    
            // add the index operations to the bulk request
            list.forEach(book -> {
                // Step 3: insert the data. Note: step 3 can also implicitly create the type from step 2 and skip the explicit mapping, letting ES infer the field types on its own
                // In the newer API, setSource with varargs requires an even number of arguments; otherwise pass setSource(json, XContentType.JSON)
                bulkRequestBuilder.add(client.prepareIndex(bookIndexAlias, bookType, book.getId()).setSource(gson.toJson(book), XContentType.JSON));
            });
    
            BulkResponse responses = bulkRequestBuilder.get();
            if (responses.hasFailures()) {
                // some bulk items failed
                for (BulkItemResponse res : responses) {
                    System.out.println(res.getFailure());
                }
            }
        }
    
        /**
         * Create the index alias
         */
        private static boolean createAlias(String indexName, String indexAlias) {
            Client client = EsConfig.client();
    
            // fetch the old index-to-alias mappings
            List<String> oldIndexName = new ArrayList<String>();
            GetAliasesResponse getAliases = client.admin().indices().prepareGetAliases(indexAlias).get();
            for (ObjectCursor<String> objectCursor : getAliases.getAliases().keys()) {
                if (!indexName.equals(objectCursor.value)) {
                    oldIndexName.add(objectCursor.value);
                }
            }
            // add the alias to the new index
            IndicesAliasesResponse r = client.admin().indices().prepareAliases().addAlias(indexName, indexAlias)
                                                  .execute().actionGet();
            if (!r.isAcknowledged()) {
                throw new RuntimeException("[ES Check] indexName:" + indexName + ", 创建别名失败:" + indexAlias);
            }
            if (oldIndexName.size() > 0) {
                System.out.println("[ES Check] indexAlias:"+indexAlias+"获取到老的别名对应关系 oldIndexName:{}."+oldIndexName);
                // 删除老关系
                IndicesAliasesResponse r2 = client.admin().indices().prepareAliases()
                                                       .removeAlias(oldIndexName.toArray(new String[] {}), indexAlias).get();// .isAcknowledged();
                if (!r2.isAcknowledged()) {
                    throw new RuntimeException("[ES Check] indexAlias:" + indexAlias + ", 删除老的别名对应关系失败:" + oldIndexName);
                } else {
                    System.out.println("[ES Check] indexAlias:"+indexAlias+", 删除老的别名对应关系 oldIndexName:{}."+oldIndexName);
                }
            }
    
            return true;
        }
    
        public static String readFileTOString(String name) {
    
            InputStream inputStream = getResourceAsStream(name);
    
            if (null == inputStream){
                return null;
            }
            StringBuilder sb = new StringBuilder("");
    
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new InputStreamReader(inputStream));
                String tempString = null;
                // read one line at a time until null signals the end of the file
                while ((tempString = reader.readLine()) != null) {
                    sb.append(tempString);
                }
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (reader != null) {
                    try {
                        reader.close();
                    } catch (IOException e1) {
                    }
                }
            }
    
            return sb.toString();
        }
    
        public static InputStream getResourceAsStream(String name) {
    
            InputStream resourceStream = null;
    
            // Try the current Thread context classloader
            ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
            resourceStream = classLoader.getResourceAsStream(name);
            if (resourceStream == null) {
                // Finally, try the classloader for this class
                classLoader = DDLAndBulk.class.getClassLoader();
                resourceStream = classLoader.getResourceAsStream(name);
            }
    
            return resourceStream;
        }
    
        public static void main(String[] args) {
            createIndex();
            bulk();
        }
    
    }
    
    {
        "book_type": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "string",
                    "index": "analyzed"
                },
                "authors": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "summary": {
                    "type": "string",
                    "index": "analyzed"
                },
                "publish_date": {
                    "type": "date",
                    "index": "not_analyzed"
                },
                "num_reviews": {
                    "type": "integer",
                    "index": "not_analyzed"
                },
                "publisher": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
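    Note that the mapping above uses the pre-5.x syntax (type string plus index: analyzed / not_analyzed). On a 5.x cluster this legacy form is deprecated (it may still be accepted with deprecation warnings); the 5.x-native equivalent would look roughly like the sketch below, where the choice of text vs. keyword per field is my assumption of the intent rather than part of the original article:

    {
        "book_type": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "text"
                },
                "authors": {
                    "type": "keyword"
                },
                "summary": {
                    "type": "text"
                },
                "publish_date": {
                    "type": "date"
                },
                "num_reviews": {
                    "type": "integer"
                },
                "publisher": {
                    "type": "keyword"
                }
            }
        }
    }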
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/24
     * @Description
     * @Version:1.0
     */
    public class BasicMatchQueryService {
    
        private static Client client = EsConfig.client();
    
        private static String bookIndexAlias = "book_index_alias";
    
        private static String bookType = "book_type";
    
    
        public static void main(String[] args) {
            //multiBatch();
            //match();
            boolPage();
            //boolPageMatch();
            //fuzzy();
            //wildcard();
            //phrase();
            //phrasePrefix();
        }
        /**
         * Run an ES query, printing the generated query before the request and the result afterwards
         */
        private static SearchResponse requestGet(String queryName, SearchRequestBuilder requestBuilder) {
            System.out.println(queryName + " 构建的查询:" + requestBuilder.toString());
            SearchResponse searchResponse = requestBuilder.get();
            System.out.println(queryName + " 搜索结果:" + searchResponse.toString());
            return searchResponse;
        }
    
        /**
         * 1.1 对 "guide" 执行全文检索
         * 测试:http://localhost:8080/basicmatch/multimatch?query=guide
         */
        public static Response<List<Book>> multiBatch() {
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("guide");
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(queryBuilder);
    
            SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
         * 1.2 Search against a specific field
         * Test: http://localhost:8080/basicmatch/match?title=in action&from=0&size=4
         */
        public static void match() {
            MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder(Constants.TITLE, "in Action");
            // highlighting
            HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.TITLE).fragmentSize(200);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(matchQueryBuilder)
                                                        .setFrom(0).setSize(4)
                                                        .highlighter(highlightBuilder)
                                                        // restrict which _source fields are returned
                                                        .setFetchSource(Constants.fetchFieldsTSPD, null);
    
            SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
    
        }
    
        /**
     * Exact (term-level) matching
         * @return
         */
        public static Response<List<Book>> boolPage() {
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    
            RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder(Constants.NUM_REVIEWS)
                    .gte(15).lte(50);
    
            boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "manning"));
            boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "oreilly"));
    
            //term = exact match, range = range match
            //should = OR, must = AND, mustNot = AND NOT
            boolQueryBuilder.mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge")).filter().add(rangeQueryBuilder);
            //boolQueryBuilder.must(rangeQueryBuilder).mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge"));
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
                    .setFrom(0).setSize(10).addSort("id", SortOrder.DESC);
    
            SearchResponse searchResponse = requestGet("bool", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Full-text matching (full-text search against text fields)
         * @return
         */
        public static Response<List<Book>> boolPageMatch() {
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    
            //matchQuery = analyzed (term-by-term) match, matchPhraseQuery = phrase match
            boolQueryBuilder.must(QueryBuilders.matchQuery(Constants.SUMMARY,"engine using"))
                            .mustNot(QueryBuilders.matchPhraseQuery(Constants.SUMMARY, "analytics engine"));
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
                                                        .setFrom(0).setSize(10).addSort(SortBuilders.scoreSort());
    
            SearchResponse searchResponse = requestGet("bool", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Fuzzy search
         * @return
         */
        public static Response<List<Book>> fuzzy() {
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticseares")
                    .field("title").field("summary")
                    .fuzziness(Fuzziness.AUTO);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(queryBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTSPD, null)
                                                        .setSize(2);
    
            SearchResponse searchResponse = requestGet("fuzzy", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Wildcard search: find all records whose author starts with the letter "t"
         */
        public static Response<List<Book>> wildcard() {
            WildcardQueryBuilder wildcardQueryBuilder = new WildcardQueryBuilder(Constants.AUTHORS, "t*");
            HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS, 200);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(wildcardQueryBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTA, null)
                                                        .highlighter(highlightBuilder);
    
            SearchResponse searchResponse = requestGet("wildcard", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Regular-expression search
         * @return
         */
        public static Response<List<Book>> regexp() {
            String regexp = "t[a-z]*n";
            RegexpQueryBuilder queryBuilder = new RegexpQueryBuilder(Constants.AUTHORS, regexp);
            HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setQuery(queryBuilder).setTypes(bookType).highlighter(highlightBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTA, null);
    
            SearchResponse searchResponse = requestGet("regexp", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Phrase matching: both words must appear together in one field value (not necessarily adjacent, thanks to slop), rather than merely containing one of the analyzed terms
         *
         *  "summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr",
         *      "summary":"A distibuted real-time search and analytics engine",
         * @return
         */
        public static Response<List<Book>> phrase() {
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("search engine")
                    .field(Constants.SUMMARY)
                    .type(MultiMatchQueryBuilder.Type.PHRASE).slop(3);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
                                                        .setQuery(queryBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTSPD, null);
    
    
            SearchResponse searchResponse = requestGet("phrase", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Match-phrase-prefix search
         * @return
         */
        public static Response<List<Book>> phrasePrefix() {
            MatchPhrasePrefixQueryBuilder queryBuilder = new MatchPhrasePrefixQueryBuilder(Constants.SUMMARY, "search en")
                    .slop(3).maxExpansions(10);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
                                                        .setQuery(queryBuilder).setFetchSource(Constants.fetchFieldsTSPD, null);
    
            SearchResponse searchResponse = requestGet("phrasePrefix", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    }
    

    Next up: look into the new features in 6.X and even 7.X, and how to implement all of this with the REST API.

    • Next, study how to use it together with Kibana
    • How to use it together with Logstash
    • More advanced Elasticsearch topics
    • One last small point: fuzzy
      Fuzzy search automatically corrects misspelled search text and then tries to match the corrected form against the indexed data.
      surprize --> misspelled --> surprise --> s -> z
      surprize --> surprise: correcting one letter (z -> s) is enough to match, so it falls within the specified fuzziness of 2
      surprize --> surprised: z -> s plus a trailing d is two corrections, which still falls within the specified fuzziness of 2
      surprize --> surprising: z -> s plus dropping the e and adding ing is more than two corrections, so it can never be corrected into a match

    Testing confirms that fuzzy can automatically correct up to two edits~
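    To make the two-edit limit concrete, here is a hedged sketch of a fuzzy query with an explicit edit distance of 2 instead of Fuzziness.AUTO. It could be added to the BasicMatchQueryService class above, reusing its client, bookIndexAlias, bookType, requestGet and Constants; the misspelled query term is only an illustration:

        /**
         * Fuzzy search with an explicit edit distance of 2 (sketch, not part of the original code)
         */
        public static Response<List<Book>> fuzzyTwoEdits() {
            // "elasticsearh" is one edit (a missing 'c') away from the indexed term "elasticsearch",
            // so it still matches when at most two single-character corrections are allowed
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticsearh")
                    .field(Constants.TITLE).field(Constants.SUMMARY)
                    .fuzziness(Fuzziness.TWO);

            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                    .setTypes(bookType)
                    .setQuery(queryBuilder)
                    .setFetchSource(Constants.fetchFieldsTSPD, null);

            SearchResponse searchResponse = requestGet("fuzzyTwoEdits", requestBuilder);

            return CommonQueryUtils.buildResponse(searchResponse);
        }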
