Implementing Complex Big-Data Search with Elasticsearch


Author: 杭宇_8ba6 | Published 2019-04-26 17:53

    what who

    Elasticsearch is not just Lucene plus full-text search. It is also:
    • A distributed, real-time document store in which every field is indexed and searchable
    • A distributed, real-time analytics search engine
    • Able to scale out to hundreds of servers and handle petabytes of structured or unstructured data

    • It has a few notable characteristics
      First: it stores JSON, i.e. it is a document store
      Second: it uses an inverted index
      Third: it has no transactions

    • It also has some drawbacks
      First: writes take roughly 1-2 seconds before they are flushed and become searchable
      Second: the mapping definition cannot be changed freely; even changing a single field type means rebuilding the whole index
      There is a workaround, though: use an index alias. When a change is needed, build a new index and point the alias at it; once the new index is fully populated, drop the old index in one step so the alias serves only the new one, giving a smooth transition (a minimal sketch of this alias swap appears at the end of this list).

    • A few basic concepts are worth mentioning
      Suppose the first thing we want to do is store employee data, with each document representing one employee. In Elasticsearch, the act of storing data is called indexing, but before indexing we need to decide where the data should live.
      In Elasticsearch a document belongs to a type, and types live inside an index. A simple comparison with a traditional relational database looks like this:

    Relational DB -> Databases -> Tables -> Rows -> Columns
    Elasticsearch -> Indices -> Types -> Documents -> Fields

    An Elasticsearch cluster can contain multiple indices (databases).
    Each index can contain multiple types (tables).
    Each type contains multiple documents (rows), each of which is a JSON document.
    Each document then contains multiple fields (columns), each of which is a property of that JSON document.

    The word index has more than one meaning in Elasticsearch. As a noun, an index is like a database in a traditional relational system: it is the place where related documents are stored. The plural of index is indices or indexes.
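    As a concrete illustration of the alias workaround mentioned above, here is a minimal sketch using the 5.x Java Transport Client (the same client style used later in this article). The index names book_index_v1 and book_index_v2 are illustrative assumptions, not part of the original setup; the method only assumes an already-built Client:

    import org.elasticsearch.client.Client;

    public class AliasSwapSketch {

        public static void switchToNewIndex(Client client) {
            String alias    = "book_index_alias";
            String oldIndex = "book_index_v1";   // index currently behind the alias
            String newIndex = "book_index_v2";   // rebuilt with the new mapping and fully re-populated

            // Atomically repoint the alias from the old index to the new one,
            // so readers never observe a half-migrated state.
            client.admin().indices().prepareAliases()
                  .addAlias(newIndex, alias)
                  .removeAlias(oldIndex, alias)
                  .get();

            // Once the switch has been verified, the old index can be dropped.
            client.admin().indices().prepareDelete(oldIndex).get();
        }
    }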

    where when

    • When should you use ES?

    Search, log analytics (ELK), and so on.

    Our own business scenario: order volume is huge, so orders are stored in sharded databases and tables, with openId as the sharding key. That covers every front-end query (every request carries an openId, so lookups are per user). Back-office operations, however, need to see all orders, not just one user's, and they query by all kinds of dimensions. Since the order data lands in different databases and tables, scanning the DBs and then paginating is clearly impractical, and this is exactly the kind of scenario where Elasticsearch fits best~

    • How does it compare with Solr from the Apache ecosystem?
      (Figure: comparison of Elasticsearch and Solr, solr.png)
    Summary:
    1. For pure search over existing data, Solr is faster.
    2. When indexes are being built in real time, Solr suffers I/O blocking and query performance drops, while Elasticsearch has a clear advantage.
    3. As data volume grows, Solr's search efficiency degrades, while Elasticsearch shows no obvious change.
    4. Solr's architecture is not suited to real-time search applications.
    5. Solr supports more data formats, while Elasticsearch only accepts JSON.
    6. Solr performs better than Elasticsearch in traditional search applications, but is clearly less efficient for real-time search.
    7. Solr is a solid solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.

    how

    • ES versions iterate very quickly, so let's look at the ES API stack

    First: study Elasticsearch: The Definitive Guide ([Elasticsearch权威指南]).
    Second: which version should you use?

    From 1.7 to 2.X the client initialization changed once, and from 2.X to 5.X it changed again. There is now a 6.X series and the latest release is already 7.X, but 5.X is the recommendation here!
    Note: 2.x data can be migrated directly to 5.x, and 5.X data can be migrated directly to 6.x, but 2.x data cannot be migrated directly to 6.x.

    ES 2.x
    Pros:

    1. Java stack: spring-boot-starter-data-elasticsearch supports in-memory startup, so unit tests work out of the box
    2. The mainstream version currently running in production, and fairly stable
      Cons:
    1. The version is old, lacks the newer features, and performs worse than 5.x
    2. Later upgrades and data migration are troublesome
    3. The versions of the surrounding tools are messy; you have to look up the matching Kibana (and other tool) versions yourself

    ES 5.x
    Pros:

    1. Relatively new and faster: the official claim is a 25% to 80% improvement in indexing throughput, with new data structures for numeric and geo fields bringing a big performance boost; search was reworked in 5.x and aggregation capability improved significantly
    2. The surrounding tooling is complete and the version numbers are friendlier: in the 5.x era Elastic unified the version numbers across the ELK stack
    3. Upgrading to 6.x is also fairly easy
      Cons:
    1. In-memory mode is officially no longer supported and the Node Client is deprecated. If you still want in-memory unit tests, you have to manually pin the ES version and the spring-data-elasticsearch version, enable the HTTP access switch, and so on, and access the node through the REST API instead

    Third: how do you use the client?

    On the Java stack there are currently three choices: Node Client, Transport Client, and the REST API.
    Note that the Node Client is officially marked as deprecated, and the Transport Client will no longer be supported starting from 7.x;
    everything eventually converges on the REST API in 7.x. At the moment the Transport Client is the most widely used; the REST API has the best compatibility; the Node Client is not recommended unless you are running unit tests in in-memory mode.
    This article uses the Transport Client.

    Elasticsearch 2.X client initialization:

     public static Client getClient() throws UnknownHostException {
            String clusterName = "elasticsearch";
            List<String> clusterNodes = Arrays.asList("http://172.16.0.29:9300");
            Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();  
            TransportClient client = TransportClient.builder().settings(settings).build();
            for (String node : clusterNodes) {
                URI host = URI.create(node);
                client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
            }
            return client;
        }
    

    Elasticsearch 5.X client initialization:

    public static Client getClient() throws UnknownHostException {
            String clusterName = "shopmall-es";
            List<String> clusterNodes = Arrays.asList("http://172.16.32.69:9300","http://172.16.32.48:9300");
            Settings settings = Settings.builder().put("cluster.name", clusterName).build();
            TransportClient client = new PreBuiltTransportClient(settings);
            for (String node : clusterNodes) {
                URI host = URI.create(node);
                client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(host.getHost()), host.getPort()));
            }
            return client;
        }
    • Time to write some code. First, pull in the required dependencies
     <dependency>
                <groupId>org.elasticsearch.client</groupId>
                <artifactId>transport</artifactId>
                <version>5.3.2</version>
            </dependency>
            <dependency>
                <groupId>org.elasticsearch</groupId>
                <artifactId>elasticsearch</artifactId>
                <version>5.3.2</version>
            </dependency>
    
            <dependency>
                <groupId>com.google.code.gson</groupId>
                <artifactId>gson</artifactId>
                <version>2.8.2</version>
            </dependency>
    
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>4.12</version>
                <scope>test</scope>
            </dependency>
    
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-api</artifactId>
                <version>2.11.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-core</artifactId>
                <version>2.11.1</version>
            </dependency>
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class Book {
        public static SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        private String id;
        private String title;
        private List<String> authors;
        private String summary;
        private String publish_date;
        private Integer num_reviews;
        private String publisher;
    
        public Book(String id, String title, List<String> authors, String summary, String publish_date, Integer num_reviews, String publisher) {
            this.id = id;
            this.title = title;
            this.authors = authors;
            this.summary = summary;
            this.publish_date = publish_date;
            this.num_reviews = num_reviews;
            this.publisher = publisher;
        }
    
        public static SimpleDateFormat getSimpleDateFormat() {
            return simpleDateFormat;
        }
    
        public static void setSimpleDateFormat(SimpleDateFormat simpleDateFormat) {
            Book.simpleDateFormat = simpleDateFormat;
        }
    
        public String getId() {
            return id;
        }
    
        public void setId(String id) {
            this.id = id;
        }
    
        public String getTitle() {
            return title;
        }
    
        public void setTitle(String title) {
            this.title = title;
        }
    
        public List<String> getAuthors() {
            return authors;
        }
    
        public void setAuthors(List<String> authors) {
            this.authors = authors;
        }
    
        public String getSummary() {
            return summary;
        }
    
        public void setSummary(String summary) {
            this.summary = summary;
        }
    
        public String getPublish_date() {
            return publish_date;
        }
    
        public void setPublish_date(String publish_date) {
            this.publish_date = publish_date;
        }
    
        public Integer getNum_reviews() {
            return num_reviews;
        }
    
        public void setNum_reviews(Integer num_reviews) {
            this.num_reviews = num_reviews;
        }
    
        public String getPublisher() {
            return publisher;
        }
    
        public void setPublisher(String publisher) {
            this.publisher = publisher;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class DataUtil {
        public static SimpleDateFormat dateFormater = new SimpleDateFormat("yyyy-MM-dd");
    
        /**
         * Simulate fetching the data
         */
        public static List<Book> batchData() {
            List<Book> list = new LinkedList<>();
            Book book1 = new Book("1", "Elasticsearch: The Definitive Guide", Arrays.asList("clinton gormley", "zachary tong"),
                    "A distibuted real-time search and analytics engine", "2015-02-07", 20, "oreilly");
            Book book2 = new Book("2", "Taming Text: How to Find, Organize, and Manipulate It", Arrays.asList("grant ingersoll", "thomas morton", "drew farris"),
                    "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
                    "2013-01-24", 12, "manning");
            Book book3 = new Book("3", "Elasticsearch in Action", Arrays.asList("radu gheorge", "matthew lee hinman", "roy russo"),
                    "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
                    "2015-12-03", 18, "manning");
            Book book4 = new Book("4", "Solr in Action", Arrays.asList("trey grainger", "timothy potter"), "Comprehensive guide to implementing a scalable search engine using Apache Solr",
                    "2014-04-05", 23, "manning");
    
            list.add(book1);
            list.add(book2);
            list.add(book3);
            list.add(book4);
    
            return list;
        }
    
        public static Date parseDate(String dateStr) {
            try {
                return dateFormater.parse(dateStr);
            } catch (ParseException e) {
            }
            return null;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class Constants {
    
        // Field names
    
        public static String ID = "id";
        public static String TITLE = "title";
        public static String AUTHORS = "authors";
        public static String SUMMARY = "summary";
        public static String PUBLISHDATE = "publish_date";
        public static String PUBLISHER = "publisher";
        public static String NUM_REVIEWS = "num_reviews";
    
        // Fields to return from _source (source filtering)
    
        public static String[] fetchFieldsTSPD = {ID, TITLE, SUMMARY, PUBLISHDATE};
        public static String[] fetchFieldsTA = {ID, TITLE, AUTHORS};
    
    
        // Highlighting
    
        public static HighlightBuilder highlightS = new HighlightBuilder().field(SUMMARY);
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/24
     * @Description
     * @Version:1.0
     */
    public class Response<T> {
    
        private ResponseCode responseCode;
    
        private T data;
    
        public Response(ResponseCode responseCode, T data) {
            this.responseCode = responseCode;
            this.data = data;
        }
    
        public Response(ResponseCode responseCode) {
            this.responseCode = responseCode;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/24
     * @Description
     * @Version:1.0
     */
    public enum ResponseCode {
    
        ESTIMEOUT(1, "timeout"),

        FAILEDSHARDS(2, "shard execution failed"),

        OK(0, "success");
    
        private Integer code;
    
        private String desc;
    
        ResponseCode(Integer code, String desc) {
            this.code = code;
            this.desc = desc;
        }
    
        public Integer getCode() {
            return code;
        }
    
        public void setCode(Integer code) {
            this.code = code;
        }
    
        public String getDesc() {
            return desc;
        }
    
        public void setDesc(String desc) {
            this.desc = desc;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class CommonQueryUtils {
    
        public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create();
    
        /**
         * Parse the data returned by ES and wrap it
         */
        public static List<Book> parseResponse(SearchResponse searchResponse) {
            List<Book> list = new LinkedList<>();
            // print the total hit count
            System.out.println("parseResponse count is "+searchResponse.getHits().getTotalHits());
    
            for (SearchHit hit : searchResponse.getHits().getHits()) {
                // parse directly with Gson
                Book book = gson.fromJson(hit.getSourceAsString(), Book.class);
    
                list.add(book);
            }
            return list;
        }
    
        /**
         * After parsing the hits, build the Response object
         */
        public static Response<List<Book>> buildResponse(SearchResponse searchResponse) {
            // timeout handling
            if (searchResponse.isTimedOut()) {
                return new Response<>(ResponseCode.ESTIMEOUT);
            }
            // parse the data returned by ES
            List<Book> list = parseResponse(searchResponse);
            // some shards failed
            if (searchResponse.getFailedShards() > 0) {
                return new Response<>(ResponseCode.FAILEDSHARDS, list);
            }
            return new Response<>(ResponseCode.OK, list);
        }
    }
    
    Take a short break~
    • Now the key logic begins
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class EsConfig {
    
        //HTTP uses port 9200; the Java transport API uses 9300
        private static String clusterNodes = "127.0.0.1:9300";
    
        //the cluster name must be configured in elasticsearch.yml beforehand
        private static String clusterName = "es-book-test";
    
        public static Client client() {
            Settings settings = Settings.builder().put("cluster.name", clusterName)
                                        .put("client.transport.sniff", true).build();
    
            TransportClient client = null;
            try {
                 client = new PreBuiltTransportClient(settings);
                if (clusterNodes != null && !"".equals(clusterNodes)) {
                    for (String node : clusterNodes.split(",")) {
                        String[] nodeInfo = node.split(":");
                        client.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(nodeInfo[0]), Integer.parseInt(nodeInfo[1])));
                    }
                }
            } catch (Exception e) {
                System.out.println("e"+e);
            }
    
            return client;
        }
    }
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/23
     * @Description
     * @Version:1.0
     */
    public class DDLAndBulk {
    
        private static String bookIndex = "book_index";
    
        private static String bookIndexAlias = "book_index_alias";
    
        private static String bookType = "book_type";
    
        public static Gson gson = new GsonBuilder().setDateFormat("yyyy-MM-dd").create();
    
        /**
         * Create the index and configure its settings and mappings
         */
        public static void createIndex() {
            int settingShards = 1;
            int settingReplicas = 0;
    
            Client client = EsConfig.client();
            // check whether the index already exists; delete it if it does
            IndicesExistsResponse indicesExistsResponse = client.admin().indices().prepareExists(bookIndex).get();
    
            if (indicesExistsResponse.isExists()) {
                System.out.println("索引 " + bookIndex + " 存在!");
                // 删除索引,防止报异常  ResourceAlreadyExistsException[index [bookdb_index/yL05ZfXFQ4GjgOEM5x8tFQ] already exists
                DeleteIndexResponse deleteResponse = client.admin().indices().prepareDelete(bookIndex).get();
                if (deleteResponse.isAcknowledged()){
                    System.out.println("索引" + bookIndex + "已删除");
                }else {
                    System.out.println("索引" + bookIndex + "删除失败");
                }
    
    
            } else {
                System.out.println("索引 " + bookIndex + " 不存在!");
            }
    
            // Step 1: create the index with its settings
            CreateIndexResponse response = client.admin().indices().prepareCreate(bookIndex)
                                                 .setSettings(Settings.builder()
                                                                      .put("index.number_of_shards", settingShards)
                                                                      .put("index.number_of_replicas", settingReplicas))
                                                 .get();
    
            // inspect the result
            GetSettingsResponse getSettingsResponse = client.admin().indices()
                                                            .prepareGetSettings(bookIndex).get();
            System.out.println("Index settings:");
            for (ObjectObjectCursor<String, Settings> cursor : getSettingsResponse.getIndexToSettings()) {
                String index = cursor.key;
                Settings settings = cursor.value;
                Integer shards = settings.getAsInt("index.number_of_shards", null);
                Integer replicas = settings.getAsInt("index.number_of_replicas", null);
                System.out.println("index:" + index + ", shards:" + shards + ", replicas:" + replicas);
            }
        }
    
        /**
         * Bulk: insert the data in batches
         */
        public static void bulk() {
            List<Book> list = DataUtil.batchData();
    
            Client client = EsConfig.client();
    
            BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
    
            //Step 2: create the type and its mapping. This can actually be skipped; if no mapping is set, ES infers the field types from the source data
            if (!client.admin().indices().prepareTypesExists(bookIndex).setTypes(bookType).get().isExists()){
                client.admin().indices().preparePutMapping(bookIndex).setType(bookType).setSource(readFileTOString("es-book-mapping.json")).get()
                           .isAcknowledged();
                //Between steps 2 and 3 you can add one small extra step that lets the mapping evolve later: create an index alias
                createAlias(bookIndex, bookIndexAlias);
            }
    
            // add the index operations to the bulk request
            list.forEach(book -> {
                // Step 3: insert the data. Note: step 3 can also implicitly create the type from step 2 and skip the explicit mapping, letting ES infer the field types on its own
                // In the newer API, setSource with varargs requires an even number of arguments; otherwise pass setSource(json, XContentType.JSON)
                bulkRequestBuilder.add(client.prepareIndex(bookIndexAlias, bookType, book.getId()).setSource(gson.toJson(book), XContentType.JSON));
            });
    
            BulkResponse responses = bulkRequestBuilder.get();
            if (responses.hasFailures()) {
                // some bulk items failed
                for (BulkItemResponse res : responses) {
                    System.out.println(res.getFailure());
                }
            }
        }
    
        /**
         * Create the index alias
         */
        private static boolean createAlias(String indexName, String indexAlias) {
            Client client = EsConfig.client();
    
            // fetch the old index-to-alias mappings
            List<String> oldIndexName = new ArrayList<String>();
            GetAliasesResponse getAliases = client.admin().indices().prepareGetAliases(indexAlias).get();
            for (ObjectCursor<String> objectCursor : getAliases.getAliases().keys()) {
                if (!indexName.equals(objectCursor.value)) {
                    oldIndexName.add(objectCursor.value);
                }
            }
            // add the alias to the new index
            IndicesAliasesResponse r = client.admin().indices().prepareAliases().addAlias(indexName, indexAlias)
                                                  .execute().actionGet();
            if (!r.isAcknowledged()) {
                throw new RuntimeException("[ES Check] indexName:" + indexName + ", 创建别名失败:" + indexAlias);
            }
            if (oldIndexName.size() > 0) {
                System.out.println("[ES Check] indexAlias:"+indexAlias+"获取到老的别名对应关系 oldIndexName:{}."+oldIndexName);
                // 删除老关系
                IndicesAliasesResponse r2 = client.admin().indices().prepareAliases()
                                                       .removeAlias(oldIndexName.toArray(new String[] {}), indexAlias).get();// .isAcknowledged();
                if (!r2.isAcknowledged()) {
                    throw new RuntimeException("[ES Check] indexAlias:" + indexAlias + ", 删除老的别名对应关系失败:" + oldIndexName);
                } else {
                    System.out.println("[ES Check] indexAlias:"+indexAlias+", 删除老的别名对应关系 oldIndexName:{}."+oldIndexName);
                }
            }
    
            return true;
        }
    
        public static String readFileTOString(String name) {
    
            InputStream inputStream = getResourceAsStream(name);
    
            if (null == inputStream){
                return null;
            }
            StringBuilder sb = new StringBuilder("");
    
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new InputStreamReader(inputStream));
                String tempString = null;
                // read one line at a time until null signals the end of the file
                while ((tempString = reader.readLine()) != null) {
                    sb.append(tempString);
                }
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (reader != null) {
                    try {
                        reader.close();
                    } catch (IOException e1) {
                    }
                }
            }
    
            return sb.toString();
        }
    
        public static InputStream getResourceAsStream(String name) {
    
            InputStream resourceStream = null;
    
            // Try the current Thread context classloader
            ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
            resourceStream = classLoader.getResourceAsStream(name);
            if (resourceStream == null) {
                // Finally, try the classloader for this class
                classLoader = DDLAndBulk.class.getClassLoader();
                resourceStream = classLoader.getResourceAsStream(name);
            }
    
            return resourceStream;
        }
    
        public static void main(String[] args) {
            createIndex();
            bulk();
        }
    
    }
    
    {
        "book_type": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "string",
                    "index": "analyzed"
                },
                "authors": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "summary": {
                    "type": "string",
                    "index": "analyzed"
                },
                "publish_date": {
                    "type": "date",
                    "index": "not_analyzed"
                },
                "num_reviews": {
                    "type": "integer",
                    "index": "not_analyzed"
                },
                "publisher": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
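    Note that the mapping above uses the pre-5.x syntax (type string plus index: analyzed / not_analyzed). On a 5.x cluster this legacy form is deprecated (it may still be accepted with deprecation warnings); the 5.x-native equivalent would look roughly like the sketch below, where the choice of text vs. keyword per field is my assumption of the intent rather than part of the original article:

    {
        "book_type": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "text"
                },
                "authors": {
                    "type": "keyword"
                },
                "summary": {
                    "type": "text"
                },
                "publish_date": {
                    "type": "date"
                },
                "num_reviews": {
                    "type": "integer"
                },
                "publisher": {
                    "type": "keyword"
                }
            }
        }
    }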
    
    /**
     * @Title:
     * @Author: hangyu
     * @Date: 2019/4/24
     * @Description
     * @Version:1.0
     */
    public class BasicMatchQueryService {
    
        private static Client client = EsConfig.client();
    
        private static String bookIndexAlias = "book_index_alias";
    
        private static String bookType = "book_type";
    
    
        public static void main(String[] args) {
            //multiBatch();
            //match();
            boolPage();
            //boolPageMatch();
            //fuzzy();
            //wildcard();
            //phrase();
            //phrasePrefix();
        }
        /**
         * Run an ES query, printing the generated query before the request and the result afterwards
         */
        private static SearchResponse requestGet(String queryName, SearchRequestBuilder requestBuilder) {
            System.out.println(queryName + " 构建的查询:" + requestBuilder.toString());
            SearchResponse searchResponse = requestBuilder.get();
            System.out.println(queryName + " 搜索结果:" + searchResponse.toString());
            return searchResponse;
        }
    
        /**
         * 1.1 对 "guide" 执行全文检索
         * 测试:http://localhost:8080/basicmatch/multimatch?query=guide
         */
        public static Response<List<Book>> multiBatch() {
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("guide");
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(queryBuilder);
    
            SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
         * 1.2 Search against a specific field
         * Test: http://localhost:8080/basicmatch/match?title=in action&from=0&size=4
         */
        public static void match() {
            MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder(Constants.TITLE, "in Action");
            // highlighting
            HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.TITLE).fragmentSize(200);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(matchQueryBuilder)
                                                        .setFrom(0).setSize(4)
                                                        .highlighter(highlightBuilder)
                                                        // restrict which _source fields are returned
                                                        .setFetchSource(Constants.fetchFieldsTSPD, null);
    
            SearchResponse searchResponse = requestGet("multiBatch", requestBuilder);
    
        }
    
        /**
     * Exact (term-level) matching
         * @return
         */
        public static Response<List<Book>> boolPage() {
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    
            RangeQueryBuilder rangeQueryBuilder = new RangeQueryBuilder(Constants.NUM_REVIEWS)
                    .gte(15).lte(50);
    
            boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "manning"));
            boolQueryBuilder.should().add(QueryBuilders.termQuery(Constants.PUBLISHER, "oreilly"));
    
            //term = exact match, range = range match
            //should = OR, must = AND, mustNot = AND NOT
            boolQueryBuilder.mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge")).filter().add(rangeQueryBuilder);
            //boolQueryBuilder.must(rangeQueryBuilder).mustNot(QueryBuilders.termQuery(Constants.AUTHORS, "radu gheorge"));
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
                    .setFrom(0).setSize(10).addSort("id", SortOrder.DESC);
    
            SearchResponse searchResponse = requestGet("bool", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Full-text matching (full-text search against text fields)
         * @return
         */
        public static Response<List<Book>> boolPageMatch() {
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    
            //matchQuery = analyzed (term-by-term) match, matchPhraseQuery = phrase match
            boolQueryBuilder.must(QueryBuilders.matchQuery(Constants.SUMMARY,"engine using"))
                            .mustNot(QueryBuilders.matchPhraseQuery(Constants.SUMMARY, "analytics engine"));
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType).setQuery(boolQueryBuilder)
                                                        .setFrom(0).setSize(10).addSort(SortBuilders.scoreSort());
    
            SearchResponse searchResponse = requestGet("bool", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Fuzzy search
         * @return
         */
        public static Response<List<Book>> fuzzy() {
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticseares")
                    .field("title").field("summary")
                    .fuzziness(Fuzziness.AUTO);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(queryBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTSPD, null)
                                                        .setSize(2);
    
            SearchResponse searchResponse = requestGet("fuzzy", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Wildcard search: find all records whose author starts with the letter "t"
         */
        public static Response<List<Book>> wildcard() {
            WildcardQueryBuilder wildcardQueryBuilder = new WildcardQueryBuilder(Constants.AUTHORS, "t*");
            HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS, 200);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setTypes(bookType).setQuery(wildcardQueryBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTA, null)
                                                        .highlighter(highlightBuilder);
    
            SearchResponse searchResponse = requestGet("wildcard", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Regular-expression search
         * @return
         */
        public static Response<List<Book>> regexp() {
            String regexp = "t[a-z]*n";
            RegexpQueryBuilder queryBuilder = new RegexpQueryBuilder(Constants.AUTHORS, regexp);
            HighlightBuilder highlightBuilder = new HighlightBuilder().field(Constants.AUTHORS);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                                                        .setQuery(queryBuilder).setTypes(bookType).highlighter(highlightBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTA, null);
    
            SearchResponse searchResponse = requestGet("regexp", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Phrase matching: both words must appear together in one field value (not necessarily adjacent, thanks to slop), rather than merely containing one of the analyzed terms
         *
         *  "summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr",
         *      "summary":"A distibuted real-time search and analytics engine",
         * @return
         */
        public static Response<List<Book>> phrase() {
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("search engine")
                    .field(Constants.SUMMARY)
                    .type(MultiMatchQueryBuilder.Type.PHRASE).slop(3);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
                                                        .setQuery(queryBuilder)
                                                        .setFetchSource(Constants.fetchFieldsTSPD, null);
    
    
            SearchResponse searchResponse = requestGet("phrase", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    
        /**
     * Match-phrase-prefix search
         * @return
         */
        public static Response<List<Book>> phrasePrefix() {
            MatchPhrasePrefixQueryBuilder queryBuilder = new MatchPhrasePrefixQueryBuilder(Constants.SUMMARY, "search en")
                    .slop(3).maxExpansions(10);
    
            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias).setTypes(bookType)
                                                        .setQuery(queryBuilder).setFetchSource(Constants.fetchFieldsTSPD, null);
    
            SearchResponse searchResponse = requestGet("phrasePrefix", requestBuilder);
    
            return CommonQueryUtils.buildResponse(searchResponse);
        }
    }
    

    Next up: look into the new features in 6.X and even 7.X, and how to implement all of this with the REST API.

    • Next, study how to use it together with Kibana
    • How to use it together with Logstash
    • More advanced Elasticsearch topics
    • One last small point: fuzzy
      Fuzzy search automatically corrects misspelled search text and then tries to match the corrected form against the indexed data.
      surprize --> misspelled --> surprise --> s -> z
      surprize --> surprise: correcting one letter (z -> s) is enough to match, so it falls within the specified fuzziness of 2
      surprize --> surprised: z -> s plus a trailing d is two corrections, which still falls within the specified fuzziness of 2
      surprize --> surprising: z -> s plus dropping the e and adding ing is more than two corrections, so it can never be corrected into a match

    Testing confirms that fuzzy can automatically correct up to two edits~
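    To make the two-edit limit concrete, here is a hedged sketch of a fuzzy query with an explicit edit distance of 2 instead of Fuzziness.AUTO. It could be added to the BasicMatchQueryService class above, reusing its client, bookIndexAlias, bookType, requestGet and Constants; the misspelled query term is only an illustration:

        /**
         * Fuzzy search with an explicit edit distance of 2 (sketch, not part of the original code)
         */
        public static Response<List<Book>> fuzzyTwoEdits() {
            // "elasticsearh" is one edit (a missing 'c') away from the indexed term "elasticsearch",
            // so it still matches when at most two single-character corrections are allowed
            MultiMatchQueryBuilder queryBuilder = new MultiMatchQueryBuilder("elasticsearh")
                    .field(Constants.TITLE).field(Constants.SUMMARY)
                    .fuzziness(Fuzziness.TWO);

            SearchRequestBuilder requestBuilder = client.prepareSearch(bookIndexAlias)
                    .setTypes(bookType)
                    .setQuery(queryBuilder)
                    .setFetchSource(Constants.fetchFieldsTSPD, null);

            SearchResponse searchResponse = requestGet("fuzzyTwoEdits", requestBuilder);

            return CommonQueryUtils.buildResponse(searchResponse);
        }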
