Getting Started with Elasticsearch: The Basics

Author: 冯文议 | Published 2019-03-11 21:47

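    Back in my second year of university, when I had only just started with Java, I brashly told my teacher that I was going to build a search engine. What followed was a lot of studying and a lot of digging for material, and by the end of the term I had built a simple news search page whose search module used Lucene.

    Today, let's step into the world of Elasticsearch together.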

    Elastic

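    Elastic, the data-search software startup behind Elasticsearch, went public under the name Elastic on October 5, 2018 (US time).

    Figure: Elastic goes public

    Elasticsearch is only the best known of Elastic's products. The others include the distributed logging stack ELK (Elasticsearch, Logstash, Kibana), Beats, and ECE.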

    Elasticsearch

    Official site: https://www.elastic.co/cn/products/elasticsearch

    Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.


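    Features of the Elastic Stack, as described on the official site:

    Query: Stay curious. Ask your data all kinds of questions.

    With Elasticsearch you can run and combine many types of searches (structured, unstructured, geo, metric) in whatever way you like. Start with a simple question and see what you can discover.

    Analysis: Step back and see the big picture.

    Finding the ten documents that best match a query is one thing. But how do you make sense of a billion log lines? Elasticsearch aggregations let you zoom out and explore trends and patterns in your data.

    Speed: Elasticsearch is fast. Incredibly fast.

    When you get answers instantly, your relationship with your data changes. You can afford to iterate and cover more ground.

    Being this fast is not easy. We implemented inverted indices with finite state transducers for full-text search, BKD trees for storing numeric and geo data, and a column store for analytics.

    And because all data is indexed, you never have to worry about some data lacking an index. You can use and access all of your data at remarkable speed.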
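The inverted index mentioned above can be pictured as a map from each term to the documents that contain it. The following is a minimal Python sketch of that idea only; it leaves out the finite state transducers, BKD trees, and column store, and the sample titles and function names are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, term):
    """Look a single term up; a missing term simply has no postings."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, not data from this article.
docs = {
    1: "Elasticsearch in Action",
    2: "Learning Elasticsearch",
    3: "Python in Practice",
}
index = build_inverted_index(docs)
print(search(index, "elasticsearch"))  # [1, 2]
print(search(index, "in"))             # [1, 3]
```

A lookup touches only the postings of the queried term, which is why the cost depends far more on the number of matching documents than on the size of the whole collection.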

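    Scalability: Run it on a laptop, or on hundreds or thousands of servers holding petabytes of data.

    Moving from prototype to production is seamless; you talk to Elasticsearch the same way whether it runs on a single node or on a 300-node cluster.

    It scales horizontally to handle a huge number of events per second, while automatically managing how indices and queries are distributed across the cluster so that operations stay smooth.

    Resiliency: We have your back while you fly high.

    Hardware fails. Networks partition. Elasticsearch detects these failures and keeps your cluster (and your data) safe and available. With cross-cluster replication, a secondary cluster can step in as a hot backup at any time.

    Elasticsearch runs in a distributed environment and was designed for that from the start, with a single goal: your peace of mind.

    Flexibility: Multiple use cases? One product covers them all.

    Numbers, text, geo, structured, unstructured. All data types are welcome.

    Application search, security analytics, metrics, and logging are only the tip of the iceberg of what companies around the world use Elasticsearch for.

    The joy of operations: More moments of success, fewer moments of despair.

    Simple things should be simple. We make sure Elasticsearch is easy to operate at any scale without sacrificing power or performance.

    Client libraries: Interact with Elasticsearch in your own programming language.

    Elasticsearch uses standard RESTful APIs and JSON. We also build and maintain clients for many other languages, such as Java, Python, .NET, SQL, and PHP, and our community has contributed many more. They are simple and natural to use and, just like Elasticsearch, do not restrict how you use them.

    Powerful features: Extend Elasticsearch.

    Add a username and password to your cluster, monitor how Elasticsearch is performing, run machine learning jobs to detect anomalies, and more, all through features built into the Elastic Stack.

    Features such as Security, Monitoring, Alerting, Reporting, Graph exploration, and Machine Learning give you a better experience.

    Hadoop and Spark: Elasticsearch plus Hadoop.

    Lots of data in Hadoop? With the Elasticsearch-Hadoop (ES-Hadoop) connector you can apply the real-time search and analytics of Elasticsearch to your big data. It is the best of both worlds.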

    Basic concepts

    I would say that once you have worked through these concepts you may well understand what RESTful means, so they are worth learning.

    Near Realtime (NRT)

    Elasticsearch is a near-realtime search platform. What this means is there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.
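The delay described above can be modelled with a toy class in which writes go to a buffer and only become visible after a refresh. This is a rough sketch of the behaviour, not of how Elasticsearch is implemented; the class and method names are invented for illustration.

```python
class ToyNearRealTimeIndex:
    """Toy model: writes land in a buffer and become searchable only after refresh()."""

    def __init__(self):
        self._buffer = []       # indexed but not yet searchable
        self._searchable = []   # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # In the real system a refresh happens periodically (normally about once a second).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        return [doc for doc in self._searchable if word in doc]


idx = ToyNearRealTimeIndex()
idx.index("hello elasticsearch")
before = idx.search("hello")   # not visible yet
idx.refresh()
after = idx.search("hello")    # visible after the refresh
print(before, after)           # [] ['hello elasticsearch']
```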

    Cluster

    A cluster is a collection of one or more nodes (servers) that together holds your entire data and provides federated indexing and search capabilities across all nodes. A cluster is identified by a unique name which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.

    Make sure that you don’t reuse the same cluster names in different environments, otherwise you might end up with nodes joining the wrong cluster. For instance you could use logging-dev, logging-stage, and logging-prod for the development, staging, and production clusters.

    Note that it is valid and perfectly fine to have a cluster with only a single node in it. Furthermore, you may also have multiple independent clusters each with its own unique cluster name.

    Node

    A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name which by default is a random Universally Unique IDentifier (UUID) that is assigned to the node at startup. You can define any node name you want if you do not want the default. This name is important for administration purposes where you want to identify which servers in your network correspond to which nodes in your Elasticsearch cluster.

    A node can be configured to join a specific cluster by the cluster name. By default, each node is set up to join a cluster named elasticsearch which means that if you start up a number of nodes on your network and—assuming they can discover each other—they will all automatically form and join a single cluster named elasticsearch.

    In a single cluster, you can have as many nodes as you want. Furthermore, if there are no other Elasticsearch nodes currently running on your network, starting a single node will by default form a new single-node cluster named elasticsearch.

    Index

    An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.

    In a single cluster, you can define as many indexes as you want.

    Type

    A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index, e.g. one type for users, another type for blog posts. It is no longer possible to create multiple types in an index, and the whole concept of types will be removed in a later version. See Removal of mapping types for more.

    Document

    A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation) which is a ubiquitous internet data interchange format.

    Within an index/type, you can store as many documents as you want. Note that although a document physically resides in an index, a document actually must be indexed/assigned to a type inside an index.

    Shards & Replicas

    An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

    To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.

    Sharding is important for two primary reasons:

    • It allows you to horizontally split/scale your content volume
    • It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput

    The mechanics of how a shard is distributed and also how its documents are aggregated back into search requests are completely managed by Elasticsearch and is transparent to you as the user.

    In a network/cloud environment where failures can be expected anytime, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

    Replication is important for two primary reasons:

    • It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
    • It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.

    To summarize, each index can be split into multiple shards. An index can also be replicated zero (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated from) and replica shards (the copies of the primary shards).

    The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may also change the number of replicas dynamically anytime. You can change the number of shards for an existing index using the _shrink and _split APIs, however this is not a trivial task and pre-planning for the correct number of shards is the optimal approach.

    By default, each index in Elasticsearch 6.x (the version used in this article) is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica) for a total of 10 shards per index.
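The shard arithmetic above, and the reason the primary shard count is hard to change later, can be sketched in a few lines of Python. Elasticsearch routes a document by hashing its routing value (the document id by default) modulo the number of primary shards; the sketch below uses zlib.crc32 purely as a stand-in hash, and both function names are hypothetical.

```python
import zlib


def total_shards(primaries, replicas):
    """Each replica is a full copy of every primary shard."""
    return primaries * (1 + replicas)


def pick_shard(doc_id, primaries):
    """Route a document id to a primary shard with a stable hash.

    zlib.crc32 is a stand-in chosen for this sketch, not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % primaries


print(total_shards(5, 1))  # 10
shard = pick_shard("8ygbK2kBenMJLC7I-EaK", 5)
print(0 <= shard < 5)      # True
```

Because the shard number depends on the primary count, changing that count would send existing ids to different shards, which is why planning the number of shards up front is recommended.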

    Structure

    Figure: ElasticSearch structure

    The figure above comes from the article "SpringBoot整合ElasticSearch及源码".

    Installation

    Configuration:

    cluster.name: es-wyf
    node.name: master
    path.data: /Users/wenyifeng/Software/elasticsearch/data/master/data
    path.logs: /Users/wenyifeng/Software/elasticsearch/data/master/logs
    network.host: 127.0.0.1
    http.port: 9200
    discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
    http.cors.enabled: true
    http.cors.allow-origin: "*"
    bootstrap.memory_lock: false
    bootstrap.system_call_filter: false
    

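    Configuration gave me a real headache. I followed video tutorials and many blog posts and they all failed; in the end my team lead helped me get it working and gave me the configuration above. Many thanks for that help.

    Installing on Mac

    # Download and extract the archive, enter the directory, then run as a daemon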
    ./bin/elasticsearch -d
    

    Installing with Docker

    Pulling the image

    Obtaining Elasticsearch for Docker is as simple as issuing a docker pull command against the Elastic Docker registry.

    docker pull docker.elastic.co/elasticsearch/elasticsearch:6.6.1
    

    Alternatively, you can download other Docker images that contain only features available under the Apache 2.0 license. To download the images, go to www.docker.elastic.co.

    Running Elasticsearch from the command line

    Development mode

    Elasticsearch can be quickly started for development or testing use with the following command:

    docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.6.1
    

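    Official guide: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/docker.html#docker-cli-run-dev-mode

    Installing on CentOS

    For this one, ask your colleagues in operations.

    ElasticSearch Head

    Running it as a standalone server

    Repository: https://github.com/mobz/elasticsearch-head

    Download it from GitHub, extract it, enter the directory, and run the following commands: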

    npm install
    npm run start
    

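    Browser extension

    Search your browser's extension store for ElasticSearch Head.

    Creating an index

    Suppose we model a school that has a class, Class 1 of junior grade 2 (the type c2_1 below), whose students have three attributes: student number, name, and age.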

    PUT http://localhost:9200/school

    {
        "mappings":{
            "c2_1": {
                "properties":{
                    "no":{
                        "type":"keyword"
                    },
                    "name":{
                        "type":"text"
                    },
                    "age":{
                        "type":"integer"
                    }
                }
            }
        }
    }
    

    Response:

    {
        "acknowledged": true,
        "shards_acknowledged": true,
        "index": "school"
    }
    
    Figures: creating the index; the resulting index structure

    This response means the index was created successfully.
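The same request can be built from Python with only the standard library. The sketch below constructs the PUT request for the school index without sending it, so it runs even when no node is listening; the helper name is hypothetical, and the commented lines show how it could be sent to a node assumed to be at localhost:9200.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, doc_type, properties):
    """Build (but do not send) the PUT request that creates an index with a mapping."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(
    "http://localhost:9200",
    "school",
    "c2_1",
    {
        "no": {"type": "keyword"},
        "name": {"type": "text"},
        "age": {"type": "integer"},
    },
)
print(request.get_method(), request.full_url)  # PUT http://localhost:9200/school
# To actually send it against a running node:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```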

    Basic CRUD operations

    Create

    Let's insert one record: student number 201901, name 张三, age 20.

    POST http://localhost:9200/school/c2_1

    {
        "no":"201901",
        "name":"张三",
        "age":20
    }
    

    Response:

    {
        "_index": "school",
        "_type": "c2_1",
        "_id": "8ygbK2kBenMJLC7I-EaK",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 2,
            "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1
    }
    

    Read

    Let's look the document up by the id that was returned:

    GET http://localhost:9200/school/c2_1/8ygbK2kBenMJLC7I-EaK

    Response:

    {
        "_index": "school",
        "_type": "c2_1",
        "_id": "8ygbK2kBenMJLC7I-EaK",
        "_version": 1,
        "_seq_no": 0,
        "_primary_term": 1,
        "found": true,
        "_source": {
            "no": "201901",
            "name": "张三",
            "age": 20
        }
    }
    

    The document comes back, which shows that both the insert and the lookup worked.

    Update

    Let's change 张三's age to 22.

    PUT http://localhost:9200/school/c2_1/8ygbK2kBenMJLC7I-EaK

    {
        "no":"201901",
        "name":"张三",
        "age":22
    }
    

    Response:

    {
        "_index": "school",
        "_type": "c2_1",
        "_id": "8ygbK2kBenMJLC7I-EaK",
        "_version": 2,
        "result": "updated",
        "_shards": {
            "total": 2,
            "successful": 2,
            "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1
    }
    

    Query it again:

    Figure: verifying the update

    The result matches what we expected.
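The PUT used here resends the whole document. Elasticsearch 6.x also offers an _update endpoint that takes only the changed fields under a doc key. The sketch below just builds such a request as plain data; the helper name is hypothetical and nothing is sent.

```python
import json


def build_partial_update(index, doc_type, doc_id, changes):
    """Return (method, path, body) for a partial update of one document."""
    path = f"/{index}/{doc_type}/{doc_id}/_update"
    body = json.dumps({"doc": changes})
    return "POST", path, body


method, path, body = build_partial_update(
    "school", "c2_1", "8ygbK2kBenMJLC7I-EaK", {"age": 22}
)
print(method, path)
print(body)  # {"doc": {"age": 22}}
```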

    Delete

    We delete the document by its ID:

    DELETE http://localhost:9200/school/c2_1/8ygbK2kBenMJLC7I-EaK

    Response:

    {
        "_index": "school",
        "_type": "c2_1",
        "_id": "8ygbK2kBenMJLC7I-EaK",
        "_version": 3,
        "result": "deleted",
        "_shards": {
            "total": 2,
            "successful": 2,
            "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 1
    }
    

    Querying again finds nothing and returns:

    {
        "_index": "school",
        "_type": "c2_1",
        "_id": "8ygbK2kBenMJLC7I-EaK",
        "found": false
    }
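Across the create, update, and delete responses above, _version goes 1, 2, 3 and result goes created, updated, deleted. The toy in-memory store below mimics just that bookkeeping to make the pattern explicit. It is an illustration with invented names, not how Elasticsearch stores documents.

```python
class ToyDocumentStore:
    """In-memory stand-in that mimics the result and _version fields seen above."""

    def __init__(self):
        self._docs = {}      # id -> current source
        self._versions = {}  # id -> last version, kept even after a delete

    def put(self, doc_id, source):
        result = "updated" if doc_id in self._docs else "created"
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        self._docs[doc_id] = source
        return {"_id": doc_id, "_version": self._versions[doc_id], "result": result}

    def get(self, doc_id):
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        return {"_id": doc_id, "found": True, "_source": self._docs[doc_id]}

    def delete(self, doc_id):
        if doc_id not in self._docs:
            return {"_id": doc_id, "result": "not_found"}
        del self._docs[doc_id]
        self._versions[doc_id] += 1  # a delete is also a versioned write
        return {"_id": doc_id, "_version": self._versions[doc_id], "result": "deleted"}


store = ToyDocumentStore()
created = store.put("s1", {"no": "201901", "name": "张三", "age": 20})
updated = store.put("s1", {"no": "201901", "name": "张三", "age": 22})
deleted = store.delete("s1")
print(created["_version"], updated["_version"], deleted["_version"])  # 1 2 3
print(store.get("s1"))  # {'_id': 's1', 'found': False}
```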
    

    Deleting an index

    Deleting an index also deletes every document stored in it.

    DELETE http://localhost:9200/school

    Response:

    {
        "acknowledged": true
    }
    

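    Advanced queries

    Endpoint

    General search API endpoint: http://localhost:9200/book/_search

    The request can be sent as GET or as POST (with a JSON body).

    Preparing data

    First we prepare some data, as follows

    Figures: the test book index; the test book data

    Query Context

    Query context: during a search, besides deciding whether a document satisfies the query, Elasticsearch also computes a _score that indicates how well the document matches. In short: did it match, and how good is the match?

    Common queries

    Full-text search: for text data

    Field-level queries: for structured data such as numbers and dates

    Full-text matching

    Search: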

    {
        "query":{
            "match":{
                "author":"明日科技"
            }
        }
    }
    

    Result:

    {
        "took": 4,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.9808292,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 0.9808292,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                }
            ]
        }
    }
    

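    match is not an exact match: the query text is tokenized first, and every document that matches any of the resulting terms is returned, as the next example shows:

    Search: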

    {
        "query":{
            "match":{
                "title":"elasticsearch入门"
            }
        }
    }
    

    Result:

    {
        "took": 4,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 4,
            "max_score": 0.3252806,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FNW0L2kBvvhXwNB9pJ0W",
                    "_score": 0.3252806,
                    "_source": {
                        "title": "Python编程 从入门到实践",
                        "author": "[美]埃里克·马瑟斯(Eric Matthes)",
                        "word_count": 10010,
                        "publish_date": "2016-07-10"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 0.28924954,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EtWxL2kBvvhXwNB93p1F",
                    "_score": 0.2876821,
                    "_source": {
                        "title": "Elasticsearch实战",
                        "author": "[美] 拉杜·乔戈(Radu Gheorghe) 马修·李·欣曼(Matthew",
                        "word_count": 10005,
                        "publish_date": "2018-10-2"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 0.21268348,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                }
            ]
        }
    }
    

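    Here elasticsearch入门 is split into elasticsearch and 入门, and a document that matches either part is returned.

    match_phrase

    If we do not want the text split up like that, we can switch to the match_phrase keyword

    Search: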

    {
        "query":{
            "match_phrase":{
                "title":"elasticsearch入门"
            }
        }
    }
    

    Result:

    {
        "took": 3,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 0,
            "max_score": null,
            "hits": []
        }
    }
    

    Now nothing comes back.

    Let's search again:

    {
        "query":{
            "match_phrase":{
                "title":"elasticsearch"
            }
        }
    }
    

    Result:

    {
        "took": 3,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.2876821,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EtWxL2kBvvhXwNB93p1F",
                    "_score": 0.2876821,
                    "_source": {
                        "title": "Elasticsearch实战",
                        "author": "[美] 拉杜·乔戈(Radu Gheorghe) 马修·李·欣曼(Matthew",
                        "word_count": 10005,
                        "publish_date": "2018-10-2"
                    }
                }
            ]
        }
    }
    

    Now with a different keyword:

    {
        "query":{
            "match_phrase":{
                "title":"入门"
            }
        }
    }
    

    Result:

    {
        "took": 11,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 0.3252806,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FNW0L2kBvvhXwNB9pJ0W",
                    "_score": 0.3252806,
                    "_source": {
                        "title": "Python编程 从入门到实践",
                        "author": "[美]埃里克·马瑟斯(Eric Matthes)",
                        "word_count": 10010,
                        "publish_date": "2016-07-10"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 0.28924954,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 0.21268348,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                }
            ]
        }
    }
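The difference between match and match_phrase seen in these results can be reproduced with a toy model: match succeeds when any query token occurs in the text, while match_phrase requires the tokens to occur consecutively and in order. The tokenizer below is a rough stand-in invented for this sketch (lowercase ASCII words, one token per Chinese character) and is not the analyzer configured on the book index.

```python
import re


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase ASCII words, one token per CJK character."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def match(query, text):
    """True if the text shares at least one token with the query (OR semantics)."""
    return bool(set(tokenize(query)) & set(tokenize(text)))


def match_phrase(query, text):
    """True if the query tokens appear in the text consecutively and in order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


titles = [
    "Python编程 从入门到实践",
    "Java从入门到精通(第4版)(附光盘)",
    "Elasticsearch实战",
    "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
    "数据结构(C语言版)",
]
print(sum(match("elasticsearch入门", t) for t in titles))         # 4
print(sum(match_phrase("elasticsearch入门", t) for t in titles))  # 0
print(sum(match_phrase("入门", t) for t in titles))               # 3
```

The counts line up with the hit totals in the responses above, although real relevance scoring and analysis are considerably more involved.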
    

    Querying multiple fields

    Search:

    {
        "query":{
            "multi_match":{
                "query":"java",
                "fields":["title", "author"]
            }
        }
    }
    

    Result:

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 1.0623134,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 1.0623134,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                }
            ]
        }
    }
    

    Query-string syntax queries

    Search:

    {
        "query":{
            "query_string":{
                "query":"java"
            }
        }
    }
    

    Result:

    {
        "took": 6,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 1.0623134,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 1.0623134,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                }
            ]
        }
    }
    

    Search:

    {
        "query":{
            "query_string":{
                "query":"java and 入门"
            }
        }
    }
    

    Result:

    {
        "took": 3,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 3,
            "max_score": 1.351563,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 1.351563,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FNW0L2kBvvhXwNB9pJ0W",
                    "_score": 0.3252806,
                    "_source": {
                        "title": "Python编程 从入门到实践",
                        "author": "[美]埃里克·马瑟斯(Eric Matthes)",
                        "word_count": 10010,
                        "publish_date": "2016-07-10"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 0.21268348,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                }
            ]
        }
    }
    

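    Personally I do not find this one much fun, so this is only a quick try. Note that query_string treats only the uppercase AND and OR as operators; the lowercase and above is parsed as an ordinary term, which is why the result also contains books that match 入门 but not java.

    Field-level queries

    Search: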

    {
        "query":{
            "term":{
                "word_count":"100"
            }
        }
    }
    

    Result:

    {
        "took": 14,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 1,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FdW1L2kBvvhXwNB91Z2M",
                    "_score": 1,
                    "_source": {
                        "title": "数据结构(C语言版)",
                        "author": "严蔚敏",
                        "word_count": 100,
                        "publish_date": "2007-03-09"
                    }
                }
            ]
        }
    }
    

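    term: matches an exact value

    Range queries

    For example, let's search for books with at least 10000 words (gte means greater than or equal to):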

    {
        "query":{
            "range":{
                "word_count":{
                    "gte":10000
                }
            }
        }
    }
    

    Result:

    {
        "took": 5,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 4,
            "max_score": 1,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EtWxL2kBvvhXwNB93p1F",
                    "_score": 1,
                    "_source": {
                        "title": "Elasticsearch实战",
                        "author": "[美] 拉杜·乔戈(Radu Gheorghe) 马修·李·欣曼(Matthew",
                        "word_count": 10005,
                        "publish_date": "2018-10-2"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 1,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 1,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FNW0L2kBvvhXwNB9pJ0W",
                    "_score": 1,
                    "_source": {
                        "title": "Python编程 从入门到实践",
                        "author": "[美]埃里克·马瑟斯(Eric Matthes)",
                        "word_count": 10010,
                        "publish_date": "2016-07-10"
                    }
                }
            ]
        }
    }
    

    Dates can be searched by range as well. For example, books published in the open interval (2016-09-01, 2018-01-01):

    {
        "query":{
            "range":{
                "publish_date":{
                    "gt":"2016-09-01",
                    "lt":"2018-01-01"
                }
            }
        }
    }
    

    Result:

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 1,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 1,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                }
            ]
        }
    }
    

    Now include the lower bound, [2016-09-01, 2018-01-01):

    {
        "query":{
            "range":{
                "publish_date":{
                    "gte":"2016-09-01",
                    "lt":"2018-01-01"
                }
            }
        }
    }
    

    Result:

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 2,
            "max_score": 1,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 1,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 1,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                }
            ]
        }
    }
    

    For the current date you can use the keyword now.
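The effect of gt versus gte on the two date queries above can be checked with plain Python comparisons. This is a sketch of the bound semantics only; the titles are abbreviated from the sample data, and the helper names are made up for illustration.

```python
from datetime import date


def parse_date(text):
    """Parse 'YYYY-M-D' leniently, since the sample data contains '2018-10-2'."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Apply the same bound names the range query uses; None means unbounded."""
    return ((gt is None or value > gt) and (gte is None or value >= gte)
            and (lt is None or value < lt) and (lte is None or value <= lte))


# Abbreviated titles with the publish_date values from the sample data.
books = {
    "Elasticsearch实战": "2018-10-2",
    "Java从入门到精通": "2016-09-01",
    "Laravel入门与实战": "2017-11-03",
    "Python编程 从入门到实践": "2016-07-10",
}
low, high = parse_date("2016-09-01"), parse_date("2018-01-01")
exclusive = [t for t, d in books.items() if in_range(parse_date(d), gt=low, lt=high)]
inclusive = [t for t, d in books.items() if in_range(parse_date(d), gte=low, lt=high)]
print(exclusive)  # ['Laravel入门与实战']
print(inclusive)  # ['Java从入门到精通', 'Laravel入门与实战']
```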

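    Keyword: filter

    In a filter, the query only decides whether a document satisfies the condition. The answer is simply yes or no, and no relevance score is calculated.

    For example, to find the books whose word count is 100, search as follows: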

    {
        "query":{
            "bool":{
                "filter":{
                    "term":{
                        "word_count":100
                    }
                }
            }
        }
    }
    

    Result:

    {
        "took": 40,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FdW1L2kBvvhXwNB91Z2M",
                    "_score": 0,
                    "_source": {
                        "title": "数据结构(C语言版)",
                        "author": "严蔚敏",
                        "word_count": 100,
                        "publish_date": "2007-03-09"
                    }
                }
            ]
        }
    }
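Note that _score is 0 in this result, because a filter does not contribute to scoring. A bool query can combine a scored clause with a filter, so that the filter narrows the candidates while only the scored clause ranks them. The sketch below builds such a body as a Python dictionary; the function name and the chosen field values are assumptions for illustration, and the body is not sent anywhere.

```python
import json


def scored_and_filtered(match_field, text, filter_field, value):
    """Build a bool query: `must` contributes to _score, `filter` only includes or excludes."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {match_field: text}}],
                "filter": [{"term": {filter_field: value}}],
            }
        }
    }


body = scored_and_filtered("title", "入门", "word_count", 10002)
print(json.dumps(body, ensure_ascii=False, indent=2))
```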
    

    Compound queries

    Constant-score query: constant_score

    {
        "query":{
            "constant_score":{
                "filter":{
                    "match":{
                        "title":"ElasticSearch"
                    }
                },
                "boost":2
            }
            
        }
    }
    

    constant_score does not accept a match query directly; the query has to be wrapped in a filter, as above.

    Boolean query: bool

    Search:

    {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "author": "明日科技"
                        }
                    },
                    {
                        "match": {
                            "title": "ElasticSearch"
                        }
                    }
                ]
            }
        }
    }
    

    Result:

    {
        "took": 4,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 2,
            "max_score": 0.9808292,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 0.9808292,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EtWxL2kBvvhXwNB93p1F",
                    "_score": 0.2876821,
                    "_source": {
                        "title": "Elasticsearch实战",
                        "author": "[美] 拉杜·乔戈(Radu Gheorghe) 马修·李·欣曼(Matthew",
                        "word_count": 10005,
                        "publish_date": "2018-10-2"
                    }
                }
            ]
        }
    }
    

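    The should clauses are combined with OR: a document is returned if it matches either one.

    Now the AND relationship, using must: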

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "author": "明日科技"
                        }
                    },
                    {
                        "match": {
                            "title": "ElasticSearch"
                        }
                    }
                ]
            }
        }
    }
    

    Result:

    {
        "took": 3,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 0,
            "max_score": null,
            "hits": []
        }
    }
    

    No book matches both conditions. Let's change the keyword and search again:

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "author": "明日科技"
                        }
                    },
                    {
                        "match": {
                            "title": "java"
                        }
                    }
                ]
            }
        }
    }
    

    Result:

    {
        "took": 13,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 2.0431426,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 2.0431426,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                }
            ]
        }
    }
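    The difference between should (OR) and must (AND) can be illustrated with a toy model in plain Python. This is not how Elasticsearch matches text: a case-insensitive substring test stands in for match, and there is no scoring. The function names are hypothetical, while the titles and authors are copied from the results in this article.

```python
# Title and author of the five sample books shown in the results.
BOOKS = [
    {"title": "Java从入门到精通(第4版)(附光盘)", "author": "明日科技"},
    {"title": "Elasticsearch实战",
     "author": "[美] 拉杜·乔戈(Radu Gheorghe) 马修·李·欣曼(Matthew"},
    {"title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
     "author": "拉杜 乔戈"},
    {"title": "Python编程 从入门到实践",
     "author": "[美]埃里克·马瑟斯(Eric Matthes)"},
    {"title": "数据结构(C语言版)", "author": "严蔚敏"},
]


def clause_matches(doc, clause):
    """Toy stand-in for match: case-insensitive substring test."""
    (field, text), = clause.items()
    return text.lower() in doc[field].lower()


def bool_search(docs, must=(), should=()):
    """Return the titles of docs that satisfy the given bool clauses."""
    titles = []
    for doc in docs:
        all_must = all(clause_matches(doc, c) for c in must)
        any_should = any(clause_matches(doc, c) for c in should)
        # With no must clause, at least one should clause has to match.
        if all_must and (bool(must) or any_should):
            titles.append(doc["title"])
    return titles


clauses = [{"author": "明日科技"}, {"title": "ElasticSearch"}]
or_hits = bool_search(BOOKS, should=clauses)
and_hits = bool_search(BOOKS, must=clauses)
java_hits = bool_search(BOOKS, must=[{"author": "明日科技"}, {"title": "java"}])
print(len(or_hits), len(and_hits), len(java_hits))
```

    The hit counts agree with the three responses above: two hits for should, none for must with ElasticSearch, and one for must with java.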
    

    Now add a filter condition. Here we look for books whose word count is 10000:

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "author": "明日科技"
                        }
                    },
                    {
                        "match": {
                            "title": "java"
                        }
                    }
                ],
                "filter": [
                    {
                        "term": {
                            "word_count": 10000
                        }
                    }
                ]
            }
        }
    }
    

    Result:

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 0,
            "max_score": null,
            "hits": []
        }
    }
    

    No hits, because the book has 10002 words. Let's change word_count to 10002:

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "author": "明日科技"
                        }
                    },
                    {
                        "match": {
                            "title": "java"
                        }
                    }
                ],
                "filter":[{
                    "term":{
                        "word_count":10002
                    }
                }]
            }
        }
    }
    

    Result:

    {
        "took": 3,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 2.0431426,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EdWwL2kBvvhXwNB9h50h",
                    "_score": 2.0431426,
                    "_source": {
                        "title": "Java从入门到精通(第4版)(附光盘)",
                        "author": "明日科技",
                        "word_count": 10002,
                        "publish_date": "2016-09-01"
                    }
                }
            ]
        }
    }
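    Note that the _score here (2.0431426) is the same as in the earlier must-only response, which is consistent with the filter clause deciding only whether a document is included. As a sketch, a bool body combining several clause types could be assembled in Python like this; bool_query is a hypothetical helper and filter_ is spelled with an underscore only to avoid shadowing Python's built-in filter.

```python
import json


def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Assemble a bool query body, omitting clause types that are empty."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[
        {"match": {"author": "明日科技"}},
        {"match": {"title": "java"}},
    ],
    filter_=[{"term": {"word_count": 10002}}],
)
print(json.dumps(body, indent=4, ensure_ascii=False))
```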
    

    Keyword: must_not

    For example, to exclude books about Java:

    {
        "query":{
            "bool":{
                "must_not":{
                    "term":{
                        "title":"java"
                    }
                }
            }
        }
    }
    

    Result:

    {
        "took": 5,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 4,
            "max_score": 1,
            "hits": [
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FdW1L2kBvvhXwNB91Z2M",
                    "_score": 1,
                    "_source": {
                        "title": "数据结构(C语言版)",
                        "author": "严蔚敏",
                        "word_count": 100,
                        "publish_date": "2007-03-09"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "EtWxL2kBvvhXwNB93p1F",
                    "_score": 1,
                    "_source": {
                        "title": "Elasticsearch实战",
                        "author": "[美] 拉杜·乔戈(Radu Gheorghe) 马修·李·欣曼(Matthew",
                        "word_count": 10005,
                        "publish_date": "2018-10-2"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "E9WzL2kBvvhXwNB9MZ3I",
                    "_score": 1,
                    "_source": {
                        "title": "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
                        "author": "拉杜 乔戈",
                        "word_count": 10007,
                        "publish_date": "2017-11-03"
                    }
                },
                {
                    "_index": "book",
                    "_type": "it",
                    "_id": "FNW0L2kBvvhXwNB9pJ0W",
                    "_score": 1,
                    "_source": {
                        "title": "Python编程 从入门到实践",
                        "author": "[美]埃里克·马瑟斯(Eric Matthes)",
                        "word_count": 10010,
                        "publish_date": "2016-07-10"
                    }
                }
            ]
        }
    }
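    One detail worth keeping in mind: the request above uses term, which compares against indexed tokens without analyzing the search text. Assuming title is a text field indexed with the default standard analyzer (the mapping is not shown in this part), the indexed token is the lowercase java, so a term clause with "Java" would exclude nothing. The toy model below illustrates this; the analyze function is a rough stand-in that keeps only lowercase Latin and digit tokens and ignores CJK text, and both function names are hypothetical.

```python
import re

# Titles of the five sample books shown in the results.
TITLES = [
    "Java从入门到精通(第4版)(附光盘)",
    "Elasticsearch实战",
    "Laravel入门与实战 构建主流PHP应用开发框架 Laravel开发框架教程书籍 ",
    "Python编程 从入门到实践",
    "数据结构(C语言版)",
]


def analyze(text):
    """Rough stand-in for analysis: lowercase Latin/digit tokens only."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def exclude_term(titles, term):
    """Keep titles whose token set does not contain the exact term."""
    return [title for title in titles if term not in analyze(title)]


lowercase_kept = exclude_term(TITLES, "java")
capitalized_kept = exclude_term(TITLES, "Java")
print(len(lowercase_kept), len(capitalized_kept))
```

    In this model the lowercase term leaves four titles, matching the response above, while the capitalized term leaves all five.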
    

    This is only an introduction to search; we will continue the discussion in part three.

    Links

    ElasticSearch learning series


        Article title: Getting Started with Elasticsearch: The Basics

        Article link: https://www.haomeiwen.com/subject/ssuvuqtx.html