Elasticsearch Part 7: Search in Detail


Author: Java及SpringBoot | Published: 2020-03-06 15:45



    1. Getting Started with Search

    1.1 Introduction to Search Syntax

    query phase

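    • The search request is sent to one coordinating node, which builds a priority queue whose length is from + size, as set by the paging parameters (10 by default).
    • The coordinating node forwards the request to one copy of every shard; each shard runs the search locally and builds its own local priority queue.
    • Each shard returns its local priority queue to the coordinating node, which merges them into a global priority queue.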
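
    The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each shard is a plain list of hypothetical (score, doc_id) pairs, and the names shard_query and coordinate are invented for this sketch, not Elasticsearch APIs.

```python
import heapq

def shard_query(docs, from_, size):
    """Local query phase on one shard: keep only the top from_ + size (score, doc_id) pairs."""
    return heapq.nlargest(from_ + size, docs)

def coordinate(shards, from_=0, size=10):
    """Merge the per-shard queues into one global queue and cut out the requested page."""
    local_queues = [shard_query(docs, from_, size) for docs in shards]
    merged = heapq.nlargest(from_ + size, (hit for queue in local_queues for hit in queue))
    return merged[from_:from_ + size]

# Hypothetical data: three shards, each holding (score, doc_id) pairs.
shards = [
    [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    [(0.7, "d"), (0.8, "e")],
    [(0.1, "f"), (0.6, "g")],
]
top3 = coordinate(shards, from_=0, size=3)
print(top3)
```

    Only scores and ids travel in this phase; the full documents are retrieved later, in the fetch phase.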

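    How replica shards improve search throughput

    A single request has to hit one copy (replica or primary) of every shard. If each shard has several replicas, search requests arriving concurrently can be served by different replicas at the same time.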

    query string search

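    With query string search, the search conditions are supplied as string parameters in the URL, just like the query parameters of an ordinary HTTP request.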

    GET [/index_name/type_name/]_search[?parameter_name=parameter_value&...]

    GET /book/_search
    
    {
      "took" : 969,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "name" : "Bootstrap开发",
              "description" : "Bootstrap是由Twitter推出的一个前台页面开发css框架,是一个非常流行的开发框架,此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码,可以帮助开发者(尤其是不擅长css页面开发的程序人员)轻松的实现一个css,不受浏览器限制的精美界面css效果。",
              "studymodel" : "201002",
              "price" : 38.6,
              "timestamp" : "2019-08-25 19:11:35",
              "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
              "tags" : [
                "bootstrap",
                "dev"
              ]
            }
          },
          {
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "name" : "java编程思想",
              "description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
              "studymodel" : "201001",
              "price" : 68.6,
              "timestamp" : "2019-08-25 19:11:35",
              "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
              "tags" : [
                "java",
                "dev"
              ]
            }
          },
          {
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "name" : "spring开发基础",
              "description" : "spring 在java领域非常流行,java程序员都在用。",
              "studymodel" : "201001",
              "price" : 88.6,
              "timestamp" : "2019-08-24 19:11:35",
              "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
              "tags" : [
                "spring",
                "java"
              ]
            }
          }
        ]
      }
    }
    

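    Explanation

    took: how many milliseconds the request took

    timed_out: whether the search timed out; here it did not

    _shards: how many shards were searched, and how many of them succeeded, were skipped, or failed.

    hits.total: the number of matching results, here 3 documents

    hits.max_score: the highest _score among the results. The _score is the relevance score of a document for a search: the more relevant the document, the better the match and the higher the score

    hits.hits: the full details of the documents that matched the search

    Passing parameters

    Parameters are passed much like in any HTTP request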

    GET /book/_search?q=name:java&sort=price:desc
    

    SQL analogy: select * from book where name like '%java%' order by price desc

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : null,
            "_source" : {
              "name" : "java编程思想",
              "description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
              "studymodel" : "201001",
              "price" : 68.6,
              "timestamp" : "2019-08-25 19:11:35",
              "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
              "tags" : [
                "java",
                "dev"
              ]
            },
            "sort" : [
              68.6
            ]
          }
        ]
      }
    }
    

    timeout

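    The timeout parameter defines a time limit: the maximum time each shard on each node may spend executing the search. It does not stop the response from being returned normally; it only affects how much data the response contains.

    For example: index a holds 1 billion documents in 5 shards, 200 million per shard, and a full search takes 1000 ms. With timeout set to 10 ms, each shard searches for 10 ms and returns whatever it has found by then.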

    GET /book/_search?timeout=10ms

    Global setting: set search.default_search_timeout: 100ms in the configuration file. By default there is no timeout.

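    {
      "took": 144, # how many milliseconds the request took
      "timed_out": false, # whether the search timed out. By default there is no timeout, so the client waits until Elasticsearch finishes the search, however long it takes. With a timeout, Elasticsearch searches for the given duration and then returns whatever it has found, whether or not the search is complete. The timeout is passed as a parameter; units: ms (milliseconds), s (seconds), m (minutes).
      "_shards": {
        "total": 1, # number of shards the request was sent to
        "successful": 1, # shards that returned results successfully
        "skipped": 0, # shards that were skipped, for example because they could not contain a match
        "failed": 0 # shards that failed
      },
      "hits": {
        "total": 1, # number of results
        "max_score": 1, # highest relevance score among the results; the more relevant, the higher the _score and the higher the ranking
        "hits": [ # the matching documents; the first 10 by default
          {
            "_index": "test_index", # index the document belongs to
            "_type": "test_type", # type the document belongs to
            "_id": "1", # id of the document
            "_score": 1, # relevance score of the document
            "_source": { # the content of the document
              "field": "value"
            }
          }
        ]
      }
    }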
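
    The effect of timeout on a single shard can be pictured with a small simulation. This is a sketch under loud assumptions: time is modelled as a fixed cost per scanned document rather than a real clock, and shard_search is a hypothetical helper, not how Elasticsearch checks its deadline internally.

```python
def shard_search(docs, predicate, budget_ms, cost_per_doc_ms=1):
    """Scan docs until the simulated time budget is spent; return (hits, timed_out)."""
    hits, spent = [], 0
    for doc in docs:
        if spent + cost_per_doc_ms > budget_ms:
            return hits, True          # budget exhausted: return what was found so far
        spent += cost_per_doc_ms
        if predicate(doc):
            hits.append(doc)
    return hits, False                 # finished within the budget

docs = list(range(1000))               # hypothetical shard with 1000 documents

def is_even(doc):
    return doc % 2 == 0

partial, timed_out = shard_search(docs, is_even, budget_ms=10)
full, full_timed_out = shard_search(docs, is_even, budget_ms=10_000)
print(len(partial), timed_out)
print(len(full), full_timed_out)
```

    The first call returns a partial result together with timed_out set to True, which mirrors the point above: the response still comes back normally, it just contains less data.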
    

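    1.2 Multi-index search

    Multi-index search patterns

    Multi-index search means searching data across several indices. It is used relatively rarely, mainly when searching composite data; when composite data really has to be searched, _all is generally used.

    /_search: search all data in all indices
    /index1/_search: search all data in one specified index
    /index1,index2/_search: search the data of two indices at once
    /index*/_search: match multiple indices with a wildcard
    

    Use case: in production, log indices can be split by date.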

    log_to_es_20190910

    log_to_es_20190911

    log_to_es_20180910
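
    How an index expression such as log_to_es_2019* picks out indices can be sketched with Python's standard fnmatch module. The helper resolve_indices is hypothetical and only mimics comma-separated names, * wildcards and _all; Elasticsearch's real resolution supports more syntax.

```python
from fnmatch import fnmatchcase

def resolve_indices(expression, existing):
    """Expand a comma-separated index expression with * wildcards against known index names."""
    if expression in ("", "_all"):
        return sorted(existing)
    patterns = expression.split(",")
    return sorted(name for name in existing
                  if any(fnmatchcase(name, pattern) for pattern in patterns))

existing = ["log_to_es_20190910", "log_to_es_20190911", "log_to_es_20180910", "book"]
logs_2019 = resolve_indices("log_to_es_2019*", existing)
two_indices = resolve_indices("book,log_to_es_20180910", existing)
print(logs_2019)
print(two_indices)
```

    With date-based index names, one wildcard is enough to search a whole year or month of logs.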

    1.3 Paginated search

    Pagination syntax

    By default an Elasticsearch search returns 10 results, starting from offset 0.

    GET /book/_search?size=10
    GET /book/_search?size=10&from=0
    GET /book/_search?size=10&from=20
    GET /book/_search?from=0&size=3
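
    Applications usually think in page numbers rather than offsets. A minimal sketch of the conversion, assuming 1-based page numbers (page_to_from_size and search_url are hypothetical helper names):

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}

def search_url(index, page, page_size=10):
    """Build the query string form of a paginated search request."""
    params = page_to_from_size(page, page_size)
    return f"/{index}/_search?size={params['size']}&from={params['from']}"

print(search_url("book", page=3))   # /book/_search?size=10&from=20
```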
    

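    +/- search

    GET index_name/_search?q=field_name:condition
    GET index_name/_search?q=+field_name:condition
    GET index_name/_search?q=-field_name:condition
    

    + : the same as writing no prefix; matches documents whose specified field contains the keywords

    - : the opposite of +; matches documents whose specified field does not contain the keywords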

    deep paging

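    What is deep paging

    Results are sorted by relevance score in descending order, so when a request pages too deep, the coordinating node has to gather and sort a large amount of data from every shard.

    Performance problems of deep paging

    1. Network bandwidth: when the search pages deep, every shard has to send its data to the coordinating node, and transferring that much data consumes the network.

    2. Memory: the data each shard sends back is held in memory by the coordinating node, which uses a lot of memory.

    3. CPU: the coordinating node has to sort all the returned data, and that sort is CPU-intensive.

    Given these performance problems, deep paging should be avoided as far as possible.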
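
    A rough sense of the cost can be had from simple arithmetic: every shard must return its top from + size entries, so the coordinating node handles that number multiplied by the shard count. The function below is a hypothetical back-of-the-envelope helper, not a measurement.

```python
def deep_paging_cost(num_shards, from_, size):
    """Entries each shard returns, and the total the coordinating node must hold and sort."""
    per_shard = from_ + size
    return per_shard, per_shard * num_shards

for from_ in (0, 1_000, 10_000):
    per_shard, total = deep_paging_cost(num_shards=5, from_=from_, size=10)
    print(f"from={from_:>6}: {per_shard} entries per shard, {total} at the coordinating node")
```

    The page itself always contains 10 hits, yet the work grows linearly with from. Elasticsearch limits from + size through the index.max_result_window setting (10000 by default) for this reason.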

    1.4 Basic query string syntax

    Basic query string syntax

    GET /book/_search?q=name:java

    GET /book/_search?q=+name:java

    GET /book/_search?q=-name:java

    How the _all metadata field works and what it is for

    GET /book/_search?q=java
    

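    This searches all fields at once: a document is returned if any of its fields contains the keyword. Does that mean every field of every document is searched one by one? No.

    Elasticsearch has the _all metadata field. When a document is indexed, ES joins the values of all its fields, analyzes the combined text, and stores the resulting tokens in the _all field. A search that names no field runs against _all. (The _all field was deprecated in Elasticsearch 6.0 and removed in 7.0; from then on a query without a field searches the fields given by index.query.default_field, which defaults to all fields.)

    Example

    {
        "name": "jack",
        "email": "123@qq.com",
        "address": "beijing"
    }
    

    _all : jack,123@qq.com,beijing is used as the value of this document's _all field; it is analyzed and a corresponding inverted index is built for it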
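
    The idea can be sketched as follows. It is a simplified illustration: build_all_field joins the values with spaces and simple_tokens is a whitespace tokenizer standing in for a real analyzer; both names are invented for this sketch.

```python
def build_all_field(doc):
    """Concatenate every field value into one string, roughly what the _all field held."""
    return " ".join(str(value) for value in doc.values())

def simple_tokens(text):
    """Whitespace tokenizer used here as a stand-in for a real analyzer."""
    return text.lower().split()

doc = {"name": "jack", "email": "123@qq.com", "address": "beijing"}
all_value = build_all_field(doc)
print(all_value)
print("beijing" in simple_tokens(all_value))
```

    A query without a field only has to consult this one combined field instead of every field in turn.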

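    1.5 Introduction to the Query DSL

    DSL

    DSL: Domain Specific Language.

    Request parameters are passed in the request body. In Elasticsearch the request body's character set defaults to UTF-8.

    As more and more parameters are appended to the query string and the search conditions grow more complex, query string search can no longer meet the need.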

    GET /book/_search?q=name:java&size=10&from=0&sort=price:desc
    

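    The Query DSL is Elasticsearch's own search language. Search conditions are carried in the request body, which makes it far more expressive.

    Query everything (query string form: GET /book/_search)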

    GET /book/_search
    {
      "query": { "match_all": {} }
    }
    

    Sorting (query string form: GET /book/_search?q=name:java&sort=price:desc)

    GET /book/_search
    {
        "query" : {
            "match" : {
                "name" : "java"
            }
        },
        "sort": [
            { "price": "desc" }
        ]
    }
    

    Paginated query (query string form: GET /book/_search?size=10&from=0)

    GET  /book/_search 
    {
      "query": { "match_all": {} },
      "from": 0,
      "size": 1
    }
    

    Selecting the returned fields (query string form: GET /book/_search?_source=name,studymodel)

    GET /book/_search 
    {
      "query": { "match_all": {} },
      "_source": ["name", "studymodel"]
    }
    

    Complex queries are built by combining the query types above.

    Query DSL syntax

    {
        QUERY_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
    
    {
        QUERY_NAME: {
            FIELD_NAME: {
                ARGUMENT: VALUE,
                ARGUMENT: VALUE,...
            }
        }
    }
    
    GET /test_index/_search 
    {
      "query": {
        "match": {
          "test_field": "test"
        }
      }
    }
    

    Combining multiple search conditions

    Search requirement: title must contain elasticsearch, content may or may not contain elasticsearch, and author_id must not be 111

    Initial data:

    POST /website/_doc/1
    {
              "title": "my hadoop article",
              "content": "hadoop is very bad",
              "author_id": 111
    }
    
    POST /website/_doc/2
    {
              "title": "my elasticsearch  article",
              "content": "es is very bad",
              "author_id": 112
    }
    POST /website/_doc/3
    {
              "title": "my elasticsearch article",
              "content": "es is very goods",
              "author_id": 111
    }
    

    Search:

    GET /website/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "title": "elasticsearch"
              }
            }
          ],
          "should": [
            {
              "match": {
                "content": "elasticsearch"
              }
            }
          ],
          "must_not": [
            {
              "match": {
                "author_id": 111
              }
            }
          ]
        }
      }
    }
    

    Response:

    {
      "took" : 488,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.47000363,
        "hits" : [
          {
            "_index" : "website",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.47000363,
            "_source" : {
              "title" : "my elasticsearch  article",
              "content" : "es is very bad",
              "author_id" : 112
            }
          }
        ]
      }
    }
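
    Why only document 2 comes back can be checked with a toy in-memory evaluator. This is a sketch of the must / should / must_not logic only: match here is a whitespace-token comparison and the returned score is just a count of matching clauses, not Elasticsearch's relevance score. All function names are hypothetical.

```python
def tokens(text):
    """Lowercase whitespace tokens; a stand-in for a real analyzer."""
    return str(text).lower().split()

def match(field, value):
    """Toy match clause: true if any query token appears among the field's tokens."""
    wanted = tokens(value)
    return lambda doc: any(t in tokens(doc.get(field, "")) for t in wanted)

def bool_query(must=(), should=(), must_not=()):
    """Return a function doc -> clause count, or None when the document is excluded."""
    def evaluate(doc):
        if not all(clause(doc) for clause in must):
            return None                      # every must clause is required
        if any(clause(doc) for clause in must_not):
            return None                      # any must_not clause excludes the document
        return len(must) + sum(1 for clause in should if clause(doc))
    return evaluate

docs = {
    "1": {"title": "my hadoop article", "content": "hadoop is very bad", "author_id": 111},
    "2": {"title": "my elasticsearch  article", "content": "es is very bad", "author_id": 112},
    "3": {"title": "my elasticsearch article", "content": "es is very goods", "author_id": 111},
}
query = bool_query(
    must=[match("title", "elasticsearch")],
    should=[match("content", "elasticsearch")],
    must_not=[match("author_id", 111)],
)
hits = [doc_id for doc_id, doc in docs.items() if query(doc) is not None]
print(hits)
```

    Document 1 fails the must clause, document 3 is removed by must_not, and the should clause is optional because a must clause is present, so it only influences the score.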
    

    A more complex search requirement:

    select * from test_index where name='tom' and (hired =true or (personality ='good' and rude != true ))

    GET /test_index/_search
    {
        "query": {
                "bool": {
                    "must": { "match":{ "name": "tom" }},
                    "should": [
                        { "match":{ "hired": true }},
                        { "bool": {
                            "must":{ "match": { "personality": "good" }},
                            "must_not": { "match": { "rude": true }}
                        }}
                    ],
                    "minimum_should_match": 1
                }
        }
    }
    

    1.6 Full-text search

    Full-text search

    Recreate the book index

    PUT /book/
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      },
      "mappings": {
        "properties": {
          "name":{
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
          },
          "description":{
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
          },
          "studymodel":{
            "type": "keyword"
          },
          "price":{
            "type": "double"
          },
          "timestamp": {
             "type": "date",
             "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
          },
          "pic":{
            "type":"text",
            "index":false
          }
        }
      }
    }
    

    Insert data

    PUT /book/_doc/1
    {
    "name": "Bootstrap开发",
    "description": "Bootstrap是由Twitter推出的一个前台页面开发css框架,是一个非常流行的开发框架,此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码,可以帮助开发者(尤其是不擅长css页面开发的程序人员)轻松的实现一个css,不受浏览器限制的精美界面css效果。",
    "studymodel": "201002",
    "price":38.6,
    "timestamp":"2019-08-25 19:11:35",
    "pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
    "tags": [ "bootstrap", "dev"]
    }
    
    PUT /book/_doc/2
    {
    "name": "java编程思想",
    "description": "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
    "studymodel": "201001",
    "price":68.6,
    "timestamp":"2019-08-25 19:11:35",
    "pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
    "tags": [ "java", "dev"]
    }
    
    PUT /book/_doc/3
    {
    "name": "spring开发基础",
    "description": "spring 在java领域非常流行,java程序员都在用。",
    "studymodel": "201001",
    "price":88.6,
    "timestamp":"2019-08-24 19:11:35",
    "pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
    "tags": [ "spring", "java"]
    }
    

    Search

    GET  /book/_search 
    {
        "query" : {
            "match" : {
                "description" : "java程序员"
            }
        }
    }
    

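    1.7 Scoring: TF/IDF

    Algorithm overview

    The relevance score algorithm, put simply, computes how closely the text in an index matches the search text.

    Elasticsearch's relevance scoring is based on the term frequency/inverse document frequency algorithm, TF/IDF for short: TF is term frequency and IDF is inverse document frequency. Since Elasticsearch 5.0 the default similarity is BM25, a refinement of TF/IDF, which is why the explain output in this section shows the BM25 parameters k1 and b.

    Term frequency: how many times each term of the search text appears in the field text; the more often it appears, the more relevant the document.

    Example: search request: hello world

    doc1 : hello you and me,and world is very good.

    doc2 : hello,how are you

    doc1 is more relevant, because it contains both hello and world.

    Inverse document frequency: how many documents in the whole index contain each term of the search text; the more documents contain a term, the less that term contributes to relevance.

    Example: search request: hello world

    doc1 : hello ,today is very good

    doc2 : hi world ,how are you

    The whole index holds 100 million documents; 1000 of them contain hello and 100 contain world.

    doc2 is more relevant, because world is the rarer term.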
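
    The example can be put into numbers. The sketch below assumes the idf formula that Elasticsearch prints in the explain output shown in this article, log(1 + (N - n + 0.5) / (n + 0.5)); the document counts are the ones from the example.

```python
import math

def idf(n, N):
    """Inverse document frequency: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

N = 100_000_000             # documents in the index
idf_hello = idf(1_000, N)   # hello appears in 1000 documents
idf_world = idf(100, N)     # world appears in 100 documents
print(round(idf_hello, 2), round(idf_world, 2))
```

    The rarer term world gets the larger weight, so a document matching world outranks one matching only hello, all else being equal.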

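    Field-length norm: the length of the field; the longer the field, the weaker the relevance

    Example: search request: hello world

    doc1 : {"title":"hello article","content":"balabalabal (10,000 words)"}

    doc2 : {"title":"my article","content":"balabalabal (10,000 words),world"}

    doc1 is more relevant, because hello matches in the short title field, whereas world in doc2 matches in a very long content field.

    How _score is calculated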

    GET /book/_search?explain=true
    {
      "query": {
        "match": {
          "description": "java程序员"
        }
      }
    }
    

    Response

    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 2.137549,
        "hits" : [
          {
            "_shard" : "[book][0]",
            "_node" : "MDA45-r6SUGJ0ZyqyhTINA",
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 2.137549,
            "_source" : {
              "name" : "spring开发基础",
              "description" : "spring 在java领域非常流行,java程序员都在用。",
              "studymodel" : "201001",
              "price" : 88.6,
              "timestamp" : "2019-08-24 19:11:35",
              "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
              "tags" : [
                "spring",
                "java"
              ]
            },
            "_explanation" : {
              "value" : 2.137549,
              "description" : "sum of:",
              "details" : [
                {
                  "value" : 0.7936629,
                  "description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
                  "details" : [
                    {
                      "value" : 0.7936629,
                      "description" : "score(freq=2.0), product of:",
                      "details" : [
                        {
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.47000363,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                            {
                              "value" : 2,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                            },
                            {
                              "value" : 3,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                            }
                          ]
                        },
                        {
                          "value" : 0.7675597,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                            {
                              "value" : 2.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                            },
                            {
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 12.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                            },
                            {
                              "value" : 35.333332,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "value" : 1.3438859,
                  "description" : "weight(description:程序员 in 0) [PerFieldSimilarity], result of:",
                  "details" : [
                    {
                      "value" : 1.3438859,
                      "description" : "score(freq=1.0), product of:",
                      "details" : [
                        {
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.98082924,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                            {
                              "value" : 1,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                            },
                            {
                              "value" : 3,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                            }
                          ]
                        },
                        {
                          "value" : 0.6227967,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                            {
                              "value" : 1.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                            },
                            {
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 12.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                            },
                            {
                              "value" : 35.333332,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          },
          {
            "_shard" : "[book][0]",
            "_node" : "MDA45-r6SUGJ0ZyqyhTINA",
            "_index" : "book",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.57961315,
            "_source" : {
              "name" : "java编程思想",
              "description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
              "studymodel" : "201001",
              "price" : 68.6,
              "timestamp" : "2019-08-25 19:11:35",
              "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
              "tags" : [
                "java",
                "dev"
              ]
            },
            "_explanation" : {
              "value" : 0.57961315,
              "description" : "sum of:",
              "details" : [
                {
                  "value" : 0.57961315,
                  "description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
                  "details" : [
                    {
                      "value" : 0.57961315,
                      "description" : "score(freq=1.0), product of:",
                      "details" : [
                        {
                          "value" : 2.2,
                          "description" : "boost",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.47000363,
                          "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                          "details" : [
                            {
                              "value" : 2,
                              "description" : "n, number of documents containing term",
                              "details" : [ ]
                            },
                            {
                              "value" : 3,
                              "description" : "N, total number of documents with field",
                              "details" : [ ]
                            }
                          ]
                        },
                        {
                          "value" : 0.56055,
                          "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                          "details" : [
                            {
                              "value" : 1.0,
                              "description" : "freq, occurrences of term within document",
                              "details" : [ ]
                            },
                            {
                              "value" : 1.2,
                              "description" : "k1, term saturation parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 0.75,
                              "description" : "b, length normalization parameter",
                              "details" : [ ]
                            },
                            {
                              "value" : 19.0,
                              "description" : "dl, length of field",
                              "details" : [ ]
                            },
                            {
                              "value" : 35.333332,
                              "description" : "avgdl, average length of field",
                              "details" : [ ]
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          }
        ]
      }
    }
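
    The numbers in this explain output can be reproduced by hand. The sketch below uses only the formulas and values printed in the response above (boost 2.2, k1 1.2, b 0.75, N 3, avgdl 35.333332); the function names are chosen for this example.

```python
import math

def idf(n, N):
    """idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def term_score(freq, n, N, dl, avgdl, boost=2.2):
    """Score of one term in one document: boost * idf * tf."""
    return boost * idf(n, N) * tf(freq, dl, avgdl)

AVGDL = 35.333332
# Document 3 matches both terms: java (freq 2, n 2) and 程序员 (freq 1, n 1), field length 12.
doc3 = (term_score(freq=2, n=2, N=3, dl=12, avgdl=AVGDL)
        + term_score(freq=1, n=1, N=3, dl=12, avgdl=AVGDL))
# Document 2 matches only java (freq 1, n 2), field length 19.
doc2 = term_score(freq=1, n=2, N=3, dl=19, avgdl=AVGDL)
print(round(doc3, 4), round(doc2, 4))
```

    The two sums agree with the _score values 2.137549 and 0.57961315 in the response up to floating-point rounding. Document 3 wins because it also contains the rarer term 程序员 and its description field is shorter.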
    

    Analyzing how a single document was matched

    GET /book/_explain/3
    {
      "query": {
        "match": {
          "description": "java程序员"
        }
      }
    }
    

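    1.8 Doc values

    Searching relies on the inverted index. Sorting relies on a forward index, which exposes every field of every document so that the documents can be ordered; this forward index is what doc values are.

    When a document is indexed, an inverted index is built for searching, and a forward index, the doc values, is built for sorting, aggregations, filtering and similar operations.

    Doc values are stored on disk. If there is enough memory, the OS automatically caches them in memory and performance stays high; if there is not enough memory, they stay on disk and are read from there as needed.

    Inverted index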

    doc1: hello world you and me

    doc2: hi, world, how are you

    term    doc1   doc2
    hello   *
    world   *      *
    you     *      *
    and     *
    me      *
    hi             *
    how            *
    are            *

    At search time:

    hello you --> hello, you

    hello --> doc1

    you --> doc1,doc2

    doc1: hello world you and me

    doc2: hi, world, how are you

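    Sorting (sort by) is where the inverted index falls short: it maps terms to documents, so it cannot efficiently give the value of a field for each document.

    Forward index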

    doc1: { "name": "jack", "age": 27 }

    doc2: { "name": "tom", "age": 30 }

    document   name   age
    doc1       jack   27
    doc2       tom    30
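
    Both structures can be built in a few lines. This is a minimal in-memory sketch using the example documents of this section; real doc values are a compressed on-disk columnar format, and the function names here are invented for the illustration.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """term -> set of doc ids containing it (used for searching)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index

def build_doc_values(docs, field):
    """doc id -> field value, a column-style forward index (used for sorting and aggregations)."""
    return {doc_id: doc[field] for doc_id, doc in docs.items()}

texts = {"doc1": "hello world you and me", "doc2": "hi, world, how are you"}
inverted = build_inverted_index(texts)
matches = inverted["hello"] | inverted["you"]   # search "hello you": either term may match

people = {"doc1": {"name": "jack", "age": 27}, "doc2": {"name": "tom", "age": 30}}
ages = build_doc_values(people, "age")
by_age_desc = sorted(matches, key=lambda doc_id: ages[doc_id], reverse=True)
print(sorted(matches), by_age_desc)
```

    The search step goes from term to documents; the sort step goes from document to value. Neither structure could serve the other lookup efficiently, which is why both are built at index time.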

    1.9 fetch phase

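    Fetch phase workflow

    • Once the coordinating node has built the priority queue, it sends mget requests to the shards that hold the selected documents to retrieve them

    • Each shard returns its documents to the coordinating node

    • The coordinating node merges the documents and returns the result to the client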
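
    The workflow above can be sketched as follows, assuming the query phase has already produced a global queue of hypothetical (score, shard number, document id) entries and that each shard is a plain dictionary of documents. fetch_phase is an invented name, not an Elasticsearch API.

```python
def fetch_phase(global_queue, shards):
    """Fetch the full documents for the (score, shard_no, doc_id) entries chosen in the query phase."""
    # Group the ids by shard so that each shard receives one multi-get style request.
    ids_by_shard = {}
    for _score, shard_no, doc_id in global_queue:
        ids_by_shard.setdefault(shard_no, []).append(doc_id)

    # Each shard returns the requested documents.
    fetched = {}
    for shard_no, ids in ids_by_shard.items():
        for doc_id in ids:
            fetched[doc_id] = shards[shard_no][doc_id]

    # The coordinating node assembles the response in the order fixed by the query phase.
    return [{"_id": doc_id, "_score": score, "_source": fetched[doc_id]}
            for score, _shard_no, doc_id in global_queue]

shards = [
    {"1": {"name": "Bootstrap开发"}},
    {"2": {"name": "java编程思想"}, "3": {"name": "spring开发基础"}},
]
global_queue = [(2.1, 1, "3"), (0.6, 1, "2")]
hits = fetch_phase(global_queue, shards)
print([hit["_id"] for hit in hits])
```

    Only the documents that made it into the final page are fetched, which keeps the amount of _source data on the network small.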

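    A typical search that specifies neither from nor size returns the first 10 results by default, sorted by _score.

    Phrase search (match_phrase). The query text is analyzed into terms like any other match query, but a document matches only if it contains all of those terms next to each other and in the same order, which is checked with the term positions stored in the inverted index. It is therefore stricter than an ordinary full-text match.

    GET index_name/_search
    {
      "query": {
        "match_phrase": {
          "field_name": "search text"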
        }
      }
    }
    

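    1.10 Summary of search parameters

    preference

    Determines which shard copies are used to execute the search

    _primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, _shards:2,3

    The bouncing results problem: two documents have the same value in the sort field; on different shard copies they may be ordered differently; requests are round-robined across the replica shards; so the order of the results seen on the page can change from one request to the next. These are bouncing results.

    At search time, requests are round-robined across the copies of each shard (replicas and primary), but the documents may be ordered differently on different copies

    The solution is to set preference to a string such as the user_id, so that each user's searches always run on the same shard copies and the user no longer sees bouncing results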
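
    The difference between round robin and a custom preference string can be illustrated like this. It is a conceptual sketch only: CopySelector is a hypothetical class, and hashing the string with SHA-256 is an assumption made for the example, not the algorithm Elasticsearch uses internally.

```python
import hashlib
import itertools

class CopySelector:
    """Chooses which copy (primary or replica) of a shard serves a search."""

    def __init__(self, copies):
        self.copies = list(copies)
        self._counter = itertools.count()

    def pick(self, preference=None):
        if preference is None:
            # No preference: rotate through the copies (round robin).
            return self.copies[next(self._counter) % len(self.copies)]
        # Custom preference string: a stable hash always maps to the same copy.
        digest = hashlib.sha256(preference.encode("utf-8")).hexdigest()
        return self.copies[int(digest, 16) % len(self.copies)]

selector = CopySelector(["primary", "replica-1", "replica-2"])
rotating = [selector.pick() for _ in range(3)]
sticky = {selector.pick(preference="user_42") for _ in range(5)}
print(rotating, sticky)
```

    Without a preference, three consecutive requests land on three different copies; with the same preference string, every request lands on one copy, so ties are always broken the same way.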

    timeout

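    Limits the search to a given amount of time and returns whatever data has been collected by then, so that a query cannot run for too long

    routing

    Document routing. By default documents are routed by _id; with routing=user_id, all data belonging to the same user ends up on the same shard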
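
    Routing boils down to hashing the routing value and taking it modulo the number of primary shards. In the sketch below zlib.crc32 is only a stand-in hash chosen for the example (Elasticsearch uses its own hash function), and route is a hypothetical helper.

```python
import zlib

def route(routing_value, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(str(routing_value).encode("utf-8")) % number_of_primary_shards

# The same routing value always maps to the same shard.
shards_for_user = {route("user_7", 5) for _ in range(10)}
print(shards_for_user)
```

    Because the result depends on the number of primary shards, that number cannot simply be changed after the index has been created without moving documents.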

    search_type

    default:query_then_fetch

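    dfs_query_then_fetch: first collects term statistics from all shards so that scores are computed from global frequencies, which can improve the accuracy of relevance sorting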


          Article link: https://www.haomeiwen.com/subject/xpfifhtx.html