50. Getting Started with Search Engines: Indexing a Field Twice to Fix String Sorting

Author: 拉提娜的爸爸 | Published: 2020-01-09 10:06

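Sorting on an analyzed string field usually gives the wrong result. The analyzer splits the value into several terms, so the sort is based on one of those terms instead of on the whole string.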

The usual solution is to index the string field twice: once analyzed, for full-text search, and once not analyzed, for sorting.
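To make the idea concrete, here is a minimal Python sketch of what the two indexes would hold for one title. It assumes a much simplified analyzer (lowercase, then split on anything that is not a letter or digit) in place of Elasticsearch's real standard analyzer. The helper names analyze and index_twice are hypothetical.

```python
import re


def analyze(value: str) -> list[str]:
    """Simplified stand-in for an analyzer (assumption, not Elasticsearch's
    standard analyzer): lowercase the text, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def index_twice(value: str) -> dict:
    """Hypothetical helper: the terms each of the two indexes would store."""
    return {
        "title": analyze(value),  # analyzed: several terms, good for search
        "title.raw": [value],     # not analyzed: one whole term, good for sorting
    }


if __name__ == "__main__":
    print(index_twice("First Article"))
    # {'title': ['first', 'article'], 'title.raw': ['First Article']}
```

A search for "article" can match the analyzed terms. Sorting, however, needs the single whole value kept in title.raw.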

1. How to index a string field twice

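Example: index title twice. The title field itself is analyzed text, used for full-text search. The sub-field title.raw, declared under fields, stores the same value as a single unanalyzed term, used for sorting. "fielddata": true is set on title only so that the analyzed field can be sorted at all in the comparison below, because Elasticsearch (5.x and later) refuses to sort on text fields by default.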

    PUT /website
    {
      "mappings": {
        "article": {
          "properties": {
            "title":{
              "type": "text", 
              "fields": {
                "raw":{
                  "type": "keyword"
                }
              },
              "fielddata": true
            },
            "content":{
              "type": "text"
            },
            "post_date":{
              "type": "date"
            },
            "author_id":{
              "type": "long"
            }
          }
        }
      }
    }
    
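If you create mappings from code, you can generate the same multi-field layout with a small helper. This is a sketch using only the Python standard library. The function names multi_field and website_mapping are hypothetical. The sub-field type keyword is assumed, since it is the unanalyzed string type in Elasticsearch 5.x and later and replaces "type": "string" with "index": "not_analyzed". Sending the body to the cluster is left out.

```python
import json


def multi_field(sub_name: str = "raw", fielddata: bool = False) -> dict:
    """Hypothetical helper: an analyzed text field plus an unanalyzed sub-field."""
    field = {
        "type": "text",                              # analyzed, used for search
        "fields": {sub_name: {"type": "keyword"}},   # one whole term, used for sorting
    }
    if fielddata:
        # Only needed to sort on the analyzed field itself, as in the demo query.
        field["fielddata"] = True
    return field


def website_mapping() -> dict:
    """Body for PUT /website, mirroring the mapping request in this article."""
    return {
        "mappings": {
            "article": {  # mapping type, as used by Elasticsearch 5.x/6.x
                "properties": {
                    "title": multi_field(fielddata=True),
                    "content": {"type": "text"},
                    "post_date": {"type": "date"},
                    "author_id": {"type": "long"},
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(website_mapping(), indent=2))
```

Keeping the sub-field name as a parameter lets other string fields reuse the same pattern, for example a hypothetical author_name field.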

2. Testing the index

(1) Index the documents

    PUT /website/article/1
    {
      "title": "first article",
      "content": "this is my first article",
      "post_date": "2017-01-01",
      "author_id": 110
    }
    ... (the other two documents are indexed the same way; omitted)
    ---------------- Result (the three documents as search hits, excerpt) ----------------
          {
            "_index": "website",
            "_type": "article",
            "_id": "2",
            "_score": 1,
            "_source": {
              "title": "second article",
              "content": "this is my second article",
              "post_date": "2017-02-01",
              "author_id": 110
            }
          },
          {
            "_index": "website",
            "_type": "article",
            "_id": "1",
            "_score": 1,
            "_source": {
              "title": "first article",
              "content": "this is my first article",
              "post_date": "2017-01-01",
              "author_id": 110
            }
          },
          {
            "_index": "website",
            "_type": "article",
            "_id": "3",
            "_score": 1,
            "_source": {
              "title": "third article",
              "content": "this is my third article",
              "post_date": "2017-03-01",
              "author_id": 110
            }
          }
    

(2) Query the data and sort by title in ascending order

First, sort on the analyzed title field directly:

    GET /website/article/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        {
          "title": {
            "order": "asc"
          }
        }
      ]
    }
    ---------------------------------------- Result ----------------------------------------
    {
      "took": 28,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": null,
        "hits": [
          {
            "_index": "website",
            "_type": "article",
            "_id": "2",
            "_score": null,
            "_source": {
              "title": "second article",
              "content": "this is my second article",
              "post_date": "2017-02-01",
              "author_id": 110
            },
            "sort": [
              "article"
            ]
          },
          {
            "_index": "website",
            "_type": "article",
            "_id": "1",
            "_score": null,
            "_source": {
              "title": "first article",
              "content": "this is my first article",
              "post_date": "2017-01-01",
              "author_id": 110
            },
            "sort": [
              "article"
            ]
          },
          {
            "_index": "website",
            "_type": "article",
            "_id": "3",
            "_score": null,
            "_source": {
              "title": "third article",
              "content": "this is my third article",
              "post_date": "2017-03-01",
              "author_id": 110
            },
            "sort": [
              "article"
            ]
          }
        ]
      }
    }
    

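The sort values show the problem. Because title is analyzed, each document is sorted by a single term. For an ascending sort that term is the smallest one, "article", which every title contains. All three documents tie on the same sort value, so their order has nothing to do with the titles and is not guaranteed to be stable from one request to the next.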
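The tie can be reproduced with a small Python simulation. It is a sketch, not Elasticsearch's implementation. It assumes the same simplified analyzer as before and models the ascending sort on a multi-valued field by taking each document's smallest term, as the sort values "article" above suggest. The names DOCS, analyze and sort_on_analyzed are hypothetical.

```python
import re

# The three example documents (id -> title), in the order the hits came back.
DOCS = {"2": "second article", "1": "first article", "3": "third article"}


def analyze(value: str) -> list[str]:
    # Assumed simplified analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def sort_on_analyzed(docs: dict) -> list[tuple[str, str]]:
    """Ascending sort where each document is represented by its smallest term."""
    keyed = [(min(analyze(title)), doc_id) for doc_id, title in docs.items()]
    # sorted() is stable, so tied documents simply keep their input order here;
    # in Elasticsearch the order of tied hits is not meaningful either.
    return sorted(keyed, key=lambda pair: pair[0])


if __name__ == "__main__":
    for sort_value, doc_id in sort_on_analyzed(DOCS):
        print(doc_id, sort_value)
```

Every document gets the sort value "article", so the sort cannot separate them.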
Now sort on the second, unanalyzed index, title.raw:

    GET /website/article/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        {
          "title.raw": {
            "order": "asc"
          }
        }
      ]
    }
    -------------------------------- Result --------------------------------
    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": null,
        "hits": [
          {
            "_index": "website",
            "_type": "article",
            "_id": "1",
            "_score": null,
            "_source": {
              "title": "first article",
              "content": "this is my first article",
              "post_date": "2017-01-01",
              "author_id": 110
            },
            "sort": [
              "first article"
            ]
          },
          {
            "_index": "website",
            "_type": "article",
            "_id": "2",
            "_score": null,
            "_source": {
              "title": "second article",
              "content": "this is my second article",
              "post_date": "2017-02-01",
              "author_id": 110
            },
            "sort": [
              "second article"
            ]
          },
          {
            "_index": "website",
            "_type": "article",
            "_id": "3",
            "_score": null,
            "_source": {
              "title": "third article",
              "content": "this is my third article",
              "post_date": "2017-03-01",
              "author_id": 110
            },
            "sort": [
              "third article"
            ]
          }
        ]
      }
    }
    

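The sort values now show that the documents are sorted by the whole title value, which gives the expected order: "first article", "second article", "third article".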
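For comparison, here is a sketch of the same three documents sorted on the whole title, which is what title.raw provides. It assumes Python's default string comparison (by code point) is a fair stand-in for the order of the unanalyzed values, which holds for these lowercase ASCII titles. The function name sort_on_raw is hypothetical.

```python
# The three example documents (id -> title).
DOCS = {"2": "second article", "1": "first article", "3": "third article"}


def sort_on_raw(docs: dict, descending: bool = False) -> list[str]:
    """Return document ids sorted by the whole, unanalyzed title."""
    ordered = sorted(docs.items(), key=lambda item: item[1], reverse=descending)
    return [doc_id for doc_id, _ in ordered]


if __name__ == "__main__":
    print(sort_on_raw(DOCS))                   # ['1', '2', '3']
    print(sort_on_raw(DOCS, descending=True))  # ['3', '2', '1']
```

Because each document now has exactly one sort value, there are no ties, and ascending or descending order is well defined.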

Article link: https://www.haomeiwen.com/subject/vzaeactx.html