4.11-Term&PhraseSuggester

Author: 落日彼岸 | Published: 2020-04-04 22:28

What Are Search Suggestions

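  • Modern search engines generally offer a "suggest as you type" feature.

  • It helps users by auto-completing or correcting their input while they type a query. Guiding users toward more precise keywords improves how well documents match in the subsequent search phase.

  • On Google, auto-completion kicks in as soon as you start typing. Once the input reaches a certain length and can no longer be completed, for example because of a misspelled word, Google starts suggesting similar words or sentences instead.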

Elasticsearch Suggester API

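  • In Elasticsearch, this kind of search-engine functionality is implemented through the Suggester API.

  • How it works: the input text is broken into tokens, then similar terms are looked up in the index's term dictionary and returned.

  • Elasticsearch provides four kinds of suggesters for different use cases:

    • Term & Phrase Suggester

    • Completion & Context Suggester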
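
The "tokenize, then look up similar terms in a dictionary" idea can be sketched in a few lines of Python. This is only a rough illustration: `TERM_DICTIONARY` and `suggest` are hypothetical names, and `difflib`'s similarity ratio stands in for whatever measure Elasticsearch actually applies.

```python
import difflib

# Hypothetical term dictionary standing in for the indexed terms of one field.
TERM_DICTIONARY = ["lucene", "elasticsearch", "rock", "rocks", "elk", "stack"]

def suggest(text, dictionary=TERM_DICTIONARY):
    """Split text into lowercase tokens and look up similar terms for each."""
    suggestions = {}
    for token in text.lower().split():
        # get_close_matches ranks candidates by difflib's similarity ratio,
        # which is NOT the measure Elasticsearch uses (assumption for the sketch).
        suggestions[token] = difflib.get_close_matches(
            token, dictionary, n=3, cutoff=0.6
        )
    return suggestions

result = suggest("lucen rock")
print(result)
```

Each input token gets its own candidate list, which mirrors how the suggesters covered in the rest of this post work per term.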

Term Suggester

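  • A suggester is a special kind of search. The "text" field holds the text supplied with the call, usually whatever the user typed in the UI.

  • The user's input "lucen" is a misspelling.

  • The suggester looks each term up in the specified field, "body", and returns suggested terms when a term cannot be found in the index (missing).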

POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}
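
On the client side, the suggestions come back under the `suggest` key of the search response, with one entry per input token. The sketch below shows one way to pull out the top correction for each token. The `sample_response` fragment is illustrative only (its score and freq values are made up), and `best_corrections` is a hypothetical helper, not part of any client library.

```python
# Illustrative response fragment; the score and freq values are made up.
sample_response = {
    "suggest": {
        "term-suggestion": [
            {"text": "lucen", "offset": 0, "length": 5,
             "options": [{"text": "lucene", "score": 0.8, "freq": 2}]},
            {"text": "rock", "offset": 6, "length": 4, "options": []},
        ]
    }
}

def best_corrections(response, suggestion_name):
    """Map each input token to its top suggested term, or None if there is none."""
    corrections = {}
    for entry in response["suggest"][suggestion_name]:
        options = entry["options"]
        corrections[entry["text"]] = options[0]["text"] if options else None
    return corrections

corrections = best_corrections(sample_response, "term-suggestion")
print(corrections)
```

In missing mode a token that already exists in the index, such as "rock", comes back with an empty options list, which is why the helper maps it to None.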

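Some Test Data

  • The standard analyzer is used by default.

  • Uppercase letters are converted to lowercase.

  • "rocks" and "rock" are two distinct terms, because the standard analyzer does no stemming.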
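
To see why "rocks" and "rock" end up as separate terms, here is a very rough Python approximation of what the analyzer does to this test data. It only lowercases and splits on non-alphanumeric characters; the real standard analyzer uses Unicode text segmentation, so treat `approximate_standard_analyzer` as a hypothetical stand-in.

```python
import re

def approximate_standard_analyzer(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = approximate_standard_analyzer("Elk stack  rocks rock")
print(tokens)  # ['elk', 'stack', 'rocks', 'rock']
```

No stemming happens anywhere in this pipeline, so the plural and singular forms are indexed as two different terms.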

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}

Term Suggester – Missing Mode

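  • Searching for "lucen rock":

    • Each suggestion carries a score. Similarity is based on the Levenshtein edit distance: the core idea is how many characters must be changed for one word to become another. Many optional parameters control how fuzzy the matching is, for example "max_edits".

  • Suggestion modes:

    • Missing – suggest only when the term is not already in the index

    • Popular – suggest terms that occur more frequently

    • Always – suggest whether or not the term exists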
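
The edit-distance idea and the three modes can be illustrated with a toy model in Python. This is a sketch under assumptions, not how Elasticsearch is implemented: `TERM_FREQ` holds the document frequencies of three terms from the test data, and `suggest_term` is a hypothetical function that ignores scoring details.

```python
def levenshtein(a, b):
    """Count the single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

# Document frequencies of three terms in the test data above.
TERM_FREQ = {"lucene": 2, "rock": 1, "rocks": 2}

def suggest_term(term, mode, max_edits=2):
    """Toy model of the missing / popular / always suggest modes."""
    own_freq = TERM_FREQ.get(term, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the term already exists in the index: no suggestions
    candidates = [t for t in TERM_FREQ
                  if t != term and levenshtein(term, t) <= max_edits]
    if mode == "popular":
        # keep only candidates that are more frequent than the input term
        candidates = [t for t in candidates if TERM_FREQ[t] > own_freq]
    return sorted(candidates, key=lambda t: levenshtein(term, t))

print(levenshtein("lucen", "lucene"))   # 1
print(suggest_term("rock", "missing"))  # []
print(suggest_term("rock", "popular"))  # ['rocks']
print(suggest_term("lucen", "always"))  # ['lucene']
```

"rock" gets no suggestion in missing mode because it is indexed, but popular mode offers "rocks", which appears in more documents.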

Term Suggester – Popular Mode

POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

Sorting by Frequency & Prefix Length

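  • Suggestions are sorted by score by default; they can also be sorted by "frequency".

  • By default, a term whose first letter differs from the input is never suggested. Setting prefix_length to 0 lifts that restriction, so "rock" can be suggested for "hock".

  • prefix_length is the minimum number of leading characters that must match for a term to become a suggestion candidate. It defaults to 1. Raising it improves spell-check performance, and misspellings rarely occur at the start of a term.

POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,
        "sort": "frequency"
      }
    }
  }
}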
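
The effect of prefix_length and frequency sorting can be mimicked with a small Python sketch. The names `shares_prefix` and `filter_and_sort` are hypothetical, the candidate list reuses the document frequencies of "rock" and "rocks" from the test data, and the real suggester also takes edit distance and score into account.

```python
# Candidate terms with their document frequencies in the test data.
CANDIDATES = [("rock", 1), ("rocks", 2)]

def shares_prefix(word, candidate, prefix_length):
    """True when the first prefix_length characters of both words agree."""
    return word[:prefix_length] == candidate[:prefix_length]

def filter_and_sort(word, candidates, prefix_length=1):
    """Drop candidates that fail the prefix check, then sort by frequency."""
    kept = [(term, freq) for term, freq in candidates
            if shares_prefix(word, term, prefix_length)]
    # Most frequent first; ties are broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

print(filter_and_sort("hocks", CANDIDATES, prefix_length=1))  # []
print(filter_and_sort("hocks", CANDIDATES, prefix_length=0))  # [('rocks', 2), ('rock', 1)]
```

With the default prefix length of 1, "hocks" and "rocks" disagree on the first letter, so nothing survives the filter.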

Phrase Suggester

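  • The Phrase Suggester adds some extra logic on top of the Term Suggester.

  • Some parameters:

    • Suggest Mode: missing, popular, always

    • Max Errors: the maximum number of terms that may be misspelled

    • Confidence: limits the number of results returned. The default is 1, which returns only suggestions that score higher than the input phrase; 0 returns the top candidates regardless.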
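
The confidence parameter behaves like a multiplier on the input phrase's own score, and only candidates above that threshold are kept. Below is a minimal Python sketch of that thresholding. `apply_confidence` is a hypothetical helper and the scores are invented purely for illustration; real scores come from the suggester's language model.

```python
def apply_confidence(input_score, candidates, confidence=1.0):
    """Keep candidates scoring above confidence * input_score, best first."""
    threshold = confidence * input_score
    kept = [(phrase, score) for phrase, score in candidates if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical scores, chosen only to illustrate the threshold.
INPUT_SCORE = 0.002
CANDIDATE_PHRASES = [
    ("lucne and elasticsearch rock", 0.001),
    ("lucene and elasticsearch rock", 0.004),
]

strict = apply_confidence(INPUT_SCORE, CANDIDATE_PHRASES, confidence=1.0)
loose = apply_confidence(INPUT_SCORE, CANDIDATE_PHRASES, confidence=0)
print(strict)
print(loose)
```

Setting confidence to 0, as the request that follows does, keeps every candidate, which is handy when experimenting with a tiny index.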


POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors":2,
        "confidence":0,
        "direct_generator":[{
          "field":"body",
          "suggest_mode":"always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

Section Recap

  • The Term Suggester and the Phrase Suggester each support three suggestion modes

  • Missing / Popular / Always

  • Using the suggesters can improve search precision and recall

Course Demo

DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}


POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}

DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}


POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack  rocks rock"]
}

POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}


POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}


POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}


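# prefix_length: minimum number of leading characters that must match for a term to become a suggestion candidate (default 1). Raising it improves spell-check performance; misspellings rarely occur at the start of a term.
POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,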
        "sort": "frequency"
      }
    }
  }
}


POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors":2,
        "confidence":0,
        "direct_generator":[{
          "field":"body",
          "suggest_mode":"always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
