Term & Phrase Suggester -- Search Keywords

Author: 滴流乱转的小胖子 | Source: published 2020-08-07 06:26

1. What Are Search Suggestions

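  • Modern search engines generally offer a "Suggest as you type" feature
  • While the user is typing a query, it autocompletes the input or corrects typos. By helping the user enter more precise keywords, it improves how well documents match in the subsequent search phase
  • On Google, for example, the input is autocompleted at first. Once the input reaches a certain length and can no longer be completed (for instance because a word is misspelled), Google starts suggesting similar words or phrases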

2. Elasticsearch Suggester API

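  • In Elasticsearch, this search-engine feature is implemented through the Suggester API
  • How it works: the input text is split into tokens, then similar terms are looked up in the index's term dictionary and returned
  • For different use cases, Elasticsearch provides four types of Suggesters:
    Term & Phrase Suggester
    Completion & Context Suggester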
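The principle above (split the text into tokens, then look for similar terms in the term dictionary) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the regex tokenizer is a rough stand-in for a real analyzer, the sample sentences are the ones indexed later in this article, and `difflib.get_close_matches` is only a convenient stand-in similarity measure, not what Elasticsearch uses internally.

```python
import difflib
import re

# Sample sentences (the same ones indexed into the "articles" index in this article).
DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


def tokenize(text):
    """Rough analyzer stand-in: split on non-word characters and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_term_dictionary(docs):
    """Collect every distinct term that appears in the documents."""
    terms = set()
    for doc in docs:
        terms.update(tokenize(doc))
    return sorted(terms)


def suggest(text, dictionary, n=3, cutoff=0.6):
    """Map each input token to the most similar dictionary terms.

    difflib's similarity ratio is only a stand-in here; it is not the
    scoring Elasticsearch uses.
    """
    return {
        token: difflib.get_close_matches(token, dictionary, n=n, cutoff=cutoff)
        for token in tokenize(text)
    }


dictionary = build_term_dictionary(DOCS)
print(suggest("lucen rock", dictionary))
```

The misspelled "lucen" maps to "lucene", while "rock" maps to itself plus the close variant "rocks", which is the raw material the suggest modes described below then filter.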

3. Term Suggester

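  • A Suggester is a special kind of search. "text" holds the text supplied with the request, usually what the user typed in the user interface
  • The user's input "lucen" is a misspelling
  • The suggester searches the specified field "body"; when the term cannot be found (missing), it returns suggested words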

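The standard analyzer is used by default: it lowercases the text, and because no stemming is applied, "rocks" and "rock" are two different terms.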

DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}


POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}
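Note that the request above queries a `completion` field with `prefix`, so it is really a Completion Suggester example: it matches the prefix against the beginning of each indexed `title_completion` value. The sketch below mimics that observable behaviour with a sorted list and `bisect`. It is only a stand-in under that assumption; Elasticsearch's completion suggester uses an in-memory FST-based structure, and the `PrefixCompleter` class is a hypothetical name for illustration.

```python
import bisect

# The title_completion values indexed above.
INPUTS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
]


class PrefixCompleter:
    """Hypothetical toy prefix completer: a sorted list searched with bisect.

    Matches case-insensitively on the start of the whole input, which is
    the behaviour the completion request above relies on.
    """

    def __init__(self, inputs):
        # Keep (normalized, original) pairs sorted by the normalized form.
        self._entries = sorted((text.lower(), text) for text in inputs)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, size=5):
        prefix = prefix.lower()
        # First position whose key is >= prefix; all matches follow contiguously.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key, original in self._entries[start:]:
            if not key.startswith(prefix):
                break  # sorted order: no later key can share the prefix
            results.append(original)
            if len(results) == size:
                break
        return results


completer = PrefixCompleter(INPUTS)
print(completer.complete("elk "))
```

Because matching is anchored at the start of each input, "elastic is the company behind ELK stack" is not returned for "elk ", even though it contains "ELK".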

DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}


POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack  rocks rock"]
}
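The `_analyze` call above shows how the standard analyzer tokenizes the text. A rough Python approximation for plain English input is to split on runs of non-word characters and lowercase each token. This is an assumption-laden sketch: the real standard tokenizer follows Unicode text segmentation rules and handles many cases this regex does not.

```python
import re


def standard_like_analyze(text):
    """Approximate the standard analyzer for simple English text.

    Splits on runs of non-word characters, then lowercases each token.
    No stemming is applied, so "rocks" and "rock" stay distinct terms.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


print(standard_like_analyze("Elk stack  rocks rock"))
# → ['elk', 'stack', 'rocks', 'rock']
```

The double space collapses away and "Elk" becomes "elk", but "rocks" and "rock" remain two separate terms in the dictionary.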

POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}

Searching for "lucen rock":

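  • Each suggestion carries a score. Similarity is measured with the Levenshtein edit distance, whose core idea is: how many characters must be changed to turn one word into the other. Many optional parameters control how fuzzy the matching may be, for example "max_edits"
  • Suggestion modes:
    Missing -- if the term already exists in the index, no suggestion is given (this is the default)
    Popular -- suggest only terms that occur in more documents than the original term
    Always -- suggest regardless of whether the term exists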
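The edit-distance idea above can be shown with the textbook dynamic-programming Levenshtein algorithm. This is a sketch of the classic algorithm, not Elasticsearch's exact implementation (its default `string_distance` is an optimized variant); the `candidates` helper and its small dictionary are hypothetical, with `max_edits=2` chosen to mirror the parameter mentioned above.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (dynamic programming)."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def candidates(term, dictionary, max_edits=2):
    """Hypothetical helper: dictionary terms within max_edits, closest first."""
    scored = sorted((levenshtein(term, t), t) for t in dictionary)
    return [t for distance, t in scored if distance <= max_edits]


dictionary = ["lucene", "rock", "rocks", "elasticsearch", "stack"]
print(levenshtein("lucen", "lucene"))   # → 1
print(candidates("lucen", dictionary))  # → ['lucene']
```

"lucen" is one insertion away from "lucene", so it qualifies under `max_edits`; "rock" and "rocks" are four edits away and are filtered out.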

4. Term Suggester -- Popular Mode

POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}
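The difference between the three suggest modes comes down to document frequency. The sketch below assumes a candidate list has already been produced (for example by an edit-distance lookup) and only applies the filtering rule for each mode. The function names and the `CANDIDATES` map are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# The "body" values indexed in this article.
DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


def doc_freq(docs):
    """Count in how many documents each lowercased term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(re.findall(r"\w+", doc.lower())))
    return df


def apply_suggest_mode(term, candidates, df, mode="missing"):
    """Filter candidate corrections for one input term by suggest_mode (sketch)."""
    if mode == "missing":
        # Only suggest when the input term itself is not in the index.
        return [] if df[term] > 0 else list(candidates)
    if mode == "popular":
        # Keep candidates that occur in more documents than the input term.
        return [c for c in candidates if df[c] > df[term]]
    if mode == "always":
        # Suggest regardless of whether the input term exists.
        return list(candidates)
    raise ValueError(f"unknown suggest_mode: {mode!r}")


df = doc_freq(DOCS)
# Hypothetical candidate lists, as an edit-distance lookup might produce them.
CANDIDATES = {"lucen": ["lucene"], "rock": ["rocks"]}
for mode in ("missing", "popular", "always"):
    print(mode, {t: apply_suggest_mode(t, c, df, mode) for t, c in CANDIDATES.items()})
```

In this sample data "rock" appears in one document and "rocks" in two, so only popular and always modes suggest "rocks" for "rock", while "lucen" (absent from the index) gets "lucene" in every mode.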

5. Term Suggester -- Always Mode

POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}

6. Sorting by Frequency & Prefix Length

POST /articles/_search
{

  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length":0,
        "sort": "frequency"
      }
    }
  }
}
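  • Suggestions are sorted by score by default; they can also be sorted by "frequency"
  • By default, a candidate whose first letter differs from the input is not suggested (prefix_length defaults to 1); setting prefix_length to 0 lets it suggest "rocks" and "rock" for "hocks"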
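Both knobs can be simulated on the sample data. In this sketch the similarity score (one minus the edit distance divided by the longer word's length) is a hypothetical formula, not Elasticsearch's; the tie-breaking order follows the documented `sort` behaviour (score, then frequency, then term; or frequency, then score, then term), and `term_suggest` is an illustrative name.

```python
import re
from collections import Counter

# The "body" values indexed in this article.
DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def term_suggest(term, df, max_edits=2, prefix_length=1, sort="score"):
    """Return (candidate, score, doc_freq) tuples for one input term (sketch)."""
    results = []
    for cand, freq in df.items():
        # Skip the term itself and candidates whose first prefix_length chars differ.
        if cand == term or cand[:prefix_length] != term[:prefix_length]:
            continue
        distance = levenshtein(term, cand)
        if distance > max_edits:
            continue
        # Hypothetical similarity score: 1 - normalized edit distance.
        score = 1 - distance / max(len(term), len(cand))
        results.append((cand, score, freq))
    if sort == "score":
        results.sort(key=lambda r: (-r[1], -r[2], r[0]))  # score, frequency, term
    elif sort == "frequency":
        results.sort(key=lambda r: (-r[2], -r[1], r[0]))  # frequency, score, term
    else:
        raise ValueError(f"unknown sort: {sort!r}")
    return results


df = Counter()
for doc in DOCS:
    df.update(set(re.findall(r"\w+", doc.lower())))

for term in ("lucen", "hocks"):
    print(term, [c for c, _, _ in term_suggest(term, df, prefix_length=0, sort="frequency")])
```

With the default `prefix_length=1`, "hocks" gets nothing because no indexed term starts with "h". For a term like "rok", score sorting puts the closer "rock" first, while frequency sorting puts the more common "rocks" first.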

7. Phrase Suggester

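  • The Phrase Suggester adds some extra logic on top of the Term Suggester
  • Some of its parameters:
    Suggest Mode: missing, popular, always
    Max Errors: the maximum number of terms that may be misspelled
    Confidence: a threshold relative to the score of the input phrase; only candidates scoring above it are returned, which limits the number of results. Defaults to 1 (0 returns the top N candidates)
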
POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors":2,
        "confidence":0,
        "direct_generator":[{
          "field":"body",
          "suggest_mode":"always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
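To make max_errors and confidence concrete, here is a toy phrase suggester. Elasticsearch scores phrases with an n-gram language model; this sketch swaps in a much simpler unigram model with add-one smoothing, only handles max_errors as an absolute count (Elasticsearch also accepts a fraction), and takes a hypothetical hand-written candidate map instead of a real direct generator. It illustrates the parameters, not the actual scoring.

```python
import re
from collections import Counter
from itertools import product

# The "body" values indexed in this article.
DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]

TOKENS = [t for doc in DOCS for t in re.findall(r"\w+", doc.lower())]
UNIGRAMS = Counter(TOKENS)
DENOMINATOR = len(TOKENS) + len(UNIGRAMS)  # add-one (Laplace) smoothing


def phrase_score(tokens):
    """Toy unigram language-model score: product of smoothed term probabilities."""
    score = 1.0
    for token in tokens:
        score *= (UNIGRAMS[token] + 1) / DENOMINATOR
    return score


def phrase_suggest(text, candidates, max_errors=1, confidence=1.0, size=5):
    """Propose corrected phrases (sketch).

    max_errors: at most this many terms may be replaced (absolute count).
    confidence: keep phrases scoring above confidence * score(input phrase);
                0 keeps every candidate phrase, up to `size`.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Each position may keep the original token or use one of its candidates.
    options = [[token] + candidates.get(token, []) for token in tokens]
    threshold = confidence * phrase_score(tokens)
    results = []
    for combo in product(*options):
        changed = sum(new != old for new, old in zip(combo, tokens))
        if changed == 0 or changed > max_errors:
            continue
        score = phrase_score(combo)
        if score > threshold:
            results.append((" ".join(combo), score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:size]


# Hypothetical per-term candidates, as a direct generator might supply them.
CANDIDATES = {"lucne": ["lucene"], "elasticsear": ["elasticsearch"]}
text = "lucne and elasticsear rock hello world "
for phrase, score in phrase_suggest(text, CANDIDATES, max_errors=2, confidence=0):
    print(f"{score:.3e}  {phrase}")
```

Raising max_errors allows both misspellings to be fixed in one phrase; raising confidence above 1 keeps only phrases that score much better than the input, so with a high enough value only the fully corrected phrase survives.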

Article link: https://www.haomeiwen.com/subject/cceirktx.html