What Are Search Suggestions
![](https://img.haomeiwen.com/i11385145/55a089ffe3e91596.png)
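- Modern search engines generally offer a suggest-as-you-type feature.
- It helps users by auto-completing or correcting the query as they type. Guiding users toward more precise keywords improves how well documents match in the search that follows.
- On Google, the search box auto-completes at first. Once the input reaches a certain length and can no longer be completed, for example because a word is misspelled, it starts suggesting similar words or phrases.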
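Elasticsearch Suggester API
- In Elasticsearch, this kind of feature is implemented by the Suggester API.
- How it works: the input text is broken into tokens, and similar terms are looked up in the index's term dictionary and returned.
- Elasticsearch provides four kinds of suggesters for different use cases:
  - Term & Phrase Suggester
  - Completion & Context Suggester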
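The tokenize-then-look-up idea described above can be sketched in a few lines of Python. This is only an illustration, not how Elasticsearch is implemented: `TERM_DICTIONARY` is a hypothetical hand-written term list, the tokenizer is a crude ASCII-only stand-in for an analyzer, and similarity comes from the standard library's `difflib` rather than from the edit-distance scoring Elasticsearch uses.

```python
import difflib
import re

# Hypothetical term dictionary, standing in for the terms indexed in a field.
TERM_DICTIONARY = ["cool", "elasticsearch", "lucene", "rock", "rocks", "solid"]


def tokenize(text):
    """Lowercase the text and split it on anything that is not an ASCII letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest(text, dictionary=TERM_DICTIONARY, cutoff=0.8):
    """Return, for each token, the dictionary terms that look similar to it."""
    suggestions = {}
    for token in tokenize(text):
        close = difflib.get_close_matches(token, dictionary, n=3, cutoff=cutoff)
        # Drop the token itself so that only alternatives remain.
        suggestions[token] = [term for term in close if term != token]
    return suggestions


if __name__ == "__main__":
    print(suggest("Lucen rock"))  # {'lucen': ['lucene'], 'rock': ['rocks']}
```

The `cutoff` value is an arbitrary choice for this sketch; it plays a role loosely comparable to the fuzziness parameters of the real suggesters.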
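Term Suggester
- A suggester is a special kind of search. The "text" field holds the text supplied with the call, usually whatever the user typed in the UI.
- The user's input "lucen" is a misspelling.
- The suggester looks in the specified field, "body", and returns suggested terms when a term cannot be found there (suggest_mode "missing").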
POST /articles/_search
{
"size": 1,
"query": {
"match": {
"body": "lucen rock"
}
},
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "missing",
"field": "body"
}
}
}
}
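A client usually has to pull the suggested terms out of the response. The sketch below assumes the usual response shape of a term suggester (one entry per input token, each with an `options` list). `SAMPLE_RESPONSE` is hand-written for illustration and its score and frequency values are assumptions, not output captured from a cluster; the helper names are hypothetical as well.

```python
# Hand-written sample in the shape of a term-suggester response (values are illustrative).
SAMPLE_RESPONSE = {
    "suggest": {
        "term-suggestion": [
            {"text": "lucen", "offset": 0, "length": 5,
             "options": [{"text": "lucene", "score": 0.8, "freq": 2}]},
            {"text": "rock", "offset": 6, "length": 4, "options": []},
        ]
    }
}


def best_corrections(response, suggester_name):
    """Map each input token to its first suggested term; tokens without options are skipped."""
    corrections = {}
    for entry in response["suggest"][suggester_name]:
        options = entry["options"]
        if options:
            corrections[entry["text"]] = options[0]["text"]
    return corrections


def corrected_text(text, corrections):
    """Rebuild the query, replacing every token that has a correction."""
    return " ".join(corrections.get(token, token) for token in text.split())


if __name__ == "__main__":
    fixes = best_corrections(SAMPLE_RESPONSE, "term-suggestion")
    print(corrected_text("lucen rock", fixes))  # lucene rock
```

In this sample, "rock" has an empty `options` list because in "missing" mode a term that already exists in the index gets no suggestion.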
Test data
- The standard analyzer is used by default.
- It lowercases terms.
- It does not stem, so "rocks" and "rock" are two different terms.
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
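To see why the suggester treats "rock" and "rocks" differently, it helps to count in how many documents each term appears. The sketch below approximates the standard analyzer with lowercasing plus a split on non-alphanumeric characters; that is an assumption that happens to hold for this simple English test data, not a reimplementation of the analyzer.

```python
import re
from collections import Counter

# The six test documents from the bulk request above.
DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_frequencies(docs):
    """Count, for every term, the number of documents that contain it."""
    frequencies = Counter()
    for doc in docs:
        frequencies.update(set(analyze(doc)))  # set(): count each document only once
    return frequencies


if __name__ == "__main__":
    freq = document_frequencies(DOCS)
    for term in ("rock", "rocks", "lucene", "elk"):
        print(term, freq[term])
```

Because "ELK" and "Elk" are lowercased to the same term while "rock" and "rocks" stay separate, "rocks" ends up with a higher document frequency than "rock", which matters for the "popular" mode described later.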
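Term Suggester – Missing Mode
- Searching for "lucen rock":
- Each suggestion carries a score. Similarity is based on Levenshtein edit distance: the number of characters that must change for one word to become another. Several optional parameters, such as "max_edits", control how fuzzy the match may be.
- Suggestion modes:
  - Missing: no suggestion is offered if the term already exists in the index.
  - Popular: only terms that occur more frequently than the input term are suggested.
  - Always: suggestions are offered whether or not the term exists.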
![](https://img.haomeiwen.com/i11385145/b24e409bdc609a7e.png)
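The edit-distance idea and the three suggestion modes can be sketched together. This is a simplified model under stated assumptions: it uses plain Levenshtein distance, `DOC_FREQ` is a hand-counted subset of the test data's document frequencies, and `suggest_term` is a hypothetical helper, not an Elasticsearch API. Elasticsearch's own scoring and candidate generation are more involved.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


# Hand-counted document frequencies for a few terms of the test data.
DOC_FREQ = {"lucene": 2, "rock": 1, "rocks": 2, "elasticsearch": 3}


def suggest_term(word, doc_freq, mode="missing", max_edits=2):
    """Return candidate terms for `word`, closest first, then most frequent first."""
    word_freq = doc_freq.get(word, 0)
    if mode == "missing" and word_freq > 0:
        return []  # the term exists in the index, so nothing is suggested
    candidates = []
    for term, freq in doc_freq.items():
        if term == word:
            continue
        distance = levenshtein(word, term)
        if distance > max_edits:
            continue
        if mode == "popular" and freq <= word_freq:
            continue  # only terms more frequent than the input qualify
        candidates.append((distance, -freq, term))
    return [term for _, _, term in sorted(candidates)]


if __name__ == "__main__":
    for mode in ("missing", "popular", "always"):
        print(mode, suggest_term("lucen", DOC_FREQ, mode), suggest_term("rock", DOC_FREQ, mode))
```

With this data, "lucen" gets "lucene" in every mode, while "rock" gets no suggestion in "missing" mode and "rocks" in the other two.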
Term Suggester – Popular Mode
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "popular",
"field": "body"
}
}
}
}
![](https://img.haomeiwen.com/i11385145/7af60899894628ed.png)
Sorting by Frequency & Prefix Length
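- Suggestions are sorted by score by default; they can also be sorted by "frequency".
- By default, a candidate whose first character differs from the input is not suggested. Setting prefix_length to 0 lifts this restriction, so "rock" is suggested for "hock".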
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen hocks",
"term": {
"suggest_mode": "always",
"field": "body",
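// "prefix_length": 0, // Minimum number of leading characters that must match for a term to be a candidate. Defaults to 1. Raising it improves spellcheck performance; misspellings rarely occur at the start of a term.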
"sort": "frequency"
}
}
}
}
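The effect of `prefix_length` and `sort` can be sketched as a filter followed by a sort. The option values below are hypothetical and chosen only so that the two sort orders differ; they are not taken from the test index. The tie-breaking order (score, then frequency, then the term itself, or frequency first when sorting by frequency) follows my reading of the Elasticsearch documentation.

```python
def passes_prefix(word, candidate, prefix_length=1):
    """A candidate qualifies only if its first `prefix_length` characters match the input's."""
    return candidate[:prefix_length] == word[:prefix_length]


def sort_options(options, sort="score"):
    """Order suggestion options either by score first or by document frequency first."""
    if sort == "frequency":
        return sorted(options, key=lambda o: (-o["freq"], -o["score"], o["text"]))
    return sorted(options, key=lambda o: (-o["score"], -o["freq"], o["text"]))


# Hypothetical options for the input "hocks"; the numbers are made up for illustration.
OPTIONS = [
    {"text": "rocks", "score": 0.8, "freq": 2},
    {"text": "rock", "score": 0.6, "freq": 5},
]

if __name__ == "__main__":
    print(passes_prefix("hocks", "rocks", prefix_length=1))  # False
    print(passes_prefix("hocks", "rocks", prefix_length=0))  # True
    print([o["text"] for o in sort_options(OPTIONS, "score")])      # ['rocks', 'rock']
    print([o["text"] for o in sort_options(OPTIONS, "frequency")])  # ['rock', 'rocks']
```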
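Phrase Suggester
- The Phrase Suggester adds extra logic on top of the Term Suggester.
- Some parameters:
  - Suggest Mode: missing, popular, always
  - Max Errors: the maximum number of terms that may be misspelled
  - Confidence: a threshold factor applied to the score of the input phrase. Only candidates that score above the threshold are returned, which limits the number of results. Defaults to 1.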
POST /articles/_search
{
"suggest": {
"my-suggestion": {
"text": "lucne and elasticsear rock hello world ",
"phrase": {
"field": "body",
"max_errors":2,
"confidence":0,
"direct_generator":[{
"field":"body",
"suggest_mode":"always"
}],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
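A rough way to picture `max_errors`, `confidence`, and `highlight` is the sketch below. It is a toy model under explicit assumptions: `max_errors` is treated as an absolute count, the per-term `corrections` are supplied by hand instead of coming from a direct generator, and the scores passed to `apply_confidence` are made up. The real Phrase Suggester scores whole phrases with a language model, which is not reproduced here.

```python
from itertools import combinations, product


def candidate_phrases(tokens, corrections, max_errors, pre_tag="<em>", post_tag="</em>"):
    """List phrases in which at most `max_errors` tokens are replaced; replacements are tagged."""
    fixable = [i for i, token in enumerate(tokens) if corrections.get(token)]
    results = []
    for count in range(1, min(max_errors, len(fixable)) + 1):
        for positions in combinations(fixable, count):
            choices = [corrections[tokens[i]] for i in positions]
            for picks in product(*choices):
                phrase = list(tokens)
                for position, pick in zip(positions, picks):
                    phrase[position] = f"{pre_tag}{pick}{post_tag}"
                results.append(" ".join(phrase))
    return results


def apply_confidence(scored_candidates, input_score, confidence=1.0):
    """Keep only candidates whose score exceeds confidence * score of the input phrase."""
    threshold = confidence * input_score
    return [phrase for phrase, score in scored_candidates if score > threshold]


if __name__ == "__main__":
    tokens = "lucne and elasticsear rock".split()
    # Hand-written per-term corrections, standing in for a direct generator's output.
    corrections = {"lucne": ["lucene"], "elasticsear": ["elasticsearch"], "rock": ["rocks"]}
    for phrase in candidate_phrases(tokens, corrections, max_errors=2):
        print(phrase)
```

Setting `confidence` to 0, as the request above does, makes the threshold 0 in this model, so every candidate with a positive score is kept.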
Recap
- The Term Suggester and the Phrase Suggester each support three suggestion modes:
  - Missing / Popular / Always
- Using suggestions can improve both the precision and the recall of search.
Course demo
DELETE articles
PUT articles
{
"mappings": {
"properties": {
"title_completion":{
"type": "completion"
}
}
}
}
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}
POST articles/_search?pretty
{
"size": 0,
"suggest": {
"article-suggester": {
"prefix": "elk ",
"completion": {
"field": "title_completion"
}
}
}
}
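The completion request above matches whole inputs by prefix. As a loose mental model only, the sketch below does the same with a sorted list and binary search; Elasticsearch keeps completion data in an in-memory finite-state structure instead, and the lowercasing and stripping of the trailing space here are assumptions that stand in for analysis of the prefix.

```python
import bisect

# The five titles indexed into the "title_completion" field above.
TITLES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
]


def build_index(titles):
    """Sort (normalized, original) pairs so that prefix lookups can use binary search."""
    return sorted((title.lower(), title) for title in titles)


def complete(index, prefix, size=5):
    """Return up to `size` original titles whose normalized form starts with the prefix."""
    key = prefix.strip().lower()
    if not key:
        return []
    start = bisect.bisect_left(index, (key,))  # first entry that is >= the prefix
    results = []
    for normalized, original in index[start:]:
        if not normalized.startswith(key):
            break
        results.append(original)
        if len(results) == size:
            break
    return results


if __name__ == "__main__":
    index = build_index(TITLES)
    print(complete(index, "elk "))  # ['Elk stack rocks']
```

Note that "elastic is the company behind ELK stack" is not returned for "elk ": the match is on the start of the whole input, not on any word inside it.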
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
POST _analyze
{
"analyzer": "standard",
"text": ["Elk stack rocks rock"]
}
POST /articles/_search
{
"size": 1,
"query": {
"match": {
"body": "lucen rock"
}
},
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "missing",
"field": "body"
}
}
}
}
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "popular",
"field": "body"
}
}
}
}
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "always",
"field": "body",
}
}
}
}
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen hocks",
"term": {
"suggest_mode": "always",
"field": "body",
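// "prefix_length": 0, // Minimum number of leading characters that must match for a term to be a candidate. Defaults to 1. Raising it improves spellcheck performance; misspellings rarely occur at the start of a term.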
"sort": "frequency"
}
}
}
}
POST /articles/_search
{
"suggest": {
"my-suggestion": {
"text": "lucne and elasticsear rock hello world ",
"phrase": {
"field": "body",
"max_errors":2,
"confidence":0,
"direct_generator":[{
"field":"body",
"suggest_mode":"always"
}],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}