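1. What Are Search Suggestions

- Modern search engines generally offer a "suggest as you type" feature.
- It helps users by autocompleting or correcting their input as they type. Guiding users toward more precise keywords improves how well documents match in the subsequent search phase.
- When searching on Google, autocompletion kicks in first. Once the input reaches a certain length and can no longer be completed, for example because of a misspelled word, Google starts suggesting similar words or phrases.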
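2. Elasticsearch Suggester API
- In Elasticsearch, this kind of search-engine feature is implemented through the Suggester API.
- How it works: the input text is broken into tokens, and similar terms are then looked up in the index's term dictionary and returned.
- Elasticsearch provides 4 types of suggesters for different use cases:
  - Term & Phrase Suggester
  - Completion & Context Suggester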
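3. Term Suggester
- A suggester is a special kind of search. The "text" field holds the text supplied at call time, which usually comes from what the user typed in the UI.
- The user input "lucen" is a misspelling.
- The search runs against the specified field "body"; when a term is not found there (missing), suggested terms are returned.
The standard analyzer is used by default: uppercase is converted to lowercase, and "rocks" and "rock" are two different terms.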
DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": {
        "type": "completion"
      }
    }
  }
}
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}
POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stack rocks rock"]
}
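The effect of the standard analyzer on this sample text can be approximated outside Elasticsearch. The following is only a rough Python sketch, assuming that splitting on non-word characters and lowercasing is close enough for plain English input; the function name `standard_like_tokens` is hypothetical, and the real analyzer follows Unicode text segmentation rules that this does not reproduce.

```python
import re
from typing import List


def standard_like_tokens(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


tokens = standard_like_tokens("Elk stack rocks rock")
print(tokens)

# No stemming is applied, so "rocks" and "rock" remain separate terms.
print(len(set(tokens)) == len(tokens))
```

Because the two forms stay distinct in the term dictionary, a suggester can propose one as a correction for the other.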
POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}
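Searching for "lucen rock"
- Each suggestion carries a score. Similarity is computed with the Levenshtein edit distance algorithm, whose core idea is how many characters must be changed in one word to turn it into another. Many optional parameters control how fuzzy the matching is, for example "max_edits".
- Suggestion modes:
  - missing -- no suggestion is given if the term already exists in the index
  - popular -- suggest terms that occur more frequently
  - always -- suggest whether or not the term exists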
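The edit-distance idea can be illustrated with a small Python sketch. It is a textbook dynamic-programming Levenshtein distance, not the implementation Elasticsearch uses internally; the helper `similar_terms`, its `max_edits` argument, and the `index_terms` list (a few terms taken from the sample documents) are illustrative assumptions.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def similar_terms(word: str, terms: List[str], max_edits: int = 2) -> List[str]:
    """Hypothetical helper: terms within max_edits of word, closest first."""
    scored = sorted((levenshtein(word, term), term) for term in terms)
    return [term for distance, term in scored if 0 < distance <= max_edits]


index_terms = ["lucene", "elasticsearch", "rocks", "rock", "elk", "stack"]
print(levenshtein("lucen", "lucene"))
print(similar_terms("lucen", index_terms))
```

"lucen" is one insertion away from "lucene", which is why that term is offered as the correction.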
4. Term Suggester -- Popular Mode
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

5. Term Suggester -- Always Mode
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}
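How the three suggest modes differ for the input "lucen rock" can be sketched in Python. This is a simplified model, not Elasticsearch code: the function `candidates_for_mode` is hypothetical, the candidate lists are assumed to have already passed the similarity check, and the document frequencies are counted by hand from the six sample "body" documents.

```python
from typing import Dict, List

# Document frequencies counted from the six sample "body" documents.
doc_freq: Dict[str, int] = {"lucene": 2, "elasticsearch": 3, "rocks": 2, "rock": 1}


def candidates_for_mode(word: str, candidates: List[str], mode: str) -> List[str]:
    """Filter already-similar candidate terms according to the suggest mode."""
    word_freq = doc_freq.get(word, 0)
    if mode == "missing":
        # Suggest only for input terms that are absent from the index.
        return [] if word_freq > 0 else list(candidates)
    if mode == "popular":
        # Suggest only terms that occur in more documents than the input term.
        return [c for c in candidates if doc_freq.get(c, 0) > word_freq]
    if mode == "always":
        # Suggest regardless of whether the input term exists.
        return list(candidates)
    raise ValueError(f"unknown suggest mode: {mode}")


for mode in ("missing", "popular", "always"):
    print(mode, candidates_for_mode("lucen", ["lucene"], mode),
          candidates_for_mode("rock", ["rocks"], mode))
```

In this model "lucen" gets a suggestion in every mode because it is not indexed, while "rock" gets one only in popular and always mode, since "rock" exists but "rocks" appears in more documents.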
6. Sorting by Frequency & Prefix Length
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,
        "sort": "frequency"
      }
    }
  }
}
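- Results are sorted by score by default; they can also be sorted by "frequency".
- By default, a candidate whose first letter differs from the input is not suggested (prefix_length defaults to 1). With prefix_length set to 0, "rocks" is suggested for "hocks".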
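The interaction of prefix_length and sort can be sketched in Python as follows. This is a simplified model under stated assumptions: the function `filter_and_sort` is hypothetical, edit distance stands in for the similarity score, candidate distances are precomputed by hand, frequencies are counted from the sample documents, and the input "rocc" is an extra illustrative misspelling that does not appear in the requests above.

```python
from typing import Dict, List, Union

Candidate = Dict[str, Union[str, int]]


def filter_and_sort(word: str, candidates: List[Candidate],
                    prefix_length: int = 1, sort: str = "score") -> List[str]:
    """Keep candidates sharing the first prefix_length characters, then order them."""
    prefix = word[:prefix_length]
    kept = [c for c in candidates if str(c["term"]).startswith(prefix)]
    if sort == "frequency":
        # Document frequency first, then similarity, then the term itself.
        key = lambda c: (-int(c["freq"]), int(c["distance"]), str(c["term"]))
    else:
        # Similarity first, then document frequency, then the term itself.
        key = lambda c: (int(c["distance"]), -int(c["freq"]), str(c["term"]))
    return [str(c["term"]) for c in sorted(kept, key=key)]


# Candidates for "hocks": distances computed by hand, frequencies from the sample docs.
hocks = [{"term": "rock", "distance": 2, "freq": 1},
         {"term": "rocks", "distance": 1, "freq": 2}]
# Candidates for the illustrative input "rocc".
rocc = [{"term": "rock", "distance": 1, "freq": 1},
        {"term": "rocks", "distance": 2, "freq": 2}]

print(filter_and_sort("hocks", hocks))                   # default prefix_length of 1
print(filter_and_sort("hocks", hocks, prefix_length=0))
print(filter_and_sort("rocc", rocc, sort="score"))
print(filter_and_sort("rocc", rocc, sort="frequency"))
```

With the default prefix length, nothing starting with "r" can be suggested for "hocks"; for "rocc" the two sort orders disagree because the closer term is also the rarer one.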
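7. Phrase Suggester
- The Phrase Suggester adds some extra logic on top of the Term Suggester.
- Some parameters:
  - Suggest Mode: missing, popular, always
  - Max Errors: the maximum number of terms that may be misspelled
  - Confidence: a threshold factor applied to the input phrase's score; only candidates scoring above it are returned, which limits the number of results. Defaults to 1.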
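One way to picture the Confidence parameter is as a cutoff relative to the score of the phrase the user typed. The Python sketch below assumes that behavior (candidates must score above confidence times the input phrase's score); the function `filter_by_confidence`, the candidate labels, and all scores are hypothetical and chosen only to show the effect.

```python
from typing import List, Tuple


def filter_by_confidence(input_score: float,
                         candidates: List[Tuple[str, float]],
                         confidence: float = 1.0) -> List[str]:
    """Keep candidate phrases whose score exceeds confidence * input_score."""
    threshold = confidence * input_score
    return [phrase for phrase, score in candidates if score > threshold]


# Hypothetical scores, not real Elasticsearch output.
input_score = 0.02
candidates = [("candidate A", 0.05), ("candidate B", 0.01)]

print(filter_by_confidence(input_score, candidates))                # default confidence of 1
print(filter_by_confidence(input_score, candidates, confidence=0))  # no effective cutoff
```

Setting confidence to 0, as in the request that follows, removes the cutoff so that lower-scoring candidates are also returned.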
POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 0,
        "direct_generator": [
          {
            "field": "body",
            "suggest_mode": "always"
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}