26. ElasticSearch 7.x: prefix search, wildcard search

Author: 众神开挂 | Published 2020-04-10 11:16


Main topics: prefix search, wildcard search, regexp search, and two ways to implement search suggestions

1. Prefix search

1.1 Hands-on

This example is simple, so instead of the forum-post scenario used earlier, it uses a new index created by hand

PUT my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      }
    }
  }
}

Insert the test data

POST /_bulk
{"create":{"_index":"my_index"}}
{"title":"C3D0-KD345"}
{"create":{"_index":"my_index"}}
{"title":"C3K5-DFG65"}
{"create":{"_index":"my_index"}}
{"title":"C4I8-UI365"}

Searching for C3 should return the first two documents above, i.e. a search by the prefix of the string

GET my_index/_search
{
  "query": {
    "prefix": {
      "title": {
        "value": "C3"
      }
    }
  }
}

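1.2 How prefix search works

A prefix query does not compute a relevance score; every hit gets the same constant score. Older versions had a separate prefix filter whose only difference was that the filter cached its bitset. In 7.x the equivalent is to run the prefix query in a filter context (for example a bool filter clause), whose results can be cached

The shorter the prefix, the more terms and docs have to be processed and the worse the performance, so use the longest prefix you can

A prefix query has to walk every term in the inverted index that starts with the prefix instead of looking up one exact term, so match performs better than a prefix query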
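
To make the cost argument concrete, here is a minimal sketch of a prefix lookup over a sorted list of terms. It is a toy model only: the function name prefix_terms is made up for illustration, and Lucene's real term dictionary is a more elaborate structure than a Python list.

```python
from bisect import bisect_left


def prefix_terms(sorted_terms, prefix):
    """Return every term in a sorted term list that starts with prefix."""
    # Seek to the first term that is >= prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches


# The three titles indexed above, as they would appear in a keyword field.
terms = sorted(["C3D0-KD345", "C3K5-DFG65", "C4I8-UI365"])

print(prefix_terms(terms, "C3"))  # two terms to visit
print(prefix_terms(terms, "C"))   # shorter prefix, more terms to visit
```

The shorter prefix visits more terms, and each visited term brings its own list of docs that has to be merged into the result, which is why long prefixes are cheaper.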

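2. Wildcard search

Similar to prefix search but more powerful: wildcards let you express more complex fuzzy-matching patterns

For example, "5, then any one character, then -, then any number of characters, then 5" can be written as 5?-*5

?: any single character
*: zero or more characters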

GET my_index/_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "C?K*5"
      }
    }
  }
}

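Performance is just as poor as with prefix search, because the terms in the inverted index have to be scanned. Patterns that begin with * or ? are the worst case and are best avoided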
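
The meaning of ? and * can be checked outside Elasticsearch with a small sketch that translates a wildcard pattern into a Python regular expression. The helper names wildcard_to_regex and wildcard_match are made up for illustration, and the sketch only models the two wildcard characters described above, not Elasticsearch's implementation.

```python
import re


def wildcard_to_regex(pattern):
    """Translate the wildcards ? and * into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """True if the pattern matches the whole term, not just part of it."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


titles = ["C3D0-KD345", "C3K5-DFG65", "C4I8-UI365"]
print([t for t in titles if wildcard_match("C?K*5", t)])
```

Only C3K5-DFG65 has a K in the third position, so it is the only title the pattern C?K*5 selects.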

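3. Regexp search

Uses ordinary regular expressions; the pattern has to match the whole term

[0-9]: any digit in the given range
[a-z]: any letter in the given range
.: any single character
+: the preceding expression may occur one or more times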

GET /my_index/_search
{
  "query": {
    "regexp": {
      "title": "C[0-9].+"
    }
  }
}

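wildcard and regexp work on the same principle as prefix: both have to scan the terms in the index, so performance is poor

They are covered here mainly as an introduction to the advanced search syntax. In real applications, avoid them whenever you can; they are simply too slow.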
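
One detail worth seeing in code is that a regexp query is matched against the whole term. The sketch below approximates this with Python's re.fullmatch. Lucene's regexp syntax is not identical to Python's, so this is an approximation that holds for the simple pattern used above, and the term XC3D0 is a made-up extra value added to show the difference.

```python
import re

# The three indexed titles plus one hypothetical term that only
# contains the pattern somewhere in the middle.
titles = ["C3D0-KD345", "C3K5-DFG65", "C4I8-UI365", "XC3D0"]

pattern = re.compile(r"C[0-9].+")

# fullmatch: the pattern must cover the entire term (regexp query behaviour).
whole_term = [t for t in titles if pattern.fullmatch(t)]

# search: the pattern may match anywhere inside the term.
anywhere = [t for t in titles if pattern.search(t)]

print(whole_term)
print(anywhere)
```

With whole-term matching, XC3D0 is rejected because it does not start with C, while an unanchored search would accept it.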

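4. Search-time suggestions with match_phrase_prefix

Search suggestions, also known as search as you type or autocomplete: while the user is still typing, the documents that match what has been typed so far are suggested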

DELETE my_index
## Create a text mapping
PUT my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"   
      }
    }
  }
}
## Insert data
POST /_bulk
{"create":{"_index":"my_index"}}
{"content":"hello world"}
{"create":{"_index":"my_index"}}
{"content":"hello w"}
{"create":{"_index":"my_index"}}
{"content":"hello dog"}

## Search suggestions
GET /my_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "content": {
        "query": "hello w",
        "max_expansions": 3,
        "slop": 2
      }
    }
  }
}

Reference:

Match phrase prefix query | Elasticsearch Reference [7.6] | Elastic https://www.elastic.co/guide/en/elasticsearch/reference/7.6/query-dsl-match-query-phrase-prefix.html

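It works like match_phrase; the only difference is that the last term is treated as a prefix

hello is matched as a normal term to find the docs that contain it
w is used as a prefix: the inverted index is scanned for all terms that start with w
Then the docs that contain both hello and a term starting with w are selected
Finally slop is applied: a doc matches only if the positions of hello and of the term starting with w are within slop of the phrase order

max_expansions: the maximum number of terms the prefix may expand to (default 50); once that many terms have been found, expansion stops, which bounds the cost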
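
The effect of max_expansions can be sketched as a capped prefix expansion over a sorted term list. This is a simplified model: the function name expand_prefix is made up, and the terms wall, water and wind are hypothetical additions that do not appear in the test data above.

```python
def expand_prefix(sorted_terms, prefix, max_expansions):
    """Collect terms starting with prefix, in sorted order, up to a cap."""
    expansions = []
    for term in sorted_terms:
        if term.startswith(prefix):
            expansions.append(term)
            if len(expansions) == max_expansions:
                break  # cap reached: remaining terms are never considered
    return expansions


# Hypothetical term dictionary for the content field.
terms = sorted(["dog", "hello", "w", "wall", "water", "wind", "world"])

print(expand_prefix(terms, "w", 3))
print(expand_prefix(terms, "w", 50))
```

With a cap of 3 the expansion stops before it reaches world, so a small max_expansions trades completeness for speed: docs can be missed even though they contain a term with the typed prefix.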

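5. ngram and how index-time search suggestions work

5.1 edge ngram

quick, split into ngrams of each of the 5 possible lengths

gram length=1: q u i c k
gram length=2: qu ui ic ck
gram length=3: qui uic ick
gram length=4: quic uick
gram length=5: quick

What is an edge ngram

quick, with every ngram anchored at the first letter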

q
qu
qui
quic
quick

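With edge ngram, every word is split further at index time, and the resulting ngrams are what make prefix suggestions work

At search time there is no need to take a prefix and scan the inverted index. The prefix is simply looked up in the inverted index as a term, and if it is found, the doc matches, just like match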
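
The two splitting schemes are easy to reproduce in a few lines. The sketch below is an illustration of the idea, not the analyzer's implementation; the function names are made up, and the min_gram and max_gram parameters simply mirror the setting names used by the edge_ngram filter.

```python
def ngrams(word, n):
    """All substrings of length n, sliding one character at a time."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edge_ngrams(word, min_gram=1, max_gram=20):
    """Substrings anchored at the first character, from min_gram to max_gram long."""
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


for n in range(1, 6):
    print(n, ngrams("quick", n))

print(edge_ngrams("quick"))
```

Every prefix of quick ends up as its own term, so a typed prefix such as qui becomes an exact term lookup instead of a scan.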

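5.2 Experiment: ngram

Create the analyzer. The index left over from section 4 is deleted first, because PUT /my_index fails if the index already exists

## min_gram and max_gram set the range of ngram lengths
DELETE /my_index
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20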
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}

Run the analyzer on some text to check that it was created correctly

GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick brown"
}

Create the mapping and insert data


PUT /my_index/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}
## Insert data
POST /_bulk
{"create":{"_index":"my_index"}}
{"content":"hello world"}
{"create":{"_index":"my_index"}}
{"content":"hello w"}
{"create":{"_index":"my_index"}}
{"content":"hello dog"}

Query

## match_phrase is used here; match is not recommended
GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "content": "hello w"
    }
  }
}

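With match, docs that contain only hello are returned as well, because it is a full-text search; they just score lower
match_phrase is recommended: it requires every term to be present and the positions to be exactly one apart, which is the behaviour we want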
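
The difference between the two queries on an edge ngram field can be simulated with a small position-aware model. This is a sketch under simplifying assumptions: whitespace splitting stands in for the standard tokenizer, scoring is ignored, and all function names are made up for illustration.

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def index_tokens(text):
    """Index-time analysis: lowercase, split, then emit edge ngrams.
    Every ngram of a word is stored at that word's position."""
    positions = {}
    for pos, word in enumerate(text.lower().split()):
        for gram in edge_ngrams(word):
            positions.setdefault(gram, set()).add(pos)
    return positions


def search_terms(text):
    """Search-time analysis: lowercase and split only, no ngrams."""
    return text.lower().split()


def match(doc, query):
    """match: the doc qualifies if it contains any of the query terms."""
    return any(term in doc for term in search_terms(query))


def match_phrase(doc, query):
    """match_phrase: every term must be present at consecutive positions."""
    terms = search_terms(query)
    if not terms:
        return False
    return any(
        all(start + offset in doc.get(term, set())
            for offset, term in enumerate(terms))
        for start in doc.get(terms[0], set())
    )


contents = ["hello world", "hello w", "hello dog"]
docs = {text: index_tokens(text) for text in contents}

print([t for t in contents if match(docs[t], "hello w")])
print([t for t in contents if match_phrase(docs[t], "hello w")])
```

In this model match returns all three docs because each contains hello, while match_phrase keeps only the two docs in which a term starting with w directly follows hello.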
