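Term-Based Queries
- Why terms matter
  - A term is the smallest unit that carries meaning. Both search and natural language processing with statistical language models operate on terms.
- Characteristics
  - Term-level queries: Term Query / Range Query / Exists Query / Prefix Query / Wildcard Query
  - In ES, a term query does not analyze its input. The input is treated as a single unit and looked up as an exact term in the inverted index, and every document containing that term is scored with the relevance formula. For example, "Apple Store" is looked up as one term.
  - A Constant Score query can turn the query into a filter, which skips scoring and can use the cache to improve performance.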
Term query examples
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "productID" : "XHDK-A-1293-#fJ3","desc":"iPhone" }
{ "index": { "_id": 2 }}
{ "productID" : "KDKE-B-9947-#kL5","desc":"iPad" }
{ "index": { "_id": 3 }}
{ "productID" : "JODL-X-1937-#pV7","desc":"MBP" }
- What does each of the following queries return?
- If a query finds nothing, why?
- How can it be fixed?
GET /products
POST /products/_search
{
"query": {
"term": {
"desc": {
//"value": "iPhone" // no hits: the indexed token was lowercased
"value":"iphone" // returns a hit
}
}
}
}
POST /products/_search
{
"query": {
"term": {
"desc.keyword": {
"value": "iPhone" // returns a hit
//"value":"iphone" // no hits: keyword values are stored verbatim
}
}
}
}
POST /products/_search
{
"query": {
"term": {
"productID": {
"value": "XHDK-A-1293-#fJ3" // no hits
//"value": "xhdk" // returns a hit: matches one of the tokens produced by the analyzer
//"value": "xhdk-a-1293-#fJ3" // no hits
}
}
}
}
POST /products/_search
{
//"explain": true,
"query": {
"term": {
"productID.keyword": {
"value": "XHDK-A-1293-#fJ3" // returns a hit
}
}
}
}
// Inspect the analyzer output
POST /_analyze
{
"analyzer": "standard",
"text": ["XHDK-A-1293-#fJ3"]
}
//res
{
"tokens" : [
{
"token" : "xhdk",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "a",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "1293",
"start_offset" : 7,
"end_offset" : 11,
"type" : "<NUM>",
"position" : 2
},
{
"token" : "fj3",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
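The token list above can be approximated in a few lines of Python. This is only a rough sketch: the real standard analyzer follows Unicode text segmentation rules, while the hypothetical `standard_analyze` helper below simply takes runs of ASCII letters and digits and lowercases them. For this particular product ID it yields the same tokens, offsets, and positions.

```python
import re

def standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real
    implementation): split on non-alphanumeric characters, then lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for tok in standard_analyze("XHDK-A-1293-#fJ3"):
    print(tok)
```

Because the hyphens and the `#` are dropped and everything is lowercased, the original string never appears as a term in the index of the text field.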
Multi-field mapping and term queries
GET products/_mapping
//res
{
"products" : {
"mappings" : {
"properties" : {
"desc" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"productID" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
- Querying the keyword sub-field gives a strict, exact match
- A term query still returns a relevance score
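To see why the same term query hits on `productID.keyword` but misses on `productID`, the multi-field mapping can be modeled as two small inverted indexes built from one source value. This is a toy model under stated assumptions: `analyze`, `build_index`, and `term_query` are hypothetical helpers, and the tokenizer is only an approximation of the standard analyzer.

```python
import re

def analyze(text):
    # Approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs, field, ignore_above=256):
    """Index one field twice: analyzed (text) and verbatim (keyword sub-field)."""
    text_postings, keyword_postings = {}, {}
    for doc_id, doc in docs.items():
        value = doc[field]
        for token in analyze(value):
            text_postings.setdefault(token, set()).add(doc_id)
        if len(value) <= ignore_above:  # longer strings are not indexed as keywords
            keyword_postings.setdefault(value, set()).add(doc_id)
    return {field: text_postings, field + ".keyword": keyword_postings}

def term_query(index, field, value):
    # A term query looks the value up as-is; the input is never analyzed.
    return sorted(index[field].get(value, set()))

docs = {
    1: {"productID": "XHDK-A-1293-#fJ3"},
    2: {"productID": "KDKE-B-9947-#kL5"},
    3: {"productID": "JODL-X-1937-#pV7"},
}
index = build_index(docs, "productID")
print(term_query(index, "productID", "XHDK-A-1293-#fJ3"))          # []
print(term_query(index, "productID", "xhdk"))                      # [1]
print(term_query(index, "productID.keyword", "XHDK-A-1293-#fJ3"))  # [1]
```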
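Compound query: Constant Score turns a query into a filter
- Converting the query to a filter skips the relevance calculation (TF-IDF, or BM25 in current versions) and avoids the cost of scoring
- Filters can make effective use of the cache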
POST /products/_search
{
"explain": true,
"query": {
"constant_score": {
"filter": {
"term": {
"productID.keyword": "XHDK-A-1293-#fJ3"
}
}
}
}
}
//res
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_shard" : "[products][0]",
"_node" : "BsfHcVuGT8-7CROZ1odZUg",
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"productID" : "XHDK-A-1293-#fJ3",
"desc" : "iPhone"
},
"_explanation" : {
"value" : 1.0,
"description" : "ConstantScore(productID.keyword:XHDK-A-1293-#fJ3)",
"details" : [ ]
}
}
]
}
}
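When request bodies are built in client code, the wrapping step is mechanical. The sketch below uses a hypothetical helper, `as_constant_score`, that takes any query clause as a plain dict and nests it under `constant_score.filter`; the optional `boost` sets the fixed score each hit receives. It only constructs the JSON body and does not contact a cluster.

```python
import json

def as_constant_score(query, boost=None):
    """Wrap a query clause in constant_score so it runs as a filter (no scoring)."""
    body = {"constant_score": {"filter": query}}
    if boost is not None:
        body["constant_score"]["boost"] = boost  # fixed score assigned to every hit
    return body

term_clause = {"term": {"productID.keyword": "XHDK-A-1293-#fJ3"}}
request_body = {"query": as_constant_score(term_clause)}
print(json.dumps(request_body, indent=2))
```

The printed body has the same shape as the request shown above, so it can be pasted into the console or sent with any HTTP client.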
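Full-Text Queries
- Full-text search
  - Match Query / Match Phrase Query / Query String Query
- Characteristics
  - Text is analyzed at both index time and search time. The query string is first passed to a suitable analyzer, which produces the list of terms to search for.
  - At query time the input is analyzed first, then a low-level query runs for each term, and the results are merged. A score is generated for every matching document.
    - For example, searching for "Matrix reloaded" returns every document that contains matrix or reloaded.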
Match Query Result
POST /movies/_search
{
"profile": "true",
"query": {
"match": {
"title": {
"query": "Matrix reload" // terms are combined with OR by default
}
}
}
}
//res
"hits" : [
{
"_index" : "movies",
"_type" : "_doc",
"_id" : "2571",
"_score" : 9.095142, // relevance score for this hit
"_source" : {
"genre" : [
"Action",
"Sci-Fi",
"Thriller"
],
"title" : "Matrix, The",
"year" : 1999,
"@version" : "1",
"id" : "2571"
}
}
]
Operator
POST /movies/_search
{
"profile": "true",
"query": {
"match": {
"title": {
"query": "Matrix reload",
"operator": "and" // every term must match
}
}
}
}
//res
"profile" : {
"shards" : [
{
"id" : "[QG8Co41UQGKuwzGrkvpzOA][movies][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "+title:matrix +title:reload", // both terms are required
"time_in_nanos" : 2900408,
Minimum_should_match
POST /movies/_search
{
"profile": "true",
"query": {
"match": {
"title": {
"query": "Matrix reload",
"minimum_should_match": 2
}
}
}
}
//res
"profile" : {
"shards" : [
{
"id" : "[BsfHcVuGT8-7CROZ1odZUg][movies][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "(title:matrix title:reload)~2",
"time_in_nanos" : 5050509,
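The effect of `operator` and `minimum_should_match` can be illustrated with a toy matcher over a handful of made-up titles. This is a sketch under stated assumptions, not how Lucene executes the query: the hypothetical `match` function only handles an integer `minimum_should_match`, uses an approximate tokenizer, and does no scoring.

```python
import re

def analyze(text):
    # Approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def match(titles, query, operator="or", minimum_should_match=1):
    """Return the ids of titles that contain enough of the analyzed query terms.

    operator="and" requires every query term; otherwise at least
    minimum_should_match distinct terms must be present.
    """
    terms = set(analyze(query))
    required = len(terms) if operator == "and" else minimum_should_match
    hits = []
    for doc_id, title in titles.items():
        matched = len(terms & set(analyze(title)))
        if matched >= required:
            hits.append(doc_id)
    return hits

# Made-up sample titles, not taken from the movies index.
titles = {1: "The Matrix", 2: "Matrix Reload", 3: "Reload"}

print(match(titles, "Matrix reload"))                          # [1, 2, 3]
print(match(titles, "Matrix reload", operator="and"))          # [2]
print(match(titles, "Matrix reload", minimum_should_match=2))  # [2]
```

Raising the required number of matching terms trades recall for precision: fewer documents come back, but each one matches more of the query.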
Match Phrase Query
POST /movies/_search
{
"profile": "true",
"query": {
"match_phrase": {
"title": {
"query": "Matrix reload",
"slop": 1
}
}
}
}
//res
"profile" : {
"shards" : [
{
"id" : "[BsfHcVuGT8-7CROZ1odZUg][movies][0]",
"searches" : [
{
"query" : [
{
"type" : "PhraseQuery",
"description" : """title:"matrix reload"~1""",
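A simplified model of `slop` helps explain what the `~1` in the profile output means. The sketch below assumes that a phrase matches when, after subtracting each term's position in the query from its position in the document, the spread of those shifted positions is at most `slop`. The helper `phrase_matches`, the tokenizer, and the sample titles are all illustrative; the real PhraseQuery implementation is more involved.

```python
import re
from itertools import product

def analyze(text):
    # Approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def phrase_matches(text, phrase, slop=0):
    """Simplified sloppy phrase check based on shifted term positions."""
    tokens = analyze(text)
    terms = analyze(phrase)
    if not terms:
        return False
    # For every query term, collect the positions where it occurs in the text.
    candidates = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one document position cannot serve two query terms
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False

print(phrase_matches("Matrix Reload", "Matrix reload"))              # True
print(phrase_matches("Matrix The Reload", "Matrix reload"))          # False
print(phrase_matches("Matrix The Reload", "Matrix reload", slop=1))  # True
```

Under this model, swapping two adjacent terms costs a slop of 2, so "Reload Matrix" would need `slop=2` to match the phrase "Matrix reload".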
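Match Query execution process
- Full-text search
  - Match Query / Match Phrase Query / Query String Query
- Characteristics of full-text queries
  - Text is analyzed at both index time and search time. The query string is first passed to a suitable analyzer, which produces the list of terms to search for.
  - The query runs a low-level lookup for each term and then merges the results, producing a score for every matching document.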
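The three steps (analyze, look up each term, merge) can be sketched over a tiny in-memory inverted index. Everything here is illustrative: the sample documents are made up, `build_postings` and `match_query` are hypothetical helpers, and the "score" is just the number of matching query terms, standing in for the real relevance formula.

```python
import re
from collections import defaultdict

def analyze(text):
    # Approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_postings(docs):
    """Index time: analyze each document and record which docs contain each term."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            postings[token].add(doc_id)
    return postings

def match_query(postings, query):
    scores = defaultdict(int)
    for term in analyze(query):                # 1. analyze the query string
        for doc_id in postings.get(term, ()):  # 2. one low-level lookup per term
            scores[doc_id] += 1                # 3. merge; toy score = matching terms
    # Highest score first, ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {1: "The Matrix", 2: "Matrix Reload", 3: "Toy Story"}  # made-up sample
postings = build_postings(docs)
print(match_query(postings, "Matrix reload"))  # [(2, 2), (1, 1)]
```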
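Section recap
- Term-based search vs. full-text search
- Field mappings control whether a field is analyzed
  - "Text" vs "Keyword"
- Query parameters control precision and recall
- Compound query: the Constant Score query
  - Even a term query against a keyword field is scored
  - Converting the query to a filter removes the relevance-scoring step and improves performance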
Course demo
DELETE products
PUT products
{
"settings": {
"number_of_shards": 1
}
}
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "productID" : "XHDK-A-1293-#fJ3","desc":"iPhone" }
{ "index": { "_id": 2 }}
{ "productID" : "KDKE-B-9947-#kL5","desc":"iPad" }
{ "index": { "_id": 3 }}
{ "productID" : "JODL-X-1937-#pV7","desc":"MBP" }
GET /products
POST /products/_search
{
"query": {
"term": {
"desc": {
//"value": "iPhone"
"value":"iphone"
}
}
}
}
POST /products/_search
{
"query": {
"term": {
"desc.keyword": {
"value": "iPhone"
//"value":"iphone"
}
}
}
}
POST /products/_search
{
"query": {
"term": {
"productID": {
"value": "XHDK-A-1293-#fJ3"
}
}
}
}
POST /products/_search
{
//"explain": true,
"query": {
"term": {
"productID.keyword": {
"value": "XHDK-A-1293-#fJ3"
}
}
}
}
POST /products/_search
{
"explain": true,
"query": {
"constant_score": {
"filter": {
"term": {
"productID.keyword": "XHDK-A-1293-#fJ3"
}
}
}
}
}
# Set position_increment_gap
DELETE groups
PUT groups
{
"mappings": {
"properties": {
"names":{
"type": "text",
"position_increment_gap": 0
}
}
}
}
GET groups/_mapping
POST groups/_doc
{
"names": [ "John Water", "Water Smith"]
}
POST groups/_search
{
"query": {
"match_phrase": {
"names": {
"query": "Water Water",
"slop": 100
}
}
}
}
POST groups/_search
{
"query": {
"match_phrase": {
"names": "Water Smith"
}
}
}
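The role of `position_increment_gap` is easier to see when the token positions of a multi-valued field are written out. The sketch below assumes that the first token of each additional value is placed at the previous token's position plus the gap plus one, and it uses plain whitespace splitting in place of a real analyzer. `assign_positions` is a hypothetical helper; 100 is the documented default gap.

```python
def assign_positions(values, position_increment_gap=100):
    """Assign a position to every token of a multi-valued text field (simplified)."""
    positions = []
    next_position = 0
    for value_index, value in enumerate(values):
        if value_index > 0:
            next_position += position_increment_gap  # artificial gap between values
        for token in value.lower().split():
            positions.append((token, next_position))
            next_position += 1
    return positions

names = ["John Water", "Water Smith"]
print(assign_positions(names))
# [('john', 0), ('water', 1), ('water', 102), ('smith', 103)]
print(assign_positions(names, position_increment_gap=0))
# [('john', 0), ('water', 1), ('water', 2), ('smith', 3)]
```

With the gap set to 0, as in the `groups` mapping above, the two "water" tokens sit next to each other, so a phrase can match across the boundary between array values. With the default gap they are far apart, and a phrase query needs a large `slop` to bridge them.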
Further reading
https://www.elastic.co/guide/en/elasticsearch/reference/7.1/term-level-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.1/full-text-queries.html