elasticsearch 深入搜索-结构化搜索

作者: 觉释 | 来源:发表于2020-08-28 08:06 被阅读0次

elasticsearch 深入搜索-结构化搜索
Elasticsearch 深入搜索-全文搜索
Elasticsearch 深入搜索
Elasticsearch 深入搜索-多字段搜索
Elasticsearch 结构化搜索、keyword、Term
ElasticSearch入门
Elasticsearch - 结构化搜索
ElasticSearch - 结构化搜索
Elasticsearch - 聚合
Elasticsearch入门笔记

精确值查找

term 查询数字

我们首先来看最为常用的 term 查询，可以用它处理数字（numbers）、布尔值（Booleans）、日期（dates）以及文本（text）。
让我们以下面的例子开始介绍，

POST /my_store/ _bulk
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }

GET /my_store/ _search
{
    "query" : {
        "constant_score" : { 
            "filter" : {
                "term" : { 
                    "price" : 20
                }
            }
        }
    }
}

GET /my_store/ _search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "productID" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}

GET /my_store/_analyze
{
  "field": "productID",
  "text": "XHDK-A-1293-#fJ3"
}

DELETE /my_store
PUT /my_store 
{
    "mappings" : {
        "products" : {
            "properties" : {
                "productID" : {
                    "type" : "string",
                    "index" : "not_analyzed" 
                }
            }
        }
    }

}

删除索引是必须的，因为我们不能更新已存在的映射。
在索引被删除后，我们可以创建新的索引并为其指定自定义映射。
这里我们告诉 Elasticsearch ，我们不想对 productID 做任何分析。

GET /my_store/ _search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "productID" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}

组合过滤器

布尔过滤器

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : [],
   }
}

must
所有的语句都必须（must）匹配，与 AND 等价。
must_not
所有的语句都不能（must not）匹配，与 NOT 等价。
should
至少有一个语句要匹配，与 OR 等价。
就这么简单！当我们需要多个过滤器时，只须将它们置入 bool 过滤器的不同部分即可。

GET /my_store /_search
{
   "query" : {
      "filtered" : { 
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}}, 
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
              ],
              "must_not" : {
                 "term" : {"price" : 30} 
              }
           }
         }
      }
   }
}

嵌套布尔过滤器

GET /my_store /_search
{
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
              "should" : [
                { "term" : {"productID" : "KDKE-B-9947-#kL5"}}, 
                { "bool" : { 
                  "must" : [
                    { "term" : {"productID" : "JODL-X-1937-#pV7"}}, 
                    { "term" : {"price" : 30}} 
                  ]
                }}
              ]
           }
         }
      }
   }
}

查找多个精确值

GET /my_store /_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : { 
                    "price" : [20, 30]
                }
            }
        }
    }
}

包含而不是相等

一定要了解 term 和 terms 是包含（contains）操作，而非等值（equals）（判断）。如何理解这句话呢？

如果我们有一个 term（词项）过滤器 { "term" : { "tags" : "search" } } ，它会与以下两个文档同时匹配：

{ "tags" : ["search"] }
{ "tags" : ["search", "open_source"] }

尽管第二个文档包含除 search 以外的其他词，它还是被匹配并作为结果返回。

精确相等

如果一定期望得到我们前面说的那种行为（即整个字段完全相等），最好的方式是增加并索引另一个字段，这个字段用以存储该字段包含词项的数量，同样以上面提到的两个文档为例，现在我们包括了一个维护标签数的新字段：

{ "tags" : ["search"], "tag_count" : 1 }
{ "tags" : ["search", "open_source"], "tag_count" : 2 }

一旦增加这个用来索引项 term 数目信息的字段，我们就可以构造一个 constant_score 查询，来确保结果中的文档所包含的词项数量与要求是一致的：

GET /my_index/my_type/_search
{
    "query": {
        "constant_score" : {
            "filter" : {
                 "bool" : {
                    "must" : [
                        { "term" : { "tags" : "search" } }, 
                        { "term" : { "tag_count" : 1 } } 
                    ]
                }
            }
        }
    }
}

查找所有包含 term search 的文档。
确保文档只有一个标签。

范围查找

range 查询可同时提供包含（inclusive）和不包含（exclusive）这两种范围表达式，可供组合的选项如下：
• gt: > 大于（greater than）
• lt: < 小于（less than）
• gte: >= 大于或等于（greater than or equal to）
• lte: <= 小于或等于（less than or equal to）

GET /my_store /_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lt"  : 40
                    }
                }
            }
        }
    }
}

日期范围

range 查询同样可以应用在日期字段上：

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-07 00:00:00"
    }
}

当使用它处理日期字段时， range 查询支持对日期计算（date math）进行操作，比方说，如果我们想查找时间戳在过去一小时内的所有文档：

"range" : {
"timestamp" : {
"gt" : "now-1h"
}
}
这个过滤器会一直查找时间戳在过去一个小时内的所有文档，让过滤器作为一个时间滑动窗口（sliding window）来过滤文档。

日期计算还可以被应用到某个具体的时间，并非只能是一个像 now 这样的占位符。只要在某个日期后加上一个双管符号 (||) 并紧跟一个日期数学表达式就能做到：

"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-01 00:00:00||+1M" 
    }
}

早于 2014 年 1 月 1 日加 1 月（2014 年 2 月 1 日零时）

字符串范围

range 查询同样可以处理字符串字段，字符串范围可采用字典顺序（lexicographically）或字母顺序（alphabetically）。例如，下面这些字符串是采用字典序（lexicographically）排序的：

5, 50, 6, B, C, a, ab, abb, abc, b
在倒排索引中的词项就是采取字典顺序（lexicographically）排列的，这也是字符串范围可以使用这个顺序来确定的原因。

如果我们想查找从 a 到 b （不包含）的字符串，同样可以使用 range 查询语法：

"range" : {
    "title" : {
        "gte" : "a",
        "lt" :  "b"
    }
}

处理Null值

存在查询
POST /my_index/posts/_bulk
{ "index": { "_id": "1"              }}
{ "tags" : ["search"]                }  
{ "index": { "_id": "2"              }}
{ "tags" : ["search", "open_source"] }  
{ "index": { "_id": "3"              }}
{ "other_field" : "some data"        }  
{ "index": { "_id": "4"              }}
{ "tags" : null                      }  
{ "index": { "_id": "5"              }}
{ "tags" : ["search", null]          }

tags 字段有 1 个值。
tags 字段有 2 个值。
tags 字段缺失。
tags 字段被置为 null 。
tags 字段有 1 个值和 1 个 null 。

在 Elasticsearch 中，使用 exists 查询的方式如下：

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }
            }
        }
    }
}

缺失查询

我们将前面例子中 exists 查询换成 missing 查询：

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter": {
                "missing" : { "field" : "tags" }
            }
        }
    }
}

网友评论

本文标题：elasticsearch 深入搜索-结构化搜索

本文链接：https://www.haomeiwen.com/subject/zwxcsktx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！