[Elasticsearch] 16. Single-String, Multi-Field Queries

Author: cutieagain | Published: 2020-03-15 21:17

dis max query

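A single-string query example

Blog title
- "brown" appears in document 1
Blog body
- "brown" appears in document 1
- "brown fox" appears in full in document 2, in the same order as in the query (by eye, this should be the most relevant match)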

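Scoring process

The bool query runs the two queries in the should clause
Adds up the scores of the two queries
Multiplies the sum by the number of matching clauses
Divides the result by the total number of clauses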
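
The four steps above can be sketched as plain arithmetic. This is only an illustration of the procedure as described here, not Elasticsearch's actual scoring code; the per-clause scores are made-up numbers and `bool_should_score` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def bool_should_score(clause_scores: Sequence[Optional[float]]) -> float:
    """Score one document against a bool/should query, following the steps above.

    clause_scores has one entry per should clause: the clause's score if it
    matched the document, or None if it did not match.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    total = sum(matched)               # add up the scores of the matching clauses
    total *= len(matched)              # multiply by the number of matching clauses
    return total / len(clause_scores)  # divide by the total number of clauses


# Made-up scores: document 1 matches "brown" weakly in both title and body,
# document 2 matches "brown fox" strongly, but only in body.
doc1 = bool_should_score([0.4, 0.3])
doc2 = bool_should_score([None, 0.6])
print(doc1, doc2)
```

With these numbers document 1 outscores document 2, even though document 2 contains the better single-field match. That is the problem the dis_max query addresses.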

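Disjunction max query

In the example above, title and body compete with each other
- The scores should not simply be added together; instead, the score of the single best-matching field should be used
Disjunction max query
- Returns any document that matches any of the queries, and uses the score of the best-matching field as the document's final score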

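Tuning with the tie_breaker parameter

Take the _score of the best-matching clause
Multiply the score of each other matching clause by tie_breaker
Sum these scores and normalize
tie_breaker is a floating-point number between 0 and 1: 0 means only the best match counts, 1 means all clauses are equally important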
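
A minimal sketch of how tie_breaker changes the combined score, assuming made-up per-clause scores. `dis_max_score` is a hypothetical helper, and the normalization step mentioned above is left out to keep the arithmetic visible.

```python
from typing import Sequence


def dis_max_score(clause_scores: Sequence[float], tie_breaker: float = 0.0) -> float:
    """Best clause score plus tie_breaker times the scores of the other matching clauses."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Made-up scores of the clauses that matched each document.
doc1_clauses = [0.4, 0.3]  # weak matches in both title and body
doc2_clauses = [0.6]       # one strong match in body only

for tb in (0.0, 0.2, 1.0):
    print(tb, dis_max_score(doc1_clauses, tb), dis_max_score(doc2_clauses, tb))
```

With tie_breaker at 0 only the best clause counts, so document 2 ranks first. At 1 the scores of all matching clauses are simply added, and document 1 overtakes it.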


PUT /blogs/_doc/1
{
    "title": "Quick brown rabbits",
    "body":  "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
    "title": "Keeping pets healthy",
    "body":  "My quick brown fox eats rabbits on a regular basis."
}

POST /blogs/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "Brown fox" }},
                { "match": { "body":  "Brown fox" }}
            ]
        }
    }
}

POST blogs/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ]
        }
    }
}


POST blogs/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ],
            "tie_breaker": 0.2
        }
    }
}

multi match

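Three scenarios

Best fields (best_fields)
- Used when fields compete with each other yet are related, such as title and body; the score comes from the best-matching field
Most fields (most_fields)
- A common technique for English content: in the main field (english analyzer), reduce words to their stems and add synonyms so that more documents match. The same text is also indexed into a subfield (standard analyzer) to provide more precise matching. The other fields act as signals that raise a matching document's relevance; the more fields match, the better
Cross fields (cross_fields)
- For some entities, such as person names, addresses, and book information, the information has to be identified across several fields, and each field is only one part of the whole. The goal is to find as many of the query terms as possible in any of the listed fields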

multi match query

best_fields is the default type and does not need to be specified
Parameters such as minimum_should_match can be passed through to the generated queries

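A query example

The english analyzer lowers precision because tense information is lost
The standard analyzer leaves tenses unchanged, so it allows exact matching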
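
To see why stemming costs precision, here is a toy comparison. `english_like` is a deliberately crude suffix stripper written for this illustration and `standard_like` is a plain lowercase split; neither is the real english or standard analyzer.

```python
from typing import Callable, List


def standard_like(text: str) -> List[str]:
    """Lowercase and split on whitespace; word forms are kept as written."""
    return text.lower().split()


def english_like(text: str) -> List[str]:
    """Toy stemmer: strips a trailing "ing" or "s". NOT the real english analyzer."""
    stems = []
    for token in standard_like(text):
        for suffix in ("ing", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        stems.append(token)
    return stems


def matched_terms(query: str, doc: str, analyze: Callable[[str], List[str]]) -> int:
    """Count the distinct query terms that also occur in the document."""
    return len(set(analyze(query)) & set(analyze(doc)))


docs = {
    1: "My dog barks",
    2: "I see a lot of barking dogs on the road ",
}
for doc_id, text in docs.items():
    print(doc_id,
          matched_terms("barking dogs", text, english_like),
          matched_terms("barking dogs", text, standard_like))
```

Under the toy stemmer both documents match both query terms, so nothing marks document 2 as the exact match. With the unstemmed tokens only document 2 matches.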


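Solving it with most_fields matching

The broadly matching title field takes in as many documents as possible to improve recall, while the title.std field serves as a signal that pushes the more relevant documents to the top of the results
Each field's contribution to the final score can be controlled with a custom boost value. For example, boosting the title field makes it more important, which also reduces the influence of the other signal fields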
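
The effect of boost can be sketched with made-up numbers, under the simplifying assumption that the combined score is the sum of the per-field scores, each multiplied by its boost. `most_fields_score` is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Dict


def most_fields_score(field_scores: Dict[str, float], boosts: Dict[str, float]) -> float:
    """Sum the per-field scores, multiplying each by its boost (default 1.0)."""
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


# Made-up scores: document 1 only matches the stemmed title field,
# document 2 matches both title and the exact-form title.std signal field.
doc1_fields = {"title": 0.5}
doc2_fields = {"title": 0.4, "title.std": 0.6}

no_boost = {}
title_boosted = {"title": 10.0}  # corresponds to "title^10" in the fields list

print(most_fields_score(doc1_fields, no_boost), most_fields_score(doc2_fields, no_boost))
print(most_fields_score(doc1_fields, title_boosted), most_fields_score(doc2_fields, title_boosted))
```

Without boosts, the title.std signal lifts document 2 above document 1. Once title is boosted heavily, the signal field contributes comparatively little and document 1 comes out ahead again.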


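Cross-field search

most_fields
- operator is applied to each field separately, so it cannot be used to require that every term appears somewhere across the fields
- copy_to can solve this, but it requires extra storage space
cross_fields
- operator is supported
- Compared with copy_to, one advantage is that individual fields can be boosted at search time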
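
As an illustration, a cross_fields request body can be built as a plain Python dict. The field names (street, city, country, postcode), the query text, and the `cross_fields_query` helper are hypothetical examples; the multi_match parameters `type`, `operator`, and `fields` are the ones discussed above.

```python
import json
from typing import Any, Dict, List


def cross_fields_query(text: str, fields: List[str], operator: str = "and") -> Dict[str, Any]:
    """Build a multi_match search body of type cross_fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "operator": operator,  # with "and", every term must appear in at least one field
                "fields": fields,      # a single field can be boosted, e.g. "city^2"
            }
        }
    }


# Hypothetical address fields; "city^2" boosts one field at search time.
body = cross_fields_query("Poland Street W1V", ["street", "city^2", "country", "postcode"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against an index that has these fields.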

POST blogs/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ],
            "tie_breaker": 0.2
        }
    }
}

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title","body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}



POST books/_search
{
    "query": {
        "multi_match": {
            "query":  "Quick brown fox",
            "fields": "*_title"
        }
    }
}


POST books/_search
{
    "query": {
        "multi_match": {
            "query":  "Quick brown fox",
            "fields": [ "*_title", "chapter_title^2" ]
        }
    }
}



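# Legacy mapping syntax: the "my_type" mapping type and the "string" field type
# are rejected by Elasticsearch 7.x. The typeless equivalents follow.
DELETE /titles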
PUT /titles
{
    "settings": { "number_of_shards": 1 },
    "mappings": {
        "my_type": {
            "properties": {
                "title": {
                    "type":     "string",
                    "analyzer": "english",
                    "fields": {
                        "std":   {
                            "type":     "string",
                            "analyzer": "standard"
                        }
                    }
                }
            }
        }
    }
}

PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }


GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"std": {"type": "text","analyzer": "standard"}}
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title^10", "title.std" ]
        }
    }
}

Article link: https://www.haomeiwen.com/subject/zxwbehtx.html