ElasticSearch Complex Queries

Author: tingshuo123 | Published: 2018-11-27 23:50

Sample Data

First, download Elastic's official sample data: download link
Then upload the sample data with curl:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
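The file posted above is in the _bulk API's newline-delimited JSON (NDJSON) format: an action line such as {"index": {...}} followed by the document itself, one JSON object per line, and the body must end with a newline. If you want to bulk-load your own documents, a minimal Python sketch for building such a body might look like this. The helper name to_bulk_ndjson, its id_field option, and the two sample documents are hypothetical, not taken from accounts.json.

```python
import json


def to_bulk_ndjson(docs, id_field=None):
    """Build a _bulk request body: one action line, then one source line, per document.

    Hypothetical helper: if id_field is given, that field's value becomes the _id.
    """
    lines = []
    for doc in docs:
        action = {"index": {}}
        if id_field is not None:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(doc))     # document source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents (made-up values).
sample_docs = [
    {"account_number": 1, "balance": 100},
    {"account_number": 2, "balance": 200},
]
body = to_bulk_ndjson(sample_docs, id_field="account_number")
print(body, end="")
```

Saved to a file, the result can be posted with the same curl command. Use --data-binary rather than -d, because -d strips the newlines that the bulk format depends on.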

You can also upload it with Postman: POST localhost:9200/bank/_doc/_bulk?pretty&refresh


Whichever method you use, a successful upload returns a response like this:

{
    "took": 1016,
    "errors": false,
    "items": [
        {
            "index": {
                "_index": "bank",
                "_type": "_doc",
                "_id": "1",
                "_version": 1,
                "result": "created",
                "forced_refresh": true,
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 1,
                "status": 201
            }
        }
        ......
    ]
}

Queries

Elasticsearch supports two ways of querying: passing parameters in the URL, or sending a JSON query in the request body.

Querying with URL parameters

Let's look at the URL-parameter style first:

GET /bank/_search?q=*&sort=account_number:asc&pretty

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      }, ...
    ]
  }
}

Look closely and you'll see that the results are sorted by account_number in ascending order.

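Here is what the query does: /bank/_search searches the bank index; q=* matches all documents; sort=account_number:asc sorts by the account_number field, with asc meaning ascending order; and &pretty returns pretty-printed, human-readable JSON.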
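To see how those parameters are encoded on the wire, here is a small Python sketch using only urllib.parse. It assumes a node at localhost:9200 (the request line above leaves the host out), and search_url is a hypothetical helper. Note that * and : are percent-encoded, which Elasticsearch decodes as usual.

```python
from urllib.parse import urlencode


def search_url(index, q="*", sort=None, pretty=True, host="localhost:9200"):
    """Hypothetical helper: build a URI-search URL like the request above."""
    params = {"q": q}
    if sort:
        params["sort"] = sort
    query = urlencode(params)  # percent-encodes "*" and ":"
    if pretty:
        query += "&pretty"     # flag-style parameter with no value
    return f"http://{host}/{index}/_search?{query}"


print(search_url("bank", sort="account_number:asc"))
# http://localhost:9200/bank/_search?q=%2A&sort=account_number%3Aasc&pretty
```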

Querying with a JSON body

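JSON-style queries like the one below are what Elasticsearch calls the Query DSL. This query does the same thing as the URL query above, but it is clearly easier to read, so the rest of this article focuses on DSL queries.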

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": {
        "order": "asc" # use "desc" for descending order
      }
    }
  ]
}
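Outside the console-style examples used here, the same DSL body can be sent from any HTTP client. The sketch below uses Python's standard urllib and only prepares the request without sending it; the host localhost:9200 and the helper name build_search_request are assumptions. It uses POST because not every HTTP client or proxy handles a body on GET, and Elasticsearch accepts both GET and POST for _search. The # comment in the example above is not valid JSON, so the Python dict leaves it out.

```python
import json
import urllib.request

# Assumption: a local, unsecured node, as in the curl examples above.
ES_URL = "http://localhost:9200"


def build_search_request(index, body):
    """Prepare (but do not send) a _search request carrying a JSON DSL body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "query": {"match_all": {}},
    "sort": [{"account_number": {"order": "asc"}}],
}
req = build_search_request("bank", body)
# Against a live node you could then run:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["hits"]["hits"][0]["_source"])
```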

DSL Queries

Let's start with the simplest example: query all documents in the bank index.

GET /bank/_search
{
  "query": { "match_all": {} }
}

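size sets the number of documents returned and defaults to 10 (which is why the earlier responses contained only ten hits). Combined with from, it works much like LIMIT in MySQL.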

GET /bank/_search
{
  "query": { "match_all": {} }
  , "size": 2
}

Fetch records 10 through 19, i.e. [10, 20). from defaults to 0.

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
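from and size map onto MySQL's LIMIT offset, count. Here is a tiny Python sketch of the page-number arithmetic, plus a local list slice standing in for what Elasticsearch selects from the sorted hits. page_params and apply_from_size are hypothetical names, and pages are assumed to be 1-based.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into from/size parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def apply_from_size(sorted_hits, frm=0, size=10):
    """Local stand-in: skip `frm` hits, then take up to `size`, i.e. [frm, frm + size)."""
    return sorted_hits[frm:frm + size]


params = page_params(2)
print(params)  # {'from': 10, 'size': 10}
print(apply_from_size(list(range(100)), params["from"], params["size"]))
```

Keep in mind that from + size is limited by the index setting index.max_result_window (10,000 by default); for deep paging, Elasticsearch offers search_after or the scroll API instead.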

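So far we have covered sorting and fetching a given range of documents. Looking at the responses above, you'll notice that each hit returns the entire document. Can we select only certain fields, as in MySQL? Yes, as the next example shows.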

Return only the account_number and balance fields of each document:

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}

Response (only some of the hits are shown)

{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "25",
  "_score" : 1.0,
  "_source" : {
    "account_number" : 25,
    "balance" : 40540
  }
},
{
  "_index" : "bank",
  "_type" : "_doc",
  "_id" : "44",
  "_score" : 1.0,
  "_source" : {
    "account_number" : 44,
    "balance" : 34487
  }
}
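Conceptually, "_source": [...] works like the column list in SELECT: each hit's _source keeps only the named fields. Below is a local Python sketch of that projection for flat documents. filter_source and the sample document are hypothetical, and real source filtering also supports wildcards and nested field paths, which this sketch ignores.

```python
def filter_source(source, includes):
    """Keep only the listed top-level fields of a flat document."""
    wanted = set(includes)
    return {field: value for field, value in source.items() if field in wanted}


# Hypothetical document with made-up values.
doc = {"account_number": 1, "balance": 100, "firstname": "Example", "city": "Sampletown"}
print(filter_source(doc, ["account_number", "balance"]))
# {'account_number': 1, 'balance': 100}
```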

Exact matching on a numeric field: get documents where age = 37

GET /bank/_search
{
  "query": { "match": { "age": 37}}
}

Full-text matching on a text field: find documents whose address contains mill (case-insensitive)

GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}

Find documents whose address contains mill or lane (case-insensitive)

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

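In each returned hit, the metadata field hits.hits._score is the relevance score: the better a document matches, the higher its score and the earlier it is ranked. (match_all gives every document the same score of 1.0, which is why 1.0 appears in the earlier results.) Also note that a match query targets a single field. These rules apply to the queries below as well.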
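Why is the match query case-insensitive, and why does "mill lane" behave like mill OR lane? The address text is analyzed into lowercase terms at index time, the query string is analyzed the same way, and by default match combines the query terms with OR. The Python sketch below imitates that locally with a deliberately simplified analyzer (lowercase, then split on non-alphanumeric characters); it is not Elasticsearch's standard analyzer, and the addresses are made up. It also shows that match compares whole terms, so mill does not match Millstone.

```python
import re


def analyze(text):
    """Simplified analyzer: lowercase, then split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match(field_text, query_text):
    """match with the default OR operator: true if any query term appears in the field."""
    return bool(set(analyze(field_text)) & set(analyze(query_text)))


# Made-up addresses for illustration.
addresses = ["100 Mill Lane", "200 Mill Road", "300 Oak Lane", "400 Millstone Avenue"]
print([a for a in addresses if match(a, "mill")])
# ['100 Mill Lane', '200 Mill Road']
print([a for a in addresses if match(a, "mill lane")])
# ['100 Mill Lane', '200 Mill Road', '300 Oak Lane']
```

To require every term instead, the match query accepts "operator": "and" alongside "query".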

bool query

must: every clause has to match (AND). This returns documents whose address contains both mill and lane:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

should: at least one clause has to match (OR). This returns documents whose address contains mill or lane:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

must_not: none of the clauses may match (NOR). This returns documents whose address contains neither mill nor lane:

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
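The three clause types can be mimicked locally to check your intuition. The Python sketch below uses a simplified lowercase-and-split analysis as a stand-in for the real analyzer, treats each clause as a match on the address field, and applies the rule that when a bool query has no must (or filter) clause, at least one should clause has to match. Function names and addresses are hypothetical, and scoring is ignored.

```python
import re


def tokens(text):
    """Simplified analysis: lowercase, split on non-alphanumerics."""
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}


def clause_matches(field_tokens, query_text):
    """A match clause (OR operator) hits if any query term is in the field."""
    return bool(field_tokens & tokens(query_text))


def bool_matches(address, must=(), should=(), must_not=()):
    toks = tokens(address)
    if not all(clause_matches(toks, q) for q in must):      # AND
        return False
    if any(clause_matches(toks, q) for q in must_not):      # NOR
        return False
    if should and not must:                                 # OR, required only without must
        return any(clause_matches(toks, q) for q in should)
    return True  # with a must clause, should clauses only affect scoring


addresses = ["100 Mill Lane", "200 Mill Road", "300 Oak Lane", "400 Millstone Avenue"]
print([a for a in addresses if bool_matches(a, must=["mill", "lane"])])
print([a for a in addresses if bool_matches(a, should=["mill", "lane"])])
print([a for a in addresses if bool_matches(a, must_not=["mill", "lane"])])
```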

These clauses can also be combined. The following example finds customers with age = 24 and state != ID:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "24" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
  , "size": 1
}
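When queries are assembled in application code, it helps to emit only the clause types you actually use. Here is a minimal Python sketch, assuming a hypothetical bool_query helper, that rebuilds the body above. The filter parameter is named filters to avoid shadowing Python's built-in filter.

```python
import json


def bool_query(must=None, should=None, must_not=None, filters=None):
    """Hypothetical helper: build a bool query, omitting empty clause lists."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    return {"bool": {name: value for name, value in clauses.items() if value}}


body = {
    "query": bool_query(
        must=[{"match": {"age": "24"}}],        # age = 24
        must_not=[{"match": {"state": "ID"}}],  # state != ID
    ),
    "size": 1,
}
print(json.dumps(body, indent=2))
```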

Response

  "hits" : {
    "total" : 42,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "335",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 335,
          "balance" : 35433,
          "firstname" : "Vera",
          "lastname" : "Hansen",
          "age" : 24,
          "gender" : "M",
          "address" : "252 Bushwick Avenue",
          "employer" : "Zanilla",
          "email" : "verahansen@zanilla.com",
          "city" : "Manila",
          "state" : "TN"
        }
      }
    ]
  }

Filter conditions

Commonly used comparison operators:

(>) greater than - gt
(<) less than - lt
(>=) greater than or equal to - gte
(<=) less than or equal to - lte
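Each operator is one bound in a range query, and several bounds on the same field are combined with AND. Below is a local Python sketch of that comparison logic; in_range and RANGE_OPS are hypothetical names, and only numeric bounds are considered (no dates or formats).

```python
import operator

# Range-query bounds mapped onto Python comparisons: value <op> bound.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, **bounds):
    """True if value satisfies every bound, e.g. in_range(25000, gte=20000, lte=30000)."""
    for name, bound in bounds.items():
        if name not in RANGE_OPS:
            raise ValueError(f"unsupported range operator: {name}")
        if not RANGE_OPS[name](value, bound):
            return False
    return True


print(in_range(20, lte=20))                    # True
print(in_range(25000, gte=20000, lte=30000))   # True
print(in_range(30001, gte=20000, lte=30000))   # False
```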

Find users with age <= 20:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "age": {
            "lte": 20
          }
        }
      }
    }
  }
}

Bounds can also be combined. The following query finds users with 20000 <= balance <= 30000:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}

Aggregations

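Consider the following example, which counts customers by gender (by default only the top 10 buckets are returned, sorted by count in descending order). It is roughly equivalent to the SQL: SELECT gender, COUNT(*) FROM bank GROUP BY gender ORDER BY COUNT(*) DESC LIMIT 10;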

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_gender": {
      "terms": {
        "field": "gender.keyword"
      }
    }
  }
}

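size = 0 above means only the aggregation results are returned, without any hits. Text fields such as gender are aggregated through their .keyword subfield; if the field is numeric, use the field name directly without .keyword, e.g. "field": "age" to count customers per age.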

Response:

  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_gender" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "M",
          "doc_count" : 507
        },
        {
          "key" : "F",
          "doc_count" : 493
        }
      ]
    }
  }

Setting the number of buckets returned with size:

"terms": {
  "field": "age",
  "size": 100
}
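A terms aggregation is essentially GROUP BY plus COUNT(*), returning the largest buckets first and at most size of them. The Python sketch below reproduces that on a local list with collections.Counter; terms_agg and the sample documents are hypothetical, and breaking ties by ascending key is an assumption of this sketch. Unlike this single-list version, a real terms aggregation on a multi-shard index merges per-shard top lists, so counts can be approximate, which is what doc_count_error_upper_bound in the response reports on.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Count documents per field value; return the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs)
    # Highest count first; ties broken by key ascending (an assumption of this sketch).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Hypothetical documents.
docs = [{"gender": "M", "age": 30}, {"gender": "F", "age": 25},
        {"gender": "M", "age": 30}, {"gender": "F", "age": 25},
        {"gender": "M", "age": 40}]
print(terms_agg(docs, "gender"))
# [{'key': 'M', 'doc_count': 3}, {'key': 'F', 'doc_count': 2}]
print(terms_agg(docs, "age", size=2))
# [{'key': 25, 'doc_count': 2}, {'key': 30, 'doc_count': 2}]
```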

Now for a slightly more complex query: which age has the highest average balance?

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {  # the name is up to you
      "terms": {
        "field": "age",
        "order": {
          "average_balance": "desc" # sort buckets by average_balance, descending
        },
        "size": 1 # return only the top bucket
      },
      "aggs": {
        "average_balance": {  # the name is up to you
          "avg": {
            "field": "balance" # average balance within each age bucket
          }
        }
      }
    }
  }
}

Response:

"aggregations" : {
    "group_by_age" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 976,
      "buckets" : [
        {
          "key" : 29,
          "doc_count" : 24,
          "average_balance" : {
            "value" : 33540.666666666664
          }
        }
      ]
    }
  }
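The nested aggs block computes an avg inside each age bucket, and "order" then sorts the buckets by that sub-aggregation instead of by document count. Here is the same logic on a local list as a Python sketch; top_avg_by and the documents are hypothetical, the bucket layout mirrors the response above, and buckets with equal averages keep their insertion order because Python's sort is stable.

```python
from collections import defaultdict


def top_avg_by(docs, group_field, value_field, size=1):
    """Group by one field, average another, return the `size` groups with the highest average."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    # Bucket shape mirrors the Elasticsearch response shown above.
    buckets = [
        {"key": key, "doc_count": len(values),
         "average_balance": {"value": sum(values) / len(values)}}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda b: b["average_balance"]["value"], reverse=True)
    return buckets[:size]


# Hypothetical documents.
docs = [{"age": 29, "balance": 40000}, {"age": 29, "balance": 30000},
        {"age": 30, "balance": 20000}, {"age": 31, "balance": 34000}]
print(top_avg_by(docs, "age", "balance"))
# [{'key': 29, 'doc_count': 2, 'average_balance': {'value': 35000.0}}]
```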

That's all on queries for now. These are my notes from reading the official documentation; see the official docs to learn more.
