Exploring Your Data with Elasticsearch

Author: 席梦思 | Published: 2017-04-11 22:30

Loading the sample data

curl -u elastic:changeme -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"

Using the Search API

There are two basic ways to run a search:

  • REST request URI

GET /bank/_search?q=*&sort=account_number:asc&pretty

  • REST request body

GET /bank/_search

{

"query":{"match_all":{}},

"sort":[

{"account_number":"asc"}

]

}
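
As a rough illustration of how the two forms carry the same information, the Python sketch below builds both requests for the bank index without sending them. The helper names uri_search and body_search are hypothetical, introduced only for this example.

```python
import json
from urllib.parse import urlencode


def uri_search(index, query="*", sort="account_number:asc"):
    """Build the path and query string for a URI-style search."""
    # Keep '*' and ':' unescaped so the result reads like the request above.
    params = urlencode({"q": query, "sort": sort}, safe="*:")
    return f"/{index}/_search?{params}"


def body_search(index, sort_field="account_number", order="asc"):
    """Build the path and JSON body for a request-body search."""
    body = {"query": {"match_all": {}}, "sort": [{sort_field: order}]}
    return f"/{index}/_search", json.dumps(body)


if __name__ == "__main__":
    print(uri_search("bank"))
    print(*body_search("bank"))
```

The request-body form scales better once queries get more complex, which is why the remaining examples use it.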

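Meaning of the response fields:

  • took - time in milliseconds that Elasticsearch took to execute the search

  • timed_out - whether the search timed out

  • _shards - how many shards were searched, and how many of them succeeded or failed

  • hits - the search results

  • hits.total - total number of documents matching the search criteria

  • hits.hits - the actual array of search results (the first 10 documents by default)

  • hits.sort - sort key of the results (absent when sorting by score)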
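
To make the field list concrete, here is a small Python sketch that pulls those fields out of a response. The sample_response values are made up for illustration and trimmed to two hits; summarize is a hypothetical helper. One detail beyond the original text: Elasticsearch 7 and later report hits.total as an object with a value field rather than a plain number, so the sketch accepts both shapes.

```python
# Illustrative response; the numbers are invented, not real output.
sample_response = {
    "took": 63,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": None,
        "hits": [
            {"_index": "bank", "_id": "0", "_score": None, "sort": [0],
             "_source": {"account_number": 0, "balance": 16623}},
            {"_index": "bank", "_id": "1", "_score": None, "sort": [1],
             "_source": {"account_number": 1, "balance": 39225}},
        ],
    },
}


def summarize(response):
    """Collect the headline fields of a search response into a flat dict."""
    hits = response["hits"]
    total = hits["total"]
    # Newer versions wrap the total as {"value": ..., "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": total,
        "returned": len(hits["hits"]),
        "sort_keys": [hit.get("sort") for hit in hits["hits"]],
    }


if __name__ == "__main__":
    print(summarize(sample_response))
```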

Query language

Elasticsearch provides a JSON-style domain-specific language (the Query DSL) for executing queries.

GET /bank/_search

{

"query":{"match_all": {}}

}

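The query part holds the query definition, and match_all is the type of query to run; it simply matches every document in the index.

Other parameters can be passed alongside query to shape the results, for example to return only one document: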

GET /bank/_search

{

"query":{"match_all": {}},

"size":1

}

Return documents 11 through 20:

GET /bank/_search

{

"query":{"match_all": {}},

"from":10,

"size":10

}

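The from parameter (0-based) specifies the index of the first document to return, and size specifies how many documents to return starting at from. When from is omitted it defaults to 0, and size defaults to 10.

Sort the results by balance in descending order: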

GET /bank/_search

{

"query": { "match_all": {} },

"sort": { "balance": { "order": "desc" } }

}
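
The interplay of sort, from, and size can be mimicked on an in-memory list. The sketch below is only an approximation under simple assumptions (search_page is a hypothetical helper, documents are plain dicts, and unsorted input keeps its original order), not how Elasticsearch implements paging.

```python
def search_page(docs, sort_field=None, order="asc", from_=0, size=10):
    """Return one page of docs, optionally sorted by a single field first."""
    if sort_field is not None:
        docs = sorted(docs, key=lambda d: d[sort_field], reverse=(order == "desc"))
    # from_ is a 0-based offset; size is the maximum number of results.
    return docs[from_:from_ + size]


if __name__ == "__main__":
    accounts = [{"account_number": i, "balance": 1000 + 7 * i} for i in range(50)]
    page = search_page(accounts, "account_number", from_=10, size=10)
    print([a["account_number"] for a in page])
```

With account numbers starting at 0, a from of 10 and a size of 10 select the 11th through 20th documents, that is, account numbers 10 through 19.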

Executing searches

Return only selected fields of each document, using _source:

GET /bank/_search

{

"query": { "match_all": {} },

"_source": ["account_number", "balance"]

}
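
In effect, the _source list acts as a projection over each document. A minimal Python sketch of that idea follows; filter_source is a hypothetical helper that only handles top-level field names, whereas the real _source option also accepts patterns and nested paths.

```python
def filter_source(doc, fields):
    """Keep only the requested top-level fields of a document."""
    return {key: value for key, value in doc.items() if key in fields}


if __name__ == "__main__":
    account = {"account_number": 20, "balance": 16418, "firstname": "Elinor"}
    print(filter_source(account, ["account_number", "balance"]))
```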

Return the account whose account_number is 20:

GET /bank/_search

{

"query": { "match": { "account_number": 20 } }

}

Return the accounts whose address contains the term mill:

GET /bank/_search

{

"query": { "match": { "address": "mill" } }

}

Return the accounts whose address contains the term mill or the term lane:

GET /bank/_search

{

"query": { "match": { "address": "mill lane" } }

}

Return the accounts whose address contains the phrase mill lane:

GET /bank/_search

{

"query": { "match_phrase": { "address": "mill lane" } }

}
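
The difference between match and match_phrase can be sketched in a few lines of Python. This is a deliberate simplification: tokens lowercases and splits on whitespace, which only approximates what an analyzer does, and the function names and sample addresses are invented for the example.

```python
def tokens(text):
    """Very rough stand-in for text analysis: lowercase, split on spaces."""
    return text.lower().split()


def match(field_value, query):
    """True if the field contains at least one of the query terms."""
    field_terms = set(tokens(field_value))
    return any(term in field_terms for term in tokens(query))


def match_phrase(field_value, query):
    """True if the query terms appear in the field consecutively, in order."""
    field_terms, query_terms = tokens(field_value), tokens(query)
    if not query_terms:
        return False
    width = len(query_terms)
    return any(field_terms[i:i + width] == query_terms
               for i in range(len(field_terms) - width + 1))


if __name__ == "__main__":
    for address in ["990 Mill Road", "198 Mill Lane", "671 Bristol Street"]:
        print(address, match(address, "mill lane"), match_phrase(address, "mill lane"))
```

Under these assumptions, "990 Mill Road" satisfies match for "mill lane" because one term is present, but only "198 Mill Lane" satisfies match_phrase.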

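A bool query combines smaller queries into larger ones using boolean logic (compound conditions). The four requests below show, in order: must (every clause has to match, so the address contains both mill and lane), should (at least one clause has to match, so the address contains mill or lane), must_not (no clause may match, so the address contains neither mill nor lane), and a combination of must and must_not that returns accounts of people aged 40 who do not live in ID.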

GET /bank/_search

{

"query": {

"bool": {

  "must": [

    { "match": { "address": "mill" } },

    { "match": { "address": "lane" } }

  ]

}

}

}

GET /bank/_search

{

"query": {

"bool": {

  "should": [

    { "match": { "address": "mill" } },

    { "match": { "address": "lane" } }

  ]

}

}

}

GET /bank/_search

{

"query": {

"bool": {

  "must_not": [

    { "match": { "address": "mill" } },

    { "match": { "address": "lane" } }

  ]

}

}

}

GET /bank/_search

{

"query": {

"bool": {

  "must": [

    { "match": { "age": "40" } }

  ],

  "must_not": [

    { "match": { "state": "ID" } }

  ]

}

}

}
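
The clause semantics can be modelled in Python as predicates over a document. This is a sketch under simplifying assumptions (term_in and bool_query are hypothetical helpers, text is split on whitespace, and scoring is ignored). It does mirror one real rule: should clauses are required only when the bool query has no must clause.

```python
def term_in(field, term):
    """Predicate: the field, split on whitespace, contains the term."""
    return lambda doc: term.lower() in str(doc[field]).lower().split()


def bool_query(must=(), should=(), must_not=()):
    """Combine predicates the way the bool clauses combine queries."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Without must clauses, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches


if __name__ == "__main__":
    docs = [
        {"address": "198 Mill Lane", "age": 40, "state": "ID"},
        {"address": "990 Mill Road", "age": 40, "state": "TX"},
    ]
    query = bool_query(must=[term_in("age", "40")], must_not=[term_in("state", "ID")])
    print([d["address"] for d in docs if query(d)])
```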

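Executing filters

The _score field is a numeric value: a relative measure of how well a document matches the search query. The higher the score, the more relevant the document; the lower the score, the less relevant.

Queries do not always need to produce scores, in particular when they are only used to filter the document set. The request below uses a bool query with a range filter to return all accounts with a balance between 20000 and 30000, inclusive: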

GET /bank/_search

{

"query": {

"bool": {

  "must": { "match_all": {} },

  "filter": {

    "range": {

      "balance": {

        "gte": 20000,

        "lte": 30000

      }

    }

  }

}

}

}
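
The same request body can be assembled programmatically, and the inclusive bounds of gte and lte are easy to check against in-memory data. In the Python sketch below, balance_filter_body and in_range are hypothetical helper names chosen for this example.

```python
import json


def balance_filter_body(gte, lte):
    """Build the bool query with a range filter on balance."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {"balance": {"gte": gte, "lte": lte}}},
            }
        }
    }


def in_range(doc, gte, lte):
    """In-memory equivalent of the filter: both bounds are inclusive."""
    return gte <= doc["balance"] <= lte


if __name__ == "__main__":
    print(json.dumps(balance_filter_body(20000, 30000), indent=2))
```

Because a filter only answers yes or no, it does not contribute to the score of the matching documents.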

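Executing aggregations

Aggregations provide the ability to group your data and extract statistics from it. The request below groups all accounts by state and returns the states with the most accounts first (by default the top 10, in descending order of count):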

GET /bank/_search

{

"size": 0,

"aggs": {

"group_by_state": {

  "terms": {

    "field": "state.keyword"

  }

}

}

}

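This is roughly equivalent to the SQL statement: SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC.

size is set to 0 because only the aggregation results are of interest here, not the search hits.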
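
The grouping performed by this terms aggregation can be approximated in Python with a Counter. This is a sketch on in-memory dicts, with group_by_state as a hypothetical helper; breaking ties by key in ascending order is an assumption made here to keep the output deterministic.

```python
from collections import Counter


def group_by_state(docs, size=10):
    """Count documents per state, largest buckets first, top `size` only."""
    counts = Counter(doc["state"] for doc in docs)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": state, "doc_count": n} for state, n in ordered[:size]]


if __name__ == "__main__":
    accounts = [{"state": s} for s in ["TX", "ID", "TX", "AL", "ID", "TX"]]
    print(group_by_state(accounts))
```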

Group by the state field and compute the average balance per state:

GET /bank/_search

{

"size": 0,

"aggs": {

"group_by_state": {

  "terms": {

    "field": "state.keyword"

  },

  "aggs": {

    "average_balance": {

      "avg": {

        "field": "balance"

      }

    }

  }

}

}

}

Sort the state buckets by average balance in descending order:

GET /bank/_search

{

"size": 0,

"aggs": {

"group_by_state": {

  "terms": {

    "field": "state.keyword",

    "order": {

      "average_balance": "desc"

    }

  },

  "aggs": {

    "average_balance": {

      "avg": {

        "field": "balance"

      }

    }

  }

}

}

}
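
Nesting an avg aggregation inside a terms aggregation and ordering by it corresponds to a group, average, then sort pipeline. A minimal in-memory sketch, where average_balance_by_state is a hypothetical helper and the sample balances are invented:

```python
from collections import defaultdict
from statistics import mean


def average_balance_by_state(docs):
    """Group by state, average the balances, sort by the average descending."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["state"]].append(doc["balance"])
    buckets = [
        {"key": state, "doc_count": len(values), "average_balance": mean(values)}
        for state, values in balances.items()
    ]
    return sorted(buckets, key=lambda b: b["average_balance"], reverse=True)


if __name__ == "__main__":
    accounts = [
        {"state": "ID", "balance": 50},
        {"state": "TX", "balance": 100},
        {"state": "TX", "balance": 300},
    ]
    print(average_balance_by_state(accounts))
```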

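Group by age bracket (20-29, 30-39, 40-49), then by gender, and compute the average balance for each gender within each bracket: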

GET /bank/_search

{

"size": 0,

"aggs": {

"group_by_age": {

  "range": {

    "field": "age",

    "ranges": [

      {

        "from": 20,

        "to": 30

      },

      {

        "from": 30,

        "to": 40

      },

      {

        "from": 40,

        "to": 50

      }

    ]

  },

  "aggs": {

    "group_by_gender": {

      "terms": {

        "field": "gender.keyword"

      },

      "aggs": {

        "average_balance": {

          "avg": {

            "field": "balance"

          }

        }

      }

    }

  }

}

}

}
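
The two-level bucketing above can be sketched in Python as nested grouping. As in the range aggregation, each range includes its from value and excludes its to value. The helper name age_gender_average, the "20-30" style bucket labels, and the sample data are choices made for this example, not the format Elasticsearch returns.

```python
from collections import defaultdict
from statistics import mean


def age_gender_average(docs, ranges=((20, 30), (30, 40), (40, 50))):
    """Average balance per gender within each half-open age range."""
    result = {}
    for low, high in ranges:
        by_gender = defaultdict(list)
        for doc in docs:
            if low <= doc["age"] < high:  # from inclusive, to exclusive
                by_gender[doc["gender"]].append(doc["balance"])
        result[f"{low}-{high}"] = {g: mean(v) for g, v in by_gender.items()}
    return result


if __name__ == "__main__":
    accounts = [
        {"age": 25, "gender": "F", "balance": 100},
        {"age": 29, "gender": "F", "balance": 300},
        {"age": 30, "gender": "M", "balance": 50},
    ]
    print(age_gender_average(accounts))
```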


Original article: https://www.haomeiwen.com/subject/xccpattx.html