Elasticsearch Aggregations

Author: yangyangrenren | Published 2018-10-03 13:41

    The terms aggregation documentation explains how to read the error values in an aggregation result:

    There are two error values which can be shown on the terms aggregation. The first gives a value for the aggregation as a whole which represents the maximum potential document count for a term which did not make it into the final list of terms. This is calculated as the sum of the document count from the last term returned from each shard. For the example given above the value would be 46 (2 + 15 + 29). This means that in the worst case scenario a term which was not returned could have the 4th highest document count.
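
    The arithmetic in the quoted passage can be sketched as follows. The per-shard term lists are hypothetical (the source does not reproduce the example it refers to); they are chosen only so that the last term returned by each shard has a document count of 2, 15 and 29.

```python
# Hypothetical per-shard results (not from the source article). Each shard
# returns its own top terms, ordered by descending document count.
shard_results = {
    "shard_a": [("Product A", 25), ("Product B", 18), ("Product C", 6),
                ("Product D", 3), ("Product E", 2)],
    "shard_b": [("Product A", 30), ("Product B", 25), ("Product F", 17),
                ("Product Z", 16), ("Product G", 15)],
    "shard_c": [("Product A", 45), ("Product C", 44), ("Product Z", 36),
                ("Product G", 30), ("Product E", 29)],
}


def doc_count_error_upper_bound(results):
    """Sum the document count of the last term returned by each shard."""
    return sum(terms[-1][1] for terms in results.values() if terms)


print(doc_count_error_upper_bound(shard_results))  # 2 + 15 + 29 = 46
```

    A term that no shard returned can have at most this many documents in total, since on each shard it must rank below the last term that shard returned.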

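    Per bucket document count error

    The second error value is reported per bucket and is enabled by setting the show_term_doc_count_error parameter to true: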

    GET /_search
    {
        "aggs" : {
            "products" : {
                "terms" : {
                    "field" : "product",
                    "size" : 5,
                    "show_term_doc_count_error": true
                }
            }
        }
    }
    
    {
        ...
        "aggregations" : {
            "products" : {
                "doc_count_error_upper_bound" : 46,
                "sum_other_doc_count" : 79,
                "buckets" : [
                    {
                        "key" : "Product A",
                        "doc_count" : 100,
                        "doc_count_error_upper_bound" : 0
                    },
                    {
                        "key" : "Product Z",
                        "doc_count" : 52,
                        "doc_count_error_upper_bound" : 2
                    }
                    ...
                ]
            }
        }
    }
    

    These errors can only be calculated in this way when the terms are ordered by descending document count. When the aggregation is ordered by the terms values themselves (either ascending or descending) there is no error in the document count since if a shard does not return a particular term which appears in the results from another shard, it must not have that term in its index. When the aggregation is either sorted by a sub aggregation or in order of ascending document count, the error in the document counts cannot be determined and is given a value of -1 to indicate this.
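
    One way to use the per-bucket error on the client side is to turn it into a range for the true document count. This is a minimal sketch, assuming the true count lies between doc_count and doc_count plus the reported error; the helper name count_bounds is hypothetical, and the bucket values are copied from the response shown earlier.

```python
def count_bounds(bucket):
    """Return (low, high) bounds for a bucket's true document count.

    high is None when the error is missing or reported as -1, i.e. when
    the error cannot be determined for the chosen ordering.
    """
    count = bucket["doc_count"]
    error = bucket.get("doc_count_error_upper_bound")
    if error is None or error < 0:
        return count, None
    return count, count + error


# Buckets copied from the sample response.
buckets = [
    {"key": "Product A", "doc_count": 100, "doc_count_error_upper_bound": 0},
    {"key": "Product Z", "doc_count": 52, "doc_count_error_upper_bound": 2},
]

for bucket in buckets:
    print(bucket["key"], count_bounds(bucket))
```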

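    The terms aggregation also supports other options, such as order and script.

    Excerpted from Elastic:

    Nested buckets

    The real power of aggregations shows once we start nesting them in different ways. In the previous example we saw how to nest a metric inside a bucket, which is already quite powerful.

    But the truly exciting analytics come from nesting buckets inside other buckets. Now we want to know the distribution of car manufacturers for each color: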

    curl -X GET "localhost:9200/cars/transactions/_search" -H 'Content-Type: application/json' -d'
    {
       "size" : 0,
       "aggs": {
          "colors": {
             "terms": {
                "field": "color"
             },
             "aggs": {
                "avg_price": { 
                   "avg": {
                      "field": "price"
                   }
                },
                "make": { 
                    "terms": {
                        "field": "make" 
                    }
                }
             }
          }
       }
    }
    '
    
    

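    Something interesting happens here. First, notice that the avg_price metric from the previous example is completely unchanged and still in its original place. Each level of an aggregation can contain multiple metrics or buckets. The avg_price metric tells us the average price of the cars of each color, and it is independent of the other buckets and metrics.

    This matters for our application, because there are many related but entirely different metrics we need to collect. Aggregations let us gather all of them in a single request.

    The other important thing to note is the newly added make aggregation, which is a terms bucket (nested inside the colors terms bucket). This means it generates a (color, make) tuple for every unique combination in the dataset.

    Let's look at the response (truncated for brevity):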

    {
    ...
       "aggregations": {
          "colors": {
             "buckets": [
                {
                   "key": "red",
                   "doc_count": 4,
                   "make": { 
                      "buckets": [
                         {
                            "key": "honda", 
                            "doc_count": 3
                         },
                         {
                            "key": "bmw",
                            "doc_count": 1
                         }
                      ]
                   },
                   "avg_price": {
                      "value": 32500 
                   }
                },
    
    ...
    }
    

    The response tells us the following:

    • There are four red cars.
    • The average price of a red car is $32,500.
    • Three of them were made by Honda and one by BMW.
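
    To make the (color, make) tuples concrete, the nested buckets can be flattened on the client side. This is a minimal sketch: the function name color_make_rows is hypothetical, and the response dictionary contains only the portion of the response shown above.

```python
# The part of the response shown above, as a Python dict.
response = {
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "make": {
                        "buckets": [
                            {"key": "honda", "doc_count": 3},
                            {"key": "bmw", "doc_count": 1},
                        ]
                    },
                    "avg_price": {"value": 32500},
                }
            ]
        }
    }
}


def color_make_rows(resp):
    """Flatten the nested colors -> make buckets into (color, make, count) rows."""
    rows = []
    for color in resp["aggregations"]["colors"]["buckets"]:
        for make in color["make"]["buckets"]:
            rows.append((color["key"], make["key"], make["doc_count"]))
    return rows


print(color_make_rows(response))
# [('red', 'honda', 3), ('red', 'bmw', 1)]
```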

    The request above returns only ten buckets for the terms aggregation, which is the default:

    "terms": {
       "field": "color"
    }
    

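    To return more buckets, add "size": 1000 as shown below. size is the number of buckets you want returned and must fit within the integer range. Note: these experiments were run on Elasticsearch 6.3.1.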

    "terms" : { "field" : "color", "size": 1000 }

    A basic terms aggregation, here grouping documents by the state field, looks like this:

    {
      "size": 0,
      "aggs": {
        "group_by_state": {
          "terms": {
            "field": "state"
          }
        }
      }
    }
    

    This query is similar to GROUP BY in a relational database:

    SELECT state, COUNT(*) FROM customer GROUP BY state ORDER BY COUNT(*) DESC
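
    The same grouping can be imitated in plain Python to show what a terms aggregation with a size limit computes. This is only an in-memory sketch, not how Elasticsearch evaluates the query across shards; the customers list and the function name group_by_count are hypothetical.

```python
from collections import Counter


def group_by_count(docs, field, size=10):
    """Count documents per distinct field value and keep the top `size`
    groups by descending count, like GROUP BY ... ORDER BY COUNT(*) DESC."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)


# Hypothetical documents, for illustration only.
customers = [
    {"state": "TX"}, {"state": "CA"}, {"state": "TX"},
    {"state": "NY"}, {"state": "TX"}, {"state": "CA"},
]

print(group_by_count(customers, "state", size=2))
# [('TX', 3), ('CA', 2)]
```

    With size=2 the NY group is dropped, which mirrors how a terms aggregation with the default size of ten omits the less frequent terms.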
    

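    The following is excerpted from [Elasticsearch 2.20 Getting Started: Aggregation Operations](https://my.oschina.net/secisland/blog/614127).
    It shows how to bucket documents by age group (20-29, 30-39, 40-49) and then by gender, to obtain the average account balance for each gender within each age group.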

    {
      "size": 0,
      "aggs": {
        "group_by_age": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "from": 20,
                "to": 30
              },
              {
                "from": 30,
                "to": 40
              },
              {
                "from": 40,
                "to": 50
              }
            ]
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender"
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
      }
    }
    
    {
      "took" : 15,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 5,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "aggregations" : {
        "group_by_age" : {
          "buckets" : [ {
            "key" : "20.0-30.0",
            "from" : 20.0,
            "from_as_string" : "20.0",
            "to" : 30.0,
            "to_as_string" : "30.0",
            "doc_count" : 1,
            "group_by_gender" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [ {
                "key" : "woman",
                "doc_count" : 1,
                "average_balance" : {
                  "value" : 87.0
                }
              } ]
            }
          }, {
            "key" : "30.0-40.0",
            "from" : 30.0,
            "from_as_string" : "30.0",
            "to" : 40.0,
            "to_as_string" : "40.0",
            "doc_count" : 3,
            "group_by_gender" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [ {
                "key" : "man",
                "doc_count" : 2,
                "average_balance" : {
                  "value" : 93.0
                }
              }, {
                "key" : "woman",
                "doc_count" : 1,
                "average_balance" : {
                  "value" : 99.0
                }
              } ]
            }
          }, {
            "key" : "40.0-50.0",
            "from" : 40.0,
            "from_as_string" : "40.0",
            "to" : 50.0,
            "to_as_string" : "50.0",
            "doc_count" : 1,
            "group_by_gender" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [ {
                "key" : "woman",
                "doc_count" : 1,
                "average_balance" : {
                  "value" : 78.0
                }
              } ]
            }
          } ]
        }
      }
    }
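
    The nesting of range, terms and avg can be imitated locally to see how each document ends up in one (age range, gender) group. This is a minimal sketch: the accounts below are hypothetical (the source does not show the indexed documents) and were chosen only to be consistent with the averages in the response above. Each range includes its from value and excludes its to value, so an age of 30 falls into the 30-40 bucket.

```python
from collections import defaultdict

# Same ranges as in the request: "from" is inclusive, "to" is exclusive.
RANGES = [(20, 30), (30, 40), (40, 50)]


def average_balance_by_age_and_gender(accounts, ranges=RANGES):
    """Group accounts by age range, then gender, and average the balance."""
    totals = defaultdict(lambda: [0.0, 0])  # (range, gender) -> [sum, count]
    for account in accounts:
        for low, high in ranges:
            if low <= account["age"] < high:
                entry = totals[(f"{low}-{high}", account["gender"])]
                entry[0] += account["balance"]
                entry[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Hypothetical accounts, for illustration only.
accounts = [
    {"age": 25, "gender": "woman", "balance": 87},
    {"age": 30, "gender": "man", "balance": 90},
    {"age": 35, "gender": "man", "balance": 96},
    {"age": 38, "gender": "woman", "balance": 99},
    {"age": 45, "gender": "woman", "balance": 78},
]

for key, avg in sorted(average_balance_by_age_and_gender(accounts).items()):
    print(key, avg)
```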
    
