Aggregation Queries (Grouped Statistics) with Elasticsearch


Author: 觉释 | Published: 2020-09-21 08:48

A first attempt at grouping documents by the tags field with a terms aggregation:

    GET test_index/_search
    {
        "size": 0,           
        "aggs": {
            "group_by_tags": { 
                "terms": {
                    "field": "tags"
                }
            }
        }
    }
    

This request fails with the following error:

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "illegal_argument_exception",
            "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        ],
        "type" : "search_phase_execution_exception",
        "reason" : "all shards failed",
        "phase" : "query",
        "grouped" : true,
        "failed_shards" : [
          {
            "shard" : 0,
            "index" : "test_index",
            "node" : "lt3frTKnQ7aUqcNa4CT_ww",
            "reason" : {
              "type" : "illegal_argument_exception",
              "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
            }
          }
        ],
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        }
      },
      "status" : 400
    }
    
    

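Error message: Set fielddata=true on [xxxx] ......

Analysis: by default, Elasticsearch disables fielddata on fields of type text.

A text field is analyzed (split into tokens) at index time, and only those tokens are written to the inverted index. Text fields also have no doc values, the on-disk per-document structure that aggregations and sorting normally read.

Aggregating on a text field therefore requires fielddata=true in the mapping, which makes Elasticsearch uninvert the inverted index and load the resulting per-document terms into heap memory. Note that this structure holds the analyzed tokens, not the original field values.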
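
To make "uninverting the inverted index" concrete, here is a toy Python sketch. It is not Elasticsearch code: the lowercase-and-split analyzer, the function names, and the two sample documents are all invented for illustration. It builds a term-to-documents index and then rebuilds the document-to-terms view that an aggregation needs.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            inverted[term].add(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


def uninvert(inverted):
    """Rebuild a doc id -> sorted terms view from the term -> doc ids index."""
    per_doc = defaultdict(set)
    for term, ids in inverted.items():
        for doc_id in ids:
            per_doc[doc_id].add(term)
    return {doc_id: sorted(terms) for doc_id, terms in per_doc.items()}


# Made-up sample documents: doc id -> value of a text field.
docs = {1: "Java Web", 2: "java search"}
inverted = build_inverted_index(docs)
fielddata = uninvert(inverted)

print(inverted)
print(fielddata)
```

The per-document view contains only lowercase tokens such as "java" and "web"; the original strings "Java Web" and "java search" cannot be recovered from it, which is why a fielddata-based aggregation groups by token.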

Solutions

Solution 1: enable fielddata on the text field

Set fielddata to true on the text field you want to group by (here, tags):

    PUT test_index/_mapping/
    {
        "properties": {
            "tags": {
                "type": "text",
                "fielddata": true
            }
        }
    }
    

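Solution 2: aggregate on the keyword sub-field

Enabling fielddata can consume a large amount of heap memory. If tags was created by dynamic mapping, Elasticsearch by default also indexed it as a keyword sub-field named tags.keyword, and the aggregation can target that sub-field instead: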

    GET  test_index/_search
    {
       "size": 0,
       "aggs": {
           "group_by_tags": {
               "terms": {
                   "field": "tags.keyword" // use the keyword sub-field of the text field
               }
           }
       }
    }
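
The tags.keyword sub-field exists only if the mapping defines it. Dynamic mapping adds it to string fields by default, but with an explicit mapping it has to be declared. The Python sketch below only builds and prints a create-index request body; it assumes the typeless mapping format of Elasticsearch 7.x and later, the index name test_index_v2 and the helper names are hypothetical, and sending the request is left out.

```python
import json


def text_with_keyword(ignore_above=256):
    """Mapping for a text field that also has a keyword sub-field (multi-field)."""
    return {
        "type": "text",
        "fields": {
            # 256 matches what dynamic mapping generates by default.
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


def create_index_body():
    """Request body that maps tags as text plus tags.keyword."""
    return {"mappings": {"properties": {"tags": text_with_keyword()}}}


body = create_index_body()

# Printed in Console style so it can be pasted into Kibana Dev Tools.
# test_index_v2 is a hypothetical index name.
print("PUT test_index_v2")
print(json.dumps(body, indent=2))
```

Values longer than ignore_above characters are not indexed in the keyword sub-field, so they would not appear in any bucket of an aggregation on tags.keyword.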
    
    

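Search first, then aggregate

(1) Among the documents whose name contains "zhangsan", count the documents for each tag. The aggregation runs only over the documents matched by the query. Request syntax: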

    GET test_index/_search
    {
        "query": {
            "match": { "name": "zhangsan" }
        }, 
        "aggs": {
            "group_by_tags": {  // user-defined name for this aggregation; the terms below use the keyword sub-field
                "terms": { "field": "tags.keyword" }
            }
        }
    }
    
    

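Extension: aggregating on fielddata versus keyword

With fielddata enabled on a text field, the aggregation runs over every analyzed token of that field separately, and the resulting buckets are usually not what we want.

The keyword sub-field is not analyzed, so each whole value forms one bucket, which is usually more useful.

Recommendation: aggregate on the keyword sub-field of a text field.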
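
The difference can be imitated in a few lines of Python. This is a toy model rather than Elasticsearch itself: the lowercase-and-split analyzer, the function names, and the three sample values are assumptions made for illustration. Each function returns a document count per bucket key, the way a terms aggregation does.

```python
from collections import Counter


def analyze(text):
    """Toy analyzer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def terms_on_tokens(values):
    """Bucket by analyzed token, as a fielddata-backed text field would."""
    counts = Counter()
    for value in values:
        # A document counts once per distinct token it contains.
        counts.update(set(analyze(value)))
    return counts


def terms_on_keyword(values):
    """Bucket by the whole, unanalyzed value, as a keyword field would."""
    return Counter(values)


# Made-up tag values, one per document.
tags = ["machine learning", "deep learning", "machine learning"]

print(terms_on_tokens(tags))
print(terms_on_keyword(tags))
```

With these sample values the token-based version produces buckets for "learning", "machine", and "deep", while the keyword-based version produces one bucket per full tag, "machine learning" and "deep learning".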

Group first, then compute statistics per group

(1) Group by tags, then compute the average book price for each tag. Request syntax:

    GET test_index/_search
    {
        "size": 0, 
        "aggs": {
            "group_by_tags": {
                "terms": { "field": "tags.keyword" },
                "aggs": {
                    "avg_price": {
                        "avg": { "field": "price" }
                    }
                }
            }
        }
    }
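
The sub-aggregation result is nested inside each bucket of the parent aggregation. As a hedged illustration, the Python sketch below reads such a response; the sample_response values are made up and trimmed to the relevant keys, and avg_price_by_tag is a hypothetical helper name.

```python
def avg_price_by_tag(response):
    """Return {tag: (doc_count, average price)} from a terms + avg response."""
    buckets = response["aggregations"]["group_by_tags"]["buckets"]
    return {
        bucket["key"]: (bucket["doc_count"], bucket["avg_price"]["value"])
        for bucket in buckets
    }


# Made-up response, trimmed to the parts the helper reads.
sample_response = {
    "hits": {"hits": []},  # empty because the request used "size": 0
    "aggregations": {
        "group_by_tags": {
            "buckets": [
                {"key": "java", "doc_count": 2, "avg_price": {"value": 45.0}},
                {"key": "search", "doc_count": 1, "avg_price": {"value": 80.0}},
            ]
        }
    },
}

for tag, (doc_count, avg_price) in avg_price_by_tag(sample_response).items():
    print(tag, doc_count, avg_price)
```

The keys group_by_tags and avg_price in the response are the aggregation names chosen in the request, so renaming them there changes the paths used here.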
    

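Group, sub-group within each group, then compute statistics and sort

(1) Group by price range, sub-group by tags within each range, then compute the average price of each tags group. Query syntax: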

    GET test_index/_search
    {
        "size": 0, 
        "aggs": {
            "group_by_price": {
                "range": {
                    "field": "price", 
                    "ranges": [
                        { "from": 0,   "to": 100 },
                        { "from": 100, "to": 150 }
                    ]
                }, 
                "aggs": {
                    "group_by_tags": {
                        "terms": { "field": "tags.keyword" }, 
                        "aggs": {
                            "avg_price": {
                                "avg": { "field": "price" }
                            }
                        }
                    }
                }
            }
        }
    }
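
The request above groups and averages but does not sort the tag buckets. One way to sort them is the order option of the terms aggregation, which can refer to a metric sub-aggregation by name. The Python sketch below only builds and prints such a request body; the function name and the choice of descending order are assumptions, and the request is not sent anywhere.

```python
import json


def nested_agg_sorted_by_avg(direction="desc"):
    """Range -> terms -> avg, with tag buckets ordered by their average price."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {
        "size": 0,
        "aggs": {
            "group_by_price": {
                "range": {
                    "field": "price",
                    "ranges": [
                        {"from": 0, "to": 100},
                        {"from": 100, "to": 150},
                    ],
                },
                "aggs": {
                    "group_by_tags": {
                        "terms": {
                            "field": "tags.keyword",
                            # Sort buckets by the avg_price sub-aggregation.
                            "order": {"avg_price": direction},
                        },
                        "aggs": {"avg_price": {"avg": {"field": "price"}}},
                    }
                },
            }
        },
    }


body = nested_agg_sorted_by_avg()

# Printed in Console style so it can be pasted into Kibana Dev Tools.
print("GET test_index/_search")
print(json.dumps(body, indent=2))
```

The name used in order must match the name of the sub-aggregation (avg_price here). Without order, terms buckets are sorted by document count in descending order.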
    
    

Article link: https://www.haomeiwen.com/subject/mpmdyktx.html