Deduplicating an Elasticsearch field and saving the result to a file

Author: 幸运猪x | Published: 2018-06-11 18:05

Background

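The task: query the ip field from an ES cluster, deduplicate it, and save the clean list of IPs to a file.
Deduplicating on a single field is essentially a wordcount problem: group the documents by the field's value and count each group. The group keys are the deduplicated values.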
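To make the wordcount analogy concrete, here is a minimal Python sketch that does not involve ES at all: counting how often each value occurs yields the distinct values as the keys of the count table. The sample list and the name `dedup_with_counts` are illustrative assumptions, not part of the original task.

```python
from collections import Counter


def dedup_with_counts(values):
    """Count occurrences of each value; the keys are the deduplicated values."""
    return Counter(values)


# Hypothetical sample: three records, two distinct IPs
counts = dedup_with_counts(["192.168.100.1", "192.168.100.1", "192.168.100.2"])
unique_ips = sorted(counts)  # distinct values only, counts discarded
```

The ES terms aggregation used later plays the role of `Counter` here: each bucket key is a distinct value and `doc_count` is its count.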

1. Generate sample data with Python

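# -*- coding: utf-8 -*-
# Build the list of sample IPs: 192.168.100.1 to 192.168.100.49
ip = []
for i in range(1, 50):
    # str() works on Python 2 and 3; bytes(i) only yields the digits on Python 2
    ip.append("192.168.100." + str(i))

# Write the sample data in bulk format: one action line plus one document line per record
with open("data.json", "w") as f:
    i = 1  # first document id for the current IP
    for ipp in ip:
        # 100 documents per IP, with consecutive ids
        for j in range(i, i + 100):
            line = '{"index":{"_index":"data","_type":"log","_id":'+str(j)+'}}\n{"color":"green","state":"open","address":"'+ipp+'","time":"2018-06-11"}\n'
            f.write(line)
        i = i + 100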

Part of the sample data:

{"index":{"_index":"data","_type":"log","_id":1}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
{"index":{"_index":"data","_type":"log","_id":2}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
{"index":{"_index":"data","_type":"log","_id":3}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
{"index":{"_index":"data","_type":"log","_id":4}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
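Each record in the bulk format takes two lines: an action line naming the index, type and id, followed by the document itself. As a sketch of the same idea without hand-assembled JSON strings, the pair could be built with `json.dumps`. The helper name `bulk_lines` is hypothetical; the field values mirror the sample data.

```python
import json


def bulk_lines(doc_id, address):
    """Return the action line and the document line for one record."""
    action = {"index": {"_index": "data", "_type": "log", "_id": doc_id}}
    doc = {"color": "green", "state": "open", "address": address, "time": "2018-06-11"}
    # Compact separators give the same space-free layout as the sample file
    compact = (",", ":")
    return (json.dumps(action, separators=compact) + "\n"
            + json.dumps(doc, separators=compact) + "\n")


pair = bulk_lines(1, "192.168.100.1")
```

Serializing through `json.dumps` means quoting and escaping are handled by the library, which matters once field values can contain quotes or backslashes.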

2. Bulk-import the sample data into ES

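# Import the data. Note: "-PUT" is not a valid way to set the HTTP method
# (curl reads it as -P with the argument "UT"); the method is set with -X.
# The Content-Type header is required from ES 6.0 on.
curl -X POST localhost:9200/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @data.json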
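The same upload could be done from Python with only the standard library. This is a sketch under the assumption that ES listens on `localhost:9200` as in the curl command; the function names `build_bulk_request` and `send_bulk` are hypothetical.

```python
import urllib.request


def build_bulk_request(payload, url="http://localhost:9200/_bulk"):
    """Build a POST request carrying newline-delimited JSON for the bulk API."""
    return urllib.request.Request(
        url,
        data=payload,  # bytes; the last line must end with a newline
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk(path="data.json"):
    """Read the bulk file and send it; needs a reachable ES node."""
    with open(path, "rb") as f:
        request = build_bulk_request(f.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `send_bulk()` after generating `data.json` returns the raw JSON response, whose `errors` field indicates whether any item failed.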

The sample data is now in ES, which can be checked by fetching one document:

curl -X GET localhost:9200/data/log/101
###
{"_index":"data","_type":"log","_id":"101","_version":1,"found":true,"_source":{"color":"green","state":"open","address":"192.168.100.2","time":"2018-06-11"}}
###

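3. Deduplicate in ES and save the result to a file

The aggregation result can be processed in two ways:

  • Use the jq tool to save the result as a CSV file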
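# wordcount and save results as csv
# The terms aggregation groups the documents by the "address" field.
# "size": 0 inside "terms" returns all buckets; this works on ES 1.x/2.x only.
# From ES 5.0 on, set an explicit size at least as large as the number of distinct values.
# Comments must stay outside the request body: "#" is not valid inside JSON.
# ">" overwrites result.csv, so rerunning the command does not duplicate rows.
curl -s -X GET 'http://localhost:9200/data/log/_search' -H 'Content-Type: application/json' -d '
{
    "size": 0,
    "aggs": {
        "group_by_state": {
            "terms": {
                "field": "address",
                "size": 0
            }
        }
    }
}' | jq -r '.aggregations|.group_by_state|.buckets[]|[.key, .doc_count]|@csv' > result.csv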

Sample result

"192.168.100.1",100
"192.168.100.10",100
"192.168.100.11",100
etc ...
  • Use a grep regular expression to parse the result
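# wordcount and save results as txt
# -P needs a grep built with PCRE support (e.g. GNU grep); \K drops the matched
# prefix so that only the value after "key" is printed.
# ">" overwrites the file, so rerunning the command does not duplicate lines.
curl -s -X GET 'http://localhost:9200/data/log/_search' -H 'Content-Type: application/json' -d '
{
    "size": 0,
    "aggs": {
        "group_by_state": {
            "terms": {
                "field": "address",
                "size": 0
            }
        }
    }
}' | grep -Po 'key[" :]+\K[^"]+' > result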

Sample result

192.168.100.1
192.168.100.10
192.168.100.11
192.168.100.12
192.168.100.13
192.168.100.14
192.168.100.15
192.168.100.16
192.168.100.17
192.168.100.18
192.168.100.19
etc ...
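Both approaches read the same part of the response: the `buckets` array under `aggregations.group_by_state`, where each bucket carries a `key` (the field value) and a `doc_count`. As an alternative to jq and grep, a Python sketch could do the extraction. The response literal below is a hand-written, abbreviated stand-in for real ES output, and the function names are hypothetical.

```python
import csv
import io
import json


def buckets_from_response(text, agg_name="group_by_state"):
    """Return (key, doc_count) pairs from a terms-aggregation response."""
    body = json.loads(text)
    buckets = body["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


def to_csv(pairs):
    """Render the pairs as CSV text, quoting string keys as jq's @csv does."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(pairs)
    return out.getvalue()


# Abbreviated, hand-written stand-in for a real response
sample = ('{"aggregations":{"group_by_state":{"buckets":['
          '{"key":"192.168.100.1","doc_count":100},'
          '{"key":"192.168.100.10","doc_count":100}]}}}')
pairs = buckets_from_response(sample)
csv_text = to_csv(pairs)                         # rows of "key",count
plain_text = "\n".join(key for key, _ in pairs)  # keys only, as in the grep variant
```

Parsing the JSON structure instead of matching text avoids false hits if some other field in the response happens to contain the string `key`.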

Article link: https://www.haomeiwen.com/subject/rubheftx.html