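Background
The task: query the ip field from an ES cluster, deduplicate it, and save the bare IP addresses to a file.
Deduplicating on a single field is essentially a word count: group by the field's value, then keep the distinct keys.
Problem
1. First, generate sample data with Python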
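To make the word-count analogy concrete, here is a minimal Python sketch on an in-memory list. The sample values and variable names are illustrative assumptions, not data from the cluster; the point is only that counting occurrences per value yields the distinct values as its keys.

```python
from collections import Counter

# Hypothetical values of the ip field, as they might come back from a query
addresses = [
    "192.168.100.1",
    "192.168.100.2",
    "192.168.100.1",
    "192.168.100.3",
    "192.168.100.2",
]

# Word count: map each distinct value to its number of occurrences
counts = Counter(addresses)

# The keys of the count are exactly the deduplicated values
unique_ips = sorted(counts)

print(unique_ips)
print(counts["192.168.100.1"])
```

The terms aggregation used later in this post plays the role of `Counter` here: each bucket key is one distinct value, and `doc_count` is its count.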
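# -*- coding: utf-8 -*-
# Build the list of IP addresses: 192.168.100.1 through 192.168.100.49
ip = []
for i in range(1, 50):
    # str() works on Python 2 and 3; bytes(i) only yields "1", "2", ... on Python 2
    ip.append("192.168.100." + str(i))
# Write the sample data to a file in the bulk API format
with open("data.json", "w") as f:
    i = 1
    for ipp in ip:
        # 100 documents per IP, with consecutive ids
        for j in range(i, i + 100):
            line = '{"index":{"_index":"data","_type":"log","_id":'+str(j)+'}}\n{"color":"green","state":"open","address":"'+ipp+'","time":"2018-06-11"}\n'
            f.write(line)
        i = i + 100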
Part of the sample data:
{"index":{"_index":"data","_type":"log","_id":1}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
{"index":{"_index":"data","_type":"log","_id":2}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
{"index":{"_index":"data","_type":"log","_id":3}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
{"index":{"_index":"data","_type":"log","_id":4}}
{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}
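The bulk format alternates one action line with one document line, so a malformed file is easy to catch before importing. The following is a rough sketch of such a check; the helper name `count_bulk_docs` is hypothetical, and the check on the `address` field is an assumption specific to this sample data.

```python
import json

def count_bulk_docs(lines):
    """Check that lines alternate action/document and return the document count."""
    if len(lines) % 2 != 0:
        raise ValueError("bulk data must contain an even number of lines")
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        source = json.loads(source_line)
        # Every odd line should be an index action
        if "index" not in action:
            raise ValueError("expected an index action, got: " + action_line)
        # Every even line should be a document carrying the field we aggregate on
        if "address" not in source:
            raise ValueError("document has no address field: " + source_line)
    return len(lines) // 2

# The first two documents from the sample above
sample = [
    '{"index":{"_index":"data","_type":"log","_id":1}}',
    '{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}',
    '{"index":{"_index":"data","_type":"log","_id":2}}',
    '{"color":"green","state":"open","address":"192.168.100.1","time":"2018-06-11"}',
]

print(count_bulk_docs(sample))
```

To check the generated file, one could pass `open("data.json").read().splitlines()` instead of `sample`.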
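2. Bulk-import the sample data into ES
# import the data (on ES 6.0 and later, also pass -H 'Content-Type: application/x-ndjson')
curl -X POST localhost:9200/_bulk --data-binary @data.json
ES now contains the sample data: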
curl -X GET localhost:9200/data/log/101
###
{"_index":"data","_type":"log","_id":"101","_version":1,"found":true,"_source":{"color":"green","state":"open","address":"192.168.100.2","time":"2018-06-11"}}
###
3. Deduplicate in ES and save the result to a file
There are two ways to process the result:
- Use jq to save the result as a CSV file
# word count, saving the results as CSV
# "field": "address" selects the field to aggregate on
# "size": 0 inside "terms" returns all buckets (works on ES 2.x; removed in ES 5.0)
curl -X GET 'http://localhost:9200/data/log/_search' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "address",
        "size": 0
      }
    }
  }
}' | jq -r '.aggregations|.group_by_state|.buckets[]|[.key, .doc_count]|@csv' >> result.csv
Sample result:
"192.168.100.1",100
"192.168.100.10",100
"192.168.100.11",100
etc ...
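Where jq is not available, the same transformation can be approximated in Python. This is only a sketch: the `response` dict is a hand-written stand-in for the parsed search response (keeping just the fields used), and `buckets_to_csv` is a hypothetical helper name.

```python
import csv
import io

def buckets_to_csv(response):
    """Render terms-aggregation buckets as CSV rows of key, doc_count."""
    buckets = response["aggregations"]["group_by_state"]["buckets"]
    out = io.StringIO()
    # QUOTE_NONNUMERIC quotes the string key and leaves the integer count bare,
    # which matches the shape of the jq @csv output shown above
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for bucket in buckets:
        writer.writerow([bucket["key"], bucket["doc_count"]])
    return out.getvalue()

# Hand-written stand-in for a parsed search response
response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {"key": "192.168.100.1", "doc_count": 100},
                {"key": "192.168.100.10", "doc_count": 100},
            ]
        }
    }
}

csv_text = buckets_to_csv(response)
print(csv_text, end="")
```

In practice the dict would come from `json.loads` applied to the curl output, and `csv_text` would be written to `result.csv`.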
- Parse the result with a grep regular expression
# wordcount and save results as txt
curl -X GET 'http://localhost:9200/data/log/_search' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "address",
        "size": 0
      }
    }
  }
}' | grep -Po 'key[" :]+\K[^"]+' >> result
Sample result:
192.168.100.1
192.168.100.10
192.168.100.11
192.168.100.12
192.168.100.13
192.168.100.14
192.168.100.15
192.168.100.16
192.168.100.17
192.168.100.18
192.168.100.19
etc ...
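The grep pattern reads as follows: `key` matches the literal field name, `[" :]+` skips the quotes, colon, and any spaces after it, `\K` discards everything matched so far, and `[^"]+` captures the value up to its closing quote. Python's `re` module has no `\K`, so the sketch below uses a capture group to the same effect. The `raw` string is a shortened, hand-written stand-in for the response body, and `extract_keys` is a hypothetical helper name.

```python
import re

# Same pattern as the grep call, with a capture group standing in for \K
KEY_PATTERN = re.compile(r'key[" :]+([^"]+)')

def extract_keys(raw_json_text):
    """Return every bucket key found in the raw response text."""
    return KEY_PATTERN.findall(raw_json_text)

# Shortened stand-in for the raw response body
raw = (
    '{"buckets":[{"key":"192.168.100.1","doc_count":100},'
    '{"key":"192.168.100.10","doc_count":100}]}'
)

ips = extract_keys(raw)
print("\n".join(ips))
```

Like the grep version, this relies on the keys being quoted strings; parsing the response with `json.loads` is the sturdier option when the keys might be numbers.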