EFK geo-ip configuration in practice (3): querying latitude/longitude data and formatting the output


Author: 宋奕Ekis | Published: 2018-11-20 16:47

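    With the earlier work done, the data for our data map is now formatted and recorded, and the project has reached its final stage. What remains is an interface that formats the data and returns it: an Elasticsearch search request retrieves the data, and a PHP interface then formats it for output.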

    I. Search query

    1. Search conditions (time, space)
    2. Output (number of users)

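    For example: the number of users with activity at each latitude/longitude coordinate, within one hour, within China.

    From this requirement we get the corresponding Elasticsearch search request, shown below. Note that the bounding box in this sample spans longitude -34.453125 to 34.453125 over the full latitude range, which does not include China (the sample results further down are all in Europe); substitute the corners of the region you actually need.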

    {
    "size": 0,
    "aggs": {
        "filter_agg": {
            "filter": {
                "geo_bounding_box": {
                    "location": {
                        "top_left": {
                            "lat": 90,
                            "lon": -34.453125
                        },
                        "bottom_right": {
                            "lat": -90,
                            "lon": 34.453125
                        }
                    }
                }
            },
            "aggs": {
                "2": {
                    "geohash_grid": {
                        "field": "location",
                        "precision": 2
                    },
                    "aggs": {
                        "3": {
                            "geo_centroid": {
                                "field": "location"
                            }
                        }
                    }
                }
            }
        }
    },
    "stored_fields": [
        "*"
    ],
    "docvalue_fields": [
        "@timestamp"
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1542692193461,
                            "lte": 1542695793461,
                            "format": "epoch_millis"
                        }
                    }
                }
            ]
        }
    }
    }
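
The two timestamps in the range clause are epoch milliseconds exactly one hour apart (1542695793461 - 1542692193461 = 3,600,000 ms). As a minimal sketch of how such a window might be built, assuming PHP 7+ on a 64-bit platform; the helper name `timeWindowMillis` is hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper (not from the original article): returns [start, end]
// in epoch milliseconds for a window of $minutes that ends at $endMs.
function timeWindowMillis(int $endMs, int $minutes): array
{
    $startMs = $endMs - $minutes * 60 * 1000;
    return [$startMs, $endMs];
}

// Reproduce the sample query's range: one hour ending at 1542695793461.
list($gte, $lte) = timeWindowMillis(1542695793461, 60);

// Range clause in the same shape as the one used in the query above.
$range = [
    'range' => [
        '@timestamp' => [
            'gte' => $gte,
            'lte' => $lte,
            'format' => 'epoch_millis',
        ],
    ],
];

echo json_encode($range), PHP_EOL;
```

Keeping the values as integers matters here: with `format` set to `epoch_millis`, the bounds should serialize as plain integer literals.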
    
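    1. size=0 means no matching documents are returned in hits; only the aggregation results are needed.
    2. query is the main search condition. Its required condition is the time range, i.e. it selects all data within that period.
    3. The filter inside aggs plays the role of a WHERE clause in SQL. Its geo_bounding_box defines the geographic range: the rectangle bounded by a top-left and a bottom-right latitude/longitude point.
    4. Nested aggs run on the result of the enclosing aggregation, narrowing or grouping it further.
    5. geohash_grid computes the geohash of every point at the precision you define and groups nearby points into the same bucket. field is the geo_point field being aggregated; precision is the geohash length (1 to 12), not a distance in km. A precision of 2 gives cells of roughly 1,250 km × 625 km.
    6. Finally, geo_centroid returns the centroid of the points in each bucket, under the key location.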
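
To make the geohash bucketing concrete, here is a minimal sketch of the standard geohash encoding, assuming PHP 7+. It is an illustration only, not Elasticsearch's internal implementation; the function names `geohashEncode` and `geohashCellDegrees` are hypothetical. The coordinates are two bucket centroids taken from the sample result in this article.

```php
<?php
// Illustrative geohash encoder: a sketch of the standard algorithm,
// not Elasticsearch's internal implementation.
function geohashEncode(float $lat, float $lon, int $precision): string
{
    $alphabet = '0123456789bcdefghjkmnpqrstuvwxyz';
    $latRange = [-90.0, 90.0];
    $lonRange = [-180.0, 180.0];
    $hash = '';
    $bits = 0;       // bits collected for the current character
    $value = 0;      // 5-bit value of the current character
    $useLon = true;  // bits alternate, starting with longitude

    while (strlen($hash) < $precision) {
        if ($useLon) {
            $mid = ($lonRange[0] + $lonRange[1]) / 2;
            $bit = $lon >= $mid ? 1 : 0;
            $lonRange[$bit === 1 ? 0 : 1] = $mid; // halve the longitude interval
        } else {
            $mid = ($latRange[0] + $latRange[1]) / 2;
            $bit = $lat >= $mid ? 1 : 0;
            $latRange[$bit === 1 ? 0 : 1] = $mid; // halve the latitude interval
        }
        $value = ($value << 1) | $bit;
        $useLon = !$useLon;
        if (++$bits === 5) {
            $hash .= $alphabet[$value];
            $bits = 0;
            $value = 0;
        }
    }
    return $hash;
}

// Width (longitude) and height (latitude) of one cell, in degrees.
function geohashCellDegrees(int $precision): array
{
    $totalBits = 5 * $precision;
    $lonBits = intdiv($totalBits + 1, 2); // longitude gets the extra bit
    $latBits = intdiv($totalBits, 2);
    return [360.0 / (2 ** $lonBits), 180.0 / (2 ** $latBits)];
}

echo geohashEncode(48.58949514372008, 7.584022147181843, 2), PHP_EOL;  // u0
echo geohashEncode(54.420127907268785, -3.120888938036495, 2), PHP_EOL; // gc
list($lonDeg, $latDeg) = geohashCellDegrees(2);
printf("%.3f x %.3f degrees\n", $lonDeg, $latDeg); // 11.250 x 5.625 degrees
```

A precision-2 cell is therefore 11.25° of longitude by 5.625° of latitude, which is where the figure of roughly 1,250 km × 625 km (at the equator) comes from.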

    The result has the following format:

    {
    "took": 428,
    "timed_out": false,
    "_shards": {
        "total": 131,
        "successful": 126,
        "skipped": 121,
        "failed": 5,
        "failures": [
            {
                "shard": 0,
                "index": "elastalert_status_status",
                "node": "w10b9zEBRpuUEQsWvNqEig",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "failed to find geo_point field [location]",
                    "index_uuid": "Dm4dpUtTTHitYN-TZFC-1g",
                    "index": "elastalert_status_status"
                }
            }
        ]
    },
    "hits": {
        "total": 360942,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "filter_agg": {
            "2": {
                "buckets": [
                    {
                        "3": {
                            "location": {
                                "lat": 48.58949514372008,
                                "lon": 7.584022147181843
                            },
                            "count": 252
                        },
                        "key": "u0",
                        "doc_count": 252
                    },
                    {
                        "3": {
                            "location": {
                                "lat": 54.420127907268785,
                                "lon": -3.120888938036495
                            },
                            "count": 181
                        },
                        "key": "gc",
                        "doc_count": 181
                    },
                    {
                        "3": {
                            "location": {
                                "lat": 42.32862451614172,
                                "lon": 3.7518564593602917
                            },
                            "count": 67
                        },
                        "key": "sp",
                        "doc_count": 67
                    },
                    {
                        "3": {
                            "location": {
                                "lat": 45.40799999143928,
                                "lon": 11.88589995726943
                            },
                            "count": 21
                        },
                        "key": "u2",
                        "doc_count": 21
                    },
                    {
                        "3": {
                            "location": {
                                "lat": 46.65579996071756,
                                "lon": 32.61779992841184
                            },
                            "count": 1
                        },
                        "key": "u8",
                        "doc_count": 1
                    }
                ]
            },
            "doc_count": 522
        }
    }
    }
    
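    1. aggregations holds the data we ultimately need.
    2. location is the centroid (latitude/longitude) of a bucket, and the count next to it is the number of documents in that geohash cell, which we take as the number of users. At precision 2 a cell is roughly 1,250 km × 625 km, not 2 km × 2 km. If one user can produce several documents, this counts activity records rather than distinct users.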
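
As a quick sanity check on this structure, the bucket counts in the sample response add up to the doc_count of the enclosing filter aggregation (252 + 181 + 67 + 21 + 1 = 522). A minimal sketch, with the bucket values copied from the sample above and the centroids omitted for brevity:

```php
<?php
// Buckets copied from the sample response (centroid sub-aggregation omitted).
$aggregations = [
    'filter_agg' => [
        '2' => [
            'buckets' => [
                ['key' => 'u0', 'doc_count' => 252],
                ['key' => 'gc', 'doc_count' => 181],
                ['key' => 'sp', 'doc_count' => 67],
                ['key' => 'u2', 'doc_count' => 21],
                ['key' => 'u8', 'doc_count' => 1],
            ],
        ],
        'doc_count' => 522,
    ],
];

$buckets = $aggregations['filter_agg']['2']['buckets'];

// Sum the per-cell counts and compare with the filter's own total.
$total = array_sum(array_column($buckets, 'doc_count'));

printf(
    "buckets: %d, sum of doc_count: %d, filter doc_count: %d\n",
    count($buckets),
    $total,
    $aggregations['filter_agg']['doc_count']
);
```

The totals match in this sample because every filtered document falls into exactly one of the returned cells; this need not hold in general, for example if the number of buckets were truncated.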

    At this point we know how to write a search request in Elasticsearch that retrieves what we want.
    Next, we need to write the PHP interface that formats the data for output.

    II. Writing the interface

    1. Use the Elasticsearch PHP API
    2. Define the input parameters: time range and spatial range
    3. Define the output data structure and format the data for output

    The code is as follows, with comments:

    <?php
    /**
     * Created by PhpStorm.
     * User: ekisong
     * Date: 2018/11/13
     * Time: 15:55
     */
    require 'vendor/autoload.php';
    ini_set('display_errors','on');
    error_reporting(E_ALL);
    
    use Elasticsearch\ClientBuilder;
    
    //Create the Elasticsearch client used for searching
    $client = ClientBuilder::create()->setHosts(["localhost:9200"])->build();
    
    //Name of the geo_point field to filter and aggregate on, default "location"
    $fieldName = isset($_GET['field']) ? $_GET['field'] : 'location';
    
    //Latitude of the top-left corner of the bounding box, default 90
    $topLeftLat = isset($_GET['top_left_lat']) ? $_GET['top_left_lat'] : 90;
    
    //Longitude of the top-left corner of the bounding box, default -180
    $topLeftLon = isset($_GET['top_left_lon']) ? $_GET['top_left_lon'] : -180;
    
    //Latitude of the bottom-right corner of the bounding box, default -90
    $bottomRightLat = isset($_GET['bottom_right_lat']) ? $_GET['bottom_right_lat'] : -90;
    
    //Longitude of the bottom-right corner of the bounding box, default 180
    $bottomRightLon = isset($_GET['bottom_right_lon']) ? $_GET['bottom_right_lon'] : 180;
    
    //End of the time range in epoch milliseconds, default: now
    $endTime = isset($_GET['end_time']) ? $_GET['end_time'] : time()*1000;
    
    //Start of the time range in epoch milliseconds, default: 15 minutes before the end time
    $startTime = isset($_GET['start_time']) ? $_GET['start_time'] : $endTime - 15*60*1000;
    
    //Build the search request body
    $body = [
        'size' => 0,
        'query' => [
            'bool' => [
                'must' => [
                    [
                        'range' => [
                            '@timestamp' => [
                                'gte' => $startTime,
                                'lte' => $endTime,
                                'format' => 'epoch_millis'
                            ]
                        ]
                    ]
                ]
            ]
        ],
        'aggs' => [
            'filter_agg' => [
                'filter' => [
                    'geo_bounding_box' => [
                        $fieldName => [ //filter on the same field the aggregations use
                            'top_left' => [
                                'lat' => $topLeftLat,
                                'lon' => $topLeftLon
                            ],
                            'bottom_right' => [
                                'lat' => $bottomRightLat,
                                'lon' => $bottomRightLon
                            ]
                        ]
                    ]
                ],
                'aggs' => [
                    '2' => [
                        'geohash_grid' => [
                            'field' => $fieldName,
                            'precision' => 1
                        ],
                        'aggs' => [
                            '3' => [
                                'geo_centroid' => [
                                    'field' => $fieldName
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ],
        'stored_fields' => [
            '*'
        ],
        'docvalue_fields' => [
            '@timestamp'
        ]
    ];
    
    //Search parameters
    $params = [
        'index' => 'logstash-*',
        'body' => $body
    ];
    
    //Raw Elasticsearch search response
    $response = $client->search($params);
    
    $resultTmp = $response['aggregations']['filter_agg']['2']['buckets'];
    
    $data = array();
    
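    //Format the data
    foreach ($resultTmp as $doc)
    {
        //geo_centroid always returns its point under the key "location", whatever the field is named
        $lat = $doc['3']['location']['lat'];
        $lon = $doc['3']['location']['lon'];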
        $count = $doc['doc_count'];
        $tmp = [
            'count' => $count,
            'geometry' => [
                'type' => 'Point',
                'coordinates' => [$lon,$lat]
            ]
        ];
        $data[] = $tmp;
    }
    
    $result = array('data'=>$data,'error_msg'=>'','flag'=>1);
    
    if (empty($data))
    {
        $result['error_msg'] = 'no data';
        $result['flag'] = 0;
    }
    
    //Final output
    echo json_encode($result);
    exit();
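
The script above passes the `$_GET` values into the query as they arrive, i.e. as unvalidated strings. As a hedged sketch of one way to tighten this, assuming PHP 7+, the hypothetical helper `floatParam` below (not part of the original code) reads a numeric parameter and falls back to the default when the value is missing, not a number, or outside the allowed range:

```php
<?php
// Hypothetical helper (not from the original article): read a float from a
// query-parameter array such as $_GET. Returns $default when the parameter
// is missing, not a valid float, or outside [$min, $max].
function floatParam(array $query, string $name, float $default, float $min, float $max): float
{
    if (!isset($query[$name])) {
        return $default;
    }
    $value = filter_var($query[$name], FILTER_VALIDATE_FLOAT);
    if ($value === false || $value < $min || $value > $max) {
        return $default;
    }
    return $value;
}

// Example input standing in for $_GET.
$sample = [
    'top_left_lat' => '53.5',
    'top_left_lon' => 'abc',   // not a number: falls back to the default
    'bottom_right_lat' => '-120', // out of range: falls back to the default
];

$topLeftLat = floatParam($sample, 'top_left_lat', 90, -90, 90);
$topLeftLon = floatParam($sample, 'top_left_lon', -180, -180, 180);
$bottomRightLat = floatParam($sample, 'bottom_right_lat', -90, -90, 90);
$bottomRightLon = floatParam($sample, 'bottom_right_lon', 180, -180, 180);

var_dump($topLeftLat, $topLeftLon, $bottomRightLat, $bottomRightLon);
```

Falling back to the defaults silently is a design choice made for brevity; an interface could equally reject the request with an error message in `error_msg`.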
    

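    Because of constraints of the H5 page plugin, the data has to be in a specific format, so the data array in the final output looks like this: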

    [{
        "count": 6,
        "geometry": {
            "type": "Point",
            "coordinates": ["116.395645", "39.929986"]
        }
    }, {
        "count": 6,
        "geometry": {
            "type": "Point",
            "coordinates": ["121.487899", "31.249162"]
        }
    }, {
        "count": 5,
        "geometry": {
            "type": "Point",
            "coordinates": ["117.210813", "39.14393"]
        }
    }, {
        "count": 4,
        "geometry": {
            "type": "Point",
            "coordinates": ["106.530635", "29.544606"]
        }
    }]
    

    With this, the data side of our data map project is finished for now.

    References:

    https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/_configuration.html

    https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html

    https://www.elastic.co/guide/cn/elasticsearch/guide/current/geohash-grid-agg.html
