Hot-Word Statistics with Elasticsearch


Author: randyjia | Published: 2016-06-07 13:52 | Views: 4058

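    After a first pass through Elasticsearch, I wondered whether it could be used for hot-word analysis. For a specific data source, such as player feedback, in-game world chat, or error logs, seeing which words occur most often reveals a great deal. So how is hot-word analysis done in Elasticsearch? For English text it is very simple: use the default analyzer together with a terms aggregation.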
    <pre>
    curl 'localhost:9200/top-terms/_search?pretty' -d '{
      "aggs": {
        "top-terms-aggregation": {
          "terms": { "field": "text", "size": 5 }
        }
      }
    }'
    </pre>

    The test script is as follows:
    <pre>

    #!/bin/sh

    # Three small English test documents. Single quotes keep the JSON
    # double quotes intact when the shell parses the assignment.
    test_document='{
      "text": "a this is pen dst is a apple"
    }'

    test_document1='{
      "text": "a blue is always pen dst apple"
    }'

    test_document2='{
      "text": "a hello world "
    }'

    # -I sends a HEAD request; -f makes curl exit non-zero on an HTTP error.
    if curl -fsI localhost:9200/top-terms >/dev/null; then
      echo "Clear the old test index"
      curl -X DELETE localhost:9200/top-terms; echo
    fi

    echo "Create our first test index"
    curl -X PUT localhost:9200/top-terms; echo

    # URLs are quoted so the shell does not treat "?" as a glob character.
    echo "Index our test document"
    curl -X POST 'localhost:9200/top-terms/test/1?refresh=true' -d "${test_document}"; echo
    curl -X POST 'localhost:9200/top-terms/test/2?refresh=true' -d "${test_document1}"; echo
    curl -X POST 'localhost:9200/top-terms/test/3?refresh=true' -d "${test_document2}"; echo

    echo "Our first test, aggregations, only counts the number of documents that a term matches."
    curl 'localhost:9200/top-terms/_search?pretty' -d '{
      "aggs": {
        "top-terms-aggregation": {
          "terms": { "field": "text", "size": 5 }
        }
      }
    }'
    echo
    </pre>
    Note that "size": 5 in the terms aggregation selects the top 5 terms:
    <pre>
    "terms": { "field": "text", "size": 5 }
    </pre>
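
    The "size" inside the terms aggregation is separate from the top-level "size" of a search request, which controls how many hits come back. As a minimal sketch, the helper below builds a request body that keeps the top-N terms aggregation and sets the top-level "size" to 0, so that only the aggregation is returned. The function name terms_agg_body is hypothetical and not part of the original script.

```shell
#!/bin/sh
# terms_agg_body FIELD N
# Print a search body asking for the top N terms of FIELD.
# The top-level "size": 0 asks Elasticsearch to return no search hits,
# leaving only the aggregation in the response.
# (Hypothetical helper name, introduced for illustration only.)
terms_agg_body() {
  printf '{"size":0,"aggs":{"top-terms-aggregation":{"terms":{"field":"%s","size":%d}}}}\n' "$1" "$2"
}

# Usage sketch, assuming a local node on localhost:9200 and the top-terms index:
#   curl 'localhost:9200/top-terms/_search?pretty' -d "$(terms_agg_body text 5)"
terms_agg_body text 5
```

    Keeping the field and N as parameters makes it easy to reuse the same body for other fields, such as a chat or log message field.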

    The result is as follows:
    <pre>
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 3,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "top-terms",
          "_type" : "test",
          "_id" : "2",
          "_score" : 1.0,
          "_source" : {
            "text" : "a blue is always pen dst apple"
          }
        }, {
          "_index" : "top-terms",
          "_type" : "test",
          "_id" : "1",
          "_score" : 1.0,
          "_source" : {
            "text" : "a this is pen dst is a apple"
          }
        }, {
          "_index" : "top-terms",
          "_type" : "test",
          "_id" : "3",
          "_score" : 1.0,
          "_source" : {
            "text" : "a hello world "
          }
        } ]
      },
      "aggregations" : {
        "top-terms-aggregation" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 5,
          "buckets" : [ {
            "key" : "a",
            "doc_count" : 3
          }, {
            "key" : "apple",
            "doc_count" : 2
          }, {
            "key" : "dst",
            "doc_count" : 2
          }, {
            "key" : "is",
            "doc_count" : 2
          }, {
            "key" : "pen",
            "doc_count" : 2
          } ]
        }
      }
    }
    </pre>

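    Note that the buckets array in the response is the result we need. Each doc_count is the number of documents that contain the term, not the total number of occurrences: "is" appears twice in document 1 but is counted once for that document, so its doc_count is 2.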
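
    To make the counting behind the buckets concrete, the shell sketch below reproduces it locally on the three test documents, without Elasticsearch. The function name doc_counts is hypothetical, and the tokenization (lowercasing, then splitting on anything that is not a letter or digit) is only a rough stand-in for the default analyzer, assumed here for illustration.

```shell
#!/bin/sh
# doc_counts: read one document per line on stdin and print "term doc_count"
# pairs, highest count first, ties broken alphabetically.
# (Hypothetical helper; the tokenizer only approximates the default analyzer.)
doc_counts() {
  while IFS= read -r doc || [ -n "$doc" ]; do
    printf '%s\n' "$doc" |
      tr '[:upper:]' '[:lower:]' |   # lowercase the text
      tr -cs '[:alnum:]' '\n' |      # one token per line
      sort -u                        # a term counts at most once per document
  done |
    grep -v '^$' |                   # drop empty tokens
    sort | uniq -c |                 # number of documents per term
    awk '{ print $2, $1 }' |
    sort -k2,2nr -k1,1
}

# The same three documents the test script indexes.
printf '%s\n' \
  'a this is pen dst is a apple' \
  'a blue is always pen dst apple' \
  'a hello world ' |
  doc_counts | head -n 5
```

    Run on these documents, the sketch lists a with 3 and apple, dst, is, and pen with 2 each, the same top five as the buckets in the response. The five remaining terms each occur in one document, which accounts for the sum_other_doc_count of 5.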
