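After getting a rough grasp of Elasticsearch, I wondered whether it could be used for hot-term analysis. If so, we could see which words come up most often in specific business data, such as player feedback, players' world-chat messages, or error logs, and learn a great deal from them. How do you do hot-term analysis in Elasticsearch? For English text it is very simple: the default analyzer plus a terms aggregation is all you need.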
<pre>
curl -H 'Content-Type: application/json' 'localhost:9200/top-terms/_search?pretty' -d '{
"aggs": {
"top-terms-aggregation": {
"terms": { "field" : "text","size":5 }
}
}
}'
</pre>
The test script is as follows:
<pre>
#!/bin/sh
# Three small test documents. Single quotes keep the inner JSON double quotes intact.
test_document='{
"text": "a this is pen dst is a apple"
}'
test_document1='{
"text": "a blue is always pen dst apple"
}'
test_document2='{
"text": "a hello world "
}'
# Drop the index left over from a previous run, if any (-I sends a HEAD request).
if curl -fsI -o /dev/null localhost:9200/top-terms; then
echo "Clear the old test index"
curl -X DELETE localhost:9200/top-terms; echo
fi
echo "Create our first test index"
curl -X PUT localhost:9200/top-terms; echo
echo "Index our test documents"
curl -X POST -H 'Content-Type: application/json' 'localhost:9200/top-terms/test/1?refresh=true' -d "${test_document}"; echo
curl -X POST -H 'Content-Type: application/json' 'localhost:9200/top-terms/test/2?refresh=true' -d "${test_document1}"; echo
curl -X POST -H 'Content-Type: application/json' 'localhost:9200/top-terms/test/3?refresh=true' -d "${test_document2}"; echo
echo "Our first test, aggregations, only counts the number of documents that a term matches."
curl -H 'Content-Type: application/json' 'localhost:9200/top-terms/_search?pretty' -d '{
"aggs": {
"top-terms-aggregation": {
"terms": { "field" : "text","size":5 }
}
}
}'
echo
</pre>
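Note the following line: "size":5 makes the aggregation return only the top 5 terms, i.e. the 5 terms that appear in the most documents.
<pre>
"terms": { "field" : "text","size":5 }
</pre>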
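If you want a different number of hot terms, or want to aggregate on another field (say, a chat-message field), a tiny shell helper can generate the request body. This is only a sketch: the function name top_terms_body is hypothetical, and the extra top-level "size":0 is my own addition, not part of the original query. It asks Elasticsearch to skip the hits array and return only the aggregation; it is unrelated to the "size" inside "terms", which sets how many buckets come back.

```shell
#!/bin/sh
# Hypothetical helper: print a search body with a terms aggregation on
# field $1 (default "text") that returns the top $2 terms (default 5).
# The top-level "size":0 suppresses the hits array; the "size" inside
# "terms" is the number of buckets (hot terms) to return.
top_terms_body() {
  field=${1:-text}
  size=${2:-5}
  printf '{"size":0,"aggs":{"top-terms-aggregation":{"terms":{"field":"%s","size":%d}}}}\n' \
    "$field" "$size"
}

# Example (assumes Elasticsearch on localhost:9200 and the top-terms index):
# curl -H 'Content-Type: application/json' 'localhost:9200/top-terms/_search?pretty' \
#   -d "$(top_terms_body text 10)"
top_terms_body text 10
```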
The output is:
<pre>
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "top-terms",
"_type" : "test",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"text" : "a blue is always pen dst apple"
}
}, {
"_index" : "top-terms",
"_type" : "test",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"text" : "a this is pen dst is a apple"
}
}, {
"_index" : "top-terms",
"_type" : "test",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"text" : "a hello world "
}
} ]
},
"aggregations" : {
"top-terms-aggregation" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 5,
"buckets" : [ {
"key" : "a",
"doc_count" : 3
}, {
"key" : "apple",
"doc_count" : 2
}, {
"key" : "dst",
"doc_count" : 2
}, {
"key" : "is",
"doc_count" : 2
}, {
"key" : "pen",
"doc_count" : 2
} ]
}
}
}
</pre>
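The buckets array is the result we want: each bucket holds a term (key) and the number of documents that contain it (doc_count), sorted by doc_count in descending order. Note that doc_count counts documents, not occurrences: "is" appears three times in total, but only in two documents, so its doc_count is 2.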
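To make the meaning of doc_count concrete, here is a rough offline approximation in plain shell. It is not Elasticsearch code and not how Elasticsearch computes the aggregation internally. The function name top_terms_offline is hypothetical, and it assumes the standard analyzer can be approximated as "lowercase, then split on non-alphanumeric characters", which ignores its full Unicode segmentation rules. The key point is that each term is deduplicated within a document before counting.

```shell
#!/bin/sh
# Hypothetical helper, NOT Elasticsearch: approximates the terms aggregation offline.
# Reads one document per line on stdin and prints "doc_count term" for the
# top N terms (N = $1, default 5). Each term is counted at most once per document.
top_terms_offline() {
  n=${1:-5}
  # Per document: lowercase, split on runs of non-alphanumerics (a rough
  # stand-in for the standard analyzer), drop empty tokens, deduplicate.
  # Then group identical terms across documents, count documents per term,
  # sort by count (descending) with ties broken by term, keep the top n.
  while IFS= read -r doc; do
    printf '%s\n' "$doc" |
      tr '[:upper:]' '[:lower:]' |
      tr -cs '[:alnum:]' '\n' |
      grep -v '^$' |
      LC_ALL=C sort -u
  done |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -k1,1nr -k2,2 |
    head -n "$n" |
    awk '{ print $1, $2 }'
}

# The three test documents from the script, one per line.
printf '%s\n' "a this is pen dst is a apple" "a blue is always pen dst apple" "a hello world " |
  top_terms_offline 5
```

With the three test documents this prints 3 a, 2 apple, 2 dst, 2 is and 2 pen, one per line, which matches the buckets returned by Elasticsearch. Counting raw occurrences instead would rank "a" at 4 and "is" at 3.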