17. ES deep pagination

Author: 不怕天黑_0819 | Published: 2021-05-25 17:02


This series summarizes the lessons I have learned from using ES in real projects, including hands-on practice and tuning.

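index.max_result_window: changing this setting raises the maximum number of results that from + size can reach. It is an index-level setting, so in ES 5.x and later it is changed through the index settings API rather than in elasticsearch.yml. Changing it is strongly discouraged.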

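By default ES only lets a query reach the first 10,000 results (from + size). For deep pagination, the official documentation offers two main approaches: scroll and search_after.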
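As a rough illustration of that limit, the check can be sketched in plain Java. ResultWindow and fitsWindow are hypothetical names made up for this example and are not part of any Elasticsearch client; the 10,000 default is the value quoted above.

```java
// Hypothetical helper for illustration only; not an Elasticsearch API.
final class ResultWindow {
    /** Default value of index.max_result_window. */
    static final long DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Returns true when a from + size request stays inside the result window. */
    static boolean fitsWindow(long from, long size, long maxResultWindow) {
        return from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        // Last page of 10 hits that still fits: 9,990 + 10 = 10,000.
        System.out.println(fitsWindow(9_990, 10, DEFAULT_MAX_RESULT_WINDOW));  // prints true
        // One page deeper: 10,000 + 10 = 10,010, outside the window.
        System.out.println(fitsWindow(10_000, 10, DEFAULT_MAX_RESULT_WINDOW)); // prints false
    }
}
```

Anything past the window has to be fetched with scroll or search_after instead of a larger from value.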

Trade-offs between the two, quoting the documentation: "the Scroll api is recommended for efficient deep scrolling but scroll contexts are costly and it is not recommended to use it for real time user requests. The search_after parameter circumvents this problem by providing a live cursor. The idea is to use the results from the previous page to help the retrieval of the next page."

In short, Scroll is simple to implement but expensive, and it is not suited to real-time user requests. search_after works by using the data from the previous page to fetch the next page.

Even so, deep pagination should be avoided where possible, because it is costly in terms of:

CPU
memory
IO
network bandwidth

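The CPU, memory, and IO costs are easy to understand; the network bandwidth cost is a little less obvious. Take, for example, a request with from = 1,000,000 and size = 100 against an index with 10 shards. In the query phase every shard has to return 1,000,100 entries to the coordinating node, so the coordinating node receives 10 * 1,000,100 entries. Even if each entry carries only the document _id and _score, that is a lot of data, and this is just one search request. What happens when it is multiplied by 100?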
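The arithmetic above can be sketched as a few lines of Java. This is only a back-of-the-envelope model: DeepPagingCost and its methods are hypothetical names, and the inputs (from = 1,000,000, size = 100, 10 shards, 100 concurrent requests) are the example figures implied by the numbers in the text, not measurements.

```java
// Back-of-the-envelope model of the query phase; names are hypothetical.
final class DeepPagingCost {
    /** Entries each shard must collect and return for one request. */
    static long entriesPerShard(long from, long size) {
        return from + size;
    }

    /** Entries the coordinating node receives from all shards for one request. */
    static long entriesAtCoordinator(long from, long size, int shards) {
        return entriesPerShard(from, size) * shards;
    }

    public static void main(String[] args) {
        long from = 1_000_000, size = 100;
        int shards = 10, concurrentRequests = 100;

        System.out.println(entriesPerShard(from, size));              // prints 1000100
        System.out.println(entriesAtCoordinator(from, size, shards)); // prints 10001000
        // The same page requested 100 times at once.
        System.out.println(entriesAtCoordinator(from, size, shards) * concurrentRequests); // prints 1000100000
    }
}
```

Only 100 of those ten million entries end up in the response, which is why the cost grows with the page depth rather than with the page size.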

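Use cases for deep pagination

The need for deep pagination does exist. One case is being crawled, where the answer is simply to disable deep pagination. Another is a business need to traverse the data: for example, a WeChat account with 10 million followers wants to send a message to all of them, or to the followers in one province, and therefore has to fetch every follower that matches the condition. The first idea that comes to mind is from + size, but that is not realistic at this scale, so a deep pagination technique is needed.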

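1. Scroll

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.3/search-request-scroll.html. A specific hit can be retrieved with response.getHits().getAt(i).

If the order of the results does not matter, older versions offered Scroll-Scan. The scan search type was removed in ES 5.0; sorting on _doc is the recommended replacement.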

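// Fragment, not a full class. clientOnline (a transport client), query (a QueryBuilder),
// articleSyncService, log and clearScroll(...) are defined elsewhere in the project.
import java.util.Map;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;

int i = 0; // number of documents processed so far
SearchResponse response = clientOnline.prepareSearch("subscribe_content_manage")
        .setTypes("content_manage_info")
        .setScroll(new TimeValue(60000)) // keep the scroll context alive for 60 s
        .setQuery(query)
        .setSize(1000)
        .execute().actionGet();
do {
    // This loop runs at most `size` times per batch.
    for (SearchHit searchHit : response.getHits().getHits()) {
        i++;
        Map<String, Object> hitMap = searchHit.getSource();
        articleSyncService.test(hitMap.get("articleId").toString());
    }
    // Fetch the next batch and renew the scroll timeout.
    response = clientOnline.prepareSearchScroll(response.getScrollId())
            .setScroll(new TimeValue(60000)).execute().actionGet();
} while (response.getHits().getHits().length != 0);

// Release the scroll context as soon as the traversal is finished.
boolean clearScroll = clearScroll(response.getScrollId());
if (!clearScroll) {
    log.warn("clear scroll id false,scrollId:" + response.getScrollId());
}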

2. search_after

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.3/search-request-search-after.html

GET subscribe_content_manage/content_manage_info/_search
{
  "size": 1000,
  "query": {
    "match_all": {}
  },
  "search_after": ["content_manage_info#C8E1RAGK0521807P", 1481907301000],
  "sort": [
    {"_uid": "desc"},
    {"updateTime": "asc"}
  ]
}

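The request above is the form used in Kibana. I have not tried the Java equivalent, but the principle is the same. The documentation also recommends including a unique field such as _uid among the sort criteria, so that documents with identical sort values can still be ordered unambiguously. The next page is fetched by passing in the sort values of the last hit on the previous page. search_after is not meant for jumping to an arbitrary page, and the search parameters are stateless.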
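To make the cursor idea concrete, here is a small in-memory sketch in plain Java. It does not call Elasticsearch at all: SearchAfterSketch, Doc, page and walk are hypothetical names, the "index" is just a list, and the sort order (updateTime ascending, then uid ascending as the tie-breaker) is an assumption chosen for the example rather than the order used in the request above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// In-memory illustration of cursor-style paging; not an Elasticsearch client.
final class SearchAfterSketch {
    /** Minimal stand-in for an indexed document: a sort value plus a unique id. */
    static final class Doc {
        final long updateTime;
        final String uid;
        Doc(long updateTime, String uid) { this.updateTime = updateTime; this.uid = uid; }
    }

    /** Sort order: updateTime ascending, then uid ascending as the tie-breaker. */
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong((Doc d) -> d.updateTime).thenComparing(d -> d.uid);

    /** Returns up to `size` docs that sort strictly after `after` (first page when null). */
    static List<Doc> page(List<Doc> index, Doc after, int size) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(ORDER);
        List<Doc> out = new ArrayList<>();
        for (Doc d : sorted) {
            if (after != null && ORDER.compare(d, after) <= 0) continue; // at or before the cursor
            out.add(d);
            if (out.size() == size) break;
        }
        return out;
    }

    /** Walks the whole index page by page and returns the uids in visit order. */
    static List<String> walk(List<Doc> index, int size) {
        List<String> seen = new ArrayList<>();
        Doc after = null; // no cursor for the first page
        while (true) {
            List<Doc> hits = page(index, after, size);
            if (hits.isEmpty()) break;
            for (Doc d : hits) seen.add(d.uid);
            after = hits.get(hits.size() - 1); // the last hit's sort values become the cursor
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(
                new Doc(100, "a"), new Doc(100, "b"), new Doc(100, "c"),
                new Doc(200, "d"), new Doc(50, "e"));
        System.out.println(walk(index, 2)); // prints [e, a, b, c, d]
    }
}
```

Three documents share updateTime 100 and straddle a page boundary. Without the uid tie-breaker the cursor could not tell them apart, and documents could be skipped or repeated; with it, every document is visited exactly once. Each call to page depends only on the cursor passed in, which is what "stateless" means here.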

Notes on deep pagination with scroll:

Using scroll for data traversal and deep pagination in Elasticsearch