
Pitfalls of ES Deep Pagination

Author: _空格键_ | Published 2021-02-10 14:20

    Let's start with an exception:

    org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
            at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
            at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:618)
            at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:594)
            at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:501)
            at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:474)
            at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:391)
            at com.xxxx.assets.service.es.factory.rest.EsHighClientService.queryByPage(EsHighClientService.java:82)
            ... 21 common frames omitted
            Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://abc.xxxx.com:9900], URI [/blood_relation_index/blood_relation/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&search_type=dfs_query_then_fetch&batched_reduce_size=512], status line [HTTP/1.1 500 Internal Server Error]{"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"blood_relation_index","node":"RKah0wB7RDeQMmmawJqMHA","reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}]},"status":500}
                    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:357)
                    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:346)
                    at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
                    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
                    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
                    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
                    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
                    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
                    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
                    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
                    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
                    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
                    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
                    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
                    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
                    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
                    ... 1 common frames omitted
    

    The key information, extracted: ResponseException, POST, HTTP 500

    {
        "error": {
            "root_cause": [
                {
                    "type": "query_phase_execution_exception",
                    "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
                }
            ],
            "type": "search_phase_execution_exception",
            "reason": "all shards failed",
            "phase": "query",
            "grouped": true,
            "failed_shards": [
                {
                    "shard": 0,
                    "index": "blood_relation_index",
                    "node": "RKah0wB7RDeQMmmawJqMHA",
                    "reason": {
                        "type": "query_phase_execution_exception",
                        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [20000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
                    }
                }
            ]
        },
        "status": 500
    }
    

    In other words: the result window is too large; from + size must be less than or equal to [10000], but this query asked for [20000]. For a more efficient way to request large data sets, see the scroll API. The limit can also be changed via the [index.max_result_window] index-level setting.

    Analysis

    The ES server has index.max_result_window=10000 configured, and our query's result window exceeded that limit.

    Question: why was the limit exceeded?

    Ordinary ES paged queries

    Suppose you paginate with size=100 per page and request page 100. Then from=(100 - 1) * 100=9900 and size=100, so ES must fetch from + size = 10000 documents from each shard. With 3 shards that is 3 * 10000 documents in total, which the coordinating node then merges, sorts, and filters to produce the final 100 matching documents. Request page 101 and from=10000, so ES fetches 10100 documents from each shard.
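    As a rough illustration (not the article's original code), a paged query of this shape with the RestHighLevelClient seen in the stack trace might look like the sketch below. The method name queryByPage mirrors the one in the trace, but the body is hypothetical, assuming a 6.4+ client (older clients pass Header varargs instead of RequestOptions):

    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    import java.io.IOException;

    public class PagedQuerySketch {
        // Hypothetical paged query; pageNo starts at 1.
        static SearchResponse queryByPage(RestHighLevelClient client,
                                          int pageNo, int pageSize) throws IOException {
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .from((pageNo - 1) * pageSize) // page 100, size 100 -> from = 9900
                    .size(pageSize);               // from + size = 10000, the default ceiling
            SearchRequest request = new SearchRequest("blood_relation_index").source(source);
            // Once from + size exceeds index.max_result_window (page 101 already gives
            // from + size = 10100), every shard rejects the query with the error above.
            return client.search(request, RequestOptions.DEFAULT);
        }
    }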

    The deep-paging problem

    Clearly, the deeper the paging goes, the more data ES must fetch from every shard, and performance degrades drastically.

    This is exactly why index.max_result_window is set to 10000: to keep deep paging from exhausting ES memory and triggering an OOM.
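    If a deeper window is genuinely required, the per-index limit can be raised, as the error message itself notes, but every extra slot costs heap on each shard during the query phase, so this trades memory for depth. A minimal sketch, assuming a 6.4+ high-level client; the value 20000 is hypothetical, sized only to the failing request above:

    import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.settings.Settings;

    import java.io.IOException;

    public class ResultWindowSketch {
        // Raise index.max_result_window for one index; treat this as a last resort.
        static void raiseResultWindow(RestHighLevelClient client) throws IOException {
            UpdateSettingsRequest request = new UpdateSettingsRequest("blood_relation_index")
                    .settings(Settings.builder().put("index.max_result_window", 20000));
            client.indices().putSettings(request, RequestOptions.DEFAULT);
        }
    }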

    Fixes and optimizations

    Differentiate by scenario:
    1. If the business does not require deep paging, cap the page depth and page size a query may request.
    2. Or restrict the interaction itself: forbid jumping to arbitrary pages, and use the scroll API for sequential iteration instead (see the sketch after this list).
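    A minimal scroll sketch with the same client. The one-minute keep-alive and the 100-hit batch size are arbitrary choices for illustration, and the import paths assume a 6.x client, where TimeValue lives under org.elasticsearch.common.unit:

    import org.elasticsearch.action.search.ClearScrollRequest;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchScrollRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    import java.io.IOException;

    public class ScrollSketch {
        static void scrollAll(RestHighLevelClient client) throws IOException {
            // Open a scroll context kept alive for one minute between round trips.
            SearchRequest request = new SearchRequest("blood_relation_index")
                    .scroll(TimeValue.timeValueMinutes(1L))
                    .source(new SearchSourceBuilder().size(100));
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            String scrollId = response.getScrollId();
            SearchHit[] hits = response.getHits().getHits();
            while (hits != null && hits.length > 0) {
                // ... process the current batch of hits here ...
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId)
                        .scroll(TimeValue.timeValueMinutes(1L));
                response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = response.getScrollId();
                hits = response.getHits().getHits();
            }
            // Release the scroll context promptly rather than waiting for it to expire.
            ClearScrollRequest clear = new ClearScrollRequest();
            clear.addScrollId(scrollId);
            client.clearScroll(clear, RequestOptions.DEFAULT);
        }
    }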
