ElasticSearch Python Client Read

Author: momo1023 | Published 2019-12-03 11:36


ElasticSearch Python Client ReadTimeout

Exceptions and error output

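When running bulk operations through the ElasticSearch Python client API, the client may time out if the ElasticSearch server cannot keep up, and it prints exceptions like the following: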

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "E:\git_project\CV_client\py_client\algorithm\save_thread.py", line 263, in run
    self.kafak.send(imageInfoJson)
  File "E:\git_project\CV_client\py_client\algorithm\kafka_tool.py", line 38, in send
    res = self.es.index(index=index, doc_type=doc_type, body=data, id=id, request_timeout=3)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\__init__.py", line 319, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 180, in perform_request
    raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='192.168.55.66', port=9200): Read timed out. (read timeout=3))

2017-09-27 12:37:42.228 25934/MainThread W base.py:96 POST http://localhost:9200/_bulk [status:N/A request:10.011s]
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
    response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/util/retry.py", line 333, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 388, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10)

Solution

A simple fix is to pass timeout and retry parameters (see: https://stackoverflow.com/questions/25908484/how-to-fix-read-timed-out-in-elasticsearch):

Increase the default timeout globally by passing the timeout parameter when you create the ES client. Example in Python:

from elasticsearch import Elasticsearch
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)

Alternatively, set the timeout per request made by the client. The example below is taken from the Elasticsearch Python docs:

# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)

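I set timeout=100 and max_retries=3. The reason is that when ElasticSearch is serving a heavy query load, the queries can consume all of the read I/O. A bulk request sent at that moment may be POSTed successfully while the client times out waiting for the server's acknowledgement. If the timeout is too short and max_retries is too large, the client keeps resending, and the same data can end up inserted max_retries times.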
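One way to make such resends harmless, sketched here under assumptions, is to give every document a deterministic _id, so a retried request overwrites the earlier copy instead of adding a new one. This is not part of the original fix: the helper name make_actions, the index name "images", the sample fields, and the choice of a SHA-1 hash over the JSON body are all hypothetical. The dictionaries follow the action format accepted by elasticsearch.helpers.bulk.

```python
import hashlib
import json


def make_actions(docs, index):
    """Build bulk actions whose _id is derived from the document content.

    Hypothetical helper: resending the same document (for example after a
    client-side timeout) targets the same _id, so it overwrites the earlier
    copy instead of creating a duplicate.
    """
    actions = []
    for doc in docs:
        # sort_keys keeps the serialization, and therefore the hash, stable.
        payload = json.dumps(doc, sort_keys=True, ensure_ascii=False)
        doc_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        actions.append({"_index": index, "_id": doc_id, "_source": doc})
    return actions


# Illustrative data only; field names are made up for this sketch.
docs = [{"camera": "cam-1", "frame": 42}, {"camera": "cam-2", "frame": 7}]
first_try = make_actions(docs, "images")
retry = make_actions(docs, "images")

# Compare the ids produced by the first attempt and by the retry.
same_ids = [a["_id"] == b["_id"] for a, b in zip(first_try, retry)]
print(same_ids)
```

The resulting list could then be handed to elasticsearch.helpers.bulk(es, actions); older clusters that still use mapping types may also need a _type entry in each action. Documents that are intentionally identical collapse into one under this scheme, so a natural key from the data is usually a better _id when one exists.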

API parameters

>>> help(elasticsearch.Elasticsearch)

Help on class Elasticsearch in module elasticsearch.client:
class Elasticsearch(__builtin__.object)
| Elasticsearch low-level client. Provides a straightforward mapping from
| Python to ES REST endpoints.
| __init__(self, hosts=None, transport_class=<class 'elasticsearch.transport.Transport'>, **kwargs)
| :arg hosts: list of nodes we should connect to. Node should be a
| dictionary ({"host": "localhost", "port": 9200}), the entire dictionary
| will be passed to the :class:`~elasticsearch.Connection` class as
| kwargs, or a string in the format of ``host[:port]`` which will be
| translated to a dictionary automatically. If no value is given the
| :class:`~elasticsearch.Urllib3HttpConnection` class defaults will be used.
|
| :arg transport_class: :class:`~elasticsearch.Transport` subclass to use.
|
| :arg kwargs: any additional arguments will be passed on to the
| :class:`~elasticsearch.Transport` class and, subsequently, to the
| :class:`~elasticsearch.Connection` instances.


>>> help(elasticsearch.Transport)

Help on class Transport in module elasticsearch.transport:
class Transport(__builtin__.object)
| Encapsulation of transport-related to logic. Handles instantiation of the
| individual connections as well as creating a connection pool to hold them.
|
| Main interface is the `perform_request` method.
|
| Methods defined here:
|
| __init__(self, hosts, connection_class=<class 'elasticsearch.connection.http_urllib3.Urllib3HttpConnection'>, connection_pool_class=<class 'elasticsearch.connection_pool.ConnectionPool'>, host_info_callback=<function get_host_info>, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=0.1, sniff_on_connection_fail=False, serializer=<elasticsearch.serializer.JSONSerializer object>, serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504), retry_on_timeout=False, send_get_body_as='GET', **kwargs)
| :arg max_retries: maximum number of retries before an exception is propagated
| :arg retry_on_status: set of HTTP status codes on which we should retry
| on a different node. defaults to ``(502, 503, 504)``
| :arg retry_on_timeout: should timeout trigger a retry on different
| node? (default `False`)
|
| Any extra keyword arguments will be passed to the `connection_class`
| when creating and instance unless overriden by that connection's
| options provided as part of the hosts parameter.

This shows that the defaults are max_retries=3, retry_on_timeout=False, and retry_on_status=(502, 503, 504).
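To see what these defaults mean for a request that keeps timing out, here is a simplified model of the retry policy described in the docstring. It is a sketch, not the library's code: FakeTimeout stands in for elasticsearch.exceptions.ConnectionTimeout, perform_with_retries and count_attempts are hypothetical names, and it assumes max_retries counts retries in addition to the first attempt. Retries on connection errors and on retry_on_status codes are left out.

```python
class FakeTimeout(Exception):
    """Stand-in for elasticsearch.exceptions.ConnectionTimeout."""


def perform_with_retries(send, max_retries=3, retry_on_timeout=False):
    """Call send() until it succeeds or the retry budget is exhausted."""
    for attempt in range(max_retries + 1):  # one initial try plus the retries
        try:
            return send()
        except FakeTimeout:
            # Without retry_on_timeout the first timeout is propagated;
            # with it, the error is re-raised once the budget is used up.
            if not retry_on_timeout or attempt == max_retries:
                raise


def count_attempts(max_retries, retry_on_timeout):
    """Count how often a request that always times out is actually sent."""
    calls = []

    def always_times_out():
        calls.append(1)
        raise FakeTimeout("read timed out")

    try:
        perform_with_retries(always_times_out, max_retries, retry_on_timeout)
    except FakeTimeout:
        pass
    return len(calls)


print(count_attempts(max_retries=3, retry_on_timeout=False))  # 1
print(count_attempts(max_retries=3, retry_on_timeout=True))   # 4
```

Under this model the defaults never resend a request after a read timeout, which is why retry_on_timeout=True has to be passed explicitly for max_retries to take effect on timeouts.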

>>> help(elasticsearch.connection.http_urllib3.Urllib3HttpConnection)

Help on class Urllib3HttpConnection in module elasticsearch.connection.http_urllib3:
class Urllib3HttpConnection(elasticsearch.connection.base.Connection)
| Default connection class using the `urllib3` library and the http protocol.
|
| :arg host: hostname of the node (default: localhost)
| :arg port: port to use (integer, default: 9200)
| :arg url_prefix: optional url prefix for elasticsearch
| :arg timeout: default timeout in seconds (float, default: 10)

As shown here, the default timeout is only 10 seconds.
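The timeout applies to each attempt, so a rough upper bound on how long one call can block is the timeout multiplied by the number of attempts. The arithmetic below is a sketch under assumptions: worst_case_wait is a hypothetical helper, every attempt is assumed to hit the read timeout, max_retries is assumed to count retries on top of the first attempt, and connection setup time is ignored.

```python
def worst_case_wait(timeout, max_retries=3, retry_on_timeout=False):
    """Rough upper bound, in seconds, spent waiting on read timeouts per call."""
    # Timeouts are only retried when retry_on_timeout is enabled.
    attempts = max_retries + 1 if retry_on_timeout else 1
    return timeout * attempts


# Library defaults: a single 10-second attempt.
print(worst_case_wait(10))  # 10
# Settings from the Stack Overflow answer.
print(worst_case_wait(30, max_retries=10, retry_on_timeout=True))  # 330
# timeout=100 and max_retries=3, assuming retry_on_timeout=True.
print(worst_case_wait(100, max_retries=3, retry_on_timeout=True))  # 400
```

A caller that sits behind its own deadline, such as a worker thread feeding a queue, may want to keep this product below that deadline.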

Reposted from: https://blog.csdn.net/jacke121/article/details/86062773
