ElasticSearch Python Client ReadTimeout
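Exceptions and error output
When using the ElasticSearch Python Client API, bulk operations can time out on the client side if the ElasticSearch server is underpowered. The client then prints exceptions like the following: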
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\git_project\CV_client\py_client\algorithm\save_thread.py", line 263, in run
self.kafak.send(imageInfoJson)
File "E:\git_project\CV_client\py_client\algorithm\kafka_tool.py", line 38, in send
res = self.es.index(index=index, doc_type=doc_type, body=data, id=id, request_timeout=3)
File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\client\__init__.py", line 319, in index
_make_path(index, doc_type, id), params=params, body=body)
File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\transport.py", line 318, in perform_request
status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
File "D:\Users\Administrator\Miniconda3\envs\python3.6\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 180, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='192.168.55.66', port=9200): Read timed out. (read timeout=3))
2017-09-27 12:37:42.228 25934/MainThread W base.py:96 POST http://localhost:9200/_bulk [status:N/A request:10.011s]
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 649, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/util/retry.py", line 333, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 388, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/fantom/share/Python-2.7/lib/site-packages/urllib3-1.21.1-py2.7.egg/urllib3/connectionpool.py", line 308, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10)
Solution
A simple fix is to pass timeout and retry parameters (see: https://stackoverflow.com/questions/25908484/how-to-fix-read-timed-out-in-elasticsearch)
Increase the default timeout globally when you create the ES client by passing the timeout parameter. Example in Python:
from elasticsearch import Elasticsearch
es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
Set the timeout per request made by the client. Taken from the Elasticsearch Python docs:
# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)
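I set timeout=100 and max_retries=3. When ElasticSearch is serving a large volume of queries, it can consume all available read I/O. A bulk POST may then succeed on the server while the client times out waiting for the acknowledgement. If the timeout is too short and max_retries is too high, the same data can be inserted repeatedly, up to max_retries times.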
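One way to make such retries harmless is to give every document an explicit _id, so that a resent bulk request overwrites the same document instead of creating a new one. The following is a minimal sketch under stated assumptions, not the original author's code: the names make_action and index_documents are hypothetical, the _id is assumed to be a SHA-1 hash of the document content, and request_timeout=100 is assumed to be forwarded by helpers.bulk to the underlying bulk call (as in the 5.x to 7.x clients). Older clusters that still use mapping types would also need a "_type" key in each action.

```python
import hashlib
import json


def make_action(index, doc):
    """Build a bulk action whose _id is derived from the document content.

    Hypothetical helper: a retried request carrying the same document
    produces the same _id, so it overwrites instead of duplicating.
    """
    payload = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    doc_id = hashlib.sha1(payload).hexdigest()
    return {"_index": index, "_id": doc_id, "_source": doc}


def index_documents(es, index, docs):
    """Send docs with helpers.bulk, using a longer per-request timeout.

    Assumption: extra keyword arguments such as request_timeout are
    passed through to the client's bulk call.
    """
    from elasticsearch import helpers  # imported lazily; requires the elasticsearch package

    actions = (make_action(index, d) for d in docs)
    return helpers.bulk(es, actions, request_timeout=100)
```

The trade-off is that two documents with identical content collapse into one. If that is not acceptable, the _id could instead be built from a business key that is already unique.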
API parameters
>>> help(elasticsearch.Elasticsearch)
Help on class Elasticsearch in module elasticsearch.client:
class Elasticsearch(__builtin__.object)
| Elasticsearch low-level client. Provides a straightforward mapping from
| Python to ES REST endpoints.
| __init__(self, hosts=None, transport_class=<class 'elasticsearch.transport.Transport'>, **kwargs)
| :arg hosts: list of nodes we should connect to. Node should be a
| dictionary ({"host": "localhost", "port": 9200}), the entire dictionary
| will be passed to the :class:`~elasticsearch.Connection` class as
| kwargs, or a string in the format of ``host[:port]`` which will be
| translated to a dictionary automatically. If no value is given the
| :class:`~elasticsearch.Urllib3HttpConnection` class defaults will be used.
|
| :arg transport_class: :class:`~elasticsearch.Transport` subclass to use.
|
| :arg kwargs: any additional arguments will be passed on to the
| :class:`~elasticsearch.Transport` class and, subsequently, to the
| :class:`~elasticsearch.Connection` instances.
>>> help(elasticsearch.Transport)
Help on class Transport in module elasticsearch.transport:
class Transport(__builtin__.object)
| Encapsulation of transport-related to logic. Handles instantiation of the
| individual connections as well as creating a connection pool to hold them.
|
| Main interface is the `perform_request` method.
|
| Methods defined here:
|
| __init__(self, hosts, connection_class=<class 'elasticsearch.connection.http_urllib3.Urllib3HttpConnection'>, connection_pool_class=<class 'elasticsearch.connection_pool.ConnectionPool'>, host_info_callback=<function get_host_info>, sniff_on_start=False, sniffer_timeout=None, sniff_timeout=0.1, sniff_on_connection_fail=False, serializer=<elasticsearch.serializer.JSONSerializer object>, serializers=None, default_mimetype='application/json', max_retries=3, retry_on_status=(502, 503, 504), retry_on_timeout=False, send_get_body_as='GET', **kwargs)
| :arg max_retries: maximum number of retries before an exception is propagated
| :arg retry_on_status: set of HTTP status codes on which we should retry
| on a different node. defaults to ``(502, 503, 504)``
| :arg retry_on_timeout: should timeout trigger a retry on different
| node? (default `False`)
|
| Any extra keyword arguments will be passed to the `connection_class`
| when creating and instance unless overriden by that connection's
| options provided as part of the hosts parameter.
This shows that the defaults are max_retries=3, retry_on_timeout=False, and retry_on_status=(502, 503, 504).
>>> help(elasticsearch.connection.http_urllib3.Urllib3HttpConnection)
Help on class Urllib3HttpConnection in module elasticsearch.connection.http_urllib3:
class Urllib3HttpConnection(elasticsearch.connection.base.Connection)
| Default connection class using the `urllib3` library and the http protocol.
|
| :arg host: hostname of the node (default: localhost)
| :arg port: port to use (integer, default: 9200)
| :arg url_prefix: optional url prefix for elasticsearch
| :arg timeout: default timeout in seconds (float, default: 10)
As shown here, the default timeout is only 10 seconds.
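To see how these defaults combine, the worst case can be estimated with a little arithmetic. This is a simplified model, not the library's actual code: the function name worst_case is hypothetical, it assumes every attempt runs into the read timeout, it assumes the transport makes max_retries + 1 attempts in total when retry_on_timeout is enabled, and it ignores connection setup time and dead-node handling.

```python
def worst_case(timeout, max_retries=3, retry_on_timeout=False):
    """Return (attempts, seconds) for a request whose every attempt times out.

    Simplified model: without retry_on_timeout the first timeout is raised
    immediately; with it, the request is assumed to be sent
    max_retries + 1 times in total.
    """
    attempts = max_retries + 1 if retry_on_timeout else 1
    return attempts, attempts * timeout


# Library defaults: one attempt, 10 seconds.
print(worst_case(10))
# Settings from the StackOverflow answer: timeout=30, max_retries=10.
print(worst_case(30, max_retries=10, retry_on_timeout=True))
# Settings chosen in this post: timeout=100, max_retries=3.
print(worst_case(100, max_retries=3, retry_on_timeout=True))
```

Under this model, the settings used in this post send a bulk request at most 4 times and block the caller for up to 400 seconds, while the StackOverflow example could send it up to 11 times.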