The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.
Bulk example:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
The endpoints are /_bulk, /{index}/_bulk, and /{index}/{type}/_bulk. When the index or the index/type are provided, they are used as defaults for bulk items that do not specify them explicitly.
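The choice among the three endpoints can be sketched as a small helper. This is an illustrative Python sketch, not part of any client library: `bulk_path` and its parameters are hypothetical names, and it assumes index and type names need no URL escaping.

```python
from typing import Optional


def bulk_path(index: Optional[str] = None, doc_type: Optional[str] = None) -> str:
    """Return the bulk endpoint path for an optional default index and type.

    Hypothetical helper: the names are assumptions made for illustration.
    """
    # A default type only makes sense together with a default index.
    if doc_type is not None and index is None:
        raise ValueError("a default type requires a default index")
    parts = [p for p in (index, doc_type) if p is not None]
    return "/" + "/".join(parts + ["_bulk"])
```

Items sent to `bulk_path("test", "type1")` could then omit `_index` and `_type` in their action lines.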
A note on the format: it is designed to make processing as fast as possible. Because some of the actions will be redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.
Client libraries using this protocol should strive to do something similar on the client side and reduce buffering as much as possible.
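As the examples in this post show, a bulk body is a sequence of JSON lines: an action line, followed by a source line for every action except delete. The sketch below builds such a body in Python. It assumes each line, including the last, is terminated by a newline; `BulkItem`, `iter_bulk_lines`, and `build_bulk_body` are hypothetical names, not part of any client library.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

# One bulk item: (action name, action metadata, optional source/payload).
BulkItem = Tuple[str, Dict[str, Any], Optional[Dict[str, Any]]]


def iter_bulk_lines(items: Iterable[BulkItem]) -> Iterator[str]:
    """Yield newline-terminated JSON lines one at a time (no large buffer)."""
    for action, meta, source in items:
        yield json.dumps({action: meta}) + "\n"
        if action == "delete":
            # A delete action has no payload line.
            if source is not None:
                raise ValueError("delete takes no source line")
            continue
        if source is None:
            raise ValueError(action + " requires a source line")
        yield json.dumps(source) + "\n"


def build_bulk_body(items: Iterable[BulkItem]) -> str:
    """Join the generated lines into a single request body."""
    return "".join(iter_bulk_lines(items))
```

Generating lines lazily keeps client-side buffering small; joining them is only needed when the whole body must be sent in one piece.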
The response to a bulk action is a large JSON structure with the individual results of each action that was performed. The failure of a single action does not affect the remaining actions.
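Because one failed action does not stop the others, a caller usually has to inspect each result individually. The following Python sketch collects the failed entries. It assumes the response has an `items` list in which every entry is a single-key object named after the action, with an `error` field present on failure; `failed_items` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def failed_items(response: Dict[str, Any]) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Return (position, action, result) for every action that failed.

    Assumed shape: {"items": [{"index": {...}}, {"delete": {...}}, ...]}.
    """
    failures = []
    for position, item in enumerate(response.get("items", [])):
        # Each item holds exactly one key: the action name.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result))
    return failures
```

The position in `items` matches the order of the actions in the request, so a failure can be mapped back to the action that caused it.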
There is no "correct" number of actions to perform in a single bulk call. You should experiment with different settings to find the optimum size for your particular workload.
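One way to experiment with different sizes is to make the batch size a parameter. This is a minimal Python sketch; `batched` is a hypothetical helper, and the batch size you pass in is something to tune by measurement rather than a recommended value.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch would become one bulk call, so timing the same workload with several values of `batch_size` shows which size suits it.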
If using the HTTP API, make sure that the client does not send HTTP chunks, as this will slow things down.
Update
When using the update action, _retry_on_conflict can be set as a field in the action itself (not in the extra payload line) to specify how many times an update should be retried in the case of a version conflict.
The update action payload supports the following options: doc (partial document), upsert, doc_as_upsert, script, params (for script), lang (for script) and _source. See the update documentation for details on the options. Example with update actions:
POST _bulk
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "type1", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
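The update examples above can also be generated programmatically. The Python sketch below places `_retry_on_conflict` in the action line and leaves the payload line untouched, as described in this section. `update_action` and its parameters are hypothetical names, and the payload is passed through without validation.

```python
import json
from typing import Any, Dict, Optional


def update_action(index: str, doc_type: str, doc_id: str,
                  payload: Dict[str, Any],
                  retry_on_conflict: Optional[int] = None) -> str:
    """Return the two newline-terminated lines of one bulk update action."""
    meta: Dict[str, Any] = {"_index": index, "_type": doc_type, "_id": doc_id}
    if retry_on_conflict is not None:
        # The retry count belongs in the action line, not in the payload line.
        meta["_retry_on_conflict"] = retry_on_conflict
    return json.dumps({"update": meta}) + "\n" + json.dumps(payload) + "\n"
```

For example, `update_action("index1", "type1", "2", {"doc": {"field": "value"}, "doc_as_upsert": True}, retry_on_conflict=3)` produces an action equivalent to the third update in the example.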