Getting Started with Elasticsearch: Bulk Importing Data with _bulk

Author: 王兵 | Published: 2018-05-10 17:36


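Bulk importing data

Use the Elasticsearch Bulk API (/_bulk) to write many documents in a single request.

Steps:

Goal: bulk import a list of nouns of type movie into the wordbank index.

Prepare the data:

According to the official documentation, the JSON data must be laid out in this format: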

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

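The action must be one of index, create, delete, or update. A delete action has no source line; the other three are each followed by one.

Next, prepare data like this and save it in a file named movie_names (the file the curl command reads):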

{"index": {"_index": "wordbank", "_type": "movie", "_id": 1}}
{"name": "权力的游戏"}
{"index": {"_index": "wordbank", "_type": "movie", "_id": 2}}
{"name": "熊出没"}
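For longer lists, writing this file by hand gets tedious. The following is a minimal Python sketch, not part of the original workflow, that generates the same kind of body from a list of names. The function name build_bulk_body and its parameters are made up for illustration, and it assumes each source line holds the document fields directly.

```python
import json


def build_bulk_body(names, index="wordbank", doc_type="movie"):
    """Return an NDJSON bulk body: one action line plus one source line per name."""
    lines = []
    for doc_id, name in enumerate(names, start=1):
        # Action and metadata line: which index, type, and id to write to.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the document itself. ensure_ascii=False keeps Chinese text readable.
        lines.append(json.dumps({"name": name}, ensure_ascii=False))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body(["权力的游戏", "熊出没"])
print(body, end="")
```

Writing body to a file opened with encoding="utf-8" would give a file that can be passed to curl with --data-binary.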

POST the file to _bulk:

curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' --data-binary @movie_names

The bulk request succeeded:

{"took":50,"errors":false,"items":[{"index":{"_index":"wordbank","_type":"movie","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"wordbank","_type":"movie","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}]}

Pitfalls I ran into:

illegal_argument_exception:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\n]"}],"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\n]"},"status":400}

Cause: the bulk request body must end with \n, so the last line of the JSON file needs a trailing newline.
Fix: add a newline at the end of the JSON file.
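Editors do not always make a missing final newline visible, so it can help to check for it programmatically. This is a small sketch under the assumption that the bulk file is local and can be modified in place; ensure_trailing_newline is a hypothetical helper, not an Elasticsearch tool.

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append a newline if the file's last byte is not one. Return True if the file changed."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already terminated
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True


# Demo on a throwaway file whose last line lacks the final newline.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "movie_names")
    with open(demo_path, "wb") as f:
        f.write(b'{"index": {}}\n{"name": "demo"}')
    first_call = ensure_trailing_newline(demo_path)   # modifies the file
    second_call = ensure_trailing_newline(demo_path)  # nothing left to do
    with open(demo_path, "rb") as f:
        fixed = f.read()
```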

Header problem:
```
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}
```
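- Cause: since Elasticsearch 6.0, the Content-Type of requests with a body is checked strictly. When given --data or --data-binary, curl sends application/x-www-form-urlencoded by default, which Elasticsearch rejects.
- Fix: add -H 'Content-Type: application/json' to the curl command. (application/x-ndjson is also accepted for _bulk.)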
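The same requirement applies when the request is sent from code instead of curl. As a hedged illustration, the sketch below builds (but does not send) a _bulk request with Python's standard urllib and sets the header explicitly. The helper name build_bulk_request and the base URL are assumptions for the example.

```python
import urllib.request


def build_bulk_request(body, base_url="http://localhost:9200"):
    """Build a POST request to /_bulk with an explicit Content-Type. Nothing is sent here."""
    return urllib.request.Request(
        url=base_url + "/_bulk",
        data=body,  # must be bytes, ending with a newline
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = b'{"index": {"_index": "wordbank", "_type": "movie", "_id": 1}}\n{"name": "demo"}\n'
request = build_bulk_request(payload)

# Sending needs a running cluster and is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Like curl, urllib falls back to application/x-www-form-urlencoded when a body is present and no Content-Type is given, so the header should not be left out.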

action_request_validation_exception:

{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: script or doc is missing;2: script or doc is missing;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: script or doc is missing;2: script or doc is missing;"},"status":400}

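Cause: for an update action in a bulk request, the fields to change must be nested under a "doc" key (or the action must supply a script). An update here is strictly an update: if the document does not exist, the item fails. The index and create actions do not use this wrapper; their source line is the document itself.
Fix: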

{ "field1" : "value1", "field2" : "value2" }
--> { "doc" : { "field1" : "value1", "field2" : "value2" } }
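To make the wrapping concrete, here is a minimal sketch that produces the two lines for one update action. The helper update_action_lines and its defaults are hypothetical. The optional doc_as_upsert flag is a real option of the update API that creates the document from "doc" when it is missing; it is included only as one possible way around the "document does not exist" failure, not as something the original post used.

```python
import json


def update_action_lines(doc_id, fields, index="wordbank", doc_type="movie", upsert=False):
    """Return the action line and source line for one bulk update, each newline-terminated."""
    action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
    # The changed fields go under "doc"; sending them bare triggers
    # "script or doc is missing".
    source = {"doc": fields}
    if upsert:
        # Ask Elasticsearch to create the document from "doc" if it does not exist.
        source["doc_as_upsert"] = True
    return json.dumps(action) + "\n" + json.dumps(source, ensure_ascii=False) + "\n"


plain = update_action_lines(2, {"name": "熊出没"})
with_upsert = update_action_lines(3, {"name": "demo"}, upsert=True)
print(plain + with_upsert, end="")
```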

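Don't print curl's output directly in the terminal:
- Cause: curl returns the response as a single line of JSON, and with many items in the batch that line becomes very long. The -s (silent) flag does not remove it: -s is a curl option that hides curl's own progress output, while the JSON returned by Elasticsearch is still printed.
- Fix: redirect the output to a file (or to /dev/null if you don't need it)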
```
curl -s 'http://example.com' > /dev/null
```
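Discarding the response also discards per-item errors, because a bulk request can return HTTP 200 while individual items fail. If the output was saved to a file, a short script can reduce it to a summary. The following is a sketch only: summarize_bulk_response is a hypothetical helper, and the sample response is invented for the demo, modeled on the response format shown earlier.

```python
import json


def summarize_bulk_response(response_text):
    """Reduce a _bulk response to its error flag, item count, and failed items."""
    response = json.loads(response_text)
    items = response.get("items", [])
    failures = []
    for item in items:
        # Each item has a single key naming the action (index, create, update, delete).
        for action, result in item.items():
            if "error" in result:
                error = result["error"]
                reason = error.get("reason") if isinstance(error, dict) else str(error)
                failures.append((action, result.get("_id"), result.get("status"), reason))
    return {"errors": response.get("errors", False), "total": len(items), "failures": failures}


# Invented sample: one item succeeds, one update fails because the document is missing.
sample_response = json.dumps({
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "wordbank", "_type": "movie", "_id": "1",
                   "result": "created", "status": 201}},
        {"update": {"_index": "wordbank", "_type": "movie", "_id": "9", "status": 404,
                    "error": {"type": "document_missing_exception",
                              "reason": "[movie][9]: document missing"}}},
    ],
})
summary = summarize_bulk_response(sample_response)
print(summary["total"], "items,", len(summary["failures"]), "failed")
```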

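Reportedly, you should avoid repeating the index and type on every action line (source). Perhaps because my data set is small (about 20,000 entries), I saw little difference in speed. Putting them in the URL once does make the file smaller, though.

This form is recommended: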

POST /website/log/_bulk
{ "index": {}}
{ "event": "User logged in" }

Rather than this:

POST /_bulk
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "Overriding the default type" }
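As a rough illustration of the recommended form, the sketch below puts the index and type into the request path once and emits empty action metadata for each document. build_compact_bulk is a hypothetical helper. The /index/type/_bulk path matches the 6.x-era example above; on versions without mapping types the type segment would have to be dropped.

```python
import json


def build_compact_bulk(docs, index="website", doc_type="log"):
    """Return (path, body): index and type live in the path, action lines stay empty."""
    path = "/{}/{}/_bulk".format(index, doc_type)
    body = "".join(
        '{"index": {}}\n' + json.dumps(doc, ensure_ascii=False) + "\n"
        for doc in docs
    )
    return path, body


path, compact_body = build_compact_bulk([{"event": "User logged in"}])
print("POST", path)
print(compact_body, end="")
```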

Article link: https://www.haomeiwen.com/subject/ksrtdftx.html
