Elasticsearch Tutorial Series (2) - Commands (Part 2): Batch Processing, Data Operations, and Search

Author: 抹布先生M | Published: 2019-03-19 00:02

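    Articles in this series so far:
    Elasticsearch Tutorial Series (0) - Getting Started and Installation
    Elasticsearch Tutorial Series (1) - Commands (Part 1): Cluster Health, Indices, and Document Operations
    Elasticsearch Tutorial Series (2) - Commands (Part 2): Batch Processing, Data Operations, and Search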

    In the previous article we covered cluster health, indices, and CRUD operations on documents. Now let's move on to some new commands.

    Batch Processing

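    Besides indexing, updating, and deleting individual documents, Elasticsearch can perform any of these operations in batches through the _bulk API. This matters because it is a very efficient mechanism: it runs multiple operations as fast as possible with as few network round trips as possible.
    As a simple example, the following call indexes two documents (ID 1 - John Doe and ID 2 - Jane Doe) in one bulk operation. The request body is newline-delimited JSON: every action line and every source document must sit on a single line, and the body must end with a newline.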

    [builder@master ~]$ curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d '
    {"index":{"_id":"1"}}
    {"name": "John Doe"}
    {"index":{"_id":"2"}}
    {"name": "Jane Doe"}
    '
    {
        "took": 19,
        "errors": false,
        "items": [
        {
            "index":
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "1",
                "_version": 6,
                "result": "updated",
                "_shards":
                {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 5,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "index":
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "2",
                "_version": 1,
                "result": "created",
                "_shards":
                {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 2,
                "status": 201
            }
        }]
    }
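Because the bulk body is newline-delimited JSON, it is easy to break by hand (for example by pretty-printing it). The minimal Python sketch below, using only the standard library, turns (id, document) pairs into the same kind of index-action body used above. The function name `build_index_bulk_body` is hypothetical and not part of any Elasticsearch client.

```python
import json


def build_index_bulk_body(docs):
    """Build an NDJSON _bulk body with one "index" action per (doc_id, source) pair.

    Hypothetical helper for illustration; not an Elasticsearch client API.
    """
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: the operation and the target document ID.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_index_bulk_body([("1", {"name": "John Doe"}), ("2", {"name": "Jane Doe"})])
    print(body, end="")
```

The printed body could be saved to a file and sent with curl's `--data-binary @file`, which keeps the newlines intact.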
    

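    The next example sends the same two index actions again. Because documents 1 and 2 already exist, both are overwritten: each item in the response reports "result": "updated", and the _version of each document goes up by one: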

    [builder@master ~]$ curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d '
    {"index":{"_id":"1"}}
    {"name": "John Doe"}
    {"index":{"_id":"2"}}
    {"name": "Jane Doe"}
    '
    {
        "took": 15,
        "errors": false,
        "items": [
        {
            "index":
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "1",
                "_version": 7,
                "result": "updated",
                "_shards":
                {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 6,
                "_primary_term": 2,
                "status": 200
            }
        },
        {
            "index":
            {
                "_index": "customer",
                "_type": "_doc",
                "_id": "2",
                "_version": 2,
                "result": "updated",
                "_shards":
                {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 1,
                "_primary_term": 2,
                "status": 200
            }
        }]
    }
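The response above contains only `index` items, because the request re-sent two index actions. To actually update document 1 and delete document 2 in a single batch, the body would use an `update` action followed by a partial document wrapped in `"doc"`, and a `delete` action with no source line. The Python sketch below only builds such a body; the helper name and the sample new name are illustrative assumptions.

```python
import json


def build_update_delete_body(update_id, partial_doc, delete_id):
    """Build an NDJSON _bulk body: a partial update of one document, a delete of another.

    Hypothetical helper for illustration only.
    """
    lines = [
        # "update" is followed by a line holding the partial document under "doc".
        json.dumps({"update": {"_id": update_id}}),
        json.dumps({"doc": partial_doc}),
        # "delete" only needs the ID, so no source line follows it.
        json.dumps({"delete": {"_id": delete_id}}),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_update_delete_body("1", {"name": "John Doe becomes Jane Doe"}, "2"), end="")
```

Sent to the same `customer/_doc/_bulk` endpoint, this body should produce one `update` item and one `delete` item in the response's `items` array.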
    

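    Note that a delete action is not followed by a source document line, because a delete only needs the document ID.
    The bulk API does not fail as a whole when one of its actions fails. If a single action fails for any reason, Elasticsearch keeps processing the remaining actions after it. When the bulk API returns, it reports a status for every action, in the order the actions were sent, so you can check whether a particular action failed.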
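Since one failed action does not fail the whole request, client code has to inspect the response itself. A minimal Python sketch, assuming the response has already been parsed into a dict: the top-level `errors` flag is true when at least one action failed, and a failed item carries an `error` object next to its `status`. The function name is hypothetical.

```python
def failed_bulk_items(response):
    """Return (position, action, _id, status) for each item whose result carries an "error".

    `response` is a _bulk response already parsed into a dict.
    Hypothetical helper for illustration, not an Elasticsearch client API.
    """
    if not response.get("errors"):
        # Elasticsearch sets "errors": false when no action failed.
        return []
    failures = []
    # Items come back in the same order as the actions in the request body.
    for position, item in enumerate(response["items"]):
        # Each item has a single key: the action type ("index", "update", "delete", ...).
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("_id"), result.get("status")))
    return failures
```

For the two responses shown in this section, `errors` is false, so the function returns an empty list.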

    Data Operations

    Importing Data

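    Suppose a folder contains a JSON file with the following content. It is already in bulk format: an index action line followed by a source line for each account.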

    [builder@master ~/Developer/esTempData]$ cat accounts.json
    {"index":{"_id":"1"}}
    {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    
    {"index":{"_id":"6"}}
    {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
    ...
    # The file contains 1000 records; only 2 are shown above
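Before importing, it can help to confirm that the file really pairs every action with a source line and to count the records. The Python sketch below is a hypothetical helper, not an Elasticsearch tool; it assumes the file contains only `index` actions (as in the listing above) and, as a simplifying assumption, skips blank lines.

```python
import json


def count_bulk_documents(lines):
    """Count "index" actions in an NDJSON bulk file and check each has a source line.

    `lines` is any iterable of text lines (for example an open file).
    Blank lines are skipped here as a simplifying assumption. Hypothetical helper.
    """
    rows = [line for line in lines if line.strip()]
    count = 0
    i = 0
    while i < len(rows):
        action = json.loads(rows[i])
        if "index" not in action:
            raise ValueError(f"non-blank line {i + 1}: expected an index action, got {action}")
        if i + 1 >= len(rows):
            raise ValueError(f"non-blank line {i + 1}: index action has no source document")
        json.loads(rows[i + 1])  # the source must be valid single-line JSON
        count += 1
        i += 2
    return count
```

For example, `count_bulk_documents(open("accounts.json", encoding="utf-8"))` should return 1000 for the file described above.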
    

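    To import the file into Elasticsearch, run the following command from the directory that contains it. Use --data-binary rather than -d: when reading from a file, -d strips out newlines, which would break the bulk NDJSON format.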

    [builder@master ~/Developer/esTempData]$ curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
    ...
    
    ## Check the indices: the bank index (second row) now holds 1000 documents
    [builder@master ~]$ curl -X GET "http://localhost:9200/_cat/indices?v"
    health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   .kibana_1 WShElb71RVigvhHRsU5vIA   1   0          3            0     11.9kb         11.9kb
    yellow open   bank      LLrKZKoNT-ifFpiv-dBc9w   5   1       1000            0    474.6kb        474.6kb
    yellow open   customer  jrkOIUCjTLec7_5OcwldYw   5   1          3            0     10.9kb         10.9kb
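The `_cat/indices?v` output is a whitespace-aligned table (the `v` flag adds the header row), so checking the document count from a script takes only a few lines. A minimal Python sketch, assuming the output has already been captured as a string and that no column value contains spaces (true for the columns shown here); the function name is hypothetical.

```python
def parse_cat_indices(text):
    """Parse GET _cat/indices?v output into a dict keyed by index name.

    The first line is the header; columns are whitespace-separated.
    Hypothetical helper for illustration.
    """
    table = [line.split() for line in text.strip().splitlines()]
    header, rows = table[0], table[1:]
    name_col = header.index("index")
    # Map each index name to a {column: value} dict; values stay strings.
    return {row[name_col]: dict(zip(header, row)) for row in rows}
```

Applied to the output above, `parse_cat_indices(output)["bank"]["docs.count"]` gives `"1000"`.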
    

    The Search API

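    There are two basic ways to run a search: send the search parameters in the REST request URI, or send them in a JSON request body. The request-body method is more expressive and lets you define searches in a more readable JSON format. We will try one request-URI example, but the rest of this tutorial uses the request-body method exclusively.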

    The REST API for search is available at the _search endpoint. This example returns all documents in the bank index:

    [builder@master ~]$ curl -X GET 'localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty'
    

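    The q=* parameter tells Elasticsearch to match all documents in the index. The sort=account_number:asc parameter sorts the results in ascending order by each document's account_number field. The pretty parameter, again, tells Elasticsearch to return pretty-printed JSON.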

    Response (partially shown):

    {
      "took" : 63,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 1000,
        "max_score" : null,
        "hits" : [ {
          "_index" : "bank",
          "_type" : "_doc",
          "_id" : "0",
          "sort": [0],
          "_score" : null,
          "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
        }, {
          "_index" : "bank",
          "_type" : "_doc",
          "_id" : "1",
          "sort": [1],
          "_score" : null,
          "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
        }, ...
        ]
      }
    }
    

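    Meaning of the response fields:

    • took - time in milliseconds Elasticsearch took to execute the search
    • timed_out - whether the search timed out
    • _shards - how many shards were searched, with counts of shards that succeeded/failed
    • hits - the search results
    • hits.total - total number of documents matching the search criteria
    • hits.hits - the actual array of search results (defaults to the first 10 documents)
    • hits.sort - the sort key for each result (omitted when sorting by score)
    • hits._score and max_score - ignore these fields for now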
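The fields above map directly onto a parsed response. A minimal Python sketch, assuming the response has been loaded with `json.loads`; `hits.total` is a plain number in the output shown here, while Elasticsearch 7 and later return an object with a `value` key, so both shapes are handled. The function name is hypothetical.

```python
def summarize_search(response):
    """Return (total, [(doc_id, sort_values, source), ...]) from a parsed _search response.

    Hypothetical helper for illustration.
    """
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        # Elasticsearch 7+ reports {"value": n, "relation": ...} instead of a number.
        total = total["value"]
    rows = [
        # "sort" is absent when results are ordered by score, hence .get().
        (hit["_id"], hit.get("sort"), hit["_source"])
        for hit in hits["hits"]
    ]
    return total, rows
```

For the URI search above, the total would be 1000 and the first row's sort values would be `[0]`.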

    Here is the same search using the JSON request-body method:

    $ curl -X GET 'localhost:9200/bank/_search' -d '{
      "query": { "match_all": {} },
      "sort": [
        { "account_number": "asc" }
      ]
    }' -H "Content-Type:application/json"
    
    # Response:
    {"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1000,"max_score":null,"hits":[{"_index":"bank","_type":"_doc","_id":"0","_score":null,"_source":{"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"},"sort":[0]}...
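For comparison with the curl command, here is a minimal Python sketch (standard library only) that builds the same request-body search. The helper name `build_search_request` and the `http://localhost:9200` base URL are assumptions for illustration; the request is only constructed, and the commented-out lines show how it could be sent.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Build (but do not send) a request-body search against <base_url>/<index>/_search.

    Hypothetical helper; pass the result to urllib.request.urlopen() to execute it.
    """
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # _search also accepts POST
    )


# Same query as the curl example: match every document, sort by account_number ascending.
query = {"query": {"match_all": {}}, "sort": [{"account_number": "asc"}]}
req = build_search_request("http://localhost:9200", "bank", query)
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```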
    
        Original link: https://www.haomeiwen.com/subject/nkfwmqtx.html