The Elasticsearch Python API

Author: momo1023 | Published 2019-12-05 17:33

Importing the client

from elasticsearch import Elasticsearch

Creating an index

es = Elasticsearch()
 
result = es.indices.create(index='news', ignore=400)
print(result)

{'error': {'root_cause': [{'type': 'resource_already_exists_exception', 'reason': 'index [news/habhrfkzSey5_GR-WmZPYA] already exists', 'index_uuid': 'habhrfkzSey5_GR-WmZPYA', 'index': 'news'}], 'type': 'resource_already_exists_exception', 'reason': 'index [news/habhrfkzSey5_GR-WmZPYA] already exists', 'index_uuid': 'habhrfkzSey5_GR-WmZPYA', 'index': 'news'}, 'status': 400}

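When the index is created for the first time, the response contains an acknowledged field set to True, which indicates that the create operation succeeded. The output above is what comes back on a repeated run, when the index already exists.

Creating an index that already exists triggers a 400 error. Passing ignore=400 suppresses it: instead of raising an exception, the client returns the error body as a dict.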

es = Elasticsearch()
 
result = es.indices.create(index='news', ignore=400)
print(result)

{'error': {'root_cause': [{'type': 'resource_already_exists_exception', 'reason': 'index [news/habhrfkzSey5_GR-WmZPYA] already exists', 'index_uuid': 'habhrfkzSey5_GR-WmZPYA', 'index': 'news'}], 'type': 'resource_already_exists_exception', 'reason': 'index [news/habhrfkzSey5_GR-WmZPYA] already exists', 'index_uuid': 'habhrfkzSey5_GR-WmZPYA', 'index': 'news'}, 'status': 400}
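
Because ignore=400 turns the error into an ordinary return value, the caller has to inspect the response to tell the two cases apart. The following is a minimal sketch of such a check. The helper name index_was_created is hypothetical, and it assumes only the response shapes described above (an acknowledged field on success, a resource_already_exists_exception on a repeated create).

```python
def index_was_created(result):
    """Return True if the index was newly created, False if it already existed.

    `result` is the dict returned by es.indices.create(..., ignore=400).
    Any other error is raised as a RuntimeError so it is not silently lost.
    """
    if result.get('acknowledged'):
        return True
    error = result.get('error', {})
    if isinstance(error, dict) and error.get('type') == 'resource_already_exists_exception':
        return False
    raise RuntimeError('unexpected response: %r' % (result,))


# Abbreviated copy of the 400 response shown above.
already_exists = {
    'error': {'type': 'resource_already_exists_exception', 'index': 'news'},
    'status': 400,
}
print(index_was_created(already_exists))           # False
print(index_was_created({'acknowledged': True}))   # True
```

With a running cluster, the helper would be called as index_was_created(es.indices.create(index='news', ignore=400)).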

Deleting an index

es = Elasticsearch()
 
result = es.indices.delete(index='news', ignore=[400, 404])
print(result)

{'acknowledged': True}

The ignore parameter is used here as well, so that deleting an index that does not exist does not raise an exception and interrupt the program.

es = Elasticsearch()
 
result = es.indices.delete(index='faq', ignore=[400, 404])
print(result)

{'acknowledged': True}
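
Deleting and then creating an index is a common pair of steps when experimenting, so it can be convenient to wrap them together. The sketch below is one way to do that. recreate_index is a hypothetical helper, and FakeClient is an in-memory stand-in included only so the example runs without a cluster; it is not part of the elasticsearch package and does not reproduce real error responses.

```python
def recreate_index(es, name):
    """Drop `name` if it exists, then create it fresh.

    `es` is an Elasticsearch client; only es.indices.delete and
    es.indices.create are used, with the same arguments as above.
    """
    # A 404 here just means the index did not exist yet.
    es.indices.delete(index=name, ignore=[400, 404])
    return es.indices.create(index=name)


class _FakeIndices:
    """Offline stand-in for es.indices (assumption, for illustration only)."""

    def __init__(self):
        self.names = set()

    def delete(self, index, ignore=()):
        self.names.discard(index)
        return {'acknowledged': True}

    def create(self, index, ignore=()):
        self.names.add(index)
        return {'acknowledged': True, 'index': index}


class FakeClient:
    def __init__(self):
        self.indices = _FakeIndices()


client = FakeClient()
print(recreate_index(client, 'news'))
```

Against a real cluster, pass the Elasticsearch() instance instead of FakeClient().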

Inserting data

Like MongoDB, Elasticsearch accepts structured dict data directly on insert. To insert a document, call create():

from elasticsearch import Elasticsearch
 
es = Elasticsearch()

es.indices.create(index='news', ignore=400)
 
data = {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm'}
result = es.create(index='news', doc_type='faq', id=1, body=data)
print(result)

{'_index': 'news', '_type': 'faq', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

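We first define a news item, data, containing a title and a link, and then insert it by calling create(). Four arguments are passed: index is the index name, doc_type the document type, body the document content, and id the unique identifier of the document.

The result field in the response is created, which means the insert succeeded.

Documents can also be inserted with index(). Unlike create(), which requires an id to identify the document, index() does not: if no id is given, one is generated automatically.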

es.index(index='news', body=data)

{'_index': 'news',
 '_type': '_doc',
 '_id': '33Fm1W4BMbliRpYyFApv',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 1,
 '_primary_term': 1}

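Semantically, create() is an index() operation with op_type set to create, and older versions of the Python client implemented it as a thin wrapper around index(). The practical difference is that create() fails with a 409 conflict if a document with the same id already exists, whereas index() overwrites it.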
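
The difference between the two calls can be illustrated with a toy in-memory model. This is not the client's implementation: ToyIndex and ConflictError are hypothetical names, and the model only mimics the behavior that create refuses to overwrite an existing id while index overwrites it and increments the version.

```python
class ConflictError(Exception):
    """Stands in for the 409 conflict Elasticsearch reports."""


class ToyIndex:
    def __init__(self):
        self.docs = {}     # id -> (version, body)
        self._auto = 0     # counter for generated ids

    def index(self, body, id=None):
        # Without an id, one is generated (real ids look different).
        if id is None:
            self._auto += 1
            id = 'auto-%d' % self._auto
        version = self.docs.get(id, (0, None))[0] + 1
        self.docs[id] = (version, body)
        outcome = 'created' if version == 1 else 'updated'
        return {'_id': id, '_version': version, 'result': outcome}

    def create(self, id, body):
        # create() is index() plus an "must not exist yet" check.
        if id in self.docs:
            raise ConflictError(id)
        return self.index(body, id=id)


toy = ToyIndex()
first = toy.create('1', {'title': 'a'})
second = toy.index({'title': 'b'}, id='1')   # overwrites, version 2
auto = toy.index({'title': 'c'})             # id generated
print(first, second, auto)
```

Calling toy.create('1', ...) a second time raises ConflictError, which is the case index() silently handles by overwriting.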

Updating data

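To update a document, pass its id and the new content to update().
The call below fails, however. The Update API expects the changed fields to be wrapped in a doc key (or supplied as a script); when the fields are passed at the top level of body, Elasticsearch rejects the request with the x_content_parse_exception shown in the traceback.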

es = Elasticsearch()
 
data = {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2019'}
result = es.update(index='news', doc_type='faq', body=data, id=1)
print(result)

---------------------------------------------------------------------------

RequestError                              Traceback (most recent call last)

<ipython-input-9-769542d9ad85> in <module>
      2 
      3 data = {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2019'}
----> 4 result = es.update(index='news', doc_type='faq', body=data, id=1)
      5 print(result)


/usr/local/lib/python3.7/site-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
     82                 if p in kwargs:
     83                     params[p] = kwargs.pop(p)
---> 84             return func(*args, params=params, **kwargs)
     85 
     86         return _wrapped


/usr/local/lib/python3.7/site-packages/elasticsearch/client/__init__.py in update(self, index, id, doc_type, body, params)
    656                 raise ValueError("Empty value passed for a required argument.")
    657         return self.transport.perform_request(
--> 658             "POST", _make_path(index, doc_type, id, "_update"), params=params, body=body
    659         )
    660 


/usr/local/lib/python3.7/site-packages/elasticsearch/transport.py in perform_request(self, method, url, headers, params, body)
    356                     headers=headers,
    357                     ignore=ignore,
--> 358                     timeout=timeout,
    359                 )
    360 


/usr/local/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore, headers)
    255                 method, full_url, url, body, duration, response.status, raw_data
    256             )
--> 257             self._raise_error(response.status, raw_data)
    258 
    259         self.log_request_success(


/usr/local/lib/python3.7/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
    180 
    181         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
--> 182             status_code, error_message, additional_info
    183         )
    184 


RequestError: RequestError(400, 'x_content_parse_exception', '[1:2] [UpdateRequest] unknown field [title], parser not found')
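
The error message points at the request body: the Update API does not accept document fields such as title at the top level, only under a doc key (or as a script). A minimal sketch of the fix follows, assuming the document with id 1 still exists in the news index. wrap_partial_doc is a hypothetical helper name.

```python
def wrap_partial_doc(fields):
    """Wrap the changed fields in the {'doc': ...} envelope the Update API expects."""
    return {'doc': dict(fields)}


# Only the fields that should change need to be sent.
data = {'date': '2019'}
body = wrap_partial_doc(data)
print(body)

# With a running cluster, the failing call above would then become:
# result = es.update(index='news', doc_type='faq', id=1, body=body)
```

Fields not listed under doc are left untouched, so there is no need to resend the title and url.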

Deleting data

To delete a document, call delete() with the id of the document to remove.

es = Elasticsearch()
 
result = es.delete(index='news', id=1)
print(result)

{'_index': 'news', '_type': '_doc', '_id': '1', '_version': 2, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}

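Querying data

For Chinese text we need to install a word segmentation (analysis) plugin. Here we use elasticsearch-analysis-ik, available on GitHub at https://github.com/medcl/elasticsearch-analysis-ik . It is installed with elasticsearch-plugin, another command-line tool that ships with Elasticsearch. The version installed here is 7.0.1; make sure it matches your Elasticsearch version. The command is: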

elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.0.1/elasticsearch

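Replace the version number in the command with the version of your Elasticsearch installation.

Restart Elasticsearch after installing; it loads installed plugins automatically.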

es = Elasticsearch()
 
mapping = {
    'properties': {
        'title': {
            'type': 'text',
            'analyzer': 'ik_max_word',
            'search_analyzer': 'ik_max_word'
        }
    }
}
 
es.indices.delete(index='news', ignore=[400, 404])
 
es.indices.create(index='news', ignore=400)
 
result = es.indices.put_mapping(index='news', doc_type='faq', body=mapping, include_type_name=True)
print(result)

{'acknowledged': True}

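Here we first delete the previous index, create a new one, and then update its mapping. The mapping declares the title field as type text and sets both the analyzer and the search_analyzer to ik_max_word, that is, the Chinese analysis plugin installed above. If no analyzer is specified, the default standard analyzer is used, which splits Chinese text into single characters.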
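
If more than one field needs the same treatment, the mapping body can be generated rather than written out by hand. The sketch below shows one way to do so. ik_text_mapping is a hypothetical helper, and the default analyzer name is the one used above.

```python
def ik_text_mapping(fields, analyzer='ik_max_word'):
    """Build a mapping body that analyzes each named text field with `analyzer`."""
    return {
        'properties': {
            field: {
                'type': 'text',
                'analyzer': analyzer,
                'search_analyzer': analyzer,
            }
            for field in fields
        }
    }


mapping = ik_text_mapping(['title'])
print(mapping)
```

For ['title'] this produces the same dict as the hand-written mapping above, so it can be passed as body to put_mapping in the same way.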

Next, we insert a few new documents:

datas = [
    {
        'title': '美国留给伊拉克的是个烂摊子吗',
        'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm',
        'date': '2011-12-16'
    },
    {
        'title': '公安部：各地校车将享最高路权',
        'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml',
        'date': '2011-12-16'
    }, 
    { 
        'title': '中韩渔警冲突调查：韩警平均每天扣1艘中国渔船', 
        'url': 'https://news.qq.com/a/20111216/001044.htm',
        'date': '2011-12-17'
    },
    {
        'title': '中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首',
        'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml',
        'date': '2011-12-18'
    }
]

for data in datas:
    es.index(index='news', body=data)
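
The loop above sends one request per document. For larger batches, the client's helpers module offers a bulk function that takes an iterable of action dicts. The sketch below only builds those actions, so it runs without a cluster. bulk_actions is a hypothetical helper name, and the sample documents are placeholders.

```python
def bulk_actions(index, docs):
    """Yield one action dict per document, in the shape helpers.bulk accepts."""
    for doc in docs:
        yield {'_index': index, '_source': doc}


sample = [{'title': 'a'}, {'title': 'b'}]   # placeholder documents
actions = list(bulk_actions('news', sample))
print(actions)

# With a running cluster:
# from elasticsearch import helpers
# helpers.bulk(es, bulk_actions('news', datas))
```

Because bulk_actions is a generator, the documents are consumed lazily when they are handed to helpers.bulk.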

Now query the index. Called without a query body, search() matches every document in the index:

result = es.search(index='news')
print(result)

{'took': 2, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 4, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'news', '_type': 'faq', '_id': '4HFm1W4BMbliRpYyaAqj', '_score': 1.0, '_source': {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2011-12-16'}}, {'_index': 'news', '_type': 'faq', '_id': '4XFm1W4BMbliRpYyaQoE', '_score': 1.0, '_source': {'title': '公安部：各地校车将享最高路权', 'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml', 'date': '2011-12-16'}}, {'_index': 'news', '_type': 'faq', '_id': '4nFm1W4BMbliRpYyaQoc', '_score': 1.0, '_source': {'title': '中韩渔警冲突调查：韩警平均每天扣1艘中国渔船', 'url': 'https://news.qq.com/a/20111216/001044.htm', 'date': '2011-12-17'}}, {'_index': 'news', '_type': 'faq', '_id': '43Fm1W4BMbliRpYyaQo6', '_score': 1.0, '_source': {'title': '中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首', 'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml', 'date': '2011-12-18'}}]}}

The matching documents are returned in the hits field. Within it, total gives the number of matching documents and max_score the highest relevance score.
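
Since the response is a plain nested dict, pulling out the interesting parts takes only a few lines. The following sketch uses an abbreviated copy of the response above. summarize_hits is a hypothetical helper name.

```python
def summarize_hits(result):
    """Return (total, [(score, title), ...]) from a search response dict."""
    hits = result['hits']
    total = hits['total']
    # Elasticsearch 7 reports total as {'value': n, 'relation': ...};
    # earlier versions returned a plain integer.
    if isinstance(total, dict):
        total = total['value']
    rows = [(hit['_score'], hit['_source']['title']) for hit in hits['hits']]
    return total, rows


# Abbreviated copy of the response above; the hits list is cut to two entries.
sample = {
    'hits': {
        'total': {'value': 4, 'relation': 'eq'},
        'max_score': 1.0,
        'hits': [
            {'_score': 1.0, '_source': {'title': '美国留给伊拉克的是个烂摊子吗'}},
            {'_score': 1.0, '_source': {'title': '公安部：各地校车将享最高路权'}},
        ],
    }
}
total, rows = summarize_hits(sample)
print(total, rows)
```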

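We can also run a full-text search, which is where Elasticsearch shows its strength as a search engine.

The query is written in the DSL that Elasticsearch supports: match requests a full-text search, the field searched is title, and the search text is “中国领事馆” (“Chinese consulate”). The code and results are as follows: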

import json

dsl = {
    'query': {
        'match': {
            'title': '中国领事馆'
        }
    }
}
 
es = Elasticsearch()
result = es.search(index='news', body=dsl)
print(json.dumps(result, indent=2, ensure_ascii=False))

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 3.7446182,
    "hits": [
      {
        "_index": "news",
        "_type": "faq",
        "_id": "43Fm1W4BMbliRpYyaQo6",
        "_score": 3.7446182,
        "_source": {
          "title": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首",
          "url": "http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml",
          "date": "2011-12-18"
        }
      },
      {
        "_index": "news",
        "_type": "faq",
        "_id": "4nFm1W4BMbliRpYyaQoc",
        "_score": 0.60291106,
        "_source": {
          "title": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船",
          "url": "https://news.qq.com/a/20111216/001044.htm",
          "date": "2011-12-17"
        }
      }
    ]
  }
}

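There are two matches: the first scores 3.74 and the second 0.60. The first document contains both “中国” (China) and “领事馆” (consulate). The second does not contain “领事馆” but does contain “中国”, so it is still returned, with a much lower score.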
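
The reasoning above can be mimicked with a toy term-overlap count. This is not how Elasticsearch computes scores. It assumes, as the explanation does, that the analyzer splits the search text into at least the terms “中国” and “领事馆”, and it simply counts how many of them occur in each title. term_overlap is a hypothetical helper name.

```python
titles = [
    '美国留给伊拉克的是个烂摊子吗',
    '公安部：各地校车将享最高路权',
    '中韩渔警冲突调查：韩警平均每天扣1艘中国渔船',
    '中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首',
]

# Assumed tokens for the search text "中国领事馆".
query_terms = ['中国', '领事馆']


def term_overlap(title, terms):
    """Count how many of the query terms occur in the title."""
    return sum(1 for term in terms if term in title)


scored = [(term_overlap(title, query_terms), title) for title in titles]
# Keep only titles sharing at least one term, best match first.
matches = sorted((pair for pair in scored if pair[0] > 0),
                 key=lambda pair: pair[0], reverse=True)
for count, title in matches:
    print(count, title)
```

The two titles that survive, and their order, agree with the search result above; the real scores additionally depend on term frequency and field length.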

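So the search runs a full-text match against the given field and sorts the results by relevance to the search terms, which is the basic skeleton of a search engine.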

Reference: https://blog.csdn.net/devcloud/article/details/91446259
