Elasticsearch 08: keyword and text

Author: 极光火狐狸 | Source: published 2017-09-20 17:06, read 1374 times

Starting with version 5.0.0, Elasticsearch splits the string type into two new types: text and keyword.

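Keyword type:

Stores data such as email addresses, phone numbers, hostnames, status codes, postal codes, tags, age, and gender.

Used for filtering (for example: select * from x where status='open'), sorting, and aggregation (statistics).

The complete value is written to the inverted index as a single term, without being analyzed.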
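As a rough illustration of the filter / sort / aggregate use case, the sketch below builds a search request body with the standard `term` filter, `sort`, and `terms` aggregation clauses. The helper name `build_status_query` and the field names `status` and `hostname` are hypothetical; they assume an index where both fields are mapped as keyword.

```python
def build_status_query(status, size=10):
    """Build a search body that filters, sorts and aggregates on keyword fields.

    The field names "status" and "hostname" are hypothetical examples.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Roughly: WHERE status = '<status>' (exact match, no scoring)
                "filter": [{"term": {"status": status}}]
            }
        },
        # Roughly: ORDER BY hostname ASC
        "sort": [{"hostname": {"order": "asc"}}],
        "aggs": {
            # Roughly: GROUP BY status, returning a document count per value
            "by_status": {"terms": {"field": "status"}}
        },
    }


body = build_status_query("open")
```

The resulting dict could be passed as the `body` argument of `es.search(...)`, in the same way as the scripts later in this article.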

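Text type:

Stores data for full-text search, for example: email bodies, addresses, code blocks, and blog post content.

By default, the text is tokenized with the standard analyzer and the resulting terms are written to the inverted index.

By default, queries are analyzed with the same standard analyzer, and hits are scored for relevance based on term matches and term frequency.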
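The difference between the two types can be sketched with a toy inverted index in plain Python. This is not how Elasticsearch is implemented; `text_tokens` is only a crude stand-in for the standard analyzer (lowercase, Latin words kept whole, each CJK character as its own token), and all function names here are made up for the illustration.

```python
import re
from collections import defaultdict


def keyword_tokens(value):
    # A keyword field indexes the whole value as one unmodified term.
    return [value]


def text_tokens(value):
    # Crude approximation of the standard analyzer: lowercase the input,
    # keep runs of Latin letters/digits as words, and emit every CJK
    # character as a separate token. Punctuation is dropped.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", value.lower())


def build_index(docs, tokenizer):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in tokenizer(value):
            index[token].add(doc_id)
    return index


def lookup(index, term):
    # Return the sorted ids of the documents indexed under this exact term.
    return sorted(index.get(term, ()))


docs = {
    1: "ElasticSearch好像没有那么简单喔!",
    3: "我觉得elasticsearch完全可以替代关系型数据库了.",
}

keyword_index = build_index(docs, keyword_tokens)
text_index = build_index(docs, text_tokens)

print(lookup(text_index, "elasticsearch"))     # [1, 3]
print(lookup(keyword_index, "elasticsearch"))  # []
print(lookup(keyword_index, "ElasticSearch好像没有那么简单喔!"))  # [1]
```

In the toy text index, both documents are reachable through the term "elasticsearch"; in the toy keyword index, a document is reachable only through its complete, unmodified value.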

Keyword

"""
Exact-value (keyword) field
"""
from elasticsearch import Elasticsearch
from pprint import pprint
import time

es = Elasticsearch(hosts=["192.168.1.100"])

# Delete the index if it already exists
if es.indices.exists(index="pra_keyword"):
    es.indices.delete(index="pra_keyword")

# Create the index with "title" mapped as keyword
if not es.indices.exists(index="pra_keyword"):
    s1 = es.indices.create(
        index="pra_keyword",
        body={
            "mappings": {
                "my_type": {
                    "properties": {
                        "title": {
                            "type": "keyword",
                        }
                    }
                }
            }
        }
    )
    pprint(s1)

    # Prepare the data
    s2 = es.bulk(
        index="pra_keyword",
        doc_type="my_type",
        body=[
            {"create": {"_id": 1}},
            {"title": "ElasticSearch好像没有那么简单喔!"},
            {"create": {"_id": 2}},
            {"title": "Elasticsearch其实没那么难, 就是api多了点, 一个一个攻克呗."},
            {"create": {"_id": 3}},
            {"title": "我觉得elasticsearch完全可以替代关系型数据库了."},
            {"create": {"_id": 4}},
            {"title": "你这么说太武断了把, es毕竟没有事务和回滚, 安全也很粗糙."}
        ]
    )

    time.sleep(10)  # Wait for the index refresh so the new documents become searchable.


# Search (no hit): a keyword field only matches its complete, unmodified value
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title": "elasticsearch"
            }
        }
    }
)
print("Search <elasticsearch>: no hit.")
pprint(s)


# Search (no hit): a different letter case of a partial value does not match either
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title": "Elasticsearch"
            }
        }
    }
)
print("Search <Elasticsearch>: no hit.")
pprint(s)


# Search (hit): the query equals the complete stored value of document 1
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title": "ElasticSearch好像没有那么简单喔!"
            }
        }
    }
)
print("Search <ElasticSearch好像没有那么简单喔!>: hit.")
pprint(s)


# Analyze: show the tokens produced for the "title" field (a keyword field yields one token)
s = es.indices.analyze(
    index="pra_keyword",
    body={
        "field": "title",
        "text": "ElasticSearch好像没有那么简单喔!"
    }
)
print("Analyze: <ElasticSearch好像没有那么简单喔!>")
pprint(s)

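Parameter: fields

In the example above, the keyword field can only be used for exact matching, aggregation, and sorting; it cannot be used for full-text matching.
To support aggregation and sorting as well as full-text search on the same data, add a text sub-field through the fields parameter.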

"""
Exact-value (keyword) field with a full-text sub-field added via "fields"
"""
from elasticsearch import Elasticsearch
from pprint import pprint
import time

es = Elasticsearch(hosts=["192.168.1.100"])

# Delete the index if it already exists
if es.indices.exists(index="pra_keyword"):
    es.indices.delete(index="pra_keyword")

# Create the index: "title" is a keyword with a text sub-field "make_title_searchable"
if not es.indices.exists(index="pra_keyword"):
    s1 = es.indices.create(
        index="pra_keyword",
        body={
            "mappings": {
                "my_type": {
                    "properties": {
                        "title": {
                            "type": "keyword",
                            "fields": {
                                "make_title_searchable": {
                                    "type": "text"
                                }
                            }
                        }
                    }
                }
            }
        }
    )
    pprint(s1)

    # Prepare the data
    s2 = es.bulk(
        index="pra_keyword",
        doc_type="my_type",
        body=[
            {"create": {"_id": 1}},
            {"title": "ElasticSearch好像没有那么简单喔!"},
            {"create": {"_id": 2}},
            {"title": "Elasticsearch其实没那么难, 就是api多了点, 一个一个攻克呗."},
            {"create": {"_id": 3}},
            {"title": "我觉得elasticsearch完全可以替代关系型数据库了."},
            {"create": {"_id": 4}},
            {"title": "你这么说太武断了把, es毕竟没有事务和回滚, 安全也很粗糙."}
        ]
    )

    time.sleep(10)  # Wait for the index refresh so the new documents become searchable.


# Search (no hit): "title" itself is still a keyword field
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title": "elasticsearch"
            }
        }
    }
)
print("Search <elasticsearch>: no hit.")
pprint(s)


# Search (no hit): a different letter case of a partial value does not match either
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title": "Elasticsearch"
            }
        }
    }
)
print("Search <Elasticsearch>: no hit.")
pprint(s)


# Search (hit): the query equals the complete stored value of document 1
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title": "ElasticSearch好像没有那么简单喔!"
            }
        }
    }
)
print("Search <ElasticSearch好像没有那么简单喔!>: hit.")
pprint(s)


# Analyze: show the tokens produced for the "title" field (a keyword field yields one token)
s = es.indices.analyze(
    index="pra_keyword",
    body={
        "field": "title",
        "text": "ElasticSearch好像没有那么简单喔!"
    }
)
print("Analyze: <ElasticSearch好像没有那么简单喔!>")
pprint(s)


# Full-text search against the text sub-field
s = es.search(
    index="pra_keyword",
    doc_type="my_type",
    body={
        "query": {
            "match": {
                "title.make_title_searchable": "ElasticSearch"
            }
        }
    }
)
print("Full-text search <ElasticSearch>: hits 3 documents.")
pprint(s)
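
With the multi-field mapping from the script above, one request can search the text sub-field while sorting and aggregating on the keyword parent field. The sketch below only builds such a request body; the helper name `build_multifield_query` and the aggregation name `titles` are hypothetical, while the field names come from the mapping above.

```python
def build_multifield_query(text, size=10):
    """Full-text match on the text sub-field, sort/aggregate on the keyword field."""
    return {
        "size": size,
        # Analyzed full-text match against the text sub-field
        "query": {"match": {"title.make_title_searchable": text}},
        # Sorting uses the exact (keyword) values of the parent field
        "sort": [{"title": {"order": "asc"}}],
        # Bucket the matching documents by their exact title
        "aggs": {"titles": {"terms": {"field": "title"}}},
    }


body = build_multifield_query("ElasticSearch")
# Could then be sent with, for example: es.search(index="pra_keyword", body=body)
```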