Elasticsearch Series (5): Mapping Metadata Fields

Author: 正义的杰克船长 | Published: 2020-09-02 20:18

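1. Introduction

Every document has metadata fields associated with it, such as _index, _type, and _id. The behavior of some of these fields can be customized when a mapping is created. The sections below cover the most important ones.

Categories of metadata fields

2. Identity metadata fields

_index

_index: the index to which the document belongs.
The _index field is exposed virtually: it is not added to the Lucene index as a real field. It can be used in a term query (or any query that is rewritten to a term query, such as match, query_string, or simple_query_string), and also in prefix and wildcard queries. It does not, however, support regexp or fuzzy queries. The _index field can also be used in aggregations, sorting, and scripts.
Example usage: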

PUT my_index_1/_doc/1
{"text":"Document in index 1"}
PUT my_index_2/_doc/2?refresh=true
{"text":"Document in index 2"}
# Use the _index field in a terms query, an aggregation, a sort, and a script
GET my_index_1,my_index_2/_search
{
  "query": {
    "terms": {
      "_index": [
        "my_index_1",
        "my_index_2"
      ]
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index",
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_index": {
        "order": "asc"
      }
    }
  ],
  "script_fields": {
    "index_name": {
      "script": {
        "lang": "painless",
        "source": "doc['_index']"
      }
    }
  }
}

_id

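_id: the ID of the document.
Every document has an _id that uniquely identifies it, and the _id is limited to 512 bytes in size. The _id field is indexed and can be used in queries. Its value is also accessible in aggregations and sorting, but doing so is discouraged because it requires loading a large amount of data into memory.
Example usage: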

PUT my_index_1/_doc/1
{"text":"Document with ID 1"}
PUT my_index_1/_doc/2?refresh=true
{"text":"Document with ID 2"}
# Use the _id field in a terms query
GET my_index_1/_search
{"query":{"terms":{"_id":["1","2"]}}}
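
Because the 512 limit on _id counts bytes rather than characters, IDs containing non-ASCII text reach it sooner than their length suggests. The following is a minimal client-side sketch of that check. It assumes the limit is measured on the UTF-8 encoding of the ID; the helper names id_size_in_bytes and fits_id_limit are made up for illustration and are not part of any Elasticsearch client.

```python
MAX_ID_BYTES = 512  # limit stated in the text above


def id_size_in_bytes(doc_id: str) -> int:
    """Return the size of a document ID once encoded as UTF-8 (assumed encoding)."""
    return len(doc_id.encode("utf-8"))


def fits_id_limit(doc_id: str) -> bool:
    """Return True if the encoded ID is within the 512-byte limit."""
    return id_size_in_bytes(doc_id) <= MAX_ID_BYTES


if __name__ == "__main__":
    ascii_id = "a" * 512   # 512 characters, 1 byte each
    cjk_id = "文" * 200    # 200 characters, 3 bytes each in UTF-8
    print(id_size_in_bytes(ascii_id), fits_id_limit(ascii_id))
    print(id_size_in_bytes(cjk_id), fits_id_limit(cjk_id))
```

A check like this is only a pre-flight guard on the client; the server remains the authority on whether an ID is accepted.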

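3. Document source metadata fields

_source

_source: the original JSON representing the body of the document.
The _source field itself is not indexed (and so is not searchable), but it is stored so that it can be returned by fetch requests such as get or search.
The _source field can be disabled. It is often very convenient to have, but it does add storage overhead to the index. Note that features which rely on the original document, such as the update API, reindex, and on-the-fly highlighting, stop working once it is disabled. To disable it:

# Disable the _source field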
PUT my_index_1
{"mappings":{"_source":{"enabled":false}}}

The includes/excludes parameters control which parts of the document are kept in the stored _source field. They are used as follows:

# Create index my_index_1 and set includes/excludes for the _source field
PUT my_index_1
{
  "mappings": {
    "_source": {
      "includes": [
        "*.count",
        "meta.*"
      ],
      "excludes": [
        "meta.description",
        "meta.other.*"
      ]
    }
  }
}
PUT my_index_1/_doc/1
{
  "requests": {
    "count": 10,
    "foo": "bar" 
  },
  "meta": {
    "name": "Some metric",
    "description": "Some metric description", 
    "other": {
      "foo": "one", 
      "baz": "two" 
    }
  }
}
# We can still search on the meta.other.foo field and match the document,
# even though its content was excluded from the stored _source.
GET my_index_1/_search
{"query":{"match":{"meta.other.foo":"one"}}}
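
To make the effect of includes/excludes concrete, here is a rough Python sketch of the filtering applied to the document above. It is an approximation under stated assumptions, not Elasticsearch's implementation: it flattens the document to dotted leaf paths and uses the standard-library fnmatchcase for wildcard matching, and the function names leaf_paths and filter_source are invented for this example. A leaf is kept when it matches at least one include pattern and no exclude pattern.

```python
from fnmatch import fnmatchcase


def leaf_paths(obj, prefix=""):
    """Yield (dotted_path, value) for every leaf value in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path, value


def filter_source(doc, includes, excludes):
    """Keep leaves that match any include pattern and no exclude pattern."""
    kept = {}
    for path, value in leaf_paths(doc):
        # An empty includes list is treated as "include everything" (assumption).
        included = not includes or any(fnmatchcase(path, p) for p in includes)
        excluded = any(fnmatchcase(path, p) for p in excludes)
        if included and not excluded:
            kept[path] = value
    return kept


if __name__ == "__main__":
    doc = {
        "requests": {"count": 10, "foo": "bar"},
        "meta": {
            "name": "Some metric",
            "description": "Some metric description",
            "other": {"foo": "one", "baz": "two"},
        },
    }
    stored = filter_source(
        doc,
        includes=["*.count", "meta.*"],
        excludes=["meta.description", "meta.other.*"],
    )
    print(stored)  # {'requests.count': 10, 'meta.name': 'Some metric'}
```

Only requests.count and meta.name survive, which is why meta.other.foo is missing from the returned _source even though the field was indexed and remains searchable.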

_size

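_size: the size of the _source field in bytes, provided by the mapper-size plugin.

4. Indexing metadata fields

_field_names

_field_names: all fields in the document that contain non-null values.
For fields that have both doc_values and norms disabled, the exists query uses the _field_names field to find documents that do or do not have a non-null value for the field. For fields that have doc_values or norms enabled, the exists query still works but does not use _field_names.
The _field_names field can be disabled, although this is usually unnecessary:

# Disable the _field_names field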
PUT my_index_1
{"mappings":{"_field_names":{"enabled":false}}}

_ignored

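_ignored: indexes and stores the name of every field in a document that was ignored because it was malformed and ignore_malformed was enabled.
The _ignored field is searchable with term, terms, and exists queries, and is returned as part of the search hits.
Example usage: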

PUT my_index_1
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer",
        "ignore_malformed": true
      },
      "number_two": {
        "type": "integer",
        "ignore_malformed": false
      }
    }
  }
}
# number_one ignores malformed values, so this document is indexed successfully
PUT my_index_1/_doc/1
{"text":"Some text value","number_one":"foo"}
# number_two has ignore_malformed=false, so this request fails
PUT my_index_1/_doc/2
{"text":"Some text value","number_two":"foo"}
# Find documents that have at least one ignored field
GET my_index_1/_search
{"query":{"exists":{"field":"_ignored"}}}

5. Routing metadata fields

_routing

_routing: a custom routing value that routes a document to a particular shard.
By default the _routing value is the document's _id. The routing formula is:

shard_num = hash(_routing) % num_primary_shards
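
The formula can be tried out with a few lines of Python. This is only an illustration of the modulo step: Elasticsearch uses its own hash function internally, and zlib.crc32 is swapped in here purely because it ships with the Python standard library, so the shard numbers printed will not match a real cluster. The function name shard_for is hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Apply shard_num = hash(_routing) % num_primary_shards.

    CRC32 is a stand-in hash (assumption); it is not the hash Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Documents that share a routing value always land on the same shard.
    for routing in ["user1", "user1", "user2", "1"]:
        print(routing, "->", shard_for(routing, num_primary_shards=2))

    # Changing the shard count can change the result for the same routing
    # value, which is why the number of primary shards is fixed per index.
    print(shard_for("user1", 2), shard_for("user1", 3))
```

The point to take away is that the result depends only on the routing value and the number of primary shards, so every document indexed with routing=user1 is stored together.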

A custom _routing value can be supplied for a document, as in this example:

PUT my_index_1 
{
  "settings": {
    "number_of_shards": 2
  }, 
  "mappings": {
    "properties": {
      "title": {
        "type" : "text"
      }
    }
  }  
}
# Document 1 uses user1 as its routing value instead of its ID
PUT my_index_1/_doc/1?routing=user1&refresh=true 
{"title":"This is a document"}
# Search on the _routing field with a terms query
GET my_index_1/_search
{"query":{"terms":{"_routing":["user1"]}}}

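Routing can be made mandatory. When it is required, every document operation (index, get, update, delete) must supply a routing value, otherwise the request is rejected. To require routing: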

PUT my_index_1 
{
  "mappings": {
    "_routing": {
      "required": true 
    }
  }  
}

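6. Other metadata fields

_meta

_meta: stores application-specific metadata.
The _meta field lets you attach custom metadata to the mapping. Elasticsearch does not use this information in any way (it cannot be queried, aggregated, or sorted on). For example:

# Custom metadata: class and version, recording the class the documents belong to and the supported version range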
PUT my_index_1
{
  "mappings": {
    "_meta": { 
      "class": "MyApp::User",
      "version": {
        "min": "1.0",
        "max": "1.3"
      }
    }
  }
}

Article link: https://www.haomeiwen.com/subject/iembsktx.html
