ElasticSearch Mapping

Author: 白奕新 | Published 2019-12-14 21:33

2017/07/28

0、operation

{
    "order": 0,
    "index_patterns": "{INDEX-NAME} (wildcards allowed)",
    "settings": {
        "index.refresh_interval": "60s"
    },
    "mappings": {
        "_source": {
            "enabled": false
        },
        "properties": {
            "is_hit_detail": {
                "type": "keyword",
                "doc_values": true
            },
            "isp_id": {
                "type": "keyword",
                "doc_values": true
            }
        }
    }
}

1、default mapping

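Starting with ES 5.x, index-level settings can no longer be placed in the elasticsearch.yml configuration file; they have to be set per index, for example through an index template. Global index-level defaults can therefore go into a catch-all template. When several templates match an index they are merged automatically, and when they define the same setting, the value from the template with the larger order wins.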

{
    "template":"*",
    "order":0,
    "settings":{
        "index.number_of_replicas":"1",
        "index.number_of_shards":24
    }
}
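
The merge rule can be illustrated with a small Python sketch. It is only a simplified model of the behaviour described above, not Elasticsearch's own code: the helper names (deep_merge, resolve_settings) and the second "logs-*" template are made up for illustration, and pattern matching is approximated with fnmatch.

```python
import fnmatch
from copy import deepcopy


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_settings(index_name, templates):
    """Apply every matching template from the lowest order to the highest."""
    matching = [t for t in templates
                if fnmatch.fnmatchcase(index_name, t["template"])]
    settings = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        settings = deep_merge(settings, template.get("settings", {}))
    return settings


# Hypothetical templates: the global one shown above plus an invented override.
templates = [
    {"template": "*", "order": 0,
     "settings": {"index.number_of_replicas": "1", "index.number_of_shards": 24}},
    {"template": "logs-*", "order": 10,
     "settings": {"index.number_of_shards": 6, "index.refresh_interval": "60s"}},
]

resolved = resolve_settings("logs-2019.12.14", templates)
print(resolved)
```

For an index named logs-2019.12.14 both templates match: the replica count comes from the global template, while the shard count is taken from the template with order 10.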

2、breaking change

The string type was split into two separate types.

Before v5.0	v5.0+
analyzed string	text
not_analyzed string	keyword
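
As a rough sketch of how this table applies when migrating an old mapping, the Python helper below rewrites a single field definition. The function name upgrade_string_field is hypothetical, it only handles the two cases in the table, and any other parameters on the field are copied unchanged even though some of them may need their own migration.

```python
def upgrade_string_field(field):
    """Translate a pre-5.0 string field definition into its 5.0+ equivalent."""
    if field.get("type") != "string":
        return dict(field)  # non-string fields are returned unchanged

    # "analyzed" was the default for string fields before 5.0.
    legacy_index = field.get("index", "analyzed")
    if legacy_index == "analyzed":
        new_type = "text"
    elif legacy_index == "not_analyzed":
        new_type = "keyword"
    else:
        # Other values (such as "no") are outside the table above.
        raise ValueError("unsupported legacy index value: %r" % legacy_index)

    # Copy the remaining parameters as they are (assumption: still valid in 5.0+).
    upgraded = {k: v for k, v in field.items() if k not in ("type", "index")}
    upgraded["type"] = new_type
    return upgraded


print(upgrade_string_field({"type": "string"}))
print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
```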

3、Mapping parameter

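doc_values: controls whether a forward index (document to field value) is built for the field. It takes up disk space.

Value	Meaning	Notes
true	The field can be used for aggregations and sorting	Enabled by default on every field except analyzed string fields
false	The field cannot be used for aggregations or sorting	Saves disk space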

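index: controls whether an inverted index is built for the field. It consumes memory.

（1）Before v5.0

Value	Meaning
not_analyzed	The value is not analyzed; it goes into the inverted index as a single term. *Default for every type except "string"*
no	No inverted index is built, so the field cannot be searched; this saves memory. ~~With this setting on a numeric field, sum returns 0~~
analyzed	The value is analyzed (tokenized) and the resulting terms go into the inverted index. Default for "string"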

（2）v5.0+

Value	Meaning
true	The field is searchable (default)
false	The field is not searchable

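store: specifies whether the field value is stored on its own. When _source is enabled, every field of the document is already kept as part of _source. When _source is disabled, that is, when you do not need to keep every field, set store on the individual fields you still want to be able to retrieve. Defaults to false.
fielddata: analyzed string/text fields do not support doc_values. They use fielddata instead: on the first aggregation or sort, the field's whole inverted index is loaded into memory and turned into a forward index. From v5.0 on the setting is a plain boolean ("fielddata": true/false) and is disabled by default on text fields.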

PUT mapping
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "fielddata": {
            "format": "disabled/paged_bytes (enabled, default)"
          }
        }
      }
    }
  }
}
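
To make the idea of turning an inverted index into a forward index concrete, here is a toy Python sketch. It only illustrates the data-structure transformation; the function name uninvert and the sample terms are invented, and real fielddata is of course built by Lucene/Elasticsearch internally, not like this.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Build a forward index (doc id -> terms) from an inverted index (term -> doc ids)."""
    forward = defaultdict(list)
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            forward[doc_id].append(term)
    # Sort the terms so the result does not depend on iteration order.
    return {doc_id: sorted(terms) for doc_id, terms in forward.items()}


# Inverted index: good for "which documents contain this term?"
inverted = {
    "quick": [1, 2],
    "brown": [1],
    "fox": [1, 3],
    "lazy": [2, 3],
}

# Forward index: good for "which terms does this document have?",
# which is the question aggregations and sorting need answered.
forward = uninvert(inverted)
print(forward)
```

The whole inverted index has to be walked before any single document's terms are known, which is why the first aggregation or sort on such a field is expensive.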

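norms: stores the normalization factors (numbers to represent the relative field length and the index time boost setting) that are used when computing the score. If you do not use scoring on a field, you can turn norms off.

（1）Before v5.0, norms are kept in memory. They are enabled by default on analyzed string fields and disabled by default on not_analyzed string fields.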

PUT mapping
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "norms": {
            "enabled": "false/true(default)" 
          }
        }
      }
    }
  }
}

（2）From v5.0 on, norms are stored on disk, and the setting becomes a plain boolean: "norms": true/false. It defaults to false for keyword and true for text.

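index_options: determines what is stored in the inverted index. Each level includes everything from the level before it.

Value	Meaning
docs	Only the document ids are stored
freqs	docs plus the term frequency; at scoring time, documents in which the term occurs more often score higher
positions	freqs plus the position (order) of each term; used by proximity queries and phrase queries
offsets	positions plus the start and end character offsets; used to locate the term inside the document

For analyzed string/text fields the default is positions; for all other fields the default is docs.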
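
The four levels can be pictured with a toy inverted index in Python. This is a sketch for illustration only: build_postings and the sample documents are invented, tokenization is a naive regular expression, and the data layout has nothing to do with how Lucene actually stores postings.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Return term -> {doc_id: posting}, keeping only what index_options allows."""
    level = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        # enumerate() gives the term position, the match gives character offsets.
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings.setdefault(match.group(), {}).setdefault(doc_id, {})
            if level >= 1:  # freqs: how often the term occurs in the document
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions: order of the term within the field
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: start/end characters in the original text
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


docs = {1: "quick brown fox", 2: "quick quick dog"}

for option in LEVELS:
    print(option, build_postings(docs, option)["quick"])
```

With "docs" a posting only says that the term occurs in a document; each further level adds one more piece of information to the same posting, which is why richer options cost more space.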