16. Modifying and customizing analyzers, a brief look at the root object, dy

Author: 众神开挂 | Published: 2020-03-31 10:34

Topics covered: modifying and customizing analyzers, a brief look at the root object, and dynamic mapping.

1. Modifying and customizing analyzers

1.1. The default analyzer: standard

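standard tokenizer: splits text on word boundaries
standard token filter: does nothing (it was removed in Elasticsearch 7.0)
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, and it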

1.2. Changing analyzer settings

Enable the English stop-word token filter:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

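Run the two requests below and compare the results. The stop words should appear in the first response and be missing from the second.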

GET /my_index/_analyze
{
  "analyzer": "standard", 
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text":"a dog is in the house"
}
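
To make the expected difference concrete, here is a minimal Python sketch of the two analyzers. It is a simplified model, not Elasticsearch's implementation: the `\w+` regular expression is only a rough stand-in for the standard tokenizer, the `analyze` function is a hypothetical helper, and the stop-word set is Lucene's default English list as I understand `_english_` to resolve.

```python
import re

# Lucene's default English stop-word set, which "_english_" is assumed to refer to.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def analyze(text, stopwords=frozenset()):
    """Simplified model: split into words, lowercase, then drop stop words."""
    tokens = re.findall(r"\w+", text)                  # rough stand-in for the standard tokenizer
    tokens = [t.lower() for t in tokens]               # lowercase token filter
    return [t for t in tokens if t not in stopwords]   # stop token filter (no-op when the set is empty)

text = "a dog is in the house"
standard_tokens = analyze(text)                    # models "analyzer": "standard"
es_std_tokens = analyze(text, ENGLISH_STOPWORDS)   # models "analyzer": "es_std"
print(standard_tokens)
print(es_std_tokens)
```

In this model the standard analyzer keeps all six words, while es_std keeps only dog and house.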

1.3. Defining your own analyzer

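If my_index already exists from the previous step, delete it first (DELETE /my_index), because creating an index that already exists fails. The request below defines three things: a char_filter named &_to_and that maps the & character to and, a stop token filter named my_stopwords that treats the and a as stop words, and a custom analyzer named my_analyzer that combines them.

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and"
          ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [
            "the",
            "a"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stopwords"
          ]
        }
      }
    }
  }
}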

Test it and look at the result:

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
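
The order in which a custom analyzer applies its parts (char filters, then the tokenizer, then token filters) can be sketched in Python. This is a simplified model under stated assumptions, not the real analysis chain: the regular expressions are rough stand-ins for html_strip and the standard tokenizer, `my_analyzer` here is a hypothetical Python function, and the & rule is assumed to produce and with no surrounding spaces. Check the actual tokens with the _analyze request above.

```python
import re

def my_analyzer(text):
    """Simplified model of the chain: char filters -> tokenizer -> token filters."""
    # 1. Char filters run first, on the raw string.
    text = re.sub(r"<[^>]+>", " ", text)   # rough stand-in for html_strip
    text = text.replace("&", "and")        # assumed effect of the &_to_and rule
    # 2. The tokenizer splits the filtered string into tokens.
    tokens = re.findall(r"\w+", text)      # rough stand-in for the standard tokenizer
    # 3. Token filters run in the order they are listed.
    tokens = [t.lower() for t in tokens]                 # lowercase
    return [t for t in tokens if t not in {"the", "a"}]  # my_stopwords

tokens = my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!")
print(tokens)
```

Under these assumptions the tag and punctuation disappear, HAHA is lowercased, and the and a are dropped, while are and in survive because they are not in the custom stop-word list.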

Use the custom analyzer by assigning it to the content field:

PUT /my_index/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

2. The root object

2.1. What the root object is

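The root object is the mapping JSON that belongs to a type. It contains properties, metadata fields (_id, _source, _type), analysis settings (analyzer), and other settings (such as include_in_all). In Elasticsearch 7.x, where requests are typeless by default, the object directly under mappings plays this role:

PUT /my_index
{
  "mappings": {
    "properties": {}
  }
}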

2.2. properties

Each field under properties can set type, index, and analyzer.

PUT /my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text"
    }
  }
}

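2.3. _source

Benefits:

(1) A query returns the complete document directly; there is no need to fetch the document ID first and then send a second request for the document.
(2) Partial update is implemented on top of _source.
(3) Reindexing works directly from _source, so the data does not have to be read from a database (or another external store) and modified.
(4) The returned fields can be customized based on _source.
(5) Debugging queries is easier, because _source is directly visible.

If you need none of these benefits, you can disable _source: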

PUT /my_index/_mapping
{
  "_source": {
    "enabled": false
  }
}
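
To see why partial updates and field filtering depend on _source, consider this minimal Python sketch. It is an illustration only: `store` is a plain dictionary standing in for an index, and `partial_update` and `filter_source` are hypothetical helpers, not Elasticsearch APIs.

```python
def partial_update(store, doc_id, partial_doc):
    """Merge partial_doc into the stored _source and write the whole document back."""
    source = dict(store[doc_id])   # requires the original _source to be available
    source.update(partial_doc)     # shallow merge, kept simple for illustration
    store[doc_id] = source         # the full document is indexed again
    return source

def filter_source(source, includes):
    """Return only the requested top-level fields of _source."""
    return {k: v for k, v in source.items() if k in includes}

# `store` maps document IDs to their _source.
store = {"1": {"title": "my article", "content": "this is my article"}}
updated = partial_update(store, "1", {"title": "my updated article"})
title_only = filter_source(updated, ["title"])
print(updated)
print(title_only)
```

Both helpers start from the stored original document, so neither operation is possible in this model once _source is no longer kept.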

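2.4. _all

All fields are packed together into a single _all field, which is then indexed. A search that does not name any field runs against _all. Note that _all was deprecated in Elasticsearch 6.0 and is no longer available in 7.x, so the two requests in this subsection only apply to older versions that still use mapping types.

To disable _all for a type: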

PUT /my_index/_mapping/my_type3
{
  "_all": {
    "enabled": false
  }
}

You can also set include_in_all at the field level to control whether that field's value is included in _all:

PUT /my_index/_mapping/my_type4
{
  "properties": {
    "my_field": {
      "type": "text",
      "include_in_all": false
    }
  }
}
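
The effect of include_in_all can be pictured with a small Python sketch. It is a simplified model of the idea, not how Elasticsearch builds _all internally: `build_all_field` is a hypothetical helper, and the title and author fields are made up for the example.

```python
def build_all_field(doc, mapping):
    """Concatenate the values of every field that is not excluded from _all."""
    parts = []
    for field, value in doc.items():
        # Fields are included unless their mapping sets include_in_all to False.
        if mapping.get(field, {}).get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)

mapping = {"my_field": {"type": "text", "include_in_all": False}}
doc = {"title": "my article", "my_field": "hidden from _all", "author": "tom"}
all_field = build_all_field(doc, mapping)
print(all_field)
```

A search with no field specified would, in this model, match text from title and author but not from my_field.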

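3. Customizing dynamic mapping

3.1. Setting the dynamic policy

true: when an unknown field appears, map it dynamically
false: when an unknown field appears, ignore it (it is not indexed, but it stays in _source)
strict: when an unknown field appears, raise an error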

PUT /my_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "address": {
        "type": "object",
        "dynamic": "true"
      }
    }
  }
}

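Indexing a document that contains a content field is rejected with a message that content is not allowed, because the field is not in the mapping and the top level is strict: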

PUT /my_index/_doc/1
{
  "title": "my article",
  "content": "this is my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}

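The address field has no such problem: its dynamic setting is true, so new sub-fields can be added to it. Without the content field, the document is accepted:

PUT /my_index/_doc/1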
{
  "title": "my article",
  "address": {
    "province": "guangdong",
    "city": "guangzhou"
  }
}
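
The way the dynamic setting is resolved per object, with inner objects inheriting from their parent unless they override it, can be sketched in Python. This is a simplified model under my own assumptions: `apply_dynamic` and `StrictDynamicMappingError` are hypothetical names, and the real mapper does considerably more.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical error raised when a strict object sees an unknown field."""

def apply_dynamic(doc, mapping, inherited="true", path=""):
    """Walk doc against mapping and return (added, ignored) lists of field paths."""
    dynamic = str(mapping.get("dynamic", inherited)).lower()
    props = mapping.get("properties", {})
    added, ignored = [], []
    for name, value in doc.items():
        full = path + name
        if name in props:
            if isinstance(value, dict):
                # Inner objects inherit the parent's setting unless they override it.
                a, i = apply_dynamic(value, props[name], dynamic, full + ".")
                added += a
                ignored += i
        elif dynamic == "strict":
            raise StrictDynamicMappingError(f"field [{full}] is not allowed")
        elif dynamic == "false":
            ignored.append(full)
        else:
            added.append(full)
    return added, ignored

mapping = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "address": {"type": "object", "dynamic": "true"},
    },
}
address = {"province": "guangdong", "city": "guangzhou"}

# Accepted: the only unknown fields are under address, which is dynamic.
added, ignored = apply_dynamic({"title": "my article", "address": address}, mapping)
print(added, ignored)

# Rejected: content is unknown at the strict top level.
try:
    apply_dynamic({"title": "my article", "content": "text", "address": address}, mapping)
    rejected = False
except StrictDynamicMappingError as err:
    rejected = True
    print(err)
```

The model mirrors the two requests above: the sub-fields of address are added dynamically, while a top-level content field raises an error.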

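3.2. Customizing dynamic mapping rules

(1) date_detection

By default, strings in certain formats, such as yyyy-MM-dd, are detected as dates. If the first value that arrives for a field is 2017-01-01, the field is dynamically mapped as date, and a later value such as "hello world" then causes an error. You can turn off date_detection for an index and, where needed, explicitly map individual fields as date yourself.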

PUT /my_index/_mapping
{
    "date_detection": false
}
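
The failure mode described above can be reproduced in a small Python model. It is a sketch under stated assumptions: only the yyyy-MM-dd pattern is modeled, `index_string` and `MappingConflict` are hypothetical names, and post_date is a made-up field.

```python
import re

# Assumption: only the yyyy-MM-dd pattern is modeled here.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

class MappingConflict(Exception):
    """Hypothetical error standing in for a mapper parsing failure."""

def index_string(mapping, field, value, date_detection=True):
    """Map a field the first time it is seen, then check later values against it."""
    looks_like_date = DATE_PATTERN.fullmatch(value) is not None
    if field not in mapping:
        mapping[field] = "date" if (date_detection and looks_like_date) else "text"
    elif mapping[field] == "date" and not looks_like_date:
        raise MappingConflict(f"cannot parse [{value}] as a date for field [{field}]")
    return mapping[field]

# With date detection on, the first value fixes the field as a date.
detecting = {}
first_type = index_string(detecting, "post_date", "2017-01-01")
try:
    index_string(detecting, "post_date", "hello world")
    conflict = False
except MappingConflict:
    conflict = True

# With date detection off, both values are accepted as text.
plain = {}
index_string(plain, "post_date", "2017-01-01", date_detection=False)
second_type = index_string(plain, "post_date", "hello world", date_detection=False)
print(first_type, conflict, second_type)
```

The first value decides the field type, which is why one date-like string can make later free-text values fail.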

(2) Defining your own dynamic mapping templates (type level)

PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "longs_as_strings": {
          "match_mapping_type": "string",
          "match":   "long_*",
          "unmatch": "*_text",
          "mapping": {
            "type": "long"
          }
        }
      }
    ]
  }
}

Index a document:

PUT my_index/_doc/1
{
  "long_num": "5", 
  "long_text": "foo" 
}

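long_num matches the template and is mapped as long.

long_text also matches long_*, but it is excluded by the unmatch pattern *_text, so it gets the default mapping for strings.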
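
The matching logic of the longs_as_strings template can be sketched in Python. This is a simplified model: `resolve` and `detected_type` are hypothetical helpers, and `fnmatchcase` is used as an approximation of the * wildcard (it also gives ? and [] a special meaning, which the template patterns here do not use).

```python
from fnmatch import fnmatchcase

template = {
    "match_mapping_type": "string",
    "match": "long_*",
    "unmatch": "*_text",
    "mapping": {"type": "long"},
}

def detected_type(value):
    """JSON type as dynamic mapping would see it (only the cases needed here)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"

def resolve(field, value, tmpl):
    """Return the template's mapping if all three conditions hold, else None."""
    if (detected_type(value) == tmpl["match_mapping_type"]
            and fnmatchcase(field, tmpl["match"])
            and not fnmatchcase(field, tmpl["unmatch"])):
        return tmpl["mapping"]
    return None  # no template applies, so the default dynamic mapping is used

num_mapping = resolve("long_num", "5", template)
text_mapping = resolve("long_text", "foo", template)
print(num_mapping, text_mapping)
```

A field is mapped by the template only when the detected type, the match pattern, and the absence of an unmatch hit all agree.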

For more options, see the official documentation:

Dynamic templates | Elasticsearch Reference [7.6] | Elastic https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dynamic-templates.html
