Elasticsearch Part 16: scroll, dynamic mapping

Author: 小超_8b2f | Published: 2019-06-19 09:57

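I. Scrolling through large result sets with scroll

Fetching a very large result set in one request, say 100,000 documents, performs poorly. The usual approach is a scroll query, which retrieves the results batch by batch until everything has been read.

The first scroll search saves a snapshot of the index as it was at that moment. All later batches are served from that snapshot, so changes made to the data in the meantime are not visible to the client.
Sorting by _doc (rather than _score) gives the best performance.
Every scroll request also carries a scroll parameter that sets a time window. The window only needs to be long enough for the next batch request to arrive, not for the whole export, because each request renews it.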

# Fetch the first batch; this creates the snapshot.
# scroll=1m keeps the search context alive for 1 minute.
GET /test1/_search?scroll=1m
{
  "size":2,
  "query":{
    "match_all": {}
  },
  "sort": ["_doc"]
}
# Response
{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACDBMWU2dnYWl6VDdSTk9uYU1ESFhJZV9HUQ==",
  "took" : 1,
  "timed_out" : false,
  ....
}

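# Continue from the _scroll_id returned by the previous response. The query is stored
# in the search context, so it does not need to be repeated.
# Every response carries a _scroll_id; always pass the one from the most recent response.
# "scroll": "1m" renews the window for another minute.
GET /_search/scroll
{
  "scroll": "1m",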
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACDBMWU2dnYWl6VDdSTk9uYU1ESFhJZV9HUQ=="
}
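
The request loop above can be sketched in Python. This is only an illustration under stated assumptions: `FakeIndex` is a hypothetical in-memory stand-in, not the Elasticsearch client, and `scroll_all` is a made-up helper name. It shows the two points from the text: keep requesting with the latest `_scroll_id` until a batch is empty, and data written after the first request does not appear.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(first_page: Callable[[], Dict], next_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every hit, following _scroll_id until a batch comes back empty."""
    page = first_page()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always continue from the _scroll_id of the most recent response.
        page = next_page(page["_scroll_id"])


class FakeIndex:
    """In-memory stand-in for an index (hypothetical, not the Elasticsearch client)."""

    def __init__(self, docs: List[Dict]) -> None:
        self.docs = list(docs)
        self._contexts: Dict[str, Dict] = {}

    def search(self, size: int) -> Dict:
        # The first request freezes a snapshot of the current documents.
        scroll_id = f"ctx-{len(self._contexts)}"
        self._contexts[scroll_id] = {"snapshot": list(self.docs), "pos": 0, "size": size}
        return self.scroll(scroll_id)

    def scroll(self, scroll_id: str) -> Dict:
        ctx = self._contexts[scroll_id]
        batch = ctx["snapshot"][ctx["pos"]:ctx["pos"] + ctx["size"]]
        ctx["pos"] += ctx["size"]
        return {"_scroll_id": scroll_id, "hits": {"hits": batch}}


index = FakeIndex([{"_id": i} for i in range(5)])
pages = scroll_all(lambda: index.search(size=2), index.scroll)
seen = [next(pages)]                 # first batch requested: snapshot taken here
index.docs.append({"_id": 99})       # written after the snapshot was taken
seen.extend(pages)                   # remaining batches
print([hit["_id"] for hit in seen])  # the late document 99 is absent
```

With a real cluster, the two callables would wrap the `GET /test1/_search?scroll=1m` and `GET /_search/scroll` requests shown above.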

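II. Dynamic mapping strategies

The dynamic setting controls what happens when a document contains a field that is not in the mapping:

true: the new field is mapped dynamically
false: the new field is ignored (it is not indexed)
strict: the request is rejected with an error, so the mapping contains exactly what you defined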

DELETE /test1
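# Define the mapping.
# dynamic: strict at the top level means only the fields defined here are accepted;
# new top-level fields cannot be added dynamically.
PUT /test1
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": {"type": "text"},
      "address": {"type": "object", "dynamic": true}
    }
  }
}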

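# The same mapping with date detection turned off. By default, strings that look like
# dates are mapped as date fields; "date_detection": false disables that.
# The address object sets dynamic: true, so new fields under it are mapped dynamically.
# PUT on an existing index fails, so delete the index first.
DELETE /test1
PUT /test1
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic": "strict",
    "date_detection": false,
    "properties": {
      "name": {"type": "text"},
      "address": {
        "type": "object",
        "dynamic": true
      }
    }
  }
}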
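# Index a document.
# "age" is not defined in the mapping, so with dynamic: strict this request is rejected.
# Remove "age" and the request succeeds.
# Fields under "address" can be added freely because that object has dynamic: true.
PUT /test1/_doc/1
{
  "name": "address",
  "age": 10,
  "address": {
    "jie": "dajie",
    "hello": "world",
    "dao": "dadao"
  }
}
GET /test1/_search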
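
To make the three dynamic values concrete, here is a toy Python model of the decision. It is a sketch under assumptions, not how Elasticsearch is implemented: `check_document` and `StrictMappingError` are hypothetical names, newly mapped fields get an empty placeholder mapping instead of a detected type, and inner objects are assumed to inherit the setting from their parent unless they override it.

```python
from typing import Dict, List


class StrictMappingError(Exception):
    """Raised when a strict mapping meets a field it does not define."""


def check_document(mapping: Dict, doc: Dict, inherited: str = "true", path: str = "") -> List[str]:
    """Apply the dynamic setting to `doc`; return the paths of newly mapped fields."""
    dynamic = str(mapping.get("dynamic", inherited)).lower()
    props = mapping.setdefault("properties", {})
    added: List[str] = []
    for name, value in doc.items():
        full = path + name
        if name not in props:
            if dynamic == "strict":
                raise StrictMappingError(f"field [{full}] is not defined in the mapping")
            if dynamic == "false":
                continue  # ignored: the mapping is left unchanged
            # dynamic == "true": add a placeholder mapping for the new field
            props[name] = {"type": "object", "properties": {}} if isinstance(value, dict) else {}
            added.append(full)
        if isinstance(value, dict):
            added += check_document(props[name], value, dynamic, full + ".")
    return added


mapping = {
    "dynamic": "strict",
    "properties": {
        "name": {"type": "text"},
        "address": {"type": "object", "dynamic": True},
    },
}

# New fields under address are accepted because that object is dynamic: true.
added = check_document(mapping, {"name": "address", "address": {"jie": "dajie", "dao": "dadao"}})

# A new top-level field is rejected because the root is dynamic: strict.
try:
    check_document(mapping, {"name": "address", "age": 10})
    rejected = False
except StrictMappingError:
    rejected = True
```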


DELETE /test2

PUT /test2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
      "dynamic_templates" : [
        {
          "en":{
            "match":"*_en",
            "match_mapping_type":"string",
            "mapping":{
              "type":"text",
              "analyzer":"english"
            }
          }
        }
      ]
  }
}
# The field name matches the template (*_en), so the english analyzer is used
PUT /test2/_doc/1
{
  "test_en":"helo world,this is my dog"
}
# The field name does not end in _en, so the template does not apply and the default standard analyzer is used
PUT /test2/_doc/2
{
  "test":"helo world,this is my cat"
}
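GET /test2/_mapping
# The english analyzer treats words such as "is" and "of" as stop words and drops them,
# so searching test_en for "is" finds nothing (searching for "dog" would match).
# The empty result shows that the dynamic template was applied based on the field name.
GET /test2/_search
{
  "query":{
    "match": {
      "test_en": "is"
    }
  }
}

# The default standard analyzer keeps "is", so the same search on test matches document 2.
GET /test2/_search
{
  "query":{
    "match": {
      "test": "is"
    }
  }
}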
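
How a field name is matched against the template can be illustrated with a small Python sketch. It is a simplification under assumptions: `pick_mapping` is a hypothetical helper, the `match` pattern is approximated with `fnmatch`, and only the `string` value of `match_mapping_type` is handled.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def pick_mapping(templates: List[Dict], field: str, value: Any) -> Optional[Dict]:
    """Return the mapping of the first template whose rules match the field, else None."""
    for template in templates:
        for _name, rule in template.items():
            # "match" is a wildcard pattern tested against the field name.
            if "match" in rule and not fnmatchcase(field, rule["match"]):
                continue
            # Only the "string" mapping type is modelled in this sketch.
            if rule.get("match_mapping_type") == "string" and not isinstance(value, str):
                continue
            return rule["mapping"]
    return None


templates = [
    {
        "en": {
            "match": "*_en",
            "match_mapping_type": "string",
            "mapping": {"type": "text", "analyzer": "english"},
        }
    }
]

with_template = pick_mapping(templates, "test_en", "helo world,this is my dog")
without_template = pick_mapping(templates, "test", "helo world,this is my cat")
```

Here `with_template` is the english text mapping, while `without_template` is `None`, meaning the field falls back to the default dynamic mapping.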

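III. Reindexing

The mapping of an existing field cannot be changed. To change a field, create a new index with the new mapping, read the existing data out in batches, and write it into the new index with the bulk API.
For the batch reads, the scroll API is recommended, with several threads reindexing concurrently: each scroll covers the data for one date range and is handed to one thread.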
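
The idea of one date range per thread might look like the following Python sketch. Everything here is an assumption for illustration: the field name `created_at`, the helper names, and the worker, which only builds the search body for its slice rather than contacting a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Dict, List, Tuple

DateSlice = Tuple[date, date]


def date_slices(start: date, end: date, days: int) -> List[DateSlice]:
    """Split [start, end) into consecutive half-open slices of at most `days` days."""
    slices = []
    current = start
    while current < end:
        upper = min(current + timedelta(days=days), end)
        slices.append((current, upper))
        current = upper
    return slices


def plan_slice(bounds: DateSlice) -> Dict:
    """Hypothetical worker: build the scroll search body for one date range.

    A real worker would send this body with ?scroll=..., then bulk-write each batch.
    """
    lower, upper = bounds
    return {
        "query": {"range": {"created_at": {"gte": lower.isoformat(), "lt": upper.isoformat()}}},
        "sort": ["_doc"],
    }


slices = date_slices(date(2019, 6, 1), date(2019, 6, 11), days=4)
with ThreadPoolExecutor(max_workers=3) as pool:
    bodies = list(pool.map(plan_slice, slices))  # results come back in slice order
```

Half-open ranges (`gte` on the lower bound, `lt` on the upper) keep the slices from overlapping, so no document is copied twice.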

PUT /index1/_doc/4
{"content":"1990-12-12"}
GET /index1/_search
GET /index1/_mapping

# Indexing a string into content now fails: the field was dynamically mapped as date, and this value cannot be parsed as a date
PUT /index1/_doc/4
{"content":"I am happy"}

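# Trying to change content to text through the mapping API also fails:
# the field is already mapped as date, and the type of an existing field cannot be changed.
PUT /index1/_mapping
{
  "properties": {
    "content": {
      "type": "text"
    }
  }
}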

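Create a new index and copy the data from index1 into it. The application, however, is still using the old index. To avoid restarting the application, give index1 an alias and have the application go through the alias.

# Create an alias for the original index
PUT /index1/_alias/index2
# Check the type of content: date
GET /index2/_mapping

# Create the new index with content mapped as text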
PUT /newindex
{
  "mappings": {
    "properties": {
      "content":{
        "type":"text"
      }
    }
  }
}
# Read the data in batches with scroll
GET /index1/_search?scroll=1m
{
  "size":2,
  "query":{
    "match_all": {}
  },
  "sort": ["_doc"]
}
# Then write the batches into the new index with bulk
POST /_bulk
{"index":{"_index":"newindex","_id":1}}
{"content":"hello world"}

# Point the alias at the new index and detach it from the old one (both actions are applied atomically)
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "newindex",
        "alias": "index2"
      }
    },
    {
      "remove": {
        "index": "index1",
        "alias": "index2"
      }
    }
  ]
}

# Check the type of content: text
GET /index2/_mapping
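
The step between the scroll read and the bulk write, turning each batch of hits into a bulk request body, can be sketched in Python. This is a minimal illustration: `to_bulk_body` is a hypothetical helper, and the sample hit is made up to resemble the document indexed earlier.

```python
import json
from typing import Dict, List


def to_bulk_body(hits: List[Dict], target_index: str) -> str:
    """Turn scroll hits into a newline-delimited JSON body for POST /_bulk."""
    lines = []
    for hit in hits:
        # One action line, followed by one source line, per document.
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


hits = [{"_id": "4", "_source": {"content": "1990-12-12"}}]
body = to_bulk_body(hits, "newindex")
print(body, end="")
```

Reusing each hit's `_id` keeps document ids stable across the two indices, so rerunning a batch overwrites documents instead of duplicating them.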

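1. What an inverted index contains

For each term: the list of documents that contain it, the number of those documents, how many times the term occurs in each document, and the positions at which it occurs. It also records the length of each document and the average length of all documents.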
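
A toy inverted index in Python shows where each of these pieces lives. It is only a sketch: whitespace tokenization and the name `build_inverted_index` are assumptions, and real Lucene segments are stored very differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_inverted_index(docs: Dict[int, str]) -> Tuple[Dict[str, Dict[int, List[int]]], Dict[int, int], float]:
    """Return (postings, document lengths, average document length)."""
    postings: Dict[str, Dict[int, List[int]]] = defaultdict(dict)  # term -> {doc id: positions}
    doc_len: Dict[int, int] = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        doc_len[doc_id] = len(tokens)
        for position, token in enumerate(tokens):
            postings[token].setdefault(doc_id, []).append(position)
    avg_len = sum(doc_len.values()) / len(doc_len) if doc_len else 0.0
    return postings, doc_len, avg_len


docs = {1: "hello world this is my dog", 2: "hello world this is my cat cat"}
postings, doc_len, avg_len = build_inverted_index(docs)

doc_list = sorted(postings["hello"])      # documents containing the term
doc_count = len(postings["hello"])        # number of such documents
term_freq = len(postings["cat"][2])       # occurrences of "cat" in document 2
positions = postings["cat"][2]            # where "cat" occurs in document 2
```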

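2. Why the inverted index is immutable

No locks are needed, which improves concurrency
It can stay in cache indefinitely (for example, the filter cache)
It saves CPU and I/O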

Article link: https://www.haomeiwen.com/subject/torqqctx.html
