Elasticsearch 7.x In Depth [11]: Rebuilding an Index

Author: 孙瑞锴 | Published: 2020-05-31 20:18


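1. References

"Elasticsearch Core Technology and Practice", a 极客时间 (Geek Time) course by 阮一鸣
Update By Query API
Elasticsearch Document APIs: Delete By Query API
Official documentation: Index API
"Elasticsearch Reindex: a 10x+ performance boost in practice"
[Translation] Elasticsearch task management API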

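2. Getting started

Data preparation: <Elasticsearch 7.x In Depth: Data Preparation>

Scenarios

When does an index need to be rebuilt?

When you find yourself reading this article -_-
The index mapping changes: a field's type or properties change, or the analyzer or its dictionary needs to change
The index settings change: the number of primary shards changes
Data migration

Approaches

update_by_query
Applies to:
a) adding a property to the mapping
reindex
Applies to:
a) changing the number of primary shards
b) changing the properties of a mapped field
c) migrating data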

update_by_query

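Rebuilds documents in place, on the current index

Scenario

A sub-field is added to an existing field so that the field can be analyzed for search and aggregation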

DELETE ubq_index
# Create the index
PUT /ubq_index/
{
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword"
      },
      "description": {
        "type": "text",
        "search_analyzer": "standard",
        "analyzer": "standard"
      }
    }
  }
}

# Index a document
POST /ubq_index/_doc/1
{
  "username": "孙瑞锴",
  "description": "一名喜欢程序的程序员"
}

# Update the mapping: add an analyzed sub-field
PUT /ubq_index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fields": {
        "ik": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}

# Index a new document
POST /ubq_index/_doc/2
{
  "username": "Ga",
  "description": "卖女孩的小火柴"
}

# Searching for the new document returns a hit
GET /ubq_index/_search
{
  "query": {
    "match": {
      "description.ik": "小火柴"
    }
  }
}

# Searching for the old document returns nothing, because it was indexed before description.ik existed
GET /ubq_index/_search
{
  "query": {
    "match": {
      "description.ik": "程序员"
    }
  }
}

So what do we do now?

Option 1: reindex all documents

POST /ubq_index/_update_by_query
{
  
}

Option 2: reindex only the documents that match a query

POST /ubq_index/_update_by_query
{
  "query": { 
    "term": {
      "username": "孙瑞锴"
    }
  }
}

Additional notes

_update_by_query can also be combined with a script to modify field values

POST /ubq_index/_update_by_query?conflicts=proceed
{
  "query": {
    "term": {
      "username": "孙瑞锴"
    }
  },
  "script": {
    "source": """
        Integer technology = ctx._source.technology; 
        if(technology != null)
        {
          ctx._source.technology++;
        } else 
        {
          ctx._source.technology = 1;
        }
    """,
    "lang": "painless"
  }
}

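Of course, conflicts=proceed (which skips documents that hit a version conflict instead of aborting) is fine for casual experiments; in production, use a distributed lock instead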
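
To illustrate how skipped conflicts show up for a caller, here is a minimal Python sketch that inspects the JSON returned by _update_by_query. The function name summarize_by_query and the sample response are hypothetical and exist only for illustration; total, updated, version_conflicts and failures are fields of the by-query response.

```python
def summarize_by_query(response: dict) -> dict:
    """Condense a parsed _update_by_query response into a small report."""
    conflicts = response.get("version_conflicts", 0)
    failures = response.get("failures", [])
    return {
        "total": response.get("total", 0),
        "updated": response.get("updated", 0),
        "skipped_conflicts": conflicts,
        # Treat the run as clean only if nothing was skipped or failed.
        "clean": conflicts == 0 and not failures,
    }


# Made-up response for a run with conflicts=proceed in which one document
# was modified concurrently and therefore skipped.
sample = {"total": 3, "updated": 2, "version_conflicts": 1, "failures": []}
report = summarize_by_query(sample)
print(report)
```

A caller could, for example, rerun the same request when skipped_conflicts is greater than zero.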

reindex

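Rebuilds documents into a new index

Notes

1. _source must be enabled on the source index (it is enabled by default)
2. Create the destination index with the desired mapping before running reindex; reindex does not copy settings or mappings from the source index

Scenario

The mapping needs to be redefined, or the properties of an existing field changed, rather than a property added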

DELETE ridx_hotel
DELETE ridx_hotel_v1

# Create an index for hotels
PUT /ridx_hotel
{
  "mappings": {
    "properties": {
      "name": {
        "type":"text"
      }
    }
  }
}

# Index a document
PUT /ridx_hotel/_doc/1
{
  "name": "瑜伽酒店"
}

# Searching for "瑜伽" finds nothing, because we forgot to specify a Chinese analyzer; the only fix is to rebuild the index
GET ridx_hotel/_search
{
  "query": {
    "term": {
      "name": "瑜伽"
    }
  }
}

# Create a new index that specifies an analyzer for name
PUT /ridx_hotel_v1
{
  "mappings": {
    "properties": {
      "name": {
        "type":"text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

# Use the reindex API to migrate the data
POST _reindex
{
  "source": { "index": "ridx_hotel" },
  "dest": { "index": "ridx_hotel_v1" }
}

# Query "瑜伽" again: it now matches, because the analyzer emits it as a token
GET ridx_hotel_v1/_search
{
  "query": {
    "term": {
      "name": "瑜伽"
    }
  }
}

Additional notes

Usually combined with an alias

DELETE ridx_hotel
DELETE ridx_hotel_v1

# Create an index for hotels
PUT /ridx_hotel
{
  "mappings": {
    "properties": {
      "name": {
        "type":"text"
      }
    }
  }
}

# Configure the alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "ridx_hotel",
        "alias": "ridx_hotel_alias"
      }
    }
  ]
}

# List aliases
GET /_alias

# Index a document
PUT /ridx_hotel/_doc/1
{
  "name": "瑜伽酒店"
}

# Searching for "瑜伽" through the alias finds nothing, because we forgot to specify a Chinese analyzer; the only fix is to rebuild the index
GET ridx_hotel_alias/_search
{
  "query": {
    "term": {
      "name": "瑜伽"
    }
  }
}

# Create a new index that specifies an analyzer for name
PUT /ridx_hotel_v1
{
  "mappings": {
    "properties": {
      "name": {
        "type":"text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

# Use the reindex API to migrate the data
POST _reindex
{
  "source": { "index": "ridx_hotel" },
  "dest": { "index": "ridx_hotel_v1" }
}

# Once the migration has finished, switch the alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "ridx_hotel_v1",
        "alias": "ridx_hotel_alias"
      }
    },
    {
      "remove": {
        "index": "ridx_hotel",
        "alias": "ridx_hotel_alias"
      }
    }
  ]
}

# Query "瑜伽" through the alias again: it now matches, because the analyzer emits it as a token
GET ridx_hotel_alias/_search
{
  "query": {
    "term": {
      "name": "瑜伽"
    }
  }
}
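
The alias switch above puts add and remove in a single _aliases request, which Elasticsearch applies atomically, so the alias always points at an index. As a minimal sketch, the request body could be generated like this in Python; the helper name build_alias_swap is hypothetical, and only the body is built here, nothing is sent to a cluster.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body of POST /_aliases that moves `alias` from old to new."""
    return {
        "actions": [
            {"add": {"index": new_index, "alias": alias}},
            {"remove": {"index": old_index, "alias": alias}},
        ]
    }


body = build_alias_swap("ridx_hotel_alias", "ridx_hotel", "ridx_hotel_v1")
print(json.dumps(body, indent=2))
```

Keeping both actions in one request, rather than sending two separate calls, is what keeps the alias continuously resolvable.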

op_type

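If the destination index already contains data that you don't want to overwrite, set op_type to create so that only missing documents are created. A document that already exists is reported as a version conflict; the other documents are still written. By default a version conflict aborts the reindex, so add "conflicts": "proceed" to the request body to continue past conflicts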

POST _reindex
{
  "source": { "index": "ridx_hotel" },
  "dest": { "index": "ridx_hotel_v1", "op_type": "create" }
}

op_type accepts two values: index (the default) and create
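
The difference between the two values can be sketched with a toy in-memory model. This is only an illustration of the semantics, not how Elasticsearch implements them: the function toy_reindex and the sample documents are hypothetical, indices are modeled as plain dicts keyed by _id, and conflicts are skipped as they would be with "conflicts": "proceed".

```python
def toy_reindex(source: dict, dest: dict, op_type: str = "index") -> list:
    """Copy documents from `source` into `dest` (both map _id -> document).

    Returns the ids that were skipped as version conflicts.
    """
    if op_type not in ("index", "create"):
        raise ValueError("op_type must be 'index' or 'create'")
    conflicts = []
    for doc_id, doc in source.items():
        if op_type == "create" and doc_id in dest:
            conflicts.append(doc_id)  # existing document is left untouched
            continue
        dest[doc_id] = dict(doc)  # create, or overwrite when op_type is index
    return conflicts


source = {"1": {"name": "hotel A"}, "2": {"name": "hotel B"}}
dest = {"1": {"name": "already here"}}
conflicts = toy_reindex(source, dest, op_type="create")
print(conflicts)  # ['1']
print(dest["1"])  # {'name': 'already here'}
```

With op_type set to index, the same call would overwrite document 1 and report no conflicts.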

_source

Sometimes only some of the fields from the source index are needed; use _source to select them

POST _reindex
{
  "source": { "index": "ridx_hotel", "_source": ["name"]},
  "dest": { "index": "ridx_hotel_v1", "op_type": "create" }
}

script

Sometimes fields need to be processed along the way. One option is a pipeline, which the next article covers; the other is a script in the reindex request

POST _reindex
{
  "source": {
    "index": "ridx_hotel"
  },
  "dest": {
    "index": "ridx_hotel_v1"
  },
  "script": {
    "source": "ctx._source.tags = ctx._source.remove(\"flags\")"
  }
}
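
The script above renames the field flags to tags. A rough Python equivalent on a plain dict may make the behavior easier to see; the function name and sample documents are hypothetical. It assumes ctx._source behaves like a Java Map, whose remove returns null when the key is missing, so a document without flags would end up with tags set to null.

```python
def rename_flags_to_tags(source: dict) -> dict:
    """Mimic `ctx._source.tags = ctx._source.remove("flags")` on a dict."""
    doc = dict(source)  # work on a copy so the input is not mutated
    # Map.remove returns the removed value, or null when the key is absent;
    # dict.pop with a default of None behaves the same way.
    doc["tags"] = doc.pop("flags", None)
    return doc


with_flags = rename_flags_to_tags({"name": "瑜伽酒店", "flags": ["spa"]})
without_flags = rename_flags_to_tags({"name": "瑜伽酒店"})
print(with_flags)
print(without_flags)
```

If a null tags field is not wanted, the script could check for the presence of flags before renaming.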

Reindex across clusters

Add reindex.remote.whitelist: "192.168.0.1:9200,192.168.0.2:9200" to elasticsearch.yml on the nodes of the destination cluster
Restart the nodes
Rewrite the reindex request

# source.size sets the batch size; source.query is optional
POST /_reindex
{
  "source": {
    "remote": {
      "host": "http://192.168.0.1:9200"
    },
    "index": "ridx_hotel",
    "size": 1000,
    "query": {
      "match": {
        "name": "酒店"
      }
    }
  },
  "dest": {
    "index": "ridx_hotel"
  }
}

Asynchronous execution

Set wait_for_completion=false in the URL

POST _reindex?wait_for_completion=false
{
  "source": { "index": "ridx_hotel" },
  "dest": { "index": "ridx_hotel_v1", "op_type": "index" }
}

The response contains the task ID:

{
  "task" : "M4LyTpueT--40-oJaXKvfA:756303"
}

Use this task ID to check the task's status

GET /_tasks/M4LyTpueT--40-oJaXKvfA:756303
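
Checking the status by hand gets tedious for long-running jobs, so a client would typically poll. The sketch below is one possible shape, under these assumptions: wait_for_task and the fetch callable are hypothetical names, fetch stands in for an HTTP GET of /_tasks/<task_id> returning parsed JSON, and the canned responses are made up. The completed flag is the field the task API uses to signal that a task has finished.

```python
import time
from typing import Callable


def wait_for_task(fetch: Callable[[str], dict], task_id: str,
                  interval: float = 1.0, max_polls: int = 60) -> dict:
    """Poll the task until it reports completed=true, then return its info."""
    for _ in range(max_polls):
        info = fetch(task_id)
        if info.get("completed"):
            return info
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not complete after {max_polls} polls")


# Stand-in for a real HTTP call: the task is still running on the first poll
# and has finished on the second.
_responses = iter([
    {"completed": False, "task": {"status": {"total": 2, "created": 1}}},
    {"completed": True, "response": {"total": 2, "created": 2}},
])
result = wait_for_task(lambda _id: next(_responses),
                       "M4LyTpueT--40-oJaXKvfA:756303", interval=0)
print(result["response"])
```

Passing fetch in as a parameter keeps the polling logic independent of any particular HTTP client.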

3. All done


Article link: https://www.haomeiwen.com/subject/ooerzhtx.html
