From / Size
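- By default, a query sorts hits by relevance score and returns the first 10 documents.
- From / Size is the easiest pagination scheme to understand:
  - From: the offset of the first document to return.
  - Size: the number of documents to return.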
POST kibana_sample_data_ecommerce/_search
{
  "from": 10,
  "size": 20,
  "query": {
    "match_all": {}
  }
}
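The deep pagination problem in distributed systems
- Elasticsearch is distributed by design. A query has to gather data stored across multiple shards on multiple machines, and it still has to return results in a globally sorted order (by relevance score by default).
- Take a query with From = 990 and Size = 10.
- Every shard first collects its own top 1000 documents. The coordinating node then merges the results from all shards, re-sorts them, takes the global top 1000, and returns only the last 10 of those.
- The deeper the page, the more memory is used. To cap the memory cost of deep pagination, Elasticsearch limits From + Size to 10,000 documents by default.
  - The limit is the index setting index.max_result_window.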
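The cost described above can be illustrated with a small in-memory model. This is only a sketch, not how Elasticsearch is implemented: the function name from_size_search, the five shards, and the random scores are all assumptions made up for the illustration.

```python
import heapq
import random


def from_size_search(shards, frm, size):
    """Simulate a from/size query over shards holding (score, doc_id) pairs."""
    window = frm + size
    candidates = []
    for shard in shards:
        # Each shard returns its own top (from + size) hits, because any of
        # them could still fall inside the requested global window.
        candidates.extend(heapq.nlargest(window, shard))
    # The coordinating node sorts every candidate, then skips the first `from`.
    candidates.sort(reverse=True)
    return candidates[frm:window], len(candidates)


random.seed(0)
docs = [(random.random(), f"doc-{i}") for i in range(20000)]
shards = [docs[i::5] for i in range(5)]  # 5 hypothetical shards, 4000 docs each

page, merged = from_size_search(shards, frm=990, size=10)
print(len(page), merged)  # 10 documents returned, 5000 candidates merged
```

In this model the coordinating node handles shards x (From + Size) candidates to return only Size documents, which is why the cost grows with page depth.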
From / Size Demo
- A simple From / Size demo.
- From + Size must not exceed 10,000.
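Search After: avoiding the deep pagination problem
- Avoids the performance cost of deep pagination and can fetch the next page in real time.
- Does not support jumping to an arbitrary page (no From).
- Can only page forward.
- The first search must specify a sort, and the sort values must be unique for every document (adding _id as a tiebreaker guarantees this).
- Each following search passes the sort values of the last document from the previous response as search_after.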
POST users/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "search_after": [
    10,
    "ZQ0vYGsBrR8X3IP75QqX"
  ],
  "sort": [
    {"age": "desc"},
    {"_id": "asc"}
  ]
}
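How Search After solves the deep pagination problem
- Suppose Size is 10 and the request is for documents 990 to 1000.
- The unique sort values locate where the previous page ended, so each shard only needs to process 10 documents per request.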
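The idea above can be sketched with the same kind of in-memory model. It is an illustration under stated assumptions, not Elasticsearch's implementation: search_after_page, the four shards, and the generated ids are hypothetical, and the sort order mirrors the age desc, _id asc example.

```python
import heapq


def search_after_page(shards, size, after=None):
    """Return one page sorted by (age desc, _id asc), resuming after `after`.

    `after` is the [age, _id] pair of the last hit of the previous page.
    """
    def sort_key(doc):
        return (-doc["age"], doc["_id"])

    after_key = None if after is None else (-after[0], after[1])
    candidates = []
    for shard in shards:
        # Each shard skips everything up to the cursor and returns at most
        # `size` hits, no matter how deep the page is.
        remaining = (d for d in shard if after_key is None or sort_key(d) > after_key)
        candidates.extend(heapq.nsmallest(size, remaining, key=sort_key))
    page = sorted(candidates, key=sort_key)[:size]
    return page, len(candidates)


docs = [{"_id": f"id-{i:04d}", "age": i % 50} for i in range(2000)]
shards = [docs[i::4] for i in range(4)]  # 4 hypothetical shards

cursor, seen, max_merged = None, [], 0
while True:
    page, merged = search_after_page(shards, size=10, after=cursor)
    if not page:
        break
    seen.extend(page)
    max_merged = max(max_merged, merged)
    cursor = [page[-1]["age"], page[-1]["_id"]]  # sort values of the last hit

print(len(seen), max_merged)
```

Every page, including the last one, merges at most shards x Size candidates. The _id tiebreaker matters here: many documents share the same age, and without a unique sort key the cursor could skip or repeat some of them.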
Scroll API
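- Creates a snapshot; data written after the snapshot is taken cannot be found through it.
- After each request, pass the Scroll Id returned by the previous response to fetch the next batch.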
# Scroll API
DELETE users

POST users/_doc
{"name":"user1","age":10}

POST users/_doc
{"name":"user2","age":20}

POST users/_doc
{"name":"user3","age":30}

POST users/_doc
{"name":"user4","age":40}

# Open a scroll context that stays alive for 5 minutes
POST /users/_search?scroll=5m
{
  "size": 1,
  "query": {
    "match_all": {}
  }
}

# Written after the snapshot was taken, so the scroll will not return it
POST users/_doc
{"name":"user5","age":50}

# Replace scroll_id with the _scroll_id returned by the previous response
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAWAWbWdoQXR2d3ZUd2kzSThwVTh4bVE0QQ=="
}
Demo for Scroll API
- Insert 4 documents.
- Call the Scroll API.
- Insert one new document.
- Scrolling still returns only the original 4 documents.
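The snapshot behaviour in these steps can be mimicked with a toy model. This is a deliberately simplified sketch: the ScrollSimulation class is hypothetical, and it copies the data, which is only a stand-in for the frozen view a real scroll context keeps.

```python
class ScrollSimulation:
    """Toy scroll context: freezes the documents visible when it is opened."""

    def __init__(self, index, size):
        self._snapshot = list(index)  # frozen view; later writes are not seen
        self._size = size
        self._pos = 0

    def next_batch(self):
        batch = self._snapshot[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return batch


users = [{"name": f"user{i}", "age": i * 10} for i in range(1, 5)]

scroll = ScrollSimulation(users, size=1)     # open the scroll
scrolled = scroll.next_batch()               # first batch
users.append({"name": "user5", "age": 50})   # written after the snapshot

batch = scroll.next_batch()
while batch:
    scrolled.extend(batch)
    batch = scroll.next_batch()

print(len(scrolled), len(users))  # 4 documents scrolled, 5 in the index
```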
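Search types and when to use them
- Regular
  - When you need the top few documents in real time, for example the most recent orders.
- Scroll
  - When you need every document, for example to export all data.
- Pagination
  - From and Size.
  - If you need deep pagination, use Search After.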
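Section recap
- Elasticsearch returns 10 results by default.
- To retrieve more, it offers three ways to paginate and iterate:
  - From / Size, and the problems with deep pagination.
  - Search After, which solves the deep pagination problem.
  - The Scroll API, which iterates over data through a snapshot.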
Course demo
# from + size exceeds the default index.max_result_window (10000), so this request is rejected
POST tmdb/_search
{
  "from": 10000,
  "size": 1,
  "query": {
    "match_all": {}
  }
}
# Search After
DELETE users

POST users/_doc
{"name":"user1","age":10}

POST users/_doc
{"name":"user2","age":11}

POST users/_doc
{"name":"user2","age":12}

POST users/_doc
{"name":"user2","age":13}

POST users/_count

# First page: specify a sort with _id as the unique tiebreaker
POST users/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "sort": [
    {"age": "desc"},
    {"_id": "asc"}
  ]
}

# Next page: replace the search_after values with the sort values of the last hit in the previous response
POST users/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  },
  "search_after": [
    10,
    "ZQ0vYGsBrR8X3IP75QqX"
  ],
  "sort": [
    {"age": "desc"},
    {"_id": "asc"}
  ]
}