MongoDB Quick Start Hands-On Tutorial, Basics Part 3: Execution Plans and Indexes

Author: you的日常 | Published 2022-01-07 13:52

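Earlier articles in this series:
MongoDB Quick Start Hands-On Tutorial, Basics Part 1: Document CR Operations
MongoDB Quick Start Hands-On Tutorial, Basics Part 1: Document UD Operations
MongoDB Quick Start Hands-On Tutorial, Basics Part 2: Streaming Aggregation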

Basics Part 3: Execution Plans and Indexes

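In the previous articles we covered MongoDB's common document CRUD operations and streaming aggregation. Keep in mind that a poorly written query lowers MongoDB's retrieval efficiency, while a well-designed one improves it. So how do we tell a "well-designed" query from a "poorly written" one?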

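In this article we look at what makes a query efficient or inefficient, learn how to view a query's execution plan, and cover the basics of indexes. This knowledge helps us avoid poorly written queries and design sound query strategies.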

Execution Plans

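An execution plan describes how a query is executed inside the database, that is, the access path it takes. We can use this description to judge the efficiency of a query and adjust it where needed to speed up retrieval.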

MongoDB provides several ways to return execution plans and execution statistics:

the db.collection.explain() method;
the cursor.explain() method;
the explain command.

This article focuses on the cursor.explain() method, referred to below as explain(). Its syntax is:

db.collection.find().explain(<verbose>)

Here <verbose> is the output mode of the execution plan. It affects how explain() behaves and how much information it returns. The possible values of <verbose> are queryPlanner, executionStats, and allPlansExecution:

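Mode	Description
`queryPlanner`	Details of the query plan, including the collection, the parsed query condition, the winning plan, the stages it uses, and MongoDB server information. The query itself is not executed.
`executionStats`	Execution statistics for the winning plan, along with the queryPlanner information such as rejected plans.
`allPlansExecution`	Selects and executes the winning plan, and returns execution statistics for the winning plan and for the other candidate plans.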

Each mode returns different information. The queryPlanner mode returns output in the following format:

"queryPlanner" : {
   "plannerVersion" : <int>,
   "namespace" : <string>,
   "indexFilterSet" : <boolean>,
   "parsedQuery" : {
      ...
   },
   "winningPlan" : {
      "stage" : <STAGE1>,
      ...
      "inputStage" : {
         "stage" : <STAGE2>,
         ...
         "inputStage" : {
            ...
         }
      }
   },
   "rejectedPlans" : [
      <candidate plan 1>,
      ...
   ]
},
"serverInfo" : {
   "host" : <string>,
   "port" : <int>,
   "version" : <string>,
   "gitVersion" : <string>
}

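stage identifies the operation performed at each step of the plan. For example:

COLLSCAN: collection scan (every document in the collection is read);
IXSCAN: scan of index keys;
FETCH: retrieval of documents;
SHARD_MERGE: merging of results from shards;
SHARDING_FILTER: filtering out orphan documents from shards.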
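
Stages nest through inputStage, so a plan reads as a small tree. As a minimal sketch, the hypothetical helper listStages below (it is not part of MongoDB) walks a plan object and lists its stage names from the outermost stage inward. It assumes child stages appear under a field named inputStage, or inputStages when a stage has several children.

```javascript
// Hypothetical helper, not a MongoDB API: lists the stage names of a plan,
// outermost stage first.
function listStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const names = plan.stage ? [plan.stage] : [];
  // Assumption: a stage has either one child (inputStage) or several (inputStages).
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  for (const child of children) {
    names.push(...listStages(child));
  }
  return names;
}

// A hand-written plan in the shape of the winningPlan field shown above.
const samplePlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN" },
};

console.log(listStages(samplePlan)); // [ 'FETCH', 'IXSCAN' ]
```

In the mongo shell the same function could be given the winningPlan of a real explain() result. A list that contains only COLLSCAN means the query reads the whole collection.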

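The queryPlanner output contains many fields, such as plannerVersion, namespace, winningPlan, rejectedPlans, and serverInfo. The fields are described below:

Field	Description
plannerVersion	Version of the query planner
namespace	Namespace (database.collection) being queried
indexFilterSet	Whether an index filter was applied to this query shape (not whether an index was used)
parsedQuery	The parsed query condition
winningPlan	The plan selected by the query optimizer
stage	The operation performed by a step of the plan
filter	Filter condition
direction	Scan direction
rejectedPlans	Candidate plans that were considered and rejected
serverInfo	MongoDB server information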

The executionStats mode returns output in the following format:

"queryPlanner" : {
   "plannerVersion" : <int>,
   "parsedQuery" : {
    ...
   },
   "winningPlan" : {
      "stage" : <STAGE1>,
      ...
   },
   "rejectedPlans" : []
   },
"executionStats" : {
   "executionSuccess" : <boolean>,
   "nReturned" : <int>,
   "executionTimeMillis" : <int>,
   "totalKeysExamined" : <int>,
   "totalDocsExamined" : <int>,
   "executionStages" : {
      "stage" : <STAGE1>
      "nReturned" : <int>,
      "executionTimeMillisEstimate" : <int>,
      "works" : <int>,
      "advanced" : <int>,
      "needTime" : <int>,
      "needYield" : <int>,
      "saveState" : <int>,
      "restoreState" : <int>,
      "isEOF" : <boolean>,
      ...
        }
    },
  "serverInfo" : {
    "host" : <string>,
    "port" : <int>,
    "version" : <string>,
    "gitVersion" : <string>
  }

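The executionStats output contains every field of the queryPlanner mode plus the execution details of the winning plan, in fields such as executionSuccess, totalDocsExamined, advanced, and works. The fields are described below:

Field	Description
executionSuccess	Whether execution succeeded
nReturned	Number of documents returned
executionTimeMillis	Total time, in milliseconds, spent selecting and executing the query plan
totalKeysExamined	Number of index entries scanned
totalDocsExamined	Number of documents examined
executionStages	Execution details of the winning plan, stage by stage
isEOF	Whether the stage reached the end of its stream; `1` or `true` means it did
executionTimeMillisEstimate	Estimated execution time in milliseconds
works	Number of work units; query execution is divided into small units of work
advanced	Number of intermediate results the stage passed up to its parent stage
docsExamined	Number of documents examined by the stage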
The allPlansExecution mode returns everything in executionStats plus an "allPlansExecution" : [ ] block, which holds partial execution information for the winning plan and the rejected plans. The block has the following format:

"allPlansExecution" : [
      {
         "nReturned" : <int>,
         "executionTimeMillisEstimate" : <int>,
         "totalKeysExamined" : <int>,
         "totalDocsExamined" :<int>,
         "executionStages" : {
            "stage" : <STAGEA>,
            "nReturned" : <int>,
            "executionTimeMillisEstimate" : <int>,
            ...
         }
      },
      ...
   ]

With the syntax and the three modes of explain() covered, we can start practicing. Suppose we want the query plan of db.inven.find({number: {$gt: 6}}), which queries the inven collection:

> db.inven.find({number: {$gt: 6}}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.inven",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "number" : {
                "$gt" : 6
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "number" : {
                    "$gt" : 6
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "asyncdeMBP",
        "port" : 27017,
        "version" : "4.0.10",
        "gitVersion" : "c389e7f69f637f7a1ac3cc9fae843b635f20b766"
    },
    "ok" : 1
}

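That is the output of this operation. Note that when no mode argument is passed, explain() uses the queryPlanner mode. From the output we learn that:

the namespace queried is test.inven;
the stage of the winning plan is COLLSCAN;
the filter used is number: {$gt: 6}, that is, jersey number greater than 6;
there are no rejected plans;
the MongoDB version is 4.0.10 and the port is 27017.

If we switch the mode to executionStats, we get the following: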

> db.inven.find({number: {$gt: 6}}).explain("executionStats")
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.inven",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "number" : {
                "$gt" : 6
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "number" : {
                    "$gt" : 6
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 3,
        "executionTimeMillis" : 0,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 5,
        "executionStages" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "number" : {
                    "$gt" : 6
                }
            },
            "nReturned" : 3,
            "executionTimeMillisEstimate" : 0,
            "works" : 7,
            "advanced" : 3,
            "needTime" : 3,
            "needYield" : 0,
            "saveState" : 0,
            "restoreState" : 0,
            "isEOF" : 1,
            "invalidates" : 0,
            "direction" : "forward",
            "docsExamined" : 5
        }
    },
    "serverInfo" : {
        "host" : "asyncdeMBP",
        "port" : 27017,
        "version" : "4.0.10",
        "gitVersion" : "c389e7f69f637f7a1ac3cc9fae843b635f20b766"
    },
    "ok" : 1
}

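Because the executionStats output includes the queryPlanner output, it gives us the same information as the default mode. In addition, it shows how the winning plan actually ran:

the query examined 5 documents;
only 3 documents matched the filter;
the query took less than 1 millisecond;
the scan reached the end of the collection (isEOF is 1) and was not cut short by limit or a similar modifier.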
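
These readings can be condensed into a few numbers that are easy to compare between queries. The sketch below is only an illustration: summarizeExecution is a hypothetical helper, not a MongoDB API, and the sample object copies just the four counters from the executionStats output above. The ratio of documents examined to documents returned is a common rough measure of wasted work.

```javascript
// Hypothetical helper, not a MongoDB API: condenses the executionStats part
// of an explain result into a few comparable numbers.
function summarizeExecution(explainResult) {
  const stats = explainResult.executionStats;
  return {
    returned: stats.nReturned,
    docsExamined: stats.totalDocsExamined,
    keysExamined: stats.totalKeysExamined,
    // Documents examined per document returned; values near 1 mean little wasted work.
    docsPerResult: stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
    millis: stats.executionTimeMillis,
  };
}

// Counters copied from the executionStats output above.
const sampleResult = {
  executionStats: {
    nReturned: 3,
    executionTimeMillis: 0,
    totalKeysExamined: 0,
    totalDocsExamined: 5,
  },
};

const summary = summarizeExecution(sampleResult);
console.log(summary.docsPerResult.toFixed(2)); // 1.67
```

Here 5 documents were read to return 3, and keysExamined is 0, which is consistent with the COLLSCAN stage: no index took part in the query.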

If we add limit to the command, for example db.inven.find({number: {$gt: 6}}).limit(2).explain("executionStats"), the output includes a limitAmount field, and the values of nReturned and advanced change accordingly.
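
To see why a limit changes these counters, here is a simplified model of a collection scan written in plain JavaScript. It is not MongoDB's implementation: scanWithLimit and the sample documents are hypothetical, and the model assumes the scan stops as soon as enough results have been collected.

```javascript
// Simplified model of a collection scan, not MongoDB's implementation.
// It counts how many documents are examined before the limit is reached.
function scanWithLimit(docs, predicate, limit = Infinity) {
  const returned = [];
  let docsExamined = 0;
  for (const doc of docs) {
    if (returned.length >= limit) break; // enough results, stop scanning
    docsExamined += 1;
    if (predicate(doc)) returned.push(doc);
  }
  return { nReturned: returned.length, docsExamined };
}

// Hypothetical data: five documents, three of them with number > 6.
const sampleDocs = [{ number: 3 }, { number: 7 }, { number: 9 }, { number: 5 }, { number: 10 }];
const greaterThanSix = (doc) => doc.number > 6;

console.log(scanWithLimit(sampleDocs, greaterThanSix));    // { nReturned: 3, docsExamined: 5 }
console.log(scanWithLimit(sampleDocs, greaterThanSix, 2)); // { nReturned: 2, docsExamined: 3 }
```

The exact counters a real server reports depend on the order in which documents are stored, so the numbers above only illustrate the direction of the change.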

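Indexes

Indexes support efficient execution of queries in MongoDB. Without an index, MongoDB must perform a collection scan, that is, scan every document in the collection to select those that match the query. If a suitable index exists for the query, MongoDB can use it to limit the number of documents it must inspect.

MongoDB indexes are special data structures, B-trees, that store a small portion of the collection's data set in a form that is easy to traverse. An index stores the values of a specific field or set of fields, ordered by value. This ordering supports efficient equality matches and range-based queries. MongoDB can also use the order of the index to return sorted results.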
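
The benefit of keeping index entries ordered can be shown with a small model. The sketch below is an assumption-laden simplification: it uses a sorted array and binary search instead of a real B-tree, and buildIndex, firstGreaterThan, findGreaterThan, and the sample documents are all hypothetical names and data introduced for illustration.

```javascript
// Simplified model of an index: [key, documentPosition] pairs sorted by key.
// MongoDB uses a B-tree; a sorted array is assumed here to keep the sketch short.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => a[0] - b[0]);
}

// Binary search for the first entry whose key is strictly greater than `value`.
function firstGreaterThan(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: every entry from that point on matches, already in key order.
function findGreaterThan(docs, index, value) {
  return index
    .slice(firstGreaterThan(index, value))
    .map(([, position]) => docs[position]);
}

// Hypothetical sample data.
const players = [{ number: 9 }, { number: 3 }, { number: 7 }, { number: 10 }, { number: 5 }];
const numberIndex = buildIndex(players, "number");

console.log(findGreaterThan(players, numberIndex, 6).map((doc) => doc.number)); // [ 7, 9, 10 ]
```

The search touches only a few index entries to find the start of the range, and the matching documents come back already sorted by number, which mirrors the two properties described above.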

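The following figure illustrates a query that selects and sorts matching documents using an index:

[Figure not available: an index on the score field of the user collection]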

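The figure shows an index built on the score field of the user collection. The index records the score values together with the locations of the corresponding documents. When a query is issued, MongoDB searches the index first and quickly locates the matching documents instead of scanning the whole collection.

Indexes in MongoDB are similar to indexes in other database systems. MongoDB defines indexes at the collection level and supports indexes on any field of the documents.

MongoDB provides many index types to support specific kinds of data and queries, such as single field, compound, multikey, text, 2d, hashed, and sparse indexes. Next we cover the commonly used single field, compound, and multikey indexes.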

Single Field Indexes

The syntax for creating an index is:

db.collection.createIndex( <key and index type specification>, <options> )

Suppose we want to create a single field index on the number field of the inven collection:

> db.inven.createIndex({number: 1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

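In the result document, numIndexesBefore is the number of indexes before this operation and numIndexesAfter is the number after it. We had never created an index on inven before, so why is numIndexesBefore 1? MongoDB creates a default index on the _id field of every collection, which is why numIndexesAfter is 2 after this operation.

We can compare the execution plans before and after creating the index to see how the index affects the query. The command and its output are: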

> db.inven.find({number: {$gt: 6}}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.inven",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "number" : {
                "$gt" : 6
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "number" : 1
                },
                "indexName" : "number_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "number" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "number" : [
                        "(6.0, inf.0]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "asyncdeMacBook-Pro.local",
        "port" : 27017,
        "version" : "4.0.10",
        "gitVersion" : "c389e7f69f637f7a1ac3cc9fae843b635f20b766"
    },
    "ok" : 1
}

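Comparing this plan with the one from before the index was created, we can see that stage has changed:

Before the index: the stage is COLLSCAN, a scan of the whole collection.
After the index: the winning plan is a FETCH stage fed by an IXSCAN stage, so documents are located through the number_1 index.

Compared with scanning the whole collection, retrieving documents through an index is usually faster, and the gap widens as the collection grows.

Besides fields at the top level of a document, we can also index fields of embedded documents. Suppose the documents have the following structure: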
