MongoDB开发之聚合管道性能

作者: 五月笙 | 来源:发表于2021-01-06 11:08 被阅读0次

引子

处理大量数据时，聚合管道的性能会下降，我们需要理解管道新能为什么回下降，该如何进行改善。

关键点：

尽早在管道里减少文档的数量和大小
索引用于 $match和$ sort操作，可以加速查询
在管道使用 $match和$ sort之外的操作符不能使用索引
使用分片存储数据集合时， $match和$ project会在单独的片上执行。如果使用了其他操作符，其余的管道会在主要片上执行。

索引可以加宽大集合选择性查询和存储

聚合框架的explain()

聚合框架里使用explain()函数与find()查询里使用不同，哥特式如下：

db.collection.aggregate([**聚合条件**], { explain: true })

db.machineLog.aggregate([
    {
        $match: {
            categoryId: 2
        }
    },
    {
        $group: {
            _id: "$machineId",
            total: {
                $sum: "$price"
            }
        }
    },
    {
        $match: {
            total: {
                $gt: 800
            }
        }
    }
], {explain: true})

返回结果：

{
    "stages": [
        {
            "$cursor": {
                "query": {
                    "categoryId": 2
                },
                "fields": {
                    "machineId": NumberInt("1"),
                    "price": NumberInt("1"),
                    "_id": NumberInt("0")
                },
                "queryPlanner": {
                    "plannerVersion": NumberInt("1"),
                    "namespace": "machine.machineLog",
                    "indexFilterSet": false,
                    "parsedQuery": {
                        "categoryId": {
                            "$eq": 2
                        }
                    },
                    "winningPlan": {
                        "stage": "COLLSCAN",
                        "filter": {
                            "categoryId": {
                                "$eq": 2
                            }
                        },
                        "direction": "forward"
                    },
                    "rejectedPlans": [ ]
                }
            }
        },
        {
            "$group": {
                "_id": "$machineId",
                "total": {
                    "$sum": "$price"
                }
            }
        },
        {
            "$match": {
                "total": {
                    "$gt": 800
                }
            }
        }
    ],
    "ok": 1
}

对$match内的查询条件categroyId增加索引：

> db.machineLog.createIndex({"categoryId": 1})

重新执行explain()操作：

{
    "stages": [
        {
            "$cursor": {
                "query": {
                    "categoryId": 2
                },
                "fields": {
                    "machineId": NumberInt("1"),
                    "price": NumberInt("1"),
                    "_id": NumberInt("0")
                },
                "queryPlanner": {
                    "plannerVersion": NumberInt("1"),
                    "namespace": "machine.machineLog",
                    "indexFilterSet": false,
                    "parsedQuery": {
                        "categoryId": {
                            "$eq": 2
                        }
                    },
                    "winningPlan": {
                        "stage": "FETCH",
                        "inputStage": {
                            "stage": "IXSCAN",
                            "keyPattern": {
                                "categoryId": 1
                            },
                            "indexName": "categoryId_1",
                            "isMultiKey": false,
                            "multiKeyPaths": {
                                "categoryId": [ ]
                            },
                            "isUnique": false,
                            "isSparse": false,
                            "isPartial": false,
                            "indexVersion": NumberInt("2"),
                            "direction": "forward",
                            "indexBounds": {
                                "categoryId": [
                                    "[2.0, 2.0]"
                                ]
                            }
                        }
                    },
                    "rejectedPlans": [ ]
                }
            }
        },
        {
            "$group": {
                "_id": "$machineId",
                "total": {
                    "$sum": "$price"
                }
            }
        },
        {
            "$match": {
                "total": {
                    "$gt": 800
                }
            }
        }
    ],
    "ok": 1
}

通过不存在索引和存在索引查询的explain()返回结果对比，在winningPlan标签下可以看出索引的使用情况，虽然没有find().explain()详细，但是仍有关键信息：

winningPlan.inputStage.indexName: 应用索引名称
winningPlan.inputStage.isMultiKey: 是否为多元索引
winningPlan.inputStage.indexBounds: 索引扫描范围

allowDiskUser

管道查询超大型集合数据，可能会遇到如下错误：

assert: command failed: {
    "errmsg" : "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUser:true to opt in.",
    "code" : 16945,
    "ok" : 0
} : aggregate failed

原因：管道返回了超过MongoDB RAM内存限制的100MB数据。
解决：设置allowDiskUser:true

db.collection.aggregate([**聚合条件**], {allowDiskUser:true})

使用allowDiskUser会降低管道的新能，只推荐在需要的时候使用。

参考

MongoDB实战

网友评论

本文标题：MongoDB开发之聚合管道性能

本文链接：https://www.haomeiwen.com/subject/ftjpoktx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

MongoDB开发之聚合管道性能

引子

聚合框架的explain()

allowDiskUser

参考

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

MongoDB开发之 聚合管道性能

引子

聚合框架的explain()

allowDiskUser

参考

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

MongoDB开发之聚合管道性能