美文网首页
四,聚合操作

四,聚合操作

作者: alexgu | 来源:发表于2021-01-14 04:45 被阅读0次

一,聚合操作

将数据经过计算,映射,汇总等操作后返回最终处理数据。
mongo有三种方式进行聚合操作:
1,Aggregation Pipeline
2,Map-Reduce
3,Single Purpose Aggregation Operations

先下载一份测试数据,https://gitee.com/geektime-geekbang/geektime-mongodb-course/raw/master/aggregation/dump.tar.gz
解压后执行mongorestore(新版mongo需要自行下载mongo tools),可以得到一个mock数据

image.png

看下order数据结构

db.orders.findOne({})
{
        "_id" : ObjectId("5dbe7a545368f69de2b4d36e"),
        "street" : "493 Hilll Curve",
        "city" : "Champlinberg",
        "state" : "Texas",
        "country" : "Malaysia",
        "zip" : "24344-1715",
        "phone" : "425.956.7743 x4621",
        "name" : "Destinee Schneider",
        "userId" : 3573,
        "orderDate" : ISODate("2019-03-26T03:20:08.805Z"),
        "status" : "created",
        "shippingFee" : NumberDecimal("8.00"),
        "orderLines" : [
                {
                        "product" : "Refined Fresh Tuna",
                        "sku" : "2057",
                        "qty" : 25,
                        "price" : NumberDecimal("56.00"),
                        "cost" : NumberDecimal("46.48")
                },
                {
                        "product" : "Refined Concrete Ball",
                        "sku" : "1738",
                        "qty" : 61,
                        "price" : NumberDecimal("47.00"),
                        "cost" : NumberDecimal("47")
                },
                {
                        "product" : "Rustic Granite Towels",
                        "sku" : "500",
                        "qty" : 62,
                        "price" : NumberDecimal("74.00"),
                        "cost" : NumberDecimal("62.16")
                },
                {
                        "product" : "Refined Rubber Salad",
                        "sku" : "1400",
                        "qty" : 73,
                        "price" : NumberDecimal("93.00"),
                        "cost" : NumberDecimal("87.42")
                },
                {
                        "product" : "Intelligent Wooden Towels",
                        "sku" : "5674",
                        "qty" : 72,
                        "price" : NumberDecimal("84.00"),
                        "cost" : NumberDecimal("68.88")
                },
                {
                        "product" : "Refined Steel Bacon",
                        "sku" : "5009",
                        "qty" : 8,
                        "price" : NumberDecimal("53.00"),
                        "cost" : NumberDecimal("50.35")
                }
        ],
        "total" : NumberDecimal("407")
}

二,Aggregation Pipeline

db.collection.aggregate( [ { <stage> }, ... ] )

aggregate需要传入stage的数组作为管道,数据从上一个stage流出,经过当前stage处理,然后传递给下一个stage进行处理,当所有stage处理完之后得到最终数据。

2.1$match操作

通过标准的mongo查询语句过滤文档流,例如:

//id匹配
db.orders.aggregate([{"$match":{"_id":ObjectId("5dbe7a545368f69de2b4d36e")}}])

//total>0的
db.orders.aggregate([{"$match":{"total":{"$gt":NumberDecimal("0")}}}])

//要么userId=3573,要么country为United States of America
db.orders.aggregate(
    {"$match":{"$or":[
        {"userId":3573},
        {"country":"United States of America"}
    ]}}
)

2.2$project操作

投影,用来决定文档字段是否要返回

//只返回userId和country
db.orders.aggregate(
    [
        {"$match":{"$or":[
            {"userId":3573},
            {"country":"United States of America"}
        ]}},
        {"$project":{"userId":1,"country":1,"_id":0}}
    ]
)

2.3$group操作

{
  $group:
    {
      _id: <expression>, // Group By Expression
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
 }

group操作会根据_id字段的表达式来进行分组

//根据userId汇总,查看每个用户产生的订单数
db.orders.aggregate(
    [{
        "$group":{
            "_id":"$userId",
            "count":{$sum:1}
        }
    }]
)

2.4$limit, $sort,$skip

db.orders.aggregate(
    [
        {
            "$group":{
                "_id":"$userId",
                "count":{$sum:1}
            }
        },
        {"$sort":{"count":1}},
        {"$skip":15},
        {"$limit":10}
    ]
)

2.5$unwind

用于展开文档的数组,path的值代表要展开文档的哪个字段,includeArrayIndex的值代表处理之后的文档用includeArrayIndex的值作为字段保存数组位置,preserveNullAndEmptyArrays为true时,空数组不会填充数组字段

{
  $unwind:
    {
      path: <field path>,
      includeArrayIndex: <string>,
      preserveNullAndEmptyArrays: <boolean>
    }
}
汇总_id为5dbe7a545368f69de2b4d36e的订单总花费
db.orders.aggregate(
    [
        {"$match":{"_id":ObjectId("5dbe7a545368f69de2b4d36e")}},
        {"$unwind":{"path":"$orderLines"}},
        {"$group":{"_id":"_id","totalCost":{"$sum":"$orderLines.cost"},"userId":{"$first":"$userId"}}}
    ]
)

2.6$lookup

左外连接

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}
//往users表插入一条数据
db.users.insertOne({"userId":3573,"userName":"alex"})

//查出这个订单的总花费及user
db.orders.aggregate(
    [
        {"$match":{"_id":ObjectId("5dbe7a545368f69de2b4d36e")}},
        {"$unwind":{"path":"$orderLines"}},
        {"$group":{"_id":"_id","totalCost":{"$sum":"$orderLines.cost"},"userId":{"$first":"$userId"}}},
        {"$lookup":{
            "from":"users",
            "localField":"userId",
            "foreignField":"userId",
            "as":"user"
        }}
    ]
)

2.6$bucket

分桶

{
  $bucket: {
      groupBy: <expression>,
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
      default: <literal>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}
//将3月,4月的订单分成2个通,并统计月份订单数
db.orders.aggregate(
    [
        {"$bucket":{
            "groupBy":"$orderDate",
            "boundaries":[ISODate("2019-03-01T00:00:00.000Z"),ISODate("2019-04-01T00:00:00.000Z"),
                          ISODate("2019-05-01T00:00:00.000Z")],
            "default":"other",
            "output":{
                "count":{"$sum":1}
            }
        }}
    ]
)

//返回
{ "_id" : "other", "count" : 79716 }
{ "_id" : ISODate("2019-03-01T00:00:00Z"), "count" : 10201 }
{ "_id" : ISODate("2019-04-01T00:00:00Z"), "count" : 10083 }

2.7 $facet

使用$facet,可以在一个聚合中使用多组聚合

{ $facet:
   {
      <outputField1>: [ <stage1>, <stage2>, ... ],
      <outputField2>: [ <stage1>, <stage2>, ... ],
      ...

   }
}
//userCount字段里面包含所有userId的订单数,orderCount计算订单总数
db.orders.aggregate(
    [{
        "$facet":{
            "userCount":[{"$group":{
                "_id":"$userId",
                "count":{"$sum":1}
            }}],
            "orderCount":[{"$group":{
                "_id":null,
                "count":{"$sum":1}
            }}]
        }
    }]
)

三,compass

compass是mongo官方出的mongodb管理工具,提供了可视化界面,可以直观的查看聚合每个stage的输出结果


image.png

还可以直接分析查询语句


image.png

相关文章

网友评论

      本文标题:四,聚合操作

      本文链接:https://www.haomeiwen.com/subject/ydbyaktx.html