Aggregation Framework

The aggregation framework processes documents and returns computed results. It covers what the GROUP BY clause does in SQL, and it can also filter, reshape, sort, and paginate documents within the same operation.

The Pipeline

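An aggregation runs as a pipeline: documents pass through an ordered list of stages, and the output of each stage becomes the input of the next. The example below keeps only orders with status "A", then totals the amount per customer.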
db.orders.aggregate([
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
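
To make the stage-by-stage flow concrete, here is a minimal sketch in plain JavaScript that mimics the two-stage pipeline above over an in-memory array. The sample orders and the helper names matchStatus, groupTotalByCustomer, and runPipeline are hypothetical; the sketch only models the semantics and says nothing about how the server actually executes a pipeline.

```javascript
// Hypothetical sample data standing in for the orders collection.
const orders = [
  { cust_id: "c1", amount: 50, status: "A" },
  { cust_id: "c1", amount: 25, status: "A" },
  { cust_id: "c2", amount: 40, status: "B" },
  { cust_id: "c2", amount: 10, status: "A" },
];

// Models { $match: { status: "A" } }: keep only matching documents.
const matchStatus = (docs) => docs.filter((d) => d.status === "A");

// Models { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }.
const groupTotalByCustomer = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.cust_id, (totals.get(d.cust_id) || 0) + d.amount);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// Each stage receives the output of the previous one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(orders, [matchStatus, groupTotalByCustomer]);
console.log(result);
```

With this sample data, customer c1 totals 75 and customer c2 totals 10, because the order with status "B" is filtered out before grouping. In MongoDB itself the order of the grouped documents is not guaranteed unless a $sort stage follows.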

Common Stages

$match

Passes only the documents that match the given query to the next stage, like a find() filter or a SQL WHERE clause.
{ $match: { status: "active" } }

$group

Groups documents by a specified identifier key and applies the accumulator expression(s) to each group.
{
  $group: {
    _id: "$category", // Group by category
    count: { $sum: 1 }, // Count items per category
    avgPrice: { $avg: "$price" } // Calculate average price
  }
}
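
As a worked illustration of what the accumulators produce, the following plain JavaScript sketch applies the same count and average logic to a small in-memory array. The sample items are hypothetical, and the sketch assumes every price is numeric; it models the result shape only.

```javascript
// Hypothetical sample documents.
const items = [
  { category: "books", price: 10 },
  { category: "books", price: 20 },
  { category: "toys", price: 9 },
];

// Accumulate a running count and price sum per category.
const groups = {};
for (const { category, price } of items) {
  if (!groups[category]) {
    groups[category] = { _id: category, count: 0, sum: 0 };
  }
  groups[category].count += 1;   // corresponds to { $sum: 1 }
  groups[category].sum += price; // running total needed for the average
}

// Finish each group: avgPrice = sum / count, as $avg would report.
const grouped = Object.values(groups).map(({ _id, count, sum }) => ({
  _id,
  count,
  avgPrice: sum / count,
}));
console.log(grouped);
```

Here the "books" group ends up with count 2 and avgPrice 15, and the "toys" group with count 1 and avgPrice 9. Each output document has the group key in _id plus one field per accumulator.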

$project

Reshapes each document in the stream, such as by adding new fields or removing existing fields.
{ $project: { name: 1, total: 1, _id: 0 } }

$sort

Sorts all input documents and returns them to the pipeline in sorted order.
{ $sort: { total: -1 } } // Descending order

$limit & $skip

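$skip discards the first n documents and $limit caps how many documents pass to the next stage. Together they implement pagination: the example below skips the first 10 documents and returns the next 5. Put a $sort stage before them so that page boundaries stay stable between requests.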
{ $skip: 10 },
{ $limit: 5 }
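
One way to derive these numbers from a page number is sketched below. The helper pageStages and its parameters are hypothetical, not part of any MongoDB API; it simply returns stage objects that can be spread into a pipeline. The sort specification in the usage line includes _id as a tiebreaker, on the assumption that total alone is not unique.

```javascript
// Hypothetical helper: builds the paging stages for a 1-based page number.
function pageStages(page, pageSize, sortSpec) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return [
    { $sort: sortSpec },              // fixed order so pages do not overlap
    { $skip: (page - 1) * pageSize }, // drop documents from earlier pages
    { $limit: pageSize },             // keep at most one page
  ];
}

// Page 3 with 5 documents per page: skip 10, then take 5.
const stages = pageStages(3, 5, { total: -1, _id: 1 });
console.log(JSON.stringify(stages));
```

In mongosh the result could be used as db.orders.aggregate([...pageStages(3, 5, { total: -1, _id: 1 })]). Large $skip values still make the server walk past the skipped documents, so this approach suits shallow paging best.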

Example Pipeline

Calculate the total sales per product for 2023, sorted from highest to lowest total.
db.sales.aggregate([
  // 1. Filter for 2023
  { $match: { date: { $gte: new Date('2023-01-01'), $lt: new Date('2024-01-01') } } },
  
  // 2. Group by product and sum amount
  { $group: { _id: "$product", totalSales: { $sum: "$amount" } } },
  
  // 3. Sort by totalSales descending
  { $sort: { totalSales: -1 } }
])
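
If the same report is needed for other years, the pipeline can be built by a small function. The sketch below is one possible shape; salesByProductPipeline is a hypothetical name, and it assumes the date field stores UTC timestamps, which matches how new Date('2023-01-01') is parsed in the example above.

```javascript
// Hypothetical helper: builds the pipeline above for any calendar year.
function salesByProductPipeline(year) {
  const start = new Date(Date.UTC(year, 0, 1));   // Jan 1 of the year, 00:00 UTC
  const end = new Date(Date.UTC(year + 1, 0, 1)); // Jan 1 of the next year (exclusive)
  return [
    // 1. Keep only sales inside the half-open range [start, end)
    { $match: { date: { $gte: start, $lt: end } } },
    // 2. Group by product and sum amount
    { $group: { _id: "$product", totalSales: { $sum: "$amount" } } },
    // 3. Sort by totalSales descending
    { $sort: { totalSales: -1 } },
  ];
}

const pipeline = salesByProductPipeline(2023);
console.log(pipeline[0].$match.date.$gte.toISOString());
```

In mongosh this would be run as db.sales.aggregate(salesByProductPipeline(2023)). The half-open range with $lt on the next year's first day avoids having to spell out the last millisecond of December 31.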

Summary

  • The aggregation pipeline processes documents in stages, each feeding the next.
  • Use $match as early as possible so that later stages process fewer documents.
  • Use $group to calculate statistics (sum, avg, etc.).
  • Use $project to format the output.