Data Analytics Bootcamp
  • Syllabus
  • Statistical Thinking
  • SQL
  • Python
  • Tableau
  • Lab
  • Capstone
  1. SQL
  2. SQL
  3. Session 12: DA with SQL | Advanced Aggregations
  • Syllabus
  • Statistical Thinking
    • Statistics
      • Statistics Session 01: Data Layers and Bias in Data
      • Statistics Session 02: Data Types
      • Statistics Session 03: Probabilistic Distributions
      • Statistics Session 04: Probabilistic Distributions
      • Statistics Session 05: Sampling
      • Statistics Session 06: Inferential Statistics
      • Slides
        • Course Intro
        • Descriptive Stats
        • Data Types
        • Continuous Distributions
        • Discrete Distributions
        • Sampling
        • Hypothesis Testing
  • SQL
    • SQL
      • Session 01: Intro to Relational Databases
      • Session 02: Intro to PostgreSQL
      • Session 03: DA with SQL | Data Types & Constraints
      • Session 04: DA with SQL | Filtering
      • Session 05: DA with SQL | Numeric Functions
      • Session 06: DA with SQL | String Functions
      • Session 07: DA with SQL | Date Functions
      • Session 08: DA with SQL | JOINs
      • Session 09: DA with SQL | Advanced SQL
      • Session 10: DA with SQL | Advanced SQL Functions
      • Session 11: DA with SQL | UDFs, Stored Procedures
      • Session 12: DA with SQL | Advanced Aggregations
      • Session 13: DA with SQL | Final Project
      • Slides
        • Intro to Relational Databases
        • Intro to PostgreSQL
        • Basic Queries: DDL DLM
        • Filtering
        • Numeric Functions
        • String Functions
        • Date Functions
        • Normalization and JOINs
        • Temporary Tables
        • Advanced SQL Functions
        • Reporting and Analysis with SQL
        • Advanced Aggregations
  • Python
    • Python
      • Session 01: Programming for Data Analysts
      • Session 02: Python basic Syntax, Data Structures
      • Session 03: Introduction to Pandas
      • Session 04: Advanced Pandas
      • Session 05: Intro to Data Visualization
      • Session 06: Data Visualization
      • Session 07: Working with Dates
      • Session 08: Data Visualization | Plotly
      • Session 09: Customer Segmentation | RFM
      • Slides
        • Data Analyst
  • Tableau
    • Tableau
      • Tableau Session 01: Introduction to Tableau
      • Tableau Session 02: Intermediate Visual Analytics
      • Tableau Session 03: Advanced Analytics
      • Tableau Session 04: Dashboard Design & Performance
      • Slides
        • Data Analyst
        • Data Analyst
        • Data Analyst
        • Data Analyst

On this page

  • Session Goals
  • Analytical Base
  • UNION
    • UNION
    • UNION ALL
    • Problem 1
    • Solution 1
    • Problems with UNION Based Solution
    • Problem 2
    • Solution 2
    • Try Yourself
  • Beyond Simple GROUP BY
  • ROLLUP
    • When ROLLUP Makes Sense
    • Problem 1: Re-solving with ROLLUP
    • ROLLUP Solution
    • What ROLLUP Produces
    • Comparing UNION vs ROLLUP
    • Interpreting NULLs in ROLLUP Results
    • Limitation of ROLLUP
  • CUBE
    • When CUBE Makes Sense
    • From Hierarchy to Cross-Dimensions
    • Problem 2: Re-solving a New Analytical Question
    • CUBE Solution
    • What CUBE Produces
    • Comparing ROLLUP and CUBE
    • Practical Limitation of CUBE
  • GROUPING SETS
    • Why GROUPING SETS Exists
    • Re-solving Problem 2 with GROUPING SETS
    • GROUPING SETS Solution
    • What GROUPING SETS Produces
    • GROUPING SETS vs UNION ALL
    • Handling NULLs with GROUPING()
    • When to Use GROUPING SETS
  • ROLLUP vs CUBE vs GROUPING SET
  1. SQL
  2. SQL
  3. Session 12: DA with SQL | Advanced Aggregations

Session 12: DA with SQL | Advanced Aggregations

UNION
ROLLUP
CUBE
GROUPING SETS

Session Goals

This final session demonstrates why advanced aggregations exist by first solving an analytical problem using classic GROUP BY + UNION ALL, and then solving the same problem using advanced aggregation operators.

  • Understand the limitations of GROUP BY
  • Solve a real analytical problem using UNION ALL
  • Replace the solution with ROLLUP and GROUPING SETS
  • Compare clarity, correctness, and scalability

Analytical Base

In real analytical systems, aggregations are rarely performed on a single flat table.
Instead, they are computed from joined fact and dimension tables.

In this session, we use the following analytical structure:

  • Geography hierarchy: Country → Region → City
  • Customers tied to cities
  • Orders and order items representing transactions
  • Products providing category and price information

The analytical grain is order item.

All examples use the following analytical materialized view.

CREATE MATERIALIZED VIEW analytics.mv_order_revenue AS
SELECT
    c.country_name,
    r.region_name,
    ci.city_name,
    p.category,
    oi.quantity * p.price AS revenue
FROM analytics.order_items oi
JOIN analytics.orders o
    ON oi.order_id = o.order_id
JOIN analytics.products p
    ON oi.product_id = p.product_id
JOIN analytics.customers cu
    ON o.customer_id = cu.customer_id
JOIN analytics.cities ci
    ON cu.city_id = ci.city_id
JOIN analytics.regions r
    ON ci.region_id = r.region_id
JOIN analytics.countries c
    ON r.country_id = c.country_id;

UNION

UNION is a set operator used to combine the results of multiple SELECT statements into a single result set.

  • Result sets are stacked vertically
  • Each SELECT must return the same number of columns
  • Corresponding columns must have compatible data types

Types of UNION:

  • UNION
  • UNION ALL
Operator Description
UNION Combines results and removes duplicates
UNION ALL Combines results and keeps all rows

Suppose we have the following tables (A and B)

Table A

id value
1 A
2 B
3 C

Table B

id value
3 C
4 D
5 E

UNION

SELECT id, value FROM table_a
UNION
SELECT id, value FROM table_b;

\[\Downarrow\]

id value
1 A
2 B
3 C
4 D
5 E

UNION ALL

SELECT id, value FROM table_a
UNION ALL
SELECT id, value FROM table_b;

\[\Downarrow\]

id value
1 A
2 B
3 C
3 C
4 D
5 E
Important

For analytical queries, UNION ALL is preferred due to performance and correctness.

Problem 1

We are assigend to provide:

  • Revenue by city
  • Revenue by region
  • Revenue by country
  • A grand total
Important

All results must appear in one result set.

Solution 1

The traditional way to solve the problem without advanced aggregations.

SELECT
    country_name,
    region_name,
    city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
    country_name,
    region_name,
    city_name

UNION ALL

SELECT
    country_name,
    region_name,
    NULL AS city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
    country_name,
    region_name

UNION ALL

SELECT
    country_name,
    NULL AS region_name,
    NULL AS city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
    country_name

UNION ALL

SELECT
    NULL AS country_name,
    NULL AS region_name,
    NULL AS city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue;

Output Table

country_name region_name city_name total_revenue
Armenia Lori Vanadzor 272.76
Armenia Shirak Gyumri 530.66
Armenia Yerevan Yerevan 1645.10
Georgia Adjara Batumi 525.73
Georgia Tbilisi Tbilisi 461.67
Georgia Tbilisi NULL 461.67
Armenia Shirak NULL 530.66
Armenia Lori NULL 272.76
Armenia Yerevan NULL 1645.10
Georgia Adjara NULL 525.73
Armenia NULL NULL 2448.59
Georgia NULL NULL 987.40
NULL NULL NULL 3435.99

Problems with UNION Based Solution

  • Multiple queries scan the same data repeatedly
  • High risk of inconsistency when logic changes
  • Hard to maintain and extend
  • Difficult to interpret NULL values

Problem 2

Let’s expand the problem with the new New requirement

  • City-level revenue
  • Region-level revenue
  • Product category totals
  • Grand total

Solution 2

SELECT
    country_name,
    region_name,
    city_name,
    category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
    country_name,
    region_name,
    city_name,
    category

UNION ALL

SELECT
    country_name,
    region_name,
    NULL,
    category,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY
    country_name,
    region_name,
    category

UNION ALL

SELECT
    NULL,
    NULL,
    NULL,
    category,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY
    category

UNION ALL

SELECT
    NULL,
    NULL,
    NULL,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue;

Output Table

country_name region_name city_name category total_revenue
Georgia Adjara Batumi Internet 209.93
Georgia Adjara Batumi Cloud 20.93
Georgia Tbilisi Tbilisi TV 71.94
Armenia Yerevan Yerevan Internet 239.92
Armenia Yerevan Yerevan TV 571.69
Armenia Shirak Gyumri Entertainment 23.97
Armenia Lori Vanadzor TV 35.97
Georgia Adjara Batumi Mobile 174.93
Armenia Lori Vanadzor Entertainment 71.91
Armenia Shirak Gyumri Cloud 41.86
Georgia Tbilisi Tbilisi Internet 284.89
Armenia Shirak Gyumri Internet 464.83
Georgia Tbilisi Tbilisi Mobile 59.94
Armenia Lori Vanadzor Mobile 39.97
Georgia Adjara NULL Internet 209.93
Georgia Adjara NULL Cloud 20.93
Georgia Adjara NULL Mobile 174.93
Georgia Tbilisi NULL TV 71.94
Georgia Tbilisi NULL Internet 284.89
Georgia Tbilisi NULL Mobile 59.94
Armenia Yerevan NULL Internet 239.92
Armenia Yerevan NULL TV 571.69
Armenia Shirak NULL Entertainment 23.97
Armenia Shirak NULL Cloud 41.86
Armenia Shirak NULL Internet 464.83
Armenia Lori NULL TV 35.97
Armenia Lori NULL Entertainment 71.91
Armenia Lori NULL Mobile 39.97
NULL NULL NULL Internet 1199.57
NULL NULL NULL TV 679.60
NULL NULL NULL Mobile 274.84
NULL NULL NULL Cloud 62.79
NULL NULL NULL Entertainment 95.88
NULL NULL NULL NULL 2312.68

Try Yourself

Create new Column which will handle all the combinations:

  • Country-level revenue
  • Region-level revenue
  • City-level revenue
  • Grand total
SELECT
    country_name,
    region_name,
    city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY country_name, region_name, city_name

UNION ALL

SELECT
    country_name,
    region_name,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY country_name, region_name

UNION ALL

SELECT
    country_name,
    NULL,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY country_name

UNION ALL

SELECT
    NULL,
    NULL,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue;

Beyond Simple GROUP BY

Traditional GROUP BY generates one aggregation level at a time.

  • Multiple aggregation levels require multiple queries
  • UNION-based solutions are verbose and error-prone
  • Analytical reporting often needs subtotals and totals together

Advanced aggregations solve this by computing multiple grouping levels in a single query.

ROLLUP

ROLLUP is an advanced aggregation operator designed to generate hierarchical subtotals automatically.

Instead of manually writing multiple GROUP BY queries and stitching them together with UNION ALL, ROLLUP expresses the hierarchy directly in SQL.

ROLLUP extends GROUP BY by computing progressively higher-level aggregates along a defined column order.

\[ ROLLUP(A, B, C) \Rightarrow (A,B,C), (A,B), (A), () \]

  • Columns are aggregated from right to left
  • Each step removes one level of detail
  • A final row represents the grand total

When ROLLUP Makes Sense

ROLLUP is best suited when:

  • Dimensions form a natural hierarchy
  • Reporting requires subtotals at each level
  • The hierarchy order is fixed and meaningful

Typical examples include:

  • Geography: Country → Region → City
  • Time: Year → Quarter → Month

Problem 1: Re-solving with ROLLUP

Recall Problem 1:

  • Revenue by city
  • Revenue by region
  • Revenue by country
  • A grand total
  • One result set

This hierarchy exactly matches our geography structure.

ROLLUP Solution

SELECT
    country_name,
    region_name,
    city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY ROLLUP (
    country_name,
    region_name,
    city_name
)
ORDER BY
    country_name,
    region_name,
    city_name;

Output Table

country_name region_name city_name total_revenue
Armenia Lori Vanadzor 272.76
Armenia Lori NULL 272.76
Armenia Shirak Gyumri 530.66
Armenia Shirak NULL 530.66
Armenia Yerevan Yerevan 1645.17
Armenia Yerevan NULL 1645.17
Armenia NULL NULL 2448.59
Georgia Adjara Batumi 525.73
Georgia Adjara NULL 525.73
Georgia Tbilisi Tbilisi 461.67
Georgia Tbilisi NULL 461.67
Georgia NULL NULL 987.40
NULL NULL NULL 3435.99

What ROLLUP Produces

Conceptually, the query generates the following aggregation levels:

  • (country, region, city) → city-level totals
  • (country, region, NULL) → region subtotals
  • (country, NULL, NULL) → country subtotals
  • (NULL, NULL, NULL) → grand total

All results are produced in one scan of the data.

Comparing UNION vs ROLLUP

Aspect GROUP BY + UNION ALL ROLLUP
Number of queries Many One
Logic duplication High None
Hierarchy clarity Implicit Explicit
Maintenance Difficult Easy
Scalability Poor Good

Interpreting NULLs in ROLLUP Results

In ROLLUP output:

  • NULL does not mean missing data
  • NULL means the column was aggregated away

This distinction becomes important when labeling subtotal rows, which will be addressed using GROUPING() later in this session.


Limitation of ROLLUP

While powerful, ROLLUP has constraints:

  • Only supports one fixed hierarchy
  • Cannot skip hierarchy levels
  • Cannot express unrelated aggregation combinations

CUBE

CUBE is an advanced aggregation operator that generates all possible combinations of the specified grouping columns.

Unlike ROLLUP, which follows a single hierarchy, CUBE performs cross-dimensional aggregation.

CUBE computes aggregates for every subset of the grouping columns.

For two dimensions:

\[ CUBE(A, B) \Rightarrow (A,B), (A), (B), () \]

For three dimensions:

\[ CUBE(A, B, C) \Rightarrow 2^3 = 8 \text{ grouping levels} \]

  • No hierarchy is assumed
  • Every combination is treated equally
  • Results grow exponentially with the number of columns

When CUBE Makes Sense

CUBE is appropriate when:

  • Dimensions are independent, not hierarchical
  • The goal is exploration, not structured reporting
  • Analysts want to examine all cross-sections

Typical use cases:

  • Geography × Product
  • Channel × Product
  • Segment × Campaign

From Hierarchy to Cross-Dimensions

In ROLLUP, geography formed a strict hierarchy: Country → Region → City

With CUBE, we now explore independent dimensions, such as:

  • Country
  • Product category

Problem 2: Re-solving a New Analytical Question

Business question

  • Revenue by country and category
  • Revenue by country only
  • Revenue by category only
  • Grand total

All results must appear in one result set.


CUBE Solution

SELECT
    country_name,
    category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY CUBE (
    country_name,
    category
)
ORDER BY
    country_name,
    category;

What CUBE Produces

The query generates the following aggregation levels:

  • (country, category) → detailed cross-sections
  • (country, NULL) → country totals
  • (NULL, category) → category totals
  • (NULL, NULL) → grand total

Each level is produced automatically.

Comparing ROLLUP and CUBE

Aspect ROLLUP CUBE
Assumes hierarchy Yes No
Aggregation pattern Linear Combinatorial
Result size Moderate Large
Use case Reporting Exploration

Practical Limitation of CUBE

While powerful, CUBE has important drawbacks:

  • Result size grows exponentially
  • Many aggregation levels may be unnecessary
  • Often unsuitable for dashboards and production queries

These limitations lead to the most flexible operator.

GROUPING SETS

GROUPING SETS is the most flexible and precise advanced aggregation operator in SQL.

It allows you to explicitly define which aggregation levels you want, without generating unnecessary subtotals.

GROUPING SETS lets you specify a list of grouping combinations to compute in a single query.

Formally:

\[ GROUPING\ SETS((A,B),(A),(B),()) \]

Each tuple represents one GROUP BY clause.

  • No hierarchy is assumed
  • No automatic combinations are added
  • Only requested aggregation levels are produced

Why GROUPING SETS Exists

GROUPING SETS generalizes both:

  • ROLLUP
  • CUBE

\[\downarrow\]

  • ROLLUP is a special case of GROUPING SETS
  • CUBE is a special case of GROUPING SETS

GROUPING SETS is the endgame of SQL aggregation.


Re-solving Problem 2 with GROUPING SETS

Recall Problem 2:

  • City-level revenue
  • Region-level revenue
  • Product category totals
  • Grand total

This problem cannot be expressed cleanly with ROLLUP or CUBE.

GROUPING SETS Solution

SELECT
    country_name,
    region_name,
    city_name,
    category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY GROUPING SETS (
    (country_name, region_name, city_name, category),
    (country_name, region_name),
    (category),
    ()
)
ORDER BY
    country_name,
    region_name,
    city_name,
    category;

Output Table

country_name region_name city_name category total_revenue
Georgia Adjara Batumi NULL 525.73
Georgia Tbilisi Tbilisi Cloud 20.93
Georgia Tbilisi Tbilisi Entertainment 23.97
Georgia Tbilisi Tbilisi Internet 284.89
Georgia Tbilisi Tbilisi Mobile 59.94
Georgia Tbilisi Tbilisi TV 71.94
Georgia Tbilisi NULL NULL 461.67
NULL NULL NULL Cloud 107.64
NULL NULL NULL Entertainment 119.85
NULL NULL NULL Internet 1334.48
NULL NULL NULL Mobile 1004.48
NULL NULL NULL TV 779.54
NULL NULL NULL NULL 3435.99

What GROUPING SETS Produces

The query returns exactly four aggregation levels:

  • (country, region, city, category) → detailed level
  • (country, region) → region subtotals
  • (category) → category totals
  • () → grand total

No additional combinations are generated.


GROUPING SETS vs UNION ALL

Aspect UNION ALL GROUPING SETS
Number of queries Many One
Intent clarity Low High
Maintenance Difficult Easy
Performance Poor Better
Error risk High Low

GROUPING SETS is a declarative replacement for complex UNION ALL logic.

Handling NULLs with GROUPING()

When using advanced aggregations, NULL values often represent aggregated dimensions, not missing data.

The GROUPING() function identifies such cases.

SELECT
    country_name,
    region_name,
    city_name,
    category,
    GROUPING(city_name) AS g_city,
    GROUPING(category) AS g_category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY GROUPING SETS (
    (country_name, region_name, city_name, category),
    (country_name, region_name),
    (category),
    ()
);
  • GROUPING(col) = 1 → column aggregated away
  • GROUPING(col) = 0 → actual value
country_name region_name city_name category g_city g_category total_revenue
Armenia Lori Vanadzor Entertainment 0 0 71.91
Georgia Adjara Batumi Cloud 0 0 20.93
Armenia Shirak Gyumri Entertainment 0 0 23.97
NULL NULL NULL Cloud 1 0 107.64
NULL NULL NULL TV 1 0 779.54
NULL NULL NULL Entertainment 1 0 119.85
NULL NULL NULL Internet 1 0 1334.48
NULL NULL NULL Mobile 1 0 1004.48
Georgia Tbilisi NULL NULL 1 1 461.67
Armenia Shirak NULL NULL 1 1 530.66
Armenia Lori NULL NULL 1 1 272.76
Armenia Yerevan NULL NULL 1 1 1645.17
Georgia Adjara NULL NULL 1 1 525.73

When to Use GROUPING SETS

  • When reporting requirements are specific and selective
  • When building dashboards or production reports
  • When replacing complex UNION ALL queries
Tip

If you know exactly what you want, then use GROUPING SETS.


ROLLUP vs CUBE vs GROUPING SET

Operator Best For Risk
ROLLUP Hierarchies Rigid
CUBE Exploration Explosion
GROUPING SETS Production None
Important

Advanced aggregations exist to replace procedural SQL patterns with declarative intent.

  • ROLLUP expresses hierarchy
  • CUBE explores dimensions
  • GROUPING SETS delivers precision

This concludes the SQL aggregation journey.