Advanced Aggregations

UNION,ROLLUP, CUBE,GROUPING SETS

Karen Hovhannisyan

2026-04-01

Goal

  • Understand the limitations of GROUP BY
  • Solve a real analytical problem using UNION ALL
  • Replace the solution with ROLLUP and GROUPING SETS
  • Compare clarity, correctness, and scalability

UNION

What is UNION?

UNION is a set operator that combines the results of multiple SELECT statements into a single result set.

  • Result sets are stacked vertically
  • Each SELECT must return the same number of columns
  • Corresponding columns must have compatible data types

Types of UNION

  • UNION
  • UNION ALL
Operator Behavior
UNION Removes duplicate rows
UNION ALL Preserves all rows

UNION vs UNION ALL

Two independent result sets:

  • Result Set A
  • Result Set B

UNION applies deduplication
UNION ALL performs pure concatenation

Example Tables

Table A

id value
1 A
2 B
3 C

Table B

id value
3 C
4 D
5 E

UNION Example

SELECT id, value FROM table_a
UNION
SELECT id, value FROM table_b;

\[\Downarrow\]

id value
1 A
2 B
3 C
4 D
5 E

UNION ALL Example

SELECT id, value FROM table_a
UNION ALL
SELECT id, value FROM table_b;

\[\Downarrow\]

id value
1 A
2 B
3 C
3 C
4 D
5 E

Why UNION ALL is Preferred in Analytics

  • No implicit deduplication
  • Predictable row counts
  • Better performance
  • Correct for aggregation logic

Analytical Context

We now move from toy examples to a real analytical problem.

  • Hierarchical geography
  • Transactional revenue
  • Multiple aggregation levels
  • One result set required

Problem 1

We are asked to provide:

  • Revenue by city
  • Revenue by region
  • Revenue by country
  • A grand total

All results must appear in one result set.

Analytical Base

All queries rely on the following analytical layer.

CREATE MATERIALIZED VIEW analytics.mv_order_revenue AS
SELECT
    c.country_name,
    r.region_name,
    ci.city_name,
    p.category,
    oi.quantity * p.price AS revenue
FROM analytics.order_items oi
JOIN analytics.orders o
    ON oi.order_id = o.order_id
JOIN analytics.products p
    ON oi.product_id = p.product_id
JOIN analytics.customers cu
    ON o.customer_id = cu.customer_id
JOIN analytics.cities ci
    ON cu.city_id = ci.city_id
JOIN analytics.regions r
    ON ci.region_id = r.region_id
JOIN analytics.countries c
    ON r.country_id = c.country_id;

Problem 1: UNION-based Solution

SELECT
    country_name,
    region_name,
    city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY country_name, region_name, city_name

UNION ALL

SELECT
    country_name,
    region_name,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY country_name, region_name

UNION ALL

SELECT
    country_name,
    NULL,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY country_name

UNION ALL

SELECT
    NULL,
    NULL,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue;

Output 1

country_name region_name city_name total_revenue
Armenia Lori Vanadzor 272.76
Armenia Shirak Gyumri 530.66
Armenia Yerevan Yerevan 1645.10
Georgia Adjara Batumi 525.73
Georgia Tbilisi Tbilisi 461.67
Georgia Tbilisi NULL 461.67
Armenia Shirak NULL 530.66
Armenia Lori NULL 272.76
Armenia Yerevan NULL 1645.10
Georgia Adjara NULL 525.73
Armenia NULL NULL 2448.59
Georgia NULL NULL 987.40
NULL NULL NULL 3435.99

Problem 2

Let’s expand the problem with the new New requirement

  • City-level revenue
  • Region-level revenue
  • Product category totals
  • Grand total

Solution 2

SELECT
    country_name,
    region_name,
    city_name,
    category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
    country_name,
    region_name,
    city_name,
    category

UNION ALL

SELECT
    country_name,
    region_name,
    NULL,
    category,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY
    country_name,
    region_name,
    category

UNION ALL

SELECT
    NULL,
    NULL,
    NULL,
    category,
    SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY
    category

UNION ALL

SELECT
    NULL,
    NULL,
    NULL,
    NULL,
    SUM(revenue)
FROM analytics.mv_order_revenue;

Output 2

country_name region_name city_name category total_revenue
Georgia Adjara Batumi Internet 209.93
Georgia Adjara Batumi Cloud 20.93
Georgia Tbilisi Tbilisi TV 71.94
Armenia Yerevan Yerevan Internet 239.92
Armenia Yerevan Yerevan TV 571.69
Armenia Shirak Gyumri Entertainment 23.97
Armenia Lori Vanadzor TV 35.97
Georgia Adjara Batumi Mobile 174.93
Armenia Lori Vanadzor Entertainment 71.91
Armenia Shirak Gyumri Cloud 41.86
Georgia Tbilisi Tbilisi Internet 284.89
Armenia Shirak Gyumri Internet 464.83
Georgia Tbilisi Tbilisi Mobile 59.94
Armenia Lori Vanadzor Mobile 39.97
Georgia Adjara NULL Internet 209.93
Georgia Adjara NULL Cloud 20.93
Georgia Adjara NULL Mobile 174.93
Georgia Tbilisi NULL TV 71.94
Georgia Tbilisi NULL Internet 284.89

Problems with UNION-based Aggregation

  • Repeated full-table scans
  • High maintenance cost
  • Error-prone NULL handling
  • Poor scalability as requirements grow

ROLLUP

What is ROLLUP?

ROLLUP is an advanced aggregation operator that extends GROUP BY to produce hierarchical subtotals automatically.

ROLLUP Semantics

For a list of grouping columns:

\[ ROLLUP(A, B, C) \]

SQL generates the following grouping levels:

\[ (A,B,C), (A,B), (A), () \]

  • Aggregation proceeds from right to left
  • Each step removes one level of detail
  • The final row represents the grand total

When ROLLUP Is Appropriate

Typical hierarchies:

  • Geography: Country → Region → City
  • Time: Year → Quarter → Month

Revisiting Problem 1

We previously solved this with UNION ALL:

  • Revenue by city
  • Revenue by region
  • Revenue by country
  • Grand total

All results in one result set.

ROLLUP Solution

SELECT
    country_name,
    region_name,
    city_name,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY ROLLUP (
    country_name,
    region_name,
    city_name
)
ORDER BY
    country_name,
    region_name,
    city_name;

Output 1 with ROLLUP

country_name region_name city_name total_revenue
Armenia Lori Vanadzor 272.76
Armenia Lori NULL 272.76
Armenia Shirak Gyumri 530.66
Armenia Shirak NULL 530.66
Armenia Yerevan Yerevan 1645.17
Armenia Yerevan NULL 1645.17
Armenia NULL NULL 2448.59
Georgia Adjara Batumi 525.73
Georgia Adjara NULL 525.73
Georgia Tbilisi Tbilisi 461.67
Georgia Tbilisi NULL 461.67
Georgia NULL NULL 987.40
NULL NULL NULL 3435.99

What ROLLUP Produces

Conceptually, the query returns:

  • (country, region, city) → city-level totals
  • (country, region, NULL) → region subtotals
  • (country, NULL, NULL) → country subtotals
  • (NULL, NULL, NULL) → grand total

All aggregation levels are computed in one query.

UNION vs ROLLUP

Aspect GROUP BY + UNION ALL ROLLUP
Number of queries Many One
Hierarchy expression Implicit Explicit
Logic duplication High None
Maintainability Low High

Interpreting NULL Values

In ROLLUP results:

  • NULL does not mean missing data
  • NULL means the column was aggregated away

Limitation of ROLLUP

  • Only supports one strict hierarchy
  • Cannot skip hierarchy levels
  • Cannot express independent dimensions

CUBE

What is CUBE?

CUBE is an advanced aggregation operator that generates all possible combinations of the specified grouping columns.

CUBE Semantics

For a list of grouping columns:

\[ CUBE(A, B) \]

SQL generates the following grouping levels:

\[ (A,B), (A), (B), () \]

  • Every dimension is treated as independent
  • All combinations are generated automatically
  • The result size grows exponentially

When CUBE Is Appropriate

CUBE is best suited for exploratory analysis.

Typical use cases:

  • Geography × Product
  • Customer Segment × Campaign
  • Channel × Product Category

There is no natural hierarchy between dimensions.

Revisiting a New Analytical Question

We now want to analyze:

  • Revenue by country and category
  • Revenue by country only
  • Revenue by category only
  • Grand total

All results must appear in one result set.

CUBE Solution

SELECT
    country_name,
    category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY CUBE (
    country_name,
    category
)
ORDER BY
    country_name,
    category;

country_name category total_revenue
Armenia NULL 2448.59
Georgia Cloud 41.86
Georgia Entertainment 23.97
Georgia Internet 494.82
Georgia Mobile 234.87
Georgia TV 191.88
Georgia NULL 987.40
NULL Cloud 107.64
NULL Entertainment 119.85
NULL Internet 1334.48
NULL Mobile 1004.48
NULL TV 779.54
NULL NULL 3435.99

What CUBE Produces

Conceptually, the query returns:

  • (country, category) → detailed cross-sections
  • (country, NULL) → country totals
  • (NULL, category) → category totals
  • (NULL, NULL) → grand total

All combinations are computed in one query.

ROLLUP vs CUBE

Aspect ROLLUP CUBE
Assumes hierarchy Yes No
Aggregation pattern Linear Combinatorial
Result size Controlled Potentially large
Typical use Reporting Exploration

Limitation of CUBE

  • Result size grows very quickly
  • Many combinations may be unnecessary
  • Often unsuitable for dashboards and production reports

These limitations motivate the final operator.

GROUPING SETS

GROUPING SETS

What is GROUPING SETS?

GROUPING SETS is the most flexible advanced aggregation operator in SQL.

It allows you to explicitly define which aggregation levels should be computed in a single query.

Unlike ROLLUP and CUBE, nothing is implicit.

GROUPING SETS Semantics

General form:

\[ GROUPING\ SETS((A,B),(A),(B),()) \]

Each tuple represents one independent GROUP BY clause.

  • No hierarchy is assumed
  • No extra combinations are added
  • Only requested aggregation levels are produced

Relationship to ROLLUP and CUBE

GROUPING SETS generalizes both operators.

  • ROLLUP is a special case of GROUPING SETS
  • CUBE is a special case of GROUPING SETS

Conceptually:

  • ROLLUP = predefined hierarchical grouping sets
  • CUBE = all possible grouping sets
  • GROUPING SETS = exactly what you ask for

Revisiting Problem 2

Recall the requirement:

  • City-level revenue
  • Region-level revenue
  • Product category totals
  • Grand total

This requirement cannot be expressed cleanly with ROLLUP or CUBE.

GROUPING SETS Solution

SELECT
    country_name,
    region_name,
    city_name,
    category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY GROUPING SETS (
    (country_name, region_name, city_name, category),
    (country_name, region_name),
    (category),
    ()
)
ORDER BY
    country_name,
    region_name,
    city_name,
    category;

Output

country_name region_name city_name category total_revenue
Georgia Adjara Batumi NULL 525.73
Georgia Tbilisi Tbilisi Cloud 20.93
Georgia Tbilisi Tbilisi Entertainment 23.97
Georgia Tbilisi Tbilisi Internet 284.89
Georgia Tbilisi Tbilisi Mobile 59.94
Georgia Tbilisi Tbilisi TV 71.94
Georgia Tbilisi NULL NULL 461.67
NULL NULL NULL Cloud 107.64
NULL NULL NULL Entertainment 119.85
NULL NULL NULL Internet 1334.48
NULL NULL NULL Mobile 1004.48
NULL NULL NULL TV 779.54
NULL NULL NULL NULL 3435.99

What GROUPING SETS Produces

Exactly four aggregation levels:

  • (country, region, city, category) → detailed level
  • (country, region) → region subtotals
  • (category) → category totals
  • () → grand total

No additional combinations are generated.

UNION ALL vs GROUPING SETS

Aspect UNION ALL GROUPING SETS
Number of queries Many One
Intent clarity Low High
Maintenance Difficult Easy
Error risk High Low
Scalability Poor High

Handling NULLs

In advanced aggregations, NULL often means aggregated away, not missing.

The GROUPING() function identifies such cases.

SELECT
    country_name,
    region_name,
    city_name,
    category,
    GROUPING(city_name) AS g_city,
    GROUPING(category) AS g_category,
    SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY GROUPING SETS (
    (country_name, region_name, city_name, category),
    (country_name, region_name),
    (category),
    ()
);

Output with GROUPING Indicators

  • GROUPING(col) = 1 → column aggregated away
  • GROUPING(col) = 0 → actual value
country_name region_name city_name category g_city g_category total_revenue
Armenia Lori Vanadzor Entertainment 0 0 71.91
Georgia Adjara Batumi Cloud 0 0 20.93
Armenia Shirak Gyumri Entertainment 0 0 23.97
NULL NULL NULL Cloud 1 0 107.64
NULL NULL NULL TV 1 0 779.54
NULL NULL NULL Entertainment 1 0 119.85
NULL NULL NULL Internet 1 0 1334.48
NULL NULL NULL Mobile 1 0 1004.48
Georgia Tbilisi NULL NULL 1 1 461.67
Armenia Shirak NULL NULL 1 1 530.66
Armenia Lori NULL NULL 1 1 272.76
Armenia Yerevan NULL NULL 1 1 1645.17
Georgia Adjara NULL NULL 1 1 525.73

When GROUPING SETS Is the Right Choice

  • Reporting requirements are explicit and selective
  • Building dashboards and production reports
  • Replacing complex UNION ALL logic

If you know exactly what you want, use GROUPING SETS.

Final Comparison

Operator Best Use Case Main Risk
ROLLUP Hierarchies Rigid
CUBE Exploration Explosion
GROUPING SETS Production None

Final Takeaway

Advanced aggregations replace procedural SQL with declarative intent.

  • ROLLUP expresses hierarchy
  • CUBE explores dimensions
  • GROUPING SETS delivers precision