Session 12: DA with SQL | Advanced Aggregations
Session Goals
This final session demonstrates why advanced aggregations exist by first solving an analytical problem using classic GROUP BY + UNION ALL, and then solving the same problem using advanced aggregation operators.
- Understand the limitations of
GROUP BY
- Solve a real analytical problem using
UNION ALL
- Replace the solution with
ROLLUPandGROUPING SETS
- Compare clarity, correctness, and scalability
Analytical Base
In real analytical systems, aggregations are rarely performed on a single flat table.
Instead, they are computed from joined fact and dimension tables.
In this session, we use the following analytical structure:
- Geography hierarchy: Country → Region → City
- Customers tied to cities
- Orders and order items representing transactions
- Products providing category and price information
The analytical grain is order item.
All examples use the following analytical materialized view.
CREATE MATERIALIZED VIEW analytics.mv_order_revenue AS
SELECT
c.country_name,
r.region_name,
ci.city_name,
p.category,
oi.quantity * p.price AS revenue
FROM analytics.order_items oi
JOIN analytics.orders o
ON oi.order_id = o.order_id
JOIN analytics.products p
ON oi.product_id = p.product_id
JOIN analytics.customers cu
ON o.customer_id = cu.customer_id
JOIN analytics.cities ci
ON cu.city_id = ci.city_id
JOIN analytics.regions r
ON ci.region_id = r.region_id
JOIN analytics.countries c
ON r.country_id = c.country_id;UNION
UNION is a set operator used to combine the results of multiple SELECT statements into a single result set.
- Result sets are stacked vertically
- Each
SELECTmust return the same number of columns
- Corresponding columns must have compatible data types
Types of UNION:
UNION
UNION ALL
| Operator | Description |
|---|---|
| UNION | Combines results and removes duplicates |
| UNION ALL | Combines results and keeps all rows |
Suppose we have the following tables (A and B)
Table A
| id | value |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
Table B
| id | value |
|---|---|
| 3 | C |
| 4 | D |
| 5 | E |
UNION
SELECT id, value FROM table_a
UNION
SELECT id, value FROM table_b;\[\Downarrow\]
| id | value |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
UNION ALL
SELECT id, value FROM table_a
UNION ALL
SELECT id, value FROM table_b;\[\Downarrow\]
| id | value |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |
| 3 | C |
| 4 | D |
| 5 | E |
For analytical queries, UNION ALL is preferred due to performance and correctness.
Problem 1
We are assigend to provide:
- Revenue by city
- Revenue by region
- Revenue by country
- A grand total
All results must appear in one result set.
Solution 1
The traditional way to solve the problem without advanced aggregations.
SELECT
country_name,
region_name,
city_name,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
country_name,
region_name,
city_name
UNION ALL
SELECT
country_name,
region_name,
NULL AS city_name,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
country_name,
region_name
UNION ALL
SELECT
country_name,
NULL AS region_name,
NULL AS city_name,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
country_name
UNION ALL
SELECT
NULL AS country_name,
NULL AS region_name,
NULL AS city_name,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue;Output Table
| country_name | region_name | city_name | total_revenue |
|---|---|---|---|
| Armenia | Lori | Vanadzor | 272.76 |
| Armenia | Shirak | Gyumri | 530.66 |
| Armenia | Yerevan | Yerevan | 1645.10 |
| Georgia | Adjara | Batumi | 525.73 |
| Georgia | Tbilisi | Tbilisi | 461.67 |
| Georgia | Tbilisi | NULL | 461.67 |
| Armenia | Shirak | NULL | 530.66 |
| Armenia | Lori | NULL | 272.76 |
| Armenia | Yerevan | NULL | 1645.10 |
| Georgia | Adjara | NULL | 525.73 |
| Armenia | NULL | NULL | 2448.59 |
| Georgia | NULL | NULL | 987.40 |
| NULL | NULL | NULL | 3435.99 |
Problems with UNION Based Solution
- Multiple queries scan the same data repeatedly
- High risk of inconsistency when logic changes
- Hard to maintain and extend
- Difficult to interpret NULL values
Problem 2
Let’s expand the problem with the new New requirement
- City-level revenue
- Region-level revenue
- Product category totals
- Grand total
Solution 2
SELECT
country_name,
region_name,
city_name,
category,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY
country_name,
region_name,
city_name,
category
UNION ALL
SELECT
country_name,
region_name,
NULL,
category,
SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY
country_name,
region_name,
category
UNION ALL
SELECT
NULL,
NULL,
NULL,
category,
SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY
category
UNION ALL
SELECT
NULL,
NULL,
NULL,
NULL,
SUM(revenue)
FROM analytics.mv_order_revenue;Output Table
| country_name | region_name | city_name | category | total_revenue |
|---|---|---|---|---|
| Georgia | Adjara | Batumi | Internet | 209.93 |
| Georgia | Adjara | Batumi | Cloud | 20.93 |
| Georgia | Tbilisi | Tbilisi | TV | 71.94 |
| Armenia | Yerevan | Yerevan | Internet | 239.92 |
| Armenia | Yerevan | Yerevan | TV | 571.69 |
| Armenia | Shirak | Gyumri | Entertainment | 23.97 |
| Armenia | Lori | Vanadzor | TV | 35.97 |
| Georgia | Adjara | Batumi | Mobile | 174.93 |
| Armenia | Lori | Vanadzor | Entertainment | 71.91 |
| Armenia | Shirak | Gyumri | Cloud | 41.86 |
| Georgia | Tbilisi | Tbilisi | Internet | 284.89 |
| Armenia | Shirak | Gyumri | Internet | 464.83 |
| Georgia | Tbilisi | Tbilisi | Mobile | 59.94 |
| Armenia | Lori | Vanadzor | Mobile | 39.97 |
| Georgia | Adjara | NULL | Internet | 209.93 |
| Georgia | Adjara | NULL | Cloud | 20.93 |
| Georgia | Adjara | NULL | Mobile | 174.93 |
| Georgia | Tbilisi | NULL | TV | 71.94 |
| Georgia | Tbilisi | NULL | Internet | 284.89 |
| Georgia | Tbilisi | NULL | Mobile | 59.94 |
| Armenia | Yerevan | NULL | Internet | 239.92 |
| Armenia | Yerevan | NULL | TV | 571.69 |
| Armenia | Shirak | NULL | Entertainment | 23.97 |
| Armenia | Shirak | NULL | Cloud | 41.86 |
| Armenia | Shirak | NULL | Internet | 464.83 |
| Armenia | Lori | NULL | TV | 35.97 |
| Armenia | Lori | NULL | Entertainment | 71.91 |
| Armenia | Lori | NULL | Mobile | 39.97 |
| NULL | NULL | NULL | Internet | 1199.57 |
| NULL | NULL | NULL | TV | 679.60 |
| NULL | NULL | NULL | Mobile | 274.84 |
| NULL | NULL | NULL | Cloud | 62.79 |
| NULL | NULL | NULL | Entertainment | 95.88 |
| NULL | NULL | NULL | NULL | 2312.68 |
Try Yourself
Create new Column which will handle all the combinations:
- Country-level revenue
- Region-level revenue
- City-level revenue
- Grand total
SELECT
country_name,
region_name,
city_name,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY country_name, region_name, city_name
UNION ALL
SELECT
country_name,
region_name,
NULL,
SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY country_name, region_name
UNION ALL
SELECT
country_name,
NULL,
NULL,
SUM(revenue)
FROM analytics.mv_order_revenue
GROUP BY country_name
UNION ALL
SELECT
NULL,
NULL,
NULL,
SUM(revenue)
FROM analytics.mv_order_revenue;Beyond Simple GROUP BY
Traditional GROUP BY generates one aggregation level at a time.
- Multiple aggregation levels require multiple queries
- UNION-based solutions are verbose and error-prone
- Analytical reporting often needs subtotals and totals together
Advanced aggregations solve this by computing multiple grouping levels in a single query.
ROLLUP
ROLLUP is an advanced aggregation operator designed to generate hierarchical subtotals automatically.
Instead of manually writing multiple GROUP BY queries and stitching them together with UNION ALL, ROLLUP expresses the hierarchy directly in SQL.
ROLLUP extends GROUP BY by computing progressively higher-level aggregates along a defined column order.
\[ ROLLUP(A, B, C) \Rightarrow (A,B,C), (A,B), (A), () \]
- Columns are aggregated from right to left
- Each step removes one level of detail
- A final row represents the grand total
When ROLLUP Makes Sense
ROLLUP is best suited when:
- Dimensions form a natural hierarchy
- Reporting requires subtotals at each level
- The hierarchy order is fixed and meaningful
Typical examples include:
- Geography: Country → Region → City
- Time: Year → Quarter → Month
Problem 1: Re-solving with ROLLUP
Recall Problem 1:
- Revenue by city
- Revenue by region
- Revenue by country
- A grand total
- One result set
This hierarchy exactly matches our geography structure.
ROLLUP Solution
SELECT
country_name,
region_name,
city_name,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY ROLLUP (
country_name,
region_name,
city_name
)
ORDER BY
country_name,
region_name,
city_name;Output Table
| country_name | region_name | city_name | total_revenue |
|---|---|---|---|
| Armenia | Lori | Vanadzor | 272.76 |
| Armenia | Lori | NULL | 272.76 |
| Armenia | Shirak | Gyumri | 530.66 |
| Armenia | Shirak | NULL | 530.66 |
| Armenia | Yerevan | Yerevan | 1645.17 |
| Armenia | Yerevan | NULL | 1645.17 |
| Armenia | NULL | NULL | 2448.59 |
| Georgia | Adjara | Batumi | 525.73 |
| Georgia | Adjara | NULL | 525.73 |
| Georgia | Tbilisi | Tbilisi | 461.67 |
| Georgia | Tbilisi | NULL | 461.67 |
| Georgia | NULL | NULL | 987.40 |
| NULL | NULL | NULL | 3435.99 |
What ROLLUP Produces
Conceptually, the query generates the following aggregation levels:
(country, region, city)→ city-level totals
(country, region, NULL)→ region subtotals
(country, NULL, NULL)→ country subtotals
(NULL, NULL, NULL)→ grand total
All results are produced in one scan of the data.
Comparing UNION vs ROLLUP
| Aspect | GROUP BY + UNION ALL | ROLLUP |
|---|---|---|
| Number of queries | Many | One |
| Logic duplication | High | None |
| Hierarchy clarity | Implicit | Explicit |
| Maintenance | Difficult | Easy |
| Scalability | Poor | Good |
Interpreting NULLs in ROLLUP Results
In ROLLUP output:
NULLdoes not mean missing data
NULLmeans the column was aggregated away
This distinction becomes important when labeling subtotal rows, which will be addressed using GROUPING() later in this session.
Limitation of ROLLUP
While powerful, ROLLUP has constraints:
- Only supports one fixed hierarchy
- Cannot skip hierarchy levels
- Cannot express unrelated aggregation combinations
CUBE
CUBE is an advanced aggregation operator that generates all possible combinations of the specified grouping columns.
Unlike ROLLUP, which follows a single hierarchy, CUBE performs cross-dimensional aggregation.
CUBE computes aggregates for every subset of the grouping columns.
For two dimensions:
\[ CUBE(A, B) \Rightarrow (A,B), (A), (B), () \]
For three dimensions:
\[ CUBE(A, B, C) \Rightarrow 2^3 = 8 \text{ grouping levels} \]
- No hierarchy is assumed
- Every combination is treated equally
- Results grow exponentially with the number of columns
When CUBE Makes Sense
CUBE is appropriate when:
- Dimensions are independent, not hierarchical
- The goal is exploration, not structured reporting
- Analysts want to examine all cross-sections
Typical use cases:
- Geography × Product
- Channel × Product
- Segment × Campaign
From Hierarchy to Cross-Dimensions
In ROLLUP, geography formed a strict hierarchy: Country → Region → City
With CUBE, we now explore independent dimensions, such as:
- Country
- Product category
Problem 2: Re-solving a New Analytical Question
Business question
- Revenue by country and category
- Revenue by country only
- Revenue by category only
- Grand total
All results must appear in one result set.
CUBE Solution
SELECT
country_name,
category,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY CUBE (
country_name,
category
)
ORDER BY
country_name,
category;What CUBE Produces
The query generates the following aggregation levels:
(country, category)→ detailed cross-sections
(country, NULL)→ country totals
(NULL, category)→ category totals
(NULL, NULL)→ grand total
Each level is produced automatically.
Comparing ROLLUP and CUBE
| Aspect | ROLLUP | CUBE |
|---|---|---|
| Assumes hierarchy | Yes | No |
| Aggregation pattern | Linear | Combinatorial |
| Result size | Moderate | Large |
| Use case | Reporting | Exploration |
Practical Limitation of CUBE
While powerful, CUBE has important drawbacks:
- Result size grows exponentially
- Many aggregation levels may be unnecessary
- Often unsuitable for dashboards and production queries
These limitations lead to the most flexible operator.
GROUPING SETS
GROUPING SETS is the most flexible and precise advanced aggregation operator in SQL.
It allows you to explicitly define which aggregation levels you want, without generating unnecessary subtotals.
GROUPING SETS lets you specify a list of grouping combinations to compute in a single query.
Formally:
\[ GROUPING\ SETS((A,B),(A),(B),()) \]
Each tuple represents one GROUP BY clause.
- No hierarchy is assumed
- No automatic combinations are added
- Only requested aggregation levels are produced
Why GROUPING SETS Exists
GROUPING SETS generalizes both:
ROLLUP
CUBE
\[\downarrow\]
ROLLUPis a special case ofGROUPING SETS
CUBEis a special case ofGROUPING SETS
GROUPING SETS is the endgame of SQL aggregation.
Re-solving Problem 2 with GROUPING SETS
Recall Problem 2:
- City-level revenue
- Region-level revenue
- Product category totals
- Grand total
This problem cannot be expressed cleanly with ROLLUP or CUBE.
GROUPING SETS Solution
SELECT
country_name,
region_name,
city_name,
category,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY GROUPING SETS (
(country_name, region_name, city_name, category),
(country_name, region_name),
(category),
()
)
ORDER BY
country_name,
region_name,
city_name,
category;Output Table
| country_name | region_name | city_name | category | total_revenue |
|---|---|---|---|---|
| Georgia | Adjara | Batumi | NULL | 525.73 |
| Georgia | Tbilisi | Tbilisi | Cloud | 20.93 |
| Georgia | Tbilisi | Tbilisi | Entertainment | 23.97 |
| Georgia | Tbilisi | Tbilisi | Internet | 284.89 |
| Georgia | Tbilisi | Tbilisi | Mobile | 59.94 |
| Georgia | Tbilisi | Tbilisi | TV | 71.94 |
| Georgia | Tbilisi | NULL | NULL | 461.67 |
| NULL | NULL | NULL | Cloud | 107.64 |
| NULL | NULL | NULL | Entertainment | 119.85 |
| NULL | NULL | NULL | Internet | 1334.48 |
| NULL | NULL | NULL | Mobile | 1004.48 |
| NULL | NULL | NULL | TV | 779.54 |
| NULL | NULL | NULL | NULL | 3435.99 |
What GROUPING SETS Produces
The query returns exactly four aggregation levels:
(country, region, city, category)→ detailed level
(country, region)→ region subtotals
(category)→ category totals
()→ grand total
No additional combinations are generated.
GROUPING SETS vs UNION ALL
| Aspect | UNION ALL | GROUPING SETS |
|---|---|---|
| Number of queries | Many | One |
| Intent clarity | Low | High |
| Maintenance | Difficult | Easy |
| Performance | Poor | Better |
| Error risk | High | Low |
GROUPING SETS is a declarative replacement for complex UNION ALL logic.
Handling NULLs with GROUPING()
When using advanced aggregations, NULL values often represent aggregated dimensions, not missing data.
The GROUPING() function identifies such cases.
SELECT
country_name,
region_name,
city_name,
category,
GROUPING(city_name) AS g_city,
GROUPING(category) AS g_category,
SUM(revenue) AS total_revenue
FROM analytics.mv_order_revenue
GROUP BY GROUPING SETS (
(country_name, region_name, city_name, category),
(country_name, region_name),
(category),
()
);GROUPING(col) = 1→ column aggregated away
GROUPING(col) = 0→ actual value
| country_name | region_name | city_name | category | g_city | g_category | total_revenue |
|---|---|---|---|---|---|---|
| Armenia | Lori | Vanadzor | Entertainment | 0 | 0 | 71.91 |
| Georgia | Adjara | Batumi | Cloud | 0 | 0 | 20.93 |
| Armenia | Shirak | Gyumri | Entertainment | 0 | 0 | 23.97 |
| NULL | NULL | NULL | Cloud | 1 | 0 | 107.64 |
| NULL | NULL | NULL | TV | 1 | 0 | 779.54 |
| NULL | NULL | NULL | Entertainment | 1 | 0 | 119.85 |
| NULL | NULL | NULL | Internet | 1 | 0 | 1334.48 |
| NULL | NULL | NULL | Mobile | 1 | 0 | 1004.48 |
| Georgia | Tbilisi | NULL | NULL | 1 | 1 | 461.67 |
| Armenia | Shirak | NULL | NULL | 1 | 1 | 530.66 |
| Armenia | Lori | NULL | NULL | 1 | 1 | 272.76 |
| Armenia | Yerevan | NULL | NULL | 1 | 1 | 1645.17 |
| Georgia | Adjara | NULL | NULL | 1 | 1 | 525.73 |
When to Use GROUPING SETS
- When reporting requirements are specific and selective
- When building dashboards or production reports
- When replacing complex
UNION ALLqueries
If you know exactly what you want, then use GROUPING SETS.
ROLLUP vs CUBE vs GROUPING SET
| Operator | Best For | Risk |
|---|---|---|
| ROLLUP | Hierarchies | Rigid |
| CUBE | Exploration | Explosion |
| GROUPING SETS | Production | None |
Advanced aggregations exist to replace procedural SQL patterns with declarative intent.
ROLLUPexpresses hierarchy
CUBEexplores dimensions
GROUPING SETSdelivers precision
This concludes the SQL aggregation journey.