flowchart LR
A[Base Tables] --> B[Joins and Aggregations]
B --> C[Temporary Table]
C --> D[Further Analysis]
C --> E[Validation Queries]
C --> F[Final Output]
Temporary Tables, Subqueries, CTEs
2026-04-02
In real analytical work, SQL queries are rarely written once and discarded.
Temporary tables allow us to materialize intermediate results and reuse them cleanly.
A temporary table is a session-scoped database table used to store intermediate query results.
Temporary tables make multi-step analysis explicit and easier to reason about.
flowchart LR
A[Base Tables] --> B[Joins and Aggregations]
B --> C[Temporary Table]
C --> D[Further Analysis]
C --> E[Validation Queries]
C --> F[Final Output]
Temporary tables are created using CREATE TEMP TABLE.
CREATE TEMP TABLE tmp_order_revenue AS
SELECT
o.order_id,
o.customer_id,
o.order_date,
SUM(oi.quantity * p.price) AS order_revenue
FROM analytics.orders o
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
GROUP BY o.order_id, o.customer_id, o.order_date;
tmp_order_revenuebehaves like a normal table.
Before continuing analysis, intermediate results should be validated.
| total_orders | min_revenue | max_revenue | avg_revenue |
|---|---|---|---|
| 120 | 5.98 | 89.97 | 28.63 |
Once created, the temporary table can be reused without recomputing joins.
| customer_id | avg_order_revenue |
|---|---|
| 11 | 11.49 |
| 9 | 10.99 |
| 15 | 11.49 |
| 19 | 11.49 |
| 3 | 11.49 |
| 17 | 10.99 |
This separation clarifies intent:
Temporary tables scale naturally across business hierarchies.
CREATE TEMP TABLE tmp_customer_spend AS
SELECT
c.customer_id,
c.city_id,
SUM(oi.quantity * p.price) AS total_spend
FROM analytics.customers c
JOIN analytics.orders o ON c.customer_id = o.customer_id
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
GROUP BY c.customer_id, c.city_id;| city_id | avg_customer_spend | max_customer_spend |
|---|---|---|
| 3 | 176.8866666666666667 | 230.86 |
| 5 | 262.8650000000000000 | 294.87 |
| 4 | 115.4175000000000000 | 260.85 |
| … | … | … |
Without temporary tables, the same logic must be repeated.
flowchart TB
Q1[Complex Join + Aggregation] --> R1[Result]
Q2[Complex Join + Aggregation] --> R2[Result]
Q3[Complex Join + Aggregation] --> R3[Result]
With a temporary table, the logic is defined once.
flowchart TB
Q[Complex Join + Aggregation] --> T[Temporary Table]
T --> R1[Analysis 1]
T --> R2[Analysis 2]
T --> R3[Analysis 3]
Temporary tables reduce duplication and cognitive load.
A subquery is a query nested inside another SQL query.
Unlike temporary tables, subqueries do not materialize intermediate results as standalone objects.
Instead, they embed logic directly inside a statement and are evaluated as part of query execution.
Conceptually:
Subqueries are most useful when intermediate results are needed only once and do not justify explicit materialization.
Temporary tables externalize intermediate results as named objects. Subqueries take the next step by embedding intermediate logic directly inside a query.
We will study subqueries in the following order:
WHERE)SELECT)FROM)Filter subqueries are used to restrict rows in the outer query based on the result of another query.
Mental model
Keep rows that satisfy a condition defined by another query.
flowchart LR
A[Input Table] --> B[Outer Query]
C[Subquery Result] --> B
B --> D[Filtered Output]
Select employees earning above the average salary.
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 2 | Sales | 900 |
| 3 | IT | 1500 |
| 4 | IT | 1100 |
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 3 | IT | 1500 |
| 4 | IT | 1100 |
Analytics SchemaQ: Which customers have an average order revenue above the overall average order revenue?
| customer_id | avg_customer_revenue |
|---|---|
| 4 | 38.48 |
| 10 | 49.15 |
| 6 | 56.64 |
| … | … |
In case of errors
If you encounter an error, recreate the temporary table tmp_order_revenue before running this query.
Q: Select employees earning above their department’s target salary.
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 2 | Sales | 900 |
| 3 | IT | 1500 |
| department | target_salary |
|---|---|
| Sales | 1000 |
| IT | 1300 |
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 3 | IT | 1500 |
Analytics SchemaWhich customers have total spend above the average spend of all customers?
SELECT
c.customer_id
FROM analytics.customers c
WHERE (
SELECT SUM(oi.quantity * p.price)
FROM analytics.orders o
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
WHERE o.customer_id = c.customer_id
) >
(
SELECT AVG(customer_total)
FROM (
SELECT
o.customer_id,
SUM(oi.quantity * p.price) AS customer_total
FROM analytics.orders o
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
GROUP BY o.customer_id
) t
);This query uses multiple tables inside the filter subquery while still producing a single comparison value.
SELECT)A column subquery returns a single value and is evaluated for each row of the outer query.
SELECT clauseAttach a contextual metric to each row.
This is often where subqueries feel powerful but heavy.
Important
flowchart LR
A[Row from Outer Query] --> B[Scalar Subquery]
B --> C[Computed Value]
A --> D[Enriched Row]
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 2 | Sales | 900 |
| 3 | IT | 1500 |
\[\downarrow\]
| emp_id | salary | avg_salary |
|---|---|---|
| 1 | 1200 | 1200 |
| 2 | 900 | 1200 |
| 3 | 1500 | 1200 |
Q: For each order, show the order revenue and the overall average order revenue.
| order_id | order_revenue | avg_order_revenue |
|---|---|---|
| 101 | 42.50 | 47.18 |
| 102 | 55.00 | 47.18 |
| 103 | 38.00 | 47.18 |
employees
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 2 | Sales | 900 |
| 3 | IT | 1500 |
departments
| department | target_salary |
|---|---|
| Sales | 1000 |
| IT | 1300 |
| emp_id | salary | target_salary |
|---|---|---|
| 1 | 1200 | 1000 |
| 2 | 900 | 1000 |
| 3 | 1500 | 1300 |
This is a correlated scalar subquery.
Business Question: For each customer, show:
SELECT
c.customer_id,
(
SELECT SUM(oi.quantity * p.price)
FROM analytics.orders o
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
WHERE o.customer_id = c.customer_id
) AS customer_spend,
(
SELECT AVG(oi.quantity * p.price)
FROM analytics.orders o
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
JOIN analytics.customers c2 ON o.customer_id = c2.customer_id
WHERE c2.city_id = c.city_id
) AS avg_city_spend
FROM analytics.customers c;| customer_id | customer_spend | avg_city_spend |
|---|---|---|
| 4 | 182.50 | 154.30 |
| 6 | 240.00 | 198.75 |
| 10 | 165.40 | 198.75 |
Warning
A column subquery must return exactly one value per outer row.
FROM)A derived table is a subquery used inside the FROM clause.
Create an inline dataset, then query it.
Derived tables sit conceptually between column subqueries and CTEs.
flowchart LR
A[Base Tables] --> B[Subquery in FROM]
B --> C[Derived Table]
C --> D[Outer Query]
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 2 | Sales | 900 |
| 3 | IT | 1500 |
| 4 | IT | 1100 |
Q: What is the average salary per department, and show only departments with average salary above 1000?
| department | avg_salary |
|---|---|
| IT | 1300 |
Q: What is the average order revenue per customer, and show only customers above the overall average?
Derived Table (Intermediate)
| customer_id | avg_order_revenu e |
|---|---|
| 4 | 38.48 |
| 6 | 56.64 |
| 10 | 49.15 |
Final Output
| customer_id | avg_order_revenue |
|---|---|
| 6 | 56.64 |
| 10 | 49.15 |
Q: For each city, show:
SELECT
city_id,
AVG(total_spend) AS avg_city_spend,
MAX(total_spend) AS max_city_spend
FROM (
SELECT
c.customer_id,
c.city_id,
SUM(oi.quantity * p.price) AS total_spend
FROM analytics.customers c
JOIN analytics.orders o ON c.customer_id = o.customer_id
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
GROUP BY c.customer_id, c.city_id
) t
GROUP BY city_id;| city_id | avg_city_spend | max_city_spend |
|---|---|---|
| 2 | 198.75 | 240.00 |
| 5 | 154.30 | 182.50 |
Derived tables create inline pipelines.
CTEs make those pipelines named, readable, and reusable.
Build a readable data pipeline step by step.
A Common Table Expression (CTE) is a named, temporary result set defined using WITH.
WITHCTEs naturally form pipelines.
Each step builds on the previous one.
flowchart LR
A[Base Tables] --> B[CTE Step 1]
B --> C[CTE Step 2]
C --> D[Final Result]
Each CTE is a logical checkpoint.
Input Table: employees
| emp_id | department | salary |
|---|---|---|
| 1 | Sales | 1200 |
| 2 | Sales | 900 |
| 3 | IT | 1500 |
| 4 | IT | 1100 |
Q: Compute average salary per department and keep only departments above 1000.
\(\downarrow\)
| department | avg_salary |
|---|---|
| IT | 1300 |
What is the average order revenue per customer, and which customers are above the overall average?
Input Tables
ordersorder_itemsproducts(aggregated into tmp_order_revenue)
\(\downarrow\)
customer_avg
| customer_id | avg_order_revenue |
|---|---|
| 4 | 38.48 |
| 6 | 56.64 |
| 10 | 49.15 |
\(\downarrow\)
| customer_id | avg_order_revenue |
|---|---|
| 6 | 56.64 |
| 10 | 49.15 |
For each city, compute:
WITH customer_spend AS (
SELECT
c.customer_id,
c.city_id,
SUM(oi.quantity * p.price) AS total_spend
FROM analytics.customers c
JOIN analytics.orders o ON c.customer_id = o.customer_id
JOIN analytics.order_items oi ON o.order_id = oi.order_id
JOIN analytics.products p ON oi.product_id = p.product_id
GROUP BY c.customer_id, c.city_id
),
city_agg AS (
SELECT
city_id,
AVG(total_spend) AS avg_city_spend,
MAX(total_spend) AS max_city_spend
FROM customer_spend
GROUP BY city_id
)
SELECT *
FROM city_agg;\(\downarrow\)
| city_id | avg_city_spend | max_city_spend |
|---|---|---|
| 2 | 198.75 | 240.00 |
| 5 | 154.30 | 182.50 |
| Feature | Derived Table | CTE | Temporary Table |
|---|---|---|---|
| Named | No | Yes | Yes |
| Reusable | No | Within query | Across queries |
| Lifetime | Single query | Single query | Session |
| Readability | Medium | High | High |