Basic Queries: DDL DLM

SQL for Data Analysis

Karen Hovhannisyan

2026-07-04

Learning Goals

By the end of this section, you will be able to:

explain what happens behind the scenes when a SQL query is executed
understand SQL as a declarative language
interpret query plans at a high level
write more efficient analytical queries
use core SQL clauses to refine query performance

Behind the Scenes

Why “Behind the Scenes” Matters

Before diving into more SQL syntax, let’s pause to understand what happens behind the scenes when a SQL query is executed.

Think about an e-commerce company, similar to our products–sales case study.

inventory management
order fulfillment
payment processing
delivery optimization

The same idea applies to SQL queries.

Important

What you write is only part of the story; how the database executes it matters just as much.

SQL as a Declarative Language

SQL is a declarative programming language.

This means:

you tell the database what result you want
you do not tell it how to get that result

Important

The database engine (RDBMS) decides the most efficient execution strategy.

Declarative Query Example

SELECT 
  product_name
FROM products;

This statement tells the database:

return values from product_name
read data from the products table

It does not specify:

which index to use
how rows are scanned
whether execution is sequential or indexed

All of this is handled internally by the database engine.

Query Optimization Happens Automatically

When a query is submitted:

the database parses the SQL
syntax and permissions are checked
the query planner evaluates execution strategies
the optimizer selects the best plan
results are returned

Writing efficient SQL means trusting the database, while still writing clear and well-structured queries.

Why This Matters for Data Analysts

As a data analyst:

you focus on business logic and correctness
the database focuses on performance and execution
clean SQL enables better optimization
proper indexes and constraints improve speed

Understanding this process helps you:

diagnose slow queries
write scalable SQL
collaborate with data engineers and DBAs

Query Plan

Why Query Plans Matter

As queries grow more complex and datasets become larger, performance matters.

Poorly optimized queries can:

slow down dashboards
increase infrastructure costs
delay decision-making

Faster queries mean:

lower computational cost
better system performance
quicker insights

What Is a Query Plan?

A query plan shows how the database intends to execute a query.

Most RDBMSs (including PostgreSQL) estimate:

execution cost
number of rows processed
operations such as scans, joins, filters

You generate a query plan using EXPLAIN.

Query Plan Example

EXPLAIN
SELECT 
  *
FROM sales;

PostgreSQL returns a plan, not the data.

Example output:

Seq Scan on sales  (cost=0.00..92.00 rows=5000 width=34)

Reading a Simple Query Plan

Seq Scan → full table scan
sales → table being scanned
cost → estimated execution cost
rows → estimated number of rows
width → average row size in bytes

Sequential scans are common when:

tables are small
no suitable index exists
most rows are needed

Refining Your SQL Queries

SELECT Refresher

Basic structure:

SELECT 
    column_name_1,
    column_name_2
FROM table_name;

Example using products:

SELECT
  product_name,
  price,
  category
FROM products;

Why Column Selection Matters

Selecting only needed columns:

reduces data transfer
improves performance
makes results easier to interpret

Try It Yourself

Compare:

EXPLAIN
SELECT 
  *
FROM products;

EXPLAIN
SELECT 
  product_name, 
  price
FROM products;

ORDER BY

Sorting Results

ORDER BY sorts results in ascending or descending order.

Common use cases:

first or last records
highest or lowest values
alphabetical ordering

ORDER BY Examples

Default (ascending):

SELECT 
  product_name
FROM products
ORDER BY product_name;

Descending (Z \(\rightarrow\) A):

SELECT
  product_name,
  category
FROM products
ORDER BY product_name DESC;

Numeric sorting:

SELECT
  product_name,
  price
FROM products
ORDER BY price DESC;

ORDER BY Key Rules

applied after SELECT and FROM
default order is ascending
DESC reverses order
supports multiple columns

ORDER BY category ASC, price DESC;

LIMIT

Limiting Output Size

LIMIT restricts the number of rows returned.

Useful when:

previewing data
reducing cost
retrieving top-N results

LIMIT Example

SELECT
  product_name,
  price
FROM products
ORDER BY price DESC
LIMIT 10;

\[\downarrow\]

sorts by price
returns only 10 rows

Important

LIMIT must be at the end
value must be numeric

GROUP BY

GROUP BY Basics

GROUP BY groups rows sharing the same values.

It is commonly used with aggregate functions:

COUNT()
SUM()
AVG()
MIN()
MAX()

GROUP BY Example

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id;

\[\downarrow\]

groups by product
aggregates revenue
returns one row per product

GROUP BY with ORDER BY and LIMIT

Returns top 5 products by revenue.

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 5;

DISTINCT

DISTINCT Basics

DISTINCT returns unique values.

Syntax:

SELECT DISTINCT 
  column_name
FROM table_name;

DISTINCT Examples

SELECT DISTINCT 
  category
FROM products;

Multiple columns:

SELECT DISTINCT
  category,
  price
FROM products;

DISTINCT vs GROUP BY

Both can return unique combinations.

DISTINCT:

SELECT DISTINCT
  category,
  price
FROM products;

GROUP BY:

SELECT
  category,
  price
FROM products
GROUP BY category, price;

Use:

DISTINCT for simplicity
GROUP BY when aggregation is needed

HAVING

Why HAVING Exists

HAVING is used to filter aggregated results \[\rightarrow\] filters groups after aggregation.
WHERE filters rows before aggregation

This distinction is critical for analytical queries.

WHERE vs HAVING | Conceptual Difference

WHERE \(\rightarrow\) filters raw rows
GROUP BY \(\rightarrow\) creates groups
HAVING \(\rightarrow\) filters aggregated groups

You cannot use aggregate functions in WHERE.

Logical Query Execution Order

SQL is written top-down, but executed differently.

Logical order (simplified):

FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT

This explains why HAVING exists.

Basic HAVING Syntax

SELECT
  group_column,
  AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY group_column
HAVING AGGREGATE_FUNCTION(column) condition;

HAVING always works together with GROUP BY.

HAVING Example | Filter by Total Revenue

Suppose you want products that generated more than 10,000 in total revenue.

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
HAVING SUM(total_sales) > 10000;

\[\downarrow\]

groups sales by product
calculates total revenue per product
keeps only products above the threshold

HAVING vs WHERE | Practical Comparison

Using WHERE (incorrect):

SELECT
  product_id,
  SUM(total_sales)
FROM sales
WHERE SUM(total_sales) > 10000
GROUP BY product_id;

This query fails because SUM() is not available at the WHERE stage.

Combining WHERE and HAVING

You can (and often should) use both.

WHERE filters rows early
HAVING filters aggregated results

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
WHERE total_sales > 0
GROUP BY product_id
HAVING SUM(total_sales) > 10000;

This is more efficient than using HAVING alone.

HAVING with COUNT

Find products with at least 50 transactions.

SELECT
  product_id,
  COUNT(transaction_id) AS transaction_count
FROM sales
GROUP BY product_id
HAVING COUNT(transaction_id) >= 50;

HAVING with Multiple Conditions

SELECT
  product_id,
  COUNT(transaction_id) AS transaction_count,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
HAVING
  COUNT(transaction_id) >= 50
  AND SUM(total_sales) > 10000;

This filters on multiple aggregated metrics.

When to Use HAVING

Use HAVING when:

filtering aggregated results
working with SUM, COUNT, AVG, etc.
applying business thresholds to groups

Do not use HAVING when row-level filtering is sufficient.

Execution order

SELECT                                   -- 6
  p.product_id,                          -- 6
  SUM(s.total_sales) AS total_revenue    -- 6
FROM sales AS s                          -- 1
JOIN products AS p                       -- 2
  ON s.product_id = p.product_id         -- 2
WHERE s.total_sales > 0                  -- 3
GROUP BY p.product_id                    -- 4
HAVING SUM(s.total_sales) > 10000        -- 5
ORDER BY total_revenue DESC              -- 7
LIMIT 5;                                 -- 8

Basic Queries: DDL DLM

Learning Goals

Learning Goals

Behind the Scenes

Why “Behind the Scenes” Matters

SQL as a Declarative Language

Declarative Query Example

Query Optimization Happens Automatically

Why This Matters for Data Analysts

Query Plan

Why Query Plans Matter

What Is a Query Plan?

Query Plan Example

Reading a Simple Query Plan

Refining Your SQL Queries

Why Refinement Matters

SELECT Refresher

Why Column Selection Matters

ORDER BY

Sorting Results

ORDER BY Examples

ORDER BY Key Rules

LIMIT

Limiting Output Size

LIMIT Example

GROUP BY

GROUP BY Basics

GROUP BY Example

GROUP BY with ORDER BY and LIMIT

DISTINCT

DISTINCT Basics

DISTINCT Examples

DISTINCT vs GROUP BY

HAVING

Why HAVING Exists

WHERE vs HAVING | Conceptual Difference

Logical Query Execution Order

Basic HAVING Syntax

HAVING Example | Filter by Total Revenue

HAVING vs WHERE | Practical Comparison

Combining WHERE and HAVING

HAVING with COUNT

HAVING with Multiple Conditions

When to Use HAVING

Execution order