Numeric Functions

Karen Hovhannisyan

2026-04-01

Functions

We will cover the built-in SQL functions that deal with:

  • Numbers
  • Text
  • Dates
  • Data Types

From Session 4

Run Docker

Open Docker Desktop, start the containers, and open pgAdmin:

docker compose up -d

Create sales_analysis

DROP TABLE IF EXISTS sales_analysis;

CREATE TABLE sales_analysis AS
SELECT
    s.transaction_id,

    o.order_date,
    DATE(o.order_date) AS order_date_date,
    o.year,
    o.quarter,
    o.month,

    c.customer_name,
    c.city,
    c.zip_code,

    p.product_name,
    p.category,
    p.price,

    e.first_name AS employee_first_name,
    e.last_name  AS employee_last_name,
    e.salary     AS employee_salary,

    s.quantity,
    s.discount,
    s.total_sales
FROM sales AS s
JOIN orders AS o
    ON s.order_id = o.order_id
JOIN customers AS c
    ON s.customer_id = c.customer_id
JOIN products AS p
    ON s.product_id = p.product_id
LEFT JOIN employees AS e
    ON s.employee_id = e.employee_id;

Indexes for Filtering Performance

CREATE INDEX idx_sales_analysis_order_date
    ON sales_analysis(order_date_date);

CREATE INDEX idx_sales_analysis_year
    ON sales_analysis(year);

CREATE INDEX idx_sales_analysis_city
    ON sales_analysis(city);

CREATE INDEX idx_sales_analysis_category
    ON sales_analysis(category);

What is a Function?

SUM()

SUM() | Basic

transaction_id total_sales
1001 120.50
1002 NULL
1003 99.50

\[\downarrow\]

SELECT
  SUM(total_sales) AS total_revenue
FROM sales_analysis;

\[\downarrow\]

total_revenue
220.00
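This NULL-skipping behavior can be checked outside pgAdmin. A minimal sketch using Python's sqlite3 as a stand-in (the rows are the slide's hypothetical data; standard SQL aggregates behave the same way in PostgreSQL):

```python
import sqlite3

# Toy table mirroring the slide's three rows (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, total_sales REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 120.50), (1002, None), (1003, 99.50)],
)

# SUM() skips the NULL row: 120.50 + 99.50
total = con.execute(
    "SELECT SUM(total_sales) AS total_revenue FROM sales_analysis"
).fetchone()[0]
print(total)  # 220.0
```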

SUM() | with GROUP BY

category total_sales
Accessories 50.00
Accessories 70.00
Electronics 120.00

\[\downarrow\]

SELECT
  category,
  SUM(total_sales) AS total_revenue
FROM sales_analysis
GROUP BY category;

\[\downarrow\]

category total_revenue
Accessories 120.00
Electronics 120.00

SUM() | Use Cases

  • Simple
    • total revenue calculation
    • total cost or volume
    • high-level KPI reporting
  • Aggregated
    • revenue by product category
    • cost by department
    • KPI breakdowns for dashboards

SUM() | ATTENTION

Analytical Warning | Always Check the Grain

Before using SUM(), always verify the grain of the table.

  • sales_analysis is at the transaction level
  • summing values duplicated per transaction (e.g. salaries) produces incorrect results
  • aggregation must match the entity you want to measure

Correct use of SUM() depends as much on data structure as on SQL syntax.

AVG()

AVG() | What It Does

AVG() calculates the arithmetic mean of numeric values.

It answers the question: \(\rightarrow\) “What is the typical value?”

AVG() | Basic

transaction_id discount
1001 0.10
1002 0.00
1003 0.20

\[\downarrow\]

SELECT
  AVG(discount) AS avg_discount
FROM sales_analysis;

\[\downarrow\]

avg_discount
0.10

AVG() | NULL Behavior

AVG() ignores NULL values.

transaction_id discount
1001 0.10
1002 NULL
1003 0.20

\[\downarrow\]

SELECT
  AVG(discount) AS avg_discount
FROM sales_analysis;

\[\downarrow\]

avg_discount
0.15

AVG() | Why This Happens?

What AVG() actually computes:

\[ \frac{\text{SUM of non-NULL values}}{\text{COUNT of non-NULL values}} \]

In this case:

\[ \frac{0.10 + 0.20}{2} = 0.15 \]
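The formula can be verified directly. A small sketch using Python's sqlite3 with the slide's hypothetical rows (PostgreSQL computes AVG() the same way):

```python
import sqlite3

# AVG = SUM of non-NULL values / COUNT of non-NULL values.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, discount REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 0.10), (1002, None), (1003, 0.20)],
)

# The NULL row is excluded from both numerator and denominator.
avg = con.execute("SELECT AVG(discount) FROM sales_analysis").fetchone()[0]
print(round(avg, 2))  # 0.15
```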

AVG() | with GROUP BY

category price
Accessories 25.00
Accessories 35.00
Electronics 120.00

\[\downarrow\]

SELECT
  category,
  AVG(price) AS avg_price
FROM sales_analysis
GROUP BY category;

\[\downarrow\]

category avg_price
Accessories 30.00
Electronics 120.00

AVG() | Per-Entity Analysis

This shifts analysis from global to behavioral.

SELECT
  customer_name,
  AVG(total_sales) AS avg_transaction_value
FROM sales_analysis
GROUP BY customer_name;

AVG() | Use Cases

  • Simple
    • average order value
    • average discount rate
    • average price
  • Aggregated
    • average sales per customer
    • average price by category
    • benchmarking typical performance

AVG() | ATTENTION

Analytical Warning | Distribution Matters

The average is sensitive to outliers:

  • one extreme value can distort the result
  • skewed data leads to misleading averages

Before using AVG(), always consider:

  • outliers
  • skewness
  • whether a “typical” value makes sense

Interpretation matters more than syntax

MIN() and MAX()

MIN() and MAX() | What They Do

MIN() and MAX() are aggregate functions used to identify extreme values in a dataset.

They answer:

  • What is the smallest value?
  • What is the largest value?

Commonly used in:

  • exploratory analysis
  • data validation
  • boundary checks

MIN() and MAX() | Basic

transaction_id total_sales
1001 120.50
1002 80.00
1003 99.50

\[\downarrow\]

SELECT
  MIN(total_sales) AS min_sale,
  MAX(total_sales) AS max_sale
FROM sales_analysis;

\[\downarrow\]

min_sale max_sale
80.00 120.50

MIN() and MAX() | NULL Behavior

Both MIN() and MAX() ignore NULL values.

transaction_id total_sales
1001 120.50
1002 NULL
1003 99.50

\[\downarrow\]

SELECT
  MIN(total_sales) AS min_sale,
  MAX(total_sales) AS max_sale
FROM sales_analysis;

\[\downarrow\]

min_sale max_sale
99.50 120.50
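The same check can be run locally. A sketch with Python's sqlite3 on the slide's hypothetical rows (MIN()/MAX() skip NULLs in PostgreSQL as well):

```python
import sqlite3

# Three rows, one NULL; MIN()/MAX() consider only the non-NULL values.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, total_sales REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 120.50), (1002, None), (1003, 99.50)],
)

lo, hi = con.execute(
    "SELECT MIN(total_sales), MAX(total_sales) FROM sales_analysis"
).fetchone()
print(lo, hi)  # 99.5 120.5
```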

MIN() and MAX() | with GROUP BY

category price
Accessories 25.00
Accessories 35.00
Electronics 120.00

\[\downarrow\]

SELECT
  category,
  MIN(price) AS min_price,
  MAX(price) AS max_price
FROM sales_analysis
GROUP BY category;

\[\downarrow\]

category min_price max_price
Accessories 25.00 35.00
Electronics 120.00 120.00

MIN() and MAX() | Text Columns

product_name
Cable
Mouse
Keyboard

\[\downarrow\]

SELECT
  MIN(product_name) AS first_product,
  MAX(product_name) AS last_product
FROM sales_analysis;

\[\downarrow\]

first_product last_product
Cable Mouse

MIN() and MAX() | Use Cases

  • Numeric
    • cheapest and most expensive products
    • minimum and maximum sales values
    • boundary validation
  • Text / Dates
    • first and last records
    • alphabetical boundary checks
    • earliest and latest dates

MIN() and MAX() | ATTENTION

Analytical Warning | Extremes Are Not Typical

MIN() and MAX() show boundaries, not typical behavior:

  • they are sensitive to outliers
  • a single abnormal value can dominate the result

Always combine extremes with:

  • AVG()
  • COUNT()
  • distribution analysis

to get a reliable analytical picture.

MIN() and MAX() define limits, not reality

COUNT()

COUNT() | What It Does

COUNT() measures how many rows or values exist.

It is used for:

  • volume
  • cardinality
  • data completeness

Unlike SUM() or AVG(), it does not depend on numeric magnitude.

COUNT() | Basic Row Counting

transaction_id discount
1001 0.10
1002 NULL
1003 0.20

\[\downarrow\]

SELECT
  COUNT(*) AS total_rows
FROM sales_analysis;

\[\downarrow\]

total_rows
3

COUNT(column) | Non-NULL Values

transaction_id discount
1001 0.10
1002 NULL
1003 0.20

\[\downarrow\]

SELECT
  COUNT(discount) AS non_null_discounts
FROM sales_analysis;

\[\downarrow\]

non_null_discounts
2

COUNT() | with GROUP BY

product_name transaction_id
Mouse 1001
Mouse 1002
Keyboard 1003

\[\downarrow\]

SELECT
  product_name,
  COUNT(*) AS transaction_count
FROM sales_analysis
GROUP BY product_name;

\[\downarrow\]

product_name transaction_count
Mouse 2
Keyboard 1

COUNT(DISTINCT) | Unique Values

To answer:

  • how many unique customers exist?
  • how many unique entities are present?

SELECT
  COUNT(DISTINCT customer_name) AS unique_customers
FROM sales_analysis;

COUNT(DISTINCT) | with GROUP BY

Number of Customers per City?

SELECT
  city,
  COUNT(DISTINCT customer_name) AS unique_customers
FROM sales_analysis
GROUP BY city;

COUNT() | Duplicate Detection

SELECT
  transaction_id,
  COUNT(*) AS duplicate_count
FROM sales_analysis
GROUP BY transaction_id
HAVING COUNT(*) > 1;

COUNT() | NULL Rules

  • COUNT(*) counts rows (NULLs included)
  • COUNT(column) ignores NULLs
  • COUNT(DISTINCT column) ignores NULLs and duplicates
  • if all values are NULL, COUNT(column) returns 0

Understanding this prevents silent metric errors.
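All three COUNT variants can be exercised on one toy table. A sketch using Python's sqlite3 with hypothetical rows (a NULL and a duplicate included on purpose):

```python
import sqlite3

# Four rows: one NULL discount, and 0.10 appears twice.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, discount REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 0.10), (1002, None), (1003, 0.20), (1004, 0.10)],
)

rows, non_null, distinct = con.execute("""
    SELECT COUNT(*),                 -- rows, NULLs included
           COUNT(discount),          -- non-NULL values only
           COUNT(DISTINCT discount)  -- non-NULL, deduplicated
    FROM sales_analysis
""").fetchone()
print(rows, non_null, distinct)  # 4 3 2
```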

COUNT() | ATTENTION

Analytical Warning | Define What You Are Counting

Before using COUNT(), always ask:

  • am I counting rows or entities?
  • do I need DISTINCT?
  • is the table grain aligned with my question?

Correct syntax does not guarantee correct analytics

Row-Level Arithmetic

What Is Row-Level Arithmetic?

Row-level arithmetic refers to calculations performed on each individual row of a table.

Unlike aggregate functions (SUM, AVG, COUNT):

  • rows are not reduced
  • new derived columns are created

Why Row-Level Arithmetic Matters

Row-level arithmetic is foundational for:

  • feature engineering
  • revenue and cost calculations
  • metric normalization
  • preparing data for aggregation

In practice:

row-level calculations → aggregation

Basic Calculation

transaction_id quantity price
1001 2 25.00
1002 1 80.00

\[\downarrow\]

SELECT
  transaction_id,
  quantity * price AS calculated_revenue
FROM sales_analysis;

\[\downarrow\]

transaction_id calculated_revenue
1001 50.00
1002 80.00

Combining Multiple Columns

transaction_id quantity price discount
1001 2 25.00 0.10
1002 1 80.00 0.00

\[\downarrow\]

SELECT
  transaction_id,
  quantity * price * (1 - discount) AS net_revenue
FROM sales_analysis;

\[\downarrow\]

transaction_id net_revenue
1001 45.00
1002 80.00
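The net_revenue expression can be run end to end. A sketch with Python's sqlite3 on the slide's two hypothetical rows:

```python
import sqlite3

# quantity * price * (1 - discount), computed per row.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sales_analysis (transaction_id INTEGER, quantity INTEGER,"
    " price REAL, discount REAL)"
)
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?, ?, ?)",
    [(1001, 2, 25.00, 0.10), (1002, 1, 80.00, 0.00)],
)

rows = con.execute("""
    SELECT transaction_id, quantity * price * (1 - discount) AS net_revenue
    FROM sales_analysis
""").fetchall()
# Row 1001: 2 * 25.00 * 0.90 = 45.00; row 1002: 1 * 80.00 * 1.00 = 80.00
print(rows)
```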

Order of Operations

SQL follows standard arithmetic precedence:

  1. parentheses ()
  2. multiplication * and division /
  3. addition + and subtraction -

Use parentheses to make logic explicit.

SELECT
  transaction_id,
  (quantity * price) - (quantity * price * discount) AS net_revenue
FROM sales_analysis;

Numeric Transformations

Row-level arithmetic is often combined with numeric functions.

SELECT
  transaction_id,
  ROUND(quantity * price, 2) AS rounded_revenue
FROM sales_analysis;

Row-Level vs Aggregation

Two distinct concepts:

  • row-level calculations → preserve rows
  • aggregate calculations → reduce rows

SELECT
  SUM(quantity * price) AS total_revenue
FROM sales_analysis;

Here:

  • calculation happens per row
  • aggregation happens after

ATTENTION

Analytical Warning | NULL Propagation

In arithmetic expressions:

  • if any operand is NULL, the result is NULL
  • NULLs can silently remove rows from aggregations
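Both effects can be observed in one small sketch, here via Python's sqlite3 with hypothetical rows (PostgreSQL propagates NULL through arithmetic the same way):

```python
import sqlite3

# If price is NULL, quantity * price is NULL, and SUM() then drops that row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (quantity INTEGER, price REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(2, 25.0), (3, None)])

per_row = con.execute("SELECT quantity * price FROM t").fetchall()
total = con.execute("SELECT SUM(quantity * price) FROM t").fetchone()[0]
print(per_row, total)  # [(50.0,), (None,)] 50.0
```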

Row-level arithmetic is the bridge between raw data and analytics

CEILING() and FLOOR()

What Are CEILING() and FLOOR()?

CEILING() and FLOOR() are numeric functions used to round values to integer boundaries.

  • CEILING() rounds a value up to the nearest integer
  • FLOOR() rounds a value down to the nearest integer

They are commonly used for:

  • threshold logic
  • range construction
  • distribution analysis

Thresholds and Ranges | Input

transaction_id total_sales
1001 120.10
1002 80.90

\[\downarrow\]

SELECT
  transaction_id,
  CEILING(total_sales) AS rounded_up,
  FLOOR(total_sales)   AS rounded_down
FROM sales_analysis;

\[\downarrow\]

transaction_id rounded_up rounded_down
1001 121 120
1002 81 80

Thresholds and Ranges | Interpretation

  • CEILING() is useful when values must meet a minimum threshold
  • FLOOR() is useful when values must not exceed a maximum threshold

CEILING() and FLOOR() | Use Cases

  • billing thresholds (minimum chargeable units)
  • rounding prices or quantities for operational rules
  • constructing numeric ranges
  • discretizing continuous values

CEILING() with GROUP BY

CEILING() can be used to group continuous numeric values into fixed-width ranges.

This is a common technique in:

  • exploratory analysis
  • descriptive analytics
  • dashboard preparation

CEILING() with GROUP BY | Input

transaction_id total_sales
1001 12.40
1002 47.80
1003 52.10
1004 79.90
1005 101.25
1006 138.60

CEILING() with GROUP BY | Output

transaction_id total_sales revenue_range (50)
1001 12.40 50
1002 47.80 50
1003 52.10 100
1004 79.90 100
1005 101.25 150
1006 138.60 150

CEILING() with GROUP BY | Range Logic

Range assignment formula:

\[ \text{revenue\_range} = \left\lceil \frac{\text{total\_sales}}{50} \right\rceil \times 50 \]

This means:

  • \(0 < x \le 50 \rightarrow 50\)
  • \(50 < x \le 100 \rightarrow 100\)
  • \(100 < x \le 150 \rightarrow 150\)
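The formula and boundary behavior above can be checked in a few lines of Python (width 50, matching the slide):

```python
import math

# ceil(x / 50) * 50: values exactly on a boundary stay in the lower bucket.
def revenue_range(x, width=50):
    return math.ceil(x / width) * width

values = [12.40, 47.80, 52.10, 79.90, 101.25, 138.60]
print([revenue_range(v) for v in values])  # [50, 50, 100, 100, 150, 150]
```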

CEILING() with GROUP BY | Query

SELECT
  CEILING(total_sales / 50.0) * 50 AS revenue_range,
  COUNT(*) AS transactions,
  SUM(total_sales) AS total_revenue
FROM sales_analysis
GROUP BY CEILING(total_sales / 50.0) * 50
ORDER BY revenue_range;

Revenue Ranges | Intermediate Mapping

transaction_id total_sales revenue_range
1001 12.40 50
1002 47.80 50
1003 52.10 100
1004 79.90 100
1005 101.25 150
1006 138.60 150

CEILING() with GROUP BY | Query Output

revenue_range transactions total_revenue
50 2 60.20
100 2 132.00
150 2 239.85

CEILING() with GROUP BY | Interpretation

Each row represents a revenue range, not individual transactions.

  • revenue_range = 100 represents
    \[50 < \text{total\_sales} \le 100\]
  • transactions shows how many transactions fall into that range
  • total_revenue shows the total sales within the range

Revenue Ranges | ATTENTION

Analytical Warning | Ranges Are Design Choices

Ranges created with CEILING() or FLOOR() are analytical assumptions.

  • different range widths lead to different interpretations
  • overly wide ranges hide structure
  • overly narrow ranges introduce noise

Always justify:

  • range size
  • boundary logic
  • business meaning

CEILING() and FLOOR() help transform continuous values into
interpretable analytical structures, provided the ranges are chosen deliberately

COALESCE()

COALESCE() | What It Does

COALESCE() is a SQL function used for explicit NULL handling.

COALESCE() | Why It Matters

By default, most aggregate functions ignore NULL values.

Note

COALESCE() lets you override this behavior intentionally.

Common strategies include:

  • replacing NULL with 0
  • replacing NULL with the average
  • replacing NULL with the median

Treat Missing as Zero | Input

transaction_id discount
1001 0.10
1002 NULL
1003 0.20

\[\downarrow\]

SELECT
  AVG(COALESCE(discount, 0)) AS avg_discount_with_zeros
FROM sales_analysis;

\[\downarrow\]

avg_discount_with_zeros
0.10

Treat Missing as Zero | Interpretation

\[ (0.10 + 0 + 0.20) / 3 = 0.10 \]
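The zero-substitution effect is easy to verify. A sketch using Python's sqlite3 with the slide's hypothetical rows (COALESCE works identically in PostgreSQL):

```python
import sqlite3

# COALESCE(discount, 0) keeps the NULL row in the average's denominator.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, discount REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 0.10), (1002, None), (1003, 0.20)],
)

with_zeros = con.execute(
    "SELECT AVG(COALESCE(discount, 0)) FROM sales_analysis"
).fetchone()[0]
print(round(with_zeros, 2))  # 0.1, i.e. (0.10 + 0 + 0.20) / 3
```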

When to Use

  • NULL truly means zero impact
  • KPIs explicitly require zero inclusion
  • operational metrics (counts, volumes)

Mean Imputation | Concept

Missing values are replaced with the overall average.

SELECT
  AVG(discount) AS avg_discount
FROM sales_analysis;

\[\downarrow\]

The alias avg_discount is not visible inside another query, so a scalar
subquery supplies the average inline:

SELECT
  AVG(COALESCE(discount,
      (SELECT AVG(discount) FROM sales_analysis))) AS avg_discount_mean_imputed
FROM sales_analysis;
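A runnable version of mean imputation via a scalar subquery, sketched with Python's sqlite3 on hypothetical rows (the same query shape works in PostgreSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, discount REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 0.10), (1002, None), (1003, 0.20)],
)

# The scalar subquery computes the non-NULL average (0.15),
# which COALESCE substitutes for the missing value.
imputed = con.execute("""
    SELECT AVG(COALESCE(discount,
               (SELECT AVG(discount) FROM sales_analysis)))
    FROM sales_analysis
""").fetchone()[0]
print(round(imputed, 2))  # 0.15
```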

Mean Imputation | When to Use

Appropriate when:

  • distribution is approximately symmetric
  • no strong outliers exist
  • missing values are random and rare
  • reporting-level analysis

Avoid when:

  • data is skewed
  • outliers are present
  • performance or incentive metrics are involved

Robust Imputation | Concept

Median imputation is more robust than average imputation because it is not affected by outliers.

PostgreSQL supports median via PERCENTILE_CONT(0.5).

\[\downarrow\]

SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY discount) AS median_discount
FROM sales_analysis;

\[\downarrow\]

As with the mean, the alias median_discount must be supplied through a scalar
subquery:

SELECT
  AVG(COALESCE(discount,
      (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY discount)
       FROM sales_analysis))) AS avg_discount_median_imputed
FROM sales_analysis;
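To see why the median resists outliers, here is a sketch using Python's sqlite3 with hypothetical rows; SQLite has no PERCENTILE_CONT, so the median is computed client-side and bound as the COALESCE fallback:

```python
import sqlite3
import statistics

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_analysis (transaction_id INTEGER, discount REAL)")
con.executemany(
    "INSERT INTO sales_analysis VALUES (?, ?)",
    [(1001, 0.05), (1002, None), (1003, 0.10), (1004, 0.60)],
)

# Median of the non-NULL values; unaffected by the 0.60 outlier.
non_null = [r[0] for r in con.execute(
    "SELECT discount FROM sales_analysis WHERE discount IS NOT NULL")]
median = statistics.median(non_null)  # 0.10

imputed = con.execute(
    "SELECT AVG(COALESCE(discount, ?)) FROM sales_analysis", (median,)
).fetchone()[0]
print(round(imputed, 4))  # (0.05 + 0.10 + 0.10 + 0.60) / 4 = 0.2125
```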

Median Imputation | Notes

When to Use

  • skewed distributions
  • presence of outliers
  • financial or behavioral metrics
  • fairness-sensitive analysis

When NOT to Use

  • very small datasets
  • strict business rules
  • operational counting logic

Practical Rule of Thumb

  • average ≈ median → symmetric
  • average > median → right-skewed
  • average < median → left-skewed
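The rule of thumb is quick to demonstrate with Python's statistics module on a hypothetical right-skewed sample:

```python
import statistics

# One large outlier drags the mean well above the median.
salaries = [100, 110, 120, 130, 1000]
mean = statistics.mean(salaries)      # 292
median = statistics.median(salaries)  # 120
print(mean > median)  # True → right-skewed by the rule of thumb
```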

Tip

Review skewness concepts from
Intro to Statistics – Distributions

Average vs Median

Scenario Prefer
Symmetric distribution Average
Skewed distribution Median
Presence of outliers Median
KPI reporting Average
Robust / fairness analysis Median
Modeling or downstream ML Median