Numeric Functions
2026-04-01
We are going to cover the Built-in SQL functions which will deal with the
Open Docker Desktop, Run and open Pgadmin:
sales_analysisDROP TABLE IF EXISTS sales_analysis;
CREATE TABLE sales_analysis AS
SELECT
s.transaction_id,
o.order_date,
DATE(o.order_date) AS order_date_date,
o.year,
o.quarter,
o.month,
c.customer_name,
c.city,
c.zip_code,
p.product_name,
p.category,
p.price,
e.first_name AS employee_first_name,
e.last_name AS employee_last_name,
e.salary AS employee_salary,
s.quantity,
s.discount,
s.total_sales
FROM sales AS s
JOIN orders AS o
ON s.order_id = o.order_id
JOIN customers AS c
ON s.customer_id = c.customer_id
JOIN products AS p
ON s.product_id = p.product_id
LEFT JOIN employees AS e
ON s.employee_id = e.employee_id;| transaction_id | total_sales |
|---|---|
| 1001 | 120.50 |
| 1002 | NULL |
| 1003 | 99.50 |
\[\downarrow\]
\[\downarrow\]
| total_revenue |
|---|
| 220.00 |
| category | total_sales |
|---|---|
| Accessories | 50.00 |
| Accessories | 70.00 |
| Electronics | 120.00 |
\[\downarrow\]
\[\downarrow\]
| category | total_revenue |
|---|---|
| Accessories | 120.00 |
| Electronics | 120.00 |
Analytical Warning | Always Check the Grain
Before using SUM(), always verify the grain of the table.
sales_analysis is at the transaction levelCorrect use of SUM() depends as much on data structure as on SQL syntax.
AVG() calculates the arithmetic mean of numeric values.
It answers the question: \(\rightarrow\) “What is the typical value?”
| transaction_id | discount |
|---|---|
| 1001 | 0.10 |
| 1002 | 0.00 |
| 1003 | 0.20 |
\[\downarrow\]
\[\downarrow\]
| avg_discount |
|---|
| 0.10 |
AVG() ignores NULL values.
| transaction_id | discount |
|---|---|
| 1001 | 0.10 |
| 1002 | NULL |
| 1003 | 0.20 |
\[\downarrow\]
\[\downarrow\]
| avg_discount |
|---|
| 0.15 |
What AVG() actually computes:
\[ \frac{\text{SUM of non-NULL values}}{\text{COUNT of non-NULL values}} \]
In this case:
\[ \frac{0.10 + 0.20}{2} = 0.15 \]
| category | price |
|---|---|
| Accessories | 25.00 |
| Accessories | 35.00 |
| Electronics | 120.00 |
\[\downarrow\]
\[\downarrow\]
| category | avg_price |
|---|---|
| Accessories | 30.00 |
| Electronics | 120.00 |
This shifts analysis from global to behavioral.
Analytical Warning | Distribution Matters
The Average is sensitive to outliers:**
Before using AVG(), always consider:
interpretation matters more than syntax
MIN() and MAX() are aggregate functions used to identify extreme values in a dataset.
They answer:
Commonly used in:
| transaction_id | total_sales |
|---|---|
| 1001 | 120.50 |
| 1002 | 80.00 |
| 1003 | 99.50 |
\[\downarrow\]
\[\downarrow\]
| min_sale | max_sale |
|---|---|
| 80.00 | 120.50 |
Both MIN() and MAX() ignore NULL values.
| transaction_id | total_sales |
|---|---|
| 1001 | 120.50 |
| 1002 | NULL |
| 1003 | 99.50 |
\[\downarrow\]
\[\downarrow\]
| min_sale | max_sale |
|---|---|
| 99.50 | 120.50 |
| category | price |
|---|---|
| Accessories | 25.00 |
| Accessories | 35.00 |
| Electronics | 120.00 |
\[\downarrow\]
\[\downarrow\]
| category | min_price | max_price |
|---|---|---|
| Accessories | 25.00 | 35.00 |
| Electronics | 120.00 | 120.00 |
| product_name |
|---|
| Cable |
| Mouse |
| Keyboard |
\[\downarrow\]
\[\downarrow\]
| first_product | last_product |
|---|---|
| Cable | Mouse |
Analytical Warning | Extremes Are Not Typical
MIN() and MAX() show boundaries, not typical behavior:
Always combine extremes with:
AVG()COUNT()to get a reliable analytical picture.
MIN() and MAX() define limits, not reality
COUNT() measures how many rows or values exist.
It is used for:
Unlike SUM() or AVG(), it does not depend on numeric magnitude.
| transaction_id | discount |
|---|---|
| 1001 | 0.10 |
| 1002 | NULL |
| 1003 | 0.20 |
\[\downarrow\]
\[\downarrow\]
| total_rows |
|---|
| 3 |
| transaction_id | discount |
|---|---|
| 1001 | 0.10 |
| 1002 | NULL |
| 1003 | 0.20 |
\[\downarrow\]
\[\downarrow\]
| non_null_discounts |
|---|
| 2 |
| product_name | transaction_id |
|---|---|
| Mouse | 1001 |
| Mouse | 1002 |
| Keyboard | 1003 |
\[\downarrow\]
\[\downarrow\]
| product_name | transaction_count |
|---|---|
| Mouse | 2 |
| Keyboard | 1 |
To answer:
Number of Customers per City?
COUNT(*) counts rows (NULLs included)COUNT(column) ignores NULLsCOUNT(DISTINCT column) ignores NULLs and duplicates0Understanding this prevents silent metric errors.
Analytical Warning | Define What You Are Counting
Before using COUNT(), always ask:
DISTINCT?Correct syntax does not guarantee correct analytics
Row-level arithmetic refers to calculations performed on each individual row of a table.
Unlike aggregate functions (SUM, AVG, COUNT):
Row-level arithmetic is foundational for:
In practice:
row-level calculations → aggregation
| transaction_id | quantity | price |
|---|---|---|
| 1001 | 2 | 25.00 |
| 1002 | 1 | 80.00 |
\[\downarrow\]
\[\downarrow\]
| transaction_id | calculated_revenue |
|---|---|
| 1001 | 50.00 |
| 1002 | 80.00 |
| transaction_id | quantity | price | discount |
|---|---|---|---|
| 1001 | 2 | 25.00 | 0.10 |
| 1002 | 1 | 80.00 | 0.00 |
\[\downarrow\]
\[\downarrow\]
| transaction_id | net_revenue |
|---|---|
| 1001 | 45.00 |
| 1002 | 80.00 |
SQL follows standard arithmetic precedence:
()* and division /+ and subtraction -Use parentheses to make logic explicit.
Row-level arithmetic is often combined with numeric functions.
Two distinct concepts:
Here:
Analytical Warning | NULL Propagation
In arithmetic expressions:
NULLs can silently remove rows from aggregationsRow-level arithmetic is the bridge between raw data and analytics
CEILING() and FLOOR() are numeric functions used to round values to integer boundaries.
CEILING() rounds a value up to the nearest integerFLOOR() rounds a value down to the nearest integerThey are commonly used for:
| transaction_id | total_sales |
|---|---|
| 1001 | 120.10 |
| 1002 | 80.90 |
\[\downarrow\]
\[\downarrow\]
| transaction_id | rounded_up | rounded_down |
|---|---|---|
| 1001 | 121 | 120 |
| 1002 | 81 | 80 |
CEILING() is useful when values must meet a minimum thresholdFLOOR() is useful when values must not exceed a maximum thresholdCEILING() can be used to group continuous numeric values into fixed-width ranges.
This is a common technique in:
| transaction_id | total_sales |
|---|---|
| 1001 | 12.40 |
| 1002 | 47.80 |
| 1003 | 52.10 |
| 1004 | 79.90 |
| 1005 | 101.25 |
| 1006 | 138.60 |
| transaction_id | total_sales | revenue_range (50) |
|---|---|---|
| 1001 | 12.40 | 50 |
| 1002 | 47.80 | 50 |
| 1003 | 52.10 | 100 |
| 1004 | 79.90 | 100 |
| 1005 | 101.25 | 150 |
| 1006 | 138.60 | 150 |
Range assignment formula:
\[ \text{revenue_range} = \lceil \frac{\text{total_sales}}{50} \rceil \times 50 \]
This means:
| transaction_id | total_sales | revenue_range |
|---|---|---|
| 1001 | 12.40 | 50 |
| 1002 | 47.80 | 50 |
| 1003 | 52.10 | 100 |
| 1004 | 79.90 | 100 |
| 1005 | 101.25 | 150 |
| 1006 | 138.60 | 150 |
| revenue_range | transactions | total_revenue |
|---|---|---|
| 50 | 2 | 60.20 |
| 100 | 2 | 132.00 |
| 150 | 2 | 239.85 |
Each row represents a revenue range, not individual transactions.
revenue_range = 100 representstransactions shows how many transactions fall into that rangetotal_revenue shows the total sales within the rangeAnalytical Warning | Ranges Are Design Choices
Ranges created with CEILING() or FLOOR() are analytical assumptions.
Always justify:
CEILING() and FLOOR() help transform continuous values into
interpretable analytical structures given the fact that ranges are chosen deliberately
COALESCE() is a SQL function used for explicit NULL handling.
By default, most aggregate functions ignore NULL values.
Note
COALESCE() lets you override this behavior intentionally.
Common strategies include:
NULL with 0NULL with the averageNULL with the median| transaction_id | discount |
|---|---|
| 1001 | 0.10 |
| 1002 | NULL |
| 1003 | 0.20 |
\[\downarrow\]
\[\downarrow\]
| avg_discount_with_zeros |
|---|
| 0.10 |
\[ (0.10 + 0 + 0.20) / 3 = 0.10 \]
When to Use
NULL truly means zero impactMissing values are replaced with the overall average
\[\downarrow\]
Appropriate when:
Avoid when:
Median imputation is more robust than average imputation because it is not affected by outliers.
PostgreSQL supports median via PERCENTILE_CONT(0.5).
\[\downarrow\]
\[\downarrow\]
When to Use
When NOT to Use
Tip
Review skewness concepts from
Intro to Statistics – Distributions
| Scenario | Prefer |
|---|---|
| Symmetric distribution | Average |
| Skewed distribution | Median |
| Presence of outliers | Median |
| KPI reporting | Average |
| Robust / fairness analysis | Median |
| Modeling or downstream ML | Median |