Data Analytics Bootcamp


Session 03: DA with SQL | Data Types & Constraints

Tags: SQL, Data Types, Schemas, Keys, Indexes, Syntax

Behind the Scenes

Before diving into more SQL syntax, let’s pause for a moment to consider what happens behind the scenes when a SQL query is executed.

To understand why this matters, think about an e-commerce company that sells products online (similar to our case study). If management wants to reduce operational costs or improve customer experience, they must analyze and optimize many background processes—inventory management, order fulfillment, payment processing, and delivery.

The same idea applies when running SQL queries.

Important

What you write is only part of the story; how the database executes it matters just as much.

SQL as a Declarative Language

The first key concept to understand is that SQL is a declarative programming language.

This means:

  • You tell the database what result you want
  • You do not tell it how to get that result
Important

When you write a SELECT statement, you describe the desired output, and the database engine (RDBMS) decides the most efficient way to retrieve the data.

Consider the following declarative query:

SELECT product_name
FROM products;

This statement tells the database:

  • return all values from the product_name column
  • read the data from the products table

What it does not specify:

  • which indexes to use
  • how the data should be scanned
  • whether to read data sequentially or via an index

All of these decisions are handled internally by the database engine.

Query Optimization Happens Automatically

When a query is submitted:

  1. The database parses the SQL statement
  2. It checks syntax and permissions
  3. The query planner evaluates multiple execution strategies
  4. The optimizer selects the most efficient execution plan
  5. The query is executed and results are returned

Because of this process, part of writing efficient SQL queries is trusting the database to do its job—while still writing clear, well-structured queries that allow the optimizer to work effectively.

Why This Matters for Data Analysts

As a data analyst, this means:

  • You focus on business logic and correctness
  • The database focuses on performance and execution
  • Clean SQL enables better optimization
  • Proper indexes and constraints support faster queries

Understanding what happens behind the scenes will help you:

  • interpret slow queries
  • write more efficient SQL
  • collaborate more effectively with data engineers and database administrators

This foundation becomes especially important as datasets grow larger and queries become more complex.

A note before we look at query plans: when you see one for the first time, it may look confusing, and you don't need to understand every detail. The one element to pay attention to is the cost. This doesn't refer to the monetary cost of running the query; it's an arbitrary unit used to estimate how much work the query requires. The first number is the estimated cost of returning the first row, and the second is the cost of returning all rows. Since the unit doesn't correspond to seconds or minutes, costs are only useful for comparison: a query with a cost of 64 will take longer than a query with a cost of 30.

Query Plan

As you become more familiar with SQL, you will start combining multiple SQL commands into scripts.
When working with large databases—often containing millions or billions of records—it becomes essential to consider the time and cost of running those scripts.

For example, imagine working with a large transactional system that stores sales and product data.
Running a poorly optimized query on a frequently accessed table could significantly slow down reports and increase infrastructure costs. That’s why queries that run often should be optimized before being deployed.

Faster queries mean:

  • lower computational cost
  • better system performance
  • quicker insights for decision-makers

What Is a Query Plan?

A query plan shows how the database intends to execute a SQL query.

Most relational database management systems (RDBMS), including PostgreSQL, can estimate:

  • how long a query will take
  • how many rows will be processed
  • which operations will be performed (scans, joins, filters)

To generate a query plan, you simply add the keyword EXPLAIN before your SQL query.

Suppose you want to retrieve all records from a sales table and understand how expensive this operation might be.

EXPLAIN
SELECT *
FROM sales;

When this query is executed, PostgreSQL does not return the data itself.
Instead, it returns a query plan describing how the data would be retrieved.

A basic query plan might look like this:

"Seq Scan on sales  (cost=0.00..92.00 rows=5000 width=34)"

This output tells us:

  • Seq Scan: PostgreSQL is performing a sequential scan, reading every row in the table
  • sales: the table being scanned
  • cost: an internal estimate of startup and total execution cost
  • rows: estimated number of rows returned
  • width: average size of each row in bytes

A sequential scan is common when:

  • the table is relatively small
  • no suitable index exists
  • most rows are expected to be returned
Tip

Try it yourself: run EXPLAIN on a few of your own queries against the tables in your database and check whether the plan shows a sequential scan.
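For contrast, once a suitable index exists and the query is selective, the planner may choose a different access path. A hypothetical sketch (it assumes an index on sales.product_id, which is only created later in the homework):

```sql
-- Hypothetical: assumes an index exists on sales (product_id)
EXPLAIN
SELECT *
FROM sales
WHERE product_id = 42;
```

If the planner estimates that only a few rows match the filter, the plan may report an Index Scan (or Bitmap Index Scan) instead of a Seq Scan.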

Understanding query plans helps you:

  • diagnose slow queries
  • decide when indexes are needed
  • write more efficient SQL
  • communicate performance issues to data engineers or DBAs

As datasets grow, even simple queries can become expensive.
Learning to read and reason about query plans is a key step toward writing production-ready SQL.

Tip

Query optimization is a huge topic, most of which goes beyond the scope of the bootcamp and the role of a data analyst. Just know that writing optimized queries is a good habit to get into, especially queries that will be executed frequently.

If you’d like to read more about query optimization:

  • this article is a good place to start: SQL Tuning or Query Optimization
  • PostgreSQL tuning
  • Execution order

Refining Your SQL Queries

We have learned that reducing the cost of your queries is important.
But how do you actually optimize SQL queries in practice?

One of the simplest and most effective ways is to be precise about what data you request.
Before introducing additional commands, it’s worth revisiting SELECT, because it is the foundation of almost every analytical query.

SELECT

As you already know, SELECT allows you to retrieve data from a database.

A SELECT statement usually follows this structure:

SELECT column_name_1,
       column_name_2
FROM table_name;

If you want to retrieve multiple columns, you list them separated by commas.

Consider the products table:

SELECT product_name,
       price,
       category
FROM products;

This query tells the database to:

  • return the product_name
  • return the price
  • return the category
  • read the data from the products table

Only these three columns will be included in the result set.

Why Column Selection Matters

Selecting only the columns you actually need:

  • reduces the amount of data transferred
  • improves query performance
  • makes results easier to read and interpret
Important

Try yourself

Compare the following two queries:

EXPLAIN
SELECT *
FROM products;
EXPLAIN
SELECT product_name, price
FROM products;

The second query is usually preferable when you only need product names and prices.

ORDER BY

The ORDER BY clause sorts query results in ascending or descending order.
It is commonly used when you want to view:

  • the first or last records
  • highest or lowest values
  • alphabetically ordered text data

By default, ORDER BY sorts results in ascending order.

SELECT column_name
FROM table_name
ORDER BY column_name;

You can explicitly control the sorting direction using:

  • ASC for ascending order
  • DESC for descending order

Suppose you want to view products sorted by name in descending alphabetical order (Z \(\rightarrow\) A).

SELECT
  product_name,
  category
FROM products
ORDER BY product_name DESC;

This query returns:

  • product_name
  • category

sorted from Z to A, because product_name is a text (character) column.

If you sort a numeric column in descending order, results are ordered from largest to smallest.

SELECT
  product_name,
  price
FROM products
ORDER BY price DESC;

This query returns the most expensive products first.

You can sort by more than one column. The database applies sorting from left to right.

SELECT
  product_name,
  category,
  price
FROM products
ORDER BY category ASC, price DESC;

This means:

  1. Products are grouped by category (A \(\rightarrow\) Z)
  2. Within each category, products are sorted by price (high \(\rightarrow\) low)

To sum up:

  • ORDER BY is applied after SELECT and FROM
  • Default sort order is ascending
  • Use DESC for reverse ordering
  • Text columns are sorted alphabetically
  • Numeric columns are sorted by value
  • Multiple columns can be combined for fine-grained control
Tip

In the ORDER BY clause, instead of writing column names, we can simply refer to them by their position in the SELECT list. Keep in mind that positional references break silently if the column list changes, so use them sparingly.

For example:

SELECT
  product_name,
  category,
  price
FROM products
ORDER BY 2 ASC, 3 DESC;

LIMIT

The LIMIT clause restricts the number of rows returned by a query.

It is useful when:

  • you only need a small sample of the data
  • you want to preview results
  • you want to reduce query cost and execution time

Basic Syntax:

SELECT column_name
FROM table_name
LIMIT number_of_rows;

The value provided to LIMIT must always be a number.

Example:

Suppose you want to retrieve product names and prices, sorted by price from highest to lowest, but you only care about the top 10 most expensive products.

SELECT
  product_name,
  price
FROM products
ORDER BY price DESC
LIMIT 10;

This query:

  • sorts products by price in descending order
  • returns only the first 10 rows

Using LIMIT in this way saves time and computational resources, especially when working with large tables.

Important
  • LIMIT is always written at the end of a query
  • it must be placed after ORDER BY (if present)
  • the value must be numeric

If LIMIT is placed elsewhere in the query, PostgreSQL will return an error.


Note

Database Compatibility Note

Not all SQL databases use LIMIT.

  • PostgreSQL, MySQL, SQLite \(\rightarrow\) LIMIT
  • SQL Server \(\rightarrow\) TOP
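For comparison, the top-10 query from the previous example would look like this in SQL Server syntax (a sketch using the same hypothetical products table):

```sql
-- SQL Server: TOP replaces LIMIT and comes right after SELECT
SELECT TOP 10
  product_name,
  price
FROM products
ORDER BY price DESC;
```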

GROUP BY

The GROUP BY clause is very popular among data analysts :)

The GROUP BY clause allows you to group rows that share the same values in one or more columns.

It is commonly used together with aggregate functions to summarize data.

Similar to Excel, SQL provides several aggregate functions that allow you to perform calculations across multiple rows of data.

Below is a list of commonly used aggregate functions.
Note that SUM and AVG can only be applied to numeric columns.

  • COUNT() counts the number of rows
  • MAX() finds the maximum value in a column
  • MIN() finds the minimum value in a column
  • SUM() adds up all numeric values in a column
  • AVG() calculates the average of numeric values in a column
Note

We are going to learn more functions later.
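Without GROUP BY, these functions collapse the whole table into a single summary row. A quick sketch using the products table from the earlier examples:

```sql
-- One summary row for the entire products table
SELECT
  COUNT(*)   AS product_count,
  MIN(price) AS lowest_price,
  MAX(price) AS highest_price,
  AVG(price) AS average_price
FROM products;
```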

Example | Grouping Sales by Product

Suppose you want to calculate total sales per product.

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id;

This query:

  • groups rows by product_id
  • calculates total revenue per product
  • returns one row per product

Key Rules for GROUP BY

  • Every column in SELECT must either:
    • appear in the GROUP BY clause, or
    • be wrapped in an aggregate function
  • GROUP BY is evaluated after WHERE and before ORDER BY
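To see the first rule in action, here is a query that violates it (a sketch; employee_id is assumed to be a column of sales, as mentioned later in this session):

```sql
-- Fails: employee_id is neither in GROUP BY nor inside an aggregate
SELECT
  product_id,
  employee_id,
  SUM(total_sales)
FROM sales
GROUP BY product_id;
```

PostgreSQL rejects this with an error, because it cannot decide which employee_id to display for a group that spans several employees.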

Example | Grouping with Sorting and Limiting

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 5;

This query returns the top 5 products by total revenue.

Tip
  • LIMIT controls how many rows are returned
  • GROUP BY controls how rows are aggregated
  • Both are essential tools for efficient, scalable data analysis
  • They are often used together in real-world analytical queries

Example | Discovering Distinct Values

GROUP BY can also be used to identify distinct values in a column.
This is useful when you want to understand how data is organized without seeing every individual row.

To see all product categories in the database, you could start with:

SELECT 
  category
FROM products;

This query returns the category for every product, which may include many duplicates. If you only want to know which categories exist, GROUP BY is the right tool.

SELECT 
  category,
  COUNT(*)
FROM products
GROUP BY category;

This query:

  • groups rows by category
  • returns one row per unique category
  • removes duplicate category values from the output

The result is a concise summary of the different product categories stored in the database.

Using GROUP BY in this way helps you:

  • explore the structure of a dataset
  • validate categorical columns
  • prepare data for aggregation and reporting

In practice, GROUP BY becomes even more powerful when combined with aggregate functions, which you’ll explore next.

Example | Sales per Product

Aggregate functions are most powerful when combined with GROUP BY, which allows you to summarize data across categories. Suppose you are working with a sales department and want to understand how many sales transactions exist per product.

To begin, recall that the sales table may contain many rows per product.

SELECT
  product_id,
  COUNT(transaction_id) AS number_of_transactions
FROM sales
GROUP BY product_id;

This query:

  • groups rows by product_id
  • counts how many transactions exist for each product
  • returns one row per product

Example | Total Revenue per Product

You can also summarize numeric columns such as total revenue.

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id;

Here:

  • SUM(total_sales) adds up all sales values per product
  • the result shows total revenue generated by each product

Example | Average Price by Category

Aggregate functions are often used for higher-level analysis.

SELECT
  category,
  AVG(price) AS average_price
FROM products
GROUP BY category;

This query:

  • groups products by category
  • calculates the average price within each category

GROUP BY with Multiple Columns

You can also group data by multiple columns.
To do this, simply list all grouping columns after GROUP BY, separated by commas.

This allows you to create more granular summaries by combining multiple dimensions.

Example | Grouping by Multiple Attributes

Suppose you want to understand how many sales transactions exist for each combination of product_id and employee_id (later we will join in the employee name).

SELECT
  product_id,
  employee_id,
  COUNT(transaction_id) AS transaction_count
FROM sales
GROUP BY product_id, employee_id
ORDER BY transaction_count DESC;

This query:

  • groups records by both product_id and employee_id
  • counts the number of transactions in each combination
  • returns one row per unique (product_id, employee_id) pair

The output represents:

  • one row per product and employee combination
  • the total number of transactions for that specific combination

Grouping by multiple columns helps you:

  • analyze performance across multiple dimensions
  • break down metrics by category, region, channel, or time
  • build detailed summaries for dashboards and reports

You will get more practice using multi-column GROUP BY as you move into data summarization, aggregation, and data cleaning tasks later in the course.

DISTINCT

The DISTINCT keyword is used to return unique values from a column or a combination of columns.
It is especially useful when a dataset contains duplicate values and you want to understand the unique categories or combinations present in the data.

Syntax

SELECT DISTINCT column_name
FROM table_name;

Example | Finding Unique Values

Suppose you want to see all unique product categories in the products table.

SELECT DISTINCT category
FROM products;

This query returns one row per unique category, even if multiple products belong to the same category.

DISTINCT with Multiple Columns

DISTINCT can also be applied to multiple columns to find unique combinations.

For example, to find unique combinations of product category and price:

SELECT DISTINCT
  category,
  price
FROM products;

This returns each unique (category, price) pair found in the table.

DISTINCT with COUNT

DISTINCT can also be used inside an aggregate to count unique values. The following query compares the total number of (non-NULL) category values with the number of unique categories:

SELECT
  COUNT(category) AS total_values,
  COUNT(DISTINCT category) AS unique_categories
FROM products;

DISTINCT vs GROUP BY

In many cases, DISTINCT and GROUP BY can produce the same result when no aggregate functions are used.

Using DISTINCT:

SELECT DISTINCT
  category,
  price
FROM products;

Using GROUP BY:

SELECT
  category,
  price
FROM products
GROUP BY category, price;

Both queries return the unique combinations of category and price.

When to Use DISTINCT

Use DISTINCT when:

  • you only need unique values
  • no aggregation is required
  • you want simpler, more readable SQL

Use GROUP BY when:

  • you need aggregate functions
  • you want more control over grouping logic

GROUP BY vs DISTINCT

Which command do you think is faster: GROUP BY or DISTINCT?

Use EXPLAIN to generate query plans for both queries and compare their estimated costs to see which one runs more efficiently in PostgreSQL.
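As a starting point, you can compare the two plans side by side (same hypothetical products table as above):

```sql
-- Compare the estimated costs reported for the two plans
EXPLAIN
SELECT DISTINCT category
FROM products;

EXPLAIN
SELECT category
FROM products
GROUP BY category;
```

In recent PostgreSQL versions the planner often produces very similar or identical plans for both, since the two queries describe the same result.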

HAVING

Why HAVING Exists

The HAVING clause is designed to filter aggregated results.

In analytical queries, data is often grouped and summarized before it becomes meaningful.
At that stage, row-level filtering is no longer sufficient.

HAVING fills this gap.

  • WHERE \(\rightarrow\) filters rows before aggregation
  • HAVING \(\rightarrow\) filters groups after aggregation

Understanding this distinction is essential for correct and efficient SQL.

WHERE vs HAVING

Think of SQL execution as a pipeline.

  • raw rows are first filtered
  • rows are then grouped
  • aggregates are calculated
  • groups are filtered again if needed

This leads to a clear separation of responsibilities:

  • WHERE \(\rightarrow\) row-level filtering
  • GROUP BY \(\rightarrow\) aggregation
  • HAVING \(\rightarrow\) group-level filtering

Logical Query Execution Order

Although SQL is written top-down, it is executed in a different logical order.

At a high level:

  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT
  • ORDER BY
  • LIMIT

This order explains why aggregate functions cannot appear in WHERE, but are allowed in HAVING.
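The same logical order can be annotated on a full query (a sketch combining the clauses used throughout this session):

```sql
SELECT                           -- 5. choose the output columns
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales                       -- 1. read the source table
WHERE total_sales > 0            -- 2. filter individual rows
GROUP BY product_id              -- 3. form groups
HAVING SUM(total_sales) > 10000  -- 4. filter groups
ORDER BY total_revenue DESC      -- 6. sort the results
LIMIT 5;                         -- 7. cap the row count
```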

Basic HAVING Syntax

A typical query using HAVING looks like this:

SELECT
  group_column,
  AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY group_column
HAVING AGGREGATE_FUNCTION(column) condition;

Important points:

  • HAVING is evaluated after aggregation
  • it almost always appears together with GROUP BY
  • it works on aggregated values, not raw rows

HAVING Example | Filtering by Total Revenue

Suppose you want to identify products that generated more than 10,000 in total revenue.

SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
HAVING SUM(total_sales) > 10000;

This query:

  • groups sales by product
  • calculates total revenue per product
  • keeps only products above the revenue threshold

HAVING vs WHERE — Common Mistake

Using WHERE with aggregate functions is not allowed.

Incorrect example:

SELECT
  product_id,
  SUM(total_sales)
FROM sales
WHERE SUM(total_sales) > 10000
GROUP BY product_id;

This fails because SUM(total_sales) does not exist at the WHERE stage of execution.


Combining WHERE and HAVING

In real-world analytical queries, WHERE and HAVING are often used together.

Their roles are complementary:

  • WHERE removes unnecessary rows early
  • HAVING applies business rules after aggregation
SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
WHERE total_sales > 0
GROUP BY product_id
HAVING SUM(total_sales) > 10000;

This approach is usually more efficient than relying on HAVING alone.


HAVING with COUNT

HAVING is frequently used with COUNT().

Example: find products with at least 50 transactions.

SELECT
  product_id,
  COUNT(transaction_id) AS transaction_count
FROM sales
GROUP BY product_id
HAVING COUNT(transaction_id) >= 50;

Here:

  • transactions are grouped by product
  • only frequently sold products are retained

HAVING with Multiple Conditions

Multiple aggregate conditions can be combined using logical operators.

SELECT
  product_id,
  COUNT(transaction_id) AS transaction_count,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
HAVING
  COUNT(transaction_id) >= 50
  AND SUM(total_sales) > 10000;

This allows you to enforce compound business rules at the group level.


When to Use HAVING

Use HAVING when:

  • filtering aggregated results
  • applying thresholds to groups
  • working with SUM, COUNT, AVG, MIN, or MAX

Do not use HAVING when row-level filtering with WHERE is sufficient.

Homework

Assume the following:

  • Tables customers, employees, products, and sales already exist
  • Their schemas and columns are already documented
  • Your task is not discovery, but improvement, enforcement, and optimization

You must not recreate tables or re-document schemas.

Task 1 | Enforce Missing Business Rules with ALTER TABLE

Even with a documented schema, databases often lack enforced rules.

Apply the following constraints using ALTER TABLE:

Business rules:

  • Employee emails must be unique
  • Employee phone numbers must be mandatory
  • Product prices must be non-negative
  • Sales totals must be non-negative
ALTER TABLE employees
ADD CONSTRAINT uq_employees_email UNIQUE (email);
ALTER TABLE employees
ALTER COLUMN phone_number SET NOT NULL;
ALTER TABLE products
ADD CONSTRAINT chk_products_price CHECK (price >= 0);
ALTER TABLE sales
ADD CONSTRAINT chk_sales_total CHECK (total_sales >= 0);

Task 2 | Add a New Analytical Attribute

The business now wants to analyze sales performance by channel.

Add a new column to the sales table:

  • Column name: sales_channel
  • Allowed values: 'online', 'store'
ALTER TABLE sales
ADD COLUMN sales_channel TEXT;

Add a constraint to enforce valid values.

ALTER TABLE sales
ADD CONSTRAINT chk_sales_channel
CHECK (sales_channel IN ('online', 'store'));

Populate the column with sample values.

UPDATE sales
SET sales_channel = 'online'
WHERE transaction_id % 2 = 0;

UPDATE sales
SET sales_channel = 'store'
WHERE transaction_id % 2 = 1;

Important

Write a short explanation of the rule you used to assign each sale to a channel.

Task 3 | Add Indexes for Query Performance

Based on common analytical queries, create indexes on columns that are frequently used (according to you) for:

  • joins
  • grouping
  • filtering

Create the following indexes:

CREATE INDEX idx_sales_product_id
ON sales (product_id);
CREATE INDEX idx_sales_customer_id
ON sales (customer_id);
CREATE INDEX idx_products_category
ON products (category);

Task 4 | Validate Index Usage with EXPLAIN

Run EXPLAIN on an aggregation query that is commonly used in reporting.

EXPLAIN
SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id;

Write a short interpretation:

  • Is a sequential scan used?
  • Does PostgreSQL leverage the index?
  • Why might the planner choose this plan?

Task 5 | Reduce Query Cost by Refining SELECT

Rewrite a reporting query to avoid unnecessary data retrieval.

Original query:

SELECT *
FROM sales;

Refined query:

SELECT
  transaction_id,
  product_id,
  total_sales
FROM sales;

Explain in 2–3 sentences:

  • why this reduces cost
  • when SELECT * might still be acceptable

Task 6 | ORDER BY and LIMIT for Business Questions

Marketing wants to identify the top 5 products by total revenue.

Write a query that:

  • aggregates sales
  • sorts by revenue
  • limits the output
SELECT
  product_id,
  SUM(total_sales) AS total_revenue
FROM sales
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 5;

Run EXPLAIN and comment on:

  • sorting cost
  • whether indexes help in this case

Task 7 | DISTINCT vs GROUP BY (Efficiency Comparison)

Retrieve unique combinations of category and price using both approaches.

Using DISTINCT:

EXPLAIN
SELECT DISTINCT
  category,
  price
FROM products;

Using GROUP BY:

EXPLAIN
SELECT
  category,
  price
FROM products
GROUP BY category, price;

Answer briefly:

  • Are the query plans similar?
  • Which has lower estimated cost?
  • Why might PostgreSQL optimize them the same way?

Task 8 | Constraint Enforcement Test

Attempt to violate at least two constraints you added earlier and observe the errors.

Examples:

UPDATE products
SET price = -5
WHERE product_id = 101;
INSERT INTO employees (employee_id, email, phone_number)
VALUES (999, 'anna@example.com', NULL);

Explain:

  • which constraint was triggered
  • why this protects data quality

Task 9 | Reflection (Short Answer)

Answer briefly (3–5 sentences total):

  • Which constraints provide the highest business value?
  • Which index would you prioritize in a production environment?
  • What signals tell you a query needs optimization?

Submission Rules

  • No table creation
  • No schema documentation
  • Only ALTER, CREATE INDEX, UPDATE, EXPLAIN, SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
  • Include SQL and short written interpretations
  • Remember you need to submit the code as a .sql file
  • Add to staging with git add .
  • Commit the changes with git commit -m "meaningful message"
  • Push to GitHub with git push