Intro to PostgreSQL

Data Types, Schemas, Keys, Indexes

Karen Hovhannisyan

2026-07-04

For Today…

Learning Goals

By the end of this session, you will be able to:

understand how data is structured in PostgreSQL
identify tables, rows, and columns
explain different types of keys and relationships
understand how indexes affect performance
recognize common PostgreSQL data types
get familiarized with basic SQL syntax

ERD Simple

Sales Analytics Database

The database captures information about:

customers who place orders
products that are sold
employees involved in the sales process
orders placed over time
individual sales transactions

Questions to Data Analyst

Your role as an analyst is to query this data to answer business questions related to:

revenue and sales performance
product popularity
customer behavior
employee contribution
time-based trends

ERD Complete

Keys and Indexes

To store data efficiently and enable relationships between tables, relational databases rely on keys and indexes.

Keys

Primary Key
Candidate Key
Composite Key
Surrogate Key
Foreign key

Primary Key

A primary key uniquely identifies each record in a table.

transaction_id	order_id	product_id	customer_id	total_sales
1001	501	12	3001	49.99
1002	501	18	3001	19.99
1003	502	12	3005	49.99
1004	503	25	3008	89.99

Candidate Key

customer_id	customer_name	email	city
3001	Alice Johnson	alice.johnson@email.com	Berlin
3002	Mark Thompson	mark.thompson@email.com	Paris
3003	Elena Petrova	elena.p@email.com	Madrid
3004	David Chen	david.chen@email.com	London

Composite Key

order_id	product_id	product_name	quantity
501	12	Wireless Mouse	1
501	18	USB-C Cable	2
502	12	Wireless Mouse	1
503	25	Mechanical Keyboard	1

Composite Key

(order_id, product_id) could uniquely identify rows in some transactional systems.

Surrogate Key

transaction_id	order_id	product_id	quantity	total_sales
1001	501	12	1	49.99
1002	501	18	2	19.99
1003	502	12	1	49.99
1004	503	25	1	89.99

Surrogate Key helps:

simplify joins
reduce index size
improve query readability

Foreign Key

Indexes

Trade-Off

Full Table Scan

When a query checks every row instead of using an index, this is called a full scan.

Indexes improve read performance but come with trade-offs:

extra storage
slower inserts and updates
maintenance overhead

When not to Create!

Very small tables
Frequently updated columns
Columns with many NULL values
Columns with very low cardinality

Single-Column Index

A single-column index is built on one column.

Employee ID	Name	Contact Number	Age
1	Max	800692692	24
2	Jessica	800123456	35
3	Mikeal	800745547	49

Composite Index

Consider employees table, but now imagine that queries often filter by both name and age at the same time.

employee_id	name	age	department
1	Max	24	Sales
2	Jessica	35	Marketing
3	Max	35	Finance

When to use:

Efficient for queries filtering on name
Efficient for queries filtering on name and age
Not efficient for queries filtering on age only

Unique Index

Basically all the candidate keys :)

customer_id	customer_name	email
3001	Alice Johnson	alice@email.com
3002	Mark Thompson	mark@email.com
3003	Elena Petrova	elena@email.com

Use a unique index when:

values must be unique across rows
the column is frequently used for lookup
uniqueness is part of business logic

Q: Are Keys Also Indexes?

Answer

Concept	Enforces Uniqueness	Improves Query Speed	Stored as Index
Primary Key	Yes	Yes	Yes (unique index)
Unique Constraint	Yes	Yes	Yes (unique index)
Foreign Key	No	Sometimes	No (by default)
Regular Index	No	Yes	Yes

Data Types

PostgreSQL data types are commonly grouped into:

Numeric
Character (Text)
Date & Time
Boolean & Categorical
Special & Advanced

Data Types in PostgreSQL

Data types define:

what kind of data a column can store?
how data is validated?
how much storage is used?
how fast queries can run?

Why Data Types Matter?

Choosing the wrong data type can:

break calculations
reduce performance
waste storage
introduce subtle errors

Numeric Data Types

Data Type	Description	Example	Use Case
`INTEGER`	Whole numbers	`2500`	Counts
`BIGINT`	Large integers	`9876543210`	IDs
`NUMERIC(p,s)`	Exact decimals	`15432.75`	Revenue
`DOUBLE PRECISION`	Approximate	`0.123456`	KPIs

The Rule of Thumb:

Use INTEGER for counts
Use NUMERIC for money
Use DOUBLE PRECISION for ratios

Index performance depends on data type size:

types \(\rightarrow\) faster indexes
larger types \(\rightarrow\) more storage

Main Character Data Types

Data Type	Description	Example Value	Typical Use Case
CHAR(n)	Fixed-length character string	CHAR(5) → ‘US’	Country codes, fixed formats
VARCHAR(n)	Variable-length string with limit	VARCHAR(50) → ‘Karen Hovhannisyan’	Names, emails
TEXT	Variable-length string, no limit	‘This product has been discontinued.’	Descriptions, comments
CHARACTER VARYING(n)	SQL-standard name for VARCHAR	CHARACTER VARYING(20) → ‘A123XZ’	Codes, identifiers

Main Date and Time Data Types

Data Type	Description	Example Value	Typical Use Case
DATE	Calendar date (no time)	2025-03-15	Birth dates, order dates
TIME	Time of day (no date)	14:30:00	Opening hours
TIME WITH TIME ZONE	Time with time zone info	14:30:00+04	Cross-region schedules
TIMESTAMP	Date and time (no timezone)	2025-03-15 14:30:00	Local event logs
TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ)	Date and time with timezone handling	2025-03-15 10:30:00+00	Auditing, analytics
INTERVAL	Time duration	3 days 4 hours	Session length, SLA

Boolean & Categorical Data Types

Data Type	Description	Example Value	Typical Use Case
BOOLEAN	Logical true / false value	TRUE, FALSE	Active flags, eligibility
CHAR(n)	Fixed-length category code	CHAR(1) → ‘Y’	Yes/No indicators
VARCHAR(n)	Short categorical label	‘premium’	Customer segments
TEXT	Free-form category label	‘high_value_customer’	Tags, labels
ENUM	Predefined set of values	(‘low’,‘medium’,‘high’)	Controlled categories

Special Data Types | OPTIONAL

Data Type / Extension	Description	Example Value	Typical Use Case
JSON / JSONB	Semi-structured JSON data	{“plan”:“premium”,“usage”:120}	Logs, APIs, configs
ARRAY	Array of values	{1,2,3}	Tags, multi-valued attributes
UUID	Universally unique identifier	550e8400-e29b-41d4-a716-446655440000	Distributed IDs
INET	IP address	192.168.1.1	Network traffic
CIDR	Network block	192.168.0.0/24	Subnet modeling
GEOMETRY (PostGIS)	Geometric objects	POINT(40.18 44.51)	Maps, locations
GEOGRAPHY (PostGIS)	Earth-based coordinates	POINT(44.51 40.18)	Distance calculations
ltree	Hierarchical tree paths	region.city.store	Organizational trees
pgRouting	Graph/network extension	N/A	Network routing, telecom

PostGIS (Geospatial Extension)

PostGIS adds geospatial intelligence to PostgreSQL.

Common capabilities:

Distance calculations
Spatial joins
Radius searches
Area coverage

Important

As a data analyst you are not expected to design PostGIS schemas or routing graphs, but you will query them.

Documentation you can find here

Relational Database Schema

What Is a Schema?

A schema describes the structure of a database:

tables,
columns,
keys
relationships.

Star Schema

A star schema consists of:

one central fact table
multip le surrounding dimension tables

Important

Although our database is transactional, its structure closely resembles a star schema, making it well-suited for analytics.

Snowflake Schema

A snowflake schema extends the star schema by normalizing dimension tables into sub-dimensions.

Entity Relationship Diagram (ERD)

Try to generate the ERD for the sales database using your SQL client (PgAdmin).

Identify primary keys, foreign keys, and relationships.

DDL | Data Definition Language

What Is DDL?

DDL (Data Definition Language) is used to define and manage database structure.

operates on database objects, not rows
describes how data is stored
forms the foundation for all queries and analysis

DDL Scope

DDL affects:

tables
columns
constraints
indexes

DDL decisions directly impact:

data quality
query performance
scalability

Common DDL Statements

The most common DDL commands are:

CREATE
ALTER
DROP
TRUNCATE

Each command changes the database schema or metadata.

CREATE (DDL Context)

In DDL, CREATE is used to define new database objects.

Typical use cases:

creating schemas
creating database
creating tables
creating indexes

CREATE Example | Index

Example: create an index to improve analytical queries.

CREATE INDEX idx_sales_product_id
ON sales (product_id);

This command:

does not change data
improves data access performance

ALTER

ALTER modifies the structure of an existing object.

It allows schema evolution without recreating tables.

ALTER Example | Constraint

Example: enforce a business rule on existing data.

ALTER TABLE products
ADD CONSTRAINT chk_products_price
CHECK (price >= 0);

DROP

DROP permanently removes a database object.

structure is deleted
data is lost
operation is irreversible

DROP Example

Example: remove an unused index.

DROP INDEX idx_sales_product_id;

TRUNCATE

TRUNCATE removes all rows from a table while keeping its structure.

Compared to DELETE:

much faster
no WHERE clause
limited rollback support

TRUNCATE Example

TRUNCATE TABLE sales_staging;

CRUD | Data Manipulation

What Is CRUD?

CRUD represents operations used to work with data inside tables.

CRUD stands for:

CREATE
READ
UPDATE
DELETE

Critical Terminology Clarification

The word CREATE means different things depending on context.

DDL CREATE \(\rightarrow\) defines structure
CRUD CREATE \(\rightarrow\) inserts data

Confusing these is a common beginner mistake.

DDL vs CRUD | Mental Model

DDL \(\rightarrow\) defines the rules
CRUD \(\rightarrow\) follows the rules
DDL \(\rightarrow\) structure
CRUD \(\rightarrow\) data

CREATE (CRUD \(\rightarrow\) INSERT)

In CRUD, CREATE means adding new rows.

This is done using INSERT.

INSERT INTO products (product_id, product_name, price, category)
VALUES (101, 'Wireless Mouse', 24.99, 'Accessories');

READ (SELECT)

READ retrieves data and is implemented using SELECT.

SELECT
  product_id,
  product_name
FROM products;

READ with Filtering

Filtering rows is done using WHERE.

SELECT
  *
FROM sales
WHERE total_sales < 50;

UPDATE

UPDATE modifies existing records.

Always use WHERE to target specific rows.

UPDATE products
SET price = 49.99
WHERE product_id = 12;

DELETE

DELETE removes records from a table.

Deletions are irreversible and must be used with caution.

DELETE FROM sales
WHERE transaction_id = 1004;

Constraints

Why Constraints Matter

Constraints play a crucial role in keeping your data organized, consistent, and reliable.

They define what type of data a table or column can accept and are typically applied when tables are created or later via ALTER TABLE.

When defined correctly, constraints:

enforce data integrity
prevent invalid or inconsistent data
act as built-in data quality checks

Types of Constraints

The most common constraints you will encounter are:

UNIQUE
NOT NULL
PRIMARY KEY
FOREIGN KEY
CHECK

Each constraint enforces a specific business or technical rule.

UNIQUE Constraint

The UNIQUE constraint ensures that all values in a column are distinct.

It is commonly used for attributes that must be unique across entities.

Typical use cases include:

email addresses
usernames
national identification numbers

UNIQUE Constraint | Example

Ensure each customer email is unique.

CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  phone_number TEXT
);

With this constraint in place, PostgreSQL will reject any attempt to insert a duplicate email address.

NOT NULL Constraint

The NOT NULL constraint ensures that a column cannot contain NULL values.

Use this constraint for fields that are mandatory and must always be provided.

Typical use cases include:

primary identifiers
contact information
transaction timestamps

NOT NULL Constraint | Example

Ensure every customer has a phone number.

CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  phone_number TEXT NOT NULL,
  email TEXT
);

If an insert is attempted without a phone number, PostgreSQL will return an error.

PRIMARY KEY Constraint

A PRIMARY KEY uniquely identifies each row in a table.

A primary key:

must be unique
cannot contain NULL values
exists only once per table

PRIMARY KEY Constraint | Example

Define a primary key for a products table.

CREATE TABLE products (
  product_id INTEGER PRIMARY KEY,
  product_name TEXT NOT NULL,
  price NUMERIC(10, 2)
);

The PRIMARY KEY constraint automatically enforces both UNIQUE and NOT NULL.

FOREIGN KEY Constraint

A FOREIGN KEY creates a relationship between two tables.

It ensures referential integrity, meaning referenced values must exist in the parent table.

FOREIGN KEY Constraint | Example

Link sales records to customers.

CREATE TABLE sales (
  transaction_id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  total_sales NUMERIC(10, 2),
  FOREIGN KEY (customer_id)
    REFERENCES customers (customer_id)
);

With this constraint, you cannot insert a sale for a customer that does not exist.

CHECK Constraint

The CHECK constraint restricts the range or condition of values that can be inserted into a column.

It validates data using logical expressions.

Typical use cases include:

value ranges
minimum or maximum limits
domain rules

CHECK Constraint | Example

Ensure total sales values are non-negative.

CREATE TABLE sales (
  transaction_id INTEGER PRIMARY KEY,
  total_sales NUMERIC(10, 2) CHECK (total_sales >= 0)
);

If a value violates the condition, the insert or update will fail.

SQL Rules and Best Practices

Why Best Practices Matter

Writing SQL that works is not enough.

Good SQL should be:

readable
consistent
maintainable
easy to debug

These rules help teams collaborate and reduce errors.

Numbers and Underscores

In relational databases:

table names cannot start with a number
column names cannot contain spaces
underscores should be used to separate words

This improves compatibility and readability.

Naming | Examples

Recommended:

CREATE TABLE customer_lifecycle;
CREATE TABLE customer;

Not recommended:

CREATE TABLE customer lifecycle;
CREATE TABLE 3customer;

Capitalization

A widely accepted convention is:

SQL keywords \(\rightarrow\) UPPERCASE
table names \(\rightarrow\) lowercase
column names \(\rightarrow\) lowercase

This improves readability and consistency.

Capitalization | Examples

Recommended:

SELECT product_id, product_name
FROM products;

SELECT transaction_id, total_sales
FROM sales;

Not recommended:

Select PRODUCT_ID, PRODUCT_NAME From PRODUCTS;

SELECT TRANSACTION_ID, TOTAL_SALES FROM SALES;

Commenting SQL Code

Commenting helps others (and your future self) understand your logic.

Comments:

do not affect execution
improve documentation
help during debugging

Commenting | Examples

Single-line comments:

-- select and display the first 10 rows from the sales table
SELECT *
FROM sales
LIMIT 10;

Multi-line comments:

/*
SELECT
  s.transaction_id,
  p.product_name,
  s.total_sales
FROM sales s
JOIN products p
  ON s.product_id = p.product_id
WHERE s.total_sales > 100;
*/

Aliasing

Aliases assign temporary names to tables or columns.

They are useful for:

improving readability
shortening long names
resolving ambiguity in joins

Aliasing | Examples

Column aliases:

SELECT
  product_name AS item_name,
  price AS unit_price
FROM products;

Table aliases in joins:

SELECT
  s.transaction_id,
  p.product_name,
  s.total_sales
FROM sales AS s
JOIN products AS p
  ON s.product_id = p.product_id;

Summary

constraints protect data quality
best practices improve readability
consistent SQL scales better in teams
clean SQL enables better optimization