Intro to PostgreSQL

Data Types, Schemas, Keys, Indexes

Karen Hovhannisyan

2026-04-01

For Today…

Learning Goals

By the end of this session, you will be able to:

  • understand how data is structured in PostgreSQL
  • identify tables, rows, and columns
  • explain different types of keys and relationships
  • understand how indexes affect performance
  • recognize common PostgreSQL data types
  • get familiarized with basic SQL syntax

ERD Simple

!

Sales Analytics Database

The database captures information about:

  • customers who place orders
  • products that are sold
  • employees involved in the sales process
  • orders placed over time
  • individual sales transactions

Questions to Data Analyst

Your role as an analyst is to query this data to answer business questions related to:

  • revenue and sales performance
  • product popularity
  • customer behavior
  • employee contribution
  • time-based trends

ERD Complete

!

Keys and Indexes

To store data efficiently and enable relationships between tables, relational databases rely on keys and indexes.

Keys

  • Primary Key
  • Candidate Key
  • Composite Key
  • Surrogate Key
  • Foreign key

!

Primary Key

A primary key uniquely identifies each record in a table.

transaction_id order_id product_id customer_id total_sales
1001 501 12 3001 49.99
1002 501 18 3001 19.99
1003 502 12 3005 49.99
1004 503 25 3008 89.99

Candidate Key

customer_id customer_name email city
3001 Alice Johnson alice.johnson@email.com Berlin
3002 Mark Thompson mark.thompson@email.com Paris
3003 Elena Petrova elena.p@email.com Madrid
3004 David Chen david.chen@email.com London

Composite Key

order_id product_id product_name quantity
501 12 Wireless Mouse 1
501 18 USB-C Cable 2
502 12 Wireless Mouse 1
503 25 Mechanical Keyboard 1

Composite Key

(order_id, product_id) could uniquely identify rows in some transactional systems.

Surrogate Key

transaction_id order_id product_id quantity total_sales
1001 501 12 1 49.99
1002 501 18 2 19.99
1003 502 12 1 49.99
1004 503 25 1 89.99

Surrogate Key helps:

  • simplify joins
  • reduce index size
  • improve query readability

Foreign Key

!

Indexes

!

Trade-Off

Full Table Scan

When a query checks every row instead of using an index, this is called a full scan.

Indexes improve read performance but come with trade-offs:

  • extra storage
  • slower inserts and updates
  • maintenance overhead

When not to Create!

  • Very small tables
  • Frequently updated columns
  • Columns with many NULL values
  • Columns with very low cardinality

Single-Column Index

A single-column index is built on one column.

Employee ID Name Contact Number Age
1 Max 800692692 24
2 Jessica 800123456 35
3 Mikeal 800745547 49

Composite Index

Consider employees table, but now imagine that queries often filter by both name and age at the same time.

employee_id name age department
1 Max 24 Sales
2 Jessica 35 Marketing
3 Max 35 Finance

When to use:

  • Efficient for queries filtering on name
  • Efficient for queries filtering on name and age
  • Not efficient for queries filtering on age only

Unique Index

Basically all the candidate keys :)

customer_id customer_name email
3001 Alice Johnson alice@email.com
3002 Mark Thompson mark@email.com
3003 Elena Petrova elena@email.com

Use a unique index when:

  • values must be unique across rows
  • the column is frequently used for lookup
  • uniqueness is part of business logic

Q: Are Keys Also Indexes?

Answer

Concept Enforces Uniqueness Improves Query Speed Stored as Index
Primary Key Yes Yes Yes (unique index)
Unique Constraint Yes Yes Yes (unique index)
Foreign Key No Sometimes No (by default)
Regular Index No Yes Yes

Data Types

PostgreSQL data types are commonly grouped into:

  • Numeric
  • Character (Text)
  • Date & Time
  • Boolean & Categorical
  • Special & Advanced

Data Types in PostgreSQL

Data types define:

  • what kind of data a column can store?
  • how data is validated?
  • how much storage is used?
  • how fast queries can run?

Why Data Types Matter?

Choosing the wrong data type can:

  • break calculations
  • reduce performance
  • waste storage
  • introduce subtle errors

Numeric Data Types

Data Type Description Example Use Case
INTEGER Whole numbers 2500 Counts
BIGINT Large integers 9876543210 IDs
NUMERIC(p,s) Exact decimals 15432.75 Revenue
DOUBLE PRECISION Approximate 0.123456 KPIs

The Rule of Thumb:

  • Use INTEGER for counts
  • Use NUMERIC for money
  • Use DOUBLE PRECISION for ratios

Index performance depends on data type size:

  • types \(\rightarrow\) faster indexes
  • larger types \(\rightarrow\) more storage

Main Character Data Types

Data Type Description Example Value Typical Use Case
CHAR(n) Fixed-length character string CHAR(5) → ‘US’ Country codes, fixed formats
VARCHAR(n) Variable-length string with limit VARCHAR(50) → ‘Karen Hovhannisyan’ Names, emails
TEXT Variable-length string, no limit ‘This product has been discontinued.’ Descriptions, comments
CHARACTER VARYING(n) SQL-standard name for VARCHAR CHARACTER VARYING(20) → ‘A123XZ’ Codes, identifiers

Main Date and Time Data Types

Data Type Description Example Value Typical Use Case
DATE Calendar date (no time) 2025-03-15 Birth dates, order dates
TIME Time of day (no date) 14:30:00 Opening hours
TIME WITH TIME ZONE Time with time zone info 14:30:00+04 Cross-region schedules
TIMESTAMP Date and time (no timezone) 2025-03-15 14:30:00 Local event logs
TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ) Date and time with timezone handling 2025-03-15 10:30:00+00 Auditing, analytics
INTERVAL Time duration 3 days 4 hours Session length, SLA

Boolean & Categorical Data Types

Data Type Description Example Value Typical Use Case
BOOLEAN Logical true / false value TRUE, FALSE Active flags, eligibility
CHAR(n) Fixed-length category code CHAR(1) → ‘Y’ Yes/No indicators
VARCHAR(n) Short categorical label ‘premium’ Customer segments
TEXT Free-form category label ‘high_value_customer’ Tags, labels
ENUM Predefined set of values (‘low’,‘medium’,‘high’) Controlled categories

Special Data Types | OPTIONAL

Data Type / Extension Description Example Value Typical Use Case
JSON / JSONB Semi-structured JSON data {“plan”:“premium”,“usage”:120} Logs, APIs, configs
ARRAY Array of values {1,2,3} Tags, multi-valued attributes
UUID Universally unique identifier 550e8400-e29b-41d4-a716-446655440000 Distributed IDs
INET IP address 192.168.1.1 Network traffic
CIDR Network block 192.168.0.0/24 Subnet modeling
GEOMETRY (PostGIS) Geometric objects POINT(40.18 44.51) Maps, locations
GEOGRAPHY (PostGIS) Earth-based coordinates POINT(44.51 40.18) Distance calculations
ltree Hierarchical tree paths region.city.store Organizational trees
pgRouting Graph/network extension N/A Network routing, telecom

PostGIS (Geospatial Extension)

PostGIS adds geospatial intelligence to PostgreSQL.

Common capabilities:

  • Distance calculations
  • Spatial joins
  • Radius searches
  • Area coverage

Important

As a data analyst you are not expected to design PostGIS schemas or routing graphs, but you will query them.

Documentation you can find here

Relational Database Schema

What Is a Schema?

A schema describes the structure of a database:

  • tables,
  • columns,
  • keys
  • relationships.

Star Schema

A star schema consists of:

  • one central fact table
  • multip le surrounding dimension tables

Important

Although our database is transactional, its structure closely resembles a star schema, making it well-suited for analytics.

Snowflake Schema

A snowflake schema extends the star schema by normalizing dimension tables into sub-dimensions.

.

Entity Relationship Diagram (ERD)

Try to generate the ERD for the sales database using your SQL client (PgAdmin).

Identify primary keys, foreign keys, and relationships.

.

DDL | Data Definition Language

What Is DDL?

DDL (Data Definition Language) is used to define and manage database structure.

  • operates on database objects, not rows
  • describes how data is stored
  • forms the foundation for all queries and analysis

DDL Scope

DDL affects:

  • tables
  • columns
  • constraints
  • indexes

DDL decisions directly impact:

  • data quality
  • query performance
  • scalability

Common DDL Statements

The most common DDL commands are:

  • CREATE
  • ALTER
  • DROP
  • TRUNCATE

Each command changes the database schema or metadata.

CREATE (DDL Context)

In DDL, CREATE is used to define new database objects.

Typical use cases:

  • creating schemas
  • creating database
  • creating tables
  • creating indexes

CREATE Example | Index

Example: create an index to improve analytical queries.

CREATE INDEX idx_sales_product_id
ON sales (product_id);

This command:

  • does not change data
  • improves data access performance

ALTER

ALTER modifies the structure of an existing object.

It allows schema evolution without recreating tables.

ALTER Example | Constraint

Example: enforce a business rule on existing data.

ALTER TABLE products
ADD CONSTRAINT chk_products_price
CHECK (price >= 0);

DROP

DROP permanently removes a database object.

  • structure is deleted
  • data is lost
  • operation is irreversible

DROP Example

Example: remove an unused index.

DROP INDEX idx_sales_product_id;

TRUNCATE

TRUNCATE removes all rows from a table while keeping its structure.

Compared to DELETE:

  • much faster
  • no WHERE clause
  • limited rollback support

TRUNCATE Example

TRUNCATE TABLE sales_staging;

CRUD | Data Manipulation

What Is CRUD?

CRUD represents operations used to work with data inside tables.

CRUD stands for:

  • CREATE
  • READ
  • UPDATE
  • DELETE

Critical Terminology Clarification

The word CREATE means different things depending on context.

  • DDL CREATE \(\rightarrow\) defines structure
  • CRUD CREATE \(\rightarrow\) inserts data

Confusing these is a common beginner mistake.

DDL vs CRUD | Mental Model

  • DDL \(\rightarrow\) defines the rules
  • CRUD \(\rightarrow\) follows the rules
  • DDL \(\rightarrow\) structure
  • CRUD \(\rightarrow\) data

CREATE (CRUD \(\rightarrow\) INSERT)

In CRUD, CREATE means adding new rows.

This is done using INSERT.

INSERT INTO products (product_id, product_name, price, category)
VALUES (101, 'Wireless Mouse', 24.99, 'Accessories');

READ (SELECT)

READ retrieves data and is implemented using SELECT.

SELECT
  product_id,
  product_name
FROM products;

READ with Filtering

Filtering rows is done using WHERE.

SELECT
  *
FROM sales
WHERE total_sales < 50;

UPDATE

UPDATE modifies existing records.

Always use WHERE to target specific rows.

UPDATE products
SET price = 49.99
WHERE product_id = 12;

DELETE

DELETE removes records from a table.

Deletions are irreversible and must be used with caution.

DELETE FROM sales
WHERE transaction_id = 1004;

Constraints

Why Constraints Matter

Constraints play a crucial role in keeping your data organized, consistent, and reliable.

They define what type of data a table or column can accept and are typically applied when tables are created or later via ALTER TABLE.

When defined correctly, constraints:

enforce data integrity
prevent invalid or inconsistent data
act as built-in data quality checks

Types of Constraints

The most common constraints you will encounter are:

UNIQUE
NOT NULL
PRIMARY KEY
FOREIGN KEY
CHECK

Each constraint enforces a specific business or technical rule.

UNIQUE Constraint

The UNIQUE constraint ensures that all values in a column are distinct.

It is commonly used for attributes that must be unique across entities.

Typical use cases include:

email addresses
usernames
national identification numbers

UNIQUE Constraint | Example

Ensure each customer email is unique.

CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  phone_number TEXT
);

With this constraint in place, PostgreSQL will reject any attempt to insert a duplicate email address.

NOT NULL Constraint

The NOT NULL constraint ensures that a column cannot contain NULL values.

Use this constraint for fields that are mandatory and must always be provided.

Typical use cases include:

primary identifiers
contact information
transaction timestamps

NOT NULL Constraint | Example

Ensure every customer has a phone number.

CREATE TABLE customers (
  customer_id INTEGER PRIMARY KEY,
  phone_number TEXT NOT NULL,
  email TEXT
);

If an insert is attempted without a phone number, PostgreSQL will return an error.

PRIMARY KEY Constraint

A PRIMARY KEY uniquely identifies each row in a table.

A primary key:

must be unique
cannot contain NULL values
exists only once per table

PRIMARY KEY Constraint | Example

Define a primary key for a products table.

CREATE TABLE products (
  product_id INTEGER PRIMARY KEY,
  product_name TEXT NOT NULL,
  price NUMERIC(10, 2)
);

The PRIMARY KEY constraint automatically enforces both UNIQUE and NOT NULL.

FOREIGN KEY Constraint

A FOREIGN KEY creates a relationship between two tables.

It ensures referential integrity, meaning referenced values must exist in the parent table.

FOREIGN KEY Constraint | Example

Link sales records to customers.

CREATE TABLE sales (
  transaction_id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  total_sales NUMERIC(10, 2),
  FOREIGN KEY (customer_id)
    REFERENCES customers (customer_id)
);

With this constraint, you cannot insert a sale for a customer that does not exist.

CHECK Constraint

The CHECK constraint restricts the range or condition of values that can be inserted into a column.

It validates data using logical expressions.

Typical use cases include:

value ranges
minimum or maximum limits
domain rules

CHECK Constraint | Example

Ensure total sales values are non-negative.

CREATE TABLE sales (
  transaction_id INTEGER PRIMARY KEY,
  total_sales NUMERIC(10, 2) CHECK (total_sales >= 0)
);

If a value violates the condition, the insert or update will fail.

SQL Rules and Best Practices

Why Best Practices Matter

Writing SQL that works is not enough.

Good SQL should be:

readable
consistent
maintainable
easy to debug

These rules help teams collaborate and reduce errors.

Numbers and Underscores

In relational databases:

table names cannot start with a number
column names cannot contain spaces
underscores should be used to separate words

This improves compatibility and readability.

Naming | Examples

Recommended:

CREATE TABLE customer_lifecycle;
CREATE TABLE customer;

Not recommended:

CREATE TABLE customer lifecycle;
CREATE TABLE 3customer;

Capitalization

A widely accepted convention is:

SQL keywords \(\rightarrow\) UPPERCASE
table names \(\rightarrow\) lowercase
column names \(\rightarrow\) lowercase

This improves readability and consistency.

Capitalization | Examples

Recommended:

SELECT product_id, product_name
FROM products;
SELECT transaction_id, total_sales
FROM sales;

Not recommended:

Select PRODUCT_ID, PRODUCT_NAME From PRODUCTS;
SELECT TRANSACTION_ID, TOTAL_SALES FROM SALES;

Commenting SQL Code

Commenting helps others (and your future self) understand your logic.

Comments:

do not affect execution
improve documentation
help during debugging

Commenting | Examples

Single-line comments:

-- select and display the first 10 rows from the sales table
SELECT *
FROM sales
LIMIT 10;

Multi-line comments:

/*
SELECT
  s.transaction_id,
  p.product_name,
  s.total_sales
FROM sales s
JOIN products p
  ON s.product_id = p.product_id
WHERE s.total_sales > 100;
*/

Aliasing

Aliases assign temporary names to tables or columns.

They are useful for:

improving readability
shortening long names
resolving ambiguity in joins

Aliasing | Examples

Column aliases:

SELECT
  product_name AS item_name,
  price AS unit_price
FROM products;

Table aliases in joins:

SELECT
  s.transaction_id,
  p.product_name,
  s.total_sales
FROM sales AS s
JOIN products AS p
  ON s.product_id = p.product_id;

Summary

constraints protect data quality
best practices improve readability
consistent SQL scales better in teams
clean SQL enables better optimization