Data Analytics Bootcamp
Session 02: Python Basic Syntax, Data Structures

Python
List
Tuple
Set
Dictionary
Pandas DataFrames

Arithmetic Operations

Let revenue be \(r\) and tax rate be \(t\).

The total revenue including tax is:

\[ \text{total} = r \times (1 + t) \]

r = 100
t = 0.2

total = r * (1 + t)
total
120.0

Python supports:

  • Addition +
  • Subtraction -
  • Multiplication *
  • Division /
  • Power **

Boolean values are either True or False.

100 > 50
100 == 50
100 != 50

Logical operators:

  • and
  • or
  • not

print((100 > 50) and (20 < 30))
True

Boolean logic becomes essential when filtering data.

Basic Data Structures

Before working comfortably with pandas, we must understand the core Python data structures.

Every DataFrame is built on top of them.

In analytics, these structures represent:

  • Collections of values
  • Observations
  • Attributes
  • Mappings between keys and values

List

A list is an ordered, mutable collection.

sales = [100, 200, 150, 200]
sales
[100, 200, 150, 200]

Properties:

  • Ordered
  • Indexed
  • Allows duplicates
  • Mutable

Access elements:

sales[0]
sales[-1]
200

Modify elements:

sales.append(300)
sales[1] = 250
sales
[100, 250, 150, 200, 300]

Remove elements:

sales.remove(150)
sales
[100, 250, 200, 300]

Length of the list:

len(sales)
4

Analytical Context

A list can represent:

  • Daily sales
  • Customer revenues
  • Monthly growth rates

If values are \(x_1, x_2, ..., x_n\), the total revenue is:

\[ \sum_{i=1}^{n} x_i \]

total = 0
for value in sales:
    total += value

total
850

Tuple

A tuple is ordered but immutable.

coordinates = (40.18, 44.51)
coordinates
(40.18, 44.51)

Properties:

  • Ordered
  • Indexed
  • Immutable

Why immutability matters:

  • Prevents accidental changes
  • Safe for constant data
  • Can be used as dictionary keys
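A brief sketch of the dictionary-key point (the second coordinate pair is made up for illustration). A list cannot be used as a key because it is mutable; a tuple can:

```python
# Map (latitude, longitude) pairs to city names.
# A list key here would raise TypeError: unhashable type.
city_by_coords = {
    (40.18, 44.51): "Yerevan",
    (41.72, 44.78): "Tbilisi",  # illustrative coordinates
}

print(city_by_coords[(40.18, 44.51)])  # Yerevan
```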

Set

A set stores unique values.

customer_ids = {1, 2, 3, 3, 4}
customer_ids
{1, 2, 3, 4}

Properties:

  • Unordered
  • No duplicates
  • Fast membership checking

Analytical use case:

  • Removing duplicates
  • Comparing segments

segment_a = {1, 2, 3}
segment_b = {3, 4, 5}

segment_a.intersection(segment_b)
{3}
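Beyond intersection, sets also support union and difference, which map naturally onto segment comparisons:

```python
segment_a = {1, 2, 3}
segment_b = {3, 4, 5}

either = segment_a.union(segment_b)        # customers in either segment
only_a = segment_a.difference(segment_b)   # customers only in segment_a

print(either)          # {1, 2, 3, 4, 5}
print(only_a)          # {1, 2}
print(3 in segment_a)  # True -- membership checks on sets are fast
```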

Dictionary

A dictionary maps keys to values.

customer = {
    "name": "Anna",
    "revenue": 150,
    "city": "Yerevan"
}

customer
{'name': 'Anna', 'revenue': 150, 'city': 'Yerevan'}

Properties:

  • Keys must be unique
  • Values can be any type
  • Fast lookup

Access values:

customer["revenue"]
150

Add or update:

customer["segment"] = "Premium"
customer["revenue"] = 200
customer
{'name': 'Anna', 'revenue': 200, 'city': 'Yerevan', 'segment': 'Premium'}

Remove:

del customer["city"]
customer
{'name': 'Anna', 'revenue': 200, 'segment': 'Premium'}

From Dictionary to Structured Data

A collection of dictionaries can represent tabular data:

customers = [
    {"name": "Anna", "revenue": 150},
    {"name": "David", "revenue": 220},
    {"name": "Mariam", "revenue": 90}
]

customers
[{'name': 'Anna', 'revenue': 150},
 {'name': 'David', 'revenue': 220},
 {'name': 'Mariam', 'revenue': 90}]

This structure is very close to what pandas formalizes.
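To see why this is already "almost a table", a small sketch totals the revenue column by looping over the records:

```python
customers = [
    {"name": "Anna", "revenue": 150},
    {"name": "David", "revenue": 220},
    {"name": "Mariam", "revenue": 90},
]

total = 0
for row in customers:        # each dictionary is one row
    total += row["revenue"]  # each key acts like a column name

print(total)  # 460
```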


Transition to Pandas

A pandas DataFrame is conceptually:

  • A dictionary of columns
  • Each column behaves like a labeled list

flowchart LR
    A[List] --> C[Dictionary]
    C --> D[DataFrame]


Creating a DataFrame

import pandas as pd

data = {
    "name": ["Anna", "David", "Mariam"],
    "revenue": [150, 220, 90],
    "city": ["Yerevan", "Tbilisi", "Warsaw"]
}

df = pd.DataFrame(data)
df
name revenue city
0 Anna 150 Yerevan
1 David 220 Tbilisi
2 Mariam 90 Warsaw

Inspecting Structure

df.info()
df.shape
df.columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   name     3 non-null      object
 1   revenue  3 non-null      int64
 2   city     3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes
Index(['name', 'revenue', 'city'], dtype='object')

Basic DataFrame Manipulation

Selecting Columns

df["revenue"]
df[["name", "revenue"]]
name revenue
0 Anna 150
1 David 220
2 Mariam 90

Adding a Column

Suppose tax rate \(t = 0.2\).

\[ \text{revenue\_after\_tax} = r \times (1 - t) \]

t = 0.2
df["revenue_after_tax"] = df["revenue"] * (1 - t)
df
name revenue city revenue_after_tax
0 Anna 150 Yerevan 120.0
1 David 220 Tbilisi 176.0
2 Mariam 90 Warsaw 72.0

Removing a Column

df.drop("city", axis=1)
name revenue revenue_after_tax
0 Anna 150 120.0
1 David 220 176.0
2 Mariam 90 72.0

To modify permanently:

df = df.drop("city", axis=1)
df
name revenue revenue_after_tax
0 Anna 150 120.0
1 David 220 176.0
2 Mariam 90 72.0

Filtering Rows

df[df["revenue"] > 100]
name revenue revenue_after_tax
0 Anna 150 120.0
1 David 220 176.0

Equivalent SQL:

SELECT *
FROM customers
WHERE revenue > 100;

Updating Values

df.loc[df["revenue"] < 100, "segment"] = "Low"
df.loc[df["revenue"] >= 100, "segment"] = "High"
df
name revenue revenue_after_tax segment
0 Anna 150 120.0 High
1 David 220 176.0 High
2 Mariam 90 72.0 Low

This applies conditional logic to structured data.

Analytical Flow

flowchart LR
    A[Python Structures] --> B[Dictionary of Lists]
    B --> C[DataFrame]
    C --> D[Select]
    C --> E[Filter]
    C --> F[Transform]
    D --> G[Insights]
    E --> G
    F --> G

We have moved from:

  • Lists
  • Dictionaries
  • Sets
  • Tuples

To:

  • Structured tabular data
  • Column manipulation
  • Row filtering
  • Feature creation
Important

Understanding the foundations makes pandas intuitive instead of magical.

Next, after conditions and loops, we will go deeper into selection, filtering, and aggregation using real datasets.

Important

Mutable vs Immutable

Understanding mutability is essential for writing correct analytical code.

Mutable Objects

Mutable objects can be changed after creation.

  • list
  • dict
  • set
  • pandas DataFrame

Example:

sales = [100, 200, 150]
sales[0] = 300
sales
[300, 200, 150]

The original object is modified.


Immutable Objects

Immutable objects cannot be changed after creation.

  • int
  • float
  • str
  • bool
  • tuple

Example:

x = 10
x = x + 5
x
15

Here, a new object is created. The original value is not modified.

Trying to modify a tuple:

coordinates = (40.18, 44.51)
# coordinates[0] = 41   # Error

Why This Matters in Data Analytics

Mutability affects:

  • Function behavior
  • Memory references
  • Unexpected side effects
  • pandas chained assignments

If you modify a mutable object inside a function, the original data may change.

Understanding this prevents subtle analytical bugs.
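A sketch of that side effect (the function names are illustrative): a function that appends to a list changes the caller's data, while rebinding an int does not:

```python
def add_bonus(values):
    values.append(999)   # mutates the caller's list in place

def increment(x):
    x = x + 1            # rebinds a local name; the caller's value is untouched
    return x

sales = [100, 200]
add_bonus(sales)
print(sales)             # [100, 200, 999] -- the original list changed

n = 10
increment(n)
print(n)                 # 10 -- the immutable int did not change
```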

Conditional Statements

Conditional statements allow a program to make decisions. In data analytics, decisions appear everywhere:

  • Filtering rows
  • Classifying customers
  • Detecting anomalies
  • Creating segments
  • Handling missing values

At the core of these operations lies the if statement.

The Basic if Statement

revenue = 150

if revenue > 100:
    print("High revenue")
else:
    print("Normal revenue")
High revenue

Structure:

  • if keyword
  • A condition that evaluates to True or False
  • A colon :
  • An indented block of code

If the condition is True, the indented block runs.
If it is False, the else block runs instead; without an else, nothing happens.

Indentation | Why It Matters

In Python, indentation defines structure.
It is not optional formatting — it is syntax.

revenue = 150

if revenue > 100:
    print("High revenue")

print("Analysis complete")
High revenue
Analysis complete

Only the indented line belongs to the if block.

If indentation is incorrect, Python raises an error:

revenue = 150

if revenue > 100:
print("High revenue")   # IndentationError

Best practice:

  • Use 4 spaces per indentation level
  • Never mix tabs and spaces
  • Keep indentation consistent

Using elif for Multiple Conditions

When there are more than two possible categories, use elif.

revenue = 180

if revenue > 200:
    print("Very high revenue")
elif revenue > 100:
    print("High revenue")
elif revenue > 50:
    print("Medium revenue")
else:
    print("Low revenue")
High revenue

Important principles:

  • Conditions are evaluated from top to bottom
  • The first True condition executes
  • Remaining conditions are skipped
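A sketch of why ordering matters: with checks from broadest to most specific, the broad branch captures values the specific branch was meant for:

```python
revenue = 250

# Most specific condition first: correct
if revenue > 200:
    label_good = "Very high"
elif revenue > 100:
    label_good = "High"
else:
    label_good = "Low"

# Broadest condition first: revenue > 100 is already True, so "High" wins
if revenue > 100:
    label_bad = "High"
elif revenue > 200:
    label_bad = "Very high"   # never reached for revenue = 250
else:
    label_bad = "Low"

print(label_good, label_bad)  # Very high High
```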

Boolean Expressions and Comparison Operators

Every conditional statement depends on a Boolean expression.

A Boolean expression evaluates to either True or False.

100 > 50
100 == 50
100 != 50
True

Common comparison operators:

  • > greater than
  • < less than
  • >= greater than or equal
  • <= less than or equal
  • == equal
  • != not equal

These operators form the foundation of filtering logic in data analysis.

Combining Conditions with Logical Operators

Often, a single condition is not enough.

Python provides logical operators:

  • and
  • or
  • not

revenue = 150
is_active = True

if revenue > 100 and is_active:
    print("Target customer")
Target customer

Rules:

  • and → both conditions must be True
  • or → at least one condition must be True
  • not → reverses the Boolean value

Example:

is_active = False

if not is_active:
    print("Customer is inactive")
Customer is inactive

Logical operators become extremely important when filtering datasets with multiple criteria.
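A sketch with made-up customer records, combining and with or (the parentheses make the intent explicit):

```python
# Target customers: active AND (in the EU region OR revenue above 200)
customers = [
    {"name": "Anna",   "revenue": 150, "region": "EU", "active": True},
    {"name": "David",  "revenue": 220, "region": "US", "active": True},
    {"name": "Mariam", "revenue": 90,  "region": "EU", "active": False},
]

targets = []
for c in customers:
    if c["active"] and (c["region"] == "EU" or c["revenue"] > 200):
        targets.append(c["name"])

print(targets)  # ['Anna', 'David']
```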

Nested Conditional Statements

Conditionals can be placed inside other conditionals.

revenue = 150
region = "EU"

if revenue > 100:
    if region == "EU":
        print("High EU revenue")
    else:
        print("High non-EU revenue")
High EU revenue

Notice how indentation increases with each nested block.

Each indentation level represents a new logical layer.

Deep nesting reduces readability.
In data analytics, clarity is preferred over complexity.

Common Mistakes

1. Using = instead of ==

if revenue = 100:   # SyntaxError

Correct:

if revenue == 100:
    print("Equal")

= assigns a value.
== compares values.


2. Forgetting the colon

if revenue > 100
    print("High")

The colon is mandatory.


3. Incorrect indentation

if revenue > 100:
print("High")

Python requires consistent indentation.


Visualizing Conditional Flow

flowchart TD
    A[Start] --> B{Condition True?}
    B -->|Yes| C[Execute Block]
    B -->|No| D[Skip Block]
    C --> E[Continue]
    D --> E

Indentation defines what belongs to the decision branch.


Why Conditional Logic Matters in Analytics

Conditional logic is the basis of:

  • Data filtering
  • Segmentation
  • Rule-based scoring
  • Feature creation
  • Data validation

Every WHERE clause in SQL is conceptually an if statement applied to rows.

Understanding conditional statements deeply ensures that later pandas filtering feels natural and intuitive.
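A sketch of that correspondence, assuming pandas is installed: the same rule written as a row-by-row if statement and as a pandas Boolean mask (in SQL: SELECT name FROM customers WHERE revenue > 100;):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Anna", "David", "Mariam"],
                   "revenue": [150, 220, 90]})

# Row by row: an if statement applied to each row
kept_loop = []
for name, rev in zip(df["name"], df["revenue"]):
    if rev > 100:
        kept_loop.append(name)

# Whole column at once: a Boolean mask
kept_mask = df[df["revenue"] > 100]["name"].tolist()

print(kept_loop)  # ['Anna', 'David']
print(kept_mask)  # ['Anna', 'David']
```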

Loops

Important

Conditional statements allow decisions.

Loops allow repetition.

In data analytics, repetition appears constantly:

  • Iterating over values
  • Applying rules to many observations
  • Aggregating manually
  • Cleaning records
  • Transforming data

The most common loop in Python is the for loop.


The Basic for Loop

sales = [100, 200, 150]

for value in sales:
    print(value)
100
200
150

Structure:

  • for keyword
  • A temporary variable (value)
  • The keyword in
  • An iterable object (sales)
  • A colon :
  • An indented block

The loop runs once for each element in the collection.


How a for Loop Works

flowchart TD
    A[Start] --> B[Take first element]
    B --> C[Execute indented block]
    C --> D{More elements?}
    D -->|Yes| B
    D -->|No| E[Stop]

Each iteration processes one element.

Loop With Accumulation

Loops are often used to compute totals.

sales = [100, 200, 150]

total = 0

for value in sales:
    total = total + value

total
450

Mathematically, if values are \(x_1, x_2, ..., x_n\):

\[ \text{Total} = \sum_{i=1}^{n} x_i \]

This manual summation mirrors what sum() does internally.


Loop With Conditional Logic

You can combine loops and conditions.

sales = [100, 200, 150, 50]

for value in sales:
    if value > 120:
        print("High sale:", value)
High sale: 200
High sale: 150

Now we are:

  • Iterating
  • Evaluating
  • Filtering

This is conceptually similar to a SQL WHERE clause.


Looping Over Dictionaries

Loops are not limited to lists.

customer = {
    "name": "Anna",
    "revenue": 150,
    "city": "Yerevan"
}

for key in customer:
    print(key, ":", customer[key])
name : Anna
revenue : 150
city : Yerevan

You can also iterate over key–value pairs:

for key, value in customer.items():
    print(key, value)
name Anna
revenue 150
city Yerevan

The range() Function

Sometimes you need numeric iteration.

range(5) generates the numbers 0, 1, 2, 3, 4:

for i in range(5):
    print(i)
0
1
2
3
4

flowchart LR
    A[range 0 to 4] --> B[0]
    A --> C[1]
    A --> D[2]
    A --> E[3]
    A --> F[4]
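range() also accepts start, stop, and step arguments, which is useful for custom numeric sequences:

```python
evens = list(range(2, 10, 2))     # start at 2, stop before 10, step by 2
countdown = list(range(5, 0, -1)) # a negative step counts down

print(evens)      # [2, 4, 6, 8]
print(countdown)  # [5, 4, 3, 2, 1]
```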


Nested Loops

Loops can be nested inside each other.

for i in range(3):
    for j in range(2):
        print("i =", i, ", j =", j)
i = 0 , j = 0
i = 0 , j = 1
i = 1 , j = 0
i = 1 , j = 1
i = 2 , j = 0
i = 2 , j = 1

Indentation increases with nesting.

Each additional level increases computational complexity.


Common Mistakes With Loops

1. Forgetting indentation

for value in sales:
print(value)

Indentation is required.


2. Modifying a collection while iterating

This can lead to unexpected behavior.
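A sketch of the problem, using example values, and a safe alternative that builds a new list instead:

```python
sales = [100, 50, 30, 200]

# Buggy: removing items while iterating shifts positions, so elements get skipped
for value in sales:
    if value < 100:
        sales.remove(value)
print(sales)  # [100, 30, 200] -- 30 was skipped, not removed

# Safe: iterate over the original and collect what you want to keep
raw = [100, 50, 30, 200]
cleaned = []
for value in raw:
    if value >= 100:
        cleaned.append(value)
print(cleaned)  # [100, 200]
```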

Analytical Perspective

In analytics, loops are useful for:

  • Custom feature engineering
  • Rule-based transformations
  • Processing API responses
  • Working with nested structures

However:

For tabular data, pandas vectorized operations are usually faster and cleaner than loops.

Loops are foundational knowledge.
Vectorization is analytical optimization.
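A sketch of the same transformation both ways, assuming pandas is installed; the vectorized form replaces the explicit loop:

```python
import pandas as pd

sales = [100, 200, 150]

# Loop version: one value at a time
taxed_loop = []
for value in sales:
    taxed_loop.append(value * 1.2)

# Vectorized version: one expression applied to the whole column at once
taxed_vec = (pd.Series(sales) * 1.2).tolist()

print(taxed_loop == taxed_vec)  # True
```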

Summary

You now understand:

  • Basic for loop structure
  • Loop flow
  • Accumulation logic
  • Combining loops with conditions
  • Iterating over dictionaries
  • Using range()
  • Nested loops

For more information on loops, see the official Python documentation.

List Comprehension

List comprehension allows us to create new lists in a concise and readable way.

It replaces many simple loops.

Basic Structure

new_list = [expression for item in iterable]

Equivalent traditional loop:

sales = [100, 200, 150]

new_sales = []

for value in sales:
    new_sales.append(value * 1.2)

new_sales

Now using list comprehension:

sales = [100, 200, 150]

new_sales = [value * 1.2 for value in sales]
new_sales

The result is identical, but the syntax is cleaner.


With Conditional Filtering

You can add a condition:

sales = [100, 200, 150, 50]

high_sales = [value for value in sales if value > 120]
high_sales

Traditional version:

high_sales = []

for value in sales:
    if value > 120:
        high_sales.append(value)

high_sales

List comprehension combines:

  • Iteration
  • Conditional filtering
  • Transformation

In one readable line.


Mathematical Interpretation

If values are \(x_1, x_2, ..., x_n\), and we want only those where \(x_i > 120\):

\[ \{ x_i \mid x_i > 120 \} \]

List comprehension expresses this directly in code.


Conditional Expression Inside Comprehension

You can also transform conditionally:

sales = [100, 200, 150, 50]

labels = ["High" if value > 120 else "Low" for value in sales]
labels

This mirrors segmentation logic.


When to Use List Comprehension

Use when:

  • You are transforming a list
  • You are filtering values
  • The logic is simple and readable

Avoid when:

  • Logic becomes too complex
  • Multiple nested conditions reduce clarity

Readability is more important than brevity.

Conceptual Flow

flowchart LR
    A[Original List] --> B[Iterate]
    B --> C{Condition?}
    C -->|Yes| D[Transform]
    C -->|No| E[Skip or Alternate]
    D --> F[New List]
    E --> F

List comprehension is structured iteration with transformation.

Train Yourself

Given:

revenues = [120, 250, 80, 310, 95]

  1. Create a new list with revenues after applying 10% tax.
  2. Create a list containing only revenues greater than 100.
  3. Create a list labeling each revenue as "High" if > 200, otherwise "Normal".

Use list comprehension only.


Why This Matters for Pandas

List comprehension is conceptually similar to:

  • Creating new columns
  • Applying transformations
  • Conditional feature engineering

Soon, you will see how pandas vectorizes this behavior.
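As a preview, assuming pandas is installed, the same labeling logic can be written as a comprehension or as the .loc assignments used earlier in this session:

```python
import pandas as pd

sales = [100, 200, 150, 50]

# List comprehension version
labels_lc = ["High" if value > 120 else "Low" for value in sales]

# pandas version: Boolean masks with .loc
df = pd.DataFrame({"sales": sales})
df.loc[df["sales"] > 120, "label"] = "High"
df.loc[df["sales"] <= 120, "label"] = "Low"

print(labels_lc)             # ['Low', 'High', 'High', 'Low']
print(df["label"].tolist())  # ['Low', 'High', 'High', 'Low']
```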

Homework

This homework integrates:

  • Arithmetic operations
  • Boolean logic
  • Lists, sets, dictionaries
  • Conditional statements
  • Loops
  • List comprehension
  • Basic pandas DataFrame manipulation

You will simulate a small revenue analytics pipeline and submit your work as a Jupyter Notebook (.ipynb) file.

Note

Create a 02_python_fundamentals_for_data_analytics.ipynb file and complete the following tasks. Once finished, push to GitHub and share the link.

Scenario

You are analyzing weekly transaction data for a small company.

The company wants to:

  • Adjust revenues for tax
  • Apply discount rules
  • Classify customers
  • Remove duplicate IDs
  • Prepare data for structured analysis

Part 1

Given:

revenues = [120, 250, 80, 310, 95]
tax_rate = 0.18
discount_rate = 0.10

Let revenue be \(r\).

Final revenue formula:

\[ \text{final} = r \times (1 + \text{tax\_rate}) \times (1 - \text{discount\_rate}) \]

Tasks

  1. Compute revenues including tax.
  2. Compute final revenues after tax and discount.
  3. Create a Boolean list indicating whether revenue > 100.
  4. Create a Boolean list indicating whether revenue is between 100 and 300.
  5. Add Markdown explanation: Why do parentheses matter in the formula?

Part 2

Using:

sales = [120, 250, 80, 310, 95]

  1. Compute total revenue manually using a loop.
  2. Compute average revenue.
  3. Identify the maximum value without using max().
  4. Count how many values are greater than 150.
  5. Create a new list of revenues after adding 5% commission.

Part 3

Segmentation rules:

  • "Premium" if revenue > 250
  • "Standard" if 100 < revenue ≤ 250
  • "Low" otherwise

  1. Create a list of segment labels.
  2. Count how many customers fall into each segment.
  3. Add Markdown explanation: Why does the order of elif statements matter?

Part 4

Given:

customer_ids = [1, 2, 3, 3, 4, 5, 5, 6]

  1. Remove duplicates using a set.
  2. Convert back to a list.
  3. Compare lengths before and after deduplication.
  4. Explain in Markdown why sets are unordered.

Part 5

Create a dictionary:

customer = {
    "name": "Anna",
    "revenue": 250,
    "city": "Yerevan"
}

  1. Add a new key "segment" based on revenue.
  2. Update revenue to 300.
  3. Remove "city".
  4. Loop over the dictionary and print key-value pairs.
  5. Add Markdown explanation: Why are dictionaries useful for structured data?

Part 6

Using:

revenues = [120, 250, 80, 310, 95]

  1. Create a list of revenues after 10% tax.
  2. Create a list of revenues greater than 100.
  3. Create a list labeling each revenue as:
    • "High" if > 200
    • "Normal" otherwise
  4. Compare list comprehension vs loop in Markdown.

Part 7

Convert your data into a DataFrame.

import pandas as pd

data = {
    "revenue": revenues
}

df = pd.DataFrame(data)
df

  1. Add column "revenue_after_tax".
  2. Add column "segment" using conditional logic.
  3. Filter rows where revenue > 100.
  4. Remove the "revenue_after_tax" column.
  5. Print the shape of the DataFrame.
  6. Add Markdown explanation: Why is this easier than manual loops?

Bonus Reflection

Answer briefly in Markdown:

  • Difference between mutable and immutable objects
  • Why loops are less efficient than pandas vectorization
  • How Boolean logic relates to SQL WHERE
  • Why understanding lists helps understand DataFrames

Submission Requirements

  • Submit a .ipynb file
  • Use both code cells and Markdown cells
  • All code must execute without errors
  • Clearly label each section

Notebook structure should be clean and readable.

Analytical Flow

flowchart LR
    A[Raw Revenues] --> B[Arithmetic Transformations]
    B --> C[Conditional Classification]
    C --> D[Deduplication]
    D --> E[Dictionary Structure]
    E --> F[DataFrame]
    F --> G[Filtering & Transformation]