```python
r = 100
t = 0.2
total = r * (1 + t)
total
```

```
120.0
```
Let revenue be \(r\) and tax rate be \(t\).
The total revenue including tax is:
\[ \text{total} = r \times (1 + t) \]
Python supports the standard arithmetic operators:

- `+` addition
- `-` subtraction
- `*` multiplication
- `/` division
- `**` exponentiation

Boolean values are either `True` or `False`.

Logical operators:

- `and`
- `or`
- `not`

Boolean logic becomes essential when filtering data.
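As a small illustration (the variable names and values are my own), comparisons produce Booleans and logical operators combine them:

```python
revenue = 250        # hypothetical values for illustration
region = "EU"

is_high = revenue > 200             # a comparison yields a Boolean
in_region = region == "EU"

qualifies = is_high and in_region   # logical operators combine Booleans
print(qualifies)                    # True
```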
Before working comfortably with pandas, we must understand the core Python data structures.
Every DataFrame is built on top of them.
In analytics, these structures represent:

- lists: ordered collections of values, such as a column of revenues
- tuples: fixed records that should not change
- sets: collections of unique categories
- dictionaries: labeled records, such as a row of customer data
A list is an ordered, mutable collection.
Properties:
Access elements:
Modify elements:
Remove elements:
Length of the list:
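The operations above can be sketched as follows (the `revenues` name and values are my own):

```python
revenues = [150, 220, 90]    # an ordered, mutable collection

first = revenues[0]          # access by 0-based index -> 150
revenues[1] = 230            # modify an element in place
revenues.remove(90)          # remove an element by value
n = len(revenues)            # length of the list -> 2

print(first, revenues, n)
```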
A list can represent a column of values, for example daily revenues.
If values are \(x_1, x_2, ..., x_n\), the total revenue is:
\[ \sum_{i=1}^{n} x_i \]
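In code, the summation above (with hypothetical values) is simply:

```python
values = [150, 220, 90]      # x_1 ... x_n, hypothetical data

total_revenue = sum(values)  # computes the sum of all x_i
print(total_revenue)         # 460
```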
A tuple is ordered but immutable.
Properties:
Why immutability matters:
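A short sketch of tuple behavior (the names and values are my own):

```python
point = (40.18, 44.51)   # a fixed pair of coordinates, hypothetical values

lat = point[0]           # reading works exactly like a list
# point[0] = 0.0 would raise TypeError:
# immutability protects fixed reference data from accidental changes
print(lat)
```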
A set stores unique values.
Properties:
Analytical use case:
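For example, deduplicating a list of cities (hypothetical data):

```python
cities = ["Yerevan", "Tbilisi", "Yerevan", "Warsaw"]

unique_cities = set(cities)   # duplicates collapse automatically
print(len(unique_cities))     # 3 distinct cities
```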
A dictionary maps keys to values.
```python
{'name': 'Anna', 'revenue': 150, 'city': 'Yerevan'}
```
Properties:
Access values:
Add or update:
```python
{'name': 'Anna', 'revenue': 200, 'city': 'Yerevan', 'segment': 'Premium'}
```
Remove:
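The dictionary operations above can be sketched as follows (reusing the record shown earlier):

```python
customer = {'name': 'Anna', 'revenue': 150, 'city': 'Yerevan'}

name = customer['name']          # access a value by key -> 'Anna'

customer['revenue'] = 200        # update an existing key
customer['segment'] = 'Premium'  # add a new key

del customer['segment']          # remove an entry by key
```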
A collection of dictionaries can represent tabular data:
```python
[{'name': 'Anna', 'revenue': 150},
 {'name': 'David', 'revenue': 220},
 {'name': 'Mariam', 'revenue': 90}]
```
This structure is very close to what pandas formalizes.
A pandas DataFrame is conceptually:
```mermaid
flowchart LR
    A[List] --> C[Dictionary]
    C --> D[DataFrame]
```
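A minimal sketch of that step (variable names are my own), building the table shown below from a list of dictionaries:

```python
import pandas as pd

records = [{'name': 'Anna', 'revenue': 150, 'city': 'Yerevan'},
           {'name': 'David', 'revenue': 220, 'city': 'Tbilisi'},
           {'name': 'Mariam', 'revenue': 90, 'city': 'Warsaw'}]

df = pd.DataFrame(records)   # each dictionary becomes one row
print(df)
```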
|   | name | revenue | city |
|---|---|---|---|
| 0 | Anna | 150 | Yerevan |
| 1 | David | 220 | Tbilisi |
| 2 | Mariam | 90 | Warsaw |
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   name     3 non-null      object
 1   revenue  3 non-null      int64
 2   city     3 non-null      object
dtypes: int64(1), object(2)
memory usage: 239.0 bytes
```

```
Index(['name', 'revenue', 'city'], dtype='object')
```
Suppose tax rate \(t = 0.2\).
\[ \text{revenue\_after\_tax} = r \times (1 - t) \]
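A sketch of this transformation, assuming the DataFrame `df` built from the records above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anna', 'David', 'Mariam'],
                   'revenue': [150, 220, 90],
                   'city': ['Yerevan', 'Tbilisi', 'Warsaw']})

t = 0.2  # the tax rate from the formula above
df['revenue_after_tax'] = df['revenue'] * (1 - t)   # vectorized: no loop needed
```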
|   | name | revenue | city | revenue_after_tax |
|---|---|---|---|---|
| 0 | Anna | 150 | Yerevan | 120.0 |
| 1 | David | 220 | Tbilisi | 176.0 |
| 2 | Mariam | 90 | Warsaw | 72.0 |
|   | name | revenue | revenue_after_tax |
|---|---|---|---|
| 0 | Anna | 150 | 120.0 |
| 1 | David | 220 | 176.0 |
| 2 | Mariam | 90 | 72.0 |
To modify permanently:
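The original code cell is missing here; one common way to keep only the selected columns permanently (a sketch, assuming the `df` built earlier) is to reassign the selection:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anna', 'David', 'Mariam'],
                   'revenue': [150, 220, 90],
                   'city': ['Yerevan', 'Tbilisi', 'Warsaw'],
                   'revenue_after_tax': [120.0, 176.0, 72.0]})

# Reassigning replaces df with the selected columns
df = df[['name', 'revenue', 'revenue_after_tax']]
```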
|   | name | revenue | revenue_after_tax |
|---|---|---|---|
| 0 | Anna | 150 | 120.0 |
| 1 | David | 220 | 176.0 |
| 2 | Mariam | 90 | 72.0 |
Equivalent SQL (a sketch, assuming a `customers` table): `SELECT name, revenue, revenue * (1 - 0.2) AS revenue_after_tax FROM customers;`
|   | name | revenue | revenue_after_tax | segment |
|---|---|---|---|---|
| 0 | Anna | 150 | 120.0 | High |
| 1 | David | 220 | 176.0 | High |
| 2 | Mariam | 90 | 72.0 | Low |
This applies conditional logic to structured data.
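One way to sketch that conditional logic (the 100 threshold is an assumption inferred from the table above):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Anna', 'David', 'Mariam'],
                   'revenue': [150, 220, 90]})

# Label each row based on a condition; the threshold is assumed
df['segment'] = ['High' if r > 100 else 'Low' for r in df['revenue']]
```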
```mermaid
flowchart LR
    A[Python Structures] --> B[Dictionary of Lists]
    B --> C[DataFrame]
    C --> D[Select]
    C --> E[Filter]
    C --> F[Transform]
    D --> G[Insights]
    E --> G
    F --> G
```
We have moved from raw Python structures (lists and dictionaries) to structured, tabular DataFrames that support selection, filtering, and transformation.
Understanding the foundations makes pandas intuitive instead of magical.
Next, after conditions and loops, we will go deeper into selection, filtering, and aggregation using real datasets.
Understanding mutability is essential for writing correct analytical code.
Mutable objects can be changed after creation.
Mutable types include:

- `list`
- `dict`
- `set`
- `DataFrame`

Example:
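A sketch of in-place modification (names are my own):

```python
data = [10, 20, 30]
alias = data          # both names point to the SAME list object

data.append(40)       # modify the list in place
print(alias)          # [10, 20, 30, 40] -- the alias sees the change
```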
The original object is modified.
Immutable objects cannot be changed after creation.
Immutable types include:

- `int`
- `float`
- `str`
- `bool`
- `tuple`

Example:
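For example, "changing" a string actually builds a new object (names are my own):

```python
name = "anna"
upper_name = name.upper()   # creates a NEW string object

print(name)                 # 'anna' -- the original is unchanged
print(upper_name)           # 'ANNA'
```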
Here, a new object is created. The original value is not modified.
Trying to modify a tuple:
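A sketch of the error this raises:

```python
coords = (1, 2)

try:
    coords[0] = 99               # tuples do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)     # the tuple is left unchanged
```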
Mutability affects:

- how objects behave when passed to functions
- whether assignment creates a copy or an alias
If you modify a mutable object inside a function, the original data may change.
Understanding this prevents subtle analytical bugs.
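A sketch of the pitfall described above (function and variable names are my own):

```python
def add_bonus(revenues):
    # Mutates the caller's list -- a side effect, not a copy
    revenues.append(999)

sales = [100, 200]
add_bonus(sales)
print(sales)                 # [100, 200, 999] -- the original changed

# Safer: work on a copy inside the function
def add_bonus_safe(revenues):
    result = revenues.copy()
    result.append(999)
    return result
```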
Conditional statements allow a program to make decisions. In data analytics, decisions appear everywhere: filtering rows, labeling segments, and flagging unusual values.
At the core of these operations lies the if statement.
The `if` Statement

Structure: an `if` statement consists of the `if` keyword, a condition that evaluates to `True` or `False`, and a colon `:`.

If the condition is `True`, the indented block runs.
If it is `False`, nothing happens.
In Python, indentation defines structure.
It is not optional formatting — it is syntax.
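A minimal sketch (the value is my own) that produces the output below:

```python
revenue = 250   # hypothetical value

if revenue > 200:
    print("High revenue")    # runs only when the condition is True

print("Analysis complete")   # outside the if block: always runs
```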
```
High revenue
Analysis complete
```
Only the indented line belongs to the if block.
If indentation is incorrect, Python raises an error:
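For example, this snippet fails to compile (a sketch; the missing indentation is deliberate):

```python
# The body of the if must be indented; this source raises IndentationError
bad_source = "if revenue > 100:\nprint('High revenue')\n"

try:
    compile(bad_source, "<example>", "exec")
except IndentationError as exc:
    print("IndentationError:", exc)
```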
Best practice: use four spaces per indentation level and keep indentation consistent.
Using `elif` for multiple conditions: when there are more than two possible categories, use `elif`.
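A sketch (the value and the 200/100 thresholds are my own assumptions) that prints the line below:

```python
revenue = 250   # hypothetical value

if revenue > 200:
    label = "High revenue"     # only the FIRST True branch runs
elif revenue > 100:
    label = "Medium revenue"
else:
    label = "Low revenue"

print(label)
```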
```
High revenue
```
Important principles:

- Conditions are evaluated top to bottom.
- Only the first `True` condition executes.

Every conditional statement depends on a Boolean expression.
A Boolean expression evaluates to either True or False.
Common comparison operators:
- `>` greater than
- `<` less than
- `>=` greater than or equal
- `<=` less than or equal
- `==` equal
- `!=` not equal

These operators form the foundation of filtering logic in data analysis.
Often, a single condition is not enough.
Python provides logical operators:
- `and`
- `or`
- `not`

```
Target customer
```
Rules:
- `and` → both conditions must be `True`
- `or` → at least one condition must be `True`
- `not` → reverses the Boolean value

Example:
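A sketch of such an example (values are my own):

```python
revenue = 250
region = "EU"    # hypothetical values

if revenue > 100 and region == "EU":
    print("Target customer")       # both conditions are True

print(not (revenue > 100))         # False -- `not` reverses the Boolean
```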
Logical operators become extremely important when filtering datasets with multiple criteria.
Conditionals can be placed inside other conditionals.
```
High EU revenue
```
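The nested pattern behind the output above might look like this (the variable names and values are assumptions):

```python
revenue = 250
region = "EU"    # hypothetical values

if revenue > 200:
    if region == "EU":               # checked only when revenue > 200
        print("High EU revenue")
```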
Notice how indentation increases with each nested block.
Each indentation level represents a new logical layer.
Deep nesting reduces readability.
In data analytics, clarity is preferred over complexity.
1. Using = instead of ==
Correct:
= assigns a value.
== compares values.
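A sketch of the difference:

```python
revenue = 150            # = assigns a value to the name `revenue`

print(revenue == 150)    # True  -- == compares values
print(revenue == 200)    # False
# Writing `if revenue = 150:` would raise a SyntaxError
```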
2. Forgetting the colon
The colon is mandatory.
3. Incorrect indentation
Python requires consistent indentation.
```mermaid
flowchart TD
    A[Start] --> B{Condition True?}
    B -->|Yes| C[Execute Block]
    B -->|No| D[Skip Block]
    C --> E[Continue]
    D --> E
```
Indentation defines what belongs to the decision branch.
Conditional logic is the basis of filtering, segmentation, and data validation.
Every WHERE clause in SQL is conceptually an if statement applied to rows.
Understanding conditional statements deeply ensures that later pandas filtering feels natural and intuitive.
Conditional statements allow decisions.
Loops allow repetition.
In data analytics, repetition appears constantly: processing every row, accumulating totals, applying a rule to each record.
The most common loop in Python is the for loop.
The `for` Loop

Structure: a `for` loop consists of the `for` keyword, a loop variable (for example, `value`), the `in` keyword, and a collection (for example, `sales`), followed by a colon `:`.

The loop runs once for each element in the collection.
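A minimal sketch of that structure (the `sales` data is assumed):

```python
sales = [200, 90, 150]       # assumed example data

for value in sales:          # `value` takes each element in turn
    print(value)
```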
How a `for` Loop Works:

```mermaid
flowchart TD
    A[Start] --> B[Take first element]
    B --> C[Execute indented block]
    C --> D{More elements?}
    D -->|Yes| B
    D -->|No| E[Stop]
```
Each iteration processes one element.
Loops are often used to compute totals.
Mathematically, if values are \(x_1, x_2, ..., x_n\):
\[ \text{Total} = \sum_{i=1}^{n} x_i \]
This manual summation mirrors what sum() does internally.
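A sketch of that manual summation (values are my own):

```python
values = [150, 220, 90]      # x_1 ... x_n, hypothetical data

total = 0
for x in values:
    total = total + x        # accumulate each x_i

print(total)                 # 460, the same result as sum(values)
```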
You can combine loops and conditions.
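Assuming a sales list such as `[200, 90, 150]`, code like this produces the output below:

```python
sales = [200, 90, 150]           # assumed example data

for s in sales:
    if s > 100:                  # filter inside the loop
        print("High sale:", s)
```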
```
High sale: 200
High sale: 150
```
Now we are iterating and filtering at the same time.
This is conceptually similar to a SQL WHERE clause.
Loops are not limited to lists.
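For example, iterating over the customer dictionary from earlier produces the output below (a sketch):

```python
customer = {'name': 'Anna', 'revenue': 150, 'city': 'Yerevan'}

for key in customer:                 # iterating a dict yields its keys
    print(key, ":", customer[key])
```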
```
name : Anna
revenue : 150
city : Yerevan
```
You can also iterate over key–value pairs:
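A sketch using `.items()`:

```python
customer = {'name': 'Anna', 'revenue': 150, 'city': 'Yerevan'}

for key, value in customer.items():  # unpack each key-value pair directly
    print(key, "->", value)
```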
The `range()` Function

Sometimes you need numeric iteration.

`range(5)` generates the numbers 0 through 4:
```mermaid
flowchart LR
    A[range 0 to 4] --> B[0]
    A --> C[1]
    A --> D[2]
    A --> E[3]
    A --> F[4]
```
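In code:

```python
for i in range(5):           # yields 0, 1, 2, 3, 4 (the stop value 5 is excluded)
    print(i)

numbers = list(range(5))
print(numbers)               # [0, 1, 2, 3, 4]
```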
Loops can be nested inside each other.
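A sketch that produces the output below:

```python
for i in range(3):           # outer loop
    for j in range(2):       # inner loop completes fully for each i
        print("i =", i, ", j =", j)
```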
```
i = 0 , j = 0
i = 0 , j = 1
i = 1 , j = 0
i = 1 , j = 1
i = 2 , j = 0
i = 2 , j = 1
```
Indentation increases with nesting.
Each additional level increases computational complexity.
1. Forgetting indentation
Indentation is required.
2. Modifying a collection while iterating
This can lead to unexpected behavior.
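A sketch of the pitfall and a safe alternative (data is my own):

```python
values = [90, 120, 150]

# Removing items from `values` while looping over it can skip elements.
# Safer: build a new list (or iterate over a copy: for v in values[:]).
kept = [v for v in values if v >= 100]
print(kept)                  # [120, 150]
```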
In analytics, loops are useful for prototyping logic and for processing small collections. However:
For tabular data, pandas vectorized operations are usually faster and cleaner than loops.
Loops are foundational knowledge.
Vectorization is analytical optimization.
You now understand:

- the `for` loop structure
- `range()`

For more information on loops, see the official Python documentation.
List comprehension allows us to create new lists in a concise and readable way.
It replaces many simple loops.
Equivalent traditional loop:
Now using list comprehension:
The result is identical, but the syntax is cleaner.
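A sketch of both versions (the data and tax rate are assumed):

```python
revenues = [150, 220, 90]    # assumed data
t = 0.2

# Traditional loop
after_tax_loop = []
for r in revenues:
    after_tax_loop.append(r * (1 - t))

# List comprehension: the same result in one line
after_tax = [r * (1 - t) for r in revenues]

print(after_tax == after_tax_loop)   # True
```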
You can add a condition:
Traditional version:
List comprehension combines iteration, optional filtering, and transformation in one readable line.
If values are \(x_1, x_2, ..., x_n\), and we want only those where \(x_i > 120\):
\[ \{ x_i \mid x_i > 120 \} \]
List comprehension expresses this directly in code.
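In code (with hypothetical values):

```python
values = [150, 220, 90]                        # x_1 ... x_n, hypothetical

high_values = [x for x in values if x > 120]   # keep only x_i > 120
print(high_values)                             # [150, 220]
```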
You can also transform conditionally:
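A sketch (the data and the 200 threshold are my own assumptions):

```python
revenues = [150, 220, 90]    # assumed data

segments = ["High" if r > 200 else "Normal" for r in revenues]
print(segments)              # ['Normal', 'High', 'Normal']
```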
This mirrors segmentation logic.
Use list comprehension when the transformation is simple and fits on one line.
Avoid it when the logic is complex or deeply nested.
Readability is more important than brevity.
```mermaid
flowchart LR
    A[Original List] --> B[Iterate]
    B --> C{Condition?}
    C -->|Yes| D[Transform]
    C -->|No| E[Skip or Alternate]
    D --> F[New List]
    E --> F
```
List comprehension is structured iteration with transformation.
Given a list of revenue values, label each value `"High"` if it is greater than 200, otherwise `"Normal"`. Use list comprehension only.
List comprehension is conceptually similar to a SQL `SELECT ... WHERE` applied to a list of values.
Soon, you will see how pandas vectorizes this behavior.
This homework integrates arithmetic, Boolean logic, conditionals, loops, list comprehension, dictionaries, and pandas DataFrames.
You will simulate a small revenue analytics pipeline and submit your work as a Jupyter Notebook (.ipynb) file.
Create a `02_python_foundamentals_for_data_analytics.ipynb` file and complete the following tasks. Once finished, push it to GitHub and share the link.
You are analyzing weekly transaction data for a small company.
The company wants to compute final revenues, segment customers, and organize the results in a DataFrame.
Given:
Let revenue be \(r\).
Final revenue formula:
\[ \text{final} = r \times (1 + \text{tax\_rate}) \times (1 - \text{discount\_rate}) \]
Using `max()`.

Segmentation rules:
Why does the order of `elif` statements matter?

Given:
Create a dictionary:
- a `"segment"` key based on revenue
- a `"city"` key

Using:
Convert your data into a DataFrame.
Then:

- add a `"revenue_after_tax"` column
- assign a `"segment"` column using conditional logic
- work with the `"revenue_after_tax"` column

Answer briefly in Markdown:
How does pandas filtering relate to SQL's `WHERE` clause?

Submit the `.ipynb` file. The notebook structure should be clean and readable.
```mermaid
flowchart LR
    A[Raw Revenues] --> B[Arithmetic Transformations]
    B --> C[Conditional Classification]
    C --> D[Deduplication]
    D --> E[Dictionary Structure]
    E --> F[DataFrame]
    F --> G[Filtering & Transformation]
```