from datetime import datetime, date, timedeltaSession 07: Working with Dates
Working with Dates in Python
Before building time-based visualizations, we need to become comfortable working with dates. In analytics, dates are everywhere: transaction dates, signup dates, campaign start dates, delivery timestamps, and monthly summaries.
If you do not understand how Python handles dates, you will struggle with:
- filtering time periods
- calculating intervals
- grouping by month
- building correct visualizations.
In Python, there are two closely related ways to work with dates:
- the built-in
datetimemodule - Pandas date functionality
The datetime module is part of standard Python and is useful for creating and manipulating individual date and time objects.
Pandas extends this idea to entire columns and time series. It is the main tool analysts use when dates are stored inside DataFrames.
Why Date Handling Matters
As we have found out during the SQL sessions, Dates are not just text. Even if a date looks like "2024-07-01", Python must understand that it represents a calendar value rather than an ordinary string.
If dates remain as plain text, many analytical tasks become difficult or incorrect: be calculated properly - resampling and aggregation over time will not work correctly
That is why one of the first tasks in real analytical workflows is converting date columns into proper datetime format.
- sorting may behave like text sorting instead of time sorting
- filtering by year or month becomes inconvenient
- time differences cannot
datetime Module
The built-in datetime module is useful when working with individual date values.
Creating a Date Object
A date object stores only year, month, and day.
simple_date = date(2024, 7, 1)
simple_datedatetime.date(2024, 7, 1)
| Position | Value | Meaning |
|---|---|---|
| 1st | 2024 |
year |
| 2nd | 7 |
month |
| 3rd | 1 |
day |
Creating a Datetime Object
A datetime object stores both date and time.
full_datetime = datetime(2024, 7, 1, 14, 30, 0)
full_datetimedatetime.datetime(2024, 7, 1, 14, 30)
| Position | Value | Meaning |
|---|---|---|
| 1st | 2024 |
year |
| 2nd | 7 |
month |
| 3rd | 1 |
day |
| 4th | 14 |
hour |
| 5th | 30 |
minute |
| 6th | 0 |
second |
Getting the Current Date and Time
today = date.today()
now = datetime.now()
print(f'today is {today}')
print(f'right now is {now}')today is 2026-04-02
right now is 2026-04-02 16:54:47.385112
Accessing Parts of a Datetime
A datetime object has useful attributes.
print(f"Year: {full_datetime.year}")
print(f"Month: {full_datetime.month}")
print(f"Day: {full_datetime.day}")
print(f"Hour: {full_datetime.hour}")
print(f"Minute: {full_datetime.minute}")
print(f"Second: {full_datetime.second}")Year: 2024
Month: 7
Day: 1
Hour: 14
Minute: 30
Second: 0
Formatting Dates as Strings
Most of the times we want to display a date in a readable format.
full_datetime.strftime("%Y-%m-%d")'2024-07-01'
full_datetime.strftime("%d/%m/%Y")'01/07/2024'
full_datetime.strftime("%B %d, %Y")'July 01, 2024'
Common Formatting Codes
| Code | Meaning | Example |
|---|---|---|
%Y |
4-digit year | 2024 |
%m |
2-digit month | 07 |
%d |
2-digit day | 01 |
%H |
hour (24-hour clock) | 14 |
%M |
minute | 30 |
%S |
second | 00 |
%b |
short month name | Jul |
%B |
full month name | July |
Parsing Strings into Datetime
This is the reverse of formatting.
dt = datetime.strptime("2024-07-01", "%Y-%m-%d")
dtdatetime.datetime(2024, 7, 1, 0, 0)
This is useful when reading text dates and converting them into structured date objects.
Arithmetic with Dates
One of the most important date operations is measuring time differences.
start = date(2024, 7, 1)
end = date(2024, 7, 15)
delta = end - start
deltadatetime.timedelta(days=14)
delta.days14
We can also add or subtract days using timedelta.
start + timedelta(days=10)datetime.date(2024, 7, 11)
start - timedelta(days=7)datetime.date(2024, 6, 24)
We can also add or subtract time using timedelta.
When we work with a date object, the most natural granularity is usually days or weeks.
print(start + timedelta(days=10))
print(start - timedelta(days=7))
print(start + timedelta(weeks=2))
print(start - timedelta(weeks=1))2024-07-11
2024-06-24
2024-07-15
2024-06-24
Other granularity are available as well: such as hours, minutes, or seconds, we usually work with a datetime object instead of a plain date.
start_dt = datetime(2024, 7, 1, 14, 30, 0)
print(start_dt + timedelta(days=1))
print(start_dt + timedelta(hours=5))
print(start_dt + timedelta(minutes=45))
print(start_dt + timedelta(seconds=30))
print(start_dt + timedelta(days=2, hours=3, minutes=15, seconds=10))2024-07-02 14:30:00
2024-07-01 19:30:00
2024-07-01 15:15:00
2024-07-01 14:30:30
2024-07-03 17:45:10
Here are the most common timedelta units:
| Argument | Meaning |
|---|---|
days |
add or subtract days |
weeks |
add or subtract weeks |
hours |
add or subtract hours |
minutes |
add or subtract minutes |
seconds |
add or subtract seconds |
So timedelta helps us move forward or backward in time with different levels of granularity.
Why datetime is Useful?
The built-in module is especially useful when:
- creating single date values manually
- measuring time differences
- formatting dates for display
- parsing custom date strings
However, once we start working with full columns of data, Pandas becomes much more practical.
Dates in Pandas
In analytics, dates usually appear inside DataFrames. As we know Pandas provides powerful tools to convert, extract, filter, and aggregate date values.
Creating a Sample DataFrame
import pandas as pd
df_dates = pd.DataFrame({
"customer_id": [101, 102, 103, 104, 105],
"signup_date": ["2024-01-15", "2024-02-20", "2024-03-05", "2024-03-18", "2024-04-10"],
"last_purchase": ["2024-06-01", "2024-06-15", "2024-07-01", "2024-07-08", "2024-07-12"]
})
df_dates| customer_id | signup_date | last_purchase | |
|---|---|---|---|
| 0 | 101 | 2024-01-15 | 2024-06-01 |
| 1 | 102 | 2024-02-20 | 2024-06-15 |
| 2 | 103 | 2024-03-05 | 2024-07-01 |
| 3 | 104 | 2024-03-18 | 2024-07-08 |
| 4 | 105 | 2024-04-10 | 2024-07-12 |
At this point, the dates are still strings.
Checking Data Types
df_dates.dtypescustomer_id int64
signup_date str
last_purchase str
dtype: object
Notice that the date columns are usually stored as object, which means text.
Newer version of pandas, might use txt instead of object.
Converting to Datetime
We use pd.to_datetime().
df_dates["signup_date"] = pd.to_datetime(df_dates["signup_date"])
df_dates["last_purchase"] = pd.to_datetime(df_dates["last_purchase"])
df_dates.dtypescustomer_id int64
signup_date datetime64[us]
last_purchase datetime64[us]
dtype: object
Now Pandas recognizes them as datetime columns.
When importing from the data csv or any other file type, it is recommended to explicitly mention the date type:
pd.read_file('filename.type', parse_dates = ['date_column_name'])Once converted, we can:
- extract year, month, day
- calculate time differences
- filter by time period
- sort properly by calendar order
- group by month, quarter, or year
Extracting Date Components with .dt
Pandas provides the .dt accessor for datetime columns.
Year, Month, and Day
df_dates["signup_year"] = df_dates["signup_date"].dt.year
df_dates["signup_month"] = df_dates["signup_date"].dt.month
df_dates["signup_day"] = df_dates["signup_date"].dt.day
df_dates| customer_id | signup_date | last_purchase | signup_year | signup_month | signup_day | |
|---|---|---|---|---|---|---|
| 0 | 101 | 2024-01-15 | 2024-06-01 | 2024 | 1 | 15 |
| 1 | 102 | 2024-02-20 | 2024-06-15 | 2024 | 2 | 20 |
| 2 | 103 | 2024-03-05 | 2024-07-01 | 2024 | 3 | 5 |
| 3 | 104 | 2024-03-18 | 2024-07-08 | 2024 | 3 | 18 |
| 4 | 105 | 2024-04-10 | 2024-07-12 | 2024 | 4 | 10 |
Month Name and Day Name
df_dates["signup_month_name"] = df_dates["signup_date"].dt.month_name()
df_dates["signup_day_name"] = df_dates["signup_date"].dt.day_name()
df_dates[["signup_date", "signup_month_name", "signup_day_name"]]| signup_date | signup_month_name | signup_day_name | |
|---|---|---|---|
| 0 | 2024-01-15 | January | Monday |
| 1 | 2024-02-20 | February | Tuesday |
| 2 | 2024-03-05 | March | Tuesday |
| 3 | 2024-03-18 | March | Monday |
| 4 | 2024-04-10 | April | Wednesday |
Quarter
df_dates["signup_quarter"] = df_dates["signup_date"].dt.quarter
df_dates[["signup_date", "signup_quarter"]]| signup_date | signup_quarter | |
|---|---|---|
| 0 | 2024-01-15 | 1 |
| 1 | 2024-02-20 | 1 |
| 2 | 2024-03-05 | 1 |
| 3 | 2024-03-18 | 1 |
| 4 | 2024-04-10 | 2 |
Day of Week
Pandas represents weekdays numerically:
- Monday = 0
- Tuesday = 1
- Wednesday = 2
- Thursday = 3
- Friday = 4
- Saturday = 5
- Sunday = 6
df_dates["signup_dow"] = df_dates["signup_date"].dt.dayofweek
df_dates[["signup_date", "signup_dow"]]| signup_date | signup_dow | |
|---|---|---|
| 0 | 2024-01-15 | 0 |
| 1 | 2024-02-20 | 1 |
| 2 | 2024-03-05 | 1 |
| 3 | 2024-03-18 | 0 |
| 4 | 2024-04-10 | 2 |
Weekend Flag
This is a practical example of a derivative date column.
df_dates["is_weekend"] = (df_dates["signup_date"].dt.dayofweek >= 5).astype(int)
df_dates[["signup_date", "signup_dow", "is_weekend"]]| signup_date | signup_dow | is_weekend | |
|---|---|---|---|
| 0 | 2024-01-15 | 0 | 0 |
| 1 | 2024-02-20 | 1 | 0 |
| 2 | 2024-03-05 | 1 | 0 |
| 3 | 2024-03-18 | 0 | 0 |
| 4 | 2024-04-10 | 2 | 0 |
This is especially useful later in behavioral and transaction analysis.
Calculating Differences Between Dates
A very common analytical task is measuring how much time passed between two events.
df_dates["days_between_signup_and_purchase"] = (
df_dates["last_purchase"] - df_dates["signup_date"]
).dt.days
df_dates[["signup_date", "last_purchase", "days_between_signup_and_purchase"]]| signup_date | last_purchase | days_between_signup_and_purchase | |
|---|---|---|---|
| 0 | 2024-01-15 | 2024-06-01 | 138 |
| 1 | 2024-02-20 | 2024-06-15 | 116 |
| 2 | 2024-03-05 | 2024-07-01 | 118 |
| 3 | 2024-03-18 | 2024-07-08 | 112 |
| 4 | 2024-04-10 | 2024-07-12 | 93 |
This pattern is extremely common in analytics:
- days until first purchase
- days since prior order
- time to churn
- time between visits
Filtering by Date Conditions
Once a column is datetime, filtering becomes easy and natural.
Filter Rows After a Given Date
df_dates[df_dates["signup_date"] >= "2024-03-01"]| customer_id | signup_date | last_purchase | signup_year | signup_month | signup_day | signup_month_name | signup_day_name | signup_quarter | signup_dow | is_weekend | days_between_signup_and_purchase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 103 | 2024-03-05 | 2024-07-01 | 2024 | 3 | 5 | March | Tuesday | 1 | 1 | 0 | 118 |
| 3 | 104 | 2024-03-18 | 2024-07-08 | 2024 | 3 | 18 | March | Monday | 1 | 0 | 0 | 112 |
| 4 | 105 | 2024-04-10 | 2024-07-12 | 2024 | 4 | 10 | April | Wednesday | 2 | 2 | 0 | 93 |
Filter a Specific Month
df_dates[df_dates["signup_date"].dt.month == 3]| customer_id | signup_date | last_purchase | signup_year | signup_month | signup_day | signup_month_name | signup_day_name | signup_quarter | signup_dow | is_weekend | days_between_signup_and_purchase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 103 | 2024-03-05 | 2024-07-01 | 2024 | 3 | 5 | March | Tuesday | 1 | 1 | 0 | 118 |
| 3 | 104 | 2024-03-18 | 2024-07-08 | 2024 | 3 | 18 | March | Monday | 1 | 0 | 0 | 112 |
Filter a Specific Year
df_dates[df_dates["signup_date"].dt.year == 2024]| customer_id | signup_date | last_purchase | signup_year | signup_month | signup_day | signup_month_name | signup_day_name | signup_quarter | signup_dow | is_weekend | days_between_signup_and_purchase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101 | 2024-01-15 | 2024-06-01 | 2024 | 1 | 15 | January | Monday | 1 | 0 | 0 | 138 |
| 1 | 102 | 2024-02-20 | 2024-06-15 | 2024 | 2 | 20 | February | Tuesday | 1 | 1 | 0 | 116 |
| 2 | 103 | 2024-03-05 | 2024-07-01 | 2024 | 3 | 5 | March | Tuesday | 1 | 1 | 0 | 118 |
| 3 | 104 | 2024-03-18 | 2024-07-08 | 2024 | 3 | 18 | March | Monday | 1 | 0 | 0 | 112 |
| 4 | 105 | 2024-04-10 | 2024-07-12 | 2024 | 4 | 10 | April | Wednesday | 2 | 2 | 0 | 93 |
Pandas is flexible here because it understands date-like strings.
Sorting by Date
Sorting works correctly only when the column is in datetime format.
This becomes especially important in time-series analysis and line charts.
df_dates.sort_values("signup_date")| customer_id | signup_date | last_purchase | signup_year | signup_month | signup_day | signup_month_name | signup_day_name | signup_quarter | signup_dow | is_weekend | days_between_signup_and_purchase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101 | 2024-01-15 | 2024-06-01 | 2024 | 1 | 15 | January | Monday | 1 | 0 | 0 | 138 |
| 1 | 102 | 2024-02-20 | 2024-06-15 | 2024 | 2 | 20 | February | Tuesday | 1 | 1 | 0 | 116 |
| 2 | 103 | 2024-03-05 | 2024-07-01 | 2024 | 3 | 5 | March | Tuesday | 1 | 1 | 0 | 118 |
| 3 | 104 | 2024-03-18 | 2024-07-08 | 2024 | 3 | 18 | March | Monday | 1 | 0 | 0 | 112 |
| 4 | 105 | 2024-04-10 | 2024-07-12 | 2024 | 4 | 10 | April | Wednesday | 2 | 2 | 0 | 93 |
Creating Date Ranges
Sometimes we need to generate a sequence of dates.
Daily Date Range
pd.date_range(start="2024-07-01", periods=7, freq="D")DatetimeIndex(['2024-07-01', '2024-07-02', '2024-07-03', '2024-07-04',
'2024-07-05', '2024-07-06', '2024-07-07'],
dtype='datetime64[us]', freq='D')
Monthly Date Range
pd.date_range(start="2024-01-01", periods=6, freq="MS")DatetimeIndex(['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01',
'2024-05-01', '2024-06-01'],
dtype='datetime64[us]', freq='MS')
Here, MS means month start.
Common Frequency Codes
| Code | Meaning |
|---|---|
D |
daily |
W |
weekly |
MS |
month start |
M |
month end |
Q |
quarter end |
Y |
year end |
H |
hourly |
Date ranges are useful for:
- synthetic datasets
- building time-series examples
- resampling and alignment
Begining and the End of Month
We can also work with the beginning of the month (bom) and the end of the month (eom), which is very common in analytics.
This is useful when we want to:
- align dates to monthly reporting periods
- calculate monthly summaries
- label values by month start or month end
- prepare data for time-based aggregation
With Pandas, this functionality is very convenient.
import pandas as pd
some_date = pd.Timestamp("2024-07-15")
print(f"Original date: {some_date}")
print(f"Beginning of month: {some_date.to_period('M').start_time}")
print(f"End of month: {some_date.to_period('M').end_time}")Original date: 2024-07-15 00:00:00
Beginning of month: 2024-07-01 00:00:00
End of month: 2024-07-31 23:59:59.999999
Here:
to_period('M')converts the date into a monthly period.start_timereturns the first moment of that month.end_timereturns the last moment of that month
Sometimes the full timestamp is not needed, and we want just the date part.
print(f"Beginning of month (date only): {some_date.to_period('M').start_time.date()}")
print(f"End of month (date only): {some_date.to_period('M').end_time.date()}")Beginning of month (date only): 2024-07-01
End of month (date only): 2024-07-31
This is especially useful in DataFrames.
df_dates["signup_month"] = df_dates["signup_date"].dt.to_period("M").dt.start_time
df_dates["signup_month"] = df_dates["signup_date"].dt.to_period("M").dt.end_time
df_dates| customer_id | signup_date | last_purchase | signup_year | signup_month | signup_day | signup_month_name | signup_day_name | signup_quarter | signup_dow | is_weekend | days_between_signup_and_purchase | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101 | 2024-01-15 | 2024-06-01 | 2024 | 2024-01-31 23:59:59.999999 | 15 | January | Monday | 1 | 0 | 0 | 138 |
| 1 | 102 | 2024-02-20 | 2024-06-15 | 2024 | 2024-02-29 23:59:59.999999 | 20 | February | Tuesday | 1 | 1 | 0 | 116 |
| 2 | 103 | 2024-03-05 | 2024-07-01 | 2024 | 2024-03-31 23:59:59.999999 | 5 | March | Tuesday | 1 | 1 | 0 | 118 |
| 3 | 104 | 2024-03-18 | 2024-07-08 | 2024 | 2024-03-31 23:59:59.999999 | 18 | March | Monday | 1 | 0 | 0 | 112 |
| 4 | 105 | 2024-04-10 | 2024-07-12 | 2024 | 2024-04-30 23:59:59.999999 | 10 | April | Wednesday | 2 | 2 | 0 | 93 |
This creates two new columns:
month_start= first day of the monthmonth_end= last day of the month
Why This Matters
Beginning-of-month and end-of-month functionality is helpful for tasks such as:
- monthly KPI reporting
- billing cycle analysis
- customer cohort preparation
- monthly sales aggregation
- aligning dates before plotting line charts
A Simple Summary
| Functionality | Example |
|---|---|
| beginning of month | some_date.to_period('M').start_time |
| end of month | some_date.to_period('M').end_time |
| beginning of month as date only | some_date.to_period('M').start_time.date() |
| end of month as date only | some_date.to_period('M').end_time.date() |
The end of month returned by Pandas is the last timestamp of that month, not just the last date. That is why you may see a time like:
2024-07-31 23:59:59.999999999
If you want only the calendar date, use .date().
Homework
The following tasks are designed to help you practice the main ideas from the date tutorial you developed, including datetime, timedelta, pd.to_datetime(), .dt accessors, filtering, date ranges, and beginning/end of month functionality.
Practice with the datetime Module
- Create a
dateObject and name itmy_datemy_datemust be representing March 15, 2025.- Print each component separately using f-strings:
- the full value
- the year
- the month
- the day
- Create a
datetimeObject and name itmeeting_timemeeting_timemustbe representing October 8, 2025 at 16:45:30.- Print each component separately using f-strings:
- year
- month
- day
- hour
- minute
- second
- Format Dates as Strings using
meeting_time, display the date in the following formats:YYYY-MM-DDDD/MM/YYYY- full month name, day, year
- Parse a String into a Datetime
- Convert the string
"2025-12-31"into a Python datetime object usingstrptime(). - Then print the resulting object again using f-strings.
- Convert the string
- Date Arithmetic
start = date(2025, 1, 10)end = date(2025, 2, 5)- calculate the difference between
endandstart - the number of days only
- Work with
timedeltaUsingstart, calculate and print with f-strings:- 10 days after
start - 14 days before
start - 3 weeks after
start - Then create a datetime object and add:
- 5 hours
- 20 minutes
- 45 seconds
- 10 days after
Practice with Pandas Dates
- Create a DataFrame with Date Strings
| customer_id | signup_date | last_purchase |
|---|---|---|
| 1 | 2024-01-10 | 2024-02-15 |
| 2 | 2024-02-05 | 2024-03-01 |
| 3 | 2024-03-12 | 2024-03-28 |
| 4 | 2024-04-01 | 2024-05-10 |
| 5 | 2024-05-20 | 2024-06-25 |
- Check the data types of all columns.
- Convert String Columns to Datetime
- convert
signup_dateinto proper datetime columns - convert
last_purchaseinto proper datetime columns - Then print the data types again. (you must see
pd.datetimeinstead of ) - Extract Date Components
signup_date, create the following new columns:signup_yearsignup_monthsignup_daysignup_quartersignup_month_namesignup_day_name
- Create a Day-of-Week and Weekend Flag
signup_dowusing.dt.dayofweekis_weekendwhere weekend =1and weekday =0
- Create a new column called
days_between_signup_and_purchase. It should show the number of days betweensignup_dateandlast_purchase. - Filter Data: Write code to return:
- customers who signed up after
"2024-03-01" - customers who signed up in March
- customers who signed up in 2024
- customers who signed up after
- Sort by Date: Sort the DataFrame by
signup_datefrom earliest to latest. - Generate a daily date range starting from
"2024-07-01"for 10 days. - Generate a monthly date range starting from
"2024-01-01"for 12 months using month start frequency. - Create a DataFrame with:
- a
datecolumn covering 15 daily dates starting from"2024-01-01" - a
salescolumn with any numeric values you choose - Then create the following derivative columns:
yearmonthdayday_namedowis_weekend
- a
- Beginning and End of Month for One Date for
some_date = pd.Timestamp("2024-08-18"), then print:- original date
- beginning of month
- end of month
- beginning of month as date only
- end of month as date only
- Beginning and End of Month for a Full ColumnUsing the
signup_datecolumn in your DataFrame, create two new columns:month_startmonth_end- Make sure both are derived from
signup_date.