Data Analytics Bootcamp
  • Syllabus
  • Statistical Thinking
  • SQL
  • Python
  • Tableau
  • Lab
  • Capstone
  1. Python
  2. Python
  3. Session 09: Customer Segmentation | RFM
  • Syllabus
  • Statistical Thinking
    • Statistics
      • Statistics Session 01: Data Layers and Bias in Data
      • Statistics Session 02: Data Types
      • Statistics Session 03: Probabilistic Distributions
      • Statistics Session 04: Probabilistic Distributions
      • Statistics Session 05: Sampling
      • Statistics Session 06: Inferential Statistics
      • Slides
        • Course Intro
        • Descriptive Stats
        • Data Types
        • Continuous Distributions
        • Discrete Distributions
        • Sampling
        • Hypothesis Testing
  • SQL
    • SQL
      • Session 01: Intro to Relational Databases
      • Session 02: Intro to PostgreSQL
      • Session 03: DA with SQL | Data Types & Constraints
      • Session 04: DA with SQL | Filtering
      • Session 05: DA with SQL | Numeric Functions
      • Session 06: DA with SQL | String Functions
      • Session 07: DA with SQL | Date Functions
      • Session 08: DA with SQL | JOINs
      • Session 09: DA with SQL | Advanced SQL
      • Session 10: DA with SQL | Advanced SQL Functions
      • Session 11: DA with SQL | UDFs, Stored Procedures
      • Session 12: DA with SQL | Advanced Aggregations
      • Session 13: DA with SQL | Final Project
      • Slides
        • Intro to Relational Databases
        • Intro to PostgreSQL
        • Basic Queries: DDL DLM
        • Filtering
        • Numeric Functions
        • String Functions
        • Date Functions
        • Normalization and JOINs
        • Temporary Tables
        • Advanced SQL Functions
        • Reporting and Analysis with SQL
        • Advanced Aggregations
  • Python
    • Python
      • Session 01: Programming for Data Analysts
      • Session 02: Python basic Syntax, Data Structures
      • Session 03: Introduction to Pandas
      • Session 04: Advanced Pandas
      • Session 05: Intro to Data Visualization
      • Session 06: Data Visualization
      • Session 07: Working with Dates
      • Session 08: Data Visualization | Plotly
      • Session 09: Customer Segmentation | RFM
      • Slides
        • Data Analyst
  • Tableau
    • Tableau
      • Tableau Session 01: Introduction to Tableau
      • Tableau Session 02: Intermediate Visual Analytics
      • Tableau Session 03: Advanced Analytics
      • Tableau Session 04: Dashboard Design & Performance
      • Slides
        • Data Analyst
        • Data Analyst
        • Data Analyst
        • Data Analyst

On this page

  • RFM
    • Downloading the Data
    • csv_downloader()
    • Customer Segmentation
    • Why Segment Customers?
    • What is RFM
    • The Components of RFM
    • RFM Score
    • Loading Packages and Data
    • Revenue Distribution
    • R, F, M Scores
    • RFM Scores and Distributions
    • RFM Score and RFM Segment
    • Finding Top Customers
    • Number of RFM Segments
    • Naming the Segments
    • Segment Distribution
    • Segments’ Distribution
    • Segments’ Visualization
  1. Python
  2. Python
  3. Session 09: Customer Segmentation | RFM

Session 09: Customer Segmentation | RFM

Plotly
Data Visualization
Segmentation
RFM

RFM

Downloading the Data

Before starting this exercise first let’s download the data, which you can find here

Tip

Try to download this data the way have learnt during the previous sessions.

Let’s also try to create a function named csv_downloader(URL, name, path) which will take URL, name and path as an arguments

URL = 'https://raw.githubusercontent.com/hovhannisyan91/data_analytics_with_python/refs/heads/main/data/rfm/orders.csv'
df = pd.read_csv(URL)

csv_downloader()

def csv_downloader(URL:str, name:str, path:str)->pd.Dataframe:
    df = pd.read_csv(URL)
    df.to_csv(f'{path}/{name}.csv', index = False)
    print(f'The {name}.csv was successfully stored in {path}')
    return df

Now we can use this to make our operations faster and efficient.

csv_downloader(URL = URL, name = 'orders',path = '../data')

Customer Segmentation

Customer segmentation can be defined as the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing.

Segmentation assumes:

  1. Identify segmentation bases and segment the market
  2. Develop a profile of the resulting segments

Why Segment Customers?

In short For differentiation!

In Marketing and Service:

  • Make more focused/targeted marketing
  • Identify the most profitable and at-risk customers
  • Build relationships
  • Create user personas

In Product and Brand:

  • Brand to appeal to particular segments
  • Customize products and services
  • Predict future purchasing patterns

In Pricing:

  • Pricing products by groups
  • Determine willingness to pay for optimal value

What is RFM

RFM (recency, frequency, monetary) analysis is a behavior-based technique used to segment customers by examining their transaction history.

  • how recently a customer has purchased (Recency)
  • how often they purchase (Frequency)
  • how much the customer spends (Monetary)

RFM model is a straightforward yet effective method for segmentation, especially in retail.

RFM helps to identify customers who are more likely to respond to promotions by segmenting them into various categories.

The Components of RFM

  • A Recency Score is assigned to each customer based on the most recent purchase date. The score is generated by binning the recency values into several categories. For example, suppose one uses four categories. In that case, the customers with the most recent purchase dates receive a recency ranking of 4, and those with purchase dates in the distant past receive a recency ranking of 1.
  • A Frequency Score is assigned similarly. Customers with high purchase frequency are assigned a higher score (4 or 5), and those with the lowest frequency score are assigned 1.
  • A Monetary Score is assigned based on the total revenue generated by the customer in the period under consideration for the analysis. Customers with the highest revenue or order amount are assigned a higher score, while those with the lowest ones are assigned a score of 1.

RFM Score

RFM score is generated by simply combining the above mentioned components or scores into a single value.

RFM Score could be calculated as:

  • Average of the three scores
  • Weighted average
  • Simple concatenation

Loading Packages and Data

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

pd.set_option('display.max_columns', None)
px.defaults.template = "plotly_white"
orders = pd.read_csv('../data/rfm/orders.csv', parse_dates=['order_date'])
orders.head()
customer_id order_date revenue
0 Mr. Brion Stark Sr. 2004-12-20 32
1 Ethyl Botsford 2005-05-02 36
2 Hosteen Jacobi 2004-03-06 116
3 Mr. Edw Frami 2006-03-15 99
4 Josef Lemke 2006-08-14 76

Revenue Distribution

fig = px.histogram(
    orders,
    x='revenue',
    nbins=30,
    title='Histogram of Revenue'
)

fig.update_layout(
    xaxis_title='Revenue',
    yaxis_title='Count'
)

fig.show()

R, F, M Scores

Let us create R, F, and M columns.

max_date = orders['order_date'].max()

rfm = orders.groupby('customer_id').agg(
    {
        'order_date': lambda date: (max_date - date.max()).days,
        'customer_id': lambda num: len(num),
        'revenue': lambda price: price.sum()
    }
)

rfm.columns = ['Recency', 'Frequency', 'Monetary']
rfm.reset_index(inplace=True)
rfm.head()
customer_id Recency Frequency Monetary
0 Abbey O'Reilly DVM 204 6 472
1 Add Senger 139 3 340
2 Aden Lesch Sr. 193 4 405
3 Admiral Senger 131 5 448
4 Agness O'Keefe 89 9 843
Important

Think about a reasons we used lambda function…

Let us create 5 categories based on the quartiles for each RFM variable using pd.qcut() method

rfm['R'] = pd.qcut(rfm['Recency'], 4, ['4', '3', '2', '1'])
rfm['F'] = pd.qcut(rfm['Frequency'], 4, ['1', '2', '3', '4'])
rfm['M'] = pd.qcut(rfm['Monetary'], 4, ['1', '2', '3', '4'])
Important

Pay attention to the order of Frequency \((\downarrow)\).

RFM Scores and Distributions

rfm.head()
customer_id Recency Frequency Monetary R F M
0 Abbey O'Reilly DVM 204 6 472 3 3 3
1 Add Senger 139 3 340 3 1 2
2 Aden Lesch Sr. 193 4 405 3 2 2
3 Admiral Senger 131 5 448 4 2 3
4 Agness O'Keefe 89 9 843 4 4 4

Distributions of R, F, M

from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=("Recency", "Frequency", "Monetary")
)

fig.add_trace(
    go.Histogram(x=rfm["Recency"], nbinsx=30, name="Recency"),
    row=1,
    col=1
)

fig.add_trace(
    go.Histogram(x=rfm["Frequency"], nbinsx=30, name="Frequency"),
    row=1,
    col=2
)

fig.add_trace(
    go.Histogram(x=rfm["Monetary"], nbinsx=30, name="Monetary"),
    row=1,
    col=3
)

fig.update_layout(
    title="",
    showlegend=False
)

fig.update_xaxes(title_text="Recency", row=1, col=1)
fig.update_xaxes(title_text="Frequency", row=1, col=2)
fig.update_xaxes(title_text="Monetary", row=1, col=3)

fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=2)
fig.update_yaxes(title_text="Count", row=1, col=3)

fig.show()

RFM Score and RFM Segment

  • RFM Segments: converting R, F, M scores to string and concatenating
  • RFM Score: sum of the R, F, M scores
rfm['RFM_Score'] = rfm.R.astype(int) + rfm.F.astype(int) + rfm.M.astype(int)
rfm['RFM_Segment'] = rfm.R.astype(str) + rfm.F.astype(str) + rfm.M.astype(str)

rfm.head()
customer_id Recency Frequency Monetary R F M RFM_Score RFM_Segment
0 Abbey O'Reilly DVM 204 6 472 3 3 3 9 333
1 Add Senger 139 3 340 3 1 2 6 312
2 Aden Lesch Sr. 193 4 405 3 2 2 7 322
3 Admiral Senger 131 5 448 4 2 3 9 423
4 Agness O'Keefe 89 9 843 4 4 4 12 444

Finding Top Customers

Sorting RFM_Segment in descending order, we will get the list of top customers.

rfm = rfm.sort_values('RFM_Segment', ascending=False)
rfm.head()
customer_id Recency Frequency Monetary R F M RFM_Score RFM_Segment
4 Agness O'Keefe 89 9 843 4 4 4 12 444
5 Aileen Barton 83 9 763 4 4 4 12 444
40 Ariel Yundt 125 10 1069 4 4 4 12 444
50 Aurthur Kirlin II 115 9 856 4 4 4 12 444
51 Auther Haley 56 9 878 4 4 4 12 444

Number of RFM Segments

rfm_segments = rfm.groupby('RFM_Segment').count().reset_index(level=0)
rfm_segments.head()
RFM_Segment customer_id Recency Frequency Monetary R F M RFM_Score
0 111 90 90 90 90 90 90 90 90
1 112 31 31 31 31 31 31 31 31
2 113 6 6 6 6 6 6 6 6
3 121 11 11 11 11 11 11 11 11
4 122 29 29 29 29 29 29 29 29
fig = px.bar(
    rfm_segments.sort_values('customer_id'),
    x='customer_id',
    y='RFM_Segment',
    orientation='h',
    title='Number of Customers by RFM Segment'
)

fig.update_layout(
    xaxis_title='Number of Customers',
    yaxis_title='RFM Segment'
)

fig.show()

Naming the Segments

The next step for customer segmentation is naming those very segments. The number of segments depends on the scope of the marketing campaign. Here naming() function just contains a set of conditional rules, which comes from classical segmentation.

def naming(df):
    if df['RFM_Score'] >= 9:
        return 'Can\'t Loose Them'
    elif ((df['RFM_Score'] >= 8) and (df['RFM_Score'] < 9)):
        return 'Champions'
    elif ((df['RFM_Score'] >= 7) and (df['RFM_Score'] < 8)):
        return 'Loyal/Commited'
    elif ((df['RFM_Score'] >= 6) and (df['RFM_Score'] < 7)):
        return 'Potential'
    elif ((df['RFM_Score'] >= 5) and (df['RFM_Score'] < 6)):
        return 'Promising'
    elif ((df['RFM_Score'] >= 4) and (df['RFM_Score'] < 5)):
        return 'Requires Attention'
    else:
        return 'Demands Activation'

Now let us apply the above-defined function by creating Segment_Name column.

rfm['Segment_Name'] = rfm.apply(naming, axis=1)
rfm.head(5)
customer_id Recency Frequency Monetary R F M RFM_Score RFM_Segment Segment_Name
4 Agness O'Keefe 89 9 843 4 4 4 12 444 Can't Loose Them
5 Aileen Barton 83 9 763 4 4 4 12 444 Can't Loose Them
40 Ariel Yundt 125 10 1069 4 4 4 12 444 Can't Loose Them
50 Aurthur Kirlin II 115 9 856 4 4 4 12 444 Can't Loose Them
51 Auther Haley 56 9 878 4 4 4 12 444 Can't Loose Them

Segment Distribution

grouped_by = rfm.groupby('Segment_Name').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'Monetary': ['mean', 'count']
}).round(1)

Segments’ Distribution

grouped_by
Recency Frequency Monetary
mean mean mean count
Segment_Name
Can't Loose Them 166.6 7.2 707.2 335
Champions 212.0 5.0 489.6 114
Demands Activation 600.5 1.9 151.0 90
Loyal/Commited 275.5 4.7 429.3 147
Potential 286.8 3.9 352.1 132
Promising 361.4 3.4 294.8 97
Requires Attention 460.7 2.8 246.0 80

Segments’ Visualization

Using Plotly, we can visualize segments with a treemap.

grouped_by.columns = ['RecencyMean', 'FrequencyMean', 'MonetaryMean', 'Count']
grouped_by = grouped_by.reset_index()

grouped_by
Segment_Name RecencyMean FrequencyMean MonetaryMean Count
0 Can't Loose Them 166.6 7.2 707.2 335
1 Champions 212.0 5.0 489.6 114
2 Demands Activation 600.5 1.9 151.0 90
3 Loyal/Commited 275.5 4.7 429.3 147
4 Potential 286.8 3.9 352.1 132
5 Promising 361.4 3.4 294.8 97
6 Requires Attention 460.7 2.8 246.0 80
fig = px.treemap(
    grouped_by,
    path=['Segment_Name'],
    values='Count',
    color='MonetaryMean',
    hover_data=['RecencyMean', 'FrequencyMean', 'MonetaryMean'],
    title='RFM Segments'
)

fig.show()