Apr 20, 2020
May 4, 2025

RFM Customer Segmentation

 
 
 

1. Data Loading and Initial Cleaning

Goal: Prepare the dataset by removing irrelevant or problematic records.
Key Actions:
  • Load the dataset using pandas
  • Remove records with negative quantity (these are typically returns)
  • Remove rows with missing CustomerID
  • Create a new column for Amount = Quantity × UnitPrice


2. Data Cleaning and Preprocessing

Purpose: Ensure the dataset only contains valid, useful transaction data.
  • Drop missing customer IDs
  • Filter out cancelled transactions (InvoiceNo starting with 'C')
  • Remove exact duplicate rows
  • Remove negative Quantity and UnitPrice entries
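The four rules above can be sketched in pandas as follows. This is a minimal illustration, not the original notebook code: the helper name clean_transactions and the sample rows are made up, and treating zero values as invalid (using > 0 rather than >= 0) is an assumption.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning rules in order (hypothetical helper)."""
    df = df.dropna(subset=["CustomerID"])                       # drop missing customer IDs
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]   # drop cancelled invoices
    df = df.drop_duplicates()                                   # drop exact duplicate rows
    # Assumption: zero quantities and prices are also treated as invalid.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    return df.reset_index(drop=True)

# Made-up sample: one duplicate, one cancellation, one zero price, one missing ID.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536381", "536382"],
    "Quantity": [6, 6, -1, 3, 2, 5],
    "UnitPrice": [2.55, 2.55, 27.50, 0.0, 1.25, 4.00],
    "CustomerID": [17850.0, 17850.0, 14527.0, 13047.0, None, 12583.0],
})
clean = clean_transactions(raw)
print(clean)
```

Only the first and last sample rows survive all four filters.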

3. Handling Outliers

Purpose: Cap extreme values in Quantity and UnitPrice using bounds derived from the 1st and 99th percentiles.
You defined two functions:
  • outlier_thresholds: Calculates lower and upper bounds from the 1st and 99th percentiles, following the pattern of the IQR method.
  • replace_with_threshold: Replaces values outside those bounds with the nearest bound.
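The two helpers might look roughly like this. The source mentions both the 1st and 99th percentiles and the IQR method, so this sketch assumes an IQR-style rule (1.5 times the spread) applied to those two percentiles. The signatures, the 1.5 multiplier, and the sample data are assumptions, not the original code.

```python
import pandas as pd

def outlier_thresholds(df, column, low_q=0.01, high_q=0.99):
    """Return (lower, upper) bounds; IQR-style rule on wide percentiles (assumed)."""
    low = df[column].quantile(low_q)
    high = df[column].quantile(high_q)
    spread = high - low
    return low - 1.5 * spread, high + 1.5 * spread

def replace_with_threshold(df, column):
    """Return a copy of df with values outside the bounds set to the nearest bound."""
    lower, upper = outlier_thresholds(df, column)
    capped = df.copy()
    capped[column] = capped[column].clip(lower=lower, upper=upper)
    return capped

# Made-up sample: 99 ordinary quantities and one extreme value.
sample = pd.DataFrame({"Quantity": [10] * 99 + [1_000_000]})
capped = replace_with_threshold(sample, "Quantity")
print(capped["Quantity"].max())
```

With bounds this wide, only very extreme values are changed, which keeps ordinary large orders intact.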

4. Creating Monetary Feature

You added a new column:
  • Amount = Quantity × UnitPrice – representing total transaction value.

5. Converting Dates and Defining Reference Date

You converted InvoiceDate to datetime and defined Latest_Date as one day after the most recent transaction. Recency is measured back from this reference date.
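A minimal sketch of the Amount column and the reference date, using made-up rows. The column names follow the text; everything else is illustrative.

```python
import pandas as pd

# Made-up transactions.
tx = pd.DataFrame({
    "InvoiceDate": ["2011-12-01 08:26", "2011-12-09 12:50", "2011-11-15 10:00"],
    "Quantity": [6, 2, 10],
    "UnitPrice": [2.50, 4.00, 1.25],
})

tx["Amount"] = tx["Quantity"] * tx["UnitPrice"]          # revenue per line
tx["InvoiceDate"] = pd.to_datetime(tx["InvoiceDate"])    # string -> datetime64

# Reference date: one day after the most recent transaction.
Latest_Date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)
print(Latest_Date)
```

Adding one day means a customer who bought on the last day in the data gets a Recency of 1 rather than 0.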

6. Constructing the RFM Table

You group data by CustomerID and calculate:
  • Recency: Days since last purchase
  • Frequency: Number of unique invoices
  • Monetary: Total amount spent
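The grouping step can be sketched with a single named aggregation. The sample data is made up, and the exact aggregation code in the original notebook may differ.

```python
import pandas as pd

# Made-up transactions; invoice A1 has two lines and counts once.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-11-01", "2011-12-08",
        "2011-10-01", "2011-10-20", "2011-12-09",
    ]),
    "Amount": [10.0, 5.0, 20.0, 7.5, 2.5, 100.0],
})
Latest_Date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (Latest_Date - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                               # unique invoices
    Monetary=("Amount", "sum"),                                       # total spend
)
print(rfm)
```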


7. Creating the Interpurchase Time Feature

Purpose: Add behavioral insight – average time between repeat purchases.
  • You filtered out customers with only one purchase.
  • Calculated Shopping_Cycle as the number of days between the first and last purchase.
  • Calculated Interpurchase_Time as:
    • Shopping_Cycle / Frequency
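A sketch of this feature on made-up data. Integer division is an assumption, chosen because the sample table in the next step shows whole numbers (a Shopping_Cycle of 365 with a Frequency of 7 gives 52). The intermediate column names First and Last are hypothetical.

```python
import pandas as pd

# Made-up transactions; customer 2 has a single purchase and is dropped.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A2", "A3", "B1", "C1", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-01", "2011-06-01", "2011-12-31",
        "2011-05-05", "2011-03-01", "2011-03-31",
    ]),
})

per_customer = tx.groupby("CustomerID").agg(
    Frequency=("InvoiceNo", "nunique"),
    First=("InvoiceDate", "min"),
    Last=("InvoiceDate", "max"),
)

repeat = per_customer[per_customer["Frequency"] > 1].copy()   # repeat buyers only
repeat["Shopping_Cycle"] = (repeat["Last"] - repeat["First"]).dt.days
repeat["Interpurchase_Time"] = repeat["Shopping_Cycle"] // repeat["Frequency"]
print(repeat[["Frequency", "Shopping_Cycle", "Interpurchase_Time"]])
```

Dividing by Frequency follows the formula in the text. Dividing by Frequency − 1 would give the mean gap between consecutive purchases instead, so the two definitions should not be mixed.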


8. RFM Scoring

You assigned RFM scores from 1 to 5 using qcut:
  • R_score: Lower Recency is better (more recent), so reverse the scale.
  • F_score and M_score: Higher Frequency and Monetary values are better.
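The scoring can be sketched with pd.qcut on made-up values. Ranking Frequency with method="first" before binning is an assumption: it is a common way to avoid duplicate bin edges when many customers share the same count, and it is consistent with the sample table, where customers with a Frequency of 3 receive F_scores of both 2 and 3.

```python
import pandas as pd

# Made-up RFM values for five customers.
rfm = pd.DataFrame(
    {
        "Recency": [5, 80, 40, 20, 10],
        "Frequency": [7, 3, 8, 3, 1],
        "Monetary": [500.0, 100.0, 400.0, 300.0, 200.0],
    },
    index=pd.Index(["A", "B", "C", "D", "E"], name="CustomerID"),
)

up = [1, 2, 3, 4, 5]      # higher value -> higher score
down = [5, 4, 3, 2, 1]    # lower value -> higher score (Recency)

rfm["R_score"] = pd.qcut(rfm["Recency"], 5, labels=down).astype(int)
# rank(method="first") breaks ties so the quantile edges stay unique (assumed step).
rfm["F_score"] = pd.qcut(rfm["Frequency"].rank(method="first"), 5, labels=up).astype(int)
rfm["M_score"] = pd.qcut(rfm["Monetary"], 5, labels=up).astype(int)

# Concatenate the three digits into a code such as "545".
rfm["RFM_Score"] = (
    rfm["R_score"].astype(str) + rfm["F_score"].astype(str) + rfm["M_score"].astype(str)
)
print(rfm)
```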

CustomerID  Recency  Frequency  Monetary  Shopping_Cycle  Interpurchase_Time  R_score  F_score  M_score  RFM_Score
12347.0           2          7   4310.00             365                  52        5        4        5        545
12348.0          75          4   1770.78             282                  70        2        3        4        234
12352.0          36          8   1756.34             260                  32        3        5        4        354
12356.0          23          3   2811.43             302                 100        3        2        5        325
12358.0           2          2   1150.42             149                  74        5        1        3        513
...             ...        ...       ...             ...                 ...      ...      ...      ...        ...
18272.0           3          6   3078.58             244                  40        5        4        5        545
18273.0           2          3    204.00             255                  85        5        3        1        531
18282.0           8          2    178.05             118                  59        5        2        1        521
18283.0           4         16   2045.53             333                  20        5        5        4        554
18287.0          43          3   1837.28             158                  52        3        3        4        334

2845 rows × 9 columns
 

9. Customer Segmentation Based on RFM Scores

You defined customer segments using a custom logic:
  • CORE USER: High scores in all three dimensions
  • KEY RETENTION: Good recency and frequency
  • HIGH-VALUE CHURN: High spenders at risk
  • NEW USER: Only recency is high
  • GENERAL USER: Everyone else
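The exact rules are not given in the text, so the following is only one possible encoding of the five segments. The cut-offs (a score of 4 or more counts as high, 2 or less as low), the order of the checks, and the function name assign_segment are all assumptions.

```python
def assign_segment(r: int, f: int, m: int, high: int = 4, low: int = 2) -> str:
    """Map R, F, M scores (1-5) to a segment name using hypothetical thresholds."""
    if r >= high and f >= high and m >= high:
        return "CORE USER"            # high on all three dimensions
    if r >= high and f >= high:
        return "KEY RETENTION"        # recent and frequent, lower spend
    if m >= high and r <= low:
        return "HIGH-VALUE CHURN"     # big spender who has not returned recently
    if r >= high and f < high and m < high:
        return "NEW USER"             # only recency is high
    return "GENERAL USER"             # everyone else

# Apply the rules to a few RFM codes.
codes = ["545", "552", "234", "513", "354"]
segments = {code: assign_segment(*map(int, code)) for code in codes}
print(segments)
```

The checks run from most to least specific, so a customer who qualifies for several segments lands in the first one that matches.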


10. Visualization

  • Bar plot of customer counts per segment
  • Heatmap of average Monetary spend for each R_score and F_score combination
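The data behind both charts can be prepared with two pandas calls. This sketch stops before plotting, and the scored table is made up.

```python
import pandas as pd

# Made-up scored customers.
scored = pd.DataFrame({
    "Segment": ["CORE USER", "NEW USER", "GENERAL USER", "CORE USER", "GENERAL USER"],
    "R_score": [5, 5, 3, 5, 3],
    "F_score": [5, 1, 3, 5, 3],
    "Monetary": [4000.0, 150.0, 900.0, 2000.0, 1100.0],
})

# Bar chart input: number of customers per segment.
segment_counts = scored["Segment"].value_counts()

# Heatmap input: mean Monetary for each (R_score, F_score) cell; empty cells are NaN.
heatmap_data = scored.pivot_table(
    index="R_score", columns="F_score", values="Monetary", aggfunc="mean"
)
print(segment_counts)
print(heatmap_data)
```

From here, segment_counts can be drawn as a bar chart and heatmap_data passed to a heatmap function in whichever plotting library is in use, for example seaborn's heatmap.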


Optional Enhancements

  • Add Tableau or Power BI dashboards to visualize segment insights
  • Replace rule-based segmentation with machine learning clustering
  • Combine RFM scores with demographic or behavioral data
  • Link segments to marketing actions (email campaigns, discounts, churn prevention)

Project Description (STAR Framework)

Project Title: Customer Segmentation Using RFM Model
  • Situation: The company needed to identify key customer groups for personalized marketing campaigns
  • Task: Build an RFM model to segment customers based on purchasing behavior
  • Action:
    • Cleaned and preprocessed over 500,000 transaction records
    • Calculated Recency, Frequency, and Monetary metrics for each customer
    • Assigned scores using quantiles and created a segmentation rule set
    • Visualized segment distribution and created strategic insights
  • Result:
    • Identified top 5 customer segments including champions and at-risk groups
    • Enabled targeted campaigns that improved customer retention and boosted sales by X percent
 
So basically, I started with a dataset of online retail transactions. Each row represented a purchase, and it included things like invoice number, product code, quantity, unit price, date, customer ID, and country.
First, I cleaned the data. I removed any rows where the customer ID was missing because we can’t track those users. I also filtered out any cancelled orders, which usually have invoice numbers starting with "C", and I dropped any exact duplicates. Then I checked for negative values in quantity and price and removed those rows.
After that, I handled outliers. I used the 1st and 99th percentiles to set the boundaries and capped any extreme values for quantity and price to keep things more realistic. That way, a few huge or weird transactions don’t mess up the analysis.
Then I created a new column called Amount, which is just quantity multiplied by price — it tells us how much revenue came from each transaction.
Next, I calculated the core RFM values:
  • Recency is how recently a customer made a purchase: the number of days between their last purchase and the reference date, which is one day after the latest transaction in the data.
  • Frequency is how often they purchased, which I measured by counting the number of unique invoices.
  • Monetary is how much they spent in total.
After calculating those, I added another feature called Interpurchase Time, which is basically the average time between purchases. It gives a sense of how regularly a customer shops, and it’s only calculated for customers who made more than one purchase.
Once I had all those features, I scored each customer from 1 to 5 for Recency, Frequency, and Monetary using quantiles. Then I combined those scores into a single RFM Score, like “555” or “213”.
Based on those scores, I grouped customers into segments. For example:
  • Core Users are recent, frequent, and high-spending.
  • Key Retention are valuable but may need encouragement to stay.
  • High-Value Churn are big spenders who haven’t returned recently.
  • New Users just started buying.
  • And General Users are the rest.
Finally, I visualized everything — a bar plot to show how many customers are in each group, and a heatmap showing how average spending changes depending on Recency and Frequency combinations.
And that’s the whole RFM analysis — it helps businesses understand customer behavior, target marketing efforts, and improve retention.
 

1. How does RFM compare to unsupervised clustering like KMeans?

Answer:
RFM is rule-based and interpretable. It’s quick, simple, and business-friendly. You can clearly explain why a customer belongs to a segment.
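KMeans, on the other hand, is data-driven: it groups customers by overall similarity across all features at once, so it can surface groupings that fixed score thresholds might miss. It does assume roughly round, similarly sized clusters, so skewed RFM values usually need a log transform and scaling first.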
In practice, I like to use RFM for baseline segmentation and then apply clustering for further granularity.

2. Should RFM analysis be done for different product categories, like menswear vs womenswear?

Answer:
Yes. Customer behavior can vary drastically between categories. Segmenting based on all purchases might dilute important signals.
Performing RFM within categories like menswear vs womenswear can reveal insights such as which customer segments prefer which type of product, and help tailor campaigns more effectively.

3. Can RFM be integrated with customer lifetime value (LTV) models?

Answer:
Definitely. RFM is a backward-looking snapshot of customer behavior, while LTV forecasts long-term value. Combining both gives a more complete picture.
For example, a customer who bought recently but has a low purchase frequency might still have a high predicted LTV if they consistently spend big.
LTV can also help prioritize RFM segments — like focusing marketing budget on high-LTV users in retention-risk segments.

4. How would you create a Tableau dashboard for RFM to support marketing teams?

Answer:
I’d build a dashboard with filters by RFM segment, country, and product category. It would include KPIs like average order value, recent churn rates, and segment growth over time.
A heatmap of R vs F with color-coded average monetary value is also effective.
I’d also add interactivity — clicking a segment updates recommended campaign actions or customer lists.

5. How frequently should RFM scores be updated? Daily, monthly, quarterly?

Answer:
It depends on business dynamics.
  • For fast-moving e-commerce: Weekly or biweekly makes sense.
  • For retail or B2B: Monthly or quarterly might be enough.
More frequent updates help catch recent churn risks or newly active customers. I’d also monitor trends in recency to adjust campaign timing dynamically.

6. How do different RFM segments trigger different marketing strategies?

Answer:
Segment                 Strategy
Core Users              VIP programs, early access, loyalty points
New Users               Welcome offers, onboarding emails
High-Value Churn Risk   Win-back emails, exclusive discounts
Low-Frequency Spenders  Frequency boost campaigns
Lapsed or Low RFM       Reactivation SMS, surveys, or remove from list

These can also be tied into marketing automation platforms like HubSpot or Salesforce to trigger automatically.

7. What are some challenges of using RFM?

Answer:
  • It assumes equal weight for R, F, M unless otherwise adjusted
  • Not suitable for businesses with irregular purchase cycles (e.g. B2B with seasonal orders)
  • Sensitive to outliers — hence I included outlier capping in preprocessing
  • Frequency of one can be misleading (new customer vs one-off buyer)
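On the first point, one way to move away from equal weighting is a weighted composite score. The function name and the weights below are purely illustrative assumptions. In practice they would come from business priorities or be fitted against an outcome such as repeat purchase.

```python
def weighted_rfm(r: int, f: int, m: int, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted average of the three scores; weights are hypothetical."""
    w_r, w_f, w_m = weights
    return (w_r * r + w_f * f + w_m * m) / (w_r + w_f + w_m)

# Same customer (R=5, F=1, M=3) under equal and recency-heavy weighting.
equal = weighted_rfm(5, 1, 3, weights=(1, 1, 1))
recency_heavy = weighted_rfm(5, 1, 3)
print(equal, recency_heavy)
```

The recency-heavy version rates this customer higher than the equal-weight version, because their strongest dimension is the one given the most weight.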

8. How could RFM be extended or improved?

Answer:
  • Add time-decayed versions of F and M to reflect recent trends
  • Integrate behavioral features like product types, channel (mobile vs web)
  • Use predictive modeling (e.g. churn prediction) on top of RFM scores
  • Combine with web/app session data for a more complete user profile
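As a sketch of the first idea, time-decayed Frequency and Monetary can be computed by weighting each purchase by its age with an exponential half-life. The function name, the 90-day half-life, and the sample purchases are assumptions for illustration.

```python
from datetime import date, timedelta

def decayed_frequency_and_monetary(purchases, as_of, half_life_days=90.0):
    """purchases: iterable of (purchase_date, amount) pairs.

    Each purchase gets weight 0.5 ** (age / half_life), so a purchase made
    one half-life ago counts half as much as one made today.
    """
    frequency = 0.0
    monetary = 0.0
    for when, amount in purchases:
        age_days = (as_of - when).days
        weight = 0.5 ** (age_days / half_life_days)
        frequency += weight
        monetary += weight * amount
    return frequency, monetary

# Made-up customer: three purchases of 100, made 0, 90, and 180 days ago.
as_of = date(2011, 12, 10)
purchases = [(as_of - timedelta(days=d), 100.0) for d in (0, 90, 180)]
freq, mon = decayed_frequency_and_monetary(purchases, as_of)
print(freq, mon)
```

A shorter half-life makes the scores react faster to recent behavior, at the cost of forgetting loyal customers who buy on long cycles.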
 