RFM Customer Segmentation
1. Data Loading and Initial Cleaning
Goal: Prepare the dataset by removing irrelevant or problematic records.
Key Actions:
- Load the dataset using pandas
- Remove records with negative quantity (these are typically returns)
- Remove rows with missing CustomerID
- Create a new column for Amount = Quantity × UnitPrice

2. Data Cleaning and Preprocessing
Purpose: Ensure the dataset only contains valid, useful transaction data.
- Drop missing customer IDs
- Filter out cancelled transactions (InvoiceNo starting with 'C')
- Remove exact duplicate rows
- Remove negative Quantity and UnitPrice entries
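The cleaning rules from sections 1 and 2 can be sketched in pandas roughly as follows. This is a minimal sketch, not the original notebook code: the small in-memory frame stands in for the real file, the column names are the ones mentioned in the text, and the helper name clean_transactions is hypothetical. Treating zero quantities and prices as invalid is also an assumption.

```python
import pandas as pd

# Toy stand-in for the raw transactions (the real data would be loaded from a file).
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536381", "536382"],
    "StockCode": ["85123A", "85123A", "D", "22961", "22139", "21756"],
    "Quantity": [6, 6, -1, 24, 3, -2],
    "UnitPrice": [2.55, 2.55, 27.50, 1.45, 4.25, 5.95],
    "CustomerID": [17850.0, 17850.0, 14527.0, 17809.0, None, 15311.0],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning rules described above (hypothetical helper)."""
    # Drop rows without a customer ID.
    df = df.dropna(subset=["CustomerID"])
    # Drop cancelled transactions (InvoiceNo starting with 'C').
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]
    # Drop exact duplicate rows.
    df = df.drop_duplicates()
    # Keep strictly positive quantities and prices (assumption: zeros are dropped too).
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    return df.reset_index(drop=True)


clean = clean_transactions(raw)
print(clean)
```

On this toy frame only the first copy of invoice 536365 and invoice 536380 survive; the other rows are a duplicate, a cancellation, a missing customer ID, and a negative quantity.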
3. Handling Outliers
Purpose: Cap extreme outlier values based on the 1st and 99th percentiles.
You defined two functions:
- outlier_thresholds: Calculates lower and upper bounds with an IQR-style rule applied to the 1st and 99th percentiles.
- replace_with_threshold: Replaces outliers with the upper or lower limit.
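The bodies of these two helpers are not shown in the write-up, so the following is only a sketch of one common way to write them. The function names come from the text; everything else is an assumption, including the 1.5 × spread multiplier and the choice to return a capped copy rather than modify the frame in place.

```python
import pandas as pd


def outlier_thresholds(df: pd.DataFrame, column: str,
                       q_low: float = 0.01, q_high: float = 0.99):
    """Return (lower, upper) limits from an IQR-style rule on the given quantiles."""
    low_q = df[column].quantile(q_low)
    high_q = df[column].quantile(q_high)
    spread = high_q - low_q
    # Assumed multiplier: 1.5, as in the classic IQR rule.
    return low_q - 1.5 * spread, high_q + 1.5 * spread


def replace_with_threshold(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy in which values outside the limits are capped at the limits."""
    lower, upper = outlier_thresholds(df, column)
    out = df.copy()
    out[column] = out[column].clip(lower=lower, upper=upper)
    return out


# Toy data: 99 ordinary quantities plus one extreme order.
toy = pd.DataFrame({"Quantity": [float(q) for q in range(1, 100)] + [100000.0]})
capped = replace_with_threshold(toy, "Quantity")
print(toy["Quantity"].max(), "->", capped["Quantity"].max())
```

Capping keeps every row in the dataset, which matters here because dropping a customer's largest order would also change their Frequency.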
4. Creating Monetary Feature
You added a new column:
- Amount = Quantity × UnitPrice – representing total transaction value.
5. Converting Dates and Defining Reference Date
You convert InvoiceDate to datetime and define Latest_Date as one day after the most recent transaction, for calculating recency.
6. Constructing the RFM Table
You group data by CustomerID and calculate:
- Recency: Days since last purchase
- Frequency: Number of unique invoices
- Monetary: Total amount spent
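Sections 4 to 6 fit together as shown in this sketch. It assumes a cleaned transaction frame with the column names used in the text; the toy rows and the variable names tx, latest_date, and rfm are illustrative only.

```python
import pandas as pd

# Toy cleaned transactions (two line items share invoice A1).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceDate": ["2011-01-05", "2011-01-05", "2011-03-01",
                    "2011-02-10", "2011-02-20", "2011-03-09"],
    "Quantity": [2, 1, 5, 10, 1, 3],
    "UnitPrice": [3.0, 4.0, 2.0, 1.5, 20.0, 5.0],
})

# Monetary value per line item and proper datetimes.
tx["Amount"] = tx["Quantity"] * tx["UnitPrice"]
tx["InvoiceDate"] = pd.to_datetime(tx["InvoiceDate"])

# Reference date: one day after the most recent transaction.
latest_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

# One row per customer with the three RFM metrics.
rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (latest_date - d.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("Amount", "sum"),
)
print(rfm)
```

Counting unique invoices rather than rows keeps a multi-item basket from inflating Frequency, and the one-day offset means a customer who bought on the last day gets a Recency of 1 rather than 0.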

7. Creating the Interpurchase Time Feature
Purpose: Add behavioral insight – average time between repeat purchases.
- You filter out users with only one purchase.
- Calculate Shopping_Cycle as the time between first and last purchase.
- Calculate Interpurchase_Time as: Shopping_Cycle / Frequency
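A possible pandas version of this step is sketched below. The toy data and intermediate names (per_customer, repeat) are made up for illustration. The use of floor division is an assumption, chosen because the sample output later in the post shows whole-day values.

```python
import pandas as pd

tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A2", "A3", "B1", "C1", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-01", "2011-02-15", "2011-04-01",
        "2011-03-03", "2011-01-10", "2011-01-31",
    ]),
})

per_customer = tx.groupby("CustomerID").agg(
    Frequency=("InvoiceNo", "nunique"),
    First=("InvoiceDate", "min"),
    Last=("InvoiceDate", "max"),
)

# Keep repeat buyers only; a single purchase has no interval to measure.
repeat = per_customer[per_customer["Frequency"] > 1].copy()

# Days between first and last purchase.
repeat["Shopping_Cycle"] = (repeat["Last"] - repeat["First"]).dt.days

# Average gap in whole days (floor division is an assumption).
repeat["Interpurchase_Time"] = repeat["Shopping_Cycle"] // repeat["Frequency"]
print(repeat[["Frequency", "Shopping_Cycle", "Interpurchase_Time"]])
```

Dividing by Frequency follows the formula given above. Dividing by Frequency - 1 would give the mean gap between consecutive purchases instead, so the choice is worth stating when presenting the feature.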

8. RFM Scoring
You assigned RFM scores from 1 to 5 using qcut:
- R_score: Lower Recency is better (more recent), so reverse the scale.
- F_score and M_score: Higher Frequency and Monetary values are better.
CustomerID | Recency | Frequency | Monetary | Shopping_Cycle | Interpurchase_Time | R_score | F_score | M_score | RFM_Score |
12347.0 | 2 | 7 | 4310.00 | 365 | 52 | 5 | 4 | 5 | 545 |
12348.0 | 75 | 4 | 1770.78 | 282 | 70 | 2 | 3 | 4 | 234 |
12352.0 | 36 | 8 | 1756.34 | 260 | 32 | 3 | 5 | 4 | 354 |
12356.0 | 23 | 3 | 2811.43 | 302 | 100 | 3 | 2 | 5 | 325 |
12358.0 | 2 | 2 | 1150.42 | 149 | 74 | 5 | 1 | 3 | 513 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18272.0 | 3 | 6 | 3078.58 | 244 | 40 | 5 | 4 | 5 | 545 |
18273.0 | 2 | 3 | 204.00 | 255 | 85 | 5 | 3 | 1 | 531 |
18282.0 | 8 | 2 | 178.05 | 118 | 59 | 5 | 2 | 1 | 521 |
18283.0 | 4 | 16 | 2045.53 | 333 | 20 | 5 | 5 | 4 | 554 |
18287.0 | 43 | 3 | 1837.28 | 158 | 52 | 3 | 3 | 4 | 334 |
2845 rows × 9 columns
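The quantile scoring in this section could be written along these lines. The ten-customer frame is toy data, not taken from the table above. Ranking with method="first" before calling pd.qcut is an assumption: it is a common way to avoid duplicate bin edges when many customers share the same Frequency, and the original code may have handled ties differently.

```python
import pandas as pd

rfm = pd.DataFrame(
    {
        "Recency": [1, 3, 8, 15, 22, 40, 61, 90, 150, 300],
        "Frequency": [12, 9, 7, 6, 4, 3, 2, 2, 1, 1],
        "Monetary": [5000.0, 300.0, 2500.0, 1200.0, 900.0,
                     4000.0, 150.0, 700.0, 60.0, 2000.0],
    },
    index=pd.Index(range(101, 111), name="CustomerID"),
)

labels = [1, 2, 3, 4, 5]

# Lower Recency is better, so the label order is reversed for R.
rfm["R_score"] = pd.qcut(rfm["Recency"].rank(method="first"), 5, labels=labels[::-1]).astype(int)
# Higher Frequency and Monetary are better, so labels run 1..5.
rfm["F_score"] = pd.qcut(rfm["Frequency"].rank(method="first"), 5, labels=labels).astype(int)
rfm["M_score"] = pd.qcut(rfm["Monetary"].rank(method="first"), 5, labels=labels).astype(int)

# Concatenate the three digits into a code such as "555".
rfm["RFM_Score"] = (
    rfm["R_score"].astype(str) + rfm["F_score"].astype(str) + rfm["M_score"].astype(str)
)
print(rfm)
```

Because RFM_Score is a string of digits rather than a number, "545" and "554" are different codes, not values to compare arithmetically.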
9. Customer Segmentation Based on RFM Scores
You defined customer segments using custom rules:
- CORE USER: High scores in all three dimensions
- KEY RETENTION: Good recency and frequency
- HIGH-VALUE CHURN: High spenders at risk
- NEW USER: Only recency is high
- GENERAL USER: Everyone else
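The exact cutoffs behind these rules are not given, so the sketch below fills them in with assumptions: a score of 4 or 5 counts as "high", and the rules are checked in the order listed. The function name assign_segment and the constant HIGH are hypothetical.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "R_score": [5, 4, 2, 5, 3, 1],
        "F_score": [5, 4, 3, 1, 3, 5],
        "M_score": [4, 2, 5, 1, 3, 5],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106], name="CustomerID"),
)

HIGH = 4  # assumed cutoff for a "high" score


def assign_segment(row: pd.Series) -> str:
    """Map one customer's R, F, M scores to a segment label (assumed rules)."""
    r, f, m = row["R_score"], row["F_score"], row["M_score"]
    if r >= HIGH and f >= HIGH and m >= HIGH:
        return "CORE USER"          # high on all three dimensions
    if r >= HIGH and f >= HIGH:
        return "KEY RETENTION"      # recent and frequent, spend not yet high
    if m >= HIGH and r < HIGH:
        return "HIGH-VALUE CHURN"   # big spender who has not bought recently
    if r >= HIGH and f < HIGH and m < HIGH:
        return "NEW USER"           # only recency is high
    return "GENERAL USER"


scores["Segment"] = scores.apply(assign_segment, axis=1)
print(scores)
```

Rule order matters: a customer who satisfies the CORE USER condition also satisfies KEY RETENTION, so the stricter rule has to be tested first.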

10. Visualization
- Customer Segment Counts
- Heatmap of Average Monetary Spend for R-F combinations
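Both charts are driven by small summary tables. The sketch below builds only those tables, using toy data and assumed column names (Segment, R_score, F_score, Monetary), and leaves the drawing itself to whichever plotting library is in use.

```python
import pandas as pd

rfm = pd.DataFrame({
    "R_score": [5, 5, 4, 4, 2, 2, 1],
    "F_score": [5, 5, 4, 1, 3, 3, 1],
    "Monetary": [4000.0, 3000.0, 1500.0, 200.0, 2500.0, 1500.0, 100.0],
    "Segment": ["CORE USER", "CORE USER", "KEY RETENTION", "NEW USER",
                "HIGH-VALUE CHURN", "GENERAL USER", "GENERAL USER"],
})

# Bar chart input: number of customers in each segment.
segment_counts = rfm["Segment"].value_counts()

# Heatmap input: average Monetary for every (R_score, F_score) combination.
heatmap_data = rfm.pivot_table(
    index="R_score", columns="F_score", values="Monetary", aggfunc="mean"
)

print(segment_counts)
print(heatmap_data)
```

Cells with no customers come out as NaN in heatmap_data, which most heatmap functions render as blank; that is usually preferable to filling them with zero, since zero would read as "customers who spent nothing".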

Optional Enhancements
- Add Tableau or Power BI dashboards to visualize segment insights
- Replace rule-based segmentation with machine learning clustering
- Combine RFM scores with demographic or behavioral data
- Link segments to marketing actions (email campaigns, discounts, churn prevention)
Project Description (STAR Framework)
Project Title: Customer Segmentation Using RFM Model
- Situation: The company needed to identify key customer groups for personalized marketing campaigns
- Task: Build an RFM model to segment customers based on purchasing behavior
- Action:
  - Cleaned and preprocessed over 500,000 transaction records
  - Calculated Recency, Frequency, and Monetary metrics for each customer
  - Assigned scores using quantiles and created a segmentation rule set
  - Visualized segment distribution and created strategic insights
- Result:
  - Identified five customer segments, including core users and high-value churn-risk customers
  - Enabled targeted campaigns that improved customer retention and boosted sales by X percent
So basically, I started with a dataset of online retail transactions. Each row represented a purchase, and it included things like invoice number, product code, quantity, unit price, date, customer ID, and country.
First, I cleaned the data. I removed any rows where the customer ID was missing because we can’t track those users. I also filtered out any cancelled orders, which have invoice numbers starting with "C", and I dropped any exact duplicates. Then I removed rows with negative values in quantity or price.
After that, I handled outliers. I used the 1st and 99th percentiles to set the boundaries and capped any extreme values for quantity and price to keep things more realistic. That way, a few huge or weird transactions don’t mess up the analysis.
Then I created a new column called Amount, which is just quantity multiplied by price — it tells us how much revenue came from each transaction.
Next, I calculated the core RFM values:
- Recency is how recently a customer made a purchase, measured as the number of days between their last purchase and a reference date set one day after the most recent transaction in the dataset.
- Frequency is how often they purchased, which I measured by counting the number of unique invoices.
- Monetary is how much they spent in total.
After calculating those, I added another feature called Interpurchase Time, which is basically the average time between purchases. It gives a sense of how regularly a customer shops, and it’s only calculated for customers who made more than one purchase.
Once I had all those features, I scored each customer from 1 to 5 for Recency, Frequency, and Monetary using quantiles. Then I combined those scores into a single RFM Score, like “555” or “213”.
Based on those scores, I grouped customers into segments. For example:
- Core Users are recent, frequent, and high-spending.
- Key Retention are valuable but may need encouragement to stay.
- High-Value Churn are big spenders who haven’t returned recently.
- New Users just started buying.
- And General Users are the rest.
Finally, I visualized everything — a bar plot to show how many customers are in each group, and a heatmap showing how average spending changes depending on Recency and Frequency combinations.
And that’s the whole RFM analysis — it helps businesses understand customer behavior, target marketing efforts, and improve retention.
1. How does RFM compare to unsupervised clustering like KMeans?
Answer:
RFM is rule-based and interpretable. It’s quick, simple, and business-friendly. You can clearly explain why a customer belongs to a segment.
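KMeans, on the other hand, groups customers by distance in feature space, so it can surface patterns and feature interactions that fixed RFM rules might miss. It extends naturally to more features than R, F, and M, but it needs scaled inputs, assumes roughly spherical clusters, and produces segments that are harder to explain.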
In practice, I like to use RFM for baseline segmentation and then apply clustering for further granularity.
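As a rough illustration of that second step, the sketch below clusters a toy RFM table with scikit-learn. The data, the log transform, and the choice of two clusters are all assumptions made for the example; on real data the number of clusters would need to be chosen with something like the elbow method or silhouette scores.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy RFM table: four active, high-spending customers and four lapsed ones.
rfm = pd.DataFrame({
    "Recency": [2, 3, 5, 8, 150, 200, 180, 220],
    "Frequency": [12, 15, 10, 14, 1, 2, 1, 2],
    "Monetary": [4000.0, 5200.0, 3500.0, 4800.0, 80.0, 150.0, 60.0, 120.0],
})

# Log-transform the skewed metrics, then standardize so that no single
# feature dominates the Euclidean distance KMeans relies on.
features = StandardScaler().fit_transform(np.log1p(rfm))

# k=2 is an arbitrary choice for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
rfm["Cluster"] = kmeans.fit_predict(features)

# Cluster profiles: the averages are what make the clusters explainable.
print(rfm.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean())
```

The cluster numbers themselves carry no meaning, so the per-cluster averages are what turn a cluster into something a marketing team can act on.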
2. Should RFM analysis be done for different product categories, like menswear vs womenswear?
Answer:
Yes. Customer behavior can vary drastically between categories. Segmenting based on all purchases might dilute important signals.
Performing RFM within categories like menswear vs womenswear can reveal insights such as which customer segments prefer which type of product, and help tailor campaigns more effectively.
3. Can RFM be integrated with customer lifetime value (LTV) models?
Answer:
Definitely. RFM is a short-term snapshot of customer behavior, while LTV forecasts long-term value. Combining both gives a more complete picture.
For example, a customer who bought recently but has low frequency might still have a high predicted LTV if they consistently spend big.
LTV can also help prioritize RFM segments — like focusing marketing budget on high-LTV users in retention-risk segments.
4. How would you create a Tableau dashboard for RFM to support marketing teams?
Answer:
I’d build a dashboard with filters by RFM segment, country, and product category. It would include KPIs like average order value, recent churn rates, and segment growth over time.
A heatmap of R vs F with color-coded average monetary value is also effective.
I’d also add interactivity — clicking a segment updates recommended campaign actions or customer lists.
5. How frequently should RFM scores be updated? Daily, monthly, quarterly?
Answer:
It depends on business dynamics.
- For fast-moving e-commerce: Weekly or biweekly makes sense.
- For retail or B2B: Monthly or quarterly might be enough.
More frequent updates help catch recent churn risks or newly active customers.
I’d also monitor trends in recency to adjust campaign timing dynamically.
6. How do different RFM segments trigger different marketing strategies?
Answer:
Segment | Strategy |
Core Users | VIP programs, early access, loyalty points |
New Users | Welcome offers, onboarding emails |
High-Value Churn Risk | Win-back emails, exclusive discounts |
Low-Frequency Spenders | Frequency boost campaigns |
Lapsed or Low RFM | Reactivation SMS, surveys, or remove from list |
These can also be tied into marketing automation platforms like HubSpot or Salesforce to trigger automatically.
7. What are some challenges of using RFM?
Answer:
- It assumes equal weight for R, F, M unless otherwise adjusted
- Not suitable for businesses with irregular purchase cycles (e.g. B2B with seasonal orders)
- Sensitive to outliers — hence I included outlier capping in preprocessing
- Frequency of one can be misleading (new customer vs one-off buyer)
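The equal-weight issue in the first point can be handled with a weighted composite score. The sketch below shows the idea; the weights are hypothetical placeholders, not values from this project, and would normally come from business priorities or a model.

```python
import pandas as pd

scores = pd.DataFrame(
    {"R_score": [5, 2, 3], "F_score": [4, 3, 5], "M_score": [5, 4, 1]},
    index=pd.Index([101, 102, 103], name="CustomerID"),
)

# Hypothetical weights; they sum to 1 so the result stays on the 1-5 scale.
weights = {"R_score": 0.5, "F_score": 0.3, "M_score": 0.2}

# Weighted sum of the three scores for each customer.
scores["Weighted_RFM"] = sum(scores[col] * w for col, w in weights.items())
print(scores.sort_values("Weighted_RFM", ascending=False))
```

With recency weighted most heavily, customer 103 ranks above customer 102 despite the much lower M_score, which is the kind of trade-off the weights make explicit.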
8. How could RFM be extended or improved?
Answer:
- Add time-decayed versions of F and M to reflect recent trends
- Integrate behavioral features like product types, channel (mobile vs web)
- Use predictive modeling (e.g. churn prediction) on top of RFM scores
- Combine with web/app session data for a more complete user profile
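The first idea, time-decayed Frequency and Monetary, might look like the sketch below. It assumes one row per invoice, exponential decay, and a 90-day half-life; all three are illustrative choices, as are the column names Weight, Decayed_Frequency, and Decayed_Monetary.

```python
import numpy as np
import pandas as pd

# Two customers with identical plain Frequency (2) and Monetary (200).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceDate": pd.to_datetime(["2011-01-01", "2011-12-01", "2011-11-01", "2011-12-01"]),
    "Amount": [100.0, 100.0, 100.0, 100.0],
})

reference_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)
half_life_days = 90  # assumed: a purchase loses half its weight every 90 days

# Exponential decay weight per purchase based on its age in days.
age_days = (reference_date - tx["InvoiceDate"]).dt.days
tx["Weight"] = np.power(0.5, age_days / half_life_days)
tx["Decayed_Amount"] = tx["Amount"] * tx["Weight"]

decayed = tx.groupby("CustomerID").agg(
    Decayed_Frequency=("Weight", "sum"),
    Decayed_Monetary=("Decayed_Amount", "sum"),
)
print(decayed)
```

Customer 2 ends up with the higher decayed values because both purchases are recent, while customer 1's January order has faded to a small fraction of its original weight.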
- Author: Entropyobserver
- URL: https://tangly1024.com/article/1dbd698f-3512-80bf-b447-d728f8e783e5
- Copyright: All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!