RFM Customer Segmentation

1. Data Loading and Initial Cleaning

Goal: Prepare the dataset by removing irrelevant or problematic records.

Key Actions:

Load the dataset using pandas

Remove records with negative quantity (these are typically returns)

Remove rows with missing CustomerID

Create a new column for Amount = Quantity × UnitPrice

2. Data Cleaning and Preprocessing

Purpose: Ensure the dataset only contains valid, useful transaction data.

Drop missing customer IDs

Filter out cancelled transactions (InvoiceNo starting with 'C')

Remove exact duplicate rows

Remove negative Quantity and UnitPrice entries
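The cleaning rules above can be sketched in pandas roughly as follows. This is a minimal illustration, not the original notebook code: the sample rows are made up, the helper name clean_transactions is hypothetical, and dropping zero-priced rows is an added assumption (the notes only mention negative values).

```python
import pandas as pd

# Tiny stand-in for the raw transactions. The real data would be loaded with
# something like pd.read_csv(...) or pd.read_excel(...); the path is not given here.
raw = pd.DataFrame({
    "InvoiceNo":  ["536365", "536365", "C536379", "536380", "536381", "536382"],
    "StockCode":  ["85123A", "85123A", "D", "22961", "22139", "21730"],
    "Quantity":   [6, 6, -1, 24, -5, 3],
    "UnitPrice":  [2.55, 2.55, 27.50, 1.45, 4.25, 0.0],
    "CustomerID": [17850.0, 17850.0, 14527.0, None, 15311.0, 13047.0],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning rules listed above and return a new DataFrame."""
    # Rows without a customer cannot be attributed to anyone.
    df = df.dropna(subset=["CustomerID"])
    # Cancelled invoices start with "C"; cast to str in case the column is mixed-type.
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]
    # Exact duplicate rows.
    df = df.drop_duplicates()
    # Negative values are returns or errors.
    # Assumption: zero values are dropped as well, since they carry no revenue.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    return df.reset_index(drop=True)


clean = clean_transactions(raw)
print(clean)
```

Only the first sample row survives here: the others are a duplicate, a cancellation, a row with no customer, a negative quantity, and a zero price.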

3. Handling Outliers

Purpose: Cap extreme outlier values based on the 1st and 99th percentiles.

You defined two functions:

outlier_thresholds: Calculates lower and upper bounds from the 1st and 99th percentiles, in the style of the IQR method.

replace_with_threshold: Replaces outliers with upper or lower limits.
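One plausible shape for these two functions is sketched below. The function names come from the notes, but the bodies are assumptions: the bounds are taken as the 1st and 99th percentiles widened by 1.5 times the distance between them (an IQR-style rule), and the factor parameter is hypothetical. Setting factor=0 would cap directly at the percentiles instead.

```python
import pandas as pd


def outlier_thresholds(df, column, low_q=0.01, high_q=0.99, factor=1.5):
    """Return (lower, upper) bounds derived from two percentiles of a column."""
    low = df[column].quantile(low_q)
    high = df[column].quantile(high_q)
    spread = high - low  # plays the role of the IQR, but between the chosen percentiles
    return low - factor * spread, high + factor * spread


def replace_with_threshold(df, column, **kwargs):
    """Cap values outside the bounds at the nearest bound (modifies df in place)."""
    lower, upper = outlier_thresholds(df, column, **kwargs)
    df[column] = df[column].clip(lower=lower, upper=upper)


# Illustrative data: quantities 1..100 plus one extreme value.
demo = pd.DataFrame({"Quantity": list(range(1, 101)) + [100_000]})
lower, upper = outlier_thresholds(demo, "Quantity")
replace_with_threshold(demo, "Quantity")
print(lower, upper, demo["Quantity"].max())
```

Capping rather than dropping keeps every transaction in the dataset while limiting how much a single extreme row can move the Monetary totals.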

4. Creating Monetary Feature

You added a new column:

Amount = Quantity × UnitPrice – representing total transaction value.

5. Converting Dates and Defining Reference Date

You convert InvoiceDate to datetime and define Latest_Date as one day after the most recent transaction, for calculating recency.
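Steps 4 and 5 amount to a few lines of pandas. The sketch below uses made-up rows; the variable latest_date stands in for Latest_Date from the notes.

```python
import pandas as pd

# Illustrative rows only; column names follow the notes.
df = pd.DataFrame({
    "InvoiceDate": ["2011-12-01 08:26", "2011-12-09 12:50", "2011-11-15 10:00"],
    "Quantity": [6, 2, 10],
    "UnitPrice": [2.55, 3.39, 1.25],
})

# Step 4: transaction value per row.
df["Amount"] = df["Quantity"] * df["UnitPrice"]

# Step 5: parse the dates, then set the reference date one day after the
# most recent transaction so that every Recency value is at least 1 day.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
latest_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)

print(df)
print(latest_date)
```

Anchoring the reference date to the data, rather than to today's date, keeps Recency meaningful for a historical dataset.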

6. Constructing the RFM Table

You group data by CustomerID and calculate:

Recency: Days since last purchase

Frequency: Number of unique invoices

Monetary: Total amount spent
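The grouping above maps directly onto a pandas named aggregation. A minimal sketch with made-up transactions, assuming the Amount column and reference date from the previous steps already exist:

```python
import pandas as pd

# Illustrative transactions; customer 1 has two line items on invoice A1.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-11-01", "2011-12-08",
        "2011-10-01", "2011-10-20", "2011-12-09",
    ]),
    "Amount": [10.0, 5.0, 20.0, 7.5, 2.5, 100.0],
})
latest_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    # Days between the customer's last purchase and the reference date.
    Recency=("InvoiceDate", lambda dates: (latest_date - dates.max()).days),
    # Unique invoices, so multi-line invoices count once.
    Frequency=("InvoiceNo", "nunique"),
    # Total spend across all line items.
    Monetary=("Amount", "sum"),
)
print(rfm)
```

Counting unique invoices rather than rows matters here: customer 1 has three rows but only two purchases.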

7. Creating the Interpurchase Time Feature

Purpose: Add behavioral insight – average time between repeat purchases.

You filter out users with only one purchase.

Calculate Shopping_Cycle as the time between first and last purchase.

Calculate Interpurchase_Time as:

Shopping_Cycle / Frequency, rounded down to whole days
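A sketch of this feature is shown below, using made-up invoices. Integer division is an assumption inferred from the results table (for example, a 365-day cycle over 7 invoices is listed as 52); the intermediate column names First and Last are hypothetical.

```python
import pandas as pd

# Illustrative invoices: customer 2 bought only once and is excluded.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A2", "A3", "B1", "C1", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-01", "2011-03-01", "2011-07-01",
        "2011-04-15", "2011-05-01", "2011-05-31",
    ]),
})

per_customer = tx.groupby("CustomerID").agg(
    First=("InvoiceDate", "min"),
    Last=("InvoiceDate", "max"),
    Frequency=("InvoiceNo", "nunique"),
)

# Keep repeat buyers only; a single purchase has no interval to measure.
repeat = per_customer[per_customer["Frequency"] > 1].copy()

# Days between first and last purchase.
repeat["Shopping_Cycle"] = (repeat["Last"] - repeat["First"]).dt.days

# Average cycle length per purchase, rounded down to whole days.
repeat["Interpurchase_Time"] = repeat["Shopping_Cycle"] // repeat["Frequency"]
print(repeat)
```

Dividing by Frequency follows the formula in the notes. Dividing by Frequency - 1 would instead give the mean gap between consecutive invoices, which is a different and slightly larger number.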

8. RFM Scoring

You assigned RFM scores from 1 to 5 using qcut:

R_score: Lower Recency is better (more recent), so reverse the scale.

F_score and M_score: Higher Frequency and Monetary values are better.

CustomerID	Recency	Frequency	Monetary	Shopping_Cycle	Interpurchase_Time	R_score	F_score	M_score	RFM_Score
12347.0	2	7	4310.00	365	52	5	4	5	545
12348.0	75	4	1770.78	282	70	2	3	4	234
12352.0	36	8	1756.34	260	32	3	5	4	354
12356.0	23	3	2811.43	302	100	3	2	5	325
12358.0	2	2	1150.42	149	74	5	1	3	513
...	...	...	...	...	...	...	...	...	...
18272.0	3	6	3078.58	244	40	5	4	5	545
18273.0	2	3	204.00	255	85	5	3	1	531
18282.0	8	2	178.05	118	59	5	2	1	521
18283.0	4	16	2045.53	333	20	5	5	4	554
18287.0	43	3	1837.28	158	52	3	3	4	334

2845 rows × 9 columns
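The scoring step can be sketched with pd.qcut as below. The input is just the ten rows displayed in the table above, so the quintile boundaries (and therefore most scores) differ from those computed on all 2845 customers. Ranking with method="first" before binning is an assumption added here to avoid the duplicate-bin-edge error that qcut raises on heavily tied values; the notes do not say how ties were handled.

```python
import pandas as pd

# The ten customers shown in the table above (R, F, M columns only).
rfm = pd.DataFrame(
    {
        "Recency": [2, 75, 36, 23, 2, 3, 2, 8, 4, 43],
        "Frequency": [7, 4, 8, 3, 2, 6, 3, 2, 16, 3],
        "Monetary": [4310.00, 1770.78, 1756.34, 2811.43, 1150.42,
                     3078.58, 204.00, 178.05, 2045.53, 1837.28],
    },
    index=pd.Index(
        [12347, 12348, 12352, 12356, 12358, 18272, 18273, 18282, 18283, 18287],
        name="CustomerID",
    ),
)


def quintile_score(values: pd.Series, labels) -> pd.Series:
    """Split a column into five equal-sized bins and return integer scores."""
    # rank(method="first") breaks ties by order of appearance (an assumption).
    return pd.qcut(values.rank(method="first"), 5, labels=labels).astype(int)


# Lower Recency is better, so the label order is reversed.
rfm["R_score"] = quintile_score(rfm["Recency"], labels=[5, 4, 3, 2, 1])
rfm["F_score"] = quintile_score(rfm["Frequency"], labels=[1, 2, 3, 4, 5])
rfm["M_score"] = quintile_score(rfm["Monetary"], labels=[1, 2, 3, 4, 5])

# Concatenate the three digits into a code such as "545".
rfm["RFM_Score"] = (
    rfm["R_score"].astype(str) + rfm["F_score"].astype(str) + rfm["M_score"].astype(str)
)
print(rfm)
```

The combined score is a label rather than a number: "545" and "554" describe different customers, and sorting them numerically has no particular meaning.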

9. Customer Segmentation Based on RFM Scores

You defined customer segments using a custom logic:

CORE USER: High scores in all three dimensions

KEY RETENTION: Good recency and frequency

HIGH-VALUE CHURN: High spenders at risk

NEW USER: Only recency is high

GENERAL USER: Everyone else
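The notes do not record the exact thresholds behind these segments, so the rule set below is only one possible reading. The cut-offs (a score of 4 or more counts as "high", a Recency score of 2 or less counts as "at risk") and the function name assign_segment are assumptions for illustration.

```python
import pandas as pd

HIGH = 4  # assumed cut-off for a "high" score
LOW = 2   # assumed cut-off for a "low" Recency score


def assign_segment(row: pd.Series) -> str:
    """Map one customer's R, F, M scores to a segment label."""
    r, f, m = row["R_score"], row["F_score"], row["M_score"]
    if r >= HIGH and f >= HIGH and m >= HIGH:
        return "CORE USER"            # high on all three dimensions
    if r >= HIGH and f >= HIGH:
        return "KEY RETENTION"        # recent and frequent, lower spend
    if m >= HIGH and r <= LOW:
        return "HIGH-VALUE CHURN"     # big spender who has not returned recently
    if r >= HIGH and f < HIGH and m < HIGH:
        return "NEW USER"             # only recency is high
    return "GENERAL USER"             # everyone else


# Illustrative scores, one row per segment.
scores = pd.DataFrame(
    {"R_score": [5, 5, 1, 5, 3], "F_score": [5, 4, 3, 1, 3], "M_score": [5, 2, 5, 1, 3]},
    index=pd.Index([101, 102, 103, 104, 105], name="CustomerID"),
)
scores["Segment"] = scores.apply(assign_segment, axis=1)
print(scores)
```

The order of the checks matters: a customer is tested against the most demanding segment first and falls through to GENERAL USER only if nothing else matches.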

10. Visualization

Customer Segment Counts

Heatmap of Average Monetary Spend for R-F combinations
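Both charts are driven by two small tables: a count per segment and a pivot of mean Monetary by R_score and F_score. The sketch below builds them from made-up rows and wraps the plotting in a hypothetical helper, plot_segment_overview, which assumes matplotlib is installed.

```python
import pandas as pd

# Illustrative scored customers.
rfm = pd.DataFrame({
    "R_score": [5, 5, 3, 1, 1, 3],
    "F_score": [5, 5, 2, 1, 1, 2],
    "Monetary": [100.0, 300.0, 50.0, 10.0, 30.0, 70.0],
    "Segment": ["CORE USER", "CORE USER", "GENERAL USER",
                "GENERAL USER", "GENERAL USER", "GENERAL USER"],
})

# Data behind the bar chart: customers per segment.
counts = rfm["Segment"].value_counts()

# Data behind the heatmap: mean spend for each R/F combination.
heat = rfm.pivot_table(index="R_score", columns="F_score",
                       values="Monetary", aggfunc="mean")


def plot_segment_overview(counts: pd.Series, heat: pd.DataFrame):
    """Draw the segment bar chart and the R-by-F heatmap side by side."""
    # Imported here so the tables above can be built without a plotting backend.
    import matplotlib.pyplot as plt

    fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(11, 4))
    counts.plot.bar(ax=ax_bar, title="Customers per segment")

    image = ax_heat.imshow(heat.to_numpy(), origin="lower", cmap="viridis")
    ax_heat.set_xticks(range(len(heat.columns)))
    ax_heat.set_xticklabels(heat.columns)
    ax_heat.set_yticks(range(len(heat.index)))
    ax_heat.set_yticklabels(heat.index)
    ax_heat.set_xlabel("F_score")
    ax_heat.set_ylabel("R_score")
    ax_heat.set_title("Average Monetary by R and F score")
    fig.colorbar(image, ax=ax_heat, label="Average Monetary")
    fig.tight_layout()
    return fig


print(counts)
print(heat)
```

Calling plot_segment_overview(counts, heat) returns a figure that can be shown or saved. R/F combinations with no customers appear as NaN in the pivot and are left blank in the heatmap.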

Optional Enhancements

Add Tableau or Power BI dashboards to visualize segment insights

Replace rule-based segmentation with machine learning clustering

Combine RFM scores with demographic or behavioral data

Link segments to marketing actions (email campaigns, discounts, churn prevention)

Project Description (STAR Framework)

Project Title: Customer Segmentation Using RFM Model

Situation: The company needed to identify key customer groups for personalized marketing campaigns

Task: Build an RFM model to segment customers based on purchasing behavior

Action:

Cleaned and preprocessed over 500,000 transaction records
Calculated Recency, Frequency, and Monetary metrics for each customer
Assigned scores using quantiles and created a segmentation rule set
Visualized segment distribution and created strategic insights

Result:

Identified 5 customer segments, including core users and high-value customers at risk of churn
Enabled targeted campaigns that improved customer retention and boosted sales by X percent

So basically, I started with a dataset of online retail transactions. Each row represented a purchase, and it included things like invoice number, product code, quantity, unit price, date, customer ID, and country.

First, I cleaned the data: I removed any rows where the customer ID was missing because we can’t track those users. I also filtered out any cancelled orders, which usually have invoice numbers starting with "C", and I dropped any exact duplicates. Then I checked for negative values in quantity and price and removed those rows.

After that, I handled outliers. I used the 1st and 99th percentiles to set the boundaries and capped any extreme values for quantity and price to keep things more realistic. That way, a few huge or weird transactions don’t mess up the analysis.

Then I created a new column called Amount, which is just quantity multiplied by price — it tells us how much revenue came from each transaction.

Next, I calculated the core RFM values:

Recency is how recently a customer made a purchase, calculated as the number of days between their last purchase and a reference date set one day after the most recent transaction in the dataset.

Frequency is how often they purchased, which I measured by counting the number of unique invoices.

Monetary is how much they spent in total.

After calculating those, I added another feature called Interpurchase Time, which is basically the average time between purchases. It gives a sense of how regularly a customer shops, and it’s only calculated for customers who made more than one purchase.

Once I had all those features, I scored each customer from 1 to 5 for Recency, Frequency, and Monetary using quantiles. Then I combined those scores into a single RFM Score, like “555” or “213”.

Based on those scores, I grouped customers into segments. For example:

Core Users are recent, frequent, and high-spending.

Key Retention are valuable but may need encouragement to stay.

High-Value Churn are big spenders who haven’t returned recently.

New Users just started buying.

And General Users are the rest.

Finally, I visualized everything — a bar plot to show how many customers are in each group, and a heatmap showing how average spending changes depending on Recency and Frequency combinations.

And that’s the whole RFM analysis — it helps businesses understand customer behavior, target marketing efforts, and improve retention.

1. How does RFM compare to unsupervised clustering like KMeans?

Answer:

RFM is rule-based and interpretable. It’s quick, simple, and business-friendly. You can clearly explain why a customer belongs to a segment.

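KMeans, on the other hand, groups customers by their distance across all features at once, so it can surface patterns and feature interactions that fixed RFM thresholds miss. It also extends to features beyond R, F, and M, although it needs scaled inputs, assumes roughly spherical clusters, and produces segments that are harder to explain.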

In practice, I like to use RFM for baseline segmentation and then apply clustering for further granularity.

2. Should RFM analysis be done for different product categories, like menswear vs womenswear?

Answer:

Yes. Customer behavior can vary drastically between categories. Segmenting based on all purchases might dilute important signals.

Performing RFM within categories like menswear vs womenswear can reveal insights such as which customer segments prefer which type of product, and help tailor campaigns more effectively.
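Category-level RFM only requires adding the category to the grouping key. The sketch below assumes a Category column exists on each transaction, which the dataset described in these notes does not include; in practice it would have to be derived from the product code or joined from a product table.

```python
import pandas as pd

# Illustrative transactions with a hypothetical Category column.
tx = pd.DataFrame({
    "Category": ["menswear", "menswear", "womenswear", "womenswear", "womenswear"],
    "CustomerID": [1, 1, 1, 2, 2],
    "InvoiceNo": ["A1", "A2", "A3", "B1", "B1"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01", "2011-12-05", "2011-11-10", "2011-12-09", "2011-12-09",
    ]),
    "Amount": [40.0, 60.0, 25.0, 10.0, 15.0],
})
latest_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

# One RFM row per (category, customer) pair instead of per customer.
rfm_by_category = tx.groupby(["Category", "CustomerID"]).agg(
    Recency=("InvoiceDate", lambda dates: (latest_date - dates.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("Amount", "sum"),
)
print(rfm_by_category)
```

Customer 1 looks recent and frequent in menswear but lapsed in womenswear, which is the kind of signal a single all-purchases RFM row would average away. Scores would then be assigned within each category, since quantile boundaries differ between them.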

3. Can RFM be integrated with customer lifetime value (LTV) models?

Answer:

Definitely. RFM is a short-term snapshot of customer behavior, while LTV forecasts long-term value. Combining both gives a more complete picture.

For example, a customer who bought recently but infrequently might still have a high predicted LTV if they consistently spend big.

LTV can also help prioritize RFM segments — like focusing marketing budget on high-LTV users in retention-risk segments.

4. How would you create a Tableau dashboard for RFM to support marketing teams?

Answer:

I’d build a dashboard with filters by RFM segment, country, and product category. It would include KPIs like average order value, recent churn rates, and segment growth over time.

A heatmap of R vs F with color-coded average monetary value is also effective.

I’d also add interactivity — clicking a segment updates recommended campaign actions or customer lists.

5. How frequently should RFM scores be updated? Daily, monthly, quarterly?

Answer:

It depends on business dynamics.

For fast-moving e-commerce: Weekly or biweekly makes sense.

For retail or B2B: Monthly or quarterly might be enough.

More frequent updates help catch recent churn risks or newly active customers.

I’d also monitor trends in recency to adjust campaign timing dynamically.

6. How do different RFM segments trigger different marketing strategies?

Answer:

Segment	Strategy
Core Users	VIP programs, early access, loyalty points
New Users	Welcome offers, onboarding emails
High-Value Churn Risk	Win-back emails, exclusive discounts
Low-Frequency Spenders	Frequency boost campaigns
Lapsed or Low RFM	Reactivation SMS, surveys, or remove from list