IQR
Mathematics
May 4, 2020
May 4, 2025

What is IQR (Interquartile Range)?

IQR is a statistical tool used to measure the spread of the middle 50% of a dataset. Because it ignores extreme values on both ends, it gives a more robust picture of the data's typical spread.
Formula: IQR = Q3 - Q1
Where:
  • Q1 (25th percentile): 25% of the data falls below this value
  • Q3 (75th percentile): 75% of the data falls below this value
  • The middle 50% lies between Q1 and Q3

Retail Sales Example

Let’s say you're analyzing a retail dataset where you want to understand typical purchasing behavior based on Quantity—the number of items bought per transaction.
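The article's original sample values aren't reproduced here, so the snippet below uses a hypothetical Quantity column: eight small purchases plus one transaction of 100 items, chosen so its quartiles match the Q1 = 2 and Q3 = 6 implied by the outlier boundaries computed later.

```python
import pandas as pd

# Hypothetical transaction quantities: mostly small purchases, plus one bulk order of 100
df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
print(df["Quantity"].tolist())  # → [3, 1, 6, 2, 100, 4, 2, 5, 6]
```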
At first glance, most transactions are small purchases. A value of 100 looks like an unusual bulk order—possibly:
  • A wholesale purchase
  • A data entry error (e.g., an extra zero)
  • A return mistakenly marked as a sale

Step-by-Step: Detect Outliers Using IQR

Step 1: Sort the data (internally handled by pandas during quantile calculation)
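No explicit sort is needed, since `quantile()` orders the data internally, but sorting a hypothetical sample (the article's actual values aren't shown here) makes the suspicious 100 easy to spot:

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})

# quantile() sorts internally; an explicit sort just makes the spread visible
print(df["Quantity"].sort_values().tolist())  # → [1, 2, 2, 3, 4, 5, 6, 6, 100]
```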


Step 2: Calculate Q1 and Q3
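The code for this step isn't reproduced in the source, so here is a minimal sketch with pandas, assuming a hypothetical Quantity column whose quartiles match the Q1 = 2 and Q3 = 6 implied by the boundaries in Step 4:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})

Q1 = df["Quantity"].quantile(0.25)  # 25th percentile
Q3 = df["Quantity"].quantile(0.75)  # 75th percentile
print(Q1, Q3)  # → 2.0 6.0
```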

Output: Q1 = 2.0, Q3 = 6.0

Step 3: Compute IQR
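A sketch of this step, carrying forward the hypothetical sample from before (Q1 = 2, Q3 = 6):

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
Q1 = df["Quantity"].quantile(0.25)
Q3 = df["Quantity"].quantile(0.75)

IQR = Q3 - Q1  # spread of the middle 50%
print(IQR)  # → 4.0
```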


Step 4: Define outlier boundaries
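The boundaries use the conventional 1.5 × IQR rule: anything more than 1.5 IQRs below Q1 or above Q3 is flagged. A sketch with the same hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
Q1 = df["Quantity"].quantile(0.25)
Q3 = df["Quantity"].quantile(0.75)
IQR = Q3 - Q1  # 4.0

# Conventional 1.5 * IQR fences
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
print(lower_bound, upper_bound)  # → -4.0 12.0
```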

This tells us:
  • Anything < -4 or > 12 is considered an outlier
  • These thresholds define a “normal” range for Quantity: [-4, 12]

Step 5: Identify Outliers
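Outliers are any rows falling outside the fences. A sketch using the hypothetical sample and the [-4, 12] boundaries:

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
Q1 = df["Quantity"].quantile(0.25)
Q3 = df["Quantity"].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR  # -4.0
upper_bound = Q3 + 1.5 * IQR  # 12.0

# Boolean mask: rows outside the normal range
outliers = df[(df["Quantity"] < lower_bound) | (df["Quantity"] > upper_bound)]
print(outliers["Quantity"].tolist())  # → [100]
```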

Output:
All values except 100 are normal
100 is an outlier

Handling Outliers in Practice

Option 1: Remove the outlier

Use this if you're doing analysis that should ignore extreme cases (e.g., understanding “typical” customer behavior).
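A minimal sketch, again using hypothetical sample data since the source's code isn't reproduced:

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
Q1, Q3 = df["Quantity"].quantile(0.25), df["Quantity"].quantile(0.75)
IQR = Q3 - Q1
lower_bound, upper_bound = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Keep only rows inside the normal range [-4, 12]
df_clean = df[(df["Quantity"] >= lower_bound) & (df["Quantity"] <= upper_bound)]
print(df_clean["Quantity"].tolist())  # → [3, 1, 6, 2, 4, 2, 5, 6]
```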

Option 2: Cap the outlier (Winsorizing)

Use this if you don’t want to lose data but want to limit the effect of extreme values on your models (e.g., for linear regression or clustering).
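Capping replaces extreme values with the boundary values instead of dropping the rows. A sketch with `Series.clip` on the same hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
Q1, Q3 = df["Quantity"].quantile(0.25), df["Quantity"].quantile(0.75)
IQR = Q3 - Q1
lower_bound, upper_bound = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Winsorize: pull values outside [-4, 12] back to the boundary
df["Quantity"] = df["Quantity"].clip(lower=lower_bound, upper=upper_bound)
print(df["Quantity"].max())  # → 12.0
```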

Option 3: Mark the outlier for review

Use this if you want to keep the data but treat it differently in your downstream tasks or reporting.
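A sketch that keeps every row but adds a hypothetical `is_outlier` flag column for downstream filtering or reporting:

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [3, 1, 6, 2, 100, 4, 2, 5, 6]})
Q1, Q3 = df["Quantity"].quantile(0.25), df["Quantity"].quantile(0.75)
IQR = Q3 - Q1
lower_bound, upper_bound = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Flag instead of removing: downstream code decides how to treat flagged rows
df["is_outlier"] = (df["Quantity"] < lower_bound) | (df["Quantity"] > upper_bound)
print(df[df["is_outlier"]]["Quantity"].tolist())  # → [100]
```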

Why Use IQR in Sales Data?

Advantages:

  • Robust to extreme values: ignores very large/small outliers that could be typos or rare cases
  • No need for normal distribution: works on skewed data like retail sales (often right-skewed)
  • Easy to implement: just a few lines of code in Python or Excel
  • Protects business logic: avoids decisions based on rare, unrealistic transactions

Real-World Sales Use Cases:

  1. Customer Segmentation: Prevent big one-time purchases from distorting average behavior.
  2. Forecasting Demand: Exclude extreme sales spikes that are due to promotions or errors.
  3. Data Quality Checks: Flag potential manual entry mistakes (e.g., "10000 units of pens").
 