Statistical Test Decision Tree

Start with the question: Is the variable you are analyzing discrete (categorical)?

In other words, decide whether your data are categorical or continuous.

If the answer is Yes (Discrete / Categorical Data):

You are working with categories or proportions, such as "male/female", "success/failure", or any variable that falls into distinct groups.

First question: Is the sample size large?

If the sample size is large, you can use Pearson’s Chi-square test.

This test checks whether two categorical variables in a contingency table are significantly associated. Its p-value comes from a chi-square approximation that is reliable only when the expected frequencies are reasonably large (a common rule of thumb is at least 5 per cell), which is why it suits larger samples.
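A minimal sketch of the Pearson statistic and the expected-count rule of thumb, using only the Python standard library. The function names are hypothetical, and the sketch assumes no row or column total is zero. It stops at the statistic and degrees of freedom; `scipy.stats.chi2_contingency` also returns the p-value (note that it applies Yates’ continuity correction to 2×2 tables by default).

```python
from math import fsum


def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = fsum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected


def expected_counts_ok(expected, minimum=5):
    """Rule of thumb: every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)


stat, dof, expected = chi_square_independence([[20, 30], [30, 20]])
print(stat, dof, expected_counts_ok(expected))
```

For the made-up table [[20, 30], [30, 20]], every expected count is 25, so the check passes and the statistic is 4.0 with 1 degree of freedom.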

If the sample size is small, use Fisher’s exact test.

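Fisher’s exact test is designed for small samples, most often in 2×2 contingency tables. Holding the row and column totals fixed, it computes the exact probability, under the null hypothesis of no association, of obtaining the observed table or one more extreme, so it does not depend on a large-sample approximation.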
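A sketch of the two-sided test for a 2×2 table, assuming the common convention of summing the probabilities of all tables no more likely than the observed one (the convention `scipy.stats.fisher_exact` uses for its two-sided p-value). The function name is hypothetical; only the standard library is used.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # P(top-left cell == x) given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    probs = [prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_observed = prob(a)
    # A small relative tolerance keeps tables tied with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


print(fisher_exact_2x2(3, 1, 1, 3))
```

For the classic "lady tasting tea" table [[3, 1], [1, 3]], the possible top-left counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided p-value is 34/70 ≈ 0.486.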

Alternatively: Are you comparing two proportions?

This calls for a two-sample test of proportions, commonly used to compare proportions between two groups (e.g., the click-through rates of two versions of a website).

To decide whether a Z-test for proportions is appropriate, check each group:

Are both np > 10 and n(1 − p) > 10, where n is the group’s sample size and p its observed proportion?

If both conditions hold, the binomial distribution is well approximated by the normal distribution, so you can use the Z-test for proportions.
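A minimal sketch of the np > 10 check and a two-sided two-proportion Z-test, assuming the usual pooled-proportion standard error under H0: p1 = p2. The function names and the A/B counts are hypothetical; `statsmodels.stats.proportion.proportions_ztest` is one library alternative.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_ok(successes, n, threshold=10):
    """Check n*p > threshold and n*(1 - p) > threshold, with p = successes / n."""
    p = successes / n
    return n * p > threshold and n * (1 - p) > threshold


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test of H0: p1 == p2, using the pooled proportion for the SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical A/B test: 120 of 1000 clicks on version A, 150 of 1000 on version B.
if normal_approx_ok(120, 1000) and normal_approx_ok(150, 1000):
    print(two_proportion_z_test(120, 1000, 150, 1000))
```

For these made-up counts, z is about −1.96 and the two-sided p-value is about 0.05.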

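If either condition fails, the sample is too small for the normal approximation, so use an exact test instead: the binomial test when a single observed proportion is compared with a hypothesized value, or Fisher’s exact test when the proportions of two independent groups are compared.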
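A sketch of the exact binomial test for one proportion, assuming a two-sided p-value that sums the probabilities of all outcomes no more likely than the observed count. The function name and example counts are hypothetical; `scipy.stats.binomtest(k, n, p)` (SciPy 1.7 and later) is the library equivalent.

```python
from math import comb


def binomial_test(k, n, p0=0.5):
    """Two-sided exact binomial test of H0: success probability == p0.

    Sums the probabilities of every outcome no more likely than observing k
    successes in n trials.
    """
    probs = [comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(n + 1)]
    p_observed = probs[k]
    # A small relative tolerance keeps outcomes tied with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


# Hypothetical example: 9 successes in 10 trials, tested against p0 = 0.5.
print(binomial_test(9, 10))
```

Here the outcomes 0, 1, 9 and 10 are each at most as likely as 9, so the p-value is (1 + 10 + 10 + 1) / 1024 ≈ 0.0215.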

If the answer is No (Continuous / Metric Data):

You are working with continuous variables, such as height, weight, temperature, or test scores.

Next: Is the sample size 30 or more?

If Yes, you are in the large-sample case. By the Central Limit Theorem, the sampling distribution of the mean is approximately normal even if the data themselves are not normal.
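A small simulation can make the Central Limit Theorem concrete. This sketch assumes a skewed Exponential(1) population (mean 1, SD 1) and hypothetical settings of n = 30 and 5000 repetitions; it checks that the sample means have SD near 1/√n and that about 95% of them fall within ±1.96/√n of the true mean, as a normal approximation predicts.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from a skewed Exponential(1) distribution."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n = 30
means = sample_means(n, reps=5000)
# Exponential(1) has mean 1 and SD 1, so the CLT predicts mean ~1 and SD ~1/sqrt(n).
half_width = 1.96 / sqrt(n)
coverage = sum(abs(m - 1) < half_width for m in means) / len(means)
# Coverage should be roughly 0.95 if the normal approximation is reasonable.
print(round(fmean(means), 3), round(stdev(means), 3), round(coverage, 3))
```

Because the population is skewed, the coverage will not be exactly 0.95, but it lands close to it at n = 30.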

Next, ask: Are the population variances known?

If variances are known, use the Z-test.

This test compares the means of two groups when the population variances are known and the data is normally distributed or the sample size is large.
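A minimal sketch of the two-sample Z-test for means, assuming the population standard deviations are supplied by the caller (that is what "known variances" means here). The function name and data are hypothetical; the small samples only keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sample_z_test(sample1, sample2, sigma1, sigma2):
    """Two-sided Z-test of H0: mu1 == mu2 when population SDs sigma1, sigma2 are known."""
    se = sqrt(sigma1 ** 2 / len(sample1) + sigma2 ** 2 / len(sample2))
    z = (fmean(sample1) - fmean(sample2)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data with both population SDs assumed known to be 1.0.
print(two_sample_z_test([5.0, 6.0, 7.0, 8.0], [4.0, 5.0, 6.0, 7.0], 1.0, 1.0))
```

The means differ by 1 and the standard error is √0.5, so z = √2 ≈ 1.414 and the two-sided p-value is about 0.157.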

If variances are not known, ask: Are the variances of the two samples similar (equal)?

If the variances are equal, use Student’s t-test.

This is the standard test for comparing the means of two independent samples under the assumption of equal variances.
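A sketch of Student’s pooled-variance t statistic, using only the standard library. The function name is hypothetical. The standard library has no t distribution, so the sketch returns the statistic and its n1 + n2 − 2 degrees of freedom; `scipy.stats.ttest_ind(sample1, sample2)` (which assumes equal variances by default) also returns the p-value.

```python
from math import sqrt
from statistics import fmean, variance


def student_t(sample1, sample2):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (fmean(sample1) - fmean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


print(student_t([5.0, 6.0, 7.0, 8.0], [4.0, 5.0, 6.0, 7.0]))
```

Both made-up samples have variance 5/3, so the pooled variance is also 5/3 and t = √1.2 ≈ 1.095 with 6 degrees of freedom.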

If the variances are unequal, use Welch’s t-test.

This is a modification of the t-test that does not assume equal variances between groups.
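Welch’s version keeps each group’s variance separate and approximates the degrees of freedom with the Welch–Satterthwaite formula. A standard-library sketch (hypothetical function name and data) that stops at the statistic and degrees of freedom:

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(sample1, sample2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # Each group keeps its own variance; nothing is pooled.
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2
    t = (fmean(sample1) - fmean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


print(welch_t([1.0, 2.0, 3.0], [10.0, 20.0, 30.0, 40.0]))
```

The resulting degrees of freedom are usually not an integer (about 3.05 here). In practice, `scipy.stats.ttest_ind(sample1, sample2, equal_var=False)` runs Welch’s test and returns the p-value.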

If No, meaning the sample size is less than 30, normality becomes important.

Ask: Do the data follow a normal distribution?

If the data are approximately normally distributed, then:

If the variances are equal, use Student’s t-test.
If the variances are unequal, use Welch’s t-test.

If the data are not normally distributed, use the Mann–Whitney U-test.

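This is a non-parametric alternative to the t-test that does not assume normality. It tests whether values from one independent sample tend to be larger than values from the other; it can be read as a comparison of medians only when the two distributions have the same shape.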
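The U statistic can be computed by direct pair counting, which is equivalent to the rank-sum formulation and fine for small samples (it is O(n1·n2)). In this sketch (hypothetical function name and data), U1 / (n1·n2) estimates the probability that a value from the first sample exceeds one from the second, counting ties as one half. For the p-value, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` is one option.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U1, U2): U1 counts pairs (x, y) with x > y, ties counting 1/2."""
    u1 = 0.0
    for x in sample1:
        for y in sample2:
            if x > y:
                u1 += 1.0
            elif x == y:
                u1 += 0.5
    u2 = len(sample1) * len(sample2) - u1
    return u1, u2


u1, u2 = mann_whitney_u([1.2, 3.4, 5.6], [2.0, 4.0, 6.0, 8.0])
# u1 / (n1 * n2) estimates P(X > Y) + 0.5 * P(X == Y).
print(u1, u2, u1 / (3 * 4))
```

Here only 3 of the 12 pairs favor the first sample, so U1 = 3 and U2 = 9.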

Quick Matching Table

Condition	Test to Use
Categorical, large sample	Pearson’s Chi-square Test
Categorical, small sample	Fisher’s Exact Test
Two proportions, normal approximation holds	Z-test for Proportions
Proportions, small sample	Binomial Test (one proportion) or Fisher’s Exact Test (two proportions)
Continuous, known variances	Z-test
Continuous, unknown but equal variances	Student’s t-test
Continuous, unequal variances	Welch’s t-test
Continuous, small sample, non-normal distribution	Mann–Whitney U-test
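The whole decision tree can be condensed into one function. This is a sketch: the function name and every parameter are hypothetical, and the caller is assumed to have already judged sample size, variances, and normality. For two small proportion samples it returns Fisher’s exact test, since the binomial test applies to a single proportion compared with a hypothesized value.

```python
from typing import Optional


def choose_test(
    categorical: bool,
    large_sample: bool,
    two_proportions: bool = False,
    variances_known: bool = False,
    equal_variances: bool = True,
    normal: Optional[bool] = None,
) -> str:
    """Return the test suggested by the decision tree.

    large_sample means: expected counts >= 5 (contingency table),
    np > 10 and n(1 - p) > 10 (proportions), or n >= 30 (continuous data).
    """
    if categorical:
        if two_proportions:
            return "Z-test for proportions" if large_sample else "Fisher's exact test"
        return "Pearson's chi-square test" if large_sample else "Fisher's exact test"
    if large_sample:
        if variances_known:
            return "Z-test"
    else:
        if normal is None:
            raise ValueError("small continuous samples need a normality judgment")
        if not normal:
            return "Mann-Whitney U test"
    # Remaining cases compare means without known population variances.
    return "Student's t-test" if equal_variances else "Welch's t-test"


print(choose_test(categorical=False, large_sample=False, normal=True, equal_variances=False))
```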