Mathematics
Statistical Test Decision Tree
May 1, 2020
May 3, 2025

Start with the question: Is the variable you are analyzing a discrete metric?

This means you are determining whether your data is categorical or continuous.

If the answer is Yes (Discrete / Categorical Data):

You are working with categories or proportions, such as "male/female", "success/failure", or any variable that falls into distinct groups.

First question: Is the sample size large?

  • If the sample size is large, you can use Pearson’s Chi-square test.
    • This test is suitable for determining whether there is a significant association between two categorical variables in a contingency table. It works best with larger sample sizes because it relies on expected frequencies being reasonably large (typically at least 5 per cell).
  • If the sample size is small, use Fisher’s exact test.
    • Fisher’s exact test is designed for small sample sizes, particularly for 2x2 contingency tables. It calculates the exact probability of obtaining the observed distribution under the null hypothesis.
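As a minimal sketch of this branch, the snippet below handles a 2x2 table using only the Python standard library. The function names and the `min_expected=5` threshold are illustrative assumptions based on the rule of thumb above, and the chi-square statistic is computed without a continuity correction. In practice a library routine such as `scipy.stats.chi2_contingency` or `scipy.stats.fisher_exact` would normally be used instead.

```python
from math import comb, erfc, sqrt

def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom, no correction)."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value: total probability of all tables
    (with the same margins) that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # The small tolerance keeps ties with the observed table despite rounding.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def association_test(table, min_expected=5):
    """Pick the test from the smallest expected count and return (name, p-value)."""
    smallest = min(min(row) for row in expected_counts(table))
    if smallest >= min_expected:
        return "chi-square", chi_square_2x2(table)[1]
    return "fisher", fisher_exact_2x2(table)

print(association_test([[30, 20], [20, 30]]))  # large expected counts
print(association_test([[3, 1], [1, 3]]))      # small expected counts
```

The choice is driven by the expected counts rather than the raw total, because a large sample can still leave one sparse cell.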

Alternatively: Are you comparing two proportions?

This is the two-sample proportions test, commonly used to compare proportions between two groups (e.g., the proportion of clicks on two website versions).
To determine whether a Z-test for proportions is appropriate, check the following in each group, where n is the group's sample size and p is its proportion of successes:
Do both np > 10 and n(1 – p) > 10?
  • If both conditions are satisfied, this means you can approximate the binomial distribution with the normal distribution. In this case, you can use the Z-test for proportions.
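  • If either condition is not met, the sample is too small for a normal approximation. Use an exact test instead: the Binomial test is exact and appropriate for small-sample binary outcomes measured against a reference proportion, and Fisher’s exact test (above) covers the comparison of two small groups.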
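The check and the two tests above can be sketched with the standard library alone. The function names are hypothetical, the Z-test shown uses the pooled proportion for its standard error (one common formulation), and the binomial test is the one-sample version against an assumed reference proportion `p0`.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb from the text: both np and n(1 - p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test for two proportions with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def binomial_test(x, n, p0):
    """Exact two-sided binomial test of x successes in n trials against p0:
    sums the probabilities of all outcomes no more likely than x."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    return sum(q for q in pmf if q <= pmf[x] * (1 + 1e-9))

# Hypothetical click data: 120/1000 clicks on version A, 150/1000 on version B.
if normal_approx_ok(1000, 0.12) and normal_approx_ok(1000, 0.15):
    print(two_proportion_z_test(120, 1000, 150, 1000))

# Hypothetical small sample: 1 success in 10 trials against p0 = 0.5.
print(binomial_test(1, 10, 0.5))
```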

If the answer is No (Continuous / Metric Data):

You are working with continuous variables, such as height, weight, temperature, or test scores.

Next: Is the sample size 30 or more?

  • If Yes, you are in the large sample case. The Central Limit Theorem allows you to treat the sample mean as approximately normal, even if the data itself is not strictly normal.
    • Next, ask: Are the population variances known?
    • If variances are known, use the Z-test.
      • This test compares the means of two groups when the population variances are known and the data is normally distributed or the sample size is large.
    • If variances are not known, ask: Are the variances of the two samples similar (equal)?
      • If the variances are equal, use Student’s t-test.
        • This is the standard test for comparing the means of two independent samples under the assumption of equal variances.
      • If the variances are unequal, use Welch’s t-test.
        • This is a modification of the t-test that does not assume equal variances between groups.
  • If No, meaning the sample size is less than 30, normality becomes important.
    • Ask: Do the data follow a normal distribution?
    • If the data are approximately normally distributed, then:
      • If the variances are equal, use Student’s t-test.
      • If the variances are unequal, use Welch’s t-test.
    • If the data are not normally distributed, use the Mann–Whitney U-test.
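      • This is a non-parametric alternative to the t-test that does not assume normality. It tests whether values in one of two independent samples tend to be larger than values in the other; it can be read as a comparison of medians only when the two distributions have the same shape.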
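To make the difference between these tests concrete, here is a small sketch of the three statistics using only the standard library. The function names are illustrative, not from any library. Only the statistics and degrees of freedom are computed, because turning a t statistic into a p-value needs the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind` (with `equal_var=False` for Welch) and `scipy.stats.mannwhitneyu` cover the full tests.

```python
from math import sqrt
from statistics import mean, variance

def student_t(x, y):
    """Student's t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def mann_whitney_u(x, y):
    """U statistic for x: pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

# Hypothetical samples for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
print(student_t(x, y))
print(welch_t(x, y))
print(mann_whitney_u(x, y))
```

With equal sample sizes and equal sample variances, as in this toy data, the Student and Welch results coincide; they diverge as the variances or group sizes become unbalanced.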

Quick Matching Table

| Condition | Test to Use |
| --- | --- |
| Categorical, large sample | Pearson’s Chi-square Test |
| Categorical, small sample | Fisher’s Exact Test |
| Two proportions, normal approximation | Z-test for proportions |
| Two proportions, small sample | Binomial Test |
| Continuous, known variances | Z-test |
| Continuous, unknown but equal variances | Student’s t-test |
| Continuous, unknown and unequal variances | Welch’s t-test |
| Continuous, non-normal distribution (small sample) | Mann–Whitney U-test |
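The whole tree can also be written as a small lookup function. This is only a sketch: the function name and boolean flags are hypothetical, and the caller is assumed to have answered each question beforehand (for example, `large_sample` means enough expected counts for categorical data, or n of 30 or more for continuous data).

```python
def choose_test(*, discrete, large_sample=False, two_proportions=False,
                normal_approx_ok=False, variances_known=False,
                equal_variances=False, normal_data=False):
    """Return the name of the test suggested by the decision tree."""
    if discrete:
        if two_proportions:
            return "Z-test for proportions" if normal_approx_ok else "Binomial test"
        return "Pearson's chi-square test" if large_sample else "Fisher's exact test"
    # Continuous data from here on.
    if large_sample and variances_known:
        return "Z-test"
    if not large_sample and not normal_data:
        return "Mann-Whitney U-test"
    # Large sample with unknown variances, or small sample with normal data.
    return "Student's t-test" if equal_variances else "Welch's t-test"

print(choose_test(discrete=True, large_sample=False))
print(choose_test(discrete=False, large_sample=True, equal_variances=False))
print(choose_test(discrete=False, large_sample=False, normal_data=False))
```

Keyword-only arguments are used so that each call reads like the sequence of questions in the tree.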
 