
Start with the question: Is the variable you are analyzing a discrete metric?
In other words, decide whether your data are categorical (counts in distinct groups) or continuous (measurements on a numeric scale).
If the answer is Yes (Discrete / Categorical Data):
You are working with categories or proportions, such as "male/female", "success/failure", or any variable that falls into distinct groups.
First question: Is the sample size large?
- If the sample size is large, you can use Pearson’s Chi-square test.
This test is suitable for determining whether there is a significant association between two categorical variables in a contingency table. It works best with larger sample sizes because it relies on expected frequencies being reasonably large (typically at least 5 per cell).
- If the sample size is small, use Fisher’s exact test.
Fisher’s exact test is designed for small sample sizes, particularly for 2x2 contingency tables. It calculates the exact probability of obtaining the observed distribution under the null hypothesis.
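The expected-frequency rule above can be checked directly. The following is a minimal sketch using only the Python standard library; the helper names (`expected_counts`, `choose_categorical_test`, `fisher_exact_2x2`) are illustrative and not taken from any library, and the threshold of 5 is the rule of thumb mentioned above. In practice, `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` are the usual tools.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def choose_categorical_test(table, min_expected=5):
    """Rule of thumb: chi-square only if every expected count reaches the threshold."""
    if all(e >= min_expected for row in expected_counts(table) for e in row):
        return "Pearson's chi-square test"
    return "Fisher's exact test"

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

For example, `choose_categorical_test([[3, 1], [1, 3]])` suggests Fisher's exact test, because every expected count in that table is 2.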
Alternatively: Are you comparing two proportions?
This comparison is called a two-sample proportions test. It is commonly used to compare proportions between two groups (e.g., the proportion of clicks on two website versions).
To determine whether a Z-test for proportions is appropriate, check the following for each group, where n is the group's sample size and p its proportion:
Do both np > 10 and n(1 – p) > 10?
- If both conditions are satisfied, the binomial distribution can be approximated by the normal distribution, and you can use the Z-test for proportions.
- If either condition is not met, the sample is too small for a normal approximation. Use an exact test instead: the Binomial test when a single sample's proportion is compared against a hypothesized value, or Fisher’s exact test when two groups are compared.
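The np and n(1 – p) check and the two tests it chooses between can be sketched in a few lines. This is an illustrative standard-library version, not a reference implementation: the function names are made up for this example, the Z-test uses the pooled proportion (one common convention), and the exact test shown is the one-sample Binomial test against a hypothesized proportion `p0`.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_ok(n, p, threshold=10):
    """Check np > threshold and n(1 - p) > threshold for one group."""
    return n * p > threshold and n * (1 - p) > threshold

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided Z-test of H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def binomial_test(x, n, p0):
    """Exact two-sided Binomial test of H0: p == p0 for a single sample."""
    def pmf(k):
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    p_obs = pmf(x)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= p_obs * (1 + 1e-9))

# Hypothetical click data: 120 of 1000 visitors vs. 150 of 1000 visitors.
if normal_approx_ok(1000, 0.12) and normal_approx_ok(1000, 0.15):
    z, p_value = two_proportion_ztest(120, 1000, 150, 1000)
```

The click counts are invented purely to show the call pattern.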
If the answer is No (Continuous / Metric Data):
You are working with continuous variables, such as height, weight, temperature, or test scores.
Next: Is the sample size 30 or more?
- If Yes, you are in the large-sample case. The Central Limit Theorem allows you to treat the sample mean as approximately normal, even if the data themselves are not strictly normal.
Next, ask: Are the population variances known?
- If the variances are known, use the Z-test.
This test compares the means of two groups when the population variances are known and the data are normally distributed or the sample size is large.
- If the variances are not known, ask: Are the variances of the two samples similar (equal)?
- If the variances are equal, use Student’s t-test.
This is the standard test for comparing the means of two independent samples under the assumption of equal variances.
- If the variances are unequal, use Welch’s t-test.
This is a modification of the t-test that does not assume equal variances between groups.
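The three mean-comparison tests differ only in how the standard error and the degrees of freedom are computed. The sketch below shows the test statistics side by side using the standard library; the function names are illustrative, and p-values are left out because the t distribution is not available in the standard library (`scipy.stats.ttest_ind`, with its `equal_var` argument, covers both t-tests in practice).

```python
from math import sqrt
from statistics import mean, variance

def z_statistic(x, y, var_x, var_y):
    """Two-sample z statistic with known population variances."""
    return (mean(x) - mean(y)) / sqrt(var_x / len(x) + var_y / len(y))

def student_t(x, y):
    """Student's t statistic with a pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x, y):
    """Welch's t statistic; returns (t, df) with Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df
```

With equal group sizes the two t statistics coincide; only the degrees of freedom differ, which is where Welch's correction takes effect.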
- If No, meaning the sample size is less than 30, normality becomes important.
Ask: Do the data follow a normal distribution?
- If the data are approximately normally distributed, then:
- If the variances are equal, use Student’s t-test.
- If the variances are unequal, use Welch’s t-test.
- If the data are not normally distributed, use the Mann–Whitney U-test.
This is a non-parametric alternative to the t-test that does not assume normality. It compares two independent samples by the ranks of their values; if both distributions have the same shape, it can be read as a comparison of medians.
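To make the rank-based idea concrete, here is a small sketch of the U statistic and an exact two-sided p-value obtained by enumerating every way of splitting the pooled data into two groups. The function names are illustrative, and the enumeration is only practical for small samples, which is the case this branch covers; `scipy.stats.mannwhitneyu` is the usual choice in practice.

```python
from itertools import combinations

def u_statistic(x, y):
    """Count pairs (xi, yj) where xi > yj; ties count as one half."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

def mann_whitney_exact(x, y):
    """Return (U, exact two-sided p-value) by enumerating all relabellings."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2  # value of U expected under the null hypothesis
    u_obs = u_statistic(x, y)
    observed = abs(u_obs - centre)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n1 + n2) if i not in chosen]
        total += 1
        # Count splits whose U is at least as far from the centre as observed.
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return u_obs, extreme / total
```

For two fully separated samples of three values each, only 2 of the 20 possible splits are as extreme as the observed one, so the smallest achievable two-sided p-value is 0.1.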
Quick Matching Table
| Condition | Test to Use |
| --- | --- |
| Categorical, large sample | Pearson’s Chi-square Test |
| Categorical, small sample | Fisher’s Exact Test |
| Two proportions, normal approximation holds | Z-test for proportions |
| Proportions, small sample | Binomial Test (one sample) or Fisher’s Exact Test (two groups) |
| Continuous, known variances | Z-test |
| Continuous, unknown but equal variances | Student’s t-test |
| Continuous, unknown and unequal variances | Welch’s t-test |
| Continuous, small sample, non-normal distribution | Mann–Whitney U-test |
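The whole decision tree can also be written as one small function. This is only a sketch of the rules in this article: `choose_test` and its boolean parameters are hypothetical names introduced for illustration, and each flag corresponds to one of the questions asked above.

```python
def choose_test(discrete, large_sample, *, two_proportions=False,
                normal_approx_ok=False, variances_known=False,
                equal_variances=False, normal_data=False):
    """Return the test name suggested by the quick matching table."""
    if discrete:
        if two_proportions:
            # Small samples need an exact test; the table lists the binomial test.
            return "Z-test" if normal_approx_ok else "Binomial test"
        return "Pearson's chi-square test" if large_sample else "Fisher's exact test"
    # Continuous data from here on.
    if large_sample and variances_known:
        return "Z-test"
    if not large_sample and not normal_data:
        return "Mann-Whitney U-test"
    # Large sample with unknown variances, or small sample with normal data.
    return "Student's t-test" if equal_variances else "Welch's t-test"
```

For instance, `choose_test(False, False, normal_data=True)` returns Welch's t-test, since nothing indicates that the variances are equal.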
- Author: Entropyobserver
- URL: https://tangly1024.com/article/1e6d698f-3512-801f-b8dc-e9c05ecc5ddc
- Copyright: All articles in this blog, unless otherwise stated, are licensed under BY-NC-SA. Please credit the source!