This detailed analysis explores an A/B test to assess whether a new webpage design could outperform the existing one in terms of conversion rates. By evaluating a dataset of 290,585 users, I applied a variety of statistical methods to examine differences between the control (old design) and treatment (new design) groups. The findings indicate that the new design did not produce a statistically significant improvement, providing important insights for future design decisions based on data-driven strategies.
1. Project Objective
The main goal of this A/B testing project was to evaluate if the implementation of a new webpage design would increase conversion rates compared to the existing design. The specific objectives were:
- Primary Metric: Measure conversion rate (percentage of users completing the desired action)
- Secondary Analysis: Calculate relative improvement and determine statistical significance
- Business Impact: Offer actionable insights to optimize the website design based on data
2. Hypothesis Formation
The hypotheses were formulated based on design principles and expected improvements in user experience:
Null Hypothesis (H₀): There is no difference in conversion rates between the control group (old design) and the treatment group (new design).
Alternative Hypothesis (H₁): The new webpage design (treatment group) leads to a significantly different conversion rate than the old design (control group).
Expected Outcome: The assumption was that the new design, through improved user interface elements, better navigation, and more appealing visuals, would result in higher user engagement and increased conversions.
3. Experimental Design
Variables and Groups
- Independent Variable: Webpage design version (Control vs. Treatment)
- Dependent Variable: Conversion status (Binary: 0 = No conversion, 1 = Conversion)
- Control Group: Users exposed to the old webpage design
- Treatment Group: Users exposed to the new webpage design
Data Structure
Each user's data consisted of the following variables:
- user_id: A unique identifier for each participant
- group: Indicates assignment to either "control" or "treatment"
- landing_page: Identifies the page version ("old_page" or "new_page")
- converted: A binary outcome indicating whether the user converted (1) or not (0)
4. Data Collection and Quality Assurance
Initial Data Assessment
The original dataset contained 294,480 records. A thorough data quality check surfaced the following problems:
Data Quality Issues Analysis
- Duplicate User IDs (3,895 instances)
- Issue: Some users were recorded multiple times in the dataset.
- Impact: Duplicate entries could lead to overrepresentation of certain users, skewing results.
- Solution: Retained only the first occurrence of each user.
- Mismatched Group-Page Combinations (3,893 instances)
- Issue: Some users in the treatment group were shown the old page, and vice versa for the control group.
- Impact: These mismatched records compromised the integrity of the experiment.
- Solution: Removed all mismatched data entries to ensure proper group assignment.
Data Cleaning Process
After cleaning, the dataset was reduced from 294,480 to 290,585 unique users, ensuring that all data points adhered to proper group-page alignment.
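The two fixes can be sketched in pandas (a minimal sketch: `clean_ab_data` is an illustrative name, and the column names follow the data structure described in Section 3):

```python
import pandas as pd

def clean_ab_data(df: pd.DataFrame) -> pd.DataFrame:
    """Remove mismatched group/page rows, then keep each user's first record."""
    # Keep only rows where group assignment and landing page agree
    aligned = (
        ((df["group"] == "control") & (df["landing_page"] == "old_page"))
        | ((df["group"] == "treatment") & (df["landing_page"] == "new_page"))
    )
    df = df[aligned]
    # Retain only the first occurrence of each user
    return df.drop_duplicates(subset="user_id", keep="first")

# Tiny illustrative frame: one duplicate and one mismatch are removed
sample = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "group": ["control", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "old_page", "new_page", "old_page"],
    "converted": [0, 0, 1, 0],
})
cleaned = clean_ab_data(sample)
print(len(cleaned))  # 2
```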
5. Results and Analysis
Conversion Rate Calculation
The conversion performance was summarized in the following table:
| Group | Total Users | Conversions | Conversion Rate |
| --- | --- | --- | --- |
| Control | 145,274 | 17,489 | 12.04% |
| Treatment | 145,311 | 17,264 | 11.88% |
Key Findings
- Conversion Rate Difference: The treatment group had a lower conversion rate by 0.16 percentage points.
- Relative Change: There was a 1.31% decrease in conversion rate for the treatment group.
- Unexpected Outcome: The new design did not perform as expected and actually showed a slight decrease in conversions.
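These headline figures can be reproduced directly from the counts in the table above:

```python
# Reproduce the headline numbers from the reported group counts
control_users, control_conv = 145_274, 17_489
treatment_users, treatment_conv = 145_311, 17_264

control_rate = control_conv / control_users        # ≈ 0.1204 (12.04%)
treatment_rate = treatment_conv / treatment_users  # ≈ 0.1188 (11.88%)

diff_pp = (treatment_rate - control_rate) * 100                  # percentage points
relative_change = (treatment_rate - control_rate) / control_rate * 100

print(f"Difference: {diff_pp:.2f} pp, relative change: {relative_change:.2f}%")
# Difference: -0.16 pp, relative change: -1.31%
```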
6. Statistical Significance Testing: Chi-Square Analysis
To determine if the difference was statistically significant, I performed a Chi-Square test of independence.
Contingency Table Setup
| Group | Not Converted | Converted | Row Total |
| --- | --- | --- | --- |
| Control | 127,785 | 17,489 | 145,274 |
| Treatment | 128,047 | 17,264 | 145,311 |
| Column Total | 255,832 | 34,753 | 290,585 |
Chi-Square Calculation Steps
Step 1: Calculate Expected Frequencies (Eᵢⱼ)
Using the formula Eᵢⱼ = (row totalᵢ × column totalⱼ) ÷ grand total, I calculated the expected frequency for each cell.
Cell (1,1): Control & Not Converted
E₁₁ = (145,274 × 255,832) ÷ 290,585
= 37,165,737,968 ÷ 290,585
= 127,899.71
Cell (1,2): Control & Converted
E₁₂ = (145,274 × 34,753) ÷ 290,585
= 5,048,707,322 ÷ 290,585
= 17,374.29
Cell (2,1): Treatment & Not Converted
E₂₁ = (145,311 × 255,832) ÷ 290,585
= 37,175,203,752 ÷ 290,585
= 127,932.29
Cell (2,2): Treatment & Converted
E₂₂ = (145,311 × 34,753) ÷ 290,585
= 5,049,993,183 ÷ 290,585
= 17,378.71
Step 2: Calculate Chi-Square Components
Each component is (Oᵢⱼ − Eᵢⱼ)² ÷ Eᵢⱼ. Note that in a 2×2 table every cell deviates from its expected count by the same absolute amount, here |O − E| = 114.71.
Cell (1,1): Control & Not Converted
(127,785 − 127,899.71)² ÷ 127,899.71 = (−114.71)² ÷ 127,899.71 = 0.1029
Cell (1,2): Control & Converted
(17,489 − 17,374.29)² ÷ 17,374.29 = (114.71)² ÷ 17,374.29 = 0.7574
Cell (2,1): Treatment & Not Converted
(128,047 − 127,932.29)² ÷ 127,932.29 = (114.71)² ÷ 127,932.29 = 0.1029
Cell (2,2): Treatment & Converted
(17,264 − 17,378.71)² ÷ 17,378.71 = (−114.71)² ÷ 17,378.71 = 0.7572
Summing the four components gives the uncorrected statistic χ² ≈ 1.7203. Applying Yates' continuity correction, as scipy's chi2_contingency does by default for 2×2 tables, yields χ² = 1.7054; that corrected value is used below.
Step 3: Degrees of Freedom (df)
df = (rows − 1) × (columns − 1) = (2 − 1) × (2 − 1) = 1
Step 4: Calculate P-value
Comparing χ² = 1.7054 to the chi-square distribution with 1 degree of freedom gives a p-value of 0.1916.
Statistical Decision
Critical Value Approach:
- At α = 0.05 and df = 1, the critical value of chi-square is 3.841.
- Since χ² = 1.7054 < 3.841, we fail to reject H₀.
P-value Approach:
- P-value = 0.1916, which is greater than 0.05, so we fail to reject H₀.
Test Results Summary
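The reported values can be reproduced end to end with `scipy.stats.chi2_contingency`, which applies Yates' continuity correction by default for 2×2 tables:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the contingency table above
observed = np.array([
    [127_785, 17_489],   # control:   not converted, converted
    [128_047, 17_264],   # treatment: not converted, converted
])

# correction=True (the default) applies Yates' continuity correction,
# which is where the reported 1.7054 comes from
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}, df = {dof}")
# chi2 = 1.7054, p = 0.1916, df = 1
```

Without the correction (`correction=False`), the statistic is ≈ 1.72; either way, the conclusion is unchanged.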
7. Interpretation and Business Implications
Statistical Conclusion
The results of the chi-square test indicate that the difference in conversion rates between the control and treatment groups is not statistically significant (p-value = 0.1916 > 0.05). Therefore, the observed difference in conversion rates (a decrease of 0.16 percentage points in the treatment group) is likely due to random chance.
Business Insights
- No Immediate Design Implementation: Given the lack of statistical significance, the new design should not be implemented.
- Cost-Benefit Analysis: The resources allocated to redesigning the webpage may not provide sufficient returns, suggesting a reevaluation of priorities.
- Further Investigation: A deeper analysis of specific design elements may identify areas for improvement, especially if other user segments respond differently.
Potential Reasons for Results
- Design Elements: The new design may have unintentionally introduced friction points or complexities that discouraged conversions.
- User Familiarity: Users may have preferred the existing design, which they were already familiar with, leading to higher conversions.
- Testing Duration: The testing period may have been too short to observe long-term behavioral changes.
- User Segmentation: Different user groups (e.g., based on demographics or browsing history) may have reacted differently to the design changes.
8. Recommendations and Next Steps
Immediate Actions
- Retain Current Design: Continue with the existing webpage design based on the current evidence.
- Investigate Specific Design Elements: Further analysis should focus on individual components of the new design to understand what might have caused the slight decline in conversions.
- User Feedback: Collect qualitative feedback from users to understand their preferences and potential issues with the new design.
Future Experimentation
- Targeted Testing: Run tests focusing on specific design elements (e.g., buttons, call-to-action text, color schemes).
- Segmented Analysis: Perform analysis based on user demographics, device types, or behavior patterns to assess how different segments respond.
- Extended Duration: Conduct longer-term tests to account for user adaptation over time.
- Multivariate Testing: Experiment with multiple variations of the design simultaneously to find the most effective combination.
Statistical Considerations
- Sample Size: With roughly 145,000 users per group, the test was well powered to detect differences of a few tenths of a percentage point; the observed 0.16-point gap still fell within the range of chance variation.
- Effect Size: Evaluate whether even a small improvement in conversions might have business significance.
- Confidence Intervals: Use confidence intervals to understand the potential range of conversion rate differences between groups.
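As a concrete example of the confidence-interval suggestion, a normal-approximation 95% CI for the difference in conversion rates can be computed from the counts reported above:

```python
import math

# Normal-approximation 95% CI for the difference in conversion rates,
# using the group counts from the results table
n_c, x_c = 145_274, 17_489   # control
n_t, x_t = 145_311, 17_264   # treatment

p_c, p_t = x_c / n_c, x_t / n_t
diff = p_t - p_c
se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"diff = {diff:.4%}, 95% CI: [{lo:.4%}, {hi:.4%}]")
```

The interval spans zero (roughly −0.39 to +0.08 percentage points), consistent with the non-significant chi-square result.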
9. Technical Implementation
This analysis was carried out using Python with the following core functions:
Core Functions
- load_data(): Dataset loading and initial inspection
- check_data_quality(): Identifying and correcting data quality issues
- clean_data(): Data cleaning and preprocessing
- calculate_conversion_rates(): Conversion rate analysis
- perform_statistical_test(): Implementation of the chi-square test
- create_visualizations(): Visualization of key results
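The function bodies are not shown in this post; as an illustration, perform_statistical_test might look roughly like this (a sketch assuming the cleaned data and the column names described earlier):

```python
import pandas as pd
from scipy.stats import chi2_contingency

def perform_statistical_test(df: pd.DataFrame) -> tuple[float, float]:
    # Build the 2x2 group-by-outcome table and run the chi-square test
    table = pd.crosstab(df["group"], df["converted"])
    chi2, p_value, _dof, _expected = chi2_contingency(table)
    return chi2, p_value

# Tiny synthetic example (not the real dataset)
sample = pd.DataFrame({
    "group": ["control"] * 50 + ["treatment"] * 50,
    "converted": [1] * 10 + [0] * 40 + [1] * 12 + [0] * 38,
})
chi2_stat, p_val = perform_statistical_test(sample)
print(f"p = {p_val:.3f}")
```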
Statistical Libraries Used
- pandas: Data manipulation
- numpy: Numerical computations
- scipy.stats: Statistical testing
- matplotlib/seaborn: Data visualization
10. Conclusion
This comprehensive A/B testing analysis demonstrates the critical importance of using rigorous statistical methods to guide business decisions. Despite the expectations for improved performance with the new webpage design, the test revealed no significant improvement, highlighting the value of data-driven decision-making.
Key takeaways include:
- Statistical Rigor: Ensuring that proper testing protocols are followed prevents false conclusions.
- Business Value: Avoiding the implementation of a non-effective design saved potential revenue losses.
- Iterative Approach: The analysis sets the stage for future experiments with refined testing parameters.
- Evidence-Based Decisions: This analysis reinforces the importance of making decisions based on solid data rather than assumptions.
This project lays the foundation for future A/B testing experiments, ensuring that business decisions are rooted in objective, data-backed evidence.
- Author: Entropyobserver
- URL: https://tangly1024.com/article/235d698f-3512-807a-851f-e0392eeda514
- Copyright: All articles in this blog, except where otherwise stated, adopt the BY-NC-SA agreement. Please indicate the source!