A Hands-On Guide to Gradient Descent in Linear Regression
Summary: This post walks through a full, runnable implementation of linear regression trained with gradient descent (batch, stochastic, mini-batch, and momentum), explains the math and code line-by-line, fixes and improves the original notebook, and compares with scikit-learn's SGDRegressor (different learning rate schedules). All code is provided as ready-to-run Python snippets for a Jupyter notebook.

Table of contents

  1. Problem statement
  2. Dataset and loading
  3. Hypothesis, cost, gradients — math and vectorized code
  4. Implementations: batch, stochastic, mini-batch, momentum
  5. Compare optimizers (plots + fitted lines)
  6. scikit-learn's SGDRegressor and learning-rate schedules
  7. Practical tips, pitfalls and hyperparameter guidance
  8. Extensions and next steps

1 — Problem statement

You have historical data from a number of cities:
  • x: population (in tens of thousands)
  • y: profit (in $10,000s)
Goal: fit a linear model y = w * x + b with gradient-based optimizers, use the learned model to predict profit for new cities, and compare how the optimizers behave.

2 — Dataset and loading

Place your data.txt (comma-separated, two columns) in the same folder. Each line: population,profit.
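A minimal loading sketch with NumPy (the file name data.txt and the variable names x_train and y_train follow the description above):

```python
import numpy as np

# data.txt: one "population,profit" pair per line, no header
data = np.loadtxt("data.txt", delimiter=",")
x_train = data[:, 0]   # population in tens of thousands, shape (m,)
y_train = data[:, 1]   # profit in $10,000s, shape (m,)
print(x_train.shape, y_train.shape)
```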
 
Notes:
  • After loading, x_train and y_train are 1-D arrays of shape (m,).
  • For scikit-learn later we will reshape x into (m, 1).

3 — Hypothesis, cost, gradients (math + vectorized code)

Hypothesis (prediction):

f_wb(x_i) = w * x_i + b
Cost function (MSE / 2):

J(w, b) = (1/(2m)) * Σ_i (f_wb(x_i) - y_i)²

The 1/2 factor is conventional: it cancels the 2 produced when differentiating the squared term.
Gradient (partial derivatives):

∂J/∂w = (1/m) * Σ_i (f_wb(x_i) - y_i) * x_i
∂J/∂b = (1/m) * Σ_i (f_wb(x_i) - y_i)

These formulas are exact and simple to vectorize with NumPy.
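A minimal vectorized sketch of the cost and gradient (the function names compute_cost and compute_gradient are my own choices; the notebook may name them differently):

```python
import numpy as np

def compute_cost(x, y, w, b):
    # Mean squared error divided by 2, matching J(w, b) above
    m = x.shape[0]
    err = w * x + b - y            # residuals f_wb(x_i) - y_i, shape (m,)
    return (err @ err) / (2 * m)

def compute_gradient(x, y, w, b):
    # Partial derivatives of J with respect to w and b
    m = x.shape[0]
    err = w * x + b - y
    dj_dw = (err @ x) / m
    dj_db = err.sum() / m
    return dj_dw, dj_db
```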
Why vectorize?
  • Loops in Python are slower; NumPy performs operations in C. Vectorized code is concise and faster for moderate to large datasets.
Equivalent loop version (for clarity):
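A loop-based equivalent of compute_gradient, shown only to make the sums explicit (assuming the same 1-D x and y):

```python
def compute_gradient_loop(x, y, w, b):
    # Same math as compute_gradient, written with an explicit Python loop
    m = x.shape[0]
    dj_dw = 0.0
    dj_db = 0.0
    for i in range(m):
        err_i = w * x[i] + b - y[i]
        dj_dw += err_i * x[i]
        dj_db += err_i
    return dj_dw / m, dj_db / m
```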
Both versions give the same result; prefer vectorized for performance.

4 — Gradient descent implementations (clean, corrected, runnable)

Below are four implementations. Compared with the original notebook, the earlier problems have been fixed: stray code tokens removed, array shapes corrected, and progress printing guarded.

4.1 Batch gradient descent (standard)
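A minimal sketch of batch gradient descent built on compute_cost and compute_gradient above (the learning rate alpha and iteration count are illustrative defaults, not necessarily the post's exact settings):

```python
def gradient_descent(x, y, w_init=0.0, b_init=0.0, alpha=0.01, num_iters=1500):
    # Batch gradient descent: every update uses all m training examples
    w, b = w_init, b_init
    cost_history = []
    for it in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
        if it % 100 == 0:
            print(f"iter {it:4d}  cost {cost_history[-1]:.4f}  w {w:.4f}  b {b:.4f}")
    return w, b, cost_history
```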

4.2 Stochastic gradient descent (single random sample per step)
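A sketch of stochastic gradient descent that updates on one randomly chosen example per step (the seed and hyperparameters are illustrative):

```python
def sgd(x, y, w_init=0.0, b_init=0.0, alpha=0.01, num_iters=10000, seed=0):
    # Stochastic gradient descent: each update uses a single random sample
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    w, b = w_init, b_init
    cost_history = []
    for it in range(num_iters):
        i = rng.integers(m)             # pick one example at random
        err_i = w * x[i] + b - y[i]     # residual for that sample
        w -= alpha * err_i * x[i]       # gradient of the single-sample loss
        b -= alpha * err_i
        if it % 1000 == 0:
            cost_history.append(compute_cost(x, y, w, b))
    return w, b, cost_history
```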
 