Summary: This post walks through a full, runnable implementation of linear regression trained with gradient descent (batch, stochastic, mini-batch, and momentum), explains the math and code line by line, fixes and improves the original notebook, and compares the results with scikit-learn's `SGDRegressor` under different learning-rate schedules. All code is provided as ready-to-run Python snippets for a Jupyter notebook.

Table of contents
- Problem statement
- Dataset and loading
- Hypothesis, cost, gradients — math and vectorized code
- Implementations: batch, stochastic, mini-batch, momentum
- Compare optimizers (plots + fitted lines)
- scikit-learn's `SGDRegressor` and learning-rate schedules
- Practical tips, pitfalls and hyperparameter guidance
- Extensions and next steps
1 — Problem statement
You have historical data from different cities:
- `x`: population (in tens of thousands)
- `y`: profit (in $10,000s)

Goal: fit a linear model `y = w * x + b` with gradient-based optimizers, then use the learned model to predict profit for new cities and compare optimizer behaviours.

2 — Dataset and loading
Place your `data.txt` (comma-separated, two columns) in the same folder. Each line: `population,profit`.
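A minimal loading sketch is below; it assumes `data.txt` really does contain exactly two numeric columns, and the variable names `x_train` / `y_train` are the ones used in the rest of the post:

```python
import numpy as np

# Load the two comma-separated columns: population, profit
data = np.loadtxt("data.txt", delimiter=",")
x_train = data[:, 0]   # population in tens of thousands, shape (m,)
y_train = data[:, 1]   # profit in $10,000s, shape (m,)

m = x_train.shape[0]
print(f"Loaded {m} training examples")
print("First five rows:\n", data[:5])
```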
Notes:
- After loading, `x_train` and `y_train` are 1-D arrays of shape `(m,)`.
- For scikit-learn later, we will reshape `x` into `(m, 1)`.
3 — Hypothesis, cost, gradients (math + vectorized code)
Hypothesis (prediction):

$$f_{w,b}(x^{(i)}) = w\,x^{(i)} + b$$
Cost function (MSE / 2):

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^{2}$$
The `1/2` factor is conventional because it cancels when differentiating.

Gradient (partial derivatives):

$$\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)},\qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$
These formulas are exact and simple to vectorize with NumPy.
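Here is one vectorized sketch of the cost and gradient; the function names `compute_cost` and `compute_gradient` are placeholders and not necessarily those of the original notebook:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """J(w, b) = 1/(2m) * sum((w*x + b - y)^2), fully vectorized."""
    m = x.shape[0]
    err = w * x + b - y              # residuals, shape (m,)
    return np.sum(err ** 2) / (2 * m)

def compute_gradient(x, y, w, b):
    """Partial derivatives of J with respect to w and b."""
    m = x.shape[0]
    err = w * x + b - y
    dj_dw = np.dot(err, x) / m       # (1/m) * sum(err_i * x_i)
    dj_db = np.sum(err) / m          # (1/m) * sum(err_i)
    return dj_dw, dj_db
```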
Why vectorize?
- Explicit Python loops are slow because every iteration goes through the interpreter; NumPy performs the same operations in compiled C. Vectorized code is both more concise and faster for moderate to large datasets.
Equivalent loop version (for clarity):
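A sketch with the same signature as the vectorized `compute_gradient` above:

```python
def compute_gradient_loop(x, y, w, b):
    """Same gradient as compute_gradient, written with an explicit Python loop."""
    m = x.shape[0]
    dj_dw, dj_db = 0.0, 0.0
    for i in range(m):
        err_i = w * x[i] + b - y[i]  # residual of the i-th example
        dj_dw += err_i * x[i]
        dj_db += err_i
    return dj_dw / m, dj_db / m
```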
Both versions give the same result; prefer the vectorized one for performance.
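A quick sanity check (with arbitrary test values for `w` and `b`) confirms the two agree on the loaded data:

```python
w_test, b_test = 0.5, 1.0            # arbitrary parameters for the check
vec = compute_gradient(x_train, y_train, w_test, b_test)
loop = compute_gradient_loop(x_train, y_train, w_test, b_test)
print(vec, loop)
assert np.allclose(vec, loop)        # both gradients match to numerical precision
```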
4 — Gradient descent implementations (clean, corrected, runnable)
Below are four implementations. Note that we've fixed the earlier problems (removed stray `code` tokens, ensured the shapes are correct, and made the progress printing safe).

4.1 Batch gradient descent (standard)
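A runnable sketch of the batch version, built on the `compute_cost` / `compute_gradient` helpers from section 3 (the initial values, learning rate, and iteration count are illustrative and may need tuning for other data):

```python
def gradient_descent(x, y, w_init, b_init, alpha, num_iters):
    """Batch gradient descent: every update uses the full training set."""
    w, b = w_init, b_init
    cost_history = []
    for it in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
        if it % max(num_iters // 10, 1) == 0:        # print ~10 progress lines, safe for small num_iters
            print(f"iter {it:5d} | cost {cost_history[-1]:10.4f} | w {w:8.4f} | b {b:8.4f}")
    return w, b, cost_history

w_bgd, b_bgd, hist_bgd = gradient_descent(x_train, y_train, w_init=0.0, b_init=0.0,
                                          alpha=0.01, num_iters=1500)
```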
4.2 Stochastic gradient descent (single random sample per step)
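A sketch of the stochastic variant; again the hyperparameters are illustrative, and the per-iteration cost is computed on the full dataset only for monitoring:

```python
def sgd(x, y, w_init, b_init, alpha, num_iters, seed=0):
    """Stochastic gradient descent: each update uses one randomly chosen example."""
    rng = np.random.default_rng(seed)
    w, b = w_init, b_init
    cost_history = []
    m = x.shape[0]
    for it in range(num_iters):
        i = rng.integers(m)                  # index of one random training example
        err_i = w * x[i] + b - y[i]
        w -= alpha * err_i * x[i]            # gradient of the single-sample loss
        b -= alpha * err_i
        cost_history.append(compute_cost(x, y, w, b))   # full-data cost, kept only for plotting
    return w, b, cost_history

w_sgd, b_sgd, hist_sgd = sgd(x_train, y_train, w_init=0.0, b_init=0.0,
                             alpha=0.01, num_iters=5000)
```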