A Hands-On Guide to Gradient Descent in Linear Regression
Summary: This post walks through a full, runnable implementation of linear regression trained with gradient descent (batch, stochastic, mini-batch, and momentum), explains the math and code line-by-line, fixes and improves the original notebook, and compares with scikit-learn's SGDRegressor (different learning rate schedules). All code is provided as ready-to-run Python snippets for a Jupyter notebook.

Table of contents

  1. Problem statement
  2. Dataset and loading
  3. Hypothesis, cost, gradients — math and vectorized code
  4. Implementations: batch, stochastic, mini-batch, momentum
  5. Compare optimizers (plots + fitted lines)
  6. scikit-learn's SGDRegressor and learning-rate schedules
  7. Practical tips, pitfalls and hyperparameter guidance
  8. Extensions and next steps

1 — Problem statement

You have historical data from a number of cities:
  • x: population (in tens of thousands)
  • y: profit (in $10,000s)
Goal: fit a linear model y = w * x + b with gradient-based optimizers, use the learned model to predict profit for new cities, and compare how the optimizers behave.

2 — Dataset and loading

Place your data.txt (comma-separated, two columns) in the same folder. Each line: population,profit.
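A minimal loading sketch with NumPy (the file name data.txt and the variable names x_train and y_train follow the description above):

```python
import numpy as np

# data.txt: one "population,profit" pair per line, no header
data = np.loadtxt("data.txt", delimiter=",")
x_train = data[:, 0]   # population in tens of thousands, shape (m,)
y_train = data[:, 1]   # profit in $10,000s, shape (m,)
print(x_train.shape, y_train.shape)
```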
 
Notes:
  • After loading, x_train and y_train are 1-D arrays of shape (m,).
  • For scikit-learn later we will reshape x into (m, 1).

3 — Hypothesis, cost, gradients (math + vectorized code)

Hypothesis (prediction):

f_wb(x_i) = w * x_i + b
Cost function (MSE / 2):

J(w, b) = (1/(2m)) * Σ_i (f_wb(x_i) - y_i)²

The 1/2 factor is conventional: it cancels the 2 produced when differentiating the squared term.
Gradient (partial derivatives):

∂J/∂w = (1/m) * Σ_i (f_wb(x_i) - y_i) * x_i
∂J/∂b = (1/m) * Σ_i (f_wb(x_i) - y_i)

These formulas are exact and simple to vectorize with NumPy.
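A minimal vectorized sketch of the cost and gradient (the function names compute_cost and compute_gradient are my own choices; the notebook may name them differently):

```python
import numpy as np

def compute_cost(x, y, w, b):
    # Mean squared error divided by 2, matching J(w, b) above
    m = x.shape[0]
    err = w * x + b - y            # residuals f_wb(x_i) - y_i, shape (m,)
    return (err @ err) / (2 * m)

def compute_gradient(x, y, w, b):
    # Partial derivatives of J with respect to w and b
    m = x.shape[0]
    err = w * x + b - y
    dj_dw = (err @ x) / m
    dj_db = err.sum() / m
    return dj_dw, dj_db
```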
Why vectorize?
  • Loops in Python are slower; NumPy performs operations in C. Vectorized code is concise and faster for moderate to large datasets.
Equivalent loop version (for clarity):
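A loop-based equivalent of compute_gradient, shown only to make the sums explicit (assuming the same 1-D x and y):

```python
def compute_gradient_loop(x, y, w, b):
    # Same math as compute_gradient, written with an explicit Python loop
    m = x.shape[0]
    dj_dw = 0.0
    dj_db = 0.0
    for i in range(m):
        err_i = w * x[i] + b - y[i]
        dj_dw += err_i * x[i]
        dj_db += err_i
    return dj_dw / m, dj_db / m
```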
Both versions give the same result; prefer vectorized for performance.

4 — Gradient descent implementations (clean, corrected, runnable)

Below are four implementations. Compared with the original notebook, the earlier problems have been fixed: stray code tokens removed, array shapes corrected, and progress printing guarded.

4.1 Batch gradient descent (standard)
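A minimal sketch of batch gradient descent built on compute_cost and compute_gradient above (the learning rate alpha and iteration count are illustrative defaults, not necessarily the post's exact settings):

```python
def gradient_descent(x, y, w_init=0.0, b_init=0.0, alpha=0.01, num_iters=1500):
    # Batch gradient descent: every update uses all m training examples
    w, b = w_init, b_init
    cost_history = []
    for it in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
        if it % 100 == 0:
            print(f"iter {it:4d}  cost {cost_history[-1]:.4f}  w {w:.4f}  b {b:.4f}")
    return w, b, cost_history
```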

4.2 Stochastic gradient descent (single random sample per step)
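A sketch of stochastic gradient descent that updates on one randomly chosen example per step (the seed and hyperparameters are illustrative):

```python
def sgd(x, y, w_init=0.0, b_init=0.0, alpha=0.01, num_iters=10000, seed=0):
    # Stochastic gradient descent: each update uses a single random sample
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    w, b = w_init, b_init
    cost_history = []
    for it in range(num_iters):
        i = rng.integers(m)             # pick one example at random
        err_i = w * x[i] + b - y[i]     # residual for that sample
        w -= alpha * err_i * x[i]       # gradient of the single-sample loss
        b -= alpha * err_i
        if it % 1000 == 0:
            cost_history.append(compute_cost(x, y, w, b))
    return w, b, cost_history
```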
 