What is a Sparse Vector?
A sparse vector is a vector in which most of the elements are zero.
We store only the non-zero values and their positions to save memory and computation.
A sentence converted using Bag-of-Words or TF-IDF might produce a vector such as [0, 2, 0, 0, 0, 1, 0, 0, 3, 0]: only 3 values are non-zero, and the rest are zeros. This is a sparse representation.
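As a rough sketch of how such a vector is produced in practice (assuming scikit-learn; the example sentences are made up):

```python
# Sketch: turning sentences into sparse Bag-of-Words vectors with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog chased the cat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)       # a SciPy CSR sparse matrix

print(vectorizer.get_feature_names_out())  # the vocabulary (one column per word)
print(X[0])                                # only the non-zero (row, column) -> count entries are stored
print(X[0].toarray())                      # the same row expanded: mostly zeros
```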
What is a Dense Vector?
A dense vector is a vector where most or all elements have non-zero values.
Every element is stored explicitly, including zeros.
A BERT embedding of a sentence might look like [0.23, -0.41, 0.08, 0.77, ...], a 768-dimensional vector of real numbers.
This is a dense representation: all values are floats, learned by the model, and none are skipped.
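A minimal sketch of producing such a dense vector, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; mean pooling is just one simple way to collapse the token embeddings into a single sentence vector:

```python
# Sketch: getting a dense sentence vector from BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into one 768-dimensional sentence vector.
sentence_vec = outputs.last_hidden_state.mean(dim=1).squeeze()
print(sentence_vec.shape)   # torch.Size([768])
print(sentence_vec[:5])     # every position holds a learned float; none are skipped
```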
Key Differences Between Sparse and Dense Vectors:
| Feature | Sparse Vector | Dense Vector |
| --- | --- | --- |
| Values | Mostly 0s | Mostly non-zero |
| Memory | Efficient (stores only non-zeros) | Takes more memory |
| Origin | Rule-based (BoW, TF-IDF) | Learned (embeddings, neural nets) |
| Meaning | Surface-level (word counts/freq) | Semantic-level (contextual meaning) |
| Used in | Classical ML | Deep learning / neural networks |
Training Process with Sparse Vectors
Let’s walk through a logistic regression example that uses sparse vectors as input features.
1. Initialization
We first initialize the weights $w$ and bias $b$, similar to what you would do for dense vectors. The weights are learned during the training process.
Assume we have the following sparse input vector:
Sparse Vector Example:
- X_sparse = [0,1,0,0,0,0,1] (This vector represents a document in a sparse format, where only positions 2 and 7 have non-zero values.)
For simplicity, let's initialize the weights and bias as follows:
- Weights (w) = [0.1,−0.2,0.05,0,0,0,−0.1]
- Bias (b) = 0.1
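As a sketch, the same setup in code, using SciPy's CSR format as one common way to store X_sparse:

```python
# Sketch: the sparse input, initial weights, and bias from the example above.
import numpy as np
from scipy.sparse import csr_matrix

X_sparse = csr_matrix(np.array([[0, 1, 0, 0, 0, 0, 1]], dtype=float))
w = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.0, -0.1])
b = 0.1

# Only the non-zero entries are actually stored.
print(X_sparse.indices)   # [1 6] -> positions 2 and 7 in 1-based terms
print(X_sparse.data)      # [1. 1.]
```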
2. Prediction Calculation
The model’s prediction is computed as the linear combination of the input features with the weights, plus the bias:

$$
z = \sum_{j} w_j x_j + b
$$

where $w_j$ are the weights corresponding to the features and $x_j$ are the feature values in the sparse vector.

For our example, we only need to focus on the non-zero values in X_sparse, which are at positions 2 and 7.

The calculation of $z$:

Since $x_1$, $x_3$, $x_4$, $x_5$, and $x_6$ are zero, we only compute the terms for the non-zero elements:

$$
z = w_2 x_2 + w_7 x_7 + b = (-0.2)(1) + (-0.1)(1) + 0.1 = -0.2
$$
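The same calculation in code, touching only the stored non-zero entries (a sketch with NumPy, reusing the numbers above):

```python
# Sketch: computing z = w·x + b from the non-zero entries only.
import numpy as np

w = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.0, -0.1])
b = 0.1
nz_idx = np.array([1, 6])       # 0-based indices of positions 2 and 7
nz_val = np.array([1.0, 1.0])   # the stored non-zero feature values

z = float(np.dot(w[nz_idx], nz_val) + b)
print(round(z, 2))              # -0.2
```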
3. Sigmoid Activation
Now, we pass the value of $z$ through the sigmoid function to get the predicted probability:

$$
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{0.2}} \approx 0.45
$$
So, the predicted probability that this document belongs to class 1 is approximately 45%.
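A quick numeric check of this step (a sketch with NumPy):

```python
# Sketch: sigmoid activation on z = -0.2.
import numpy as np

z = -0.2
y_hat = 1.0 / (1.0 + np.exp(-z))
print(round(float(y_hat), 2))   # 0.45
```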
4. Loss Calculation
Next, we compute the loss using the binary cross-entropy loss function:

$$
L = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]
$$

Where:
- $y$ is the true label (let’s say it's 1).
- $\hat{y}$ is the predicted value (0.45 in this case).

The loss is:

$$
L = -\log(0.45) \approx 0.80
$$
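The same loss computed numerically (a sketch, using y = 1 and the predicted 0.45):

```python
# Sketch: binary cross-entropy for y = 1 and y_hat = 0.45.
import numpy as np

y, y_hat = 1.0, 0.45
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(round(float(loss), 2))    # 0.8
```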
5. Gradient Calculation and Update
To minimize the loss, we compute the gradients of the loss with respect to the weights and bias, and then update them using gradient descent.
For each weight $w_j$ and the bias $b$, the gradient is computed as:

$$
\frac{\partial L}{\partial w_j} = (\hat{y} - y)\, x_j, \qquad \frac{\partial L}{\partial b} = \hat{y} - y
$$

Since we only have non-zero values for $x_2$ and $x_7$, the only non-zero weight gradients are those for the corresponding weights:

$$
\frac{\partial L}{\partial w_2} = (0.45 - 1)(1) = -0.55, \qquad \frac{\partial L}{\partial w_7} = (0.45 - 1)(1) = -0.55
$$

We can now update the parameters using the learning rate $\eta$:

$$
w_j \leftarrow w_j - \eta \frac{\partial L}{\partial w_j}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}
$$

So only $w_2$, $w_7$, and the bias change in this step; every other weight keeps its value because its gradient is zero.
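A sketch of this update in code; the learning rate is not specified above, so 0.1 is assumed purely for illustration:

```python
# Sketch: one gradient step over the non-zero features; eta = 0.1 is assumed.
import numpy as np

w = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.0, -0.1])
b = 0.1
eta = 0.1                          # assumed learning rate (not given in the text)
y, y_hat = 1.0, 0.45
nz_idx = np.array([1, 6])          # 0-based indices of positions 2 and 7
nz_val = np.array([1.0, 1.0])

grad_w_nz = (y_hat - y) * nz_val   # [-0.55, -0.55]
grad_b = y_hat - y                 # -0.55

w[nz_idx] -= eta * grad_w_nz       # only w_2 and w_7 change
b -= eta * grad_b

print(w)   # ≈ [ 0.1  -0.145  0.05  0.  0.  0.  -0.045]
print(b)   # ≈ 0.155
```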
6. Repeat the Process
This process continues iteratively over multiple training examples (documents) and epochs until the model converges, i.e., the weights and biases stabilize.
Summary
- Sparse Vectors: Only store non-zero values, saving memory and computational resources.
- Training Process: For sparse vectors, only the non-zero values are involved in each computation, making it more efficient.
- Gradient Descent Update: We compute gradients and update only the weights tied to the non-zero features (plus the bias) in each vector.
This approach is very common in NLP tasks, where text data is usually represented in a sparse manner (e.g., bag-of-words or TF-IDF representations).
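In practice you rarely write this loop by hand. A sketch of the usual scikit-learn route, where LogisticRegression trains directly on the sparse TF-IDF matrix (the tiny dataset here is made up):

```python
# Sketch: TF-IDF (sparse) features fed straight into LogisticRegression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great movie", "terrible film", "loved it", "waste of time"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)   # SciPy sparse matrix; never densified

clf = LogisticRegression()
clf.fit(X, labels)                    # accepts the sparse matrix directly

print(clf.predict(vectorizer.transform(["what a great film"])))
```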
Dense Vector and Its Use in Machine Learning
Example: Logistic Regression with Dense Vectors
Let’s go through the training process using a dense input vector in a logistic regression model.
1. Initialization
We initialize the weights $w$ and bias $b$. Assume the input vector is:

Dense Vector Example:

- X_dense = [0, 1, 0, 0, 0, 0, 1]

(Still 7-dimensional, just like before, but now explicitly represented as dense, so every zero is stored.)

Weights and bias are the same as in the sparse example:

- Weights (w) = [0.1, −0.2, 0.05, 0, 0, 0, −0.1]
- Bias (b) = 0.1
2. Prediction Calculation
We compute:

$$
z = w \cdot x + b = \sum_{j=1}^{7} w_j x_j + b
$$

Since this is dense, we do the full dot product over all 7 positions, including the zeros:

$$
z = (0.1)(0) + (-0.2)(1) + (0.05)(0) + (0)(0) + (0)(0) + (0)(0) + (-0.1)(1) + 0.1 = -0.2
$$
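The full dot product in code (a sketch with NumPy, reusing the same numbers):

```python
# Sketch: dense prediction, where every element (including the zeros) participates.
import numpy as np

x = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0])    # dense input
w = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.0, -0.1])
b = 0.1

z = float(np.dot(w, x) + b)
print(round(z, 2))   # -0.2
```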
3. Sigmoid Activation

$$
\hat{y} = \sigma(z) = \frac{1}{1 + e^{0.2}} \approx 0.45
$$
4. Loss Calculation
Using the binary cross-entropy loss:

$$
L = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]
$$

Assume the true label $y = 1$; then:

$$
L = -\log(0.45) \approx 0.80
$$
5. Gradient Calculation and Update
We compute gradients for all weights, since the input is dense.
Let’s calculate $\frac{\partial L}{\partial w_j} = (\hat{y} - y)\, x_j = (0.45 - 1)\, x_j = -0.55\, x_j$ for each feature:

| Feature $j$ | $x_j$ | $\partial L / \partial w_j$ |
| --- | --- | --- |
| 1 | 0 | 0 |
| 2 | 1 | −0.55 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 1 | −0.55 |
Bias gradient:

$$
\frac{\partial L}{\partial b} = \hat{y} - y = 0.45 - 1 = -0.55
$$
Using the learning rate $\eta$, each parameter is updated as $w_j \leftarrow w_j - \eta \frac{\partial L}{\partial w_j}$ and $b \leftarrow b - \eta \frac{\partial L}{\partial b}$.

Updated weights:
- $w_2 \leftarrow w_2 + 0.55\,\eta$ and $w_7 \leftarrow w_7 + 0.55\,\eta$
- All weights with $x_j = 0$ have a zero gradient (no change)

Updated bias:
- $b \leftarrow b + 0.55\,\eta$
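A sketch of the dense update, again assuming a learning rate of 0.1 purely for illustration:

```python
# Sketch: a dense gradient step. Gradients exist for every weight, but weights
# whose feature value is 0 get a zero gradient and therefore do not move.
import numpy as np

x = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0])
w = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.0, -0.1])
b = 0.1
eta = 0.1                  # assumed learning rate (not given in the text)
y, y_hat = 1.0, 0.45

grad_w = (y_hat - y) * x   # [0, -0.55, 0, 0, 0, 0, -0.55]
grad_b = y_hat - y         # -0.55

w -= eta * grad_w
b -= eta * grad_b

print(w)   # ≈ [ 0.1  -0.145  0.05  0.  0.  0.  -0.045]
print(b)   # ≈ 0.155
```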
6. Repeat the Process
Repeat the above steps for multiple training examples and epochs until the model converges.
Sparse vs Dense Vector Usage Across Machine Learning Models
| Model Type | Examples | Input Vector Type | Sparse or Dense? | Supports Sparse Input? |
| --- | --- | --- | --- | --- |
| Linear Models | `LogisticRegression`, `LinearSVC` | Bag-of-Words, TF-IDF | ✅ Sparse | ✅ Yes (highly optimized) |
| Naive Bayes | `MultinomialNB`, `BernoulliNB` | Bag-of-Words, TF-IDF | ✅ Sparse | ✅ Yes |
| Tree-Based Models | `DecisionTree`, `RandomForest`, XGBoost | Any numeric features | 🚫 Usually Dense | ⚠️ Partial (e.g. XGBoost has sparse optimizations) |
| K-Nearest Neighbors | `KNeighborsClassifier` | TF-IDF or other features | 🚫 Usually Dense | ⚠️ Technically yes, but inefficient |
| MLP / Shallow Neural Networks | `MLPClassifier`, Keras, PyTorch | Dense embeddings or numeric features | ✅ Dense | 🚫 No — must convert to dense |
| Transformers | BERT, RoBERTa, GPT | Token embeddings | ✅ Dense | 🚫 No — only dense tensors supported |
| RNN / LSTM / GRU | NLP sequence models | Embedding sequences | ✅ Dense | 🚫 No |
| CNN (Text/Image) | TextCNN, ResNet, etc. | Dense embeddings or image tensors | ✅ Dense | 🚫 No |
| Recommendation Models | Matrix Factorization, LightFM | User-item interaction matrix | ✅ Often Sparse | ✅ Yes — optimized for sparse input |
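When one of the dense-only frameworks in the table needs features that start out sparse, the usual move is to densify first; a sketch with SciPy and PyTorch (fine for small matrices, memory-hungry for large ones):

```python
# Sketch: a sparse SciPy matrix must be expanded into a dense tensor for PyTorch.
import numpy as np
import torch
from scipy.sparse import csr_matrix

X_sparse = csr_matrix(np.array([[0, 1, 0, 0, 0, 0, 1],
                                [1, 0, 0, 2, 0, 0, 0]], dtype=np.float32))

X_dense = torch.from_numpy(X_sparse.toarray())   # the zeros are now stored explicitly
print(X_dense.shape)   # torch.Size([2, 7])
print(X_dense)
```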