Technology
Cosine Similarity
Jul 2, 2021
May 1, 2025

Step 1: Formula for Cosine Similarity

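The formula for cosine similarity between two vectors $\mathbf{A}$ and $\mathbf{B}$ is:

$$\text{cosine similarity} = \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|}$$

Where:
  • $\mathbf{A} \cdot \mathbf{B}$ is the dot product of the vectors.
  • $\|\mathbf{A}\|$ is the magnitude (norm) of vector $\mathbf{A}$, and similarly $\|\mathbf{B}\|$ for $\mathbf{B}$.
  • $\theta$ is the angle between the two vectors.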
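The formula maps directly onto code. Below is a minimal pure-Python sketch for two dense vectors of equal length. The function name `cosine_similarity` is illustrative. Returning 0.0 when either vector is all zeros is an assumed convention, because the formula itself is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(theta) = (A . B) / (||A|| * ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))      # A . B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat similarity with a zero vector as 0.0 (formula is undefined).
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because the vector lengths are divided out, scaling a document's weights up or down does not change its cosine similarity. Only the direction of the vector matters.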

Step 2: Understanding the Matrix

The matrix below has one row per document and one column per term. Each value is the frequency or weight of that term in that document:
| Doc | better | dog | food | good | great | jumbo | powdered | price | processed | product | products | quality | taffy | tiny | treat | unsalted | vendor | vitality | witch | yummy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Doc 0 | 0.406496 | 0.406496 | 0.406496 | 0.327959 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.203248 | 0.327959 | 0.203248 | 0.406496 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.203248 | 0.000000 | 0.000000 |
| Doc 1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.681849 | 0.000000 | 0.000000 | 0.000000 | 0.550112 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.340925 | 0.340925 | 0.000000 | 0.000000 |
| Doc 2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.270657 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.541314 | 0.000000 | 0.000000 | 0.000000 | 0.541314 | 0.218364 |
| Doc 3 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Doc 4 | 0.193706 | 0.000000 | 0.000000 | 0.000000 | 0.581119 | 0.000000 | 0.000000 | 0.193706 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.774826 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.156281 |
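Each row of the matrix is a 20-dimensional vector, and all rows use the same column order. The sketch below loads the table into plain Python lists, then keeps only each document's non-zero weights, because most entries are zero. The names `TERMS`, `MATRIX`, and `to_sparse` are illustrative and do not come from the original text.

```python
# Column order (terms) and row vectors (documents), copied from the table above.
TERMS = ["better", "dog", "food", "good", "great", "jumbo", "powdered", "price",
         "processed", "product", "products", "quality", "taffy", "tiny", "treat",
         "unsalted", "vendor", "vitality", "witch", "yummy"]

MATRIX = [
    [0.406496, 0.406496, 0.406496, 0.327959, 0, 0, 0, 0, 0.203248, 0.327959,
     0.203248, 0.406496, 0, 0, 0, 0, 0, 0.203248, 0, 0],            # Doc 0
    [0, 0, 0, 0, 0, 0.681849, 0, 0, 0, 0.550112,
     0, 0, 0, 0, 0, 0, 0.340925, 0.340925, 0, 0],                   # Doc 1
    [0, 0, 0, 0, 0, 0, 0.270657, 0, 0, 0,
     0, 0, 0, 0, 0.541314, 0, 0, 0, 0.541314, 0.218364],            # Doc 2
    [0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0],                                 # Doc 3
    [0.193706, 0, 0, 0, 0.581119, 0, 0, 0.193706, 0, 0,
     0, 0, 0.774826, 0, 0, 0, 0, 0, 0, 0.156281],                   # Doc 4
]


def to_sparse(row):
    """Keep only the non-zero weights of one document row, keyed by term."""
    return {term: weight for term, weight in zip(TERMS, row) if weight != 0}


for i, row in enumerate(MATRIX):
    assert len(row) == len(TERMS)  # every document is a vector over the same terms
    print(f"Doc {i}: {to_sparse(row)}")
```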
We will calculate the cosine similarity between two documents, say Doc 0 and Doc 1.

Step 3: Compute the Dot Product

The dot product of two vectors is the sum of the products of their corresponding components:

$$\mathbf{A} \cdot \mathbf{B} = \sum_{i=1}^{n} A_i B_i$$

where $n$ is the number of terms (20 in this matrix).
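Because most weights in the matrix are zero, this minimal sketch stores each document as a dictionary of its non-zero weights. It multiplies only the terms that appear in both documents, since every other term contributes zero to the sum. The names `dot_product`, `doc0`, and `doc1` are illustrative. The weights are copied from the matrix above.

```python
def dot_product(a: dict[str, float], b: dict[str, float]) -> float:
    """Sum of element-wise products; a term missing from either dict counts as 0."""
    # Loop over the smaller dict and look each term up in the larger one.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[term] for term, weight in a.items() if term in b)


# Non-zero weights of Doc 0 and Doc 1, copied from the matrix.
doc0 = {"better": 0.406496, "dog": 0.406496, "food": 0.406496, "good": 0.327959,
        "processed": 0.203248, "product": 0.327959, "products": 0.203248,
        "quality": 0.406496, "vitality": 0.203248}
doc1 = {"jumbo": 0.681849, "product": 0.550112, "vendor": 0.340925,
        "vitality": 0.340925}

shared = sorted(doc0.keys() & doc1.keys())  # terms that are non-zero in both documents
print("shared terms:", shared)
print("dot product:", round(dot_product(doc0, doc1), 6))
```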
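Let's compute the dot product between Doc 0 and Doc 1. A term that is zero in either document adds zero to the sum, so only terms that are non-zero in both documents matter. Doc 0 and Doc 1 share two such terms, product and vitality:

$$\text{Doc 0} \cdot \text{Doc 1} = (0.327959)(0.550112) + (0.203248)(0.340925) \approx 0.180414 + 0.069292 \approx 0.2497$$

Thus, the dot product between Doc 0 and Doc 1 is approximately 0.2497.

The dot products for every document pair, computed from the matrix above, are:

| Pair of Docs | Shared non-zero terms | Dot Product |
|---|---|---|
| Doc 0, Doc 1 | product, vitality | 0.249707 |
| Doc 0, Doc 2 | none | 0 |
| Doc 0, Doc 3 | good | 0.327959 |
| Doc 0, Doc 4 | better | 0.078741 |
| Doc 1, Doc 2 | none | 0 |
| Doc 1, Doc 3 | none | 0 |
| Doc 1, Doc 4 | none | 0 |
| Doc 2, Doc 3 | none | 0 |
| Doc 2, Doc 4 | yummy | 0.034126 |
| Doc 3, Doc 4 | none | 0 |

Step 4: Compute the Magnitudes (Norms)

The magnitude (norm) of a vector $\mathbf{A}$ is calculated using the formula:

$$\|\mathbf{A}\| = \sqrt{\sum_{i=1}^{n} A_i^2}$$

Let's compute the magnitude of Doc 0. Its non-zero values are 0.406496 (four times: better, dog, food, quality), 0.327959 (twice: good, product), and 0.203248 (three times: processed, products, vitality). We square each value and sum the squares:

$$\|\text{Doc 0}\|^2 = 4(0.406496)^2 + 2(0.327959)^2 + 3(0.203248)^2 \approx 0.660956 + 0.215114 + 0.123929 = 0.999999$$

Taking the square root gives $\|\text{Doc 0}\| \approx 1.0000$. The same procedure for Doc 1 gives

$$\|\text{Doc 1}\|^2 = (0.681849)^2 + (0.550112)^2 + 2(0.340925)^2 \approx 0.464918 + 0.302623 + 0.232460 = 1.000001$$

so $\|\text{Doc 1}\| \approx 1.0000$. Repeating this for every document gives magnitudes of approximately 1.0000 (Doc 0), 1.0000 (Doc 1), 0.8408 (Doc 2), 1.0000 (Doc 3), and 1.0186 (Doc 4).

Step 5: Calculate the Cosine Similarity

Once we have the dot product and the magnitudes of both vectors, we apply the cosine similarity formula:

$$\cos(\theta) = \frac{\text{Doc 0} \cdot \text{Doc 1}}{\|\text{Doc 0}\| \, \|\text{Doc 1}\|} \approx \frac{0.249707}{1.0000 \times 1.0000} \approx 0.2497$$

A value of about 0.25 means Doc 0 and Doc 1 are only weakly similar: they overlap on just two terms, product and vitality. By contrast, any pair whose dot product is 0, such as Doc 0 and Doc 2, has a cosine similarity of 0. Those two documents share no terms and are completely dissimilar by this metric.

Summary of Steps for Cosine Similarity Calculation:

  1. Dot Product: Multiply corresponding values and sum them up.
  2. Magnitude of Each Document: Square the values, sum them, and take the square root.
  3. Cosine Similarity: Divide the dot product by the product of the two magnitudes.

Conclusion:

From the calculation above, Doc 0 and Doc 1 have a cosine similarity of about 0.25. They are weakly similar, and that similarity comes entirely from the two terms they share, product and vitality.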
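Putting the three steps together, the following pure-Python sketch computes the dot product and the cosine similarity for every pair of documents in the matrix. It stores only non-zero weights, which is an implementation choice for this sketch and not part of the walkthrough above. The names `DOCS`, `dot_product`, `magnitude`, and `cosine_similarity` are illustrative. In practice, a library routine such as scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity on a full matrix.

```python
import math
from itertools import combinations

# Non-zero weights of each document, copied from the matrix (zeros omitted).
DOCS: list[dict[str, float]] = [
    {"better": 0.406496, "dog": 0.406496, "food": 0.406496, "good": 0.327959,
     "processed": 0.203248, "product": 0.327959, "products": 0.203248,
     "quality": 0.406496, "vitality": 0.203248},                      # Doc 0
    {"jumbo": 0.681849, "product": 0.550112, "vendor": 0.340925,
     "vitality": 0.340925},                                           # Doc 1
    {"powdered": 0.270657, "treat": 0.541314, "witch": 0.541314,
     "yummy": 0.218364},                                              # Doc 2
    {"good": 1.0},                                                    # Doc 3
    {"better": 0.193706, "great": 0.581119, "price": 0.193706,
     "taffy": 0.774826, "yummy": 0.156281},                           # Doc 4
]


def dot_product(a: dict[str, float], b: dict[str, float]) -> float:
    """Step 3: sum of products over terms that are non-zero in both documents."""
    return sum(w * b[t] for t, w in a.items() if t in b)


def magnitude(a: dict[str, float]) -> float:
    """Step 4: square the values, sum them, and take the square root."""
    return math.sqrt(sum(w * w for w in a.values()))


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Step 5: dot product divided by the product of the two magnitudes."""
    denom = magnitude(a) * magnitude(b)
    # Assumption: a pair involving an all-zero document scores 0.0.
    return dot_product(a, b) / denom if denom else 0.0


for i, j in combinations(range(len(DOCS)), 2):
    dot = dot_product(DOCS[i], DOCS[j])
    cos = cosine_similarity(DOCS[i], DOCS[j])
    print(f"Doc {i}, Doc {j}: dot = {dot:.6f}, cosine = {cos:.4f}")
```

Doc 0, Doc 1, and Doc 3 have magnitudes of about 1, so for pairs among them the cosine similarity matches the dot product to four decimals. For pairs that involve Doc 2 or Doc 4, the division in Step 5 changes the value.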