Step 1: Formula for Cosine Similarity
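The formula for cosine similarity between two vectors $\mathbf{A}$ and $\mathbf{B}$ is:
$$\text{cosine similarity}(\mathbf{A}, \mathbf{B}) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|}$$
Where:
- $\mathbf{A} \cdot \mathbf{B}$ is the dot product of the vectors.
- $\|\mathbf{A}\|$ is the magnitude (norm) of vector $\mathbf{A}$, and similarly for $\|\mathbf{B}\|$.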
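As a minimal sketch, the formula can be written as a small Python function on plain lists. The name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for illustration, not something the article prescribes.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))        # A . B
    norm_a = math.sqrt(sum(x * x for x in a))     # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))     # ||B||
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat an all-zero vector as similarity 0
        # instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)

parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # no shared components
print(parallel, orthogonal)
```

Vectors pointing in the same direction score close to 1, and vectors with no overlapping nonzero components score 0.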
Step 2: Understanding the Matrix
In the provided matrix, rows represent documents and columns represent terms; each value is the frequency or weight of that term in that document:
Terms | better | dog | food | good | great | jumbo | powdered | price | processed | product | products | quality | taffy | tiny | treat | unsalted | vendor | vitality | witch | yummy |
Doc 0 | 0.406496 | 0.406496 | 0.406496 | 0.327959 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.203248 | 0.327959 | 0.203248 | 0.406496 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.203248 | 0.000000 | 0.000000 |
Doc 1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.681849 | 0.000000 | 0.000000 | 0.000000 | 0.550112 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.340925 | 0.340925 | 0.000000 | 0.000000 |
Doc 2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.270657 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.541314 | 0.000000 | 0.000000 | 0.000000 | 0.541314 | 0.218364 |
Doc 3 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Doc 4 | 0.193706 | 0.000000 | 0.000000 | 0.000000 | 0.581119 | 0.000000 | 0.000000 | 0.193706 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.774826 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.156281 |
We will calculate the cosine similarity between two documents, say Doc 0 and Doc 1.
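Because most entries are zero, one convenient way to hold the rows in code is a dictionary of nonzero weights per document. The sketch below copies Doc 0 and Doc 1 from the matrix; `TERMS`, `to_dense`, and `shared_terms` are hypothetical names introduced here for illustration.

```python
# Column order of the matrix.
TERMS = ["better", "dog", "food", "good", "great", "jumbo", "powdered",
         "price", "processed", "product", "products", "quality", "taffy",
         "tiny", "treat", "unsalted", "vendor", "vitality", "witch", "yummy"]

# Nonzero weights copied from the matrix rows for Doc 0 and Doc 1.
doc0 = {"better": 0.406496, "dog": 0.406496, "food": 0.406496,
        "good": 0.327959, "processed": 0.203248, "product": 0.327959,
        "products": 0.203248, "quality": 0.406496, "vitality": 0.203248}
doc1 = {"jumbo": 0.681849, "product": 0.550112,
        "vendor": 0.340925, "vitality": 0.340925}

def to_dense(sparse_row, terms=TERMS):
    """Expand a {term: weight} dict into a full row in column order."""
    return [sparse_row.get(term, 0.0) for term in terms]

# Terms with a nonzero weight in both documents.
shared_terms = sorted(set(doc0) & set(doc1))
print(to_dense(doc1))
print(shared_terms)
```

Listing the shared terms first shows which columns can contribute to the dot product at all.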
Step 3: Compute the Dot Product
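The dot product of two vectors is the sum of the products of their corresponding components:
$$\mathbf{A} \cdot \mathbf{B} = \sum_{i=1}^{n} A_i B_i$$
Let's compute the dot product between Doc 0 and Doc 1. Any term with a zero weight in either document contributes nothing, so only the terms that are nonzero in both rows matter. In the matrix above, Doc 0 and Doc 1 share two such terms, product and vitality:
$$\text{Doc 0} \cdot \text{Doc 1} = (0.327959)(0.550112) + (0.203248)(0.340925) \approx 0.180414 + 0.069292 \approx 0.2497$$
Thus, the dot product between Doc 0 and Doc 1 is approximately 0.2497.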
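The element-wise calculation can be checked with a short sketch on the full rows for Doc 0 and Doc 1, copied from the matrix as printed. The helper name `dot` and the `contributions` list are illustrative assumptions.

```python
# Full 20-column rows copied from the matrix (column order as in the header).
doc0 = [0.406496, 0.406496, 0.406496, 0.327959, 0.0, 0.0, 0.0, 0.0,
        0.203248, 0.327959, 0.203248, 0.406496, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.203248, 0.0, 0.0]
doc1 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.681849, 0.0, 0.0,
        0.0, 0.550112, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.340925, 0.340925, 0.0, 0.0]

def dot(a, b):
    """Sum of the products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))

# Column indices (0-based) where both rows are nonzero, with their products.
contributions = [(i, x * y)
                 for i, (x, y) in enumerate(zip(doc0, doc1))
                 if x != 0.0 and y != 0.0]
print(contributions)
print(dot(doc0, doc1))
```

With these rows, only columns 9 (product) and 17 (vitality) contribute, and the sum comes to roughly 0.2497.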
The dot products for each document pair, computed from the matrix above and rounded to six decimal places, are:
Pair of Docs | Dot Product |
Doc 0, Doc 1 | 0.249707 |
Doc 0, Doc 2 | 0 |
Doc 0, Doc 3 | 0.327959 |
Doc 0, Doc 4 | 0.078741 |
Doc 1, Doc 2 | 0 |
Doc 1, Doc 3 | 0 |
Doc 1, Doc 4 | 0 |
Doc 2, Doc 3 | 0 |
Doc 2, Doc 4 | 0.034126 |
Doc 3, Doc 4 | 0 |
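A sketch for recomputing every pairwise dot product directly from the matrix rows, which is useful as a cross-check of the table. The rows are stored as dictionaries of nonzero weights copied from the matrix; `sparse_dot` and `pair_dots` are hypothetical names.

```python
from itertools import combinations

# Nonzero weights per document, copied from the matrix as printed.
docs = {
    "Doc 0": {"better": 0.406496, "dog": 0.406496, "food": 0.406496,
              "good": 0.327959, "processed": 0.203248, "product": 0.327959,
              "products": 0.203248, "quality": 0.406496, "vitality": 0.203248},
    "Doc 1": {"jumbo": 0.681849, "product": 0.550112,
              "vendor": 0.340925, "vitality": 0.340925},
    "Doc 2": {"powdered": 0.270657, "treat": 0.541314,
              "witch": 0.541314, "yummy": 0.218364},
    "Doc 3": {"good": 1.0},
    "Doc 4": {"better": 0.193706, "great": 0.581119, "price": 0.193706,
              "taffy": 0.774826, "yummy": 0.156281},
}

def sparse_dot(a, b):
    """Dot product over the terms that appear in both documents."""
    return sum(a[t] * b[t] for t in a.keys() & b.keys())

# One entry per unordered pair of documents (10 pairs for 5 documents).
pair_dots = {(p, q): sparse_dot(docs[p], docs[q])
             for p, q in combinations(docs, 2)}

for (p, q), value in pair_dots.items():
    print(f"{p}, {q} | {value:.6f}")
```

Pairs with no term in common produce an empty sum, which is exactly 0.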
Step 4: Compute the Magnitudes (Norms)
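The magnitude (norm) of a vector $\mathbf{A}$ is calculated using the formula:
$$\|\mathbf{A}\| = \sqrt{\sum_{i=1}^{n} A_i^2}$$
Let's compute the magnitude of Doc 0:
First, we square each nonzero value (zeros contribute nothing):
$$0.406496^2 \approx 0.165239, \quad 0.327959^2 \approx 0.107557, \quad 0.203248^2 \approx 0.041310$$
Then we sum all the squared values and take the square root. Doc 0 has four entries of 0.406496, two of 0.327959, and three of 0.203248:
$$\|\text{Doc 0}\| \approx \sqrt{4(0.165239) + 2(0.107557) + 3(0.041310)} \approx \sqrt{1.0000} \approx 1$$
The same procedure is applied to each document to find its magnitude.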
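The square, sum, and square-root procedure can be sketched for all five documents at once. Only the nonzero weights are listed, since zeros do not change the sum of squares; `nonzero_weights`, `norm`, and `norms` are illustrative names.

```python
import math

# Nonzero weights of each row, copied from the matrix as printed.
nonzero_weights = {
    "Doc 0": [0.406496, 0.406496, 0.406496, 0.327959, 0.203248,
              0.327959, 0.203248, 0.406496, 0.203248],
    "Doc 1": [0.681849, 0.550112, 0.340925, 0.340925],
    "Doc 2": [0.270657, 0.541314, 0.541314, 0.218364],
    "Doc 3": [1.0],
    "Doc 4": [0.193706, 0.581119, 0.193706, 0.774826, 0.156281],
}

def norm(values):
    """Euclidean norm: square each value, sum, take the square root."""
    return math.sqrt(sum(v * v for v in values))

norms = {name: norm(values) for name, values in nonzero_weights.items()}
for name, value in norms.items():
    print(f"{name}: {value:.4f}")
```

With the values as printed, Doc 0, Doc 1, and Doc 3 come out at approximately 1, while Doc 2 and Doc 4 come out at roughly 0.84 and 1.02. Dividing by the magnitudes explicitly, rather than assuming every row already has unit length, is therefore the safer choice.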
Step 5: Calculate the Cosine Similarity
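Once we have the dot product and the magnitudes of both vectors, we apply the cosine similarity formula:
$$\text{cosine similarity}(\mathbf{A}, \mathbf{B}) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|}$$
For Doc 0 and Doc 1, the matrix rows give a dot product of $(0.327959)(0.550112) + (0.203248)(0.340925) \approx 0.2497$, and both magnitudes are approximately 1, so the cosine similarity is approximately 0.2497. Doc 0 and Doc 1 are therefore weakly similar: they overlap only on the terms product and vitality. A pair with no shared terms, such as Doc 0 and Doc 2, has a dot product of 0 and hence a cosine similarity of 0, meaning the two documents are completely dissimilar under this metric.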
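Putting the pieces together, here is a sketch that applies the formula to rows stored as dictionaries of nonzero weights, copied from the matrix as printed. The function name `cosine` is an assumption for illustration, and it expects both documents to have at least one nonzero weight.

```python
import math

doc0 = {"better": 0.406496, "dog": 0.406496, "food": 0.406496,
        "good": 0.327959, "processed": 0.203248, "product": 0.327959,
        "products": 0.203248, "quality": 0.406496, "vitality": 0.203248}
doc1 = {"jumbo": 0.681849, "product": 0.550112,
        "vendor": 0.340925, "vitality": 0.340925}
doc2 = {"powdered": 0.270657, "treat": 0.541314,
        "witch": 0.541314, "yummy": 0.218364}

def cosine(a, b):
    """Cosine similarity of two {term: weight} dicts with nonzero norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

sim_01 = cosine(doc0, doc1)  # shares "product" and "vitality"
sim_02 = cosine(doc0, doc2)  # no shared terms
print(sim_01, sim_02)
```

On these rows the Doc 0 and Doc 1 similarity is roughly 0.25, while Doc 0 and Doc 2 share no terms and score exactly 0.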
Summary of Steps for Cosine Similarity Calculation:
- Dot Product: Multiply corresponding values and sum them up.
- Magnitude of Each Document: Square the values, sum them, and take the square root.
- Cosine Similarity: Apply the formula by dividing the dot product by the product of the magnitudes.
Conclusion:
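From the above calculation, Doc 0 and Doc 1 have a cosine similarity of approximately 0.25: their dot product is 0.327959 × 0.550112 + 0.203248 × 0.340925 ≈ 0.2497, and both magnitudes are approximately 1. They are weakly similar because they share only the terms product and vitality. Document pairs that share no terms at all, such as Doc 0 and Doc 2, have a cosine similarity of 0 and are completely dissimilar under this metric.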
- Author: Entropyobserver
- URL: https://tangly1024.com/article/1c6d698f-3512-8171-b823-d6f3f93391ef
- Copyright: All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!