What Is Vector Similarity? Cosine, Dot Product, and L2 Distance
By Tharindu Gunawardana | SearchMinistry Media
Vector similarity measures how close two vectors are in a high-dimensional embedding space. The three main measures are cosine similarity, dot product similarity, and Euclidean (L2) distance. Each produces a different ranking of neighbours and each is preferred in different retrieval scenarios.
Cosine Similarity
Cosine similarity measures the angle between two vectors, ignoring their magnitudes. Two vectors pointing in the same direction score 1.0 regardless of whether one is twice the length of the other. This makes cosine similarity robust to document length differences: a short and a long document about the same topic score similarly if they discuss it in the same proportions.
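This scale invariance can be sketched in a few lines of NumPy (a minimal illustration; the vectors and the `cosine_similarity` helper are invented for this example, not taken from any particular library):

```python
import numpy as np

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 means same direction, 0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A short and a "long" document embedding pointing in the same direction:
short_doc = np.array([0.2, 0.4, 0.1])
long_doc = 2.0 * short_doc  # same direction, twice the magnitude

# Doubling the length does not change the angle, so the score stays at 1.0
# (up to floating-point error) -- magnitude is ignored.
score = cosine_similarity(short_doc, long_doc)  # ≈ 1.0
```

Because only the angle matters, any positive rescaling of either vector leaves the score unchanged.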
Dot Product Similarity
Dot product multiplies corresponding dimensions and sums them, incorporating both direction and magnitude. Larger-magnitude vectors score higher dot products for the same directional alignment. Many modern embedding models are trained with dot product as the similarity metric. When embeddings are L2-normalised (unit length), dot product and cosine similarity are equivalent.
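A small sketch makes both points concrete: the magnitude sensitivity, and the equivalence with cosine similarity after L2-normalisation (the vectors here are arbitrary examples chosen for clean arithmetic):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a  # same direction as a, twice the magnitude

# Magnitude matters: b scores higher against a than a scores against itself,
# even though a and b point the same way.
score_aa = float(np.dot(a, a))  # 14.0
score_ab = float(np.dot(a, b))  # 28.0

# After L2-normalisation to unit length, dot product reduces to cosine
# similarity: both vectors have norm 1, so only the angle contributes.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
score_unit = float(np.dot(a_unit, b_unit))  # ≈ 1.0, the cosine of the angle
```

This is why vector databases often let you store normalised embeddings and use dot product internally: it is cheaper to compute than cosine but ranks identically once vectors are unit length.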
Euclidean (L2) Distance
L2 distance measures the geometric distance between two points in the embedding space. Smaller L2 distance means greater similarity. L2 is sensitive to magnitude: two vectors with the same direction but different lengths will have non-zero L2 distance even though they represent the same semantic direction.
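The magnitude sensitivity is easy to demonstrate (again with made-up vectors; `np.linalg.norm` of the difference is the standard way to compute L2 distance):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])  # norm is exactly 3
b = 2.0 * a                    # same direction, norm 6

# Identical vectors are at distance zero.
dist_self = float(np.linalg.norm(a - a))  # 0.0

# Same direction but different magnitude: the distance is non-zero,
# here exactly the norm of a itself since b - a == a.
dist_ab = float(np.linalg.norm(a - b))  # 3.0
```

Under cosine similarity these two vectors would be a perfect match; under L2 they are not, which is the practical difference between the metrics.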
SEO Implications
AI retrieval systems configured for cosine similarity retrieve content based on topical alignment regardless of document length. Systems using dot product reward both topical relevance and embedding magnitude, which dot-product-trained models can use to encode signal strength. Content that is semantically dense and directly on-topic scores well under both metrics. Knowing which similarity metric your target system uses should inform how you structure content.