# Distance Metrics
When you query a vector database, it ranks results by how “similar” the query vector is to each stored vector. But similarity is defined mathematically - and the definition you choose affects the quality of your results. VecLabs supports three distance metrics: cosine similarity, Euclidean distance, and dot product.
## Cosine Similarity
Cosine similarity measures the angle between two vectors. It ignores magnitude entirely and only cares about direction.
cosine(A, B) = (A · B) / (|A| × |B|)
Range: -1 to 1 (VecLabs returns 0 to 1 after normalization)
- Score of 1.0 = identical direction = maximum similarity
- Score of 0.0 = perpendicular = no relationship
- Score of -1.0 = opposite direction = maximum dissimilarity
Why it works for text: Embedding models like text-embedding-ada-002 produce vectors where the direction encodes meaning and the magnitude is largely noise. Two sentences can have different lengths and different word frequencies, producing vectors of different magnitudes - but if they mean the same thing, they’ll point in the same direction.
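The formula above translates directly to code. This is an illustrative sketch, not the VecLabs internals:

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Scaling a vector doesn't change the score - only direction matters:
cosine([1, 2, 3], [2, 4, 6]); // 1.0 (same direction, different magnitude)
cosine([1, 0], [0, 1]);       // 0.0 (perpendicular)
```

Note that the second vector is just the first one doubled: the magnitudes differ, but the score is still a perfect 1.0, which is exactly why magnitude noise in text embeddings doesn't hurt cosine-based search.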
When to use cosine:
- Text similarity and semantic search
- Document retrieval and RAG
- AI agent memory
- Any task where you’re using a language model’s embeddings
Example:
```typescript
const collection = sv.collection("my-docs", {
  dimensions: 1536,
  metric: "cosine", // default - best for most use cases
});
```
## Euclidean Distance
Euclidean distance measures the straight-line distance between two points in vector space. It considers both direction and magnitude.
euclidean(A, B) = sqrt(sum((A_i - B_i)^2))
Range: 0 to infinity
- Distance of 0 = identical vectors
- Larger distance = less similar
VecLabs returns this as a similarity score (inverted), so higher is still better.
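The raw distance itself is simple to compute. This sketch shows only the distance; the exact formula VecLabs uses to invert it into a similarity score isn't specified here:

```typescript
// Euclidean distance: square root of the sum of squared differences.
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

euclidean([0, 0], [3, 4]); // 5 (the classic 3-4-5 right triangle)
euclidean([1, 2], [1, 2]); // 0 (identical vectors)
```

Unlike cosine, scaling a vector changes its distance to everything else - that's the sense in which Euclidean distance "considers magnitude."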
When to use euclidean:
- Image embeddings where magnitude carries meaning
- Audio or signal processing embeddings
- Cases where your embedding model was specifically trained with euclidean distance
- Recommendation systems based on collaborative filtering
When NOT to use euclidean:
- Text embeddings from transformer models - cosine is almost always better for these
```typescript
const collection = sv.collection("images", {
  dimensions: 512,
  metric: "euclidean",
});
```
## Dot Product
Dot product is the raw inner product of two vectors without normalization.
dot(A, B) = sum(A_i × B_i)
Range: Unbounded
- Higher is more similar
- Magnitude affects the score - larger vectors score higher
When to use dot product:
- Models that explicitly recommend dot product (e.g., some Cohere models)
- Maximum inner product search (MIPS) problems
- Recommendation systems where popularity (magnitude) should boost scores
- Cases where you want to reward longer, more information-rich vectors
Note: If your vectors are normalized to unit length (magnitude = 1), dot product and cosine similarity produce identical results. Many embedding models output normalized vectors.
```typescript
const collection = sv.collection("recommendations", {
  dimensions: 256,
  metric: "dot",
});
```
## Quick decision guide
- Are you using a language model's text embeddings? → Cosine similarity (almost always correct)
- Are you working with images or audio? → Euclidean distance
- Does your embedding model documentation specify a metric? → Use whatever it recommends
- Are you building a recommendation system where popularity matters? → Dot product
- Not sure? → Cosine similarity (safe default)
## You cannot change metrics after collection creation
The distance metric is fixed at collection creation time. Changing it requires creating a new collection and re-indexing all your vectors. Choose carefully upfront.
Creating a collection with the wrong metric will produce silently incorrect search results - queries will return results, but they won’t be the right ones. Always verify your metric matches what your embedding model was trained with.
## VecLabs raw benchmark numbers by metric
Measured on Apple M3, 384 dimensions, 1000 samples:
| Metric | Raw computation time |
|---|---|
| Dot product | 196ns |
| Euclidean | 212ns |
| Cosine | 303ns |
Cosine is slightly slower because it requires computing the magnitude of both vectors before dividing. The difference is negligible at query time - the HNSW graph traversal dominates overall latency, not the distance computation.