Performance

VecLabs achieves 4.7ms p99 at 100K vectors, 1536 dimensions (OpenAI ada-002 size). This page explains what makes that number possible and how it compares to alternatives.

Benchmark results

Measured on Apple M3, 100K vectors, 1536 dimensions, cosine similarity, top-10 ANN, 1,000 samples:
| Percentile | VecLabs | Pinecone s1 | Qdrant |
|------------|---------|-------------|--------|
| p50        | 2.995ms | ~10ms       | ~6ms   |
| p95        | 3.854ms | ~20ms       | ~12ms  |
| p99        | 4.688ms | ~30ms       | ~18ms  |
| p99.9      | 5.674ms | ~50ms       | ~30ms  |
Reproduce with: `cargo run --release --example percentile_bench -p solvec-core`
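As a rough illustration of how numbers like these are derived (the real logic lives in the `percentile_bench` example), here is a minimal nearest-rank percentile calculation over a sorted set of per-query timings. The timing values below are synthetic stand-ins, not benchmark data:

```rust
/// Nearest-rank percentile over an ascending-sorted sample, in ms.
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

fn main() {
    // Stand-in for 1,000 measured query latencies (milliseconds).
    let mut samples: Vec<f64> = (1..=1000).map(|i| i as f64 / 250.0).collect();
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    println!("p50 = {:.3}ms", percentile(&samples, 50.0));
    println!("p99 = {:.3}ms", percentile(&samples, 99.0));
}
```

Reporting p99 and p99.9 rather than the mean matters here because tail latency is exactly where GC pauses and network jitter show up.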

Why VecLabs is faster

1. **No network round-trip.** Pinecone, Qdrant Cloud, and Weaviate Cloud all require a network request for every query. That round-trip adds 5-50ms depending on your network and region. VecLabs runs the HNSW index in-process, so the query never leaves your application's memory.

2. **No garbage collector.** Python (hnswlib, Chroma) and Go (Weaviate) stacks run garbage collectors that cause unpredictable pauses, which show up at p99 and p99.9 as latency spikes. Rust has no GC, so there are no pauses.

3. **Zero-copy query path.** Vectors are stored in memory as native f32 arrays. There is no serialization, deserialization, or copying on the query hot path; the distance computation accesses memory directly.

4. **Cache-optimized data layout.** The HNSW graph structure and its associated vectors are laid out in memory to maximize CPU cache hits during graph traversal. The inner loop of nearest-neighbor search hits L1/L2 cache rather than main memory.
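The zero-copy point can be made concrete with a sketch of a distance kernel. This is illustrative only, not the actual solvec-core internals: cosine similarity is computed directly over borrowed `&[f32]` slices, with no allocation or copying on the path:

```rust
/// Cosine similarity over two borrowed f32 slices.
/// Reads the vectors in place; nothing is serialized or copied.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}

fn main() {
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```

Because the slices borrow the index's own storage, the tight loop is bound only by how fast the CPU can stream the floats, which is why the cache layout in point 4 matters.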

Write vs query latency

These are independent and should not be confused:
| Operation            | Latency      | Blocking?            |
|----------------------|--------------|----------------------|
| HNSW insert (upsert) | ~2ms         | ✅ Blocks until done |
| Shadow Drive upload  | ~500-2000ms  | ❌ Async background  |
| Solana Merkle root   | ~400ms       | ❌ Async background  |
| HNSW query           | 3-5ms p99    | ✅ Blocks until done |
| Verify (Solana RPC)  | ~400ms       | ✅ Blocks until done |
Your application blocks only on the HNSW operations (and on explicit verify calls). The storage upload and on-chain anchoring are fire-and-forget in the background.
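The blocking/background split above can be sketched with a plain channel and worker thread. Everything here is a placeholder, not the VecLabs API: `insert_into_hnsw` stands in for the blocking index update, and the spawned worker stands in for the Shadow Drive upload and Solana anchoring that the caller never waits on:

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder for the blocking in-memory index update (~2ms).
fn insert_into_hnsw(_vector: &[f32]) {}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<f32>>();

    // Background worker: stands in for the async durability path
    // (Shadow Drive upload, Solana Merkle-root anchor).
    let uploader = thread::spawn(move || {
        for vector in rx {
            let _ = vector; // upload / anchor would happen here
        }
    });

    let v = vec![0.1f32; 1536];
    insert_into_hnsw(&v); // blocks until the index is updated
    tx.send(v).unwrap();  // fire-and-forget to the background

    drop(tx); // close the channel so the worker drains and exits
    uploader.join().unwrap();
}
```

The design consequence is that upsert latency is decoupled from storage latency: a slow upload delays durability, not the caller.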