HNSW Parameter Tuning: M, efConstruction, efSearch Explained
The three HNSW knobs (M, efConstruction, efSearch) move your recall-latency curve more than most teams realize. Pick defaults deliberately, not because the library shipped them.
Two HNSW indexes built from the same vectors with the same parameters can differ by up to 17 percentage points in relative recall, just because documents arrived in different orders (Marqo, 2024). That single result, covered in the book's chapter on latency, throughput, and scaling, upends the way most teams reason about HNSW tuning: the parameters are not the only variable, and the graph you ship is not the only graph the parameters describe.
What the three parameters actually do
HNSW is a graph-based approximate nearest neighbor index. Queries traverse a multi-layer graph, descending from coarse to fine connectivity until they land near the query point. Three parameters shape that graph, and each has a well-measured effect.
M is the number of bidirectional edges each node keeps. Higher M makes the graph denser, improves recall at a fixed efSearch, and increases memory. The graph structure alone adds approximately M x 8 to 10 bytes per element on top of the stored vectors (hnswlib, 2024). On SIFT-128 with one million vectors, moving M from 2 to 512 increases total index memory from roughly 0.5 GB to 5 GB (Pinecone, 2024d). The common default across vector databases is M = 16 (Marqo, 2024); for high-dimensional embeddings in the 768 to 1024 range, M = 48 to 64 is the recommended range (hnswlib, 2024).
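That memory arithmetic is easy to sanity-check. The helper below is my own back-of-envelope sketch, not library code: it combines float32 vector storage with the roughly 8 bytes per unit of M that the hnswlib estimate above implies.

```python
# Rough HNSW memory estimate: stored vectors plus graph links.
# Assumes float32 vectors and ~8 bytes per unit of M (the low end
# of the 8-10 byte range cited above).

def hnsw_memory_gb(n_vectors, dim, M, bytes_per_link_unit=8):
    vector_bytes = n_vectors * dim * 4              # float32 storage
    graph_bytes = n_vectors * M * bytes_per_link_unit
    return (vector_bytes + graph_bytes) / 1e9

# SIFT-128, one million vectors, at the extremes from the text
for M in (2, 16, 512):
    print(f"M={M:3d}: {hnsw_memory_gb(1_000_000, 128, M):.2f} GB")
```

The numbers line up with the Pinecone figures quoted above: about 0.5 GB at M = 2 and between 4.6 and 5.6 GB at M = 512, depending on where in the 8 to 10 byte range the link overhead lands.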
efConstruction controls how hard the index works to find good neighbors while inserting each point. It does not affect query-time latency, only graph quality and build time. Under-configured efConstruction can cost up to 18% NDCG@10 (Marqo, 2024), which is a large retrieval quality hit for a parameter that is easy to get wrong and impossible to change without a full reindex.
efSearch is the query-time breadth of the search. It is the only one of the three that can be changed per query, and it is the parameter most teams actually tune in production.
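The mechanics behind M and efSearch can be shown in miniature. The sketch below is a deliberately simplified single-layer toy, not real HNSW: actual HNSW is multi-layer and prunes links with a selection heuristic, and hnswlib's implementation differs in many details. What it does reproduce faithfully is the beam-search loop that efSearch controls: keep the ef best results seen so far, and stop once the closest unexplored node is worse than the worst kept result.

```python
import heapq
import random

random.seed(0)
DIM, N, M, K = 8, 500, 8, 10

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = [tuple(random.random() for _ in range(DIM)) for _ in range(N)]

# Single-layer graph with bidirectional links: each node wired to its
# M exact nearest neighbours. (Real HNSW builds this approximately,
# which is what efConstruction governs.)
graph = [set() for _ in range(N)]
for i, p in enumerate(points):
    nn = sorted((j for j in range(N) if j != i),
                key=lambda j: dist(p, points[j]))[:M]
    for j in nn:
        graph[i].add(j)
        graph[j].add(i)

def search(q, ef, entry=0):
    """Best-first graph search with a result beam of width ef."""
    d0 = dist(q, points[entry])
    visited = {entry}
    frontier = [(d0, entry)]    # min-heap: closest unexplored node first
    results = [(-d0, entry)]    # max-heap: current ef best, worst on top
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > -results[0][0]:
            break               # frontier is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(q, points[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-nd, i) for nd, i in results)[:K]

def recall_at_k(q, ef):
    truth = set(sorted(range(N), key=lambda j: dist(q, points[j]))[:K])
    found = {i for _, i in search(q, ef)}
    return len(found & truth) / K

queries = [tuple(random.random() for _ in range(DIM)) for _ in range(20)]
for ef in (10, 50, 200):
    avg = sum(recall_at_k(q, ef) for q in queries) / len(queries)
    print(f"efSearch={ef:3d}  recall@10={avg:.2f}")
```

Even on this toy, widening the beam buys recall by visiting more of the graph per query, which is exactly the recall-for-throughput trade the next section quantifies on real benchmarks.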
The shape of the efSearch curve
The recall-latency curve is not linear and not forgiving. On SIFT-128 using hnswlib with M = 16 and efConstruction = 500, efSearch = 50 yields 0.950 recall at 28,022 QPS; efSearch = 500 yields 1.000 recall at 4,116 QPS (Aumüller et al., 2020). That is a 6.8x drop in throughput to buy five percentage points of recall. On the same benchmark, efSearch = 10 runs at 69,663 QPS but only reaches 0.713 recall.
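The convexity is easy to see by working the three published points: the throughput given up per percentage point of recall gained rises sharply as you climb the curve.

```python
# The three (efSearch, recall, QPS) points from the hnswlib/SIFT-128
# benchmark cited above.
curve = [(10, 0.713, 69_663), (50, 0.950, 28_022), (500, 1.000, 4_116)]

for (ef1, r1, q1), (ef2, r2, q2) in zip(curve, curve[1:]):
    gained = (r2 - r1) * 100                # recall gain, percentage points
    qps_per_point = (q1 - q2) / gained      # throughput paid per point
    print(f"ef {ef1}->{ef2}: +{gained:.1f} pts recall, "
          f"{q1 / q2:.1f}x QPS drop, {qps_per_point:,.0f} QPS per point")
```

The step from efSearch = 10 to 50 costs roughly 1,800 QPS per recall point; the step from 50 to 500 costs roughly 4,800. Same index, same data, nearly triple the marginal price.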
The shape matters more than any single point on it. The cheap recall is at the bottom of the curve. The expensive recall is at the top. Picking a target of 1.000 instead of 0.985 is rarely a product decision; it is usually an artifact of nobody having measured the cost. Every 100 milliseconds of added search latency reduces daily searches by approximately 0.2% (Brutlag, 2009), and the latency difference between efSearch = 100 and efSearch = 500 at billion-scale fan-out can easily exceed that threshold.
Why defaults can mislead
Library defaults are tuned for generic benchmarks, usually SIFT-128 or GloVe-100. A modern embedding pipeline running 768 to 3072 dimensions is not SIFT-128. The M = 16 default that works on SIFT sits well below the 48 to 64 range recommended for high-dimensional embeddings. Running 1024-dimensional vectors through an HNSW index built with M = 16 is a choice, not a safe baseline.
The insertion-order result is the deeper warning. Two teams can run identical parameters, identical data, identical hardware, and produce indexes with 17% relative recall spread between them. That means a benchmark number reported without a reproducible build procedure is a point estimate with error bars the size of the effect most teams are trying to measure. It also means that reindexing the same corpus (after a model change, after a backfill, after a disaster recovery) can move your recall in ways that no parameter audit will catch.
What this leaves unresolved
HNSW parameters are one lever among several in a production retrieval stack, and they interact in ways a single-index benchmark does not capture. Vector retrieval is one stage inside a fixed latency budget that also has to accommodate query understanding, lexical retrieval, fusion, and reranking. Quantization changes the memory calculation, the distance computation cost, and the recall retention curve all at once, and the trade-offs differ between scalar, product, and binary schemes. Fan-out across shards amplifies tail latency non-linearly: a 1% per-shard slow rate at fan-out 100 produces a slow response on 63% of user requests. Hedged requests, adaptive replica selection, and partial-result policies address that tail, but only under specific utilization regimes. Chapter 15 works through each of these together, which is the only level at which HNSW tuning actually makes sense.
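The fan-out arithmetic behind that 63% figure takes two lines to verify. The hedging variant below makes an independence assumption of my own (a shard is slow only when both replicas are slow), which is the idealized case, not a guarantee of what any given system delivers.

```python
# Probability that at least one of n shards responds slowly, given an
# independent per-shard slow probability p.
def p_any_slow(p, n):
    return 1 - (1 - p) ** n

# 1% per-shard slow rate at fan-out 100: ~63% of requests hit a slow shard.
print(f"{p_any_slow(0.01, 100):.3f}")

# Idealized hedged requests: a shard is slow only if BOTH replicas are
# slow (independence assumed), so the per-shard rate drops to p**2.
print(f"{p_any_slow(0.01 ** 2, 100):.4f}")
```

Under that assumption the request-level slow rate falls from about 63% to about 1%, which is why tail-mitigation policies matter more than single-shard medians at high fan-out.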
Related chapter
Chapter 15: Latency, Throughput, and Scaling
Every 100 milliseconds of added search latency measurably reduces user engagement, and hybrid pipelines must fit multiple retrieval stages into a fixed latency budget. This chapter explains how to allocate that budget across stages, use caching effectively, scale horizontally, tune ANN index parameters, and manage the long tail of slow queries.
Laszlo Csontos
Author of Designing Hybrid Search Systems. Works on search and retrieval systems, and writes about the engineering trade-offs involved in combining keyword and vector search.