Brief look at HNSW and IVF

February 05, 2026

Brief look at HNSW and IVF - Vector Index

Imagine this,

We have 100,000 document chunks
Each chunk is converted into a vector
A user asks a question
We must find the most semantically similar chunks now without latency

HNSW does not perform full scan. It actually exists to avoid full comparison.

HNSW builds a graph where,

Each vector = a node
Nodes are connected to nearby nodes
Some nodes gets long-distance links

And then it adds layers.

Top Layer

Very few nodes
Very long connections
Purpose is to get closest node fast

Middle Layers

More nodes
Medium connections
Purpose is to redefine directions

Bottom Layer

Most of the nodes
Short connections
Fine grained accuracy

Query Execution Flow

Starts at the random entry at top layer
Move to whichever connected node is closer to the query
Repeat until no better node exists
Drop down one layer
Repeat and continue until botton layer
Return top-k nearest neighbors

Basic idea is not avoid scanning and do the navigation through nodes. This is the same way social medio and internet routing works.

HNSW shines when,

Queries are frequent
Queries must be fast
You care about high recall
Data is read-heavy

These things best matches to,

Chatbots
RAG-based Q&A
Copilots
Conversational Search

But HNSW consumes more memory, does not suit for high-write applications.

In short, HNSW = fast, accurate, memory-hungry semantic search for real-time Gen AI.

Lets talk about IVF (Inverted File Index) - Complementary idea that exists because HNSW has limits.

While HNSW prioritizes speed and accuracy compromising the memory usage, IVF prioritizes scaling with less memory footprints.

How IVF works

During indexing

All vectors are grouped into clusters
Each cluster has a representative called centroid

During search

Query vector is compared with the centroids
Only closest centroids (clusters) are selected
Search happens only inside those clusters

So instead of search all document chunks, search will happen against a subset of those in the selected clusters.

When IVF is preferred over HNSW?

You have millions of vectors
Queries are not latency sensitive
Cost efficient
Dataset grows continously

Do not use IVF when,

You need real-time conversation speed
You need best possible answer

Overall,

HNSW = Speed First.
IVF = Scale First.

Search This Blog

Karthik Venkatesalu