Brief look at HNSW and IVF - Vector Index
Imagine this,
- We have 100,000 document chunks
- Each chunk is converted into a vector
- A user asks a question
- We must find the most semantically similar chunks now without latency
HNSW does not perform full scan. It actually exists to avoid full comparison.
HNSW builds a graph where,
- Each vector = a node
- Nodes are connected to nearby nodes
- Some nodes gets long-distance links
And then it adds layers.
Top Layer
- Very few nodes
- Very long connections
- Purpose is to get closest node fast
Middle Layers
- More nodes
- Medium connections
- Purpose is to redefine directions
Bottom Layer
- Most of the nodes
- Short connections
- Fine grained accuracy
Query Execution Flow
- Starts at the random entry at top layer
- Move to whichever connected node is closer to the query
- Repeat until no better node exists
- Drop down one layer
- Repeat and continue until botton layer
- Return top-k nearest neighbors
Basic idea is not avoid scanning and do the navigation through nodes. This is the same way social medio and internet routing works.
HNSW shines when,
- Queries are frequent
- Queries must be fast
- You care about high recall
- Data is read-heavy
These things best matches to,
- Chatbots
- RAG-based Q&A
- Copilots
- Conversational Search
But HNSW consumes more memory, does not suit for high-write applications.
In short, HNSW = fast, accurate, memory-hungry semantic search for real-time Gen AI.
Lets talk about IVF (Inverted File Index) - Complementary idea that exists because HNSW has limits.
While HNSW prioritizes speed and accuracy compromising the memory usage, IVF prioritizes scaling with less memory footprints.
How IVF works
- During indexing
- All vectors are grouped into clusters
- Each cluster has a representative called centroid
- During search
- Query vector is compared with the centroids
- Only closest centroids (clusters) are selected
- Search happens only inside those clusters
So instead of search all document chunks, search will happen against a subset of those in the selected clusters.
When IVF is preferred over HNSW?
- You have millions of vectors
- Queries are not latency sensitive
- Cost efficient
- Dataset grows continously
Do not use IVF when,
- You need real-time conversation speed
- You need best possible answer
Overall,
- HNSW = Speed First.
- IVF = Scale First.


Comments
Post a Comment