Brief look at HNSW and IVF - Vector Index

 


Imagine this,

  • We have 100,000 document chunks
  • Each chunk is converted into a vector
  • A user asks a question
  • We must find the most semantically similar chunks now without latency
HNSW does not perform full scan. It actually exists to avoid full comparison.

HNSW builds a graph where,
  • Each vector = a node
  • Nodes are connected to nearby nodes
  • Some nodes gets long-distance links
And then it adds layers.

Top Layer

  • Very few nodes
  • Very long connections
  • Purpose is to get closest node fast

Middle Layers

  • More nodes
  • Medium connections
  • Purpose is to redefine directions

Bottom Layer

  • Most of the nodes
  • Short connections
  • Fine grained accuracy

Query Execution Flow

  1. Starts at the random entry at top layer
  2. Move to whichever connected node is closer to the query
  3. Repeat until no better node exists
  4. Drop down one layer
  5. Repeat and continue until botton layer
  6. Return top-k nearest neighbors
Basic idea is not avoid scanning and do the navigation through nodes. This is the same way social medio and internet routing works.

HNSW shines when,
  • Queries are frequent
  • Queries must be fast
  • You care about high recall
  • Data is read-heavy
These things best matches to,
  • Chatbots
  • RAG-based Q&A
  • Copilots
  • Conversational Search
But HNSW consumes more memory, does not suit for high-write applications.

In short, HNSW = fast, accurate, memory-hungry semantic search for real-time Gen AI.

Lets talk about IVF (Inverted File Index) - Complementary idea that exists because HNSW has limits.

While HNSW prioritizes speed and accuracy compromising the memory usage, IVF prioritizes scaling with less memory footprints.

How IVF works

  • During indexing
    • All vectors are grouped into clusters
    • Each cluster has a representative called centroid
  • During search
    • Query vector is compared with the centroids
    • Only closest centroids (clusters) are selected
    • Search happens only inside those clusters
So instead of search all document chunks, search will happen against a subset of those in the selected clusters.

When IVF is preferred over HNSW?

  • You have millions of vectors
  • Queries are not latency sensitive
  • Cost efficient 
  • Dataset grows continously
Do not use IVF when,
  • You need real-time conversation speed
  • You need best possible answer
Overall, 
  • HNSW = Speed First. 
  • IVF = Scale First.

Comments

Popular Posts