Compressing Vector DB data
Question: If a system has 100 million user-product interactions, and if each order is mapped to
a vector of 768 dimensions, how much space will the system require to store all vectors?
Answer: 100 million interactions = 10^8 interactions.
Each order is stored as a 768-dimensional vector. That’s 768 x 10^8 = 7.68 x 10^10 values.
If every dimensional value is stored as a 32-bit floating point, we need 7.68 x 10^10 x 32 bits
= 7.68 x 10^10 x 4 Bytes
= 3.072 x 10^11 Bytes
= 307.2 GB
Caching this much data is very challenging, which makes the system slow to query and hurts user experience.
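The arithmetic above can be sanity-checked with a few lines of Python; the numbers are taken directly from the question (10^8 vectors, 768 dimensions, 32-bit floats):

```python
# Back-of-envelope check of the storage estimate.
interactions = 100_000_000   # 10^8 user-product interactions
dims = 768                   # dimensions per vector
bytes_per_value = 4          # 32-bit float = 4 bytes

total_bytes = interactions * dims * bytes_per_value
print(total_bytes)               # 307200000000
print(total_bytes / 1e9, "GB")   # 307.2 GB
```

At ~307 GB, the raw vectors alone exceed the RAM of most single machines, which is what motivates compression below.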
Why Compress
The goal of compression is not just saving disk space. Compression helps with:
● Fitting more vectors in RAM
● Reducing search latency
● Making SIMD or GPU compute feasible
How It Works
Product Quantization (PQ)
● Break each vector into sub-vectors (e.g., split a 128-D vector into 8 chunks of 16-D each).
● For each chunk, find the nearest centroid in a pre-trained codebook.
● Store only the index of that centroid, not the float values.
So instead of storing 128 floats (512 bytes), you store 8 one-byte integers (8 bytes).
That’s a 64x reduction.
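The three steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not production code: the training data is random, the k-means routine is a toy (libraries such as FAISS use optimized trainers), and the helper names (`pq_encode`, `pq_decode`) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 128, 8, 256        # vector dim, number of sub-vectors, centroids per codebook
Ds = D // M                  # each sub-vector is 16-D

# Random stand-in for real embedding vectors used to train the codebooks.
train = rng.standard_normal((1000, D)).astype(np.float32)

def kmeans(X, k, iters=10):
    """Toy k-means: returns a (k, dim) array of centroids."""
    cent = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute means.
        dists = ((X[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                cent[j] = pts.mean(0)
    return cent

# Step 1 + training: one codebook per sub-vector position.
codebooks = [kmeans(train[:, m * Ds:(m + 1) * Ds], K) for m in range(M)]

def pq_encode(v):
    """Steps 2-3: store only the index of the nearest centroid per chunk."""
    codes = np.empty(M, dtype=np.uint8)      # 256 centroids fit in one byte
    for m in range(M):
        chunk = v[m * Ds:(m + 1) * Ds]
        codes[m] = ((codebooks[m] - chunk) ** 2).sum(1).argmin()
    return codes

def pq_decode(codes):
    """Approximate reconstruction by concatenating the chosen centroids."""
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])

v = train[0]
codes = pq_encode(v)
print(v.nbytes, "->", codes.nbytes)   # 512 -> 8, the 64x reduction
```

Note the trade-off: decoding gives back only an approximation of the original vector, which is why PQ is used for approximate nearest-neighbor search rather than exact retrieval.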
PQ is used heavily in Facebook’s FAISS, Milvus, and other modern vector DBs.