Scalable Stats Collection in a URL Shortener
A URL shortener may handle millions of redirects per second.
Each redirect is also a click event we want to count.
But there’s a challenge:
Redirects must be ultra-fast, while stats storage is write-heavy.
If we update the database on every click, the system collapses.
So we use an asynchronous event pipeline with aggregation.
⚠️ The Naive Approach (Doesn’t Scale)
Every redirect issues a direct counter update:

```sql
UPDATE stats SET clicks = clicks + 1 WHERE shortCode = 'abc123';
```
At scale, this causes:
- DB overload
- Lock contention
- Slow redirects
So analytics must be decoupled from the redirect path.
```
User Request
     ↓
Redirect Service
     ↓  (async event)
Message Queue
     ↓
Stream Aggregator
     ↓
Analytics Database
```
Redirect returns immediately. Stats processing happens later.
🧩 Step 1: Event Generation
For every redirect, we emit a lightweight event:
```json
{
  "shortCode": "abc123",
  "timestamp": "10:01:05"
}
```
If a URL gets 10 hits in 1 second → 10 events are pushed to the queue.
This is cheap and non-blocking.
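As a sketch, the emit step might look like this in Python. The names `event_queue` and `emit_click_event` are illustrative, and a real deployment would publish to Kafka, SQS, or a similar broker rather than an in-process queue:

```python
import json
import queue
import time

# Hypothetical in-process queue; a real system would publish to
# Kafka, SQS, or a similar broker instead.
event_queue = queue.Queue()

def emit_click_event(short_code: str) -> None:
    """Fire-and-forget: build a tiny event and enqueue it so the
    redirect response is never blocked on stats."""
    event = {
        "shortCode": short_code,
        "timestamp": time.time(),  # epoch seconds; ISO 8601 also works
    }
    event_queue.put_nowait(json.dumps(event))

emit_click_event("abc123")
```

The redirect handler only pays the cost of serializing a small dict and a non-blocking enqueue.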
🔄 Step 2: Grouping (Aggregation)
We do windowed aggregation.
Grouping Key
We group events by the pair `(shortCode, time_bucket)`, where `time_bucket` is the timestamp truncated to the minute.
Example:
| Event Time | Bucket |
|---|---|
| 10:01:05 | 10:01 |
| 10:01:32 | 10:01 |
| 10:01:58 | 10:01 |
So the grouping key becomes:

`("abc123", "10:01")`
⚙️ Stream Processor Logic
The stream processor keeps an in-memory table:
| Key | Count |
|---|---|
| ("abc123", "10:01") | 10 |
Each event increments:
count[key] += 1
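A minimal sketch of that increment logic, using a plain dictionary in place of a real stream processor's state store:

```python
from collections import defaultdict

# In-memory aggregation table: (shortCode, time_bucket) -> count
counts = defaultdict(int)

def process(key: tuple) -> None:
    """Handle one click event by bumping its window counter."""
    counts[key] += 1

# Ten events in the same minute collapse into a single counter.
for _ in range(10):
    process(("abc123", "10:01"))
# counts[("abc123", "10:01")] == 10
```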
📝 Step 3: Writing to Database
When the 1-minute window closes, we flush one aggregated record:

```
shortCode = abc123
minute    = 10:01
clicks    = 10
```
Only one DB write instead of 10.
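The window-close flush can be sketched as a function that turns the in-memory table into one row per key. The name `flush_window` is illustrative; a real aggregator would batch these rows into the analytics database:

```python
def flush_window(counts: dict) -> list:
    """Emit one row per (shortCode, minute) when the window closes,
    then reset the table for the next window."""
    rows = [
        {"shortCode": code, "minute": minute, "clicks": clicks}
        for (code, minute), clicks in counts.items()
    ]
    counts.clear()
    return rows

# One minute of traffic for abc123 becomes a single row / DB write.
rows = flush_window({("abc123", "10:01"): 10})
# rows == [{"shortCode": "abc123", "minute": "10:01", "clicks": 10}]
```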
📉 Write Reduction
| Raw Events | After Aggregation |
|---|---|
| 10 | 1 |
| 1000 | 1 |
| 1M | 1 |
This massively reduces database load.