How Protobuf Stores Data in Binary Format
Unlike JSON, Protocol Buffers (Protobuf) use a compact binary format that removes redundant information. Here’s how data is structured and stored efficiently.
1️⃣ Protobuf Message Structure
Take this JSON object:
```json
{
  "id": 123,
  "name": "John Doe",
  "email": "john@example.com"
}
```
In Protobuf, we define this as:
```proto
syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
}
```
Each field has:
✔ Type (int32, string, etc.)
✔ Field Number (1, 2, 3, etc.)
2️⃣ How Protobuf Stores Data in Binary Format
Unlike JSON, Protobuf does not store keys as strings. Instead, it encodes data compactly using:
1️⃣ Field numbers (instead of field names).
2️⃣ Varint encoding (for integers; small values take fewer bytes).
3️⃣ Length-prefix encoding (for strings and byte arrays).
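The first two building blocks (field keys and Varints) can be sketched in a few lines of plain Python. This is a hand-rolled illustration, not the official `protobuf` library, and the function names `encode_varint` and `field_key` are hypothetical. It assumes non-negative integers only; real Protobuf writes a negative `int32` as a 10-byte Varint.

```python
# Hand-rolled sketch of two Protobuf wire-format building blocks
# (not the official protobuf library; names are illustrative).


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 Varint.

    Each byte carries 7 bits of the value, least-significant group first;
    the high bit (0x80) is set on every byte except the last one.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # last byte: high bit clear
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Field key = (field_number << 3) | wire_type, itself Varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)


print(encode_varint(123).hex(" "))  # 7b
print(encode_varint(300).hex(" "))  # ac 02
print(field_key(1, 0).hex(" "))     # 08  (field 1, wire type 0)
print(field_key(2, 2).hex(" "))     # 12  (field 2, wire type 2)
```

The value 300 needs two bytes because it does not fit in 7 bits: the low 7 bits go first with the continuation bit set (`AC`), then the remaining bits (`02`).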
Let’s say we encode:
User { id = 123, name = "John Doe", email = "john@example.com" }
3️⃣ Binary Encoding Process (Efficient Storage Format)
Field | Field Number | Wire Type | Encoded Bytes (hex) |
---|---|---|---|
id = 123 | 1 | 0 (Varint) | 08 7B |
name = "John Doe" | 2 | 2 (Length-prefixed) | 12 08 4A 6F 68 6E 20 44 6F 65 |
email = "john@example.com" | 3 | 2 (Length-prefixed) | 1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D |
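To check the encoded bytes for this `User` record, the following sketch rebuilds them by hand. It is a minimal illustration in plain Python that assumes the `User` schema above. The helper names (`encode_varint_field`, `encode_string_field`, `encode_user`) are hypothetical and not part of any Protobuf library. The email address `john@example.com` is 16 UTF-8 bytes, so its length prefix is `10` (hex).

```python
# Hand-rolled sketch: encode User { id = 1; name = 2; email = 3 } without
# the protobuf library. Wire types: 0 = Varint, 2 = length-delimited.

VARINT = 0
LENGTH_DELIMITED = 2


def encode_varint(value: int) -> bytes:
    """Base-128 Varint for non-negative integers (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Field key followed by the Varint-encoded value."""
    return encode_varint((field_number << 3) | VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Field key, then the UTF-8 byte length as a Varint, then the raw bytes."""
    data = text.encode("utf-8")
    key = encode_varint((field_number << 3) | LENGTH_DELIMITED)
    return key + encode_varint(len(data)) + data


def encode_user(user_id: int, name: str, email: str) -> bytes:
    """Concatenate the three fields in field-number order."""
    return (
        encode_varint_field(1, user_id)
        + encode_string_field(2, name)
        + encode_string_field(3, email)
    )


encoded = encode_user(123, "John Doe", "john@example.com")
print(encoded.hex(" ").upper())
# 08 7B 12 08 4A 6F 68 6E 20 44 6F 65 1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D
print(len(encoded))  # 30
```

One simplification: a real proto3 encoder skips fields that hold their default value (0 or an empty string), whereas this sketch always writes all three fields.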
4️⃣ Breakdown of Binary Representation
- Field Key = (Field Number << 3) | Wire Type
- Example: id = 123
  - Field Number: 1 (shifted left: 1 << 3 = 8)
  - Wire Type: 0 (Varint)
  - Field Key: 8 | 0 = 8 → 08 (1st byte)
  - 123 is stored as: 7B (123 is below 128, so its Varint fits in one byte)
  - Final: 08 7B
- Strings Are Length-Prefixed
- Example: "John Doe"
  - Field Key: (2 << 3) | 2 = 18 → 12 (hex)
  - Length: 8 bytes → 08
  - UTF-8 Bytes: 4A 6F 68 6E 20 44 6F 65
  - Final: 12 08 4A 6F 68 6E 20 44 6F 65
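Reading the bytes back reverses this breakdown: split each field key into `key >> 3` (field number) and `key & 0x07` (wire type), then read either a Varint or a length followed by that many bytes. This is a simplified sketch with hypothetical function names. It handles only wire types 0 and 2, and truncated input simply raises `IndexError`. Turning field numbers back into names such as `id` and `name` requires the schema, which is why names never need to appear on the wire.

```python
from typing import List, Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a Varint starting at buf[pos]; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # add the next 7-bit group
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> List[Tuple[int, int, object]]:
    """Return (field_number, wire_type, value) for each field in buf."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # Varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: length, then raw bytes
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


data = bytes.fromhex("08 7B 12 08 4A 6F 68 6E 20 44 6F 65")
for number, wire_type, value in decode_fields(data):
    print(number, wire_type, value)
# 1 0 123
# 2 2 b'John Doe'
```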
5️⃣ Why Is This More Efficient Than JSON?
Feature | JSON | Protobuf |
---|---|---|
Keys | Stored as full strings ("id", "name") | Uses field numbers (1, 2) |
Numbers | Stored as ASCII digits (123 takes 3 bytes) | Varint encoding (123 takes 1 byte) |
Strings | Quoted, with escape sequences for special characters | Length-prefixed raw UTF-8 (no quotes or escaping) |
Parsing | Slower (text must be tokenized and key strings matched) | Typically faster (numeric field keys, lengths known up front) |
Size | Larger due to repeated key names and text encoding | Smaller (binary format, no key names) |
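The Numbers row can be made concrete by counting bytes. This sketch uses a hypothetical helper, `varint_size`, in plain Python to compare the length of a non-negative number written as decimal digits (as in JSON) with its Varint length.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 Varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    size = 1
    while value >= 0x80:  # each extra byte holds 7 more bits
        value >>= 7
        size += 1
    return size


for n in (7, 123, 300, 1_000_000):
    print(f"{n:>9}: JSON digits = {len(str(n))} bytes, Varint = {varint_size(n)} bytes")
```

123 shrinks from 3 bytes to 1 and 1,000,000 from 7 to 3, but a single digit such as 7 takes 1 byte either way. Negative values are a caveat: Protobuf writes a negative `int32` as a 10-byte Varint, which is why the `sint32` type (ZigZag encoding) exists.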
6️⃣ Real Example: Comparing JSON vs Protobuf Encoded Data
JSON Representation (55 bytes)
{"id":123,"name":"John Doe","email":"john@example.com"}
- Keys take space: "id", "name", "email"
- Numbers stored as text (123 takes 3 bytes)
Protobuf Binary Representation (30 bytes)
08 7B 12 08 4A 6F 68 6E 20 44 6F 65 1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D
- Field numbers replace key names
- Compact encoding for numbers and strings
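The byte counts for this example can be reproduced with Python's standard `json` module. This sketch assumes compact JSON with no whitespace (`separators=(",", ":")`). It uses the hand-encoded Protobuf bytes for this record (with the email length prefix `10`) rather than output from the `protobuf` library.

```python
import json

user = {"id": 123, "name": "John Doe", "email": "john@example.com"}

# separators=(",", ":") drops the spaces json.dumps inserts by default.
json_bytes = json.dumps(user, separators=(",", ":")).encode("utf-8")

# Hand-encoded Protobuf bytes for the same record.
protobuf_bytes = bytes.fromhex(
    "08 7B "                                                 # id = 123
    "12 08 4A 6F 68 6E 20 44 6F 65 "                         # name = "John Doe"
    "1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D"  # email
)

print(json_bytes.decode())   # {"id":123,"name":"John Doe","email":"john@example.com"}
print(len(json_bytes))       # 55
print(len(protobuf_bytes))   # 30
```

For this small record the Protobuf encoding is a little over half the size of the JSON text. Most of the savings come from dropping the key names and quotes.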
7️⃣ Conclusion: Why gRPC & Protobuf Are Efficient
✅ Smaller size → No redundant key names
✅ Faster serialization → Direct binary format
✅ Less CPU usage → Efficient parsing with field numbers
✅ Optimized for network communication
This is a key reason gRPC is often preferred over JSON-based REST APIs in high-performance systems! 🚀