Length-Prefixed Strings in Protocol Buffers — Deep Dive
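Length-prefixed strings are one of the main reasons why Protocol Buffers (ProtoBuf) messages are usually smaller and faster to parse than the equivalent JSON or XML.
In JSON, every string is wrapped in quotes and may need escape sequences; in XML, every element needs a closing tag. The parser has to scan for those delimiters, and they add bytes to every value.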
🔑 What is a Length-Prefixed String?
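A length-prefixed string is stored in two parts:
👉 First, the length of the string in bytes (not characters), encoded as a varint
👉 Then, the actual string bytes in UTF-8
Because the reader knows exactly how many bytes to consume, no closing quote, closing tag, or escape sequence is needed, which keeps the serialized form compact.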
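As a rough illustration of the two parts described above, here is a minimal Python sketch. The helper names `encode_varint` and `length_prefixed` are made up for this example and are not part of any ProtoBuf library; the sketch only assumes the standard base-128 varint layout (7 payload bits per byte, high bit set when more bytes follow).

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (hypothetical helper)."""
    if value < 0:
        raise ValueError("a length must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def length_prefixed(text: str) -> bytes:
    """Return varint(byte length) followed by the UTF-8 bytes of text."""
    payload = text.encode("utf-8")
    return encode_varint(len(payload)) + payload


print(length_prefixed("John").hex(" "))  # 04 4a 6f 68 6e
print(length_prefixed("é").hex(" "))     # 02 c3 a9 (1 character, 2 bytes)
```

Note that the prefix counts bytes, so a non-ASCII character such as "é" contributes 2 to the length.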
🔥 Example Walkthrough
Let’s assume we have this simple data:
JSON Example:
{
"name": "John"
}
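✅ JSON Breakdown (minified form, {"name":"John"}):
Part | Bytes | Explanation |
---|---|---|
{ } | 2 | Opening and closing braces |
"name" | 6 | Key (4 letters + 2 quotes) |
: | 1 | Colon |
"John" | 6 | Value (4 letters + 2 quotes) |
Total | 15 bytes | Without any whitespace |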
ProtoBuf Example:
In ProtoBuf, the schema for the same data would look like:
message Person {
string name = 1;
}
When the value "John" is serialized, it will be:
Tag Byte (Field 1) | Wire Type | Length | UTF-8 Bytes |
---|---|---|---|
0A | Length-delimited (Wire Type 2) | 04 | 4A 6F 68 6E (UTF-8 for “John”) |
How it Works:
- 0A → Tag byte: (Field 1 << 3) | Wire Type 2 = 0x0A
- 04 → Length of "John" (4 bytes)
- 4A 6F 68 6E → "John" in UTF-8
✅ Total Size → 6 bytes 🔥 (60% smaller than the 15 bytes of minified JSON, {"name":"John"})
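The tag arithmetic above can be checked with a short Python sketch. The function names here are hypothetical, chosen for this example; the only assumptions are that the tag is `(field_number << 3) | wire_type` and that both the tag and the length are written as varints.

```python
WIRE_TYPE_LEN = 2  # length-delimited: strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint encoder (hypothetical helper for this sketch)."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # more bytes follow
        value >>= 7
    out.append(value)                      # final byte
    return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag + length + UTF-8 bytes."""
    tag = (field_number << 3) | WIRE_TYPE_LEN
    payload = text.encode("utf-8")
    return encode_varint(tag) + encode_varint(len(payload)) + payload


print(encode_string_field(1, "John").hex(" ").upper())  # 0A 04 4A 6F 68 6E
print(encode_string_field(16, "").hex(" ").upper())     # 82 01 00
```

The second call shows a detail the walkthrough does not need: from field number 16 upward, the tag no longer fits in one byte.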
How is This More Efficient?
Feature | JSON | ProtoBuf |
---|---|---|
String Storage | "John" (6 bytes, including quotes) | 04 4A 6F 68 6E (5 bytes, including length) |
Key Storage | "name" (6 bytes) | 0A (1 byte: field number + wire type) |
Total Size | 15 bytes (minified, with colon and braces) | 6 bytes 🔥 |
Extra Overhead | Quotes, colon, braces | None |
🔍 What Happens If String is Empty?
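If "name" is empty:
- JSON → {"name":""} → 11 bytes (minified)
- ProtoBuf → 0A 00 → 2 bytes when the field is explicitly written. In proto3, a string field without explicit presence that holds the default empty value is normally left out of the output entirely (0 bytes).
✅ How Length Prefix Helps:
String Length | ProtoBuf Size | JSON Size (minified) |
---|---|---|
0 (Empty) | 2 bytes | 11 bytes |
4 (“John”) | 6 bytes | 15 bytes |
50 | 52 bytes | 61 bytes |
1000 | 1003 bytes (the length needs a 2-byte varint) | 1011 bytes |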
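The sizes for a single `name` field can be reproduced with a small calculation. This is a sketch under two assumptions: the ProtoBuf field is always written (even when empty), and the JSON is minified. The function names are invented for this example.

```python
import json


def varint_size(value: int) -> int:
    """Number of bytes a base-128 varint needs for a non-negative value."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def proto_string_field_size(n_bytes: int, field_number: int = 1) -> int:
    """Tag + length prefix + payload for one string field (wire type 2)."""
    tag = (field_number << 3) | 2
    return varint_size(tag) + varint_size(n_bytes) + n_bytes


def json_size(n_bytes: int, key: str = "name") -> int:
    """Byte size of the minified JSON object {key: <n_bytes ASCII chars>}."""
    doc = json.dumps({key: "x" * n_bytes}, separators=(",", ":"))
    return len(doc.encode("utf-8"))


for n in (0, 4, 50, 127, 128, 1000):
    print(n, proto_string_field_size(n), json_size(n))
```

The jump between 127 and 128 is where the length prefix grows from one byte to two; the ProtoBuf overhead per field stays at 2 to 3 bytes in this range, while the JSON overhead is fixed at 11 bytes for this key.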
How Does ProtoBuf Decode the String?
When the client receives 0A 04 4A 6F 68 6E:
- Read 0A → Field Number 1 (0A >> 3) + Wire Type 2, length-delimited (0A & 07)
- Read 04 → Length is 4 bytes
- Read the next 4 bytes → "John"
🌶️ What If There Are Two Strings?
message Person {
string first_name = 1;
string last_name = 2;
}
Input:
{
"first_name": "John",
"last_name": "Doe"
}
JSON Size Breakdown
Now let’s calculate the actual byte size of the minified JSON, character by character.
JSON is normally transmitted in UTF-8 encoding.
✅ Each ASCII character (English letters, digits, punctuation) = 1 byte in UTF-8
Breakdown:
Part | Bytes | Explanation |
---|---|---|
{ | 1 | Opening brace |
"first_name" | 12 | "first_name" (10 letters + 2 quotes) |
: | 1 | Colon |
"John" | 6 | "John" (4 letters + 2 quotes) |
, | 1 | Comma |
"last_name" | 11 | "last_name" (9 letters + 2 quotes) |
: | 1 | Colon |
"Doe" | 5 | "Doe" (3 letters + 2 quotes) |
} | 1 | Closing brace |
Total = 39 bytes 🔥 (1 + 12 + 1 + 6 + 1 + 11 + 1 + 5 + 1, without any whitespace)
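The character-by-character count can be double-checked with Python's standard `json` module. This is only a quick verification sketch; the variable names are chosen for this example.

```python
import json

person = {"first_name": "John", "last_name": "Doe"}

# Minified: no spaces after ',' or ':'
compact = json.dumps(person, separators=(",", ":"))
# json.dumps default: one space after each ',' and ':'
spaced = json.dumps(person)

print(compact)                       # {"first_name":"John","last_name":"Doe"}
print(len(compact.encode("utf-8")))  # 39
print(len(spaced.encode("utf-8")))   # 42
```

Any whitespace or pretty-printing adds to the total, so the minified count is the smallest size JSON can reach for this object.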
Now ProtoBuf 🔥
Proto File:
message Person {
string first_name = 1;
string last_name = 2;
}
Serialized Data:
0A 04 4A 6F 68 6E 12 03 44 6F 65
✅ How to Decode This:
Byte | Meaning |
---|---|
0A | Field 1 → first_name (Length-Prefixed String) |
04 | Length = 4 bytes |
4A 6F 68 6E | "John" |
12 | Field 2 → last_name (Length-Prefixed String) |
03 | Length = 3 bytes |
44 6F 65 | "Doe" |
Total = 11 bytes 🔥🔥
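The decode table above amounts to a loop: read a tag, read a length, take that many bytes, repeat until the buffer is exhausted. Here is a minimal Python sketch of that loop. The `FIELD_NAMES` mapping and the function names are assumptions made for this example (a generated ProtoBuf class would get the mapping from the schema), and the sketch only understands wire type 2.

```python
FIELD_NAMES = {1: "first_name", 2: "last_name"}  # mirrors the Person schema


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_person(buf: bytes) -> dict[str, str]:
    """Walk the buffer field by field and collect the known string fields."""
    fields: dict[str, str] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"this sketch only handles wire type 2, got {wire_type}")
        length, pos = read_varint(buf, pos)
        value = buf[pos:pos + length]
        pos += length  # the length prefix tells us where the next field starts
        name = FIELD_NAMES.get(field_number)
        if name is not None:  # unknown field numbers are simply skipped
            fields[name] = value.decode("utf-8")
    return fields


print(decode_person(bytes.fromhex("0A 04 4A 6F 68 6E 12 03 44 6F 65")))
# {'first_name': 'John', 'last_name': 'Doe'}
```

Because each field carries its own length, a reader can skip a field it does not recognize without understanding its contents.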
Why This Huge Difference?
Feature | JSON | ProtoBuf |
---|---|---|
Field Name | ✅ Stored | ❌ Not Stored (Only Field Number) |
Length | ❌ Not Stored | ✅ Stored |
Quotes | ✅ | ❌ |
Curly Braces | ✅ | ❌ |
The Formula:
👉 JSON = Quoted Key + Colon + Quoted Value (+ Commas and Braces)
👉 ProtoBuf = Tag (Field Number + Wire Type) + Length + Value
Final Size Comparison
Format | Size |
---|---|
JSON | 39 bytes 🔥 (with keys, minified) |
ProtoBuf | 11 bytes 🚀 |
Conclusion
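✅ If your JSON has long keys + small values → ProtoBuf will crush JSON, because each key shrinks to a 1-byte tag (for field numbers 1 to 15).
❌ If your JSON has small keys + long values → The size difference won’t be that much, because the string's own UTF-8 bytes dominate the size in both formats.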