Length-Prefixed Strings in Protocol Buffers — Deep Dive

by marjavamitjava · February 28, 2025

Length-prefixed strings are one of the key reasons why Protocol Buffers (ProtoBuf) messages are usually smaller and faster to parse than JSON or XML.

In JSON, strings are wrapped in quotes; in XML, they sit between opening and closing tags. Both add bytes, and the parser has to scan for the closing delimiter to find where the string ends.

🔑 What is a Length-Prefixed String?

A length-prefixed string means:

👉 First, store the length of the string (number of bytes)
👉 Then, store the actual string bytes

Because the reader knows the length up front, no closing tag or delimiter is needed, which makes serialization more compact.

🔥 Example Walkthrough

Let’s assume we have this simple data:

JSON Example:

{
  "name": "John"
}

✅ JSON Breakdown (minified, no whitespace: `{"name":"John"}`):

Part	Bytes	Explanation
`{`	1	Opening brace
`"name"`	6	Key with quotes
`:`	1	Colon
`"John"`	6	Value with quotes
`}`	1	Closing brace
Total	15 bytes

ProtoBuf Example:

In ProtoBuf, the schema for the same data would look like:

message Person {
  string name = 1;
}

When the value "John" is serialized, it will be:

Tag Byte	Field No.	Wire Type	Length	UTF-8 Bytes
`0A`	1	Length-delimited (Wire Type 2)	`04`	`4A 6F 68 6E` (UTF-8 for “John”)

How it Works:

0A → Tag byte: (Field 1 << 3) | WireType 2 = 0x0A
04 → Length of "John" (4 bytes)
4A 6F 68 6E → "John" in UTF-8

✅ Total Size → 6 bytes 🔥 (60% smaller than the 15-byte minified JSON `{"name":"John"}`)
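
To make the byte layout concrete, here is a minimal Python sketch that builds the same six bytes by hand. It does not use the official protobuf library; the helper names `encode_varint` and `encode_string_field` are made up for this illustration, and the sketch only covers string fields.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag + byte length + UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2    # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


encoded = encode_string_field(1, "John")
print(encoded.hex(" ").upper())  # 0A 04 4A 6F 68 6E
print(len(encoded))              # 6
```

Note that the length is counted in bytes, not characters, so a non-ASCII string has a length prefix larger than its character count.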

How is This More Efficient?

Feature	JSON	ProtoBuf
String Storage	`"John"` (6 bytes)	`04 4A 6F 68 6E` (5 bytes, including length)
Key Storage	`"name"` (6 bytes)	`0A` (1 byte: field number + wire type)
Extra Overhead	`{`, `:`, `}` (3 bytes)	None
Total Size	15 bytes	6 bytes 🔥

🔍 What Happens If String is Empty?

If "name" is empty:

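JSON → {"name":""} → 11 bytes
ProtoBuf → 0A 00 → 2 bytes when the field is written explicitly. In proto3, a string field without explicit presence that holds its default value (the empty string) is normally skipped by the serializer, so it takes 0 bytes.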

✅ How Length Prefix Helps:

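String Length	ProtoBuf Size	JSON Size
0 (Empty)	2 bytes	11 bytes
4 (“John”)	6 bytes	15 bytes
50	52 bytes	61 bytes
1000	1003 bytes	1011 bytes

JSON sizes assume the minified form `{"name":"..."}` with ASCII text and no escaped characters. The length prefix is itself a varint, so strings of 128 bytes or more need a 2-byte length, which is why 1000 bytes of text take 1003 bytes.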
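
These sizes can be checked with a few lines of Python. This is a rough sketch rather than a benchmark: `varint_size`, `protobuf_size`, and `json_size` are hypothetical helpers written for this post, the string is plain ASCII, and the JSON is minified. The length prefix is a varint, so it grows to two bytes once the string reaches 128 bytes.

```python
import json


def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def protobuf_size(field_number: int, payload_len: int) -> int:
    """Size of one length-delimited field: tag + length prefix + payload."""
    tag = (field_number << 3) | 2
    return varint_size(tag) + varint_size(payload_len) + payload_len


def json_size(key: str, value: str) -> int:
    """Size in bytes of a minified one-key JSON object."""
    return len(json.dumps({key: value}, separators=(",", ":")).encode("utf-8"))


for n in (0, 4, 50, 1000):
    print(n, protobuf_size(1, n), json_size("name", "x" * n))
# 0 2 11
# 4 6 15
# 50 52 61
# 1000 1003 1011
```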

How Does ProtoBuf Decode the String?

When the client receives 0A 04 4A 6F 68 6E:

Read 0A → Field Number 1 (0x0A >> 3), Wire Type 2 (0x0A & 0x07, length-delimited)
Read 04 → Length is 4 bytes
Read the next 4 bytes → "John"
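
The same three steps can be sketched in Python. This is an illustrative hand-rolled reader, not the official protobuf parser; `decode_varint` and `decode_string_field` are names invented here, and the sketch rejects any wire type other than 2.

```python
from typing import Tuple


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_string_field(data: bytes, pos: int = 0) -> Tuple[int, str, int]:
    """Return (field_number, text, next_pos) for one string field."""
    tag, pos = decode_varint(data, pos)          # step 1: read the tag
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 2:
        raise ValueError("expected a length-delimited field")
    length, pos = decode_varint(data, pos)       # step 2: read the length
    end = pos + length
    if end > len(data):
        raise ValueError("truncated payload")
    return field_number, data[pos:end].decode("utf-8"), end  # step 3


print(decode_string_field(bytes.fromhex("0A 04 4A 6F 68 6E")))
# (1, 'John', 6)
```

The decoder never scans for a terminator: once it has the length, it slices exactly that many bytes and moves on.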

🌶️ What If There Are Two Strings?

message Person {
  string first_name = 1;
  string last_name = 2;
}

Input:

{
  "first_name": "John",
  "last_name": "Doe"
}

JSON Size Breakdown

Now let’s calculate the actual byte size character by character, using the minified form (no spaces or line breaks).

JSON is stored in UTF-8 encoding.

✅ Each ASCII character (English letters, digits, punctuation) = 1 byte in UTF-8

Breakdown:

Part	Bytes	Explanation
`{`	1	Opening brace
`"first_name"`	12	`"first_name"` (10 letters + 2 quotes)
`:`	1	Colon
`"John"`	6	`"John"` (4 letters + 2 quotes)
`,`	1	Comma
`"last_name"`	11	`"last_name"` (9 letters + 2 quotes)
`:`	1	Colon
`"Doe"`	5	`"Doe"` (3 letters + 2 quotes)
`}`	1	Closing brace

Total = 39 bytes 🔥 (minified, without whitespace)
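
Counting by hand is error-prone, so here is a quick way to check the number with Python's standard `json` module. The variable names are only for illustration; the second measurement shows how much the indented layout adds.

```python
import json

person = {"first_name": "John", "last_name": "Doe"}

# Minified: no spaces after ':' or ',' and no line breaks.
compact = json.dumps(person, separators=(",", ":"))
compact_size = len(compact.encode("utf-8"))

# Pretty-printed with a 2-space indent.
pretty = json.dumps(person, indent=2)
pretty_size = len(pretty.encode("utf-8"))

print(compact)       # {"first_name":"John","last_name":"Doe"}
print(compact_size)  # 39
print(pretty_size)   # 48
```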

Now ProtoBuf 🔥

Proto File:

message Person {
  string first_name = 1;
  string last_name = 2;
}

Serialized Data:

0A 04 4A 6F 68 6E 12 03 44 6F 65

✅ How to Decode This:

Byte	Meaning
`0A`	Field 1 → first_name (Length-Prefixed String)
`04`	Length = 4 bytes
`4A 6F 68 6E`	`"John"`
`12`	Field 2 → last_name (Length-Prefixed String)
`03`	Length = 3 bytes
`44 6F 65`	`"Doe"`

Total = 11 bytes 🔥🔥
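
As a sketch of how a reader walks through several fields, the loop below parses the 11 bytes above without the protobuf library. `read_varint`, `parse_string_fields`, and `FIELD_NAMES` are hypothetical names for this example, and the parser assumes every field is a string. The field names live only in the schema, so the mapping from numbers to names has to be supplied by the reader.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at data[pos]."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:              # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_string_fields(data: bytes) -> dict:
    """Parse a message made only of string fields into {field_number: text}."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = read_varint(data, pos)
        if pos + length > len(data):
            raise ValueError("truncated payload")
        fields[field_number] = data[pos:pos + length].decode("utf-8")
        pos += length                # jump straight to the next tag
    return fields


# Assumed schema mapping: the wire format carries numbers, not names.
FIELD_NAMES = {1: "first_name", 2: "last_name"}

message = bytes.fromhex("0A 04 4A 6F 68 6E 12 03 44 6F 65")
parsed = parse_string_fields(message)
named = {FIELD_NAMES[number]: text for number, text in parsed.items()}

print(len(message))  # 11
print(parsed)        # {1: 'John', 2: 'Doe'}
print(named)         # {'first_name': 'John', 'last_name': 'Doe'}
```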

Why This Huge Difference?

Feature	JSON	ProtoBuf
Field Name	✅ Stored	❌ Not Stored (Only Field Number)
Length	❌ Not Stored	✅ Stored
Quotes	✅	❌
Curly Braces	✅	❌

The Formula:

👉 JSON = Braces + Commas +, for each field, Quoted Key + Colon + Quoted Value
👉 ProtoBuf = for each field, Tag (Field Number + Wire Type) + Length + Value

Final Size Comparison

Format	Size
JSON	39 bytes 🔥 (minified, with keys)
ProtoBuf	11 bytes 🚀

Conclusion

✅ If your JSON has long keys + small values → ProtoBuf will crush JSON, because each key shrinks to a 1- or 2-byte tag.
❌ If your JSON has small keys + long values → The size difference will be small, because both formats store the value bytes themselves.