
Length-Prefixed Strings in Protocol Buffers — Deep Dive

Length-prefixed strings are one of the key reasons why Protocol Buffers (ProtoBuf) are smaller and faster than JSON or XML.

In JSON, every string is wrapped in quotes; in XML, every element needs a closing tag. These delimiters add bytes, and the parser has to scan for them to find where each value ends.


🔑 What is a Length-Prefixed String?

A length-prefixed string means:

👉 First, store the length of the string (number of bytes)
👉 Then, store the actual string bytes

Because the reader knows up front how many bytes to consume, no closing tag or delimiter is needed, which makes serialization more compact.
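A minimal Python sketch of the idea, not ProtoBuf's full wire format. The helper names `length_prefix` and `read_length_prefixed` are made up for illustration, and the sketch assumes the payload is shorter than 128 bytes so the length fits in a single byte.

```python
def length_prefix(text: str) -> bytes:
    """Return one length byte followed by the UTF-8 bytes of `text`.

    Assumption for this sketch: the payload is under 128 bytes.
    """
    payload = text.encode("utf-8")
    if len(payload) >= 128:
        raise ValueError("this sketch only handles payloads under 128 bytes")
    return bytes([len(payload)]) + payload


def read_length_prefixed(buf: bytes) -> str:
    """Read the length byte, then exactly that many payload bytes."""
    length = buf[0]
    return buf[1:1 + length].decode("utf-8")


encoded = length_prefix("John")
print(encoded)                        # b'\x04John'
print(read_length_prefixed(encoded))  # John
```

Note that the length counts bytes, not characters: a non-ASCII character such as "é" contributes 2 to the length.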


🔥 Example Walkthrough

Let’s assume we have this simple data:

JSON Example:

{
"name": "John"
}

✅ JSON Breakdown:

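| Part | Bytes | Explanation |
|---|---|---|
| "name" | 6 | Key with quotes |
| "John" | 6 | Value with quotes |
| Total | 12 bytes | Key and value only; adding the colon and braces brings the minified object {"name":"John"} to 15 bytes |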

ProtoBuf Example:

In ProtoBuf, the same data would look like:

message Person {
  string name = 1;
}

When the value "John" is serialized, it will be:

| Tag Byte | Wire Type | Length | UTF-8 Bytes |
|---|---|---|---|
| 0A | Length-delimited (Wire Type 2) | 04 | 4A 6F 68 6E (UTF-8 for “John”) |

How it Works:

  • 0A → Tag byte: (Field 1 << 3) | WireType 2 = 0x0A
  • 04 → Length of "John" (4 bytes)
  • 4A 6F 68 6E → "John" in UTF-8

✅ Total Size → 6 bytes 🔥 (50% smaller than JSON)
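The byte layout above can be reproduced with a few lines of Python. This is a hand-rolled sketch rather than the official protobuf library; `make_tag` and `encode_short_string_field` are hypothetical names, and the sketch assumes field numbers below 16 and payloads under 128 bytes so that the tag and the length each fit in one byte.

```python
WIRE_TYPE_LEN = 2  # length-delimited wire type


def make_tag(field_number: int, wire_type: int) -> int:
    """Pack the field number and wire type into the tag value."""
    return (field_number << 3) | wire_type


def encode_short_string_field(field_number: int, text: str) -> bytes:
    """Encode tag + length + UTF-8 bytes (single-byte tag and length only)."""
    payload = text.encode("utf-8")
    if not 1 <= field_number < 16 or len(payload) >= 128:
        raise ValueError("sketch limited to field numbers 1-15 and payloads < 128 bytes")
    return bytes([make_tag(field_number, WIRE_TYPE_LEN), len(payload)]) + payload


encoded = encode_short_string_field(1, "John")
print(encoded.hex(" "))  # 0a 04 4a 6f 68 6e
print(len(encoded))      # 6
```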


How is This More Efficient?

| Feature | JSON | ProtoBuf |
|---|---|---|
| String Storage | "John" (6 bytes) | 04 4A 6F 68 6E (5 bytes, including length) |
| Key Storage | "name" (6 bytes) | 0A (1 byte field + wire type) |
| Total Size | 12 bytes | 6 bytes 🔥 |
| Extra Overhead | Quotes, Colons | None beyond the tag and length bytes |

🔍 What Happens If String is Empty?

If "name" is empty:

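  • JSON → "name":"" → 9 bytes
  • ProtoBuf → 0A 00 → 2 bytes (when the field is written at all; proto3 normally omits a string field that holds its default empty value, so it can take 0 bytes)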

✅ How Length Prefix Helps:

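| String Length | ProtoBuf Size | JSON Size |
|---|---|---|
| 0 (Empty) | 2 bytes | 9 bytes |
| 4 (“John”) | 6 bytes | 12 bytes |
| 50 | 52 bytes | about 58 bytes |
| 1000 | 1003 bytes (lengths of 128 or more need a 2-byte length prefix) | about 1008 bytes |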
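The length prefix is itself a varint: 7 bits of the value per byte, with the high bit set when more bytes follow. The sketch below, with the hypothetical helpers `encode_varint` and `string_field_size`, shows how the encoded size of a string field grows with the string length. It assumes wire type 2 and non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def string_field_size(field_number: int, payload_len: int) -> int:
    """Bytes used by tag + length prefix + payload for a string field."""
    tag = encode_varint((field_number << 3) | 2)
    return len(tag) + len(encode_varint(payload_len)) + payload_len


for n in (0, 4, 50, 127, 128, 1000):
    print(n, string_field_size(1, n))
```

Up to 127 bytes the overhead is 2 bytes (tag plus one length byte). From 128 bytes the length needs a second byte, so a 1000-byte string comes to 1003 bytes in total.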

How Does ProtoBuf Decode the String?

When the client receives 0A 04 4A 6F 68 6E:

  1. Read 0A → Field Number 1 + Wire Type 2 (Length-Delimited)
  2. Read 04 → Length is 4 bytes
  3. Read the next 4 bytes → "John"
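The three steps above can be sketched in Python. `decode_string_field` is an illustrative name, not part of any protobuf library, and the sketch assumes the tag and the length each fit in one byte.

```python
def decode_string_field(buf: bytes) -> tuple[int, str]:
    """Decode one length-delimited string field from the start of `buf`."""
    # Step 1: split the tag byte into field number and wire type.
    tag = buf[0]
    field_number = tag >> 3
    wire_type = tag & 0x07
    if wire_type != 2:
        raise ValueError(f"expected wire type 2, got {wire_type}")

    # Step 2: read the length (single byte in this sketch).
    length = buf[1]

    # Step 3: read exactly `length` bytes and decode them as UTF-8.
    value = buf[2:2 + length].decode("utf-8")
    return field_number, value


print(decode_string_field(bytes.fromhex("0A 04 4A 6F 68 6E")))  # (1, 'John')
```

The decoder never searches for a terminator; it jumps straight past the value once the length is known.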

🌶️ What If There Are Two Strings?

message Person {
  string first_name = 1;
  string last_name = 2;
}

Input:

{
"first_name": "John",
"last_name": "Doe"
}

JSON Size Breakdown

Now let's count the bytes character by character, assuming the JSON is minified (no whitespace between tokens) and stored in UTF-8.

✅ Each character = 1 byte (in UTF-8 for English letters)


Breakdown:

| Part | Bytes | Explanation |
|---|---|---|
| { | 1 | Opening brace |
| "first_name" | 12 | "first_name" (10 letters + 2 quotes) |
| : | 1 | Colon |
| "John" | 6 | "John" (4 letters + 2 quotes) |
| , | 1 | Comma |
| "last_name" | 11 | "last_name" (9 letters + 2 quotes) |
| : | 1 | Colon |
| "Doe" | 5 | "Doe" (3 letters + 2 quotes) |
| } | 1 | Closing brace |

Total = 39 bytes 🔥 (1 + 12 + 1 + 6 + 1 + 11 + 1 + 5 + 1, with no whitespace)
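A count like this is easy to double-check with Python's standard `json` module. The variable names are illustrative; `separators=(",", ":")` asks `json.dumps` for the minified form with no spaces.

```python
import json

person = {"first_name": "John", "last_name": "Doe"}

# Minified: no whitespace after commas or colons.
minified = json.dumps(person, separators=(",", ":"))
print(minified)                       # {"first_name":"John","last_name":"Doe"}
print(len(minified.encode("utf-8")))  # 39

# Python's default output adds a space after each comma and colon.
default = json.dumps(person)
print(len(default.encode("utf-8")))   # 42
```

Pretty-printed JSON with newlines and indentation is larger still, so the minified figure is the most favourable case for JSON.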



Now ProtoBuf 🔥

Proto File:

message Person {
  string first_name = 1;
  string last_name = 2;
}

Serialized Data:

0A 04 4A 6F 68 6E 12 03 44 6F 65

✅ How to Decode This:

| Byte | Meaning |
|---|---|
| 0A | Field 1 → first_name (Length-Prefixed String) |
| 04 | Length = 4 bytes |
| 4A 6F 68 6E | "John" |
| 12 | Field 2 → last_name (Length-Prefixed String) |
| 03 | Length = 3 bytes |
| 44 6F 65 | "Doe" |

Total = 11 bytes 🔥🔥
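With more than one field, the decoder repeats the same tag, length, value cycle until the buffer is exhausted. A rough Python sketch follows. The function name `decode_string_fields` is hypothetical, and the sketch assumes every field is a string with a single-byte tag and a single-byte length.

```python
def decode_string_fields(buf: bytes) -> dict[int, str]:
    """Decode consecutive length-delimited string fields, keyed by field number."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag = buf[pos]
        field_number = tag >> 3
        wire_type = tag & 0x07
        if wire_type != 2:
            raise ValueError(f"this sketch only handles wire type 2, got {wire_type}")
        length = buf[pos + 1]
        start = pos + 2
        fields[field_number] = buf[start:start + length].decode("utf-8")
        pos = start + length  # jump straight to the next tag
    return fields


data = bytes.fromhex("0A 04 4A 6F 68 6E 12 03 44 6F 65")
print(len(data))                   # 11
print(decode_string_fields(data))  # {1: 'John', 2: 'Doe'}
```

The result is keyed by field number. Mapping 1 to first_name and 2 to last_name requires the schema, which is exactly why the names never need to travel on the wire.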



Why This Huge Difference?

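| Feature | JSON | ProtoBuf |
|---|---|---|
| Field Name | ✅ Stored | ❌ Not Stored (Only Field Number) |
| Length | ❌ Not Stored | ✅ Stored |
| Quotes | ✅ Required | ❌ Not Used |
| Curly Braces | ✅ Required | ❌ Not Used |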

The Formula:

👉 JSON = Key + Colon + Quotes + Value
👉 ProtoBuf = Field Number + Length + Value



Final Size Comparison

| Format | Size |
|---|---|
| JSON | 39 bytes 🔥 (minified, with keys) |
| ProtoBuf | 11 bytes 🚀 |


Conclusion

✅ If your JSON has long keys + small values → ProtoBuf will crush JSON.
❌ If your JSON has small keys + long values → The size difference won’t be that much.
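Both cases can be checked with a small size calculator. This is a sketch under stated assumptions: `json_size`, `protobuf_size`, and `varint_len` are hypothetical helpers, every value is a non-empty string, fields are numbered 1, 2, ... in order, and there are at most 15 fields so each tag fits in one byte.

```python
import json


def varint_len(value: int) -> int:
    """Number of bytes a non-negative integer takes as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def json_size(obj: dict) -> int:
    """Size of the minified JSON encoding in UTF-8 bytes."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def protobuf_size(obj: dict) -> int:
    """Size if each value is a string field: 1 tag byte + length varint + payload."""
    total = 0
    for value in obj.values():
        n = len(value.encode("utf-8"))
        total += 1 + varint_len(n) + n
    return total


long_keys = {"customer_first_name": "Al", "customer_last_name": "Li"}
long_values = {"a": "x" * 500, "b": "y" * 500}

print(json_size(long_keys), protobuf_size(long_keys))      # 54 8
print(json_size(long_values), protobuf_size(long_values))  # 1015 1006
```

With long keys and short values the JSON is several times larger. With short keys and long values the two sizes differ by less than 1%.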
