How Protobuf Stores Data in Binary Format

Unlike JSON, Protocol Buffers (Protobuf) use a compact binary format that removes redundant information. Here’s how data is structured and stored efficiently.


1️⃣ Protobuf Message Structure

Let’s take a JSON-like structure:

{
  "id": 123,
  "name": "John Doe",
  "email": "john@example.com"
}

In Protobuf, we define this as:

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
}

Each field has:
Type (int32, string, etc.)
Field Number (1, 2, 3, etc.)


2️⃣ How Protobuf Stores Data in Binary Format

Unlike JSON, Protobuf does not store keys as strings. Instead, it encodes data in a compact format using:
1️⃣ Field numbers (instead of field names).
2️⃣ Varint encoding (for integers, so small values take fewer bytes).
3️⃣ Length-prefix encoding (for strings and byte arrays).
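To make the varint idea concrete, here is a minimal sketch in plain Python (no Protobuf library involved). The function name `encode_varint` is just an illustrative choice, and the sketch only handles non-negative integers. Each output byte carries 7 bits of the value, and the top bit signals whether more bytes follow.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative sketch)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


print(encode_varint(123).hex(" "))  # 7b
print(encode_varint(300).hex(" "))  # ac 02
```

Any value below 128 fits in a single byte, which is why 123 becomes just 7B, while 300 needs two bytes.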

Let’s say we encode:

User { id = 123, name = "John Doe", email = "john@example.com" }

3️⃣ Binary Encoding Process (Efficient Storage Format)

| Field | Field Number | Wire Type | Encoded Value |
|---|---|---|---|
| id = 123 | 1 | Varint (0) | 08 7B |
| name = "John Doe" | 2 | Length-prefixed (2) | 12 08 4A 6F 68 6E 20 44 6F 65 |
| email = "john@example.com" | 3 | Length-prefixed (2) | 1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D |
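The encoding can be reproduced by hand. The sketch below is not the official Protobuf library; the helper names (`make_key`, `encode_int_field`, `encode_string_field`, `encode_user`) are made up for illustration, and it assumes non-negative integers and the email `john@example.com`, which is what the email hex bytes decode to. Since that address is 16 bytes long, its length prefix is 0x10.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers (illustrative sketch)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def make_key(field_number: int, wire_type: int) -> bytes:
    """Field key = (field number << 3) | wire type, itself stored as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    # Wire type 0 = varint
    return make_key(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    # Wire type 2 = length-prefixed: key, byte length, then the UTF-8 bytes
    data = text.encode("utf-8")
    return make_key(field_number, 2) + encode_varint(len(data)) + data


def encode_user(user_id: int, name: str, email: str) -> bytes:
    # Field numbers 1, 2, 3 match the User message definition
    return (
        encode_int_field(1, user_id)
        + encode_string_field(2, name)
        + encode_string_field(3, email)
    )


payload = encode_user(123, "John Doe", "john@example.com")
print(payload.hex(" "))
print(len(payload))  # 30
```

The field names `id`, `name`, and `email` never appear in the output; only the field numbers do.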

4️⃣ Breakdown of Binary Representation

  • Field Key = (Field Number << 3) | Wire Type
    • Example: id = 123
      • Field Number: 1 (shifted left: 1 << 3 = 8)
      • Wire Type: 0 (Varint)
      • Encoded as: 08 (1st byte)
      • 123 is stored as: 7B (in Varint encoding)
      • Final: 08 7B
  • Strings Are Length-Prefixed
    • "John Doe"
      • Field Key: (2 << 3) | 2 = 18, encoded as 12 in hex (wire type 2 = length-prefixed)
      • Length: 8
      • UTF-8 Bytes: 4A 6F 68 6E 20 44 6F 65
      • Final: 12 08 4A 6F 68 6E 20 44 6F 65
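Reading the bytes back works the same way in reverse. The following is a rough decoder sketch in plain Python, with hypothetical helper names (`read_varint`, `decode_fields`), and it only understands the two wire types used in this article (0 and 2). It splits each key into a field number and wire type, then reads either a varint or a length-prefixed payload.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # add the next 7 bits
        if not byte & 0x80:               # continuation bit clear: done
            return result, pos
        shift += 7


def decode_fields(buf: bytes):
    """Return a list of (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number = key >> 3    # upper bits hold the field number
        wire_type = key & 0x07     # lowest 3 bits hold the wire type
        if wire_type == 0:         # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:       # length-prefixed
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


data = bytes.fromhex("08 7B 12 08 4A 6F 68 6E 20 44 6F 65")
for field_number, wire_type, value in decode_fields(data):
    print(field_number, wire_type, value)
# 1 0 123
# 2 2 b'John Doe'
```

Note that the decoder learns which field it is looking at from the key byte, not from its position in the stream.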

5️⃣ Why Is This More Efficient Than JSON?

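| Feature | JSON | Protobuf |
|---|---|---|
| Keys | Stored as full strings ("id", "name") | Uses field numbers (1, 2) |
| Numbers | Stored as ASCII digits (one byte per digit) | Uses Varint encoding (small values take fewer bytes) |
| Strings | Stored as UTF-8 with quotes and escaping | Length-prefixed raw UTF-8 (no quotes or escaping) |
| Parsing | Slower (text must be tokenized and keys matched by name) | Faster (numeric tags and explicit lengths; fields are tagged, not at fixed positions) |
| Size | Larger due to redundant key names | Smaller (binary format) |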

6️⃣ Real Example: Comparing JSON vs Protobuf Encoded Data

JSON Representation (55 bytes in compact form, without whitespace)

{"id":123,"name":"John Doe","email":"john@example.com"}
  • Keys take space: "id", "name", "email"
  • Numbers are stored as decimal text, one byte per digit

Protobuf Binary Representation (30 bytes)

08 7B 12 08 4A 6F 68 6E 20 44 6F 65 1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D
  • Field numbers replace key names
  • Compact encoding for numbers and strings
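The sizes are easy to check yourself. This small sketch uses only Python's standard `json` module and a hard-coded copy of the Protobuf bytes; it assumes the email is `john@example.com` and measures JSON in its most compact form (no spaces after separators).

```python
import json

user = {"id": 123, "name": "John Doe", "email": "john@example.com"}

# Compact JSON: no whitespace after "," or ":"
compact_json = json.dumps(user, separators=(",", ":")).encode("utf-8")

# Protobuf wire bytes for the same message, written out by hand
protobuf_bytes = bytes.fromhex(
    "08 7b "                                               # id = 123
    "12 08 4a 6f 68 6e 20 44 6f 65 "                       # name = "John Doe"
    "1a 10 6a 6f 68 6e 40 65 78 61 6d 70 6c 65 2e 63 6f 6d"  # email
)

print(len(compact_json))    # 55
print(len(protobuf_bytes))  # 30
```

For this message the Protobuf form is a little over half the size of compact JSON. The saving grows with the number of fields and shrinks when the payload is dominated by long strings, since string contents take the same space in both formats.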

7️⃣ Conclusion: Why gRPC & Protobuf Are Efficient

Smaller size → No redundant key names
Faster serialization → Direct binary format
Less CPU usage → Efficient parsing with field numbers
Optimized for network communication

This is why gRPC, which uses Protobuf as its default message format, is often preferred over JSON-based REST APIs for high-performance systems! 🚀
