Efficient Data Replication Using Merkle Trees for Node Synchronization

by marjavamitjava · November 8, 2024

Here’s a simple example of how Merkle trees work for data replication in a scenario where the root hash of the two nodes differs due to missing or changed keys in one of the buckets.

Scenario:

We have two nodes, Node 1 and Node 2, with the following buckets:

Node 1 has:
- Bucket 1: a, b, c
- Bucket 2: d, e, f
Node 2 has:
- Bucket 1: a, b, c (same as Node 1)
- Bucket 2: d, e, g

Step 1: Construct Merkle Trees for Both Nodes

The Merkle trees for both Bucket 1 and Bucket 2 will be constructed as previously discussed.

Merkle Tree for Bucket 1 on Node 1 and Node 2:

Both trees will be identical because the contents of Bucket 1 (a, b, c) are the same on both nodes.

Merkle Tree for Bucket 2 on Node 1:

plaintextCopy code                       Root1B (H(def))
                      /            \
                H(de)             H(f)
               /    \
            H(d)     H(e)

Merkle Tree for Bucket 2 on Node 2:

plaintextCopy code                       Root2B (H(deg))
                      /            \
                H(de)             H(g)
               /    \
            H(d)     H(e)

Step 2: Compare Merkle Roots

Root1A and Root2A for Bucket 1 are the same, so no data needs to be replicated for Bucket 1.
Root1B and Root2B for Bucket 2 are different, indicating a mismatch between Bucket 2 on the two nodes.

Step 3: Compare Subtrees for Bucket 2

H(de) is the same on both Node 1 and Node 2, so the keys d and e are identical.
The difference is in the right subtrees:
- Node 1 has H(f) (key f).
- Node 2 has H(g) (key g).

Since Node 2 has key g and Node 1 has key f, these are the keys that need to be merged.

Step 4: Merge and Replicate Data

In Bucket 2, we merge the keys from both nodes:

Node 1 has d, e, f.
Node 2 has d, e, g.

The final merged state of Bucket 2 on Node 2 after replication will be:

Bucket 2 (after merge): d, e, f, g

Replicated Data: Key f will be copied from Node 1 to Node 2, and key g remains unchanged on Node 2.

Final Result After Replication

After replication, both nodes will have:

Bucket 1: a, b, c (no changes)
Bucket 2: d, e, f, g (merged data from both nodes)

Summary:

No Change for Bucket 1: The keys in Bucket 1 (a, b, c) were identical, so no replication was needed.
Bucket 2 Merged: The keys f and g from Bucket 2 were merged. Key f was replicated from Node 1 to Node 2, and key g remained unchanged on Node 2.

This ensures that both nodes now have the complete set of keys (d, e, f, g) in Bucket 2, with only the missing keys replicated, and no unnecessary data transfer occurred for unchanged keys.