Lesson 4 of 510 min

Merkle Trees

A Merkle tree hashes pairs of data repeatedly up to a single root — letting you prove a specific item is in a large dataset by sharing only log₂(n) hashes, not all n items.

Why this matters

Merkle trees are what enable lightweight Bitcoin wallets (SPV clients) to exist. Without them, every wallet would need to download the full blockchain (~600 GB). With them, a mobile wallet can cryptographically verify a payment with a handful of hashes. They also underpin Git's content-addressed storage, Ethereum's state proofs, and many ZK proof systems.

The Intuition

Bitcoin blocks contain thousands of transactions. A light wallet wants to know if a specific transaction is in a block — but can't download 100+ MB per block. A Merkle tree solves this:

Hash every transaction. Then hash each pair of hashes together. Then hash each pair of those. Continue until one hash remains — the Merkle root, stored in the block header. Now you can prove a transaction is in the block by providing only log₂(n) sibling hashes along the path to the root. For 1,024 transactions, that's 10 hashes instead of 1,024.

See it concretely

Concrete example

Imagine a library with 1,024 books. You want to prove a specific book is in the library without carrying all 1,024 books. Solution: assign each book a summary label encoding its contents. Then combine adjacent labels into summaries of summaries, all the way up to one master summary posted at the library entrance.

To prove 'Book #537 is here,' you only need to show: Book #537's label, its pair's label, the pair-of-pairs label, and so on — 10 items total instead of 1,024. The master summary at the entrance lets anyone verify your proof instantly.

⚠

Tempting — but wrong

The precise version

For n transactions T₁…Tₙ:

Leaf hashes: hᵢ = SHA256(SHA256(Tᵢ))
Internal nodes: h_{ij} = SHA256(SHA256(hᵢ ∥ hⱼ))
Continue pairing until one root remains. If n is odd, the last hash is duplicated.

A Merkle proof (SPV proof) for transaction Tᵢ consists of the sibling hash at each level of the tree — exactly ⌈log₂(n)⌉ hashes.

Verification: recompute the root bottom-up from Tᵢ and the proof hashes. If it matches the block header's Merkle root, Tᵢ is provably in the block.

Scaling: 10 hashes prove membership in 1,024 txs. 20 hashes for 1,048,576 txs. 30 hashes for 1 billion txs. Logarithmic scaling is the key insight.

h_{ij} = \text{SHA256}^2\!\left(h_i \,\|\, h_j\right)

Check your understanding

How many hashes are needed to prove a transaction is in a block of 1,024 transactions?

Click to reveal answer

What is an SPV (Simplified Payment Verification) client?

Click to reveal answer

Why does Bitcoin use SHA256(SHA256(data)) — double hashing — for Merkle tree nodes?

Click to reveal answer

Before moving on

I understand how Merkle trees enable logarithmic membership proofs
I know what an SPV proof is and why light wallets need it
I can explain what the Merkle root proves — and what it doesn't

?Checkpoint

A block contains 8,192 transactions. How many hashes does a Merkle proof require to verify one specific transaction is included?

Previous Next lesson