Merkle Trees
A Merkle tree hashes pairs of data repeatedly up to a single root — letting you prove a specific item is in a large dataset by sharing only log₂(n) hashes, not all n items.
Merkle trees are what enable lightweight Bitcoin wallets (SPV clients) to exist. Without them, every wallet would need to download the full blockchain (~600 GB). With them, a mobile wallet can cryptographically verify a payment with a handful of hashes. They also underpin Git's content-addressed storage, Ethereum's state proofs, and many ZK proof systems.
The Intuition
Bitcoin blocks contain thousands of transactions. A light wallet wants to know if a specific transaction is in a block — but can't download 100+ MB per block. A Merkle tree solves this:
Hash every transaction. Then hash each pair of hashes together. Then hash each pair of those. Continue until one hash remains — the Merkle root, stored in the block header. Now you can prove a transaction is in the block by providing only log₂(n) sibling hashes along the path to the root. For 1,024 transactions, that's 10 hashes instead of 1,024.
See it concretely
Imagine a library with 1,024 books. You want to prove a specific book is in the library without carrying all 1,024 books. Solution: assign each book a summary label encoding its contents. Then combine adjacent labels into summaries of summaries, all the way up to one master summary posted at the library entrance.
To prove 'Book #537 is here,' you only need to show: Book #537's label, its pair's label, the pair-of-pairs label, and so on — 10 items total instead of 1,024. The master summary at the entrance lets anyone verify your proof instantly.
Tempting — but wrong
The precise version
For n transactions T₁…Tₙ:
Leaf hashes: hᵢ = SHA256(SHA256(Tᵢ))
Internal nodes: h_{ij} = SHA256(SHA256(hᵢ ∥ hⱼ))
Continue pairing until one root remains. If n is odd, the last hash is duplicated.
A Merkle proof (SPV proof) for transaction Tᵢ consists of the sibling hash at each level of the tree — exactly ⌈log₂(n)⌉ hashes.
Verification: recompute the root bottom-up from Tᵢ and the proof hashes. If it matches the block header's Merkle root, Tᵢ is provably in the block.
Scaling: 10 hashes prove membership in 1,024 txs. 20 hashes for 1,048,576 txs. 30 hashes for 1 billion txs. Logarithmic scaling is the key insight.
h_{ij} = \text{SHA256}^2\!\left(h_i \,\|\, h_j\right)Check your understanding
How many hashes are needed to prove a transaction is in a block of 1,024 transactions?
Click to reveal answer
What is an SPV (Simplified Payment Verification) client?
Click to reveal answer
Why does Bitcoin use SHA256(SHA256(data)) — double hashing — for Merkle tree nodes?
Click to reveal answer
- I understand how Merkle trees enable logarithmic membership proofs
- I know what an SPV proof is and why light wallets need it
- I can explain what the Merkle root proves — and what it doesn't
A block contains 8,192 transactions. How many hashes does a Merkle proof require to verify one specific transaction is included?