Ethereum state tree format change using an overlay – Guillaume Ballet

0 1


One of the many problems affecting Ethereum is the way accounts and contract data are stored. The structure that has been chosen is called a Merkle Patricia Tree (MPT). Even though it made a lot of theoretical sense, in practice it brings more problems than it solves. Core developers have been discussing transitioning to a binary tree for years, and in this article I will present my take on the problem and how to solve it.

The proposed process introduces a transition period during which two trees are maintained. This has the advantage that the main chain can keep operating while the tree is being converted, and it also ensures that all accounts will be translated to a binary format.

At the moment, accounts are stored into a hexary tree. Hexary means that a node has 16 children. In theory this is nice, because that means that you need less “stages” to store all the data you have.

For example, this is what it takes to represent the key and value pair (170, v) in the form of a hexary tree. In hexadecimal, 170 is written 0xaa, so you just need two levels: one for the first a and another one for the second a.

Figure 1 An example hexary trie showing how value “v" is stored at key 0xaa. This tree only has 2-byte long keys and only the subtrees along the 0xaa key have been expanded. Irrelevant sub-trees are replaced with “…” for brevity.

Note that the tree is very shallow, and also very wide. Compare this to the following binary tree representation of the same key and value pair. In binary, 170 is represented as 10101010.

Figure 2 The same key value pair as Figure 1, stored as a binary tree. Irrelevant sub-trees are denoted as “…” for brevity.

You can see that the tree is much deeper and also much narrower.

In Ethereum, each block contains a stateRoot field, which is the value of the hash of the root of the MPT. To summarize it, that hash is obtained by hashing the list of hashes of the root’s 16 children. Each one of these children’s hashes is in turn the hash of the list of hashes of their children’s hashes, and so on.

Each time a new block is going to be produced, a miner updates the account tree and recalculates its root hash. The hash is stored in the stateRoot field of the new block, and the new block is then sealed.

Figure 3 The state root field of a block headers points to the root of the hexary tree.

The problem already appears here: it would be too long to recalculate the hash root by hashing all nodes, so in order to calculate the root node, the miner will retrieve the sibling hashes from the database. Even though it doesn’t take as much time as getting all the leaves from the database and hashing the entire tree, this operation still takes a lot of time. This is because each hash has to be fetched from the database.

In a hexary tree, you have in general to fetch 15 sibling hashes per stage. In the contrived example above, that’s 30 hashes.

Even though it goes deeper, a binary tree will only require one sibling hash per stage. In the contrived example above, that’s a mere 8 hashes! This is why in practice, binary trees are actually nicer.

Unfortunately, switching to a binary tree is not that simple. There is a lot of data to convert, and it will take more than the 15 second block time to perform the changes.
On top of that, imagine that you are translating a 5000 page book, and the author keeps calling you to tell you they made changes in the story, and that affects pages that you have already translated… this could go on forever. It’s the same problem here, since users could update addresses that you have already converted, which means you have to start the conversion process over.

The proposal to fix this is to have a transition period, during which an overlay tree is laid on top of the hexary tree. This overlay tree is a binary tree, and its role is to hold all the changes happening to the state, until the base tree is converted to binary. The transition takes place in three steps.

Step 1 — The conversion

In this approach, it is decided that at a block height H1, blocks have two stateRoots: one for the “base” hexary tree and one for the “overlay” binary tree.

Figure 4 During the conversion process, blocks have 2 state roots: one is the read-only root of the traditional hexary tree, and the second one is the root of the overlay binary tree.

The hexary tree is considered read-only, so any update to the state will be updates to the overlay tree.

When a transaction reads or updates an account, the system first searches the overlay tree. If the account can not be found there, the system will then search the old, hexary tree for the value.

Meanwhile, the hexary tree is being converted in the background. That can now be done without worrying about inserts, because all changes are stored in the top tree.

Step 2 — The base switch

When the background conversion process is done, miners will advertise they are ready to switch by replacing the read-only hexary base root with the result of the conversion: a read-only binary base root. Reading and writing to the state works the same as in step 1.

Figure 5 During the second phase of the transition, the block header replaces the hexary base root with that of its binary conversion, to signal the network that they are ready.

When a sufficiently large sequence blocks have the same value for the converted base root, this means most miners are done with the conversion and agree on what the converted tree looks like. The merge process starts.

Step 3 — Merging the two trees

The merge process happens gradually: each time a new block is produced, n keys are deleted from the overlay and reinserted into the base tree. This process will continue until all keys are removed from the overlay. At this stage, the overlay state root will be dropped from the block header.

On top of that, if a transaction execution writes to a key that is found in the overlay, this key will be deleted from the overlay and the write will directly go to the base tree.

Next steps

A low resolution prototype has already been created, in order to estimate the time needed to complete the transition. We are confident that the whole process can happen in a reasonable time frame, i.e. a couple days. I will publish more details as the algorithm is refined.

This proposal has benefited from the valuable input from Alexey Akhunov, Vitalik Buterin, Anna George, Sina Mahmoodi, Tomasz Stanczak and Martin H. Swende.

You might also like
WhatsApp chat