Geth v1.13 follows closely on the heels of the 1.12 release family. Its main feature, in development for 6 years, is a new database model for storing the Ethereum state. The new model is faster than the previous one and implements proper pruning, so junk no longer accumulates on disk and offline pruning is no longer needed. Three caveats apply to the database sizes reported for this release: they exclude roughly 589GB of ancient data, which is the same regardless of the state model; a full sync with the hash scheme exceeded the capacity of a 1.8TB SSD at around block 15.43M; and the size difference relative to snap sync is attributed to compaction overhead.
Developing the new data model was a significant effort: the work as a whole spans 6 years, and Gary Rong has led the core of the rework over the past 2 years. The old way of storing the Ethereum state did not allow for efficient pruning, so junk accumulated in the database. Implementing proper pruning that leaves no junk behind required several changes to Geth’s codebase.
The new data model stores state trie nodes keyed by their path instead of their hash. As a result, branches with the same content (and therefore the same hash) are stored as separate entries rather than being deduplicated into one, which removes the deduplication problem that made pruning hard: deleting one branch can no longer affect another. Previously, the database could also hold multiple state tries at once, which was a different form of deduplication. The database is now restricted to contain exactly one state trie at any given time, starting with the Genesis state and then following the chain state as it progresses.
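The difference between hash keys and path keys can be illustrated with a minimal Go sketch. This is not Geth's code: the hashStore and pathStore types, the demo function, and the paths "a/1" and "b/7" are hypothetical names chosen for illustration, and SHA-256 from the standard library stands in for the Keccak-256 hash that Ethereum actually uses.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashStore keys nodes by the hash of their content, so identical
// nodes at different trie paths collapse into a single entry.
// (Hypothetical type; SHA-256 is a stand-in for Keccak-256.)
type hashStore map[[32]byte][]byte

// pathStore keys nodes by their position in the trie, so identical
// nodes at different paths remain separate entries.
type pathStore map[string][]byte

func (s hashStore) put(node []byte) [32]byte {
	h := sha256.Sum256(node)
	s[h] = node
	return h
}

func (s pathStore) put(path string, node []byte) {
	s[path] = node
}

// demo stores the same node content under two trie positions in each
// store, prunes the first position, and reports what is left.
func demo() (hashEntries, pathEntries int, hashSurvives, pathSurvives bool) {
	node := []byte("identical subtrie content")

	hs := hashStore{}
	hA := hs.put(node) // referenced from position "a/1"
	hB := hs.put(node) // referenced from position "b/7"; same key as hA
	hashEntries = len(hs)
	delete(hs, hA) // pruning "a/1" removes the only copy
	_, hashSurvives = hs[hB]

	ps := pathStore{}
	ps.put("a/1", node)
	ps.put("b/7", node)
	pathEntries = len(ps)
	delete(ps, "a/1") // pruning "a/1" leaves "b/7" untouched
	_, pathSurvives = ps["b/7"]
	return
}

func main() {
	he, pe, hsv, psv := demo()
	fmt.Println("hash-keyed: entries =", he, "second reference survives prune =", hsv)
	fmt.Println("path-keyed: entries =", pe, "second reference survives prune =", psv)
}
```

In the hash-keyed store a node cannot be deleted safely without knowing that nothing else references it, which is why pruning was hard. The path-keyed store pays for the duplicate entry but makes deletion local.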
To handle reorganizations and side-chain switches, Geth keeps the trie changes made in the last 128 blocks in memory. The persisted state stays locked to a single block beneath those in-memory changes, which makes reorganizations within the top 128 blocks fast. Writes are accumulated in a dirty cache rather than pushed to disk one by one, and the cache is flushed to disk only when it fills up. If a deeper reorganization is needed, Geth applies reverse diffs stored on disk to roll the persisted state back to an older version, then switches to the other side-chain.
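The interplay between in-memory diffs, the single persisted state, and on-disk reverse diffs can be sketched in Go as follows. This is a simplified model under stated assumptions, not Geth's implementation: the stateDB and diff types and all method names are hypothetical, the state is a flat string map rather than a trie, blocks are assumed to be numbered contiguously from 1, an empty string marks an absent key, and the sketch tracks a single chain without competing branches or a separate write buffer. The window is a parameter so the example can use 2 instead of 128.

```go
package main

import "fmt"

// diff holds one block's changes: the new value of each changed key
// (forward) and the value it replaced (reverse). "" means absent.
type diff struct {
	block   uint64
	forward map[string]string
	reverse map[string]string
}

// stateDB is a hypothetical model: one persisted state, a bounded
// stack of in-memory diffs above it, and reverse diffs for history.
type stateDB struct {
	disk    map[string]string // the single persisted state
	diskAt  uint64            // block the persisted state corresponds to
	layers  []diff            // in-memory diffs above disk, oldest first
	history []diff            // reverse diffs for blocks already flushed
	window  int               // how many recent blocks stay in memory
}

func newStateDB(window int) *stateDB {
	return &stateDB{disk: map[string]string{}, window: window}
}

// get reads through the in-memory layers, newest first, then disk.
func (s *stateDB) get(key string) string {
	for i := len(s.layers) - 1; i >= 0; i-- {
		if v, ok := s.layers[i].forward[key]; ok {
			return v
		}
	}
	return s.disk[key]
}

func apply(dst, changes map[string]string) {
	for k, v := range changes {
		if v == "" {
			delete(dst, k)
		} else {
			dst[k] = v
		}
	}
}

// commit adds a block's changes in memory and flushes the oldest
// layer to disk once more than `window` layers are held.
func (s *stateDB) commit(block uint64, changes map[string]string) {
	d := diff{block: block, forward: changes, reverse: map[string]string{}}
	for k := range changes {
		d.reverse[k] = s.get(k) // value before this block
	}
	s.layers = append(s.layers, d)
	if len(s.layers) > s.window {
		oldest := s.layers[0]
		s.layers = s.layers[1:]
		apply(s.disk, oldest.forward)
		s.diskAt = oldest.block
		s.history = append(s.history, oldest)
	}
}

// rollbackTo drops in-memory layers above the target block (shallow
// reorg) and, if needed, applies reverse diffs to disk (deep reorg).
func (s *stateDB) rollbackTo(block uint64) {
	for len(s.layers) > 0 && s.layers[len(s.layers)-1].block > block {
		s.layers = s.layers[:len(s.layers)-1]
	}
	for s.diskAt > block && len(s.history) > 0 {
		last := s.history[len(s.history)-1]
		s.history = s.history[:len(s.history)-1]
		apply(s.disk, last.reverse)
		s.diskAt = last.block - 1
	}
}

func main() {
	s := newStateDB(2)
	for i, v := range []string{"10", "20", "30", "40"} {
		s.commit(uint64(i+1), map[string]string{"balance": v})
	}
	fmt.Println("head:", s.get("balance"), "disk at block", s.diskAt)
	s.rollbackTo(3) // shallow: only in-memory layers are dropped
	fmt.Println("after shallow rollback:", s.get("balance"), "disk at block", s.diskAt)
	s.rollbackTo(1) // deep: a reverse diff is applied to disk
	fmt.Println("after deep rollback:", s.get("balance"), "disk at block", s.diskAt)
}
```

The shallow rollback never touches the persisted state, which is what makes reorganizations inside the window cheap. Only the deep rollback has to mutate the persisted state, one reverse diff per block.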
Because these changes to Geth’s internals are so significant, Geth v1.13.0 offers two modes of operation. The old data model remains the default, so existing nodes keep working without changes. Users can switch to the new data model by resyncing the state with the --state.scheme=path flag. In the future, Geth v1.14.x may make the path model the default if no serious issues arise.
If you run private Geth networks using geth init, you need to specify --state.scheme for the init step as well to ensure you get the desired data model. Archive node operators can also use the new data model, which will bring…