Ethereum Clients' Node Syncing Methods

An overview of the syncing methods used by Ethereum clients: the types of syncing methods, whether Fast Sync or Snap Sync is the better strategy, execution clients and their syncing strategies, and key points on the syncing process.

21 Jul 2022

By: Priyam Keshwa

A node is a computer running Ethereum client software. A client is an implementation of Ethereum that verifies all transactions in each block. Before The Merge, a single piece of software was enough to run a full node. After The Merge, two pieces of client software are required: an execution client, which handles and gossips transactions, and a consensus client, which handles block gossip and fork choice.

Content

What are syncing methods?
Types of Syncing Methods
Why use Snap Sync rather than Fast Sync?
Execution Clients and their Syncing Strategies
Key points on the Syncing Process

What are syncing methods?

Synchronization is the process by which a node obtains the most up-to-date information on Ethereum's state. A syncing method determines how quickly the node gets there.

The goal of a node is to store and propagate the blockchain, so getting a valid copy of it is the most critical part of running a node. To do this on Ethereum, you need to validate the blockchain headers and download a copy of the state trie.

Types of Syncing Methods

Here are the ways to sync a node on the Ethereum Blockchain:

Full sync

Full sync is for nodes that store the full blockchain data. It downloads every transaction ever made on the network and computes the state.

Full sync serves the network and provides data on request.

The Full Sync strategy is to execute every block since genesis. There is a starting genesis state (account balances, contract bytecodes, storage slots, etc). Then each block reads the previous state and writes the new state, and the node verifies the new state root against the one in the block header. Full sync is painfully slow on mainnet, and the time to finish a full sync will increase without bound as the network gets older. So “Fast” Sync was born.
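The loop described above can be sketched roughly as follows. This is a toy model, not client code: the state is a plain dictionary of balances, `compute_state_root` is a hypothetical stand-in that hashes the sorted balances instead of building a real Merkle trie, and `Block`, `apply_block`, `full_sync` and `seal` are illustrative names.

```python
import hashlib
from dataclasses import dataclass


def compute_state_root(state: dict) -> str:
    """Toy stand-in for the Merkle state root: a hash over the sorted balances."""
    encoded = ";".join(f"{account}={balance}" for account, balance in sorted(state.items()))
    return hashlib.sha256(encoded.encode()).hexdigest()


@dataclass
class Block:
    transfers: list   # (sender, recipient, amount) tuples
    state_root: str   # state root claimed in the block header


def apply_block(state: dict, block: Block) -> dict:
    """Read the previous state and return the new state after the block."""
    new_state = dict(state)
    for sender, recipient, amount in block.transfers:
        if new_state.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot afford to send {amount}")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def full_sync(genesis: dict, blocks: list) -> dict:
    """Execute every block since genesis, checking each header's state root."""
    state = dict(genesis)
    for number, block in enumerate(blocks, start=1):
        state = apply_block(state, block)
        if compute_state_root(state) != block.state_root:
            raise ValueError(f"state root mismatch at block {number}")
    return state


def seal(state: dict, transfers: list):
    """Demo helper: build a valid block on top of `state`."""
    post_state = apply_block(state, Block(transfers, ""))
    return Block(transfers, compute_state_root(post_state)), post_state


genesis = {"alice": 100, "bob": 0}
block1, state1 = seal(genesis, [("alice", "bob", 30)])
block2, _ = seal(state1, [("bob", "carol", 10)])
final_state = full_sync(genesis, [block1, block2])
print(final_state)
```

Because every block has to be replayed in order, the work grows with the length of the chain, which is the cost the other sync methods try to avoid.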

Fast Sync

This synchronization method downloads the block headers and a recent state trie from peers that are already on the Ethereum network. The node only validates state transitions itself once it has synced to a certain block height.

Before Fast Sync can execute the launch block (the recent block from which it starts executing), it needs the state as of that block: contract bytecode, accounts, and contract storage. Transactions can read any of these values during execution. Therefore, Fast Sync requests a snapshot of the state just before the launch block from its peers. The snapshot is referenced by a state root hash, which is the Merkle root hash of the entire state.
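As a rough illustration of what "referenced by a state root hash" means, the sketch below downloads a toy state node by node, starting from the root hash and checking every node against the hash it was requested by. It assumes a simplified binary tree stored as JSON-like dictionaries; the real state is a hexary Merkle Patricia trie, and `build_toy_trie`, `fast_sync_state` and the in-memory `peer_db` are hypothetical stand-ins for a peer and the network.

```python
import hashlib
import json


def node_hash(node: dict) -> str:
    """Hash of a node's canonical JSON encoding (toy replacement for Keccak over RLP)."""
    return hashlib.sha256(json.dumps(node, sort_keys=True).encode()).hexdigest()


def build_toy_trie(accounts: dict):
    """Peer side: store every node under its hash and return (root_hash, node_db)."""
    db = {}

    def put(node: dict) -> str:
        key = node_hash(node)
        db[key] = node
        return key

    hashes = [put({"leaf": [name, balance]}) for name, balance in sorted(accounts.items())]
    while len(hashes) > 1:
        hashes = [put({"children": hashes[i:i + 2]}) for i in range(0, len(hashes), 2)]
    return hashes[0], db


def fast_sync_state(root_hash: str, peer_db: dict) -> dict:
    """Fetch the state node by node, from the root down to the leaves."""
    local = {}
    pending = [root_hash]
    while pending:
        wanted = pending.pop()
        node = peer_db[wanted]  # stands in for one network request
        if node_hash(node) != wanted:
            raise ValueError("peer sent a node that does not match the requested hash")
        local[wanted] = node
        pending.extend(node.get("children", []))
    return local


root_hash, peer_db = build_toy_trie({"alice": 70, "bob": 20, "carol": 10})
local_db = fast_sync_state(root_hash, peer_db)
balances = {n["leaf"][0]: n["leaf"][1] for n in local_db.values() if "leaf" in n}
print(len(local_db), balances)
```

Each node can only be requested after its parent has arrived, so this style of download needs many dependent round trips.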

Warp Sync

Warp Sync is an extension to the Ethereum Wire protocol, which involves sending snapshots over the network to get the full state at a given block extremely quickly, and then filling in the blocks between the genesis and the snapshot in the background.

Warp Sync is a subprotocol built on the DevP2P networking layer, with 3-byte identification.

Warp sync relies on static snapshots taken every 30000 blocks. This means that the serving node has to regenerate the snapshot roughly every 5 days, but iterating over the entire state trie can actually take longer than that. As a result, warp sync is not sustainable in the long run.
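The "every 5 days" figure follows from the snapshot interval and the block time. A quick back-of-the-envelope check, assuming an average block time of about 14 seconds (an assumption for illustration, not a number from this article):

```python
SNAPSHOT_INTERVAL_BLOCKS = 30_000  # from the text above
ASSUMED_BLOCK_TIME_SECONDS = 14    # assumption: rough pre-Merge average
SECONDS_PER_DAY = 86_400


def snapshot_interval_days(blocks: int = SNAPSHOT_INTERVAL_BLOCKS,
                           block_time: float = ASSUMED_BLOCK_TIME_SECONDS) -> float:
    """Days between two snapshots for a given block interval and block time."""
    return blocks * block_time / SECONDS_PER_DAY


interval_days = snapshot_interval_days()
print(round(interval_days, 1))  # about 4.9 days with the assumed block time
```

If regenerating a snapshot takes longer than this interval, the serving node can never catch up, which is the sustainability problem described above.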

Warp Sync was used by Parity.

Snap Sync

The snap protocol runs on top of RLPx, facilitating the exchange of Ethereum state snapshots between peers. RLPx is a TCP-based transport protocol used for communication among Ethereum nodes.

The current version of the snap protocol is snap/1.

The snap protocol is designed for semi-real-time data retrieval. Its main goal is to make dynamic snapshots of recent states available to peers; it is served by nodes that provide fast iterable access to the state of the most recent N blocks. The snap protocol does not take part in chain maintenance and is meant to be run alongside the eth protocol.

The essence of the snapshot synchronization or Snap Sync is making contiguous ranges of accounts and storage slots available for remote retrieval. The sort order is the same as the state trie iteration order, which makes it possible to request N subsequent accounts.
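A minimal sketch of this idea, assuming a serving peer that keeps its accounts sorted by hash: the syncing node repeatedly asks for "the next N accounts starting at this hash" until the range is exhausted. `SnapshotServer`, `get_account_range` and `sync_accounts` are hypothetical names; the real snap messages carry more fields (such as the state root and proofs), and real clients hash addresses with Keccak-256 rather than SHA-256.

```python
import bisect
import hashlib


def account_hash(address: str) -> bytes:
    """Toy account key; real clients use Keccak-256 of the address."""
    return hashlib.sha256(address.encode()).digest()


class SnapshotServer:
    """Hypothetical serving peer with accounts sorted in hash order."""

    def __init__(self, accounts: dict):
        self.items = sorted((account_hash(addr), balance) for addr, balance in accounts.items())
        self.keys = [key for key, _ in self.items]

    def get_account_range(self, origin: bytes, limit: int) -> list:
        """Return up to `limit` consecutive accounts whose hash is >= origin."""
        start = bisect.bisect_left(self.keys, origin)
        return self.items[start:start + limit]


def sync_accounts(server: SnapshotServer, chunk_size: int) -> list:
    """Download all accounts as contiguous chunks in hash order."""
    synced = []
    origin = bytes(32)  # start at hash 0x00...00
    while True:
        chunk = server.get_account_range(origin, chunk_size)
        if not chunk:
            break
        synced.extend(chunk)
        next_origin = int.from_bytes(chunk[-1][0], "big") + 1
        if next_origin >= 2 ** 256:  # the last possible hash has been served
            break
        origin = next_origin.to_bytes(32, "big")
    return synced


server = SnapshotServer({f"account-{i}": i for i in range(10)})
synced = sync_accounts(server, chunk_size=3)
print(len(synced))
```

Because the ranges are independent of each other, a client is free to request several of them in parallel from different peers.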

Important properties of Snap Sync:

It is the fastest and default sync mode of Geth.
Compared to fast sync, we only need to transfer the useful leaf data from the state trie and can reconstruct internal nodes locally.
Compared to warp sync, we can download small chunks of accounts and storage slots and immediately verify their Merkle proofs, making junk attacks impossible. Here the synchronization concurrency is totally dependent on client implementation and is not forced by the protocol.

Beam Sync - A new Syncing method

Beam Sync is an evolution of Fast Sync. The main difference is that Beam Sync starts executing the launch block right away and requests from its peers only the state data that is missing from the local database. Input and output states are saved locally. Beam Sync then proceeds to the next block, repeating the process and requesting missing data as needed.

Over time, less and less data is missing. State that is never accessed during execution is never requested, so a separate process runs in the background to fill these gaps. Eventually, this backfill process lets Beam Sync populate the local database with all state data, and the node can switch to Full Sync.
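The on-demand fetching and the background backfill can be sketched as follows. This is a toy model under stated assumptions: `Peer`, `BeamState`, `execute_block` and `backfill` are hypothetical names, the state is a dictionary of balances, and the backfill is a simple loop rather than a concurrent process.

```python
class Peer:
    """Hypothetical remote peer that holds the full state."""

    def __init__(self, state: dict):
        self._state = dict(state)
        self.requests = 0

    def get_state(self, key: str) -> int:
        self.requests += 1  # every call stands in for one network request
        return self._state[key]


class BeamState:
    """Local database that fetches missing entries from a peer on first access."""

    def __init__(self, peer: Peer):
        self.local = {}
        self.peer = peer

    def read(self, key: str) -> int:
        if key not in self.local:
            self.local[key] = self.peer.get_state(key)
        return self.local[key]

    def write(self, key: str, value: int) -> None:
        self.local[key] = value


def execute_block(state: BeamState, transfers: list) -> None:
    """Execute a block, pulling in only the state it actually touches."""
    for sender, recipient, amount in transfers:
        state.write(sender, state.read(sender) - amount)
        state.write(recipient, state.read(recipient) + amount)


def backfill(state: BeamState, all_keys: list) -> None:
    """Background job: fetch whatever execution never touched."""
    for key in all_keys:
        state.read(key)


peer = Peer({"alice": 100, "bob": 50, "carol": 7, "dave": 1})
state = BeamState(peer)
execute_block(state, [("alice", "bob", 10)])
requests_after_first_block = peer.requests   # alice and bob were fetched
execute_block(state, [("bob", "alice", 5)])
requests_after_second_block = peer.requests  # nothing new was needed
backfill(state, ["alice", "bob", "carol", "dave"])
print(state.local, peer.requests)
```

The second block triggers no requests at all because everything it touches is already local, which is the "less and less data is missing" effect.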

Nethermind also used Beam Sync.

Archive Nodes

An archive node stores everything kept in a full node and also builds an archive of historical states. It is necessary if you want to query something like an account balance at a certain block, or to simply and reliably test your own set of transactions without mining them using OpenEthereum.

This data amounts to terabytes, which makes archive nodes less attractive for average users, but they can be handy for services like block explorers, wallet vendors, and chain analytics.

Syncing the clients in any mode other than archive will result in pruned blockchain data. This means that there is no archive of all historical states but the full node is able to build them on demand.

Light Nodes

Light nodes only download the block headers. These headers contain summary information about the contents of the block. If a light node requires any other information, it requests it from full nodes.

The light node can then independently verify the data it receives against the state roots in the block headers. Light nodes enable users to participate in the Ethereum network without the powerful hardware or high bandwidth required to run full nodes. Eventually, light nodes might run on mobile phones or embedded devices. Light nodes do not participate in consensus (i.e. they cannot be miners/validators), but they can access the Ethereum blockchain with the same functionality as a full node.
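Verifying data against a root in a header comes down to checking a Merkle proof. The sketch below shows the idea on a simplified binary Merkle tree with SHA-256; Ethereum itself uses a hexary Merkle Patricia trie with Keccak-256, and `build_levels`, `make_proof` and `verify` are illustrative names rather than any client's API.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def padded(level: list) -> list:
    """Duplicate the last hash when a level has an odd number of entries."""
    return level + [level[-1]] if len(level) % 2 else level


def build_levels(leaves: list) -> list:
    """Full node side: all levels of the tree, from leaf hashes up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = padded(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels: list, index: int) -> list:
    """Full node side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        level = padded(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Light node side: recompute the root using only the leaf and the proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"alice:70", b"bob:20", b"carol:10"]
levels = build_levels(leaves)
trusted_root = levels[-1][0]  # the only value the light node needs from the header
proof = make_proof(levels, 1)
honest_ok = verify(b"bob:20", proof, trusted_root)
forged_ok = verify(b"bob:9999", proof, trusted_root)
print(honest_ok, forged_ok)
```

The light node never sees the full set of leaves; it only needs the root from a header it already trusts plus a proof whose size grows with the depth of the tree.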

Why use Snap Sync rather than Fast Sync?

An operator can choose to run Fast Sync when setting up the node, which downloads the state associated with the last 64 blocks, currently about 90 GB. This can still take more than 24 hours on a fast machine. With Snap Sync, the sync time is reduced to 2-3 hours with a download of about 30 GB.
This reduction in sync time and download size comes from the specific way in which Ethereum’s state is stored in a node: Merkle trees.


With Fast Sync, a node downloads the headers of each block and, starting from the state root, retrieves all the trie nodes beneath it until it reaches the leaves. In contrast, Snap Sync only downloads the leaf nodes and generates the remaining nodes locally, which saves both time and bandwidth.

In the case of fast sync, the main problem was latency, caused by Ethereum’s data model.

Even in the best case, fast sync spends a whopping 6.3 hours doing nothing, just waiting for data. And that figure only holds:

If you have an above average network link
If you have a good number of serving peers
If your peers don’t serve anyone else but you

Snap sync was designed to solve the three problems of fast sync: latency, bandwidth, and disk access. The idea is fairly simple: instead of downloading the trie node by node, snap sync downloads contiguous chunks of useful state data and reconstructs the Merkle trie locally:

Without downloading intermediate Merkle trie nodes, state data can be fetched in large batches, removing the delay caused by network latency.
Without downloading Merkle nodes, downstream data drops to half; and without addressing each piece of data individually, upstream data gets insignificant, removing the delay caused by bandwidth.
Without requesting randomly keyed data, peers do only a couple of contiguous disk reads instead of many random ones, removing the delay caused by disk access.
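To make the "download only the leaves, rebuild the rest locally" idea concrete, the sketch below counts how many items each approach would transfer for a toy binary Merkle tree. It is a simplification: the real state trie is hexary, so the exact ratio differs, and `build_tree` is an illustrative helper, not client code.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves: list):
    """Return (root, number_of_internal_nodes) for a toy binary Merkle tree."""
    level = [h(leaf) for leaf in leaves]
    internal_nodes = 0
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        internal_nodes += len(level)
    return level[0], internal_nodes


leaves = [f"account-{i}".encode() for i in range(1024)]

# Serving peer: owns the full tree.
peer_root, internal_nodes = build_tree(leaves)

# Fast-sync style: every node, internal or leaf, is transferred.
fast_sync_items = len(leaves) + internal_nodes

# Snap-sync style: only the leaves are transferred, the rest is rebuilt locally.
snap_sync_items = len(leaves)
rebuilt_root, _ = build_tree(leaves)

print(fast_sync_items, snap_sync_items, rebuilt_root == peer_root)
```

In this toy tree the leaf-only download is about half the size of the full one, and the locally rebuilt root still matches the peer's root, so nothing is lost by skipping the internal nodes.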

Here are some of the basic differences put on a table:

Execution Clients and their Syncing Strategies

Here are some of the execution clients and their syncing strategies:

Key points on the Syncing Process

The syncing process is very long and can take up to 48-72 hours.
The syncing speed depends on your internet speed and the write speed of your storage device.
Because the data is stored in blocks that are linked together, corruption in one block can corrupt the whole chaindata. It is therefore very important to shut down Geth properly.
The progress bar on your Ethereum Wallet is NOT accurate.
