Non-finality, inactivity leak but Ethereum didn't stop!

A look at the recent incident on the Ethereum blockchain. Block space was plentiful whenever blocks were produced, and transactions were not affected. Gas fees rose, but nowhere near the levels seen during an airdrop event.

Ethereum, the second-largest cryptocurrency by market capitalization, has proven to be a remarkably resilient platform since its launch in 2015. Despite facing numerous challenges over the years, such as security breaches and network congestion, Ethereum has continued to thrive and evolve.

A day before the one-month anniversary of the Shapella upgrade, an issue with the Ethereum Beacon Chain delayed transaction finality for 25 minutes, starting at approximately 8:15 pm UTC. New blocks were still being proposed, but a problem that was unidentified at the time prevented them from being finalized. In this blog post, we will follow the sequence of events and the mitigations.

The State of Non-Finality

Since The Merge, Ethereum has used proof-of-stake (PoS) consensus. Here, non-finality means that the current state of the chain is not yet considered permanent, or "final". In other words, there is still a possibility that some transactions or blocks on the chain may be reorganized or discarded in the future.

On May 11th, Ethereum stopped finalizing even though blocks were still being proposed. The blockchain recorded a significant drop in the number of attestations from epochs 200,552 to 200,554. The root cause was an influx of attestations for previous epochs, received by consensus clients, particularly Prysm nodes. These attestations did not correspond to the latest checkpoint in the fork choice.

Such attestations usually come from peered nodes that lack access to all the blocks for the remainder of the epoch. To validate them, Prysm had to expend significant resources replaying the state, resulting in CPU spikes and out-of-memory (OOM) issues. This feedback loop is often referred to as a "death spiral."

Consensus clients maintain the state of the Beacon Chain and play a key role in attesting to and proposing blocks. During this incident, however, the strain on them was considerable. Attestations from nodes that were struggling to synchronize, or that had an outdated view of the chain, were broadcast across the network, increasing the pressure on consensus clients.

Nodes, especially those running on weaker hardware, struggled to keep up with the chain, which contributed to the loss of finality. Consensus clients had to recreate the beacon state multiple times to validate these attestations, leading to high CPU usage and memory bloat. This issue was especially noticeable in Prysm, where a cache designed to handle this situation quickly filled up due to the growth of the validator set and the volume of untimely attestations.
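To see why a full cache turns into repeated state replays, consider the toy model below. It is only a sketch under stated assumptions: the class name `ReplayedStateCache`, the least-recently-used eviction policy, and the string stand-in for a beacon state are all hypothetical and are not taken from Prysm's code.

```python
from collections import OrderedDict


class ReplayedStateCache:
    """Toy bounded cache keyed by attestation target (hypothetical, not Prysm's)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self.replays = 0  # counts how often the expensive path is taken

    def get_state(self, target: str) -> str:
        if target in self._entries:
            # Cache hit: mark the entry as recently used and return it.
            self._entries.move_to_end(target)
            return self._entries[target]
        # Cache miss: stand-in for an expensive state replay.
        self.replays += 1
        state = f"state-for-{target}"
        self._entries[target] = state
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry to stay within capacity.
            self._entries.popitem(last=False)
        return state


def count_replays(capacity: int, targets: list) -> int:
    """Feed a sequence of attestation targets through a fresh cache."""
    cache = ReplayedStateCache(capacity)
    for target in targets:
        cache.get_state(target)
    return cache.replays
```

In this model, once the number of distinct targets in circulation exceeds the capacity, every lookup can become a miss, so the replay cost is paid again and again instead of once per target.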

First Inactivity Leak

During this incident, the Ethereum mainnet experienced its first-ever inactivity leak. This is an emergency state that alters the rewards and penalties for validators. It is triggered when the beacon chain fails to finalize a checkpoint for more than four epochs.
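The trigger condition can be sketched in a few lines of Python. This is a minimal sketch loosely modeled on the public consensus specification: the constant mirrors the four-epoch threshold described above, while the plain-integer arguments are a simplification assumed here for illustration (real clients read these values from the beacon state).

```python
# Threshold from the text: finality must be missing for MORE than four epochs.
MIN_EPOCHS_TO_INACTIVITY_PENALTY = 4


def finality_delay(current_epoch: int, finalized_epoch: int) -> int:
    """Epochs between the last finalized checkpoint and the previous epoch."""
    previous_epoch = max(current_epoch - 1, 0)
    return previous_epoch - finalized_epoch


def is_in_inactivity_leak(current_epoch: int, finalized_epoch: int) -> bool:
    """True once the chain has gone too long without finalizing a checkpoint."""
    delay = finality_delay(current_epoch, finalized_epoch)
    return delay > MIN_EPOCHS_TO_INACTIVITY_PENALTY
```

For example, with the last finalized checkpoint at epoch 5, `is_in_inactivity_leak(10, 5)` is false (a delay of exactly four epochs), while `is_in_inactivity_leak(11, 5)` is true.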

The inactivity leak is a safeguard mechanism designed to restore finality if more than a third of the validators go offline. It works by progressively reducing the stakes of inactive validators until the active validators hold two-thirds of the remaining stake. Once this threshold is met, checkpoints can be finalized again.
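The recovery logic can be illustrated with a deliberately simplified model. The sketch below assumes a constant fractional leak per epoch, which is not the real penalty schedule, and the function name and parameters are hypothetical. It only shows how shrinking the offline stake eventually lifts the online share back to two-thirds.

```python
def epochs_until_finality_possible(
    offline_fraction: float,
    leak_rate_per_epoch: float,
    max_epochs: int = 1_000_000,
) -> int:
    """Count epochs until online validators hold at least 2/3 of remaining stake.

    Assumption: offline stake shrinks by a constant fraction each epoch.
    The actual protocol uses a different, escalating penalty schedule.
    """
    online = 1.0 - offline_fraction
    offline = offline_fraction
    epochs = 0
    # online / (online + offline) >= 2/3, rewritten without division.
    while 3 * online < 2 * (online + offline):
        if epochs >= max_epochs:
            raise RuntimeError("threshold not reached within max_epochs")
        offline *= 1.0 - leak_rate_per_epoch
        epochs += 1
    return epochs
```

With 30% of the stake offline, the online validators already hold more than two-thirds, so the function returns 0. With 40% offline, the offline stake has to leak down to half of the online stake before finality becomes possible again.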

Mitigation Plan

During the "death spiral," a subtle bug was discovered in Prysm nodes where they were not utilizing the correct state for shuffling computation. In response to this issue, Ethereum developers have proposed several solutions. These suggestions include optimizing the caching scheme and implementing heuristics to filter out untenable attestations.

Moreover, a corrective measure is under development to make Prysm more resilient to similar incidents in the future. It involves disregarding attestations that target older epochs and whose target has not served as a checkpoint in any chain known to the node.
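A filter of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the `Checkpoint` and `Attestation` types, the `should_process` name, and in particular the definition of "older" as anything before the previous epoch. It is not Prysm's actual fix.

```python
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str  # block root, shortened to a plain string for this sketch


@dataclass(frozen=True)
class Attestation:
    target: Checkpoint


def should_process(
    attestation: Attestation,
    current_epoch: int,
    known_checkpoints: AbstractSet[Checkpoint],
) -> bool:
    """Drop attestations for old epochs whose target the node has never seen."""
    # Assumption: "older" means earlier than the previous epoch.
    targets_old_epoch = attestation.target.epoch < current_epoch - 1
    if not targets_old_epoch:
        return True
    # Old target: only worth processing if it was a checkpoint on a known chain.
    return attestation.target in known_checkpoints
```

The design point is that the cheap set lookup happens before any state is recreated, so an attestation with an unknown, stale target is discarded without triggering an expensive replay.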

Client Diversity in Ethereum

This incident highlights the importance of client diversity in a decentralized network like Ethereum. Had the bug occurred in a less widely used client, the loss of finality might have gone unnoticed. Had Prysm been as dominant among consensus clients as Geth is among execution clients, the outcome could have been even worse.

Client diversity contributes to the overall resilience of the network. It diminishes the risk of the entire network collapsing due to an issue with a single client and aids in preserving the decentralized nature of the network. Thus, it's an essential element of the network's long-term sustainability and security.

Next Steps

The developers are committed to rectifying the problem and preventing similar incidents from occurring in the future. They are currently drafting a document to detail the incident and the measures taken to resolve it.

In addition, an update is anticipated to be released soon. This will include several improvements to address the issues that arose during the incident. These improvements include the previously mentioned optimizations to the caching scheme, the application of heuristics to filter out unfeasible attestations, and a fix to enhance the resilience of Prysm.

Although the incident caused a temporary disruption, it didn't cripple the Ethereum network. The developers are diligently working on solutions and optimizations to avoid similar problems in the future. The inactivity leak that transpired during this incident is a crucial part of the network's resilience mechanism, designed to assure long-term chain protection in the event of catastrophic circumstances.

Credits: @superphiz, @potuz1, @benjaminion_xyz, @terencechain & @preston_vanloon

Cover image: Original photo by Brian Suman modified by team EtherWorld.

______________________________________________________________________

Disclaimer: The information contained on this web page is for educational purposes only. Readers are advised to conduct their own research and to review, analyze, and verify the content before relying on it.

