Aftermath of Holesky Testnet Incident: Lessons Learned

A deep dive into the Holesky incident, exploring its root causes, validator challenges, recovery efforts, and key lessons for improving Ethereum’s resilience.

The Holesky testnet experienced a significant disruption following the Pectra upgrade on February 24, 2025. This incident led to a network split, validator slashing, and an extended period of non-finalization.

The event exposed bugs in client implementations, coordination challenges among validators, and weaknesses in the mechanisms ensuring chain finality. As a result, developers, client teams, and node operators had to work extensively to restore network health, synchronize validators, and implement fixes to prevent future disruptions.

This blog post provides a detailed analysis of the incident, covering its root causes, mitigation efforts, and long-term implications for the Ethereum network.

Key Events Leading to the Incident

The Holesky incident was the result of several interrelated failures across both the execution layer (EL) and the consensus layer (CL). The misconfiguration of the deposit contract address, the failure of validators to synchronize, and the challenges in finalizing the correct chain all contributed to a cascading failure that took the network into an extended period of non-finalization.

Execution clients misconfigured the Holesky deposit contract address, causing some validators to accept an invalid block while others rejected it. The expected deposit contract address was 0x4242424242424242424242424242424242424242, but some clients either defaulted to 0x0000…0000 or mistakenly used the mainnet deposit contract address.

Due to invalid block processing, the network diverged into two chains. Erigon and Reth rejected the block and remained on the valid chain, while Geth, Nethermind, and Besu accepted the invalid block and formed a dominant, yet incorrect, chain. Since the majority of validators were on the invalid chain, consensus layer clients struggled to synchronize to the correct chain, leading to network instability.

The invalid chain reached justified status but failed to finalize, causing validators to enter a syncing loop. Validators who had attested to the invalid checkpoint faced "surround" slashing risks if they attempted to switch back to the correct chain. This created a self-reinforcing loop in which few validators could sync and propose blocks, making it nearly impossible to recover finalization.

Execution layer issues included clients using different fork parameters, preventing proper syncing. Additionally, incorrect deposit contract handling created a split in validator attestation.

Consensus layer issues further exacerbated the problem. Nodes struggled to peer with the correct chain, validators who attested to the wrong chain faced potential slashing, and liveness dropped significantly, worsening network recovery.

Execution Layer Issues

The primary issue arose from a misconfigured deposit contract address.

  • Incorrect Deposit Contract Configuration: The deposit contract is a key mechanism that allows new validators to join the network. It must be correctly configured for each network.

On Holesky, the deposit contract address should have been: 0x4242424242424242424242424242424242424242.

However, several execution clients, including Geth, Nethermind, and Besu, either defaulted to an incorrect address (0x0000…0000) or mistakenly used the mainnet deposit contract address. As a result, these clients misprocessed validator deposits, leading to an incorrect deposit request hash in block production.

  • Block Processing Errors & Network Split:

At slot 3711006 (block 3419724), an execution block containing a deposit transaction was proposed. Since some execution clients had incorrect configurations, they processed the deposit with an empty requests list.

This resulted in an incorrect block hash, which some clients rejected while others accepted, creating a network split. The impact of the split was significant. Erigon and Reth correctly rejected the invalid block and remained on the valid chain.

In contrast, Geth, Nethermind, and Besu accepted the block, leading to the formation of an invalid dominant chain. This divergence caused synchronization issues across the network, making it difficult for validators to align on a single canonical chain.
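The mechanism behind the split can be illustrated with a minimal sketch. It assumes a client collects deposit requests by filtering a block's logs on its configured deposit contract address and then commits to the resulting list with a hash. The function names, the log layout, the wrong address, and the hashing scheme are all hypothetical simplifications, not any client's actual code or the exact protocol encoding.

```python
import hashlib

# The Holesky deposit contract address quoted in this article.
HOLESKY_DEPOSIT_CONTRACT = "0x4242424242424242424242424242424242424242"
# Hypothetical stand-in for any wrongly configured address.
WRONG_DEPOSIT_CONTRACT = "0x" + "11" * 20


def collect_deposit_requests(logs, deposit_contract):
    """Return payloads of logs emitted by the configured deposit contract."""
    wanted = deposit_contract.lower()
    return [log["data"] for log in logs if log["address"].lower() == wanted]


def requests_commitment(requests):
    """Simplified commitment over a requests list (not the exact protocol encoding)."""
    outer = hashlib.sha256()
    for payload in requests:
        outer.update(hashlib.sha256(payload).digest())
    return outer.hexdigest()


# One hypothetical deposit log contained in the block.
block_logs = [{"address": HOLESKY_DEPOSIT_CONTRACT, "data": b"hypothetical-deposit"}]

good_requests = collect_deposit_requests(block_logs, HOLESKY_DEPOSIT_CONTRACT)
bad_requests = collect_deposit_requests(block_logs, WRONG_DEPOSIT_CONTRACT)

# The misconfigured client sees an empty requests list, so its commitment differs.
print(len(good_requests), len(bad_requests))
print(requests_commitment(good_requests) == requests_commitment(bad_requests))
```

Under these assumptions, two clients processing the same block disagree on the commitment, so each side treats the other's view of the block as invalid.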

Consensus Layer Issues

The consensus layer, responsible for validator coordination and chain finality, encountered severe difficulties due to the execution layer split. Validators struggled to sync, slashing risks prevented them from switching chains, and finalization could not be achieved, leading to a prolonged period of network instability.

  • Validators Struggling to Sync to the Correct Chain: Because a majority of validators followed the incorrect chain, those on the correct chain had trouble finding peers. This made it difficult for new nodes to sync, further weakening the valid chain’s weight.

As a result, validators who wanted to participate in the correct chain faced significant obstacles, reducing overall network stability. In epoch 115968, the invalid chain gained justification, though it was not finalized.

This made the invalid chain appear stronger to many validators and clients, reinforcing the problem. Since consensus clients are designed to follow the chain with the most attestation weight, the invalid chain continued to attract validators, making recovery even more challenging.

  • Slashing Protection Prevented Validators from Switching to the Correct Chain: Validators who had attested to the invalid chain could not easily switch back to the correct chain due to slashing risks. The main risks included:
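  1. Surround slashing: A validator is slashed if a new attestation's source-to-target span surrounds, or is surrounded by, the span of one of its earlier attestations. Validators that had attested with checkpoints from the invalid chain risked exactly this when attesting on the correct chain.
  2. Double vote slashing: A validator is slashed for signing two different attestations for the same target epoch, which can happen when voting on both chains.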

As a result, many validators remained inactive rather than risk slashing, causing low participation rates. This hesitation further delayed the network's ability to reach finalization.
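The two slashing conditions above can be written as a short check. This is a sketch modeled on the consensus specification's is_slashable_attestation_data rule; the Vote class, its field names, and the epoch numbers are illustrative assumptions rather than real Holesky data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """Hypothetical, reduced view of an attestation's checkpoint data."""
    source_epoch: int
    target_epoch: int
    chain: str  # illustrative label standing in for the checkpoint roots


def is_double_vote(a: Vote, b: Vote) -> bool:
    """Two different votes for the same target epoch."""
    return a != b and a.target_epoch == b.target_epoch


def surrounds(a: Vote, b: Vote) -> bool:
    """Vote a's source-to-target span strictly contains vote b's span."""
    return a.source_epoch < b.source_epoch and b.target_epoch < a.target_epoch


def is_slashable(a: Vote, b: Vote) -> bool:
    """Check both orderings, since either vote may have been signed first."""
    return is_double_vote(a, b) or surrounds(a, b) or surrounds(b, a)


# Illustrative epochs only.
earlier = Vote(source_epoch=10, target_epoch=12, chain="invalid")
wider = Vote(source_epoch=9, target_epoch=13, chain="correct")
same_target = Vote(source_epoch=11, target_epoch=12, chain="correct")
later = Vote(source_epoch=12, target_epoch=13, chain="correct")

print(is_slashable(earlier, wider))        # surround vote
print(is_slashable(earlier, same_target))  # double vote
print(is_slashable(earlier, later))        # no overlap, safe to sign
```

A validator client's slashing protection performs a check of this kind against its signing history before signing, which is why validators with history on the invalid chain could not simply resume attesting on the correct one.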

  • Network Liveness & Finalization Failure: Finalization requires validators holding at least two-thirds (about 66.7%) of the total active stake to agree on a valid chain. However, due to multiple factors, finalization could not be achieved:
  1. The invalid chain gained majority attestation weight.
  2. The correct chain had too few validators.
  3. The fear of slashing prevented validators from attesting.

Validators were left with two poor choices:

  1. Attempt to attest and risk slashing.
  2. Remain offline and hope the network stabilizes.

This self-reinforcing cycle led to an extended period of non-finalization, requiring manual coordination for recovery. Without external intervention, the network would have remained stalled, unable to finalize blocks or recover validator participation.
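The stall can be made concrete with the two-thirds rule itself. The sketch below compares attesting stake against total active stake using integer arithmetic; the function name and the stake figures are hypothetical and chosen only for illustration.

```python
def is_supermajority(attesting_balance: int, total_active_balance: int) -> bool:
    """True when attesting stake is at least two-thirds of total active stake.

    Cross-multiplying keeps the comparison in integers and avoids
    floating-point rounding at the boundary.
    """
    return attesting_balance * 3 >= total_active_balance * 2


# Hypothetical stake split, in arbitrary units.
total = 900
on_invalid_chain = 500
on_correct_chain = 150
idle = total - on_invalid_chain - on_correct_chain  # validators staying offline

print(is_supermajority(on_invalid_chain, total))  # False
print(is_supermajority(on_correct_chain, total))  # False
print(is_supermajority(600, total))               # True: exactly two-thirds
```

With stake divided three ways like this, neither chain can reach the threshold, and every validator that stays idle to avoid slashing keeps the correct chain further from it.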

Network Recovery Strategy

Once the root causes were identified, Ethereum developers and client teams coordinated efforts to restore the network. The primary recovery strategy involved three major phases, each focusing on different aspects of stabilization and validator participation.

  • Phase 1: Immediate Recovery Efforts (February 25-26, 2025)
  1. Validator Coordination & Node Synchronization: Developers released updated client versions with corrected deposit contract addresses. Community members were instructed to restart validators and sync them to the correct chain. However, syncing to the correct chain proved difficult due to a lack of peers. Additionally, some clients experienced issues with state reconstruction, requiring manual intervention to restore network health.
  2. Increasing Block Production: To prevent chain stagnation, validators were encouraged to continue producing blocks, even if they could not safely attest. The goal was to increase the weight of the correct chain, allowing more validators to sync and join, thereby improving overall network participation.
  3. Disabling Slashing Protection: Validators who had attested to the invalid chain had to disable slashing protection to rejoin the correct chain. Client teams provided client-specific commands covering execution clients (Geth, Nethermind, Besu, and Erigon) and consensus clients (Lighthouse, Nimbus, Lodestar, Teku, and Prysm); slashing protection itself is enforced by the validator client on the consensus side.
  • Phase 2: Coordinated Slashings (February 28, 2025 – Slot 3737760)

The primary objective was to slash validators still following the invalid chain, reduce the effective balance of validators on the incorrect chain to allow finalization to occur faster, and force inactive validators to exit, thereby improving the proportion of validators on the correct chain.

Screenshot (not reproduced here). Source: dora-holesky.pk910.de

At slot 3737760 (February 28, 15:12 UTC), validators were instructed to disable slashing protection, triggering a mass slashing event. This removed a significant portion of the network that was still following the invalid chain. However, some validators refused to disable slashing protection out of caution.

An estimated 400,000 validators were slashed, leading to a rapid reduction in the required threshold for finalization. As slashed validators could no longer participate, the network’s recovery efforts gained momentum. While finalization remained unstable, the event marked a turning point in stabilizing the Holesky testnet.

  • Phase 3: Long-Term Stability & Lessons Learned (March 2025 & Beyond)

Improved validator recovery procedures were discussed to allow validators to rejoin the correct chain more efficiently. Additionally, refinements to Ethereum’s slashing protection rules were proposed to avoid scenarios where slashing protection inadvertently causes network stagnation. Discussions also began on whether to deprecate Holesky in favor of a new testnet.

Impact of Mass Slashings on Network Health

The decision to coordinate validator slashings at slot 3737760 on February 28, 2025, at 15:12 UTC was a calculated risk aimed at restoring network finalization. The outcomes of the mass slashings were mixed, with both positive effects and unintended consequences.

One of the key positive outcomes was the reduction in the weight of the invalid chain. By slashing validators that continued to attest to the incorrect chain, the proportion of validators on the correct chain increased. This made it easier for nodes to sync to the correct chain without getting stuck in forked states.

Screenshot (not reproduced here). Source: dora-holesky.pk910.de

Additionally, validator exits were accelerated, as those on the incorrect chain lost their effective balance, leading to automatic exits. This helped lower Ethereum’s network finalization threshold, bringing the network closer to stability. The event also provided valuable insights into the effectiveness of mass slashings in restoring finalization.
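Why removing stake helps can be shown with a small calculation. The sketch assumes the two-thirds rule is measured against total active stake, so stake that is slashed or exited drops out of the denominator; the helper name and all figures are hypothetical.

```python
def stake_needed_to_finalize(total_active_balance: int) -> int:
    """Smallest attesting balance that meets the two-thirds rule (rounded up)."""
    return -(-2 * total_active_balance // 3)


# Hypothetical figures, in arbitrary units.
total_before = 900
on_correct_chain = 400
removed = 350  # stake slashed or exited from the invalid chain

total_after = total_before - removed

print(stake_needed_to_finalize(total_before))  # 600, more than the 400 available
print(stake_needed_to_finalize(total_after))   # 367, now within reach
print(on_correct_chain >= stake_needed_to_finalize(total_after))
```

The stake on the correct chain does not change in this example; only the bar it has to clear moves, which is the effect the coordinated slashings were meant to produce.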

However, there were significant negative consequences as well. Despite the slashings, finalization was not immediately achieved due to the large number of offline validators. The estimated time for finalization recovery was between 18 and 24 days, significantly delaying further Pectra testing.

Validator centralization was another concern, as large staking providers and institutional validators were heavily affected while many smaller independent validators exited, raising the risk of greater centralization in future testnets. There was also the potential for cascading failures: had the slashings been handled too aggressively, even honest validators could have been slashed, further destabilizing the network.

The incident provided essential lessons for managing similar scenarios in both testnets and potential mainnet disruptions.

Proposed Solutions for Similar Incidents

In response to the Holesky incident, Ethereum developers have proposed several key improvements to reduce the risk of similar failures in future testnets and mainnet upgrades. These solutions aim to enhance slashing protection, improve fork detection, and streamline validator exit and recovery mechanisms, ensuring greater network resilience.

  1. One of the primary areas of focus is enhancing slashing protection mechanisms. Developers have suggested implementing a "grace period" for validators recovering from a network failure, allowing them time to safely rejoin without immediate risk of slashing.
  2. Additionally, automated slash protection warnings could be introduced to help validators understand their risk level before rejoining the network, reducing unintended penalties.
  3. Another critical improvement involves smarter fork detection and chain selection. Enhancing automatic fork detection algorithms would enable clients to identify and reject invalid chains earlier, preventing validators from getting stuck on the wrong chain and helping them recover more efficiently during network disruptions.
  4. Faster validator exits and improved recovery mechanisms are also being considered. Modifications to inactivity leak parameters would enable faster automatic exits for validators that have lost effectiveness, reducing the strain on network finalization.
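To give a feel for the last proposal, the sketch below estimates how long an offline validator takes to drain to the ejection balance during an inactivity leak, and how a smaller penalty quotient shortens that time. It is a deliberately simplified model: it applies the penalty to the actual balance instead of the effective balance and ignores every other reward and penalty. The constants 4 and 2**24 are assumed from the consensus spec presets, the 32 ETH and 16 ETH balances are assumed defaults, and the 2**22 alternative is purely hypothetical.

```python
GWEI_PER_ETH = 10**9


def epochs_until_ejection(start_gwei, ejection_gwei, score_bias, penalty_quotient):
    """Count epochs for a continuously offline validator to reach the ejection balance.

    Simplified model: the inactivity score grows by score_bias each epoch and the
    penalty is balance * score // (score_bias * penalty_quotient).
    """
    balance = start_gwei
    score = 0
    epochs = 0
    while balance > ejection_gwei:
        score += score_bias
        balance -= balance * score // (score_bias * penalty_quotient)
        epochs += 1
    return epochs


start = 32 * GWEI_PER_ETH     # assumed starting balance
ejection = 16 * GWEI_PER_ETH  # assumed ejection balance

# Quotient assumed from the spec presets versus a hypothetical, more aggressive one.
current = epochs_until_ejection(start, ejection, score_bias=4, penalty_quotient=2**24)
faster = epochs_until_ejection(start, ejection, score_bias=4, penalty_quotient=2**22)

print(current, faster)
```

In this model a smaller quotient means larger per-epoch penalties and therefore an earlier exit, which is the trade-off any change to the leak parameters would have to weigh against the cost to validators that are offline for benign reasons.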

The Holesky incident was a major challenge for Ethereum’s testnet infrastructure, exposing weaknesses in validator coordination, slashing protection, and fork detection. While the network was eventually stabilized through coordinated recovery efforts, including validator synchronization and mass slashings, the incident highlighted the complexities of managing a large-scale validator set.

In response, developers are working on improvements like better slashing protection, smarter fork detection, and faster validator exits and recovery processes. These changes aim to make future testnets and mainnet upgrades more resilient.
