The All Core Devs Execution (ACDE) call #206, held on February 27, 2025, focused on the Holesky fork recovery, validator coordination, and execution layer misconfigurations. Key discussions included network stabilization efforts, validator slashing strategies, custom checkpoint sync proposals, and improvements in execution & consensus layer coordination. The call also covered the Sepolia testnet upgrade, Ethereum’s finality mechanisms, and long-term enhancements to prevent similar incidents on Mainnet.
- Holesky Testnet Incident & Postmortem
- Proposed Fixes for Optimistic Sync
- Custom Checkpoint Sync Proposal for Consensus Layer
- Sepolia Testnet Fork
Holesky Testnet Incident & Postmortem
The Holesky testnet experienced a significant fork following the Pectra upgrade, which went live on February 24th. The fork was triggered by a misconfiguration of the deposit contract address on the execution layer (EL) side of the network.
The deposit contract address for Holesky was not set correctly by some execution layer teams. While the Mainnet and Sepolia addresses were configured properly, several EL clients carried the wrong value for Holesky, whose deposit contract differs from the other networks. This mattered because, with the Pectra upgrade, execution layer (EL) nodes were expected to process deposits directly.
As a result, the misconfigured EL clients misinterpreted deposits. The outcome was a divergence in chain state and an unintended chain split, with different clients following different versions of the chain.
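As a simple illustration of the class of bug involved, the sketch below compares the deposit contract address each client has configured against the canonical value for the network. The client names and configured values are hypothetical, and the canonical Holesky address shown is an assumption that should be confirmed against the official network configuration.

```python
# Minimal sketch: verify that each EL client's configured deposit contract
# address matches the canonical value for the network. Values below are
# assumptions for illustration; always take the canonical address from the
# published network configuration.

CANONICAL = {
    "mainnet": "0x00000000219ab540356cbb839cbe05303d7705fa",
    # Assumed Holesky deposit contract address; confirm against the official config.
    "holesky": "0x4242424242424242424242424242424242424242",
}

# Hypothetical snapshot of what different EL clients might have had configured
# for Holesky (purely illustrative, not the actual client values).
configured = {
    "client-a": "0x4242424242424242424242424242424242424242",  # correct
    "client-b": "0x00000000219ab540356cbb839cbe05303d7705fa",  # default left in place
}

def check_deposit_contract(network: str, client_configs: dict[str, str]) -> list[str]:
    """Return the clients whose configured address does not match the canonical one."""
    expected = CANONICAL[network].lower()
    return [name for name, addr in client_configs.items() if addr.lower() != expected]

if __name__ == "__main__":
    for client in check_deposit_contract("holesky", configured):
        print(f"{client}: deposit contract address does not match the Holesky config")
```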
The fork had a significant impact on network stability: nearly 90% of validators followed the incorrect chain, leaving the correct chain as a minority fork. Validators stuck on the wrong chain could not contribute to consensus on the correct one, so block proposals dropped sharply and the network was unable to reach finality. With only about 10% of validators attesting to the correct chain, recovery was difficult.
continued instructions for Holesky validators: continue to try to sync to the correct chain.
⚠️ DO NOT remove slashing protection!! ⚠️
await further instructions from your CL client devs (coming tonight or tomorrow morning)
— nixo.eth 🦇🔊🥐 (@nixorokish) February 27, 2025
Since most of the stake was concentrated on the incorrect chain, the correct (minority) chain needed at least two-thirds of the total stake attesting to it in order to finalize. Without intervention, reaching that threshold could take weeks or even months, because the inactivity leak only slowly bleeds out the balances of offline validators.
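To get a feel for why recovery via the inactivity leak alone would take so long, here is a rough back-of-the-envelope simulation. It assumes the post-Altair quadratic leak with an inactivity score bias of 4 and a penalty quotient of 2^24, and it deliberately ignores validator ejection, the exit queue, and effective-balance rounding, so the output is only an order-of-magnitude estimate.

```python
# Rough estimate of how long the inactivity leak would take to restore
# finality if ~90% of the stake stays offline on the wrong fork.
# Simplified assumptions (not the full spec): quadratic leak only, no
# validator ejection at 16 ETH, no exit queue, no effective-balance rounding.

INACTIVITY_SCORE_BIAS = 4            # score added per non-finalizing epoch for offline validators
INACTIVITY_PENALTY_QUOTIENT = 2**24  # post-Bellatrix spec value
EPOCH_SECONDS = 32 * 12              # 32 slots of 12 seconds each

def epochs_until_finality(online_fraction: float) -> int:
    """Epochs until the online stake exceeds 2/3 of the remaining total stake."""
    online = online_fraction          # stake attesting to the correct chain
    offline = 1.0 - online_fraction   # stake leaking away on the wrong fork
    score = 0
    epoch = 0
    while online / (online + offline) < 2 / 3:
        epoch += 1
        score += INACTIVITY_SCORE_BIAS
        # Per-epoch leak applied to the non-participating balance (grows quadratically over time).
        offline -= offline * score / (INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT)
        offline = max(offline, 0.0)
    return epoch

if __name__ == "__main__":
    epochs = epochs_until_finality(online_fraction=0.10)
    days = epochs * EPOCH_SECONDS / 86400
    print(f"~{epochs} epochs (~{days:.0f} days) before the correct chain could finalize")
```

Under these assumptions the estimate lands in the multi-week range, consistent with the concern raised on the call.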
Client Team Updates
| Client | Status Update |
| --- | --- |
| Nethermind | Running smoothly with increased validator participation (100,000+ keys). |
| Prysm | Some validators recovered, but not fully operational. |
| Nimbus | Validators mostly back online. |
| Besu | All validators back online and pursuing the chain head. |
| Lighthouse | Coordinating recovery efforts. |
As of the latest updates, around 44% of slots were producing blocks, up from only about 10% previously. The goal was to raise block production to at least 75% before considering slashing mechanisms. The main debate was whether to begin coordinated validator slashing immediately or to let the inactivity leak gradually bleed out offline validators.
The biggest challenge was ensuring validators rejoined the correct chain while avoiding mass slashing. If slashing was not coordinated properly, it could lead to massive validator losses, making recovery even harder.
To recover the network, two primary approaches were considered: coordinated slashing and a gradual inactivity leak. The coordinated slashing strategy aimed to expedite recovery by having validators on the incorrect fork switch to the correct chain and accept being slashed (and thus ejected), so that finality could be restored quickly. While this approach could quickly restore network health, it carried the risk of heavy validator losses and further instability.
Tomorrow at 15:00 UTC, validators on Holesky will disable slashing protection and attest even if they get slashed. The goal: finalize an epoch quickly
Why now? Slashed validators still face >18 days in the exit queue, and the network won’t finalize them. This gives us a clean…
— terence (@terencechain) February 27, 2025
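For a rough sense of what "attest even if they get slashed" can cost an individual validator, the sketch below applies a simplified version of the consensus layer's correlation penalty: when a large share of the total stake is slashed within the same window, each slashed validator's penalty scales up toward its entire effective balance. The constants are the Bellatrix-era spec values used purely for illustration; the initial slashing penalty, attestation rewards, and any Electra-era adjustments are ignored.

```python
# Simplified sketch of the correlation slashing penalty: if many validators
# are slashed within the same slashing window, each one's penalty scales with
# the total slashed stake, capped at its full effective balance.
# Constants are Bellatrix-era spec values, used purely for illustration.

PROPORTIONAL_SLASHING_MULTIPLIER = 3
EFFECTIVE_BALANCE = 32.0  # ETH, typical validator

def correlation_penalty(slashed_fraction: float,
                        effective_balance: float = EFFECTIVE_BALANCE) -> float:
    """Approximate correlation penalty (in ETH) for one slashed validator,
    given the fraction of total stake slashed within the same window."""
    scaled = min(slashed_fraction * PROPORTIONAL_SLASHING_MULTIPLIER, 1.0)
    return effective_balance * scaled

if __name__ == "__main__":
    for fraction in (0.001, 0.10, 0.34, 0.90):
        print(f"{fraction:>5.1%} of stake slashed -> "
              f"~{correlation_penalty(fraction):.2f} ETH penalty per validator")
```

The takeaway is that an isolated slashing costs little, but slashing a large fraction of the stake at once pushes each validator's penalty toward its entire balance, which is exactly why an uncoordinated mass slashing was seen as so risky.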
Alternatively, letting the inactivity leak run over time offered a more natural recovery with milder penalties. However, this method could take months to fully restore finality, prolonging the period of network uncertainty.
To prevent similar issues in the future, several mitigation strategies were proposed.
- First, clearer configuration standards were recommended, including standardizing deposit contract addresses across testnets and implementing stronger pre-fork validation tests for both execution and consensus layer clients.
- Second, improved validator synchronization mechanisms were suggested, such as implementing checkpoint sync to allow validators to recover quickly and developing better tooling for validator migrations.
- Lastly, there was a discussion about reviewing Ethereum’s finality model, as a similar issue on Mainnet could have severe consequences.
The community debated whether Ethereum’s finality and slashing mechanisms should be adjusted to handle such incidents more effectively.
Proposed Fixes for Optimistic Sync
Optimistic sync is a feature that allows consensus layer clients (CLs) to import blocks without waiting for the execution layer to fully verify their execution payloads, assuming they are valid, which speeds up synchronization. However, during the Holesky fork recovery, this mechanism contributed to several issues that made network restoration more challenging.
One major problem was that validators who restarted their nodes after the fork re-synced to the wrong chain because of optimistic sync, unknowingly continuing to follow an invalid state. Additionally, some clients treated invalid blocks as valid because optimistic sync let them import blocks before the execution payloads had been verified, further complicating recovery. Another issue was the lack of a mechanism for rejecting bad data from peers, so nodes kept receiving and trusting incorrect information, slowing efforts to re-establish finality.
To address these problems, several fixes were proposed. One approach was making optimistic sync optional, allowing validators to disable it in emergency situations to prevent syncing to an invalid chain. Another solution involved implementing stricter validation for execution layer blocks before syncing, ensuring that clients do not automatically accept invalid data. Additionally, it was suggested that nodes be given the ability to manually override optimistic sync and sync from a known valid checkpoint, similar to a custom checkpoint sync mechanism. These improvements would provide validators with greater control over network recovery and help mitigate the risks associated with optimistic sync in future incidents.
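A minimal sketch of the kind of control being discussed is shown below: a consensus client's block-import path that consults the execution layer's payload status and, when an operator switch disables optimistic import, refuses to treat unverified payloads as part of the canonical chain. The names and structure are hypothetical and not taken from any specific client.

```python
# Hypothetical sketch of an optimistic-sync switch in a CL client's block
# import path. Names and structure are illustrative, not from a real client.

from enum import Enum

class PayloadStatus(Enum):
    VALID = "VALID"        # EL fully verified the execution payload
    INVALID = "INVALID"    # EL rejected the execution payload
    SYNCING = "SYNCING"    # EL could not verify yet (still syncing)

class BlockImportError(Exception):
    pass

def import_block(payload_status: PayloadStatus, allow_optimistic: bool) -> str:
    """Decide how to import a beacon block given the EL's verdict.

    allow_optimistic=False models the proposed emergency mode in which
    unverified payloads are never treated as canonical."""
    if payload_status is PayloadStatus.INVALID:
        raise BlockImportError("execution payload invalid; rejecting block")
    if payload_status is PayloadStatus.VALID:
        return "imported (fully verified)"
    # SYNCING: the EL has not verified the payload yet.
    if allow_optimistic:
        return "imported optimistically (must be re-verified later)"
    raise BlockImportError("optimistic import disabled; waiting for EL verification")

if __name__ == "__main__":
    print(import_block(PayloadStatus.SYNCING, allow_optimistic=True))
    try:
        import_block(PayloadStatus.SYNCING, allow_optimistic=False)
    except BlockImportError as err:
        print(f"refused: {err}")
```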
Custom Checkpoint Sync Proposal for Consensus Layer
During the Holesky fork, one major issue was that consensus layer clients (CLs) struggled to sync from a valid state. This created difficulties in network recovery, as validators could not easily reconnect to the correct chain. To address this, a custom checkpoint sync feature was proposed, which would allow validators to sync from an arbitrary, manually specified checkpoint rather than relying solely on the execution layer’s state.
The introduction of a custom checkpoint sync would provide several benefits. It would enable faster recovery in case of forks by allowing validators to be pointed directly to a known valid state. This approach would also reduce reliance on execution layer behavior, minimizing the risk of validators following an invalid chain. Additionally, it would offer greater flexibility for node operators, who could manually set their sync state and recover from network disruptions without waiting for automatic corrections.
However, implementing such a feature comes with challenges. A key concern is trust assumptions, as nodes would rely on manually chosen checkpoints, which introduces a risk of manipulation if not properly validated. Additionally, client support would be required across different consensus layer implementations, meaning each CL client would need to incorporate this functionality separately. Lastly, coordination among node operators would be essential to ensure they are syncing to the same checkpoint, avoiding further fragmentation in the network. Despite these challenges, custom checkpoint sync could be a critical tool for improving network resilience during forks and recovery scenarios.
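As a rough illustration of what such a flow could look like, the sketch below downloads a finalized beacon state and block in SSZ form from a trusted beacon node using standard Beacon API routes, so they can be handed to a client that supports starting from a supplied checkpoint (the exact flag varies by client). The trusted URL is a placeholder, and whether the debug state endpoint is exposed depends on how that node is configured.

```python
# Sketch: fetch a finalized checkpoint (state + block) in SSZ form from a
# trusted beacon node, for use as a manually chosen sync anchor.
# TRUSTED_NODE is a placeholder; exposure of the /debug state route depends
# on the node's configuration.

import requests

TRUSTED_NODE = "http://trusted-beacon-node:5052"  # placeholder URL
SSZ = {"Accept": "application/octet-stream"}

def fetch(path: str, out_file: str) -> None:
    """Download an SSZ-encoded object from the Beacon API and save it to disk."""
    resp = requests.get(f"{TRUSTED_NODE}{path}", headers=SSZ, timeout=60)
    resp.raise_for_status()
    with open(out_file, "wb") as f:
        f.write(resp.content)
    print(f"saved {out_file} ({len(resp.content)} bytes)")

if __name__ == "__main__":
    # Standard Beacon API routes for the latest finalized state and block.
    fetch("/eth/v2/debug/beacon/states/finalized", "finalized_state.ssz")
    fetch("/eth/v2/beacon/blocks/finalized", "finalized_block.ssz")
    # These files could then be supplied to a CL client that supports starting
    # from an explicit checkpoint; the flag name differs per client.
```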
Sepolia Testnet Fork
The Sepolia testnet fork was scheduled to take place shortly after the Holesky incident. Unlike Holesky, Sepolia has a much smaller, permissioned validator set, which makes the upgrade comparatively simpler to coordinate. Even so, given the problems encountered on Holesky, extra precautions were taken to ensure a smooth transition.
One of the primary concerns was ensuring all execution layer clients were patched to prevent a repeat of the Holesky misconfiguration. Additionally, teams focused on verifying that deposit contract addresses were correctly configured across all clients and updating client releases to maintain compatibility with the upcoming fork.
Since deposit contract misconfiguration played a critical role in the Holesky failure, execution layer teams conducted a thorough manual review of their configuration files before the Sepolia upgrade.
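As an example of what such a manual check can look like in practice, the sketch below queries each client's JSON-RPC endpoint with eth_getCode to confirm that contract code actually exists at the deposit contract address the client is configured with. The endpoint URLs and the address constant are placeholders to be filled in from the canonical Sepolia configuration.

```python
# Sketch: cross-check that contract code exists at the configured deposit
# contract address on several EL client endpoints via standard JSON-RPC.
# Endpoint URLs and the address are placeholders; take the real address from
# the canonical Sepolia configuration.

import requests

DEPOSIT_CONTRACT = "0x0000000000000000000000000000000000000000"  # placeholder
ENDPOINTS = {
    "el-client-a": "http://localhost:8545",  # placeholder endpoints
    "el-client-b": "http://localhost:8546",
}

def has_code(rpc_url: str, address: str) -> bool:
    """Return True if eth_getCode reports non-empty bytecode at the address."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getCode",
        "params": [address, "latest"],
    }
    code = requests.post(rpc_url, json=payload, timeout=10).json()["result"]
    return code not in ("0x", "0x0")

if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        status = "OK" if has_code(url, DEPOSIT_CONTRACT) else "NO CODE FOUND"
        print(f"{name}: deposit contract at {DEPOSIT_CONTRACT} -> {status}")
```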
A final decision was made to disable slashing protection at a specific time to facilitate network recovery and ensure validators could safely transition to the correct chain. To execute the plan, a coordination document was prepared with clear, step-by-step instructions for validators on how and when to disable slashing protection.
Following this, a structured recovery schedule was outlined. The first key step was disabling validator slashing protection, scheduled for 15:00 UTC. Over the following 24 hours, teams would closely monitor validator participation rates to assess whether the network was stabilizing.
In conclusion, the meeting wrapped up with an acknowledgment of the significant efforts made by core developers, execution layer teams, and consensus layer teams in addressing and mitigating the Holesky incident.