Ethereum 2.0 network crash and update
The Ethereum 2.0 testnet Medalla crashed over the weekend, causing a major network outage that lasted several days. The crash came a little over a week after the Ethereum 2.0 testnet went live.
This is significant because it is the first time the Ethereum network has ever gone fully offline, and in trying to fix the bug that caused it, developers ended up creating an even bigger issue for the network. What happened, and how was it eventually solved? Read on for the details.
The Ethereum 2.0 bug that crashed the system
Late in the afternoon on August 14th, 2020, users reported that they were getting the following error message: “WARN Roughtime: Roughtime reports your clock is off by more than 2 seconds offset=4h0m0.028854657s.”
This message was caused by a time synchronization issue in Prysm, the validator client that most Medalla participants used to connect to the testnet. The bug caused the time synchronizer to report a time 4 hours ahead. This resulted in the network flagging blocks as altered because their timestamps did not match. According to the official report released by Prysmatic Labs:
“The cloudflare roughtime servers all returned wrong information, and Prysm nodes did not properly fallback from this situation. This bug caused all Prysm nodes to exhibit clock skew. Because of this clock skew, validators incorrectly proposed blocks and attestations for future slots.”
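To see why a wrong clock leads to blocks for future slots, it helps to look at how a node works out the current slot from the time. The sketch below is only an illustration and is not Prysm's code: it assumes Ethereum 2.0's 12-second slot length, and the genesis timestamp and the "ten days after genesis" moment are made-up values.

```python
SECONDS_PER_SLOT = 12  # Ethereum 2.0 slot length in seconds


def slot_at(unix_time: int, genesis_time: int) -> int:
    """Return the slot a node believes is current at the given Unix time."""
    if unix_time < genesis_time:
        raise ValueError("time is before genesis")
    return (unix_time - genesis_time) // SECONDS_PER_SLOT


# Hypothetical values, for illustration only.
GENESIS_TIME = 1_596_000_000               # made-up genesis timestamp
true_now = GENESIS_TIME + 10 * 24 * 3600   # ten days after genesis
skew = 4 * 3600                            # roughly the 4-hour offset in the warning

honest_slot = slot_at(true_now, GENESIS_TIME)
skewed_slot = slot_at(true_now + skew, GENESIS_TIME)
slots_ahead = skewed_slot - honest_slot

print(honest_slot, skewed_slot, slots_ahead)
```

Under these assumptions, a node whose clock runs 4 hours fast believes the chain is 1,200 slots further along than it really is, so anything it proposes or attests to is stamped for a slot that has not happened yet.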
This error affected everybody who was connected to the network using Prysm's time synchronization, which was a large majority of the network participants, since Prysm was the only client to release a detailed guide to onboarding users. According to reports, only 5% of the network's validators were not affected by this error, causing the participation rate to drop under the minimum threshold needed to validate blocks and generate block rewards. As a result, the network was officially taken offline.
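The article does not say which threshold was missed. Assuming it refers to the two-thirds supermajority of stake that Ethereum 2.0 needs in order to finalize checkpoints, the check can be sketched roughly as follows. The function name and the validator counts are hypothetical, and equal stake per validator is assumed to keep the arithmetic simple.

```python
def can_finalize(attesting_stake: int, total_stake: int) -> bool:
    """Return True if attesting stake reaches a two-thirds supermajority."""
    if total_stake <= 0:
        raise ValueError("total stake must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return attesting_stake * 3 >= total_stake * 2


# Hypothetical network of 1,000 equally staked validators.
TOTAL_VALIDATORS = 1000
unaffected = TOTAL_VALIDATORS * 5 // 100   # the reported 5% still on correct time

print(can_finalize(unaffected, TOTAL_VALIDATORS))   # 50 of 1,000
print(can_finalize(667, TOTAL_VALIDATORS))          # just over two thirds
```

With only 5% of validators attesting on the correct time, participation sits far below two thirds, so under this assumption the chain cannot finalize until enough validators return.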
Developers create a bigger problem for the testnet
In trying to fix the time bug, Prysmatic developers accidentally created a bigger problem for users of their client. Raul Jordan, an Ethereum 2.0 developer for Prysmatic, released the following statement:
“In fixing this bug, we accidentally removed all critical features for Prysm nodes to function, making the issue infinitely worse.”
This led to the network remaining offline for an extended period of time.
Update: Bug fix finally released
Prysmatic finally released an update to their client that is intended to prevent this specific issue in the future. The update, named Alpha 23, reportedly provides “initial sync improvements that may assist in resolving ongoing sync issues in the Medalla test network”.
What does this mean?
Ethereum 2.0’s Medalla network just launched as a testnet. Bugs are expected to be discovered and fixed, and validators were made aware of this risk going into the project. The fact that this bug happened in a test environment without jeopardizing the integrity of the live mainnet is a good thing for Ethereum in the long term, even if it does bring some unwanted attention to the project in the short term. Ethereum 2.0 represents one of the single largest changes to a major cryptocurrency ever attempted, and we remain hopeful that subsequent bugs will be dealt with in a timely manner.