Ethereum 2.0 Development Update #40 — Prysmatic Labs
Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.
Bitfly’s Block Explorer Recap
We want to give a huge shout-out to the Bitfly team for their work on an Ethereum 2.0 blockchain explorer at https://beaconcha.in. They have created an excellent set of features for anyone to explore what’s currently going on regarding validators, blocks, votes (known as attestations), and finality in our proof-of-stake testnet.
Aside from giving a great overview of blocks and votes as they happen, the explorer supports interactive deep linking of items such as validator indices, attestations cast, and finality of epochs over time. It is an excellent starting point for building on the rich yet nascent eth2 ecosystem. More importantly, all the code is open source here. The team is leveraging our public API for eth2 at https://api.prylabs.network, with our official API schema and definitions maintained in the prysmaticlabs/ethereumapis repository on GitHub.
Finalized Epoch Reversion Bug
On 11/21, Prysm nodes in the testnet observed a finalized checkpoint reversion, which caused all incoming attestations to be rejected. Upon further investigation, we discovered an edge case in the fork choice specification: the most recently processed head state allows finality to revert, but the in-memory fork choice cache does not. From the Prysm client’s point of view, this led to a catastrophic disagreement about which finalized checkpoint was correct. Incoming attestations were validated against the in-memory state and rejected, even though they had been signed using the state persisted in the DB. The spec issue is currently being worked on and can be tracked here: https://github.com/ethereum/eth2.0-specs/pull/1495
Many lessons were learned from this incident. We wrote tools to extract the bad node’s DB for playback and relied heavily on Prometheus graphs to narrow down the problem. We also added metrics to capture the finalized checkpoint from both the latest processed head state and the in-memory finalization cache.
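To illustrate the kind of check those metrics make possible, here is a minimal Go sketch that compares the two views of finality and reports when they diverge. The `Checkpoint` type and `CheckFinalizedDivergence` function are hypothetical names made up for this example; they are not Prysm's actual types or metrics code.

```go
package main

import "fmt"

// Checkpoint is a simplified, hypothetical stand-in for an eth2 checkpoint.
type Checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

// CheckFinalizedDivergence compares the finalized checkpoint read from the
// latest processed head state with the one held in the in-memory cache.
// It returns nil when both views agree and a descriptive error otherwise.
func CheckFinalizedDivergence(fromState, fromCache Checkpoint) error {
	if fromState == fromCache {
		return nil
	}
	return fmt.Errorf(
		"finalized checkpoint mismatch: state epoch %d root %x, cache epoch %d root %x",
		fromState.Epoch, fromState.Root[:4], fromCache.Epoch, fromCache.Root[:4],
	)
}
```

A node could run a check like this after every processed block and export the result as a gauge or counter, so that a divergence shows up on a dashboard right away rather than as a wave of rejected attestations.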
Initial Chain Sync Radically Improved
We’ve made significant improvements to initial sync block processing times by reducing concurrency issues in the database and enabling data compression to reduce overall disk I/O and storage requirements. At the moment, a modern consumer-grade laptop can sync 50 to 60 blocks per second on the Prysm test network, and the final database size ends up at less than 600 MB when combined with the experimental “prune-states” feature flag. This means you can sync to the testnet’s head block in under 1 hour at 150k slots!
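The "under 1 hour" figure follows from simple arithmetic, sketched below in Go. The sketch assumes every one of the 150k slots contains a block, which overestimates the work because skipped slots carry no block. `EstimateSyncTime` is a hypothetical helper written for this illustration.

```go
package main

import "time"

// EstimateSyncTime returns how long it takes to process the given number of
// blocks at a steady processing rate (blocks per second).
func EstimateSyncTime(blocks uint64, blocksPerSecond float64) time.Duration {
	seconds := float64(blocks) / blocksPerSecond
	return time.Duration(seconds * float64(time.Second))
}
```

With 150,000 blocks, a rate of 50 blocks per second gives 3,000 seconds (50 minutes) and a rate of 60 blocks per second gives 2,500 seconds (a little under 42 minutes), both comfortably below an hour.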
API Improvements & Archival Node Fixes
Thanks to the hard work from users of our testnet and other developers building on our public API https://api.prylabs.network, we have improved our documentation, features, and developer UX substantially.
Among the new features we added in the past few weeks are:
- Allow for filtering of active validators in the ListValidators API call
- Return whether or not an epoch is finalized in our GetValidatorParticipation endpoint
- Allow for explicit filtering of genesis data for beacon blocks and attestations
Some important bug fixes we resolved regarding our API were:
- Data from archival nodes “missing” due to beacon chain skip slots
- Standardization of empty results and pagination
- Panics from our beacon node when requesting data with incorrect parameters
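As a rough sketch of the idea behind the last two fixes, the Go function below validates pagination parameters up front and returns an error or an empty page instead of panicking on an out-of-range slice index. `Paginate` and its parameters are hypothetical and written only for this illustration; they do not reflect the actual handler code in Prysm.

```go
package main

import "errors"

// Paginate returns the half-open range [start, end) of items belonging to the
// requested page. Invalid parameters produce an error instead of a panic, and
// a page past the end of the data produces an empty range (start == end).
func Paginate(total, pageSize, pageIndex int) (start, end int, err error) {
	if total < 0 || pageIndex < 0 {
		return 0, 0, errors.New("total and page index must not be negative")
	}
	if pageSize <= 0 {
		return 0, 0, errors.New("page size must be positive")
	}
	// Pages beyond the data are empty. This check also keeps the
	// multiplication below from overflowing.
	if pageIndex > total/pageSize {
		return total, total, nil
	}
	start = pageIndex * pageSize
	if pageSize > total-start {
		end = total
	} else {
		end = start + pageSize
	}
	return start, end, nil
}
```

Returning an empty range for an out-of-bounds page, rather than an error, is one possible convention; what matters is that every endpoint applies the same one.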
End to End Tests & Benchmarks Merged In
Our teammate Ivan Martinez put together a much-needed utility for eth2: a full suite of test runners for end-to-end functionality of the blockchain, with tools ensuring, among other things, that our beacon chain can properly reach finality and can properly start a chain once a minimum number of validators have deposited into an eth1 deposit contract. Having an end-to-end suite gives us confidence in the core aspects of eth2, since the whole suite is re-run as a pre-submit check on every GitHub pull request. This minimizes the possibility that something small in old, key functionality breaks along the way while we’re developing new features, even many months into the future.
Blazing Fast Serialization for Prysm Data Structures
As the production-readiness of our beacon chain feature set improves, we’ve been taking a closer look at some of the biggest bottlenecks in the network. We have already addressed the elephant in the room, BLS signature verification, through the awesome Herumi library, but some of our key serialization primitives are still too inefficient even by testnet standards. Specifically, serializing the state data structure of an eth2 beacon node is currently very expensive in Prysm because of the amount of type inference and reflection performed on the data, as well as our lack of proper caches for state fields that do not change substantially.
We opted to try a crazy experiment: implementing a fully custom, Prysm-specific serialization algorithm for our beacon state that tries to keep a minimal memory footprint and avoid unnecessary computation. This experiment offered a 3x improvement across the board, and we are planning to integrate it into our runtime soon, alongside a more dedicated caching mechanism on top of it. The pull request for this is here.
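To give a feel for why caching helps with fields that rarely change, here is a minimal Go sketch that memoizes a digest per state field and recomputes it only after the field has been invalidated. It uses a plain SHA-256 over raw bytes as a stand-in and is not the serialization algorithm from our pull request; `FieldCache` and its methods are hypothetical names.

```go
package main

import "crypto/sha256"

// FieldCache memoizes the digest of each serialized state field so that
// unchanged fields are not hashed again on every state serialization.
type FieldCache struct {
	roots      map[string][32]byte
	Recomputed int // number of hashes actually computed, for illustration
}

// NewFieldCache returns an empty cache.
func NewFieldCache() *FieldCache {
	return &FieldCache{roots: make(map[string][32]byte)}
}

// Root returns the cached digest for the field, hashing data only when no
// cached entry exists.
func (c *FieldCache) Root(field string, data []byte) [32]byte {
	if r, ok := c.roots[field]; ok {
		return r
	}
	r := sha256.Sum256(data)
	c.roots[field] = r
	c.Recomputed++
	return r
}

// Invalidate drops the cached digest; callers must invoke it whenever the
// field is mutated, otherwise Root returns a stale value.
func (c *FieldCache) Invalidate(field string) {
	delete(c.roots, field)
}
```

The trade-off is the usual one for caches: correctness now depends on every mutation path calling `Invalidate`, which is why a dedicated caching layer deserves its own careful design.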
Integrating Slasher with the Beacon Node
After completing the initial implementation of our slashing design, it is time to make it part of the system. Concurrency, optimization of DB reads and writes, feeding attestations into the slasher, and submitting slashing offenses to the beacon chain all still have to be added before the slasher becomes an integral part of the system, one that can withstand the load of attestations in gossipsub and benefit the network with its valuable data. In the coming weeks we will design and implement those missing parts, starting with concurrency and DB read/write optimization.
P2P Network Peering Fixes
We’ve been observing all-time high network activity on our public test network: at the moment, there are 75 participants on the network! As a result, we are seeing all-time high messaging volume in gossipsub. These message rates have uncovered various pain points and bottlenecks in the attestation processing pipeline. With 75 participants and about 500 active validators, we are seeing 150 to 250 messages per second. Many of these messages are duplicates received from peers, and we expect the attestation propagation pipeline to improve significantly in spec version 0.9.2, where committees broadcast into subnet topics for unaggregated attestations. Although Prysm nodes are receiving 200 to 250 attestations per second, those nodes are only processing 4 to 5 distinct attestations, while the rest are duplicates which have already been seen by the node.
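Dropping duplicates cheaply, before any expensive validation, is typically done with a "seen" cache keyed by a hash of the message. The Go sketch below shows the idea under simplifying assumptions: the cache is unbounded (a real node would need eviction, for example per epoch), and `SeenCache` is a hypothetical type rather than Prysm's implementation.

```go
package main

import (
	"crypto/sha256"
	"sync"
)

// SeenCache remembers which messages have already been observed so that
// duplicates can be discarded before further processing.
type SeenCache struct {
	mu   sync.Mutex
	seen map[[32]byte]struct{}
}

// NewSeenCache returns an empty cache.
func NewSeenCache() *SeenCache {
	return &SeenCache{seen: make(map[[32]byte]struct{})}
}

// FirstSeen reports whether msg is new. It records the message on first
// sight and returns false for every later duplicate.
func (s *SeenCache) FirstSeen(msg []byte) bool {
	key := sha256.Sum256(msg)
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.seen[key]; ok {
		return false
	}
	s.seen[key] = struct{}{}
	return true
}
```

With a filter like this in front of the pipeline, a node receiving hundreds of attestations per second would only pass the handful of distinct ones on to signature verification and state processing.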
In another effort to reduce the number of messages per second, we’re looking to enhance the p2p connection manager from libp2p to have a hard limit on peers.
The current libp2p connection manager aims to maintain a peer count between a lower and an upper limit. For example, a node may be configured to maintain 500 to 800 peers. On reaching 800 peers, the connection manager would trim back down to 500. When we tried this with Prysm to maintain 40 peers, we saw that the connection manager would trim connections until there were 40 peers, but the disconnected peers would immediately reconnect. This created a very violent churn in peer-to-peer communications.
Over the coming weeks, team member Nishant will be investigating a more reliable way to enforce the peer count, such as rejecting new connections once the threshold has been reached rather than reactively trimming peers after the threshold has been exceeded.
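The difference between the two approaches can be sketched in a few lines of Go. The `PeerLimiter` below refuses a new peer up front once the limit is reached, so no existing connection is ever trimmed and then re-established. It is a standalone illustration with hypothetical names and does not use or mirror the libp2p API.

```go
package main

import (
	"errors"
	"sync"
)

// ErrPeerLimit is returned when a new peer is refused because the hard limit
// has already been reached.
var ErrPeerLimit = errors.New("peer limit reached")

// PeerLimiter enforces a hard upper bound on the number of connected peers.
type PeerLimiter struct {
	mu    sync.Mutex
	max   int
	peers map[string]struct{}
}

// NewPeerLimiter returns a limiter that admits at most max peers.
func NewPeerLimiter(max int) *PeerLimiter {
	return &PeerLimiter{max: max, peers: make(map[string]struct{})}
}

// Accept admits the peer if it is already connected or if there is room left;
// otherwise it rejects the connection without touching existing peers.
func (l *PeerLimiter) Accept(id string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, ok := l.peers[id]; ok {
		return nil
	}
	if len(l.peers) >= l.max {
		return ErrPeerLimit
	}
	l.peers[id] = struct{}{}
	return nil
}

// Remove frees the slot held by a disconnected peer.
func (l *PeerLimiter) Remove(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.peers, id)
}

// Count returns the current number of connected peers.
func (l *PeerLimiter) Count() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.peers)
}
```

Because a rejected peer never displaces a connected one, the peer set stays stable at the limit instead of oscillating as trimmed peers reconnect.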
We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).