Ethereum Hardware Resource Analysis Update

MigaLabs

Ethereum Hardware Resource Analysis

Understanding the differences between different Consensus Layer Clients: v3

Introduction

As the number of Ethereum users has grown, the network has been hitting its throughput limit. There has been a lot of discussion about how to scale Ethereum, and several approaches to the problem have emerged.

Lately, rollups have positioned themselves as the quick solution to the scalability issues, offering users around 3-8x lower fees than traditional Ethereum transactions. However, these fees are still high for a rollup, and the ecosystem intends to reduce them to almost zero to allow broader scalability. To address this, blobs have been introduced to the Ethereum chain.

On March 13th 2024, the Ethereum network was upgraded with the Deneb fork. In this report, we will review what benefits this fork brings to the Ethereum ecosystem and how they impact the hardware resources of the consensus clients.

The Deneb fork

The Deneb fork mainly includes the changes proposed in EIP-4844 (a.k.a “blobs”). Several new data structures and benefits are introduced:

  • A new transaction type (blob-carrying transaction) is introduced to use blobs.
  • Two new concepts are introduced: blob gas and blob gas fee

Blobs are temporary storage and aim to replace traditional calldata storage. They are mainly intended for L2s to submit their data, which is why they are only stored for around 18 days (by default; this can be modified) in each Ethereum node.
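
As a quick sanity check of the retention window mentioned above (this sketch is ours, not part of the original report), the default of roughly 18 days corresponds to the Deneb spec parameter MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096:

```python
# Back-of-envelope check of the default ~18-day blob retention window,
# assuming the Deneb spec value MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096

retention_seconds = (MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS
                     * SLOTS_PER_EPOCH * SECONDS_PER_SLOT)
print(retention_seconds / 86400)  # ~18.2 days
```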

This hard fork represents a small step in a long journey. While L2s allow higher transaction throughput at a lower cost, blobs are not intended to be the final solution. With blobs, the Deneb fork introduces the base data structures for the transition towards Data Availability Sampling (DAS).

What Blobs mean to the network

  • A new topic is introduced in the Ethereum Consensus p2p network: blob_sidecar_{subnet_id} (the full topic names are sketched below).
  • Beacon nodes will use this topic to send and receive blob sidecars, a wrapper around blobs to be transmitted over the p2p network. Thus, blobs are only stored and transmitted by the beacon node.
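
Purely as an illustration (not from the original report): the full gossipsub topic strings follow the /eth2/{fork_digest}/{name}/{encoding} convention of the consensus p2p spec, with one blob_sidecar topic per subnet. The fork digest below is a placeholder, not a real mainnet value.

```python
# Illustrative sketch: building the Deneb blob sidecar gossip topic names.
BLOB_SIDECAR_SUBNET_COUNT = 6  # Deneb default
FORK_DIGEST = "a1b2c3d4"       # placeholder: 4-byte fork digest, hex, no 0x prefix

topics = [
    f"/eth2/{FORK_DIGEST}/blob_sidecar_{subnet_id}/ssz_snappy"
    for subnet_id in range(BLOB_SIDECAR_SUBNET_COUNT)
]
print(topics[0])  # /eth2/a1b2c3d4/blob_sidecar_0/ssz_snappy
```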

Each blob has a fixed size of 128KB (which is reduced when compressed for transmission). The main goal is that the transaction calldata or submissions from the L2s will be stored in these temporary data structures, while the transaction only references the blob hash. With a target of 3 blobs per block and a maximum of 6 blobs per block, the amount of data transmitted over the network at each slot increases: we are now facing a maximum of 868KB per slot (assuming an average block size of 100KB).
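
The numbers above can be reproduced with a trivial calculation (the 100KB average block size is the report's assumption):

```python
# Rough per-slot data volume before compression, using the sizes quoted above.
BLOB_SIZE_KB = 128
AVG_BLOCK_SIZE_KB = 100          # assumed average block size
TARGET_BLOBS, MAX_BLOBS = 3, 6   # Deneb target and maximum blobs per block

target_kb = AVG_BLOCK_SIZE_KB + TARGET_BLOBS * BLOB_SIZE_KB  # 484 KB at the target
max_kb = AVG_BLOCK_SIZE_KB + MAX_BLOBS * BLOB_SIZE_KB        # 868 KB at the maximum
print(target_kb, max_kb)
```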

At the time of writing, there have been around 668,532 slots since the Dencun hard fork, out of which around 40% have included blobs. Our goal with this report is to show whether the CL clients' resource consumption has changed as a result of these recent changes.

Methodology

This experiment consists of running the 6 main consensus layer clients at two different times: one before the fork and one after it. To run the 6 main Ethereum CL clients, we have used this repository. All the clients have run together with the same execution client: Nethermind.

The Setup

Each consensus client was run on a different machine. Therefore each machine was running:

  • Consensus node
  • Execution node
  • Prometheus
  • Node exporter
  • cAdvisor

As the introduction of blobs happened on March 13th (with the hard fork), we have run the clients at two different times that will serve as a comparison:

  • The first run in February 2024 (pre-blobs)
  • The second run in May 2024 (post-blobs)

During both experiments, the same setup was used:

  • The same machine specifications
  • The same clients (both CL and EL)
  • The same duration (one month)
  • The same engine (docker)
  • The same location (Europe)

All metrics were recorded using a locally deployed Prometheus and then sent to a central Prometheus through remote write. This made the data easier to analyze, as everything was contained within a single Prometheus server. The resource consumption metrics are recorded every second for more granularity, and the beacon metrics are recorded every 12 seconds.

Data has been retrieved and represented using Python + plotly graph objects. To create the plots, we download the needed data into CSV files and then render each plot to an HTML file embedded into our website.
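
As a rough illustration of this pipeline (a sketch, not the exact scripts used for the report), the snippet below pulls a metric from a Prometheus server through its query_range HTTP API, dumps the samples to a CSV file, and renders an interactive HTML plot with plotly graph objects. The Prometheus URL, metric name, and label values are placeholders.

```python
import csv
import time

import plotly.graph_objects as go
import requests

PROMETHEUS_URL = "http://localhost:9090"  # placeholder: the central Prometheus
# Placeholder query: incoming network bandwidth as exposed by node exporter.
QUERY = 'rate(node_network_receive_bytes_total{device="eth0"}[1m])'

end = int(time.time())
start = end - 6 * 3600  # last 6 hours

# 1. Download the samples through the query_range API.
resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query_range",
    params={"query": QUERY, "start": start, "end": end, "step": "12s"},
    timeout=30,
)
resp.raise_for_status()
series = resp.json()["data"]["result"][0]["values"]  # [[timestamp, "value"], ...]

# 2. Persist the samples to CSV.
with open("bandwidth.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "bytes_per_second"])
    writer.writerows(series)

# 3. Plot to a standalone HTML file with plotly graph objects.
timestamps = [float(t) for t, _ in series]
values = [float(v) for _, v in series]
fig = go.Figure(go.Scatter(x=timestamps, y=values, name="incoming bandwidth"))
fig.update_layout(title="Network Bandwidth (incoming)",
                  xaxis_title="unix time", yaxis_title="B/s")
fig.write_html("bandwidth.html")
```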

Hardware Specification

Component   Specification
Disk        2TB NVMe Samsung MZQLB1T9HAJR-00007
Memory      44GB
CPU         8 vCores @ 2.4GHz

Client Versions

Client       February 2024             May 2024
Prysm        v4.2.1                    v5.0.3
Lighthouse   v4.6.0 / v4.6.222-exp     v5.1.3 / v5.1.222-exp
Teku         v24.1.1                   v24.4.0
Nimbus       v24.2.0                   v24.4.0
Lodestar     v1.15.0                   v1.18
Grandine     unstable                  0.4.0
Nethermind   1.25.3                    1.26.0

The Experiment

Each run has consisted of 4 phases:

  1. Run the client in default mode (25K slots)
  2. Run the client in all-subnets mode (25K slots)
  3. Run the client in default mode while simulating block production at every slot (25K slots)
  4. Run the client in all-subnets mode while simulating block production at every slot (25K slots)

A more detailed description of the phases at each run can be found here.
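
The report does not detail how block production is simulated, so the following is only a hypothetical sketch of one possible approach: asking the beacon node to build (but not publish) a block once per slot through the standard /eth/v2/validator/blocks/{slot} Beacon API endpoint. The node URL is a placeholder, and the randao_reveal below is a dummy value (the compressed BLS point at infinity); depending on the client, a valid RANDAO reveal or an option to skip its verification may be required.

```python
import time

import requests

BEACON_API = "http://localhost:5052"  # placeholder beacon node URL
GENESIS_TIME = 1606824023             # Ethereum mainnet beacon chain genesis (unix seconds)
SECONDS_PER_SLOT = 12

# Dummy RANDAO reveal: compressed BLS G2 point at infinity (96 bytes).
RANDAO_REVEAL = "0x" + "c0" + "00" * 95

while True:
    slot = (int(time.time()) - GENESIS_TIME) // SECONDS_PER_SLOT
    try:
        resp = requests.get(
            f"{BEACON_API}/eth/v2/validator/blocks/{slot}",
            params={"randao_reveal": RANDAO_REVEAL},
            timeout=10,
        )
        print(f"slot {slot}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"slot {slot}: request failed: {exc}")
    time.sleep(SECONDS_PER_SLOT)
```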

Data Comparison

As the basic data comparison, we will only take into account the first two phases of each run, which represent a regular context for a node. Simulating block production at every slot is far from the reality a node will face, but it can give us some interesting hints about how clients behave under extreme conditions.

Default mode

Network Bandwidth Incoming

With the introduction of blobs, the amount of data transmitted in every slot increases by up to almost 8 times (from around 100KB to up to 868KB). It is therefore not surprising to observe higher network bandwidth usage (both incoming and outgoing) after the fork.

In Figure 1, we can observe a general increase in the incoming network throughput, which could be related to the introduction of blobs. Nimbus presents the most significant change, from a median of 0.11 MB/s pre-blobs to a median of 0.27 MB/s post-blobs, but remains the client with the lowest incoming throughput. Teku presents the least variance out of all the measured clients.

Figure 1: Network Bandwidth (incoming) usage of CL clients pre-blobs vs post-blobs

More specifically, as we can observe in Figures 2 and 3, Nimbus and Grandine present a clear increase throughout the whole run, when we compare post-blobs metrics to pre-blobs metrics.

Figure 2: Network Bandwidth (incoming) usage of CL clients pre-blobs vs post-blobs: Grandine

Figure 3: Network Bandwidth (incoming) usage of CL clients pre-blobs vs post-blobs: Nimbus

Figure 4: Network Bandwidth (incoming) packets of CL clients pre-blobs vs post-blobs

In Figure 4 we can also observe a slight increase in the rate of received packets in all clients, especially in Nimbus, Prysm, and Lighthouse. However, Teku shows a reduction in the number of packets received, which is the opposite behavior of the rest of the clients.

Network Bandwidth Outgoing

While we have seen several differences in the amount of received data, we do not observe such a difference in the outgoing network throughput. As shown in Figure 5, most clients present an increase in the amount of data transmitted, except Teku, which shows a decrease from a median of 0.4 MB/s to 0.32 MB/s when comparing post-blobs metrics to pre-blobs metrics. This behavior is described in more detail below, together with Figure 17.

Figure 5: Network Bandwidth (outgoing) of CL clients pre-blobs vs post-blobs

However, we do see some significant differences in the rate of packets sent by the node for some of the clients. More specifically, for Nimbus, Lighthouse, and Grandine we see an increase in the rate of sent packets per second after the introduction of blobs, compared to pre-blobs. This increase can be observed in Figure 6.

Figure 6: Network Bandwidth (outgoing) packets pre-blobs vs post-blobs

Contrary to what we expected, we see a decrease in the rate of packets sent by Lodestar and Teku. As we can see in Figure 6, Lodestar's rate decreased from an average of 1250 packets per second pre-blobs to an average of 1000 post-blobs. After discussing with the Lodestar team, we learned this is a result of a batching upgrade applied to the client between the two runs.

Similarly, we observe the same behavior in Teku's outgoing network bandwidth post-blobs, which is strongly related to the executor queue being full during the post-blobs run.

Lighthouse Tree States Release

Additionally, we have also tested the experimental release from Lighthouse, which reworks how states are stored and comes with several improvements. We have run the latest experimental release at each run (February and May) for some days in default mode. In Figure 7 we can observe how the memory consumption of the experimental release stays around 4GB, while the stable release (at that moment) varies between 4 GB and 8 GB (see Figure 11).

Figure 7: Memory Usage of the Lighthouse Tree States release pre-blobs vs post-blobs

We can appreciate a different behavior between pre-blobs and post-blobs: after the node is synced (around hour 50), some pruning is applied and the memory usage is lower than during sync time. This behavior closely matches what we expected after the latest upgrades in the experimental release.

Default + All Subnets

CPU Usage

With the introduction of blobs, more data has to be handled and validated. However, blobs are mainly expensive as data (to be transmitted and stored) rather than in CPU effort. When comparing the CPU usage of the two experiments we have carried out, we do not find any big differences; most clients maintain a similar use of the CPU.

Figure 8: CPU Usage of CL clients pre-blobs

Figure 9: CPU Usage of CL clients post-blobs

Figures 8 and 9 present the CPU usage. The first vertical line marks the end of the sync process, the second vertical line marks the switch to the all subnets mode and the third vertical line marks the end of running the all subnets mode. Taking a deeper look into those, Grandine shows an increase in CPU usage from an average of 15% to an average of almost 60% in all subnets mode, when comparing pre-blobs to post-blobs.

Memory

Regarding memory usage, in Figure 10 we can observe that most CL clients show a significant increase in memory consumption when comparing pre and post-blobs. However, Prysm and Nimbus have reduced their memory consumption. More specifically, Nimbus presents a large drop in memory consumption.

Figure 10: Memory consumption of CL clients pre-blobs vs post-blobs (includes RSS)

In Figures 11 and 12, we can observe the memory usage of Lighthouse and Grandine, respectively. In both Figures, we can also observe the switch to the all-subnets mode, which happens around hour 90 for the post-blobs run (May), and around hour 100 for the pre-blobs run (February).

Having a deeper look at both Figures, Lighthouse shows a significantly higher use, with an overall increase of 3GB on average. Grandine also shows an increase of around 1GB of memory after the introduction of blobs. Additionally, the memory usage of Grandine always increases while in the all-subnets mode, and this behavior is still observed post-blobs.

Figure 11: Memory consumption of Lighthouse pre-blobs vs post-blobs (includes RSS)

Figure 12: Memory consumption of Grandine pre-blobs vs post-blobs (includes RSS)

We must mention that the calculated memory also includes the RSS (resident set size). Some runtimes and programming languages (e.g., Go) usually allocate more memory than needed and do not return it to the OS unless required, so this should be taken into account when reading these figures.

Connected Peers

We have measured the number of peers connected through libp2p, as each client exposes this in their Prometheus metrics. We have observed that most clients maintained a stable number of connected peers throughout the whole experiment.

While running in default mode, Teku positions itself as the client with the greatest number of connected peers, at an average of 100, while Nimbus is the client with the lowest number of connected peers, at an average of 40.

Figures 13 and 14 show the number of peers connected to each of the consensus clients. The first vertical line marks the end of the sync process, the second one the switch to all-subnets mode, and the third one the end of running in all-subnets mode.

However, when switching to all-subnets mode, Nimbus increases its number of connected peers to an average of 140. This change can be observed in Figure 13, which shows the number of connected peers during the first two phases of the first experiment. This has been discussed with the Nimbus team, and it is not the expected behavior, as Nimbus should maintain a stable number of around 160 peers independently of the mode. We continue to investigate this issue with the Nimbus team, but at this time it is unclear why we observed this behavior.

Figure 13: Number of connected peers during the first two phases of the pre-blobs run

Looking at the same metric after the introduction of blobs does not show any great differences, but we do see a strong and consistent variance in Lodestar while configured in the all subnets mode. This behavior can be observed in Figure 14.

Figure 14: Number of connected peers during the first two phases of the post-blobs run

Disk Usage

Running a node always implies a synchronization process, even while using checkpoint sync. The execution client (at the time of this writing) always has to sync the chain from scratch. On the other hand, consensus clients benefit from checkpoint sync, but they still need to backfill some minimum amount of data (most clients backfill until the weak subjectivity point). As the clients do not backfill until genesis, the database size of the consensus client is similar, independent of the age of the blockchain. Most metrics plateau or strongly stabilize after the synchronization has finished, as it is a very intense process where a lot of data is downloaded and persisted.

To run the nodes, we used checkpoint sync and waited until both the execution and consensus nodes were synced. However, while analyzing the disk usage before and after the introduction of blobs, we see some different behavior right before plateauing.

Figure 15: Disk Usage in GB in the first run (Feb 2024)

Figure 16: Disk Usage in GB in the second run (May 2024)

It is important to mention that the metrics in Figure 15 and Figure 16 show the disk usage of the Consensus and the Execution Client together.

In Figure 16, we can observe how, right before finishing the sync process, most clients show a more volatile disk activity, which could be due to backfilling the 18 days of blobs or due to pruning. Once stable, most clients present an increase of around 30 - 50GB, which could be the result of introducing blobs. For example, in the first run of the experiment, Prysm + Nethermind stabilizes at around 980GB, while in the second run, it stabilizes at around 1020 GB.
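
A back-of-envelope estimate (ours, not from the report) is consistent with this range: at the target of 3 blobs per slot and the ~18-day (4096-epoch) retention window, the rolling blob store peaks at roughly 48GB, and actual blob usage was below the target for part of the period.

```python
# Rough upper bound on the rolling blob store at target blob usage.
BLOB_SIZE_KB = 128
TARGET_BLOBS_PER_SLOT = 3
SLOTS_PER_EPOCH = 32
RETENTION_EPOCHS = 4096  # ~18 days

blob_store_gb = (TARGET_BLOBS_PER_SLOT * BLOB_SIZE_KB * SLOTS_PER_EPOCH
                 * RETENTION_EPOCHS) / 1024 / 1024
print(round(blob_store_gb))  # ~48 GB, in line with the observed 30-50GB increase
```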

Network Outgoing Bandwidth Rate

We have already mentioned that nodes receive up to 8x more data at every slot and need to retransmit these packets to other nodes. Therefore, it would also not be a surprise if nodes sent more network packets than before the Deneb hard fork (see Figure 5). In Figure 17, we can observe how Teku shows a lower outgoing network bandwidth usage, from an average of 4 MB/s pre-blobs to an average of 1.5 MB/s post-blobs (in all-subnets mode). The Teku team pointed out that the node might be discarding a lot of the gossip messages due to the executor queue being full, and digging into the logs of our instance confirmed this. After further analysis and discussion with the Teku team, we confirmed that this bandwidth issue can be resolved by increasing the queue sizes and the heap size according to the needs of the network.

Figure 17: Network Bandwidth (outgoing) of Teku pre-blobs vs post-blobs

Slot Time Utilization

Aside from looking at data across the whole experiment, it is also interesting to see how the node behaves during the 12 seconds of the slot. We gathered data from several hours and then aggregated it by the slot second.
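
A minimal sketch of that aggregation, assuming per-second bandwidth samples exported to a CSV file as in the earlier example (the column names are placeholders):

```python
import pandas as pd

GENESIS_TIME = 1606824023  # mainnet beacon chain genesis (unix seconds)
SECONDS_PER_SLOT = 12

# bandwidth.csv: per-second samples with columns "timestamp" and "bytes_per_second".
df = pd.read_csv("bandwidth.csv")
df["slot_second"] = (df["timestamp"].astype(int) - GENESIS_TIME) % SECONDS_PER_SLOT

# Average bandwidth observed at each of the 12 seconds within the slot.
profile = df.groupby("slot_second")["bytes_per_second"].mean()
print(profile)
```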

Figure 18: Network Bandwidth (incoming) of Grandine inside the slot pre-blobs (whole machine)

Figure 19: Network Bandwidth (incoming) of Grandine inside the slot post-blobs (whole machine)

In Figures 18 and 19, we can observe the incoming network bandwidth used by Grandine pre and post-blobs, respectively. We can appreciate two peaks of received data: the first one before second 4, and the second one after second 8. Both peaks are higher post-blobs compared to pre-blobs, the first one significantly so, which could be a result of the introduction of blobs.

Figure 20: Network Bandwidth (incoming) of Prysm and Lighthouse inside the slot pre-blobs (whole machine)

Figure 21: Network Bandwidth (incoming) of Prysm and Lighthouse inside the slot post-blobs (whole machine)

In Figures 20 and 21, we can observe the incoming network bandwidth used by Prysm and Lighthouse pre and post-blobs, respectively. Once again, we can identify two peaks inside the slot at similar times to those seen in Figures 18 and 19. In both Prysm and Lighthouse, we can identify the same pattern: the first peak is higher post-blobs compared to pre-blobs, while the second peak is lower.

In most clients, we do see a higher initial peak post-blobs, which could be related to receiving blobs together with the block.

Conclusion

We can extract the following insights from the experiment:

  • We can observe a general increase in network bandwidth use when we compare pre and post-blobs data. This increase is due to the introduction of blobs in the Ethereum ecosystem.
  • Most clients show an increase in memory usage when we compare pre and post-blobs metrics. Once again, this increase could be strongly correlated with the introduction of blobs, as they need to be transmitted over the network and it is probable that the client holds them in memory for some time. However, Nimbus shows completely the opposite behavior.
  • Most clients present an increase in network (outgoing) bandwidth usage when we compare pre and post-blobs metrics. This is strongly related to the introduction of the blob subnets, which are used to retransmit blobs around peers. Teku shows a different behavior but this is due to gossipSub messages being dropped and it is not the expected behavior.
  • As expected, we do not see any significant differences in the CPU usage of any clients while comparing pre and post-blobs metrics. Blobs mostly consume space (and thus, network bandwidth when transmitted over the network).
  • During this experiment, we observed that post-blobs disk usage increases by around 30 GB - 50 GB when compared to pre-blobs. This metric also includes the execution layer client database, but the increase can most likely be attributed to the introduction of blobs. Given that a full Ethereum node requires a minimum of around 1.1 TB of disk, the introduction of blobs represents an increase of roughly 4% in disk space, which shows that the cost of running a node after the Deneb hard fork is very similar to before.
  • Looking at slot time utilization, most clients present a higher initial peak of data received when comparing post to pre-blobs metrics. This increase is the result of the introduction of blobs, which are usually transmitted during the first 4 seconds of the slot.

Open Data

In this report, we have described the most significant insights that we collected during the experiment. All the generated plots have been uploaded to our website; they are interactive, and we hope they help anyone out there understand the metrics better and gather deeper insights from this hardware resources experiment.

Related Links

MigaLabs

About MigaLabs

We are a research group specialized in next-generation Blockchain technology. Our team works on in-depth studies and solutions for Blockchain Scalability, Security and Sustainability.