MigaLabs - Blockchain Data Analytics Transforming Insights in Real Time

Introduction

The release of EIP 4844 as part of the Dencun protocol upgrade is a significant milestone in the Ethereum ecosystem. Commonly dubbed “proto-danksharding”, EIP 4844 is designed to enhance Ethereum’s data capacity while reducing costs for layer-2 (L2) rollups, thus addressing two critical challenges: scalability and user fees. This update matters to developers and users alike, as it supports more complex applications at lower operational costs, potentially increasing Ethereum's adoption across a broader spectrum of digital applications. In concrete terms: since the release of EIP 4844, some L2s such as Starknet have reported significant reductions (up to 95%) in monthly on-chain costs.

In this post, we share the results of a detailed analysis of the network since the EIP 4844 release on mainnet. These include an overview of block data since the fork date and a blob propagation analysis. Our analysis includes much more data and many more figures than we can share in this blog post, so in each section we provide links for you to take a deeper dive, access granular data and observe trends across various dimensions of the chain (consensus implementation, country, blob sizes, propagation delays, etc.).

Tools and Datasets Used

For our analysis, we utilized data from Xatu provided by the Ethereum Foundation, which is available at https://github.com/ethpandaops/xatu. Xatu offers comprehensive datasets that are instrumental in understanding blob propagation delays.

Our analytical approach involved leveraging Python Notebooks, which allowed for efficient data manipulation and visualization. We employed various data science techniques to process and analyze the data, ensuring that our findings were robust and insightful. The tools and scripts we used and created for this analysis are entirely free and open-source.

Mainnet Blob Data Overview

EIP 4844 went live as part of the Dencun fork at epoch 269,568 (2024-03-13 13:55:59 UTC). This fork introduced a new type of transaction known as a type 3 or blob-carrying transaction. Blobs are temporary storage for transaction call data. They are mainly designed for rollups and L2s to submit their data, which is why they are only stored for about 18 days (4096 epochs). This timeframe gives the L2s enough time to report any inconsistent transactions, and because the storage is temporary it is also cheaper. Between Dencun and the time this analysis was conducted (2024-06-07 07:29:59 UTC), 617,270 slots and 93,020,502 transactions were added to the chain, an average of 151 transactions per slot. Of these transactions, 558,868 (0.6%) are type 3 transactions. While a block can contain a maximum of 6 blobs, a significant number of blocks do not contain any blobs, as can be seen in Figure 1. Blocks that do contain blob-carrying transactions often contain the full 6 blobs (see Figure 1).
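The figures quoted above can be cross-checked with a few lines of Python. This is a small sanity-check sketch, not part of our analysis scripts; it relies on the mainnet constants of 12 seconds per slot and 32 slots per epoch, and takes the slot range from the caption of Figure 1.

```python
# Mainnet timing constants.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
BLOB_RETENTION_EPOCHS = 4096  # retention period quoted in the text

# Blob retention expressed in days.
retention_seconds = BLOB_RETENTION_EPOCHS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
retention_days = retention_seconds / 86_400

# Slot range analyzed (from the caption of Figure 1) and transaction counts.
first_slot, last_slot = 8_626_178, 9_243_448
slots = last_slot - first_slot
total_txs = 93_020_502
blob_txs = 558_868

print(f"retention: {retention_days:.1f} days")
print(f"slots analyzed: {slots:,}")
print(f"transactions per slot: {total_txs / slots:.0f}")
print(f"type 3 share: {100 * blob_txs / total_txs:.1f}%")
```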

Next, we will take a look at the trends in blob propagation.

Figure 1: Distribution of the number of blobs from slot 8626178 to 9243448

Mainnet Blob Propagation

Imagine a global network of food delivery services relying on blockchain for real-time order tracking; delays in data propagation could mean the difference between a hot meal and a cold disappointment. In the context of EIP 4844, blob propagation times vary somewhat across different blob counts in a block, as illustrated in the boxplot in Figure 2. A boxplot, or box-and-whisker plot, displays data distribution using five key numbers: lower whisker, first quartile (Q1), median, third quartile (Q3), and upper whisker. The box shows the interquartile range (IQR), where 50% of the data lie, with a line marking the median. "Whiskers" extend to the smallest and largest values within 1.5 times the IQR from the quartiles, while outliers are plotted as individual points.

In Figure 2, the average blob propagation delay is generally around 2 seconds. However, many blobs arrive long after the typical block treatment window of 4 to 8 seconds. We even observed 30,225 outliers beyond 12 seconds, of which 3,485 were first-time recorded arrivals. A large number of these (2,638) came from Teku clients in AU and the US. We clipped the plot at 12 seconds because anything beyond 8 seconds is already outside the useful window and would skew our analysis. It is important to note that these late arrivals do not necessarily indicate a latency problem, although that is likely; they could also be a consequence of the underlying network implementation and of how block data is treated and published. But does the number of blobs impact propagation time? In the analysis that follows, we explore further to find answers to this question.
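The five numbers behind a boxplot, as described above, can be computed with the Python standard library. This is a minimal sketch; the `boxplot_stats` helper and the sample delays are made up for illustration and are not taken from the Xatu dataset. Plotting libraries may use slightly different quartile conventions.

```python
import statistics

def boxplot_stats(values):
    """Return the five boxplot numbers plus outliers (1.5 * IQR rule)."""
    data = sorted(values)
    # Quartiles computed with the "inclusive" method; other conventions exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_whisker": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical propagation delays in seconds, with one very late arrival.
delays = [1.6, 1.8, 1.9, 2.0, 2.1, 2.2, 2.4, 2.6, 13.5]
stats = boxplot_stats(delays)
print(stats)
```

In this toy sample the 13.5 second arrival falls outside the upper fence, so it would be drawn as an individual outlier point rather than stretching the whisker.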

Figure 2: Blob propagation time by number of blobs

Time difference between blobs arrival

Blob propagation delay could be a consequence of not all blobs in a given transaction arriving on time. To test this, we analyzed the time difference between the arrival of the first and last blob when the number of blobs is more than one. Interestingly, the average time difference between the arrival of the first and last blob is below 2 seconds in 99.77% of the blocks analyzed, as shown in Figure 3. Figure 3 also suggests that the time difference between the first and last blob arrival generally increases with the number of blobs. However, this alone does not establish the nature of the trend or which other factors may be responsible for blob propagation delay. To get a better sense of the trend, let us look at a weekly spread of the arrival of blobs from the first to the sixth blob.
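The first-to-last arrival difference described above can be sketched as follows. The tuple layout `(slot, blob_index, arrival_ms)` and the sample rows are assumptions for illustration only; they do not reflect the actual Xatu schema.

```python
from collections import defaultdict

# Hypothetical observations: (slot, blob_index, arrival in ms since slot start).
observations = [
    (100, 0, 1800), (100, 1, 1950), (100, 2, 2100),
    (101, 0, 2050), (101, 1, 4300),
    (102, 0, 1700),  # single-blob block, excluded from the spread analysis
]

def first_to_last_spread(rows):
    """Map each multi-blob slot to (last arrival - first arrival) in ms."""
    arrivals = defaultdict(list)
    for slot, _blob_index, arrival_ms in rows:
        arrivals[slot].append(arrival_ms)
    return {
        slot: max(times) - min(times)
        for slot, times in arrivals.items()
        if len(times) > 1
    }

spreads = first_to_last_spread(observations)
# Share of multi-blob blocks whose blobs all arrive within 2 seconds of each other.
share_below_2s = sum(s < 2000 for s in spreads.values()) / len(spreads)
print(spreads, share_below_2s)
```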

Figure 3: Time difference between first and last blob arrival by blob count

Weekly breakdown

The weekly breakdown of propagation delays per block, as illustrated in Figure 4, reveals significant variation across weeks. Notably, the average propagation time increases with the number of blobs, as seen in the color-coded box plots. Blocks containing one blob consistently exhibit the lowest average delay (2.23 seconds), while blocks with six blobs show the highest average delay (2.47 seconds). The spread of delays, represented by the distribution of data points, indicates higher variability and more outliers in weeks with higher blob counts. This pattern suggests that while the number of blobs is a key factor influencing propagation delay, weekly variations might also be attributed to network conditions, traffic, or other external factors affecting the chain's performance. To investigate this, we will next look at the variation in propagation delay across client implementations.

Figure 4: Weekly variation in propagation delays per block on Mainnet

Client breakdown

To better understand the factors contributing to blob propagation delay, we examined the variation in propagation delays based on different client implementations, as depicted in Figure 5. The data highlights the average propagation delays for five distinct client implementations: Lighthouse, Lodestar, Nimbus, Prysm, and Teku clients as collected from Xatu. The results show significant differences in propagation delays among these implementations. Prysm exhibits the lowest average delay at 2.44 seconds, followed closely by Nimbus at 2.68 seconds and Lighthouse at 2.79 seconds. On the other hand, Lodestar and Teku display higher average delays, with 3.18 seconds and 3.90 seconds, respectively. In comparison to Figure 2, these values are remarkably high.

Figure 5 also illustrates the weekly spread of delays for each client, with distinct box plots representing the distribution of propagation times. The lower propagation delays in Prysm, Nimbus, and Lighthouse suggest these clients might have more efficient blob processing and network communication mechanisms. In contrast, the higher delays observed in Lodestar and Teku could be due to various factors, such as less optimized code, different processing strategies, or different event-reporting strategies. Because of these differences in the way clients report events, it is not always easy to draw conclusions from these types of comparisons. Still, we encourage client teams to use the insights provided by this study to review and improve their implementations, particularly those with higher average propagation delays.

Figure 5: Weekly variation in propagation delays by consensus implementation on Mainnet

Geographical breakdown

Again, we revisit the question: does the number of blobs in a block and/or the client implementation impact propagation delay? Let us take a look at how propagation delay varies across regions: Finland (FI), Netherlands (NL), United States of America (US), Australia (AU) and India (IN). Figure 6 shows that for Prysm clients the average propagation delay ranges between 1.91 and 2.15 seconds. For Lodestar and Teku clients, however, the value ranges between 2.66 and 3.07 seconds and between 2.33 and 3.16 seconds, as shown in Figures 7 and 8 respectively. Notably, Teku shows a significantly higher propagation delay for blocks with 5 blobs. Again, we raise a fundamental question: why are these two clients experiencing higher propagation delay? To achieve the goals of EIP 4844, it is crucial to optimize every client for better blob propagation time.

Figure 6: Variation in propagation delays by country and blob counts per block on Mainnet - Prysm client

Figure 7: Variation in propagation delays by country and blob counts per block on Mainnet - Lodestar client

Figure 8: Variation in propagation delays by country and blob counts per block on Mainnet - Teku client

Our analysis of blob propagation delays on mainnet underscores the significant impact of network latency, bandwidth, and client implementation on overall performance. The varying propagation times across different blob counts suggest that while the number of blobs in a block plays a role, it is not the sole factor. Weekly and client-specific breakdowns further reveal that certain client implementations, such as Prysm, exhibit lower delays than others like Lodestar and Teku. This indicates that network conditions and client efficiency are critical determinants of propagation speed. The geographical analysis supports these findings, showing consistent patterns of delay variations across regions. To meet the goals of EIP 4844 and ensure optimal network performance, it is essential to address these factors by enhancing network infrastructure and refining client implementations to reduce propagation delays.

Increasing the number of blobs

What happens if the coming Electra fork proposes to allow a higher number of blobs?

To answer this question, we first conducted a Shapiro-Wilk test to check whether the blob propagation data meets the assumption of normality, and a Levene’s test to check whether it meets the assumption of homogeneity of variance. The results in Table 1 show a p-value of 0 for both tests, which means that we must reject both the assumption of normality and the assumption of equal variance across groups. Thus, we opted for a Kruskal-Wallis test, a non-parametric test for comparing medians between two or more groups that remains suitable when normality assumptions do not hold. The results of this test in Table 2 indicate that there are statistically significant differences in propagation time across different numbers of blobs. When a Kruskal-Wallis test indicates significant differences among multiple groups, Dunn's test is a non-parametric post hoc analysis that can be used to identify which specific groups differ. The Dunn’s test results in Table 3 further confirm that the differences in propagation time are statistically significant for every pair of blob counts. Taken together with the trends in the figures above, these results support the conclusion that an increase in the number of blobs will increase the propagation delay significantly.

Table 1: Tests for normality and equal variance in the blob propagation dataset.

Test	Number of Blobs	p-Value
Shapiro-Wilk	2, 3, 4, 5, and 6	0.0
Levene	2 - 6	0.0

Table 2: Kruskal-Wallis test for differences in blob propagation time across blob counts.

Test	Number of Blobs	p-Value	H-statistic
Kruskal-Wallis	2 - 6	0.0	51219.52

Table 3: Dunn’s pairwise (blobs) post hoc test with p-values adjusted by Bonferroni correction.

	2	3	4	5	6
2	1
3	0	1
4	0	2.29e-211	1
5	0	0	1.41e-66	1
6	0	0	0	1.06e-61	1
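To make the Kruskal-Wallis step above more concrete, the sketch below computes the H-statistic by hand on a toy dataset. It is not the code used for Table 2: the three groups and their values are invented, and the closed-form p-value `exp(-H / 2)` is only valid for the chi-square approximation with 2 degrees of freedom (three groups). With the five groups in our dataset, a statistics library would be the practical choice.

```python
import math
from collections import Counter

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H-statistic with average ranks and tie correction."""
    pooled = sorted(v for group in groups for v in group)
    n = len(pooled)
    # Assign each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    # Sum over groups of (rank sum squared / group size).
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Standard correction factor for tied values.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t**3 - t for t in tie_sizes) / (n**3 - n)
    return h / correction

# Hypothetical propagation delays (seconds) for blocks with 2, 4 and 6 blobs.
groups = [
    [1.9, 2.0, 2.1, 2.2],
    [2.3, 2.4, 2.5, 2.6],
    [2.7, 2.8, 2.9, 3.0],
]
h_stat = kruskal_wallis_h(groups)
# Chi-square survival function for exactly 2 degrees of freedom.
p_value = math.exp(-h_stat / 2)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

A significant H only says that at least one group differs from the others; it says nothing about which pairs differ or in which direction, which is why a post hoc test such as Dunn's is run afterwards.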

To estimate the maximum number of blobs that can be sent with a guarantee that they arrive within a given threshold, we built a linear regression model on the blob propagation dataset. Our linear regression model, which has a mean squared error of 2.12, predicts that given a threshold of 4 seconds, up to 15 blobs can be included in a block. This prediction means that all 15 blobs would arrive at or before the 4-second mark, as shown in Figure 9. We encourage other researchers to confirm our analysis and compare results as more factors, metrics, and usage of EIP 4844 emerge in the near future, paving the way towards making reliable predictions.
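The idea of fitting a line and then inverting it for a time threshold can be sketched as below. This is an illustration of the technique only: the data points are invented, the helper names are ours, and the output does not reproduce the 15-blob prediction from our model. Note also that any such estimate extrapolates beyond the observed range of 1 to 6 blobs.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: blob count vs. first-to-last arrival difference (seconds).
blob_counts = [2, 3, 4, 5, 6]
spread_s = [0.5, 1.0, 1.75, 2.0, 2.5]

slope, intercept = fit_line(blob_counts, spread_s)

# Mean squared error of the fit on the training points.
mse = sum(
    (y - (slope * x + intercept)) ** 2 for x, y in zip(blob_counts, spread_s)
) / len(blob_counts)

def max_blobs_within(threshold_s):
    """Largest blob count whose predicted spread stays within the threshold."""
    return math.floor((threshold_s - intercept) / slope)

max_blobs = max_blobs_within(4.0)
print(f"slope={slope:.2f} intercept={intercept:.2f} mse={mse:.3f} max={max_blobs}")
```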

Figure 9: Prediction of time difference between first and last blob arrival when blob count is increased

Probability of missing a block after x blobs

As blobs can be as large as 128KB in size, it is reasonable to be concerned about the latency implications for the entire network and about block processing delays. In this context, we investigated whether blobs (especially blocks carrying 6 blobs) can be a reason for missed blocks. Figure 10 shows a distribution of the number of blobs contained in blocks preceding a missed block. With about 54% of blocks preceding missed slots containing 6 blobs, we investigated whether the number of blobs could predict the likelihood of a missing slot. It is also important to mention that only about 43.3% of blocks preceding missed blocks contained blob-carrying transactions, as shown in Figure 10. Although the pie chart in Figure 10 shows a significant number of missed blocks after 6 blobs, this is due to the high number of blocks containing 6 blobs, which increases the likelihood of this occurrence. The percentage of missed blocks relative to the overall number of blocks observed is actually quite small, as Figure 11 demonstrates.
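The base-rate effect described above is easy to see with a small calculation. The counts below are invented for illustration and are not taken from our dataset: a blob count can account for a large share of missed slots simply because many blocks carry that number of blobs, while its per-block miss rate stays small.

```python
# Hypothetical counts of canonical blocks by number of blobs.
blocks_by_blobs = {0: 60_000, 1: 8_000, 3: 4_000, 6: 20_000}
# Hypothetical counts of those blocks that were followed by a missed slot.
missed_after = {0: 300, 1: 40, 3: 22, 6: 120}

total_missed = sum(missed_after.values())

# Share of all missed slots preceded by a block with k blobs (pie-chart view).
share_of_misses = {k: m / total_missed for k, m in missed_after.items()}

# Miss rate given k blobs (normalized by how common such blocks are).
miss_rate = {k: missed_after[k] / blocks_by_blobs[k] for k in blocks_by_blobs}

for k in sorted(blocks_by_blobs):
    print(
        f"{k} blobs: {100 * share_of_misses[k]:.1f}% of misses, "
        f"miss rate {100 * miss_rate[k]:.2f}%"
    )
```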

Figure 10: Distribution of the number of blobs in blocks preceding missed blocks

Figure 11: Distribution of the number of blobs in blocks preceding missed slots and in canonical blocks

Table 4: Number of blobs as a predictor of missing slots

Using logistic regression, with results in Table 4, we found that there is a statistically significant relationship between the number of blobs and the occurrence of missing slots. Specifically, the likelihood of a missing slot increases with each additional blob (for each additional blob, the log-odds of missing a slot increase by 0.1148%). This was confirmed by a highly significant p-value (p < 0.01), indicating that the number of blobs is a reliable predictor of missing slots. However, the overall effect is modest, as shown by the Pseudo R-squared value of 0.0077, which suggests that while the number of blobs does play a role, it is not the sole factor influencing block misses.
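For readers less familiar with logistic regression, the sketch below shows how a per-blob coefficient on the log-odds scale translates into an odds ratio and into probabilities. Two things here are assumptions: we read the reported value as a coefficient of 0.1148 on the log-odds scale, and the intercept of -5.0 is purely hypothetical since it is not reported in this post.

```python
import math

def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(coef)

def miss_probability(intercept, coef, blobs):
    """Logistic model: P(missed slot) for a block with the given blob count."""
    z = intercept + coef * blobs
    return 1 / (1 + math.exp(-z))

COEF_PER_BLOB = 0.1148  # assumed reading of the reported value
INTERCEPT = -5.0        # hypothetical, not reported in this post

ratio = odds_ratio(COEF_PER_BLOB)
p0 = miss_probability(INTERCEPT, COEF_PER_BLOB, 0)
p6 = miss_probability(INTERCEPT, COEF_PER_BLOB, 6)
print(f"odds ratio per blob: {ratio:.4f}")
print(f"P(miss | 0 blobs) = {p0:.4%}, P(miss | 6 blobs) = {p6:.4%}")
```

Under these assumptions each additional blob multiplies the odds of a missed slot by a factor slightly above 1, which is consistent with a statistically significant but modest effect.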

These findings imply that as the number of blobs increases, the blockchain experiences a slight increase in the probability of block misses. This insight is crucial for understanding the operational dynamics of Ethereum after Dencun and should be considered before any attempt to increase the number of blobs. By addressing these elements, we can enhance the reliability and efficiency of the blockchain, ensuring smooth operations and minimizing disruptions caused by missed slots.

Real-time Dashboard

For more insights on blob data distribution across epochs, slots and blocks, please check our visualization dashboard. It contains several interactive plots showcasing some of the trends within blob data statistics. The dashboard is updated in near real time, with plots regenerated every hour using the most recent data. We will continue expanding the dashboard with new figures in the coming weeks.


About MigaLabs

We are a research group specialized in next-generation Blockchain technology. Our team works on in-depth studies and solutions for Blockchain Scalability, Security and Sustainability.

A Study of the first 3 Months of Blobs in Ethereum