Introduction
At MigaLabs, as a Blockchain Observatory, we wanted to extend our reach to Layer 2 blockchain scalability solutions, starting with Polygon’s zkEVM. The idea was to build a dashboard showing the status of the zkEVM. In this report, we detail the different approaches we tried, the problems we encountered, how we solved them, how we analysed and processed the data, and the results we obtained.
Problems with the setup
Repository setup
We followed Polygon’s guide in order to set up the zkEVM node. The nominal requirements for the node to run are:
- 16GB RAM
- 4-core CPU
- 20GB of disk (will increase as the chain grows; at the time of setup the guide said 20 GB, now it says 70 GB)
We set up a machine and installed the node following the instructions. However, the docker-compose was not working well: one variable was not being used, and the guide specifies the creation of several folders. We found it easier to create a single folder and override the variable in the docker-compose file, which simplifies the setup considerably.
Proposed setup
Run:

```shell
git clone -b fix/docker-deployment git@github.com:migalabs/zkevm-node.git
cd mainnet && mv example.env .env
```

Fill the `.env` file following the instructions from the guide, and run:

```shell
docker-compose up -d
```
First approach
We were aware that the zkEVM node only works with certain Intel processors (they need to support the AVX2 instruction set). We configured a new machine with an Intel CPU, 8GB of RAM, and 50GB of disk, running Ubuntu. We deployed the setup, but the prover would not show any logs. We tried running in debug mode, and we also tried executing the binary with the help option, but no log was shown.
Second approach
After our first failure, we upgraded the machine with a different CPU and 16GB of RAM. When running the setup, the prover now shows logs! Great, we now have a running node.
After speaking to the internal zkEVM team, we were told the sync process would take around a week or a bit more. Also, the sync process would use more than 500GB of disk, so we had to upgrade the machine again, this time disk-wise.
The zkEVM team kindly provided a snapshot of the database, which could be imported into the zkEVM node database. However, this process requires more than 16GB of RAM, so our node could not handle the restore. We also tried running the node from scratch, but the prover service kept dying due to the memory limit.
Third approach
After understanding that more resources were needed, we started researching. We found these software requirements for the node, which mention 32GB of RAM. Therefore, we upgraded the machine once again. With the upgraded machine, we could import the snapshot and then run all services as normal; our node is now completely synced and functional.
Issues with RPC
Once we had a node synchronising with the network, we had problems interacting with the RPC container. In particular, the `eth_syncing` method was not working properly, and there were problems enabling debug mode when accessing debug logs on the RPC. In addition, we found that the docker-compose should be updated regularly, since it shipped with an outdated version. Finally, we also found that the node occasionally stopped working out of nowhere. The reason turned out to be that new versions of the zkevm-node container are periodically released that are not backwards compatible.
Analysis
The analysis of the data is done using Python 3.11.5. The aim of this analysis is to present statistics and metrics of Polygon’s zkEVM Network using our own node. In particular, we plan to investigate and answer a number of research questions that could be raised from this, such as:
- How many TPS is this rollup able to handle?
- What is the latency in this rollup?
- What is the data availability cost, and how does it compare to what it would cost after EIP4844 is deployed?

Among others.
RPC
For the RPC, we have created an RPC class containing the domain and port where the `zkevm-rpc` container is exposed, and we implement two basic functionalities:

- `post`, to make requests.
- `pprint`, to format the response of the post call.
RPC Methods used
We have implemented a number of different methods on top of the RPC class described above. The `eth_blockNumber` and `eth_getBlockByNumber` methods from the zkEVM's RPC are the most used ones.
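For reference, the JSON-RPC payloads for these two calls can be built as follows (a sketch; the helper names are ours, not part of any API):

```python
def block_number_payload() -> dict:
    """JSON-RPC payload for eth_blockNumber (current block height)."""
    return {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1}


def get_block_payload(number: int, full_txs: bool = False) -> dict:
    """JSON-RPC payload for eth_getBlockByNumber.

    The block number is hex-encoded; the second parameter selects full
    transaction objects (True) or only transaction hashes (False).
    """
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        "params": [hex(number), full_txs],
        "id": 1,
    }
```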
MigaLabsDB Setup
Since the response from `eth_getBlockByNumber` is the following:
- Object - A block object, or null when no block was found:
- number: QUANTITY - the block number.
- hash: DATA, 32 Bytes - hash of the block.
- parentHash: DATA, 32 Bytes - hash of the parent block.
- nonce: DATA, 8 Bytes - hash of the generated proof-of-work.
- sha3Uncles: DATA, 32 Bytes - SHA3 of the uncles’ data in the block.
- logsBloom: DATA, 256 Bytes - the bloom filter for the logs of the block.
- transactionsRoot: DATA, 32 Bytes - the root of the transaction trie of the block.
- stateRoot: DATA, 32 Bytes - the root of the final state trie of the block.
- receiptsRoot: DATA, 32 Bytes - the root of the receipts trie of the block.
- miner: DATA,20 Bytes - the beneficiary's address to whom the mining rewards are paid.
- difficulty: QUANTITY - integer of the difficulty for this block.
- totalDifficulty: QUANTITY - integer of the total difficulty of the chain until this block.
- extraData: DATA - the “extra data” field of this block.
- size: QUANTITY - integer the size of this block in bytes.
- gasLimit: QUANTITY - the maximum gas allowed in this block.
- gasUsed: QUANTITY - the total used gas by all transactions in this block.
- timestamp: QUANTITY - the Unix timestamp for when the block was collated.
- transactions: Array - Array of transaction objects, or 32 Bytes transaction hashes depending on the last given parameter.
- uncles: Array - Array of uncle hashes.
We set up a PostgreSQL schema and table containing these fields.
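An illustrative way to generate the DDL for such a table is shown below. The schema, table, and column names here are assumptions for the sketch, not necessarily the ones used in our production database:

```python
# Illustrative column mapping from block fields to PostgreSQL types.
# Hex QUANTITY fields are stored as integers after decoding.
BLOCK_COLUMNS = {
    "f_number": "BIGINT PRIMARY KEY",
    "f_hash": "TEXT",
    "f_parent_hash": "TEXT",
    "f_size": "BIGINT",
    "f_gas_limit": "BIGINT",
    "f_gas_used": "BIGINT",
    "f_timestamp": "BIGINT",
    "f_tx_count": "INT",
}


def build_create_table(schema: str = "zkevm", table: str = "t_blocks") -> str:
    """Build a CREATE TABLE statement from the column mapping."""
    cols = ",\n    ".join(
        f"{name} {ctype}" for name, ctype in BLOCK_COLUMNS.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n    {cols}\n);"
```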
How do we analyse the data?
We first convert and cast the hexadecimal values to ints. We then obtain data from PolygonScan to compare against.
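The hex-to-int conversion step can be sketched as follows (the list of fields to decode is illustrative):

```python
# QUANTITY fields of a block object that arrive hex-encoded ("0x...").
HEX_FIELDS = ("number", "size", "gasLimit", "gasUsed", "timestamp")


def decode_block(raw: dict) -> dict:
    """Cast the hex-encoded QUANTITY fields of a block object to ints,
    leaving DATA fields (hashes, addresses) untouched."""
    block = dict(raw)
    for field in HEX_FIELDS:
        if isinstance(block.get(field), str):
            block[field] = int(block[field], 16)
    return block
```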
Integrity checks
The integrity checks that we have implemented are over:

- Transactions count
- Average `size`
- Average `gaslimit`
- Sum of `gasused`
We have found only minor inconsistencies for transactions count, average size and average gas limit, corresponding to missing data from their side for the genesis day (2023-03-24), where they report nothing, and likewise for the current day (2024-02-05) at the time of this writing.
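The per-day aggregation behind these checks can be sketched without external dependencies. The field names below follow the illustrative decoded-block layout, not necessarily our exact database columns:

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_stats(blocks: list[dict]) -> dict[str, dict]:
    """Aggregate per-day tx count, average size, average gas limit,
    and total gas used from decoded block rows."""
    buckets = defaultdict(list)
    for b in blocks:
        day = datetime.fromtimestamp(
            b["timestamp"], tz=timezone.utc
        ).date().isoformat()
        buckets[day].append(b)
    stats = {}
    for day, rows in buckets.items():
        stats[day] = {
            "tx_count": sum(r["tx_count"] for r in rows),
            "avg_size": sum(r["size"] for r in rows) / len(rows),
            "avg_gas_limit": sum(r["gas_limit"] for r in rows) / len(rows),
            "total_gas_used": sum(r["gas_used"] for r in rows),
        }
    return stats
```

Each daily figure is then compared against the corresponding value reported by PolygonScan.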
Transaction count
Date | Computed Count Transactions | Reported Total Tx |
---|---|---|
2023-03-24 | 27 | NaN |
2024-02-05 | 16.684 | NaN |
Average size
Date | Computed Avg Size | Reported Avg Size |
---|---|---|
2023-03-24 | 2251.0 | NaN |
2024-02-05 | 1023.0 | NaN |
Average gas limit
Date | Computed Avg Gaslimit | Reported Avg Gaslimit |
---|---|---|
2023-03-24 | 28928571.0 | NaN |
2023-03-26 | 30000000.0 | 0.0 |
2024-02-05 | 30000000.0 | NaN |
In this case, they also reported `0.0` on 2023-03-26.
Problems with gasused
However, when we took a look at the gas used, the table of inconsistencies was considerably larger.
Date | Computed Total Gasused | Reported Total Gasused | Total Gasused Diff |
---|---|---|---|
2023-03-24 | 10.111.343 | NaN | NaN |
2023-03-25 | 17.204.572 | 2.447.147 | 14.757.425 |
2023-03-26 | 6.024.983 | 0 | 6.024.983 |
2023-03-27 | 1.074.793.278 | 1.007.274.000 | 67.519.755 |
2023-03-28 | 1.188.061.525 | 891.388.300 | 296.673.183 |
2023-03-29 | 670.978.288 | 567.342.400 | 103.635.929 |
2023-03-30 | 589.929.774 | 529.647.100 | 60.282.701 |
2023-03-31 | 824.299.939 | 728.500.200 | 95.799.697 |
2023-04-01 | 1.007.672.482 | 888.389.400 | 119.283.086 |
2023-04-02 | 1.021.405.170 | 876.240.600 | 145.164.584 |
2023-04-03 | 668.641.654 | 587.603.100 | 81.038.583 |
2023-04-04 | 1.124.573.230 | 428.874.200 | 695.699.011 |
2023-04-05 | 415.954.728 | 393.816.000 | 22.138.737 |
2023-04-06 | 948.971.181 | 912.685.400 | 36.285.736 |
2023-04-07 | 884.985.979 | 846.659.100 | 38.326.916 |
2023-04-08 | 736.308.219 | 706.109.500 | 30.198.767 |
2023-04-09 | 2.550.452.681 | 2.521.383.000 | 29.069.383 |
2023-04-10 | 2.558.412.276 | 2.531.997.000 | 26.415.406 |
2023-04-11 | 448.479.495 | 426.224.100 | 22.255.361 |
2023-04-12 | 379.534.166 | 362.321.500 | 17.212.672 |
2023-04-13 | 297.533.767 | 287.313.800 | 10.220.010 |
2023-04-14 | 725.095.552 | 712.772.800 | 12.322.749 |
2023-04-15 | 955.189.823 | 934.784.200 | 20.405.612 |
2023-04-16 | 634.728.865 | 616.205.500 | 18.523.404 |
2023-04-17 | 660.773.678 | 643.058.600 | 17.715.046 |
2023-04-18 | 397.523.621 | 384.924.600 | 12.598.971 |
2023-04-19 | 245.335.831 | 239.039.600 | 6.296.276 |
2023-04-20 | 285.603.918 | 275.008.200 | 10.595.748 |
2023-04-21 | 411.109.200 | 398.482.500 | 12.626.694 |
2023-04-22 | 482.495.847 | 470.511.900 | 11.983.906 |
2023-04-23 | 364.666.178 | 355.273.400 | 9.392.791 |
2023-04-24 | 455.167.687 | 446.989.800 | 8.177.861 |
2023-04-25 | 427.500.681 | 405.402.900 | 22.097.778 |
2023-04-26 | 425.815.952 | 405.811.200 | 20.004.794 |
2023-04-27 | 439.886.580 | 403.400.300 | 36.486.264 |
2023-04-28 | 437.781.601 | 415.226.100 | 22.555.460 |
2023-04-29 | 250.244.759 | 243.080.000 | 7.164.739 |
2023-04-30 | 221.540.994 | 211.120.700 | 10.420.322 |
2023-05-01 | 173.610.301 | 168.043.000 | 5.567.329 |
2023-05-02 | 395.887.444 | 395.230.200 | 657.282 |
2024-02-05 | 2.470.912.630 | NaN | NaN |
We found that apart from the "typical" inconsistencies on the "genesis day" and "today", there are inconsistencies every day between 2023-03-25 and 2023-05-02.
We have plotted a chart of the Total Gasused Diff in the hope of better understanding what was happening.
The only conclusion we could draw from this chart is that our computed amount is always larger than their reported amount (all the points in the plot are above the X-axis).
Taking a look at 2023-03-25
We wanted to make sure that everything was running well on our side. Thus, we manually checked the gas used reported by PolygonScan and compared those numbers to the numbers obtained from our node for the 25th of March 2023 (the first day of inconsistencies). Checking on the website, we found that the blocks with a timestamp on 2023-03-25 are the blocks from numbers 28 to 48 (both inclusive). Double-checking against the timestamps provided by our node, we obtain the same block range for this day.
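This cross-check boils down to summing gasUsed over a block range. A small sketch, where `get_block` stands for any callable returning a block object for a given block number (e.g. a wrapper around `eth_getBlockByNumber`), is:

```python
def total_gas_used(get_block, first: int, last: int) -> int:
    """Sum the gasUsed of blocks first..last (inclusive).

    `get_block` is any callable mapping a block number to a block
    object with a hex-encoded gasUsed field.
    """
    total = 0
    for n in range(first, last + 1):
        block = get_block(n)
        total += int(block["gasUsed"], 16)
    return total
```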
The results of the comparison are the following:
Block number | Data from PolygonScan | Data from our RPC | Equal? |
---|---|---|---|
Block 28 | 21.032 | 21.032 | TRUE |
Block 29 | 90.579 | 90.579 | TRUE |
Block 30 | 1.426.982 | 1.426.982 | TRUE |
Block 31 | 1.427.340 | 1.427.340 | TRUE |
Block 32 | 1.427.052 | 1.427.052 | TRUE |
Block 33 | 119.957 | 119.957 | TRUE |
Block 34 | 90.175 | 90.175 | TRUE |
Block 35 | 1.453.620 | 1.453.620 | TRUE |
Block 36 | 71.269 | 71.269 | TRUE |
Block 37 | 46.923 | 46.923 | TRUE |
Block 38 | 196.597 | 196.597 | TRUE |
Block 39 | 90.557 | 90.557 | TRUE |
Block 40 | 657.706 | 657.706 | TRUE |
Block 41 | 1.427.366 | 1.427.366 | TRUE |
Block 42 | 1.427.868 | 1.427.868 | TRUE |
Block 43 | 1.427.864 | 1.427.846 | FALSE |
Block 44 | 1.427.736 | 1.427.736 | TRUE |
Block 45 | 1.427.460 | 1.427.460 | TRUE |
Block 46 | 1.428.094 | 1.428.094 | TRUE |
Block 47 | 1.427.462 | 1.427.462 | TRUE |
Block 48 | 90.951 | 90.951 | TRUE |
SUM | 17.204.590 | 17.204.572 | FALSE |
We can see here that the sum of the Data from our RPC column checks out with our reported data for this day in the table above. However, their sum (Data from PolygonScan) does not. Checking each row line by line, we found what seems to be a typo on their side: in block 43, the tens and units digits are swapped.
Even so, both sums are really far from the daily total reported on their website, 2.447.147 (more than 14M gas of difference). So, where can this number come from?
Where does this number come from?
We continued to investigate this matter, and we found out that there exists a partial sum of these numbers (gas used from blocks 28 to 48) that adds up to exactly 2.447.147.
Block number | Data from their website |
---|---|
Block 28 | 21.032 |
Block 29 | 90.579 |
Block 30 | 1.426.982 |
Block 31 | 1.427.340 |
Block 32 | 1.427.052 |
Block 33 | 119.957 |
Block 34 | 90.175 |
Block 35 | 1.453.620 |
Block 36 | 71.269 |
Block 37 | 46.923 |
Block 38 | 196.597 |
Block 39 | 90.557 |
Block 40 | 657.706 |
Block 41 | 1.427.366 |
Block 42 | 1.427.868 |
Block 43 | 1.427.864 |
Block 44 | 1.427.736 |
Block 45 | 1.427.460 |
Block 46 | 1.428.094 |
Block 47 | 1.427.462 |
Block 48 | 90.951 |
The gas used reported on the PolygonScan website for blocks 28, 35, 36, 37, 38 and 40 adds up to exactly 2.447.147:
21.032 + 1.453.620 + 71.269 + 46.923 + 196.597 + 657.706 = 2.447.147
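Such a partial sum can be recovered mechanically with a brute-force subset-sum search over the 21 per-block values. A sketch, with the block numbers and gasUsed values transcribed from the table above:

```python
# gasUsed per block, 28..48, as reported on the PolygonScan website.
GAS_USED = {
    28: 21_032, 29: 90_579, 30: 1_426_982, 31: 1_427_340, 32: 1_427_052,
    33: 119_957, 34: 90_175, 35: 1_453_620, 36: 71_269, 37: 46_923,
    38: 196_597, 39: 90_557, 40: 657_706, 41: 1_427_366, 42: 1_427_868,
    43: 1_427_864, 44: 1_427_736, 45: 1_427_460, 46: 1_428_094,
    47: 1_427_462, 48: 90_951,
}
TARGET = 2_447_147  # daily total reported by PolygonScan for 2023-03-25


def subsets_matching(values: dict[int, int], target: int) -> list[frozenset[int]]:
    """Return every subset of block numbers whose gasUsed sums to target.

    All values are positive, so branches whose running sum exceeds the
    target can be pruned early.
    """
    blocks = sorted(values)
    hits = []

    def search(i: int, remaining: int, chosen: tuple[int, ...]) -> None:
        if remaining == 0:
            hits.append(frozenset(chosen))
            return
        if i == len(blocks) or remaining < 0:
            return
        # Include blocks[i], then try without it.
        search(i + 1, remaining - values[blocks[i]], chosen + (blocks[i],))
        search(i + 1, remaining, chosen)

    search(0, target, ())
    return hits
```

Running this confirms that the set {28, 35, 36, 37, 38, 40} is among the combinations whose gasUsed sums to the reported daily total.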
This finding indicates that the reported amount is not likely to be a random number but that they (most probably) had an issue with the synchronisation process during these dates, and they ended up with a partial sum. However, what is concerning is the fact that, almost a year after this incident, the team behind PolygonScan seems to be unaware of these data inconsistencies and that this issue is still not fixed.
Conclusions
This report has explored how to analyse the data from the Polygon zkEVM. We encountered multiple difficulties when deploying the node due to outdated documentation and other issues, which we have reported. Once we managed to obtain data from the zkEVM, we analysed it and compared it with the data provided by PolygonScan. As a result, we found a number of inconsistencies in the data published by PolygonScan. After double-checking every step of our analysis, we identified some concerning points that could explain these inconsistencies.
MigaLabs is open to discussing this issue with the interested parties and providing our code so that anybody can reproduce our methodology and corroborate the correctness of our computations.