
Inconsistencies in PolygonScan zkEVM Published Data


MigaLabs

Mon Mar 11 2024

Introduction

At MigaLabs, as a Blockchain Observatory, we wanted to extend our reach to Layer 2 blockchain scalability solutions, starting with Polygon’s zkEVM. The idea was to build a dashboard showing the status of the zkEVM. In this report, we detail the different approaches we tried, the problems we encountered, how we solved them, how we analysed and processed the data, and the results obtained.

Problems with the setup

Repository setup

We followed Polygon’s guide to set up the zkEVM node. The nominal requirements for the node are:

  • 16GB RAM
  • 4-core CPU
  • 20GB of disk, which will increase with the chain (at the time of our setup the guide said 20 GB; it now says 70 GB).

We set up a machine and installed the node following the instructions. However, the docker-compose setup was not working well: one variable was not being used, and the guide specifies the creation of several folders. We found it easier to create a single folder and override the variable in the docker-compose file, which makes the setup much simpler.

Proposed setup

Run git clone -b fix/docker-deployment git@github.com:migalabs/zkevm-node.git, then cd mainnet && mv example.env .env.

Fill in the .env file following the instructions from the guide and run docker-compose up -d.

First approach

We were aware that the zkEVM node only works with certain Intel processors (they need to support the AVX2 instruction set). We configured a new machine with an Intel CPU, 8GB of RAM, and 50GB of disk, running Ubuntu. We deployed the setup, but the prover would not show any logs. We tried running in debug mode, and we also tried executing the binary with the help option, but no logs were shown.

Second approach

After this first failure, we upgraded the machine with a different CPU and 16GB of RAM. When running the setup, the prover now showed logs. Great, we now had a running node.

After speaking to the internal zkEVM team, we were told the sync process would take around a week or a bit more. Also, the sync process would use more than 500GB of disk, so we had to upgrade the machine again, this time disk-wise.

The zkEVM team kindly provided a snapshot of the database, which could be imported into the zkEVM node database. However, this process required more than 16GB of RAM, so our node could not handle the restore. We also tried running the node from scratch, but the prover service kept dying due to the memory limit.

Third approach

After understanding that the requirements were higher than stated, we started researching. We found these software requirements for the node, which mention 32GB of RAM. Therefore, we upgraded the machine once again. With the upgraded machine, we could import the snapshot and then run all services as normal; our node is now completely synced and functional.

Issues with RPC

Once we had a node synchronising with the network, we had problems interacting with the RPC container. In particular, the eth_syncing method was not working properly, and there were problems enabling debug mode when accessing debug logs on the RPC. In addition, we found that the docker-compose file referenced an outdated container version and should be updated regularly. Finally, the node also occasionally stopped working out of nowhere; the reason was that new versions of the zkevm-node container are periodically released that are not backwards compatible.

Analysis

The analysis of the data is done using Python 3.11.5. The aim of this analysis is to present statistics and metrics for Polygon’s zkEVM network using our own node. In particular, we plan to investigate a number of research questions, such as:

  • How many TPS is this rollup able to handle?
  • What is the latency in this rollup?
  • What is the data availability cost, and how does it compare to the cost once EIP-4844 is deployed?

Among others.
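As an illustration of the first question, TPS over a window of blocks can be estimated directly from the data returned by eth_getBlockByNumber. A minimal sketch (field names follow the standard Ethereum JSON-RPC block object; this is not our full pipeline):

```python
def tps(blocks):
    """Rough TPS estimate over a window of block objects.

    Each block is assumed to be a dict with a hex-encoded
    'timestamp' and a 'transactions' list, as returned by
    eth_getBlockByNumber.
    """
    txs = sum(len(b["transactions"]) for b in blocks)
    span = int(blocks[-1]["timestamp"], 16) - int(blocks[0]["timestamp"], 16)
    return txs / span if span else float("nan")
```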

RPC

For the RPC, we have created an RPC class containing the domain and port where the zkevm-rpc container is exposed, and we implement two basic functionalities:

  • post to make requests.
  • pprint to format the response of the post call.
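A minimal sketch of such a class, using only the standard library (the class and method names mirror the description above; our actual implementation may differ in detail):

```python
import json
import urllib.request


class RPC:
    """Minimal JSON-RPC client pointing at the zkevm-rpc container."""

    def __init__(self, domain, port):
        self.url = f"http://{domain}:{port}"

    def post(self, method, params=None):
        """Send a JSON-RPC request and return the parsed response."""
        payload = json.dumps({
            "jsonrpc": "2.0", "id": 1,
            "method": method, "params": params or [],
        }).encode()
        req = urllib.request.Request(
            self.url, data=payload,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def pprint(self, response):
        """Pretty-print the response of a post() call."""
        print(json.dumps(response, indent=2, sort_keys=True))
```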

RPC Utils

RPC Methods used

We have implemented a number of different methods on the RPC.

We then set up a class that makes the requests using the RPC class specified above.

zkEVM Crawler

The eth_blockNumber and eth_getBlockByNumber methods from the zkEVM's RPC are the most used ones.

3_most_used_methods.png
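A sketch of the crawler loop built on those two methods (hypothetical helper; rpc is any object exposing the post functionality described above):

```python
def crawl(rpc):
    """Walk the chain from block 0 to the current head.

    Uses eth_blockNumber to find the head and eth_getBlockByNumber
    (with full transaction objects) for each block.
    """
    head = int(rpc.post("eth_blockNumber")["result"], 16)
    for number in range(head + 1):
        block = rpc.post(
            "eth_getBlockByNumber", [hex(number), True])["result"]
        yield block
```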

MigaLabsDB Setup

The response from eth_getBlockByNumber is the following:

  • Object - A block object, or null when no block was found:
  • number: QUANTITY - the block number.
  • hash: DATA, 32 Bytes - hash of the block.
  • parentHash: DATA, 32 Bytes - hash of the parent block.
  • nonce: DATA, 8 Bytes - hash of the generated proof-of-work.
  • sha3Uncles: DATA, 32 Bytes - SHA3 of the uncles’ data in the block.
  • logsBloom: DATA, 256 Bytes - the bloom filter for the logs of the block.
  • transactionsRoot: DATA, 32 Bytes - the root of the transaction trie of the block.
  • stateRoot: DATA, 32 Bytes - the root of the final state trie of the block.
  • receiptsRoot: DATA, 32 Bytes - the root of the receipts trie of the block.
  • miner: DATA, 20 Bytes - the beneficiary's address to whom the mining rewards are paid.
  • difficulty: QUANTITY - integer of the difficulty for this block.
  • totalDifficulty: QUANTITY - integer of the total difficulty of the chain until this block.
  • extraData: DATA - the “extra data” field of this block.
  • size: QUANTITY - integer of the size of this block in bytes.
  • gasLimit: QUANTITY - the maximum gas allowed in this block.
  • gasUsed: QUANTITY - the total used gas by all transactions in this block.
  • timestamp: QUANTITY - the Unix timestamp for when the block was collated.
  • transactions: Array - Array of transaction objects, or 32 Bytes transaction hashes depending on the last given parameter.
  • uncles: Array - Array of uncle hashes.

We set up a PostgreSQL schema and table containing these fields:

5_insert_one_block.png
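As a sketch of the insert step, assuming a hypothetical table zkevm.t_blocks with column names of our own choosing (the real schema may differ), the row-building and insert logic could look like this, where conn is an open psycopg2 connection:

```python
# Hypothetical table and column names, for illustration only.
INSERT_BLOCK = """
    INSERT INTO zkevm.t_blocks
        (f_number, f_hash, f_parent_hash, f_timestamp,
         f_size, f_gas_limit, f_gas_used, f_tx_count)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (f_number) DO NOTHING
"""


def block_row(block):
    """Flatten a raw eth_getBlockByNumber result into a DB row,
    casting hex QUANTITY fields to integers."""
    return (
        int(block["number"], 16),
        block["hash"],
        block["parentHash"],
        int(block["timestamp"], 16),
        int(block["size"], 16),
        int(block["gasLimit"], 16),
        int(block["gasUsed"], 16),
        len(block["transactions"]),
    )


def insert_block(conn, block):
    """Insert one block; conn is an open psycopg2 connection."""
    with conn.cursor() as cur:
        cur.execute(INSERT_BLOCK, block_row(block))
    conn.commit()
```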

How do we analyse the data?

We first convert and cast the hexadecimal values to ints.

6_casting.png

We obtain data from PolygonScan to compare against:

7_obtain_csv_from_polygon.png
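A sketch of parsing such an export with the standard library; the column names Date(UTC) and Value are assumptions based on Etherscan-style CSV exports and may need adjusting to the actual file:

```python
import csv
import io


def load_polygonscan_csv(text):
    """Parse a PolygonScan daily-stats CSV export into a
    {date: value} mapping. Column names are assumed, not verified."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Date(UTC)"]: float(row["Value"]) for row in reader}
```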

Integrity checks

The integrity checks that we have implemented cover:

  • Transaction count
  • Average size
  • Average gas limit
  • Sum of gas used

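A sketch of the comparison logic behind these checks, assuming both sides have been aggregated into date-to-value mappings:

```python
def integrity_diffs(computed, reported, tolerance=0.0):
    """Compare two {date: value} mappings and return the dates
    where our computed value and their reported value disagree
    by more than `tolerance`, or where their value is missing."""
    diffs = {}
    for date, ours in computed.items():
        theirs = reported.get(date)
        if theirs is None or abs(ours - theirs) > tolerance:
            diffs[date] = (ours, theirs)
    return diffs
```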
We have found only minor inconsistencies for Transaction count, Average size and Average gas limit, corresponding to missing data on their side for the genesis day (20230324), where they report nothing; the same applies to the current day (20240205) at the time of writing.

Transaction count

| Date | Computed Tx Count | Reported Total Tx |
| --- | --- | --- |
| 20230324 | 27 | NaN |
| 20240205 | 16.684 | NaN |

Average size

| Date | Computed Avg Size | Reported Avg Size |
| --- | --- | --- |
| 20230324 | 2251.0 | NaN |
| 20240205 | 1023.0 | NaN |

Average gas limit

| Date | Computed Avg Gas Limit | Reported Avg Gas Limit |
| --- | --- | --- |
| 20230324 | 28928571.0 | NaN |
| 20230326 | 30000000.0 | 0.0 |
| 20240205 | 30000000.0 | NaN |

In this case, they also reported 0.0 on 20230326.

Problems with gas used

However, when we took a look at the gas used, the table of inconsistencies was considerably larger.

| Date | Computed Total Gas Used | Reported Total Gas Used | Gas Used Diff |
| --- | --- | --- | --- |
| 20230324 | 10.111.343 | NaN | NaN |
| 20230325 | 17.204.572 | 2.447.147 | 14.757.425 |
| 20230326 | 6.024.983 | 0 | 6.024.983 |
| 20230327 | 1.074.793.278 | 1.007.274.000 | 67.519.755 |
| 20230328 | 1.188.061.525 | 891.388.300 | 296.673.183 |
| 20230329 | 670.978.288 | 567.342.400 | 103.635.929 |
| 20230330 | 589.929.774 | 529.647.100 | 60.282.701 |
| 20230331 | 824.299.939 | 728.500.200 | 95.799.697 |
| 20230401 | 1.007.672.482 | 888.389.400 | 119.283.086 |
| 20230402 | 1.021.405.170 | 876.240.600 | 145.164.584 |
| 20230403 | 668.641.654 | 587.603.100 | 81.038.583 |
| 20230404 | 1.124.573.230 | 428.874.200 | 695.699.011 |
| 20230405 | 415.954.728 | 393.816.000 | 22.138.737 |
| 20230406 | 948.971.181 | 912.685.400 | 36.285.736 |
| 20230407 | 884.985.979 | 846.659.100 | 38.326.916 |
| 20230408 | 736.308.219 | 706.109.500 | 30.198.767 |
| 20230409 | 2.550.452.681 | 2.521.383.000 | 29.069.383 |
| 20230410 | 2.558.412.276 | 2.531.997.000 | 26.415.406 |
| 20230411 | 448.479.495 | 426.224.100 | 22.255.361 |
| 20230412 | 379.534.166 | 362.321.500 | 17.212.672 |
| 20230413 | 297.533.767 | 287.313.800 | 10.220.010 |
| 20230414 | 725.095.552 | 712.772.800 | 12.322.749 |
| 20230415 | 955.189.823 | 934.784.200 | 20.405.612 |
| 20230416 | 634.728.865 | 616.205.500 | 18.523.404 |
| 20230417 | 660.773.678 | 643.058.600 | 17.715.046 |
| 20230418 | 397.523.621 | 384.924.600 | 12.598.971 |
| 20230419 | 245.335.831 | 239.039.600 | 6.296.276 |
| 20230420 | 285.603.918 | 275.008.200 | 10.595.748 |
| 20230421 | 411.109.200 | 398.482.500 | 12.626.694 |
| 20230422 | 482.495.847 | 470.511.900 | 11.983.906 |
| 20230423 | 364.666.178 | 355.273.400 | 9.392.791 |
| 20230424 | 455.167.687 | 446.989.800 | 8.177.861 |
| 20230425 | 427.500.681 | 405.402.900 | 22.097.778 |
| 20230426 | 425.815.952 | 405.811.200 | 20.004.794 |
| 20230427 | 439.886.580 | 403.400.300 | 36.486.264 |
| 20230428 | 437.781.601 | 415.226.100 | 22.555.460 |
| 20230429 | 250.244.759 | 243.080.000 | 7.164.739 |
| 20230430 | 221.540.994 | 211.120.700 | 10.420.322 |
| 20230501 | 173.610.301 | 168.043.000 | 5.567.329 |
| 20230502 | 395.887.444 | 395.230.200 | 657.282 |
| 20240205 | 2.470.912.630 | NaN | NaN |

We found that apart from the "typical" inconsistencies on the "genesis day" and "today", there are inconsistencies every day between 20230325 and 20230502.

We plotted a chart of the Total Gas Used Diff in the hope of better understanding what was happening.

8_graphic.png

But the only conclusion we could draw from this chart is that our computed amount is always larger than their reported amount (all the points in the plot are above the X-axis).

Taking a look at 2023-03-25

We wanted to make sure that everything was running well on our side. Thus, we manually checked the gas used reported by PolygonScan and compared those numbers to the numbers obtained from our node for the 25th of March 2023 (the first day of inconsistencies). Checking the website, we found that the blocks with a timestamp on 20230325 are blocks 28 to 48 (both inclusive). Double-checking this data against the timestamps provided by our node, we obtain the same block range for this day.

The data obtained from our node is the following

9_gasused.png

Now, here are the results

| Block number | Data from PolygonScan | Data from our RPC | Equal? |
| --- | --- | --- | --- |
| 28 | 21.032 | 21.032 | TRUE |
| 29 | 90.579 | 90.579 | TRUE |
| 30 | 1.426.982 | 1.426.982 | TRUE |
| 31 | 1.427.340 | 1.427.340 | TRUE |
| 32 | 1.427.052 | 1.427.052 | TRUE |
| 33 | 119.957 | 119.957 | TRUE |
| 34 | 90.175 | 90.175 | TRUE |
| 35 | 1.453.620 | 1.453.620 | TRUE |
| 36 | 71.269 | 71.269 | TRUE |
| 37 | 46.923 | 46.923 | TRUE |
| 38 | 196.597 | 196.597 | TRUE |
| 39 | 90.557 | 90.557 | TRUE |
| 40 | 657.706 | 657.706 | TRUE |
| 41 | 1.427.366 | 1.427.366 | TRUE |
| 42 | 1.427.868 | 1.427.868 | TRUE |
| 43 | 1.427.864 | 1.427.846 | FALSE |
| 44 | 1.427.736 | 1.427.736 | TRUE |
| 45 | 1.427.460 | 1.427.460 | TRUE |
| 46 | 1.428.094 | 1.428.094 | TRUE |
| 47 | 1.427.462 | 1.427.462 | TRUE |
| 48 | 90.951 | 90.951 | TRUE |
| SUM | 17.204.590 | 17.204.572 | |

We can see here that the sum of the Data from our RPC column matches our reported figure for this day in the table above. However, their sum (Data from PolygonScan) does not. Checking each row line by line, we found what seems to be a typo on their side: block 43 has the tens and units digits swapped.

Even so, both sums are very far from the total reported for that day on their website, 2.447.147 (a difference of more than 14M gas). So, where can this number come from?

Where does this number come from?

We continued to investigate this matter and found that there exists a partial sum of these numbers (the gas used from blocks 28 to 48) that adds up to exactly 2.447.147.

| Block number | Data from their website |
| --- | --- |
| 28 | 21.032 |
| 29 | 90.579 |
| 30 | 1.426.982 |
| 31 | 1.427.340 |
| 32 | 1.427.052 |
| 33 | 119.957 |
| 34 | 90.175 |
| 35 | 1.453.620 |
| 36 | 71.269 |
| 37 | 46.923 |
| 38 | 196.597 |
| 39 | 90.557 |
| 40 | 657.706 |
| 41 | 1.427.366 |
| 42 | 1.427.868 |
| 43 | 1.427.864 |
| 44 | 1.427.736 |
| 45 | 1.427.460 |
| 46 | 1.428.094 |
| 47 | 1.427.462 |
| 48 | 90.951 |

The gas used as reported on the PolygonScan website, totalled over blocks 28, 35, 36, 37, 38, and 40, equals exactly 2.447.147:

21.032 + 1.453.620 + 71.269 + 46.923 + 196.597 + 657.706 = 2.447.147
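This partial sum can be found mechanically. A brute-force subset-sum search over the 21 website values (reproduced from the table above) recovers exactly this combination:

```python
from itertools import combinations

# Gas used per block as shown on the PolygonScan website (blocks 28-48).
GAS_USED = {
    28: 21_032, 29: 90_579, 30: 1_426_982, 31: 1_427_340,
    32: 1_427_052, 33: 119_957, 34: 90_175, 35: 1_453_620,
    36: 71_269, 37: 46_923, 38: 196_597, 39: 90_557,
    40: 657_706, 41: 1_427_366, 42: 1_427_868, 43: 1_427_864,
    44: 1_427_736, 45: 1_427_460, 46: 1_428_094, 47: 1_427_462,
    48: 90_951,
}


def subsets_summing_to(values, target):
    """Yield every subset of `values` (a {label: amount} mapping)
    whose amounts total exactly `target`, smallest subsets first."""
    labels = sorted(values)
    for r in range(1, len(labels) + 1):
        for combo in combinations(labels, r):
            if sum(values[label] for label in combo) == target:
                yield combo
```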

This finding indicates that the reported amount is likely not a random number, but that they (most probably) had an issue with their synchronisation process during these dates and ended up with a partial sum. What is concerning, however, is that almost a year after this incident, the team behind PolygonScan still seems to be unaware of these data inconsistencies, and the issue remains unfixed.

Conclusions

This report has explored how to analyse the data from the Polygon zkEVM. We faced multiple difficulties when deploying the node due to outdated documentation and other issues, which we have reported. Once we managed to obtain data from the zkEVM, we analysed it and compared it with the data published by PolygonScan. As a result, we found a number of inconsistencies in the data published by PolygonScan. After double-checking every step of our analysis, we have identified some concerning points that could explain these inconsistencies.

MigaLabs is open to discussing this issue with the interested parties and providing our code so that anybody can reproduce our methodology and corroborate the correctness of our computations.

About MigaLabs

We are a research group specialized in next-generation Blockchain technology. Our team works on in-depth studies and solutions for Blockchain Scalability, Security and Sustainability.
