
Inconsistencies in PolygonScan zkEVM Published Data

MigaLabs

Introduction

At MigaLabs, as a Blockchain Observatory, we wanted to extend our reach to Layer 2 blockchain scalability solutions, starting with Polygon’s zkEVM. The idea was to build a dashboard showing the status of the zkEVM. In this report, we detail the different approaches we tried, the problems we encountered and how we solved them, how we analysed and processed the data, and the results we obtained.

Problems with the setup

Repository setup

We followed Polygon’s guide in order to set up the zkEVM node. The nominal requirements for the node to run are:

  • 16GB RAM
  • 4-core CPU
  • 20GB of disk, which will increase as the chain grows (the guide stated 20 GB at the time of our setup; it now says 70 GB).

We set up a machine and installed the node following the instructions. However, the docker-compose was not working well: one variable was not being used, and the guide specifies the creation of several folders. We found it simpler to create a single folder and to override the variable in the docker-compose file, which makes the setup much easier.

Proposed setup

Run the following commands:

    git clone -b fix/docker-deployment git@github.com:migalabs/zkevm-node.git
    cd mainnet && mv example.env .env

Then fill in the .env file following the instructions from the guide and start the services:

    docker-compose up -d

First approach

We were aware that the zkEVM node only works with certain types of Intel processors (they need to support the AVX2 instruction set). We configured a new machine with an Intel CPU, 8GB of RAM, and 50GB of disk, running Ubuntu. We deployed the setup, but the prover would not show any logs. We tried running in debug mode, and we also tried executing the binary with the help option, but no log was shown.
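Before deploying, it can help to check whether the host CPU advertises AVX2. Below is a minimal sketch of such a check; it assumes a Linux host exposing /proc/cpuinfo and is our own illustrative helper, not part of the zkEVM tooling.

    # check_avx2.py - sketch: report whether the CPU flags include AVX2 (Linux only)
    def cpu_supports_avx2(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return "avx2" in line.split()
        return False

    if __name__ == "__main__":
        print("AVX2 supported:", cpu_supports_avx2())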

Second approach

After our first failed attempt, we upgraded the machine with a different CPU and 16GB of RAM. When running the setup, the prover now showed logs. Great, we had a running node!

After speaking to the internal zkEVM team, we were told the sync process would take around a week or a bit more. Also, the sync process would use more than 500GB of disk, so we had to upgrade the machine again, this time disk-wise.

The zkEVM team kindly provided a snapshot of the database, which could be imported into the zkEVM node database. However, this process requires more than 16GB of RAM, so our node could not handle the restore. We also tried running the node from scratch, but the prover service kept dying because of the memory limit.

Third approach

After understanding that more requirements were needed, we started researching. We found these software requirements for the node, which mention 32GB of RAM. Therefore, we upgraded the machine once again. With the upgraded machine, we could import the snapshot and then run all services as normal; our node is completely synced and functional now.

Issues with RPC

Once we had a node synchronising with the network, we ran into problems interacting with the RPC container. In particular, the eth_syncing method was not working properly, and there were problems enabling debug mode to access debug logs on the RPC. In addition, we found that the docker-compose file should be updated regularly, since it referenced an outdated version. Finally, the node occasionally stopped working for no apparent reason; the cause turned out to be that new versions of the zkevm-node container are released periodically and are not backwards compatible.

Analysis

The analysis of the data is done using Python 3.11.5. The aim of this analysis is to present statistics and metrics of Polygon’s zkEVM Network using our own node. In particular, we plan to investigate and answer a number of research questions, such as the following (among others):

  • How many TPS is this rollup able to handle?
  • What is the latency in this rollup?
  • What is the data availability cost, and how does it compare with what it would cost once EIP-4844 is deployed?

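To make the first two questions concrete, here is a minimal sketch of how TPS and the average block interval can be estimated from per-block data; the (timestamp, transaction count) pairs below are made-up example values.

    # tps_sketch.py - estimate TPS and average block interval from (unix_timestamp, tx_count) pairs
    def estimate_tps(blocks):
        blocks = sorted(blocks)                     # sort by timestamp
        total_tx = sum(tx for _, tx in blocks)
        elapsed = blocks[-1][0] - blocks[0][0]      # seconds covered by the sample
        return total_tx / elapsed, elapsed / (len(blocks) - 1)

    # example with made-up values: 30 transactions over 6 seconds -> (5.0 TPS, 3.0 s per block)
    print(estimate_tps([(1_700_000_000, 10), (1_700_000_003, 12), (1_700_000_006, 8)]))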

RPC

For the RPC, we have created an RPC class containing the domain and port where the zkevm-rpc container is exposed, and we implement two basic functionalities (a minimal sketch follows this list):

  • post to make requests.
  • pprint to format the response of the post call.
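Here is a minimal sketch of what such a class can look like, assuming the JSON-RPC interface exposed by the zkevm-rpc container; the class name, default port and helper names are illustrative, not our exact implementation.

    # rpc.py - sketch of a thin JSON-RPC client for the zkevm-rpc container
    import json
    import requests

    class RPC:
        def __init__(self, domain: str = "localhost", port: int = 8545):
            self.url = f"http://{domain}:{port}"

        def post(self, method: str, params: list | None = None) -> dict:
            """Send a JSON-RPC request and return the parsed response."""
            payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
            return requests.post(self.url, json=payload, timeout=30).json()

        def pprint(self, response: dict) -> None:
            """Pretty-print the response of a post call."""
            print(json.dumps(response, indent=2))

    # usage: rpc = RPC(); rpc.pprint(rpc.post("eth_blockNumber"))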

RPC Utils

RPC Methods used

We have implemented a number of the zkEVM RPC's methods in a wrapper class that makes its requests through the RPC client described above.
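A sketch of such a wrapper, reusing the RPC class sketched earlier and limited to the two methods we use the most (the names are illustrative):

    # methods.py - sketch of a wrapper exposing the JSON-RPC methods used by the crawler
    class ZkEVMMethods:
        def __init__(self, rpc):                     # rpc: the RPC client sketched above
            self.rpc = rpc

        def block_number(self) -> int:
            """eth_blockNumber returns a hex string; cast it to an integer."""
            return int(self.rpc.post("eth_blockNumber")["result"], 16)

        def get_block_by_number(self, number: int, full_txs: bool = False) -> dict:
            """eth_getBlockByNumber takes a hex block number and a flag for full transactions."""
            return self.rpc.post("eth_getBlockByNumber", [hex(number), full_txs])["result"]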

zkEVM Crawler

The eth_blockNumber and eth_getBlockByNumber methods from the zkEVM's RPC are the most used ones.

[Figure: the most used RPC methods]
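A hedged sketch of the crawling loop itself, using the wrapper above and leaving persistence to the next section:

    # crawler.py - sketch: walk every block from a starting height up to the current head
    def crawl(methods, start: int = 0):
        head = methods.block_number()
        for height in range(start, head + 1):
            yield methods.get_block_by_number(height)   # the real crawler persists each block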

MigaLabsDB Setup

The response from eth_getBlockByNumber contains the following fields:

  • Object - A block object, or null when no block was found:
  • number: QUANTITY - the block number.
  • hash: DATA, 32 Bytes - hash of the block.
  • parentHash: DATA, 32 Bytes - hash of the parent block.
  • nonce: DATA, 8 Bytes - hash of the generated proof-of-work.
  • sha3Uncles: DATA, 32 Bytes - SHA3 of the uncles’ data in the block.
  • logsBloom: DATA, 256 Bytes - the bloom filter for the logs of the block.
  • transactionsRoot: DATA, 32 Bytes - the root of the transaction trie of the block.
  • stateRoot: DATA, 32 Bytes - the root of the final state trie of the block.
  • receiptsRoot: DATA, 32 Bytes - the root of the receipts trie of the block.
  • miner: DATA, 20 Bytes - the address of the beneficiary to whom the mining rewards are paid.
  • difficulty: QUANTITY - integer of the difficulty for this block.
  • totalDifficulty: QUANTITY - integer of the total difficulty of the chain until this block.
  • extraData: DATA - the “extra data” field of this block.
  • size: QUANTITY - integer of the size of this block in bytes.
  • gasLimit: QUANTITY - the maximum gas allowed in this block.
  • gasUsed: QUANTITY - the total used gas by all transactions in this block.
  • timestamp: QUANTITY - the Unix timestamp for when the block was collated.
  • transactions: Array - Array of transaction objects, or 32 Bytes transaction hashes depending on the last given parameter.
  • uncles: Array - Array of uncle hashes.

We therefore set up a PostgreSQL schema and table containing these fields; in particular:

[Figure: inserting one block into the database]
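A simplified sketch of that table and of inserting one block with psycopg2; the table and column names are illustrative and the column set is trimmed to the fields used later in the analysis, so this is not our exact schema.

    # db.py - sketch: a trimmed-down blocks table and the insert of one block
    import psycopg2

    CREATE_TABLE = """
    CREATE TABLE IF NOT EXISTS t_blocks (
        f_number    BIGINT PRIMARY KEY,
        f_hash      TEXT,
        f_timestamp BIGINT,
        f_size      BIGINT,
        f_gas_limit BIGINT,
        f_gas_used  BIGINT,
        f_tx_count  INTEGER
    );
    """

    INSERT_BLOCK = """
    INSERT INTO t_blocks (f_number, f_hash, f_timestamp, f_size, f_gas_limit, f_gas_used, f_tx_count)
    VALUES (%s, %s, %s, %s, %s, %s, %s) ON CONFLICT (f_number) DO NOTHING;
    """

    def connect(dsn: str = "dbname=zkevm"):
        """Open a connection to the metrics database (the DSN is an assumption)."""
        return psycopg2.connect(dsn)

    def create_tables(conn) -> None:
        with conn.cursor() as cur:
            cur.execute(CREATE_TABLE)
        conn.commit()

    def insert_block(conn, block: dict) -> None:
        # hex-encoded QUANTITY fields are cast to integers before storage (see next section)
        row = (
            int(block["number"], 16), block["hash"], int(block["timestamp"], 16),
            int(block["size"], 16), int(block["gasLimit"], 16), int(block["gasUsed"], 16),
            len(block["transactions"]),
        )
        with conn.cursor() as cur:
            cur.execute(INSERT_BLOCK, row)
        conn.commit()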

How do we analyse the data?

We first cast the hexadecimal values returned by the RPC to integers.

[Figure: casting the hexadecimal values to integers]
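For reference, a minimal sketch of that casting step (the field names follow the block object described above):

    # casting sketch: hex-encoded QUANTITY fields become plain integers
    HEX_FIELDS = ["number", "timestamp", "size", "gasLimit", "gasUsed"]

    def cast_block(block: dict) -> dict:
        out = dict(block)
        for field in HEX_FIELDS:
            out[field] = int(block[field], 16)   # e.g. "0x1c9c380" -> 30000000 (the gas limit)
        return out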

We then obtain data from PolygonScan to compare against.

[Figure: obtaining the CSV export from PolygonScan]
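PolygonScan's chart pages offer daily CSV exports; below is a sketch of loading one with pandas. The file name and the Date(UTC) column name are assumptions about the export format.

    # load a daily-metrics CSV exported from PolygonScan (file and column names are assumptions)
    import pandas as pd

    def load_polygonscan_csv(path: str = "export-GasUsed.csv") -> pd.DataFrame:
        df = pd.read_csv(path)
        df["Date(UTC)"] = pd.to_datetime(df["Date(UTC)"])   # one row per day
        return df.set_index("Date(UTC)").sort_index()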

Integrity checks

The integrity checks that we have implemented cover:

  • Transaction count
  • Average size
  • Average gas limit
  • Sum of gas used

We have found only minor inconsistencies for the transaction count, average size and average gas limit, corresponding to missing data on their side for the genesis day (2023-03-24), for which they report nothing; the same applies to the current day (2024-02-05) at the time of this writing.
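A sketch of how such a daily comparison can be built from our blocks table and the PolygonScan export (the SQL and the column names match the illustrative sketches above, not our exact code):

    # integrity-check sketch: daily aggregates from our DB versus PolygonScan's reported values
    import pandas as pd

    DAILY_QUERY = """
        SELECT to_timestamp(f_timestamp)::date AS day,
               sum(f_tx_count)  AS tx_count,
               avg(f_size)      AS avg_size,
               avg(f_gas_limit) AS avg_gas_limit,
               sum(f_gas_used)  AS total_gas_used
        FROM t_blocks GROUP BY day ORDER BY day;
    """

    def daily_aggregates(conn) -> pd.DataFrame:
        return pd.read_sql(DAILY_QUERY, conn).set_index("day")

    def compare(computed: pd.Series, reported: pd.Series) -> pd.DataFrame:
        """Align two daily series by date and keep the days where they disagree."""
        joined = pd.concat({"computed": computed, "reported": reported}, axis=1)
        joined["diff"] = joined["computed"] - joined["reported"]
        return joined[joined["diff"].ne(0)]   # days missing on either side also show up here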

Transaction count

Date Computed Tx Count Reported Tx Count
2023-03-24 27 NaN
2024-02-05 16.684 NaN

Average size

Date Computed Avg Size Reported Avg Size
2023-03-24 2251.0 NaN
2024-02-05 1023.0 NaN

Average gas limit

Date Computed Avg Gas Limit Reported Avg Gas Limit
2023-03-24 28928571.0 NaN
2023-03-26 30000000.0 0.0
2024-02-05 30000000.0 NaN

In this case, they have also reported 0.0 on 2023-03-26.

Problems with gas used

However, when we took a look at the gas used, the table of inconsistencies was considerably larger.

Date Computed Total Gas Used Reported Total Gas Used Difference
2023-03-24 10.111.343 NaN NaN
2023-03-25 17.204.572 2.447.147 14.757.425
2023-03-26 6.024.983 0 6.024.983
2023-03-27 1.074.793.278 1.007.274.000 67.519.755
2023-03-28 1.188.061.525 891.388.300 296.673.183
2023-03-29 670.978.288 567.342.400 103.635.929
2023-03-30 589.929.774 529.647.100 60.282.701
2023-03-31 824.299.939 728.500.200 95.799.697
2023-04-01 1.007.672.482 888.389.400 119.283.086
2023-04-02 1.021.405.170 876.240.600 145.164.584
2023-04-03 668.641.654 587.603.100 81.038.583
2023-04-04 1.124.573.230 428.874.200 695.699.011
2023-04-05 415.954.728 393.816.000 22.138.737
2023-04-06 948.971.181 912.685.400 36.285.736
2023-04-07 884.985.979 846.659.100 38.326.916
2023-04-08 736.308.219 706.109.500 30.198.767
2023-04-09 2.550.452.681 2.521.383.000 29.069.383
2023-04-10 2.558.412.276 2.531.997.000 26.415.406
2023-04-11 448.479.495 426.224.100 22.255.361
2023-04-12 379.534.166 362.321.500 17.212.672
2023-04-13 297.533.767 287.313.800 10.220.010
2023-04-14 725.095.552 712.772.800 12.322.749
2023-04-15 955.189.823 934.784.200 20.405.612
2023-04-16 634.728.865 616.205.500 18.523.404
2023-04-17 660.773.678 643.058.600 17.715.046
2023-04-18 397.523.621 384.924.600 12.598.971
2023-04-19 245.335.831 239.039.600 6.296.276
2023-04-20 285.603.918 275.008.200 10.595.748
2023-04-21 411.109.200 398.482.500 12.626.694
2023-04-22 482.495.847 470.511.900 11.983.906
2023-04-23 364.666.178 355.273.400 9.392.791
2023-04-24 455.167.687 446.989.800 8.177.861
2023-04-25 427.500.681 405.402.900 22.097.778
2023-04-26 425.815.952 405.811.200 20.004.794
2023-04-27 439.886.580 403.400.300 36.486.264
2023-04-28 437.781.601 415.226.100 22.555.460
2023-04-29 250.244.759 243.080.000 7.164.739
2023-04-30 221.540.994 211.120.700 10.420.322
2023-05-01 173.610.301 168.043.000 5.567.329
2023-05-02 395.887.444 395.230.200 657.282
2024-02-05 2.470.912.630 NaN NaN

We found that apart from the "typical" inconsistencies on the "genesis day" and "today", there are inconsistencies every day between 2023-03-25 and 2023-05-02.

We plotted the daily difference in total gas used in the hope of better understanding what was happening.

[Figure: daily difference in total gas used]
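For completeness, a sketch of how such a chart can be produced from the comparison frame above (matplotlib; the column names follow the compare() sketch, not our exact code):

    # plot sketch: daily difference between our computed total gas used and PolygonScan's
    import matplotlib.pyplot as plt

    def plot_gas_used_diff(diff_df) -> None:
        fig, ax = plt.subplots(figsize=(10, 4))
        ax.scatter(diff_df.index, diff_df["diff"])
        ax.axhline(0, linewidth=0.8)          # points above this line: we compute more gas used
        ax.set_xlabel("Date")
        ax.set_ylabel("Total gas used difference")
        fig.autofmt_xdate()
        fig.savefig("gas_used_diff.png", bbox_inches="tight")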

But the only conclusion we could draw from this chart is that our computed amount is always larger than their reported amount (all the points in the plot are above the X-axis).

Taking a look at 2023-03-25

We wanted to make sure that everything was running well on our side. Thus, we manually checked the gas used reported by PolygonScan and compared those numbers with the numbers obtained from our node for 25 March 2023 (the first day of inconsistencies). Checking from the website, we found that the blocks with a timestamp on 2023-03-25 are blocks 28 to 48 (both inclusive). Double-checking this against the timestamps provided by our node, we obtain the same block range for this day.

The data obtained from our node is the following:

[Figure: gas used per block, as reported by our node]
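A sketch of how those per-block numbers can be pulled and summed from our node, reusing the wrapper sketched earlier (the block range comes from the timestamp check above):

    # gas used for blocks 28-48 (2023-03-25), fetched directly from our node's RPC
    def gas_used_for_range(methods, first: int = 28, last: int = 48) -> dict:
        per_block = {}
        for height in range(first, last + 1):
            block = methods.get_block_by_number(height)
            per_block[height] = int(block["gasUsed"], 16)
        return per_block

    # sum(gas_used_for_range(methods).values()) reproduces the 17.204.572 total in the table below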

Now, here are the results:

Block number Data from PolygonScan Data from our RPC Equal?
Block 28 21.032 21.032 TRUE
Block 29 90.579 90.579 TRUE
Block 30 1.426.982 1.426.982 TRUE
Block 31 1.427.340 1.427.340 TRUE
Block 32 1.427.052 1.427.052 TRUE
Block 33 119.957 119.957 TRUE
Block 34 90.175 90.175 TRUE
Block 35 1.453.620 1.453.620 TRUE
Block 36 71.269 71.269 TRUE
Block 37 46.923 46.923 TRUE
Block 38 196.597 196.597 TRUE
Block 39 90.557 90.557 TRUE
Block 40 657.706 657.706 TRUE
Block 41 1.427.366 1.427.366 TRUE
Block 42 1.427.868 1.427.868 TRUE
Block 43 1.427.864 1.427.846 FALSE
Block 44 1.427.736 1.427.736 TRUE
Block 45 1.427.460 1.427.460 TRUE
Block 46 1.428.094 1.428.094 TRUE
Block 47 1.427.462 1.427.462 TRUE
Block 48 90.951 90.951 TRUE
SUM 17.204.590 17.204.572

We can see here that the sum of the Data from our RPC column matches the value we reported for this day in the table above. However, their sum (Data from PolygonScan) does not. Checking each row line by line, we found what seems to be a typo on their side: in block 43, the tens and units digits are swapped (1.427.864 on PolygonScan versus 1.427.846 from our RPC).

Even so, the sum of PolygonScan's own per-block values (17.204.590) is still very far from the total reported for that day on their website, 2.447.147 (a difference of more than 14M gas). So, where can this number come from?

Where does this number come from?

We continued to investigate this matter, and we found that there exists a partial sum of these numbers (the gas used of blocks 28 to 48) that adds up exactly to 2.447.147.

Block number Data from their website
Block 28 21.032
Block 29 90.579
Block 30 1.426.982
Block 31 1.427.340
Block 32 1.427.052
Block 33 119.957
Block 34 90.175
Block 35 1.453.620
Block 36 71.269
Block 37 46.923
Block 38 196.597
Block 39 90.557
Block 40 657.706
Block 41 1.427.366
Block 42 1.427.868
Block 43 1.427.864
Block 44 1.427.736
Block 45 1.427.460
Block 46 1.428.094
Block 47 1.427.462
Block 48 90.951

Summing the gas used reported on the PolygonScan website for blocks 28, 35, 36, 37, 38 and 40 gives exactly 2.447.147:

21.032 + 1.453.620 + 71.269 + 46.923 + 196.597 + 657.706 = 2.447.147
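Such a subset can be found mechanically; one way is a brute-force search over the 21 per-block values. Below is a sketch of that search, with the values copied from the PolygonScan column of the table above.

    # brute-force search: which subsets of the per-block gas-used values sum to 2.447.147?
    from itertools import combinations

    GAS_USED = {   # block number -> gas used, as shown on the PolygonScan website
        28: 21_032, 29: 90_579, 30: 1_426_982, 31: 1_427_340, 32: 1_427_052,
        33: 119_957, 34: 90_175, 35: 1_453_620, 36: 71_269, 37: 46_923,
        38: 196_597, 39: 90_557, 40: 657_706, 41: 1_427_366, 42: 1_427_868,
        43: 1_427_864, 44: 1_427_736, 45: 1_427_460, 46: 1_428_094,
        47: 1_427_462, 48: 90_951,
    }
    TARGET = 2_447_147

    for size in range(1, len(GAS_USED) + 1):
        for subset in combinations(GAS_USED, size):
            if sum(GAS_USED[b] for b in subset) == TARGET:
                print(subset)   # prints (28, 35, 36, 37, 38, 40), plus any other matching subsets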

This finding indicates that the reported amount is not likely to be a random number; they most probably had an issue with the synchronisation process during these dates and ended up with a partial sum. What is concerning, however, is that almost a year after this incident, the team behind PolygonScan seems to be unaware of these data inconsistencies, and the issue is still not fixed.

Conclusions

This report has explored how to analyse the data from the Polygon zkEVM. We found multiple difficulties when deploying the node, due to outdated documentation and other issues that we have reported. Once we managed to obtain data from the zkEVM, we analysed it and compared it with the data published by PolygonScan. As a result, we found a number of inconsistencies in the data published by PolygonScan. After analysing and double-checking every step of our work, we have found some concerning points that could explain these inconsistencies.

MigaLabs is open to discussing this issue with the interested parties and providing our code so that anybody can reproduce our methodology and corroborate the correctness of our computations.


About MigaLabs

We are a research group specialized in next-generation Blockchain technology. Our team works on in-depth studies and solutions for Blockchain Scalability, Security and Sustainability.