Introduction
At MigaLabs, as a Blockchain Observatory, we wanted to extend our reach to Layer 2 blockchain scalability solutions, starting with Polygon’s zkEVM. The idea was to build a dashboard showing the status of the zkEVM. In this report, we detail the different approaches we tried, the problems we encountered, how we solved them, how we analysed and processed the data, and the results we obtained.
Problems with the setup
Repository setup
We followed Polygon’s guide in order to set up the zkEVM node. The nominal requirements for the node to run are:
- 16GB RAM
- 4-core CPU
- 20GB of disk (will increase as the chain grows; at the time of setup the guide said 20 GB, now it says 70 GB)
We set up a machine and installed the node following the instructions. However, the docker-compose was not working well: one variable was not being used, and the guide specifies the creation of several folders. We found it easier to create a single folder and override the variable in the docker-compose file, which simplifies the setup considerably.
Proposed setup
Run:

```shell
git clone -b fix/docker-deployment git@github.com:migalabs/zkevm-node.git
cd mainnet && mv example.env .env
```

Fill the `.env` file following the instructions from the guide, and run:

```shell
docker-compose up -d
```
First approach
We were aware that the zkEVM node only works with certain Intel processors (they need to support the AVX2 instruction set). We configured a new machine with an Intel CPU, 8GB of RAM, and 50GB of disk, running Ubuntu. We deployed the setup, but the prover would not show any logs. We tried running in debug mode, and we also tried executing the binary with the help option, but no log was shown.
Second approach
After our first failure, we upgraded the machine with a different CPU and 16GB of RAM. When running the setup, the prover now shows logs! Great, we now have a running node.
After speaking to the internal zkEVM team, we were told the sync process would take around a week or a bit more. Also, the sync process would use more than 500GB of disk, so we had to upgrade the machine again, this time disk-wise.
The zkEVM team kindly provided a snapshot of the database, which could be imported into the zkEVM node database. However, this process requires more than 16GB of RAM, so our node could not handle the restore. We also tried running the node from scratch, but the prover service kept dying due to the memory limit.
Third approach
After understanding that more resources were needed, we started researching. We found these software requirements for the node, which mention 32GB of RAM. Therefore, we upgraded the machine once again. With the upgraded machine, we could import the snapshot and then run all services as normal; our node is now completely synced and functional.
Issues with RPC
Once we had a node synchronising with the network, we had problems interacting with the RPC container. In particular, the `eth_syncing` method was not working properly, and there were problems enabling debug mode when accessing debug logs on the RPC. In addition, we found that the docker-compose should be updated regularly, since it shipped with an outdated version. Finally, we also found that the node occasionally stopped working out of nowhere. The reason turned out to be that new versions of the zkevm-node container are periodically released that are not backwards compatible.
Analysis
The analysis of the data is done using Python 3.11.5. The aim of this analysis is to present statistics and metrics of Polygon’s zkEVM Network using our own node. In particular, we plan to investigate and answer a number of research questions that could be raised from this, such as:
- How many TPS is this rollup able to handle?
- What is the latency in this rollup?
- What is the data availability cost, and how does it compare to what it would cost after EIP4844 is deployed?

Among others.
RPC
For the RPC, we have created an RPC class containing the domain and port where the `zkevm-rpc` container is exposed, and we implement two basic functionalities:

- `post`, to make requests.
- `pprint`, to format the response of the post call.
RPC Methods used
We have implemented a number of different methods on top of the RPC class described above. The `eth_blockNumber` and `eth_getBlockByNumber` methods from the zkEVM's RPC are the most used ones.
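For reference, the JSON-RPC payloads for these two calls can be built as follows (a sketch; the helper names are ours, not part of any API):

```python
def block_number_payload() -> dict:
    """JSON-RPC payload for eth_blockNumber (current block height)."""
    return {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1}


def get_block_payload(number: int, full_txs: bool = False) -> dict:
    """JSON-RPC payload for eth_getBlockByNumber.

    The block number is hex-encoded; the second parameter selects full
    transaction objects (True) or only transaction hashes (False).
    """
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        "params": [hex(number), full_txs],
        "id": 1,
    }
```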
MigaLabsDB Setup
Since the response from `eth_getBlockByNumber` is the following:
- Object - A block object, or null when no block was found:
- number: QUANTITY - the block number.
- hash: DATA, 32 Bytes - hash of the block.
- parentHash: DATA, 32 Bytes - hash of the parent block.
- nonce: DATA, 8 Bytes - hash of the generated proof-of-work.
- sha3Uncles: DATA, 32 Bytes - SHA3 of the uncles’ data in the block.
- logsBloom: DATA, 256 Bytes - the bloom filter for the logs of the block.
- transactionsRoot: DATA, 32 Bytes - the root of the transaction trie of the block.
- stateRoot: DATA, 32 Bytes - the root of the final state trie of the block.
- receiptsRoot: DATA, 32 Bytes - the root of the receipts trie of the block.
- miner: DATA,20 Bytes - the beneficiary's address to whom the mining rewards are paid.
- difficulty: QUANTITY - integer of the difficulty for this block.
- totalDifficulty: QUANTITY - integer of the total difficulty of the chain until this block.
- extraData: DATA - the “extra data” field of this block.
- size: QUANTITY - integer the size of this block in bytes.
- gasLimit: QUANTITY - the maximum gas allowed in this block.
- gasUsed: QUANTITY - the total used gas by all transactions in this block.
- timestamp: QUANTITY - the Unix timestamp for when the block was collated.
- transactions: Array - Array of transaction objects, or 32 Bytes transaction hashes depending on the last given parameter.
- uncles: Array - Array of uncle hashes.
We set up a PostgreSQL schema and table containing these fields.
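An illustrative way to generate the DDL for such a table is shown below. The schema, table, and column names here are assumptions for the sketch, not necessarily the ones used in our production database:

```python
# Illustrative column mapping from block fields to PostgreSQL types.
# Hex QUANTITY fields are stored as integers after decoding.
BLOCK_COLUMNS = {
    "f_number": "BIGINT PRIMARY KEY",
    "f_hash": "TEXT",
    "f_parent_hash": "TEXT",
    "f_size": "BIGINT",
    "f_gas_limit": "BIGINT",
    "f_gas_used": "BIGINT",
    "f_timestamp": "BIGINT",
    "f_tx_count": "INT",
}


def build_create_table(schema: str = "zkevm", table: str = "t_blocks") -> str:
    """Build a CREATE TABLE statement from the column mapping."""
    cols = ",\n    ".join(
        f"{name} {ctype}" for name, ctype in BLOCK_COLUMNS.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n    {cols}\n);"
```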
How do we analyse the data?
We first convert and cast the hexadecimal values to ints. We then obtain data from PolygonScan to compare against.
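The hex-to-int conversion step can be sketched as follows (the list of fields to decode is illustrative):

```python
# QUANTITY fields of a block object that arrive hex-encoded ("0x...").
HEX_FIELDS = ("number", "size", "gasLimit", "gasUsed", "timestamp")


def decode_block(raw: dict) -> dict:
    """Cast the hex-encoded QUANTITY fields of a block object to ints,
    leaving DATA fields (hashes, addresses) untouched."""
    block = dict(raw)
    for field in HEX_FIELDS:
        if isinstance(block.get(field), str):
            block[field] = int(block[field], 16)
    return block
```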
Integrity checks
The integrity checks that we have implemented are over:

- Transactions count
- Average `size`
- Average `gaslimit`
- Sum of `gasused`
We have found only minor inconsistencies for transactions count, average size and average gas limit, corresponding to missing data from their side for the genesis day (2023-03-24), where they report nothing, and likewise for the current day (2024-02-05) at the time of this writing.
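The per-day aggregation behind these checks can be sketched without external dependencies. The field names below follow the illustrative decoded-block layout, not necessarily our exact database columns:

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_stats(blocks: list[dict]) -> dict[str, dict]:
    """Aggregate per-day tx count, average size, average gas limit,
    and total gas used from decoded block rows."""
    buckets = defaultdict(list)
    for b in blocks:
        day = datetime.fromtimestamp(
            b["timestamp"], tz=timezone.utc
        ).date().isoformat()
        buckets[day].append(b)
    stats = {}
    for day, rows in buckets.items():
        stats[day] = {
            "tx_count": sum(r["tx_count"] for r in rows),
            "avg_size": sum(r["size"] for r in rows) / len(rows),
            "avg_gas_limit": sum(r["gas_limit"] for r in rows) / len(rows),
            "total_gas_used": sum(r["gas_used"] for r in rows),
        }
    return stats
```

Each daily figure is then compared against the corresponding value reported by PolygonScan.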
Transaction count
Date | Computed Count Transactions | Reported Total Tx |
---|---|---|
2023-03-24 | 27 | NaN |
2024-02-05 | 16.684 | NaN |
Average size
Date | Computed Avg Size | Reported Avg Size |
---|---|---|
2023-03-24 | 2251.0 | NaN |
2024-02-05 | 1023.0 | NaN |
Average gas limit
Date | Computed Avg Gaslimit | Reported Avg Gaslimit |
---|---|---|
2023-03-24 | 28928571.0 | NaN |
2023-03-26 | 30000000.0 | 0.0 |
2024-02-05 | 30000000.0 | NaN |
In this case, they also reported `0.0` on 2023-03-26.
Problems with gasused
However, when we took a look at the gas used, the table of inconsistencies was considerably larger.
Date | Computed Total Gasused | Reported Total Gasused | Total Gasused Diff |
---|---|---|---|
2023-03-24 | 10.111.343 | NaN | NaN |
2023-03-25 | 17.204.572 | 2.447.147 | 14.757.425 |
2023-03-26 | 6.024.983 | 0 | 6.024.983 |
2023-03-27 | 1.074.793.278 | 1.007.274.000 | 67.519.755 |
2023-03-28 | 1.188.061.525 | 891.388.300 | 296.673.183 |
2023-03-29 | 670.978.288 | 567.342.400 | 103.635.929 |
2023-03-30 | 589.929.774 | 529.647.100 | 60.282.701 |
2023-03-31 | 824.299.939 | 728.500.200 | 95.799.697 |
2023-04-01 | 1.007.672.482 | 888.389.400 | 119.283.086 |
2023-04-02 | 1.021.405.170 | 876.240.600 | 145.164.584 |
2023-04-03 | 668.641.654 | 587.603.100 | 81.038.583 |
2023-04-04 | 1.124.573.230 | 428.874.200 | 695.699.011 |
2023-04-05 | 415.954.728 | 393.816.000 | 22.138.737 |
2023-04-06 | 948.971.181 | 912.685.400 | 36.285.736 |
2023-04-07 | 884.985.979 | 846.659.100 | 38.326.916 |
2023-04-08 | 736.308.219 | 706.109.500 | 30.198.767 |
2023-04-09 | 2.550.452.681 | 2.521.383.000 | 29.069.383 |
2023-04-10 | 2.558.412.276 | 2.531.997.000 | 26.415.406 |
2023-04-11 | 448.479.495 | 426.224.100 | 22.255.361 |
2023-04-12 | 379.534.166 | 362.321.500 | 17.212.672 |
2023-04-13 | 297.533.767 | 287.313.800 | 10.220.010 |
2023-04-14 | 725.095.552 | 712.772.800 | 12.322.749 |
2023-04-15 | 955.189.823 | 934.784.200 | 20.405.612 |
2023-04-16 | 634.728.865 | 616.205.500 | 18.523.404 |
2023-04-17 | 660.773.678 | 643.058.600 | 17.715.046 |
2023-04-18 | 397.523.621 | 384.924.600 | 12.598.971 |
2023-04-19 | 245.335.831 | 239.039.600 | 6.296.276 |
2023-04-20 | 285.603.918 | 275.008.200 | 10.595.748 |
2023-04-21 | 411.109.200 | 398.482.500 | 12.626.694 |
2023-04-22 | 482.495.847 | 470.511.900 | 11.983.906 |
2023-04-23 | 364.666.178 | 355.273.400 | 9.392.791 |
2023-04-24 | 455.167.687 | 446.989.800 | 8.177.861 |
2023-04-25 | 427.500.681 | 405.402.900 | 22.097.778 |
2023-04-26 | 425.815.952 | 405.811.200 | 20.004.794 |
2023-04-27 | 439.886.580 | 403.400.300 | 36.486.264 |
2023-04-28 | 437.781.601 | 415.226.100 | 22.555.460 |
2023-04-29 | 250.244.759 | 243.080.000 | 7.164.739 |
2023-04-30 | 221.540.994 | 211.120.700 | 10.420.322 |
2023-05-01 | 173.610.301 | 168.043.000 | 5.567.329 |
2023-05-02 | 395.887.444 | 395.230.200 | 657.282 |
2024-02-05 | 2.470.912.630 | NaN | NaN |
We found that apart from the "typical" inconsistencies on the "genesis day" and "today", there are inconsistencies every day between 2023-03-25 and 2023-05-02.
We have plotted a chart of the Total Gasused Diff in the hope of better understanding what was happening.
The only conclusion we could draw from this chart is that our computed amount is always larger than their reported amount (all the points in the plot are above the X-axis).
Taking a look at 2023-03-25
We wanted to make sure that everything was running well on our side. Thus, we manually checked the gas used reported by PolygonScan and compared those numbers to the numbers obtained from our node for the 25th of March 2023 (the first day of inconsistencies). Checking on the website, we found that the blocks with a timestamp on 2023-03-25 are the blocks from numbers 28 to 48 (both inclusive). Double-checking against the timestamps provided by our node, we obtain the same block range for this day.
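This cross-check boils down to summing gasUsed over a block range. A small sketch, where `get_block` stands for any callable returning a block object for a given block number (e.g. a wrapper around `eth_getBlockByNumber`), is:

```python
def total_gas_used(get_block, first: int, last: int) -> int:
    """Sum the gasUsed of blocks first..last (inclusive).

    `get_block` is any callable mapping a block number to a block
    object with a hex-encoded gasUsed field.
    """
    total = 0
    for n in range(first, last + 1):
        block = get_block(n)
        total += int(block["gasUsed"], 16)
    return total
```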
The results of the comparison are the following:
Block number | Data from PolygonScan | Data from our RPC | Equal? |
---|---|---|---|
Block 28 | 21.032 | 21.032 | TRUE |
Block 29 | 90.579 | 90.579 | TRUE |
Block 30 | 1.426.982 | 1.426.982 | TRUE |
Block 31 | 1.427.340 | 1.427.340 | TRUE |
Block 32 | 1.427.052 | 1.427.052 | TRUE |
Block 33 | 119.957 | 119.957 | TRUE |
Block 34 | 90.175 | 90.175 | TRUE |
Block 35 | 1.453.620 | 1.453.620 | TRUE |
Block 36 | 71.269 | 71.269 | TRUE |
Block 37 | 46.923 | 46.923 | TRUE |
Block 38 | 196.597 | 196.597 | TRUE |
Block 39 | 90.557 | 90.557 | TRUE |
Block 40 | 657.706 | 657.706 | TRUE |
Block 41 | 1.427.366 | 1.427.366 | TRUE |
Block 42 | 1.427.868 | 1.427.868 | TRUE |
Block 43 | 1.427.864 | 1.427.846 | FALSE |
Block 44 | 1.427.736 | 1.427.736 | TRUE |
Block 45 | 1.427.460 | 1.427.460 | TRUE |
Block 46 | 1.428.094 | 1.428.094 | TRUE |
Block 47 | 1.427.462 | 1.427.462 | TRUE |
Block 48 | 90.951 | 90.951 | TRUE |
SUM | 17.204.590 | 17.204.572 | FALSE |
We can see here that the sum of the Data from our RPC column checks out with our reported data for this day in the table above. However, their sum (Data from PolygonScan) does not. Checking each row line by line, we found what seems to be a typo on their side: in block 43, the tens and units digits are swapped.
Even so, both sums are really far from the daily total reported on their website, 2.447.147 (more than 14M gas of difference). So, where can this number come from?
Where does this number come from?
We continued to investigate this matter, and we found out that there exists a partial sum of these numbers (gas used from blocks 28 to 48) that adds up to exactly 2.447.147.
Block number | Data from their website |
---|---|
Block 28 | 21.032 |
Block 29 | 90.579 |
Block 30 | 1.426.982 |
Block 31 | 1.427.340 |
Block 32 | 1.427.052 |
Block 33 | 119.957 |
Block 34 | 90.175 |
Block 35 | 1.453.620 |
Block 36 | 71.269 |
Block 37 | 46.923 |
Block 38 | 196.597 |
Block 39 | 90.557 |
Block 40 | 657.706 |
Block 41 | 1.427.366 |
Block 42 | 1.427.868 |
Block 43 | 1.427.864 |
Block 44 | 1.427.736 |
Block 45 | 1.427.460 |
Block 46 | 1.428.094 |
Block 47 | 1.427.462 |
Block 48 | 90.951 |
The gas used reported on the PolygonScan website for blocks 28, 35, 36, 37, 38 and 40 adds up to exactly 2.447.147:
21.032 + 1.453.620 + 71.269 + 46.923 + 196.597 + 657.706 = 2.447.147
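Such a partial sum can be recovered mechanically with a brute-force subset-sum search over the 21 per-block values. A sketch, with the block numbers and gasUsed values transcribed from the table above:

```python
# gasUsed per block, 28..48, as reported on the PolygonScan website.
GAS_USED = {
    28: 21_032, 29: 90_579, 30: 1_426_982, 31: 1_427_340, 32: 1_427_052,
    33: 119_957, 34: 90_175, 35: 1_453_620, 36: 71_269, 37: 46_923,
    38: 196_597, 39: 90_557, 40: 657_706, 41: 1_427_366, 42: 1_427_868,
    43: 1_427_864, 44: 1_427_736, 45: 1_427_460, 46: 1_428_094,
    47: 1_427_462, 48: 90_951,
}
TARGET = 2_447_147  # daily total reported by PolygonScan for 2023-03-25


def subsets_matching(values: dict[int, int], target: int) -> list[frozenset[int]]:
    """Return every subset of block numbers whose gasUsed sums to target.

    All values are positive, so branches whose running sum exceeds the
    target can be pruned early.
    """
    blocks = sorted(values)
    hits = []

    def search(i: int, remaining: int, chosen: tuple[int, ...]) -> None:
        if remaining == 0:
            hits.append(frozenset(chosen))
            return
        if i == len(blocks) or remaining < 0:
            return
        # Include blocks[i], then try without it.
        search(i + 1, remaining - values[blocks[i]], chosen + (blocks[i],))
        search(i + 1, remaining, chosen)

    search(0, target, ())
    return hits
```

Running this confirms that the set {28, 35, 36, 37, 38, 40} is among the combinations whose gasUsed sums to the reported daily total.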
This finding indicates that the reported amount is not likely to be a random number but that they (most probably) had an issue with the synchronisation process during these dates, and they ended up with a partial sum. However, what is concerning is the fact that, almost a year after this incident, the team behind PolygonScan seems to be unaware of these data inconsistencies and that this issue is still not fixed.
Conclusions
This report has explored how to analyse the data from the Polygon zkEVM. We encountered multiple difficulties when deploying the node due to outdated documentation and other issues, which we have reported. Once we managed to obtain data from the zkEVM, we analysed it and compared it with the data provided by PolygonScan. As a result, we found a number of inconsistencies in the data published by PolygonScan. After double-checking every step of our analysis, we identified some concerning points that could explain these inconsistencies.
MigaLabs is open to discussing this issue with the interested parties and providing our code so that anybody can reproduce our methodology and corroborate the correctness of our computations.