blockchain-etl / polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
License: MIT License
We are currently using the Polygon dataset available in BigQuery. In this process, every hour we query the logs table for the max block number. This query normally returns the expected result, but in one instance we got back an anomalous number: 7277816997830857785.
This only happened once, but it caused us a lot of headaches: a runaway process tried to catch up to that block number.
It would be very helpful if anyone familiar with the ETL process could provide insight into why this happened. Was the data in some temporary state when this query was run?
I tried looking into the implementation in this repo but did not make any meaningful progress.
This occurred around 10 pm US Pacific Standard Time on Dec 26th but has not been reproducible since then.
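For what it's worth, a defensive guard on the consuming side would at least have stopped the runaway process. Here is a minimal sketch, assuming Polygon's roughly 2-second average block time; the function and constant names are my own, not part of polygon-etl:

```python
import time
from typing import Optional

POLYGON_BLOCK_TIME_SECS = 2.0  # approximate average block time

def is_plausible_block(candidate: int, last_known_block: int,
                       last_known_ts: float, now: Optional[float] = None,
                       slack: float = 2.0) -> bool:
    """Reject block numbers that could not have been produced yet,
    given a previously verified (block, timestamp) checkpoint."""
    now = time.time() if now is None else now
    elapsed = max(0.0, now - last_known_ts)
    max_new_blocks = slack * elapsed / POLYGON_BLOCK_TIME_SECS
    return last_known_block <= candidate <= last_known_block + max_new_blocks
```

With any sane checkpoint, 7277816997830857785 fails this check immediately, so the catch-up loop would never start.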
Polygon recently implemented EIP-1559, which introduced a per-block base fee and miner tips. Would it be possible to update the schema to include these fields?
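For anyone modeling this downstream in the meantime, the EIP-1559 fee arithmetic is small; a sketch (field names follow the EIP, not necessarily the eventual dataset schema):

```python
def effective_gas_price(base_fee: int, max_fee: int, max_priority_fee: int) -> int:
    """Price actually paid per unit of gas under EIP-1559: the block's
    base fee plus the tip, capped so base fee + tip never exceeds max_fee."""
    priority_fee = min(max_priority_fee, max_fee - base_fee)
    return base_fee + priority_fee

# base fee 30 gwei, fee cap 100 gwei, tip cap 2 gwei -> 32 gwei paid
```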
Hello!
If I am not mistaken, there is no data in the crypto_polygon dataset in BigQuery since 2022-09-06.
See this query output:
select date(block_timestamp), count(*)
from `bigquery-public-data`.crypto_polygon.transactions
where true
AND block_timestamp > '2022-06-01'
group by 1 order by 1 desc
limit 1000
Am I missing something?
Thanks,
- JO
As you can see from this Polygon transactions query, the hash appears at 2022-02-09 00:26:21 UTC in block 24714823:
select
* from `public-data-finance.crypto_polygon.transactions`
where TIMESTAMP_TRUNC(block_timestamp, hour) = TIMESTAMP_TRUNC("2022-02-09 00:00:00", hour)
AND `hash` = "0xd481ff0f96cdea342e8baa7d3ca65d59589cdec2782b0b7b4f65731bfae0e63e"
;
But when you look at https://polygonscan.com/tx/0xd481ff0f96cdea342e8baa7d3ca65d59589cdec2782b0b7b4f65731bfae0e63e
The transaction happened at Feb-09-2022 12:34:55 AM +UTC.
When we dug into the logs, we found the transaction recorded in two different blocks, with two different block timestamps as well:
select
*
from `public-data-finance.crypto_polygon.transactions`
where TIMESTAMP_TRUNC(block_timestamp, hour) >= TIMESTAMP_TRUNC("2022-02-09 00:00:00", hour)
AND `hash` = "0xd481ff0f96cdea342e8baa7d3ca65d59589cdec2782b0b7b4f65731bfae0e63e"
But we couldn't find the transaction in https://polygonscan.com/txs?block=24714823 at all.
Can anyone help dig into this further?
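To gauge how widespread this is, one could scan exported rows for hashes attached to more than one block. A minimal sketch, assuming rows shaped like the transactions table (with hash and block_number columns):

```python
from collections import defaultdict

def duplicated_hashes(rows):
    """Return {hash: sorted block numbers} for every transaction hash
    that appears under more than one distinct block."""
    blocks_by_hash = defaultdict(set)
    for row in rows:
        blocks_by_hash[row["hash"]].add(row["block_number"])
    return {h: sorted(b) for h, b in blocks_by_hash.items() if len(b) > 1}
```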
The last available block in bigquery-public-data.crypto_polygon.blocks
is block 32751950, which has timestamp 2022-09-06 06:01:58 UTC. Blocks after this date are not included anymore in the table.
This is a minor, but annoying issue:
In the schema for the traces table, trace_id is the last column in the public crypto-ethereum dataset, but it is NOT the last column in the crypto-polygon dataset.
That prevents me from writing concise cross-blockchain queries (using, for example, UNION). It'd be awesome if the schemas for the crypto-polygon and crypto-ethereum datasets were identical.
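Until the schemas are aligned, a workaround is to select an explicit, shared column list on both sides of the UNION, so physical column order stops mattering. A sketch that generates such a query (the table and column names in the test are illustrative):

```python
def cross_chain_union(table_a: str, table_b: str, columns) -> str:
    """Build a UNION ALL over two tables using an explicit column list,
    making differing physical column order in the schemas irrelevant."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM `{table_a}`\n"
        f"UNION ALL\n"
        f"SELECT {cols} FROM `{table_b}`"
    )
```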
Description:
I have deployed an ERC-20 contract on the Polygon testnet (Mumbai).
While testing different commands, I noticed that calling:
polygonetl export_contracts --contract-addresses contract_addresses.txt --provider-uri http://matic-mumbai.chainstacklabs.com --output contracts.csv
returns all contracts, but the is_erc20 and is_erc721 fields are always FALSE.
Environment:
PolygonEtl version: polygonetl, version 0.1.8
Network: Polygon Mumbai testnet
JSON-RPC: http://matic-mumbai.chainstacklabs.com
Chain id: 80001
Transaction hash: 0xb2b0a3a0c07dbf7be8d52f0cb3990bf5ef0088121bf23854ef6f1b7f6c80eac7
Contract Address: 0xCc5dDbAECDB40993d779b68b0CE503219aE5A144
Contract file: https://github.com/OpenZeppelin/openzeppelin-contracts/blob/master/contracts/token/ERC20/ERC20.sol
Steps to reproduce:
1. polygonetl export_blocks_and_transactions --start-block 26347319 --end-block 26347321 --provider-uri http://matic-mumbai.chainstacklabs.com --blocks-output blocks.csv --transactions-output transactions.csv
2. polygonetl extract_csv_column --input transactions.csv --column hash --output transaction_hashes.txt
3. polygonetl export_receipts_and_logs --transaction-hashes transaction_hashes.txt --provider-uri http://matic-mumbai.chainstacklabs.com --receipts-output receipts.csv --logs-output logs.csv
4. polygonetl extract_token_transfers --logs logs.csv --output token_transfers.csv
5. polygonetl extract_csv_column --input receipts.csv --column contract_address --output contract_addresses.txt
6. polygonetl export_contracts --contract-addresses contract_addresses.txt --provider-uri http://matic-mumbai.chainstacklabs.com --output contracts.csv
Check the contracts.csv file: for the contract (addr = 0xcc5ddbaecdb40993d779b68b0ce503219ae5a144) the is_erc20 and is_erc721 fields are always FALSE.
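For context on what might be going wrong: ethereum-etl-style exporters typically infer is_erc20/is_erc721 by checking whether the contract's runtime bytecode contains the 4-byte selectors of the standard interface functions. A simplified sketch of that heuristic (the selector constants are the real keccak-derived values; the helper itself is mine):

```python
# ERC-20 function selectors (first 4 bytes of keccak256 of the signature).
ERC20_SELECTORS = [
    "18160ddd",  # totalSupply()
    "70a08231",  # balanceOf(address)
    "a9059cbb",  # transfer(address,uint256)
]

def looks_like_erc20(bytecode_hex: str) -> bool:
    """Heuristic: the deployed bytecode of an ERC-20 token normally
    embeds the selectors of the core interface functions."""
    code = bytecode_hex.lower()
    if code.startswith("0x"):
        code = code[2:]
    return all(sel in code for sel in ERC20_SELECTORS)
```

If a genuine OpenZeppelin ERC-20 still comes back FALSE, it may be worth checking what eth_getCode returns for the address on the Mumbai endpoint; an empty or proxy bytecode would defeat this kind of detection.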
I followed the streaming documentation successfully and went through the Dataflow documentation, but found that it is both out of date and doesn't work for me.
#1 for the bigquery command, this isn't in the repo:
src/main/resources/errors-schema.json
#2 This appears in the logs, but no tables are ever created in BigQuery:
"Executing operation polygonBlocksReadFromPubSub/PubsubUnboundedSource+polygonBlocksReadFromPubSub/MapElements/Map+polygonBlocksConvertToTableRows+polygonBlocksWriteToBigQuery/PrepareWrite/ParDo(Anonymous)+polygonBlocksWriteToBigQuery/StreamingInserts/CreateTables/ParDo(CreateTables)+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/ShardTableWrites+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/TagWithUniqueIds+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/Window.Into()/Window.Assign+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/GroupByKey/WriteStream"
#3 In the PolygonBlocksWriteToBigQuery stage there are a bunch of warnings; I'm not sure if that's relevant, but no more logs appear after that. I've mostly focused on getting the blocks from my local Polygon node into BigQuery:
2022-07-07T18:55:58.630Z Operation ongoing in step polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 02h10m00s without outputting or completing in state finish
  at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
  at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
  at [email protected]/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)
  at [email protected]/java.util.concurrent.FutureTask.get(FutureTask.java:190)
  at app//org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:817)
  at app//org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:882)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:143)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:115)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
I would really like to get this ETL job running properly so any advice would be great
base_fee_per_gas has been missing in public-data-finance.crypto_polygon.blocks for the past month.
Hi blockchain-etl team!
I have a question: is there any way to speed up the extraction of receipts and logs?
Setup:
-polygonetl, version 3.4.2
(run in a Kubernetes pod with 3000Mi memory and 1500m CPU)
-Querying our own Polygon archive node as the provider
Steps:
-Starting extraction with the command polygonetl export_all -s 19070000 -e 25069999
Outcome:
-Exporting receipts and logs takes >~90% of the time needed to extract info from a batch of blocks. For example:
exporting 1000 blocks' transactions or ERC-20 transfers takes <1 minute, while exporting the receipts and logs for the same batch of blocks takes ~1 hour.
Expected:
-is there any way I can speed this up?
I can provide more info, logs or whatever needed.
Thanks!
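Receipts are usually the bottleneck because each transaction costs one eth_getTransactionReceipt round trip. Bundling many calls into a single JSON-RPC batch request typically helps a lot; a sketch of building such payloads (sending them with an HTTP client is left out). It may also be worth raising the -b batch size passed to export_all, assuming your archive node tolerates larger batches:

```python
import json

def receipt_batch_payloads(tx_hashes, batch_size=100):
    """Yield JSON-RPC batch bodies, each bundling up to batch_size
    eth_getTransactionReceipt calls into one HTTP round trip."""
    for start in range(0, len(tx_hashes), batch_size):
        chunk = tx_hashes[start:start + batch_size]
        yield json.dumps([
            {"jsonrpc": "2.0", "id": start + i,
             "method": "eth_getTransactionReceipt", "params": [h]}
            for i, h in enumerate(chunk)
        ])
```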
Hey everyone, can anyone help me investigate a weird balance / missing transaction on polygonscan? My team is investigating a few odd MATIC balances in our data, and we came across Hop Protocol, which in the examples here is used to bridge MATIC from the Ethereum network to Polygon.
We isolated a few examples of addresses clearly sending MATIC to Polygon, but polygonscan and oklink don't show any incoming transaction for the MATIC itself on Polygon.
This address: https://polygonscan.com/address/0xfdf6c6096e70799e53d6172d32c6c3b6f3d1f38f
It has only one transaction, in which it sends about 17.3K MATIC to Anyswap to bridge to Boba. But nothing shows where the MATIC came from in the first place.
I found this transaction on ETH https://etherscan.io/tx/0x81ccb965f059f53a6edd005a3f3645840c24f24440eb26a04e2b0ebee757dd95 - "send to L2", of the same amount, happening before.
So that's where the MATIC is coming from. However, I need a way to see the inbound MATIC on the Polygon chain. Is there a missing transaction here? How did the MATIC actually end up at that address?
Also look at this address : https://polygonscan.com/address/0x718c0c212db4fc57f8fe14e92229a604df666ee8
It also has outbound MATIC transactions higher than inbound.
Could you please help me with the following error?
/lib/python3.9/site-packages/web3/manager.py", line 112, in request_blocking raise ValueError(response["error"]) ValueError: {'code': -32004, 'message': 'the method eth_newFilter does not exist/is not available'}
I get this error on executing the polygonetl export_all -p https://polygon-rpc.com -s 1100 -e 1111 -b 10 -o ./ command.
I have also changed the provider URI to our locally run node, but I get the same error.
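Error -32004 means the endpoint does not implement the stateful filter methods (eth_newFilter and friends), which is common for load-balanced public RPCs such as polygon-rpc.com. Pointing at a node that supports filters fixes it; the stateless alternative is eth_getLogs over an explicit block range. A sketch of that request body:

```python
import json

def get_logs_request(from_block: int, to_block: int, topics=None) -> str:
    """Build an eth_getLogs call covering a block range; unlike
    eth_newFilter, it keeps no filter state on the server."""
    params = {"fromBlock": hex(from_block), "toBlock": hex(to_block)}
    if topics:
        params["topics"] = topics
    return json.dumps({"jsonrpc": "2.0", "id": 1,
                       "method": "eth_getLogs", "params": [params]})
```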
Hey everyone,
I have a problem. I installed the package on Ubuntu (WSL on Windows), but when I run any command I get a weird error:
Traceback (most recent call last):
  File "/home/user/.local/bin/polygonetl", line 5, in <module>
    from polygonetl.cli import cli
  File "/home/user/.local/lib/python3.10/site-packages/polygonetl/cli/__init__.py", line 24, in <module>
    from polygonetl.cli.export_all import export_all
  File "/home/user/.local/lib/python3.10/site-packages/polygonetl/cli/export_all.py", line 30, in <module>
    from web3 import Web3
  File "/home/user/.local/lib/python3.10/site-packages/web3/__init__.py", line 7, in <module>
    from eth_account import Account  # noqa: E402
  File "/home/user/.local/lib/python3.10/site-packages/eth_account/__init__.py", line 1, in <module>
    from eth_account.account import Account  # noqa: F401
  File "/home/user/.local/lib/python3.10/site-packages/eth_account/account.py", line 1, in <module>
    from collections import (
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
this is my error log
I googled the problem and figured the error might have come from a Python version upgrade, but even downgrading the Python version didn't help.
Could anybody help me fix this?
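For background: Python 3.10 removed the ABC aliases (Mapping, etc.) from the collections module; since Python 3.3 they have lived in collections.abc. The old eth-account release pulled in here still imports them from collections, which is why older Pythons work and 3.10 doesn't. Upgrading web3/eth-account (or running polygonetl on Python 3.7-3.9) is the practical fix; the incompatibility itself boils down to:

```python
# On Python 3.10+ the abstract base classes exist only in collections.abc.
try:
    from collections import Mapping  # what the old eth_account does; fails on 3.10+
except ImportError:
    from collections.abc import Mapping  # correct on every modern version

assert issubclass(dict, Mapping)
```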
Description:
I have deployed an ERC-20 contract on the Polygon testnet (Mumbai).
While testing different commands, I noticed that calling:
polygonetl filter_items -i contracts.csv -p "item['is_erc20'] or item['is_erc721']" | polygonetl extract_field -f address -o token_addresses.txt
returns the contract as a token address even though its is_erc20 and is_erc721 fields are FALSE.
Environment:
PolygonEtl version: polygonetl, version 0.1.8
Network: Polygon Mumbai testnet
JSON-RPC: http://matic-mumbai.chainstacklabs.com/
Chain id: 80001
Transaction hash: 0x826a27ebe789e91bb06489dccf5e7382b6109f9eadccff696b32fcd64f1a0a32
Steps to reproduce:
Check the contracts.csv file: for the contract (addr = 0x1a1586c38e38a094e42223a31896c40598516d05) the is_erc20 and is_erc721 fields are FALSE. But the filter_items method detects them as token addresses and writes the address to token_addresses.txt.
predicate(item) returns a string, so this condition will always be truthy.
Either change the method to return a bool, or compare against the string (if predicate(item) == "True":).
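To illustrate the bug: in Python every non-empty string is truthy, so a predicate that returns the string "False" still passes an if check. A minimal demonstration of the bug and the string-comparison fix (the other fix being to make the predicate return a real bool):

```python
def keep(item, predicate) -> bool:
    # Compare against the string the predicate actually returns,
    # instead of relying on its truthiness.
    return predicate(item) == "True"

# The bug: "False" is a non-empty string and therefore truthy.
assert bool("False") is True

# With the explicit comparison, the filter behaves as intended.
assert keep({}, lambda item: "False") is False
assert keep({}, lambda item: "True") is True
```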
When I use the export_all command, I get the following error for the receipts data (I could extract receipt data via the export receipts command). I would appreciate support with this issue:
python3.9/site-packages/polygonetl/jobs/export_receipts_job.py", line 70, in
receipts = [self.receipt_mapper.json_dict_to_receipt(result) for result in results]
File "/python3.9/site-packages/polygonetl/utils.py", line 71, in rpc_response_batch_to_results
yield rpc_response_to_result(response_item)
File "/python3.9/site-packages/polygonetl/utils.py", line 82, in rpc_response_to_result
raise RetriableValueError(error_message)
polygonetl.misc.retriable_value_error.RetriableValueError: result is None in response {'jsonrpc': '2.0', 'id': 0, 'result': None}. Make sure polygon node is synced.
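The exporter raises here because the node answered with result: null, which an unsynced (or still-indexing) node will do for blocks it doesn't have yet. A sketch of retrying around that condition; fetch is a hypothetical stand-in for the actual JSON-RPC call:

```python
import time

def get_receipt_with_retry(fetch, tx_hash, attempts=5, delay=1.0):
    """Retry a receipt fetch while the node keeps returning None,
    e.g. because the block is not yet indexed on this node."""
    for attempt in range(attempts):
        result = fetch(tx_hash)
        if result is not None:
            return result
        time.sleep(delay * (attempt + 1))  # linear backoff between retries
    raise RuntimeError(f"receipt for {tx_hash} still None after {attempts} attempts")
```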
We have identified several transactions that seem to be entirely missing from the bigquery-public-data.crypto_polygon.transactions and bigquery-public-data.crypto_polygon.logs tables.
Here are a couple of examples:
0xcaed45e923e162a8adb3f42ef55a215b0c0f54b0c48a3bf18a8bf83995c35259
0x22b4a9d39c910e2150bf562c062061eb4f9a7eb5977109b835309f4f4f408c7d
0x4feb97ce0277cd5a837ce9d3018f8371c86b001d7757d2263f5566a283da6828
It seems there are many more missing transactions that were present before.
In particular, we found that all swap transactions and logs for Balancer pool 0x8159462d255c1d24915cb51ec361f700174cd994 have disappeared: they were there 3-4 weeks ago but are now gone.
This is, of course, a critical issue: if transactions are missing, the data cannot be trusted or used for any purpose.
I'm creating queries that look at both ethereum mainnet and the polygon chain, and was surprised to get a mismatch in the 'traces' table between the crypto-polygon dataset you maintain and the Google public crypto-ethereum dataset on the value column.
It's NUMERIC versus STRING; STRING seems correct to me.