Ethereum Scraper

JSON RPC Scraper

Schema

blocks.csv

Column                    Type
block_number              bigint
block_hash                hex_string
block_parent_hash         hex_string
block_nonce               hex_string
block_sha3_uncles         hex_string
block_logs_bloom          hex_string
block_transactions_root   hex_string
block_state_root          hex_string
block_miner               hex_string
block_difficulty          bigint
block_total_difficulty    bigint
block_size                bigint
block_extra_data          hex_string
block_gas_limit           bigint
block_gas_used            bigint
block_timestamp           bigint
block_transaction_count   bigint
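The CSV feed stores every value as text. The Python 3 sketch below shows one way to turn a blocks.csv row back into typed values. It assumes bigint columns are written as decimal text (0x-prefixed hex is accepted as well, in case) and that hex_string columns are 0x-prefixed strings, which it leaves untouched. The helper names and the derived block_datetime field are hypothetical and not part of the scraper; block_timestamp is treated as Unix seconds, as in Ethereum block headers.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# bigint columns of blocks.csv, taken from the schema table above.
BLOCK_BIGINT_COLUMNS = {
    "block_number", "block_difficulty", "block_total_difficulty", "block_size",
    "block_gas_limit", "block_gas_used", "block_timestamp", "block_transaction_count",
}


def parse_bigint(value: str) -> Optional[int]:
    """Accept decimal ("46147") or 0x-prefixed hex ("0xb443"); empty means missing."""
    if value == "":
        return None
    return int(value, 16) if value.startswith("0x") else int(value)


def parse_block_row(row: Dict[str, str]) -> Dict[str, Any]:
    """Convert bigint columns to int; keep hex_string columns as strings."""
    parsed: Dict[str, Any] = {}
    for column, value in row.items():
        parsed[column] = parse_bigint(value) if column in BLOCK_BIGINT_COLUMNS else value
    # Hypothetical convenience field: the block timestamp as an aware UTC datetime.
    if parsed.get("block_timestamp") is not None:
        parsed["block_datetime"] = datetime.fromtimestamp(parsed["block_timestamp"], tz=timezone.utc)
    return parsed
```

Typical use is `rows = [parse_block_row(r) for r in csv.DictReader(open("blocks.csv", newline=""))]`.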

transactions.csv

Column            Type
tx_hash           hex_string
tx_nonce          bigint
tx_block_hash     hex_string
tx_block_number   bigint
tx_index          bigint
tx_from           hex_string
tx_to             hex_string
tx_value          bigint
tx_gas            bigint
tx_gas_price      bigint
tx_input          hex_string
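A sketch (Python 3) of deriving a few quantities from a transactions.csv row. It assumes tx_value and tx_gas_price are in wei, as Ethereum JSON-RPC reports them, and that tx_to is empty for contract-creation transactions; summarize_transaction is a hypothetical helper. Because tx_gas is the gas limit supplied by the sender rather than the gas actually used, tx_gas * tx_gas_price is only an upper bound on the fee paid.

```python
from decimal import Decimal
from typing import Any, Dict

WEI_PER_ETHER = Decimal(10) ** 18


def summarize_transaction(row: Dict[str, str]) -> Dict[str, Any]:
    """Summarize one transactions.csv row (values assumed to be decimal wei)."""
    value_wei = int(row["tx_value"])
    # Upper bound only: tx_gas is the gas limit, not the gas consumed.
    max_fee_wei = int(row["tx_gas"]) * int(row["tx_gas_price"])
    return {
        "hash": row["tx_hash"],
        # Decimal keeps the wei -> ether conversion exact (no float rounding).
        "value_ether": Decimal(value_wei) / WEI_PER_ETHER,
        "max_fee_ether": Decimal(max_fee_wei) / WEI_PER_ETHER,
        # Assumption: an empty tx_to marks a contract-creation transaction.
        "is_contract_creation": not row.get("tx_to"),
    }
```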

erc20_transfers.csv

Column               Type
erc20_token          hex_string
erc20_from           hex_string
erc20_to             hex_string
erc20_value          bigint
erc20_tx_hash        hex_string
erc20_block_number   bigint
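ERC-20 transfers are conventionally read from the Transfer(address indexed from, address indexed to, uint256 value) event: topic 0 is the Keccak-256 hash of the event signature, topics 1 and 2 hold the left-padded sender and recipient addresses, and the data field holds the amount. The Python 3 sketch below assumes erc20_transfers.csv rows are derived from such logs (the source does not say how the spider builds them) and uses the log field names returned by eth_getLogs; decode_erc20_transfer is a hypothetical helper. Logs with a fourth topic, such as ERC-721 transfers whose token id is indexed, are skipped.

```python
from typing import Any, Dict, Optional

# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def decode_erc20_transfer(log: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map one eth_getLogs entry to an erc20_transfers row, or None if it is not one."""
    topics = [topic.lower() for topic in log.get("topics", [])]
    # ERC-20 Transfer has exactly 3 topics; ERC-721 also indexes the token id (4 topics).
    if len(topics) != 3 or topics[0] != TRANSFER_TOPIC:
        return None
    return {
        "erc20_token": log["address"].lower(),
        # Indexed addresses are 32-byte words; the address is the last 20 bytes.
        "erc20_from": "0x" + topics[1][-40:],
        "erc20_to": "0x" + topics[2][-40:],
        "erc20_value": int(log["data"], 16),  # non-indexed uint256 amount
        "erc20_tx_hash": log["transactionHash"],
        "erc20_block_number": int(log["blockNumber"], 16),
    }
```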

IPC

If you want to export the entire blockchain, use https://github.com/medvedev1088/ethereum-etl instead. It uses IPC and is much faster.

If you only need to export a few thousand blocks or contracts, read on.

Usage

Follow the Scrapy installation guide: https://doc.scrapy.org/en/latest/intro/install.html

Then run the following in a terminal:

> pip install typing future scrapy
> scrapy runspider ethscraper/spiders/eth_json_rpc_spider.py \
-s ETH_JSON_RPC_URL=https://mainnet.infura.io/<your_api_key> \
-s START_BLOCK=0 \
-s END_BLOCK=1000000 \
-s FEED_FORMAT=csv

The output is written to blocks.csv, transactions.csv and erc20_transfers.csv in the current directory.

The scraper should work on both Python 2 and Python 3; it has been tested on Python 2.7.

Options

ETH_JSON_RPC_URL

The JSON-RPC URL of the Ethereum node. If you are running a local geth node, start it with the --rpc option:

geth --rpc --rpcapi eth

Then use ETH_JSON_RPC_URL=http://localhost:8545.
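The spider talks to ETH_JSON_RPC_URL using JSON-RPC 2.0 over HTTP. As a quick way to check that the URL is reachable before starting a long scrape, here is a minimal Python 3 client that uses only the standard library; build_rpc_request and eth_rpc_call are hypothetical names, and this is not the spider's own request code.

```python
import json
import urllib.request
from typing import Any, List


def build_rpc_request(url: str, method: str, params: List[Any], request_id: int = 1) -> urllib.request.Request:
    """Build an HTTP POST carrying a single JSON-RPC 2.0 call."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    # Passing data= makes urllib send a POST request.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def eth_rpc_call(url: str, method: str, params: List[Any], timeout: float = 30.0) -> Any:
    """Send the call and return its "result" field, raising on a JSON-RPC error."""
    with urllib.request.urlopen(build_rpc_request(url, method, params), timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    if "error" in body:
        raise RuntimeError("JSON-RPC error: %r" % (body["error"],))
    return body["result"]
```

For example, `int(eth_rpc_call("http://localhost:8545", "eth_blockNumber", []), 16)` returns the node's latest block number, which is a sensible upper limit for END_BLOCK.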

START_BLOCK, END_BLOCK

Integers representing the start and end blocks for scraping, inclusive.
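Because both bounds are inclusive, START_BLOCK=0 and END_BLOCK=1000000 cover 1,000,001 blocks. The Python 3 sketch below enumerates one eth_getBlockByNumber call per block in such a range, passing block numbers as 0x-prefixed hex quantities as the JSON-RPC API expects; block_requests is a hypothetical helper, and the spider's actual requests may differ.

```python
from typing import Any, Dict, Iterator


def block_requests(start_block: int, end_block: int, full_transactions: bool = True) -> Iterator[Dict[str, Any]]:
    """Yield one eth_getBlockByNumber call per block, start_block..end_block inclusive."""
    if end_block < start_block:
        raise ValueError("END_BLOCK must not be smaller than START_BLOCK")
    for number in range(start_block, end_block + 1):  # +1 because END_BLOCK is inclusive
        yield {
            "jsonrpc": "2.0",
            "id": number,
            "method": "eth_getBlockByNumber",
            # Block numbers are hex quantities; True asks for full transaction objects.
            "params": [hex(number), full_transactions],
        }
```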

FEED_FORMAT

Output format. The output files will have the corresponding extension.

Supported formats are: csv, xml, json, jsonlines, pickle, marshal.
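Since the extension follows FEED_FORMAT, a consumer can pick the parser from the file name. The Python 3 sketch below covers the text formats csv, json and jsonlines (xml, pickle and marshal are left out); load_feed is a hypothetical helper, and the json case assumes Scrapy's usual layout of a single JSON array.

```python
import csv
import json
import os
from typing import Any, Dict, List


def load_feed(path: str) -> List[Dict[str, Any]]:
    """Load scraper output, choosing the parser from the file extension."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in ("csv", "json", "jsonlines", "jl"):
        raise ValueError("unsupported feed extension: %r" % extension)
    with open(path, newline="", encoding="utf-8") as handle:
        if extension == "csv":
            # Every CSV value comes back as a string.
            return list(csv.DictReader(handle))
        if extension == "json":
            # One JSON array containing every item.
            return json.load(handle)
        # jsonlines: one JSON object per non-empty line.
        return [json.loads(line) for line in handle if line.strip()]
```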

EXPORT_TRANSACTIONS

Whether to export the transactions.csv file. Possible values: True, False.

EXPORT_ERC20_TRANSFERS

Whether to export the erc20_transfers.csv file. Possible values: True, False.

CONCURRENT_REQUESTS

The number of concurrent requests. Default is 20.

RETRY_TIMES

How many times to retry a request in case an error is encountered. Default is 10.
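Scrapy applies CONCURRENT_REQUESTS and RETRY_TIMES itself; per the Scrapy documentation, RETRY_TIMES counts retries in addition to the first attempt. The asyncio sketch below (Python 3) only illustrates what the two limits mean for a block-fetching loop and is not how Scrapy implements them; fetch_blocks and the fetch callback are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


async def fetch_blocks(
    block_numbers: Iterable[int],
    fetch: Callable[[int], Awaitable[T]],
    concurrent_requests: int = 20,
    retry_times: int = 10,
) -> Dict[int, T]:
    """Fetch every block with bounded concurrency and per-request retries."""
    semaphore = asyncio.Semaphore(concurrent_requests)

    async def fetch_one(number: int) -> T:
        # One initial attempt plus up to retry_times retries.
        for attempt in range(retry_times + 1):
            async with semaphore:  # at most concurrent_requests fetches in flight
                try:
                    return await fetch(number)
                except Exception:
                    if attempt == retry_times:
                        raise  # retries exhausted: surface the last error
        raise AssertionError("unreachable")

    numbers = list(block_numbers)
    results = await asyncio.gather(*(fetch_one(n) for n in numbers))
    return dict(zip(numbers, results))
```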

Internal Transactions

Retrieving internal transactions requires transaction tracing. Because tracing can run for a very long time (hours) and can produce huge amounts of data, an IPC subscription should be used instead of RPC.

An example is given in PR ethereum/go-ethereum#15516:

$ nc -U /work/temp/rinkeby/geth.ipc
{"id": 1, "method": "debug_subscribe", "params": ["traceChain", "0x0", "0xffff", {"tracer": "callTracer"}]}

The API will stream back one RPC notification per non-empty block. An exception is the very last block, which will be reported even if empty so the user knows the stream is done.

{"jsonrpc":"2.0","id":1,"result":"0xe1deecc4b399e5fd2b2a8abbbc4624e2"}
{"jsonrpc":"2.0","method":"debug_subscription","params":{"subscription":"0xe1deecc4b399e5fd2b2a8abbbc4624e2","result":{"block":"0x37","hash":"0xdb16f0d4465f2fd79f10ba539b169404a3e026db1be082e7fd6071b4c5f37db7","traces":[{"from":"0x31b98d14007bdee637298086988a0bbd31184523","gas":"0x0","gasUsed":"0x0","input":"0x","output":"0x","time":"1.077µs","to":"0x2ed530faddb7349c1efdbf4410db2de835a004e4","type":"CALL","value":"0xde0b6b3a7640000"}]}}}
{"jsonrpc":"2.0","method":"debug_subscription","params":{"subscription":"0xe1deecc4b399e5fd2b2a8abbbc4624e2","result":{"block":"0xf43","hash":"0xacb74aa08838896ad60319bce6e07c92edb2f5253080eb3883549ed8f57ea679","traces":[{"from":"0x31b98d14007bdee637298086988a0bbd31184523","gas":"0x0","gasUsed":"0x0","input":"0x","output":"0x","time":"1.568µs","to":"0xbedcf417ff2752d996d2ade98b97a6f0bef4beb9","type":"CALL","value":"0xde0b6b3a7640000"}]}}}
{"jsonrpc":"2.0","method":"debug_subscription","params":{"subscription":"0xe1deecc4b399e5fd2b2a8abbbc4624e2","result":{"block":"0xf47","hash":"0xea841221179e37ca9cc23424b64201d8805df327c3296a513e9f1fe6faa5ffb3","traces":[{"from":"0xbedcf417ff2752d996d2ade98b97a6f0bef4beb9","gas":"0x4687a0","gasUsed":"0x12e0d","input":"0x6060604052341561000c57fe5b5b6101828061001c6000396000f30060606040526000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff168063230925601461003b575bfe5b341561004357fe5b61008360048080356000191690602001909190803560ff1690602001909190803560001916906020019091908035600019169060200190919050506100c5565b604051808273ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff16815260200191505060405180910390f35b6000600185858585604051806000526020016040526000604051602001526040518085600019166000191681526020018460ff1660ff1681526020018360001916600019168152602001826000191660001916815260200194505050505060206040516020810390808403906000866161da5a03f1151561014257fe5b50506020604051035190505b9493505050505600a165627a7a7230582054abc8e7b2d8ea0972823aa9f0df23ecb80ca0b58be9f31b7348d411aaf585be0029","output":"0x60606040526000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff168063230925601461003b575bfe5b341561004357fe5b61008360048080356000191690602001909190803560ff1690602001909190803560001916906020019091908035600019169060200190919050506100c5565b604051808273ffffffffffffffffffffffffffffffffffffffff1673ffffffffffffffffffffffffffffffffffffffff16815260200191505060405180910390f35b6000600185858585604051806000526020016040526000604051602001526040518085600019166000191681526020018460ff1660ff1681526020018360001916600019168152602001826000191660001916815260200194505050505060206040516020810390808403906000866161da5a03f1151561014257fe5b50506020604051035190505b9493505050505600a165627a7a7230582054abc8e7b2d8ea0972823aa9f0df23ecb80ca0b58be9f31b7348d411aaf585be0029","time":"658.529µs","to":"0x5481c0fe170641bd2e0ff7f04161871829c1902d","type":"CREATE","value":"0x0"}]}}}
{"jsonrpc":"2.0","method":"debug_subscription","params":{"subscription":"0xe1deecc4b399e5fd2b2a8abbbc4624e2","result":{"block":"0xfff","hash":"0x254ccbc40eeeb183d8da11cf4908529f45d813ef8eefd0fbf8a024317561ac6b"}}}
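A minimal Python 3 sketch of consuming this stream: it sends the debug_subscribe request shown above over the Unix-domain IPC socket and yields each notification's block number, hash and traces. Assumptions: the node writes one JSON message per line, the debug_subscribe/traceChain API from that PR is available on your geth build, and the final notification is for the requested end block, which is used to stop reading (the example output above ends at 0xfff, so verify this against your node). subscribe_request, iter_block_traces and trace_chain are hypothetical names, and AF_UNIX sockets do not apply to geth's named-pipe IPC on Windows.

```python
import json
import socket
from typing import Iterable, Iterator, List, Tuple

BlockTraces = Tuple[int, str, List[dict]]


def subscribe_request(start_block: int, end_block: int, request_id: int = 1) -> bytes:
    """Encode the debug_subscribe/traceChain request, newline-terminated."""
    request = {
        "id": request_id,
        "method": "debug_subscribe",
        "params": ["traceChain", hex(start_block), hex(end_block), {"tracer": "callTracer"}],
    }
    return (json.dumps(request) + "\n").encode("utf-8")


def iter_block_traces(lines: Iterable[str]) -> Iterator[BlockTraces]:
    """Yield (block_number, block_hash, traces) from newline-delimited JSON messages."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("method") != "debug_subscription":
            continue  # e.g. the first reply, which only carries the subscription id
        result = message["params"]["result"]
        # The final notification may have no "traces" key (an empty block).
        yield int(result["block"], 16), result["hash"], result.get("traces", [])


def trace_chain(ipc_path: str, start_block: int, end_block: int) -> Iterator[BlockTraces]:
    """Stream call traces from a geth IPC socket (Unix-like systems only)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(ipc_path)
        sock.sendall(subscribe_request(start_block, end_block))
        with sock.makefile("r", encoding="utf-8") as stream:
            for block_number, block_hash, traces in iter_block_traces(stream):
                yield block_number, block_hash, traces
                # Assumption: the last notification is for end_block.
                if block_number >= end_block:
                    break
    finally:
        sock.close()
```

Usage, with the socket path from the example: `for number, block_hash, traces in trace_chain("/work/temp/rinkeby/geth.ipc", 0, 0xffff): print(number, len(traces))`.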

Tracing an individual block processes its transactions concurrently (limited to the number of CPU cores), and chain tracing likewise processes blocks concurrently (also limited to the number of CPU cores).

Etherscan Scraper

To scrape contract bytecode and Solidity code from Etherscan:

> pip install Scrapy
> scrapy runspider ethscraper/spiders/etherscan_contract_spider.py -o contracts.csv

Note that Cloudflare will block your machine after a few thousand requests. Also be aware that web scraping is generally considered bad practice, and this spider can break without notice because it depends on how Etherscan renders its frontend.

Contributors

medvedev-evgeny, medvedev1088