
(Un)Pack Inspector - Ethereum Smart Contract Downloader and Analysis Toolkit

Overview

Inspector is a robust tool designed for Ethereum smart contract analysis across all EVM-compatible networks. The toolkit specializes in acquiring contract metadata and source code, particularly for contracts verified on Etherscan, Bscscan, Sourcify.dev and IPFS.

Inspector is an essential tool for developers working on Ethereum smart contract analysis. It provides access to source code and offers detailed insights into various aspects of contracts, including function definitions, events, and state variables. Additionally, Inspector builds control flow graphs (CFGs), which visually map the execution flow of smart contracts.

Inspector also supports advanced features via JSON-RPC, enabling users to decode transactions and logs and to decompile opcodes. This functionality extends to recreating the ABI (Application Binary Interface) of any deployed smart contract on the Ethereum network. It also offers search for specific functions and events by their signatures, making it easier to quickly locate and analyze specific components of smart contracts during in-depth analysis and auditing.
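
As an illustration only, a decode request over JSON-RPC could look like the following curl call; the endpoint, port, method name and parameters are assumptions, not a documented API:

```shell
# Hypothetical call; endpoint, port, method name and params are assumptions.
curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"inspector_decodeTransaction","params":["0x..."]}'
```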

To manage the storage of large datasets without using Git LFS, we have opted to utilize Cloudflare R2. This object storage solution offers zero egress fees and is a cost-effective alternative to Amazon S3. As a result, the entire dataset is seamlessly integrated with this repository, eliminating the need for you to recreate the datasets independently.

Long story short: clone the repo, download the datasets, configure only the necessary settings, start the Docker containers and enjoy playing with the JSON-RPC and GraphQL services!
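
The steps above might look like the following shell session; the repository URL, directory layout and compose setup are assumptions, not confirmed by this README:

```shell
# Sketch of the quickstart; the repository URL, output directory and
# compose setup are assumptions.
git clone https://github.com/unpackdev/inspector.git
cd inspector

# Fetch and unpack a dataset (see the Datasets section for sizes and URLs).
curl -LO http://r2.unpack.dev/datasets/ethereum.7z
7z x ethereum.7z -odata/

# Configure only what you need (API keys, node endpoints), then start the services.
docker compose up -d
```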

Features

  • Cloudflare R2 datasets: 7-Zip datasets are hosted on Cloudflare R2, as it's much cheaper than S3 and there are no egress traffic charges!
  • Efficient Contract Downloading: Streamlined process for downloading Ethereum smart contracts.
  • Download Resumption: Capability to pause and resume downloads, ensuring progress isn't lost.
  • Local Storage Management: Stores contracts in Sqlite3 and on disk for quick access and efficient retrieval.
  • Source Code Access: Provides easy access to the source code of verified contracts on Etherscan, Bscscan, Sourcify.dev and internal Solgo.
  • JSON-RPC and GraphQL Support: For easier access to the data, including documentation and postman collection!
  • Advanced Decoding Tools: Utilize JSON-RPC to decode transactions, logs, decompile opcodes, and attempt ABI recreation of any deployed Ethereum smart contract.
  • Signature Search: Enables the searching of functions and events based on their signatures, enhancing the ability to analyze specific contract aspects efficiently.

This project is inspired by Smart Contract Sanctuary. For now, it focuses only on mainnet contracts. Ethereum comes first; Arbitrum, Binance Smart Chain, Polygon and others will be added later.

IMPORTANT

  • IN EARLY DEVELOPMENT: Unpacking works (95% of the time); however, compression and decompression, as well as the JSON-RPC and GraphQL services, are still far from complete.
  • An Otterscan JSON-RPC API is recommended. We use Erigon.
  • For dataset compression we use 7-Zip compression.
  • If you do not have an Erigon node, you will need an Etherscan, Bscscan and/or BitQuery account to get contract creation information.
  • Due to licensing issues, we are not going to provide information that strictly breaks licenses. Instead, we provide the tools to extract the information yourself if you have access to the 3rd party sources, utilising their respective API keys.

Datasets

We offer access to a collection of complete datasets, invaluable for researchers, developers, and analysts who require extensive real-world data for their projects. Below is a detailed guide to our available datasets, which include extensive records suitable for in-depth analysis in various fields.

These datasets are stored on R2, ensuring reliable and speedy access via direct download links.

Dataset            Location                                    Compressed Size    Decompressed Size
Ethereum Dataset   http://r2.unpack.dev/datasets/ethereum.7z   526.99 MB          17 GB
Sqlite3 Dataset    http://r2.unpack.dev/datasets/sqlite.7z     2.8 GB             64 GB

Last Revision Date: 2024-04-23 09:45 CEST
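
For example, fetching and extracting the Sqlite3 dataset might look like this; the `7z` binary comes from the p7zip package, and the output directory is an assumption:

```shell
curl -LO http://r2.unpack.dev/datasets/sqlite.7z
7z x sqlite.7z -osqlite/   # expands from 2.8 GB to roughly 64 GB; check free disk space first
```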

Documentation (WIP)

Documentation is currently being developed.

Begin your journey at the Welcome. For installation instructions, please consult the Installation guide.

Storage

Sqlite supports databases of up to 140 TB; we won't have more than a few TB at most, so in our case even disk capacity is not really an issue. We will be using a cgo-free port of SQLite, so it will be fast, especially because ORMs are forbidden and will not be used in this project.
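
Because everything lives in plain Sqlite3, any sqlite3 client can query the dataset directly, no ORM required. The table and columns below are invented for illustration; the real schema may differ:

```shell
# Build a throwaway database with a hypothetical 'contracts' table and query it,
# the same way you would query the real dataset after extraction.
db="$(mktemp -u).db"
sqlite3 "$db" "CREATE TABLE contracts (address TEXT PRIMARY KEY, name TEXT, license TEXT);"
sqlite3 "$db" "INSERT INTO contracts VALUES ('0x0000000000000000000000000000000000000000', 'Example', 'MIT');"
sqlite3 "$db" "SELECT COUNT(*) FROM contracts;"   # prints 1
```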

Notes

  • For the best performance, it is recommended to run this project on NVMe storage, as the database is very large and needs fast reads and writes to perform well.

This project is not a new blockchain, so key-value storage is not needed. More than that, it would be a problem.

Database Statistics

Below is a database overview table showing the size of the Sqlite3 database as of the last database push. This data is informational only and is here to give you the big picture, not the most accurate database size (as it changes from second to second).

Ethereum Dataset Statistics

Key Value
Contracts Count 184,480
Full Contracts Count 68,631
Metadata Count 82,933
AST Count 52,724
CFG Count 52,682
Standards Count 48,142
Tokens Count 47,047
Constructors Count 45,791
Variables Count 847,394
Functions Count 3,351,299
Events Count 349,965
Database Size (MB) 64,325.133

Extending Inspector (Modules)

Please navigate to the Modules documentation to see how you can extend Inspector with custom options, commands and services!

Demo and Examples

To be defined here. It will be a link to a GraphQL playground, rate-limited to 1 req/s, including examples in its own directory of how to consume this service. Note that the demo, served from my own datacenter, will go down and up as I work on it; I am not promising any availability.

Please check out the Examples directory for more information about how to use the service.

Postman Collection (WIP)

For easy access, here is a publicly shared Postman collection. Please note that it's still in its early stages, a work in progress.

Run In Postman

LICENSE

I am offering this code (not the contracts) at no cost, under the Apache License 2.0. For more details about this license, please refer to the LICENSE file included in this repository.

Please note: The contracts themselves are subject to their respective licenses. These licenses can be found within the source code of each individual contract. It is imperative that you review and adhere to these licenses when using the contracts.

Message to Etherscan & Bscscan

I extend my sincere gratitude to the Etherscan and Bscscan teams for your invaluable contributions. After reviewing your licensing terms, I believe that my use of your services aligns with them. However, should there be any concerns or issues regarding my usage, I welcome your feedback and guidance. Please feel free to contact me at info(at)unpack.dev.

inspector's People

Contributors

0x19


inspector's Issues

Sanity of this repository - Large File Storage

It is not hard to build the downloader; it is hard to maintain the sheer amount of data in the repository. It takes forever to clone, and even git status within the project is slow due to the millions of index entries it needs to maintain.

I need to find a better way to handle this. It's not sane, long term, to have millions of files in this repository like we do now.

Higher Overview Scope Of Work

The high-level idea is to have an instance running BadgerDB that contains all the necessary and crucial information about contracts, such as license, entry source name, contract source code, compiler version and so on.

Additional functionality is a gRPC interface to which we can connect to request new data or retrieve existing data.

Accessing and writing to the same BadgerDB database instance from multiple applications is not possible due to the consistency guarantees and locks enforced by BadgerDB itself... Therefore we don't have a choice if we want to use Badger, and I really want to use it.

Clickhouse instead of Sqlite3?

Perhaps it would be better to go with ClickHouse instead of Sqlite3. Sqlite3 can be quite slow, even with indexes. Need to research how we would:

a.) Store and map the data.
b.) Export and import the data.

State package improvements

This is the list of future work for state package:

  • Currently it writes new states as soon as they happen, which means the load on Redis will be higher than we probably want. Perhaps write every few seconds, with a shutdown sequence that persists the latest information.
  • The sync package should be used to ensure concurrent writes and reads are handled correctly.

New contracts?

Source code may be missing, or may come from IPFS or other external sources, at the time a contract is created, which is quite expected. Therefore, should there be listeners/watchers that handle these cases?

Or should it be left to the engineer requesting it, downloading in real time? <- This is probably the best.

Fix github workflows

Will be fixed later on. Adding this ticket here so it's set in stone for fixing.

Dataset import

Right now export is possible, but import is not, at least not in the sense of running one command and that's it. Need to make this happen.
