Iron Indexer

A parallel Reth indexer.

Reads transaction history directly from reth's database on the filesystem, skipping network and JSON-RPC overhead. It can index a dynamic set of addresses, which can grow at runtime, by spawning parallel, self-optimizing backfill jobs.

Note: Kudos to reth-indexer, which was the original implementation that served as a basis for this.

Disclaimer

This is currently a prototype, built to serve a yet-to-be-released feature of Iron wallet, and all development so far has been done with that goal in mind. Don't expect a plug-and-play indexing solution for every use case (at least not right now).

How to use

🚧 TODO 🚧

For now, check iron-indexer.toml, which should help you get started. Feel free to contact me or open an issue if you have any questions.

Why

Fetching on-chain data can be a painful process. A simple query such as "what is the transaction history for my wallet address?" translates into a time-consuming walk of the entire chain. It's also not enough to sync the from and to fields of every transaction, although that alone would already be costly. A transaction can also be relevant to a wallet because of the topics in the logs it emits, such as ERC-20 transfers to or from that address.
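The matching rule described above can be sketched as follows. This is an illustrative model, not iron-indexer's code. `Tx`, `Log`, `is_relevant` and `address_to_topic` are hypothetical names, and in a real indexer these fields would be read from reth's database. The only external fact used here is the ERC-20 `Transfer` event layout: the indexed `from` and `to` addresses appear in topics 1 and 2, left-padded to 32 bytes. As a simplification, the sketch treats any indexed topic equal to a padded watched address as a match.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# keccak256("Transfer(address,address,uint256)"): topic 0 of every ERC-20 Transfer log.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def address_to_topic(address: str) -> str:
    """Indexed address arguments are stored in topics, left-padded to 32 bytes."""
    return "0x" + address.lower().removeprefix("0x").rjust(64, "0")


@dataclass
class Log:
    address: str       # contract that emitted the log
    topics: List[str]  # topics[0] is the event signature hash


@dataclass
class Tx:
    sender: str
    to: Optional[str]  # None for contract creations
    logs: List[Log] = field(default_factory=list)


def is_relevant(tx: Tx, watched: set) -> bool:
    """True if a watched address is the sender, the recipient, or an indexed log argument."""
    watched = {a.lower() for a in watched}
    if tx.sender.lower() in watched or (tx.to is not None and tx.to.lower() in watched):
        return True
    # Hypothetical simplification: compare every indexed topic (skipping the signature).
    watched_topics = {address_to_topic(a) for a in watched}
    return any(t.lower() in watched_topics for log in tx.logs for t in log.topics[1:])


# Example: carol calls a token contract, which emits Transfer(carol -> bob).
alice, bob, carol = "0x" + "aa" * 20, "0x" + "bb" * 20, "0x" + "cc" * 20
token = "0x" + "70" * 20
transfer = Tx(sender=carol, to=token, logs=[
    Log(token, [TRANSFER_TOPIC, address_to_topic(carol), address_to_topic(bob)]),
])
```

Bob never appears in the transaction's from or to fields, yet `is_relevant(transfer, {bob})` is true because of the log topics. This is exactly why syncing from/to alone is not enough.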

On top of this, most indexers require a predetermined set of topics to index, and any changes require a new full walk of the chain.

Instead, iron-indexer takes a different approach: new addresses can be added to the sync list at runtime, and self-optimizing backfill jobs are registered to backfill all data for each incoming address.

How

Forward & Backfill workers

Let's illustrate this with an example. Say we're currently indexing only alice's address, and a regular syncing process is running, waiting for new blocks to process.

After block 10, bob's address is added to the set. From block 11 onwards, both alice and bob will be matched, but blocks 1 through 10 were never checked for bob. At this point we register a new backfill job for the missing data.

We're now at this state:

| job         | account set  | block range     |
|-------------|--------------|-----------------|
| Forward     | [alice, bob] | waiting for #11 |
| Backfill #1 | [bob]        | [1, 10]         |

The new job starts immediately, in reverse order.
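In code, the state from the table above might be modelled like this. It is a simplified, hypothetical model. `ForwardJob`, `BackfillJob` and `add_address` are illustrative names, and block numbers start at 1 as in the example. Adding an address extends the forward job's account set. It also queues a backfill covering every block the forward job has already passed, walked newest-first.

```python
from dataclasses import dataclass


@dataclass
class ForwardJob:
    addresses: set
    next_block: int  # the block the forward worker is waiting for


@dataclass
class BackfillJob:
    addresses: frozenset
    low: int   # inclusive
    high: int  # inclusive

    def blocks(self):
        """Walk the range newest-first, i.e. in reverse order."""
        return range(self.high, self.low - 1, -1)


def add_address(forward: ForwardJob, backfills: list, address: str) -> None:
    """Match `address` on new blocks and queue a backfill for the blocks already passed."""
    if address in forward.addresses:
        return
    forward.addresses.add(address)
    last_synced = forward.next_block - 1
    if last_synced >= 1:
        backfills.append(BackfillJob(frozenset({address}), 1, last_synced))


# The example above: alice is indexed, the forward job waits for #11, then bob joins.
forward = ForwardJob({"alice"}, next_block=11)
backfills = []
add_address(forward, backfills, "bob")
```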

A few moments later, carol's address joins too. By now both existing jobs have advanced a bit:

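| job         | account set  | block range     | notes                                       |
|-------------|--------------|-----------------|---------------------------------------------|
| Forward     | [alice, bob] | waiting for #16 |                                             |
| Backfill #1 | [bob]        | [1, 5]          | We've synced from 10 to 6 in the meantime   |
| Backfill #2 | [carol]      | [1, 15]         |                                             |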

The naive approach would be to start the new job right away and run all three concurrently. This has one drawback, though: both backfill jobs would fetch the same blocks (1 through 5).

Instead of starting right away, we run a reorganization step:

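| job         | account set  | block range     | notes                                     |
|-------------|--------------|-----------------|-------------------------------------------|
| Forward     | [alice, bob] | waiting for #16 |                                           |
| Backfill #3 | [bob, carol] | [1, 5]          | The overlapping range in one job...       |
| Backfill #4 | [carol]      | [6, 15]         | ...and carol's unique range in another    |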

This ensures we never fetch the same block twice, keeping IO as low as possible.
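The reorganization step could be computed with a sweep over range boundaries. The block axis is cut at every job's start and one past every job's end. Each resulting segment gets the union of the account sets of the backfill jobs covering it. Adjacent segments with identical sets are then merged back together. This is one possible approach, not necessarily the one iron-indexer uses, and `Job` and `reorganize` are hypothetical names. It only covers backfill jobs; the forward job is left alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    addresses: frozenset
    low: int   # inclusive
    high: int  # inclusive


def reorganize(jobs):
    """Rewrite backfill jobs so that no block appears in more than one job."""
    # Each job's start, and the block right after each job's end, is a cut point.
    # Between two consecutive cuts, every job either covers the whole segment or none of it.
    cuts = sorted({j.low for j in jobs} | {j.high + 1 for j in jobs})
    segments = []
    for low, next_cut in zip(cuts, cuts[1:]):
        high = next_cut - 1
        covering = [j.addresses for j in jobs if j.low <= low and high <= j.high]
        if not covering:
            continue  # gap between jobs: nothing to fetch
        addrs = frozenset().union(*covering)
        prev = segments[-1] if segments else None
        if prev is not None and prev.addresses == addrs and prev.high + 1 == low:
            segments[-1] = Job(addrs, prev.low, high)  # merge with the previous segment
        else:
            segments.append(Job(addrs, low, high))
    return segments


# The example above: bob still needs [1, 5], carol needs [1, 15].
reorganized = reorganize([
    Job(frozenset({"bob"}), 1, 5),
    Job(frozenset({"carol"}), 1, 15),
])
```

This reproduces the table above: one job for [bob, carol] over [1, 5] and one for [carol] over [6, 15].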

Cuckoo filters

We use Cuckoo filters to efficiently test whether data should be included. They work much like Bloom filters, with additional benefits such as the ability to remove items and lower space overhead. The particular implementation being used also supports automatic scaling.
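To make the idea concrete, here is a minimal cuckoo filter using the standard partial-key cuckoo hashing scheme. It is not the implementation iron-indexer uses; that one scales automatically, which this sketch does not. The bucket count, bucket size, 16-bit fingerprints, BLAKE2b hashing and the overflow `stash` are all arbitrary choices made for illustration.

```python
import hashlib
import random


def _hash64(data: bytes) -> int:
    """64-bit hash; BLAKE2b is an arbitrary choice for this sketch."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count makes the XOR in _alt_index its own inverse.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.stash = []  # fingerprints that found no slot after max_kicks evictions
        self.rng = random.Random(0)

    def _fingerprint(self, item: bytes) -> int:
        return _hash64(b"fp:" + item) & 0xFFFF  # 16-bit fingerprint

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on (index, fp).
        return index ^ (_hash64(fp.to_bytes(2, "big")) & self.mask)

    def _locate(self, item: bytes):
        fp = self._fingerprint(item)
        i1 = _hash64(item) & self.mask
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: bytes) -> bool:
        """Add item; returns False when the filter is overloaded and should grow."""
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random resident fingerprint and move it to its other bucket.
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        self.stash.append(fp)  # keep it, so lookups never give false negatives
        return False

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2] or fp in self.stash

    def remove(self, item: bytes) -> bool:
        """Remove one copy of item's fingerprint (only valid for inserted items)."""
        fp, i1, i2 = self._locate(item)
        for bucket in (self.buckets[i1], self.buckets[i2], self.stash):
            if fp in bucket:
                bucket.remove(fp)
                return True
        return False
```

Lookups can return false positives but never false negatives. An indexer could therefore use such a filter as a cheap pre-check and confirm any hit against an exact set. Unlike a Bloom filter, `remove` lets an address be dropped again without rebuilding the filter.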

Future Work

To be done next

  • Finish the API
  • Add EIP-712 based authentication
  • Document this a bit better
  • Benchmark on a real mainnet node

Future optimizations

A few potential optimizations are yet to be done, and should improve throughput even further:

  • Split workers into producers and consumers. Currently, workers alternate between fetching a block and processing it, which is not optimal for IO. (Question: is this worth it, or can we just saturate read capacity by setting up more workers?)
  • Work-stealing. If we have a single backfill job walking N blocks, we can split it into Y jobs of N/Y blocks each. This can be done directly in the reorganization step.
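The split described in the work-stealing bullet could look like the sketch below. It is hypothetical, since the feature is not implemented yet. `split_range` is an illustrative name, and it simply divides an inclusive block range into contiguous chunks whose sizes differ by at most one block.

```python
def split_range(low: int, high: int, parts: int):
    """Split the inclusive range [low, high] into up to `parts` contiguous chunks."""
    if high < low:
        return []
    total = high - low + 1
    parts = max(1, min(parts, total))  # never create empty chunks
    size, extra = divmod(total, parts)
    chunks = []
    start = low
    for i in range(parts):
        # The first `extra` chunks take one extra block each.
        end = start + size - 1 + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end + 1
    return chunks
```

For example, `split_range(1, 10, 3)` yields `[(1, 4), (5, 7), (8, 10)]`. Each chunk would become its own backfill job with the same account set, so running it inside the reorganization step keeps all ranges disjoint.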

Benchmarks

🚧 TODO 🚧

Requirements

  • A reth node running on the same machine (the indexer needs access to the same filesystem)
  • PostgreSQL

License

MIT License

Contributors

naps62