near-indexer-for-explorer's Issues

Strict mode

We are running Indexer for Explorer for testnet, started from its genesis. We have logic to skip receipts whose parent transaction we don't have. While missing parents are expected when starting from some point after genesis, they shouldn't occur when running from genesis: there should be no missing transactions.

We suppose this happens because we spawn handlers concurrently and end up in a situation where the handler that should add a transaction hasn't completed yet (hasn't stored the transaction to the database) while a newer handler is already trying to add a receipt that is a child of that transaction. We then skip the receipt, which is the wrong behaviour in this case.

Also, we're planning to run a couple of indexer nodes: at least one to index from genesis and another one to index in real time. So always-on retry logic for those receipts is not the solution we're looking for.

We want to add some sort of strict mode (enabled by default) in which retry logic for receipts is enabled; turning strict mode off disables the retries.

We came up with the idea of introducing an additional command-line argument that disables strict mode for a specific number of blocks (for the case when we run one node from genesis and another one in real time, so the latter can continue and not get stuck on such receipts).

It could look like:

--allow-missing-relations-in-first-blocks 1000

feat: Add command arg to choose SyncMode for Streamer

Currently the streamer's SyncMode is hardcoded and can only be changed before compilation, but we want to be able to pass it as an argument. This will allow us to run the same binary on different machines with different settings (e.g. one for real-time indexing and another for historical data).

How should we calculate circulating supply?

Circulating supply is total supply minus locked tokens.
Not all tokens on the lockup contract accounts are actually locked: some of them could be withdrawn by the owner, and that is the main difficulty here.

My task here is to:

  • Read previous attempts at solving this task (see below)
  • Understand how we should calculate the amount of locked tokens for account X
  • Check the numbers on historical data (October 2020 - March 2021)
  • Talk with our finance team if my numbers differ significantly from theirs
  • Then, go to this issue and write yet another solution for it

Previous attempts

Useful materials

Update README for 0.2.0

Update README with relevant information for version 0.2.0

Add details about dependencies and requirements and how to run the indexer, and update the database schema with the latest changes.

indexer for tracking transfers

Integration partners would like to have an indexer that can track all receipts that contain transfers to a specific set of accounts.

Improve `sync-from-interruption` mode

In order to automate Indexer for Explorer CI/CD, we want the indexer to be able to decide from which block it should start indexing. We already have the sync-from-interruption mode, but in our case we want to start indexing from N blocks earlier than the block at which the indexer was interrupted.

To avoid adding unnecessary features on the NEAR Indexer Framework side it was suggested to handle this on the Indexer for Explorer side.

We want to perform a trick when sync-from-interruption mode is passed:

  • read the explorer database to get the height of the latest block we know (LATEST; see the query below)
  • calculate START_FROM = LATEST - N
  • start the indexer, replacing sync-from-interruption with sync-from-block --height START_FROM under the hood
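Step one is a single query; a minimal sketch, assuming the blocks table keeps the height in a column named block_height (it may still be named height in older schema versions):

-- LATEST: the height of the latest block stored by the indexer
select max(block_height) as latest from blocks;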

This will allow us to automate indexer restarts in case of any failures.

/cc @chefsale

refactor: Update handling of transactions

The 0.2.x NEAR Indexer Framework returns IndexerChunkView with transactions of type IndexerTransactionWithOutcome, which includes the transaction itself and its ExecutionOutcome.

@frol proposes to add a receipt_id field to the transactions table, as a transaction's ExecutionOutcome always includes the id of the receipt the transaction was converted to.

Also, we have the status of the transaction execution right here; I think we can add it to this table as well, @frol?

Duplicated rows in DB

Try executing the following query:

select * from account_changes 
where account_changes.changed_in_block_hash = 'CFAAJTVsw5y4GmMKNmuTNybxFJtapKcrarsTh5TPUyQf';

Expected: 5 rows
Actual: 30 rows, which can easily be grouped into 5

A similar request to the RPC:


POST to https://archival-rpc.mainnet.near.org with body:
{
  "jsonrpc": "2.0",
  "id": "dontcare",
  "method": "EXPERIMENTAL_changes",
  "params": {
    "changes_type": "account_changes",
    "account_ids": ["nfvalidator1.near", "nfvalidator2.near", "nfvalidator3.near", "nfvalidator4.near", "near", "erm.near", "treasury.near"],
     "block_id": 9820214
  }
}

The result gives 5 rows.

Looks like we have an error in the unique constraint: we do not handle NULL values properly. See this article for more details:
https://www.enterprisedb.com/postgres-tutorials/postgresql-unique-constraint-null-allowing-only-one-null
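For illustration: in Postgres, a UNIQUE constraint treats NULLs as distinct, so rows that differ only in a NULL column all pass the constraint. A hedged sketch of one workaround — a unique index over an expression that maps NULL to a sentinel (the column list here is illustrative, not the actual constraint):

CREATE UNIQUE INDEX account_changes_uniq_idx
ON account_changes (
    affected_account_id,
    changed_in_block_hash,
    COALESCE(caused_by_receipt_id, '')
);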

[Icebox] Index function call arguments

Currently, function call arguments are stored in the transaction/receipt action args JSONB column in base64 encoding, so it is impossible to query them from SQL, since the FUNCTION_CALL action has the following structure:

{
  "method_name": "ping",
  "args_base64": "e30="
}
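Strictly for ad-hoc inspection, Postgres can decode the args at query time (far too slow to serve real queries, since nothing is indexed); a sketch, with the table name assumed:

-- decode args_base64 into readable JSON on the fly
SELECT convert_from(decode(args->>'args_base64', 'base64'), 'UTF8')::jsonb
FROM action_receipt_actions  -- assumed table name
WHERE args->>'method_name' = 'ping'
LIMIT 10;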

We may consider indexing those function calls which use JSON args into a separate table to enable querying their internals.

Alternatively, we could consider building a special service like TheGraph for custom filters/indexers.

Some rows are missing because of transaction hash collisions

In the accounts table, we have an implicit account that should still exist: ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088

But we do not have it in the RPC; here is the request body:

{
  "jsonrpc": "2.0",
  "id": "dontcare",
  "method": "query",
  "params": {
    "request_type": "view_account",
    "block_id": 33039470,
    "account_id": "ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088"
  }
}

More indices and renaming

The general theme about indices: add indices to most of the foreign key and compound primary key fields. Without these indices, most of the queries take at least 100ms and can be as bad as several seconds at the moment.

The field names need to be consistent and help to reason about the relations. Let's take an example: hash in the blocks table is probably fine, but when we have block_hash, chunk_hash, and hash in the transactions table it gets nasty, so I suggest we always indicate which hash we store; it would probably be even more helpful if we renamed block_hash in the transactions table to included_in_block_hash. What do you think? /cc @khorolets @icerove

blocks

  • rename height -> block_height
  • rename hash -> block_hash
  • rename prev_hash -> prev_block_hash
  • rename timestamp -> block_timestamp

chunks

  • index block_hash (#36)
  • rename hash -> chunk_hash

transactions

  • index block_hash (+ consider included_in_block_hash renaming applied to all the relevant use-cases)
  • index chunk_hash
  • index signer_account_id
  • index receiver_account_id
  • index public_key (+ rename to signer_public_key - double-check it)

transaction_actions

  • index transaction_hash
  • rename index to index_in_transaction

receipts

  • index block_hash
  • index chunk_hash
  • index predecessor_account_id
  • index receiver_account_id

receipt_data

  • index receipt_id (we need to rename it to something meaningful)

receipt_actions (meh, I need another go over this naming as I get constantly confused by it 😞)

  • consider index signer_account_id (I am not sure about its use at the moment)

receipt_action_actions

  • index receipt_id

receipt_action_output_data

  • index data_id (added in some recent PR)
  • index receipt_id
  • consider index receiver_account_id (I am not sure about its use at the moment)

receipt_action_input_data

  • index data_id
  • index receipt_id

execution_outcomes

  • index block_hash

Please suggest more if you encounter any.
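For reference, the renames above are plain column renames in a migration; a minimal sketch for the blocks table (the other tables follow the same pattern):

ALTER TABLE blocks RENAME COLUMN height TO block_height;
ALTER TABLE blocks RENAME COLUMN hash TO block_hash;
ALTER TABLE blocks RENAME COLUMN prev_hash TO prev_block_hash;
ALTER TABLE blocks RENAME COLUMN timestamp TO block_timestamp;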

Add index on `chunks` table for `block_hash`

Confirm that the following query becomes immediate (it currently takes around 2-4 minutes):

select * from chunks where block_hash = '51DkcKsnEfPgBwuBCbNTTEfHDR6P8dXWN9r2PfYBXnRE' limit 1;
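A minimal migration sketch to make it so (the index name is illustrative):

CREATE INDEX chunks_block_hash_idx ON chunks (block_hash);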

refactor: Handle ExecutionOutcomes

Since the 0.2.x NEAR Indexer Framework, we receive receipts' execution outcomes separately from transaction ones (see #3), so we need to handle ExecutionOutcomes properly (in a separate table, which should contain only outcomes for receipts).
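As a sketch only — all column names here are hypothetical, pending the actual design — the dedicated table could look something like:

-- hypothetical layout for receipt execution outcomes
CREATE TABLE execution_outcomes (
    receipt_id   text PRIMARY KEY,
    block_hash   text NOT NULL,
    gas_burnt    numeric NOT NULL,
    tokens_burnt numeric NOT NULL,
    status       text NOT NULL
);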

Do we miss some balance changes?

I sum up non-staked and staked balances based on the latest state change per account:

SELECT
    SUM(affected_account_nonstaked_balance),
    SUM(affected_account_staked_balance)
FROM (
    SELECT DISTINCT ON (affected_account_id) affected_account_nonstaked_balance, affected_account_staked_balance
    FROM account_changes
    ORDER BY affected_account_id, changed_in_block_timestamp DESC
) as t;

And here are the values:

                sum                |                sum
-----------------------------------+-----------------------------------
 617515305435595972472724261340813 | 402992430818950819988234479163231
(1 row)

The problem here is that this exceeds the total supply. Note that there are three accounts created at genesis (bo.near, registrar, wallet.pulse.near) which have never been touched, so they are not in the account_changes table (they hold just 60 NEAR).

>>> total_supply
1019905212545170492542259392171734

>>> 617515305435595972472724261340813 + 402992430818950819988234479163231
1020507736254546792460958740504044

>>> 617515305435595972472724261340813 + 402992430818950819988234479163231 + (50000000000000000000000000 + 10000000000000000000000000 + 999899913398562500000000)
1020507797254446705859521240504044

The difference is 602.5k NEAR

Any ideas? @telezhnaya @bowenwang1996

Setup continuous deployment

Here are the problems we currently have:

  • There is no automation around provisioning and auto-start for the Indexer nodes, so we manually configure the nodes and maintain their uptime
  • Sometimes indexer nodes fall out of sync due to a protocol upgrade to a version that is not supported by the nearcore we pulled in as a dependency

Currently, we run Indexer for Explorer manually on GCP nodes in a tmux session, and we bump the nearcore version once in a while, mostly to a recent nearcore commit instead of a proper release matching the rest of the network.

We need to follow the same provisioning strategy as we have for the archival RPC nodes. If we automate this process, we should be able to manually bump the version of nearcore in Cargo.toml, go through CI, merge to master, and deploy the new release to the nodes. Ideally, we should use separate releases for testnet, mainnet, and betanet deployments.

P.S. We don't have a betanet deployment at all, so we can start with it to debug any issues along the way (it will unblock near/near-wallet#1510).

Incorrect gas usage

Currently we display block gas usage by summing up the gas used from the chunks included in the block. However, the result is the gas used for the previous block. To correctly calculate gas used for the current block, we should find, for each shard, the next chunk that is included in some block and take the gas used there. Alternatively, if we have the indexer ready, we can index gas used from the result of applying the chunk and sum it up to get the total gas used for the block.
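A hedged sketch of the per-shard approach in SQL, assuming the chunks table carries shard_id, gas_used, and block columns with the names used below (these are assumptions, not the confirmed schema):

-- attribute to each block the gas used by the *next* chunk on each shard
SELECT included_in_block_hash, SUM(next_gas_used) AS block_gas_used
FROM (
    SELECT
        included_in_block_hash,
        LEAD(gas_used) OVER (
            PARTITION BY shard_id
            ORDER BY included_in_block_timestamp
        ) AS next_gas_used
    FROM chunks
) next_chunks
GROUP BY included_in_block_hash;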

Look up the holders of the NEAR tokens to understand if those match the expectations

There are several groups of token holders:

  • Foundation
  • Exchanges
  • Investors (individual or otherwise)

I wonder if "whales" group is reasonable: https://near.flipsidecrypto.com/

"whales" are accounts owning 100k+ NEAR (on Flipside dashboard, the delegated tokens are properly mapped to the owners, but let's ignore that fact, let's assume that all the staked tokens are delegated from whales, and currently 400M NEAR are staked [scroll to the bottom])

refactor: Extract AccessKeys from receipts (like we do in Indexer for Wallet)

We want to store AccessKey actions separately so we can query them in the future, like we do in Indexer for Wallet. In addition to the Indexer for Wallet logic, we also store receipts as-is (see #2). That's why we want to introduce some sort of handler we can attach to receipt processing to provide the additional logic.

Need to discuss with @frol before starting

Visualize the database structure relations

We discussed this and already had a draft of the visualization.

Send a PR with the change to the README (put the picture itself into the migrations folder or create a dedicated folder like docs in the root of the project -- both solutions look suboptimal to me, so if you have a better option, go for it).

Keep the `created_by_receipt_id` for implicit accounts pointing to the first TRANSFER action rather than updating it on every TRANSFER action

@khorolets we had this conversation before, but it seems that the current behavior confuses end-users (near/near-explorer#551). Updating the creation receipt id for implicit accounts causes confusion as to why there are transactions dated before the "creation" transaction. In the common case, people would expect the very first transaction to be the one that causes the creation. In our case, we should make sure that we handle account deletion properly: we should only update the creation receipt id when the deletion receipt id is not NULL (in which case we want to set the deletion receipt id to NULL and update the creation receipt id, as sketched below).
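A minimal sketch of that rule as an update guard, assuming the accounts columns named in this repo (the parameter names are illustrative):

-- re-point creation only if the implicit account was actually deleted
UPDATE accounts
SET
    created_by_receipt_id = :transfer_receipt_id,
    deleted_by_receipt_id = NULL
WHERE
    account_id = :implicit_account_id
    AND deleted_by_receipt_id IS NOT NULL;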

`accounts.last_update_block_height` does not show corresponding data

Just checked on the data:
accounts.last_update_block_height equals the height of the block with accounts.deleted_by_receipt_id if the account was deleted; otherwise, it equals the height of the block with accounts.created_by_receipt_id.
Intuitively, I expected the block height of the last update on this account. By the way, I am not sure we need this info, since it would produce many updates in the DB.

I suggest getting rid of this column and adding columns created_at_block_height and deleted_at_block_height.
We have to make 4 joins to find this info right now 😢

account_changes table: fill caused_by_transaction_hash field

If caused_by_receipt_id is filled, it means that we have a parent transaction. Even if it's not the direct parent, it could be extremely useful to see the transaction_hash.

I usually copy caused_by_receipt_id, run a select on the receipts table, take the transaction_hash, and then analyse the transaction. It would be cool to automate that (see the query below).
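The manual lookup above is a single join; a sketch, assuming the receipts table exposes the parent transaction hash (the exact column name may differ):

SELECT receipts.transaction_hash
FROM account_changes
JOIN receipts ON receipts.receipt_id = account_changes.caused_by_receipt_id
WHERE account_changes.caused_by_receipt_id = '<receipt id>';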

Denormalize `action_receipt_actions` table in order to improve common queries performance

  • add columns to the action_receipt_actions table and enable indices for them:

    • predecessor_account_id
    • receiver_account_id
    • included_in_block_timestamp

    The motivation is that JOINing with the receipts table while filtering like receiver_account_id = ? AND action_kind = ? AND args = ? is slow. @vgrichina Could you provide a minimal useful query?

    Ideally, we want cross-table indices, but Postgres does not support that. Postgres 12 and 13 claim to improve various aspects of index performance, though, so we may want to migrate our Postgres 11 database to Postgres 13 to see if that helps.

  • add an index on args->>'method_name' for FUNCTION_CALL actions (a conditional index in Postgres; see the sketch after this list)

  • try to decode the base64-encoded args->>'args_base64' into JSON (replacing the original base64 field with args->>'args_json' if parsing succeeds), so ultimately we will have either args_base64 (e.g. the Rainbow Bridge uses Borsh instead of JSON args) or args_json (commonly used by NEAR smart contracts)

    • consider adding indices on args->'args_json'->>'receiver_id' and args->'args_json'->>'amount' for FT and NFT
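A sketch of the conditional (partial) index from the second bullet above, using the action_kind column referenced earlier in this issue:

CREATE INDEX action_receipt_actions_method_name_idx
ON action_receipt_actions ((args->>'method_name'))
WHERE action_kind = 'FUNCTION_CALL';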

bug: deleted_by_receipt_id for access_keys of a deleted account should not be null

@vgrichina asked in Discord

I wonder what the expected behavior of the indexer is when an account is deleted?

I've just tried creating test-indexer-keys.vg with a single key (ed25519:FUMSeS84AsKAq8ydAWiE5io1CxC9NSdAW74EGPyxmKPL).

Then I deleted the account; however, deleted_by_receipt_id is still null. Is this by design?

I wonder if there is any way to determine from the indexer DB whether an access key is still active.

I have clarified the flow with @frol, and we agreed that the Indexer should set deleted_by_receipt_id for the access keys of the deleted account to the receipt id which deleted the account.

It will require an additional re-index from genesis to backfill the field for the missing access keys (on mainnet it will take ~3 days to complete).
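A minimal sketch of the agreed behavior at DeleteAccount handling time (parameter names are illustrative; the genesis re-index would backfill historical rows the same way):

-- mark all keys of the deleted account with the deleting receipt
UPDATE access_keys
SET deleted_by_receipt_id = :delete_account_receipt_id
WHERE account_id = :deleted_account_id
  AND deleted_by_receipt_id IS NULL;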

Resolve conflicting/duplicating CreateAccount/DeleteAccount actions

The problem is two-fold:

  1. Use the logic we have for access keys to resolve multiple updates to the same account (when getting multiple CreateAccount / DeleteAccount actions in a single chunk, we need to squash those changes into a single INSERT or UPDATE query)
  2. Come up with a solution to make the updates/inserts idempotent, to avoid the situation where two concurrent nodes may overwrite each other's changes

My proposed solution is to introduce last_update_block_height into the accounts/access_keys tables so we can rely on it. Instead of doing:

UPDATE access_keys
SET
    deleted_at_receipt_id = 'abcd'
WHERE
    account_id = 'frol.near'
    AND
    public_key = 'ed25519:...'

we do

UPDATE access_keys
SET
    deleted_at_receipt_id = 'abcd',
    last_update_block_height = :current_block_height
WHERE
    account_id = 'frol.near'
    AND
    public_key = 'ed25519:...'
    AND
    last_update_block_height < :current_block_height

This way, the query will successfully update the record if the last update happened before the current block height, and will silently skip the update if we are behind the latest state in the database.

NOTE: I have not put any thought into the INSERT ... ON CONFLICT statements that we need when we create the account, so you will need to design that part of the solution yourself.
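For illustration only (the note above deliberately leaves this open), one possible shape of an idempotent upsert built on the same last_update_block_height guard; the conflict target assumes (account_id, public_key) is the primary key:

INSERT INTO access_keys
    (account_id, public_key, created_by_receipt_id, last_update_block_height)
VALUES
    ('frol.near', 'ed25519:...', 'abcd', :current_block_height)
ON CONFLICT (account_id, public_key) DO UPDATE
SET
    created_by_receipt_id = EXCLUDED.created_by_receipt_id,
    deleted_at_receipt_id = NULL,
    last_update_block_height = EXCLUDED.last_update_block_height
WHERE
    access_keys.last_update_block_height < EXCLUDED.last_update_block_height;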

Investigate missing chunks

While syncing testnet from genesis, we've encountered a lot of retries for storing receipts; they are infinite because the receipts can't find their parents. Investigating the reason showed that the parent transactions for these receipts were not stored, and then we found out that the chunk which contains those parent transactions hadn't been added to the database.

For example, the chunk with hash H6MvjzT6JtqzuJiPNGennqBDhtqXfDcb2e8cL5NLNcgU (block height 22303582).

We need to investigate why the Indexer hasn't stored this chunk and similar ones, and fix it.

/cc @frol

Weird balance changes in the moment of VALIDATOR_ACCOUNTS_UPDATE

At the moment of the reward, the liquid amount for jazza.poolv1.near rose dramatically, and the stake decreased. I can't explain the process behind either of these changes. @bowenwang1996 could you please help?

select * from account_changes
where account_changes.changed_in_block_timestamp > 1615775643106397811 AND account_changes.changed_in_block_timestamp < 1615812599676806812 
AND account_changes.affected_account_id = 'jazza.poolv1.near'
order by account_changes.changed_in_block_timestamp;
           liquid            |              stake
-----------------------------+---------------------------------
 203513788946994521008902241 | 3277075738533977555810040143262
 741399745512700845188547240 | 3277075737545599543519640143262

Improve Access Key serialization in AddKey actions

Currently, we store the serialized action argument with AccessKeyView as:

{
  "access_key": {
    "nonce": 0,
    "permission": "FullAccess"
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

and

{
  "access_key": {
    "nonce": 0,
    "permission": {
      "FunctionCall": {
        "allowance": "100500",
        "receiver_id": "contract-name",
        "method_names": ["..."]
      }
    }
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

Notice that the permission value can be either a string or an object. We should have a unified structure using the serde tagged enum representation (and I would also prefer the constant strings to be in SCREAMING_SNAKE_CASE):

{
  "access_key": {
    "nonce": 0,
    "permission_kind": "FULL_ACCESS"
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

and

{
  "access_key": {
    "nonce": 0,
    "permission_kind": "FUNCTION_CALL",
    "permission_details": {
      "allowance": "100500",
      "receiver_id": "contract-name",
      "method_names": ["..."]
    }
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

The following config should do the trick:

#[serde(tag = "permission_kind", content = "permission_details", rename_all = "SCREAMING_SNAKE_CASE")]

However, we cannot apply it directly to the PermissionView enum, so we will need to duplicate some code and implement the From trait to convert the nearcore primitive structure into our custom Indexer implementation.
