near-indexer-for-explorer's Issues

Strict mode

We are running Indexer for Explorer for testnet, started from its genesis. We have logic to skip receipts whose parent transaction we don't have. While missing parents are expected when starting from some point after genesis, they shouldn't occur when running from genesis: there should be no missing transactions.

We suppose this happens because we spawn handlers concurrently and end up in a situation where the handler that should add a transaction hasn't completed yet (hasn't stored the transaction to the database) while a newer handler is already trying to add a receipt that is a child of that transaction. We then skip the receipt, which is the wrong behaviour in this case.

Also, we're planning to run a couple of indexer nodes: at least one to index from genesis and another one to index in real time. So always-on retry logic for those receipts is not the solution we're looking for.

We want to add some sort of strict mode (enabled by default) in which retry logic for receipts is enabled; turning strict mode off disables the retries.

We came up with the idea of introducing an additional command-line argument that disables strict mode for a specific number of blocks (for the case when we run one node from genesis and another one in real time, so the latter can continue and not get stuck on such receipts).

It could look like:

--allow-missing-relations-in-first-blocks 1000

feat: Add command arg to choose SyncMode for Streamer

Currently the streamer's SyncMode is hardcoded and can only be changed before compilation, but we want to be able to pass it as an argument. This will allow us to run the same binary on different machines with different settings (e.g. one for real-time indexing and another for historical data).

How should we calculate circulating supply?

Circulating supply is total supply minus locked tokens.
Not all tokens on the lockup contract accounts are actually locked: some of them could be withdrawn by the owner, and that is the main difficulty here.

My task here is to:

  • Read previous attempts at solving this task (see below)
  • Understand how we should calculate the amount of locked tokens for account X
  • Check the numbers on historical data (October 2020 - March 2021)
  • Talk with our finance team if my numbers differ significantly from theirs
  • Then, go to this issue and write yet another solution for it

Previous attempts

Useful materials

Update README for 0.2.0

Update README with relevant information for version 0.2.0

Add details about dependencies and requirements and how to run the indexer, and update the database schema with the latest changes.

indexer for tracking transfers

Integration partners would like to have an indexer that can track all receipts that contain transfers to a specific set of accounts.

Improve `sync-from-interruption` mode

In order to automate Indexer for Explorer CI/CD, we want the indexer to be able to decide from which block it should start indexing. We already have the sync-from-interruption mode, but in our case we want to start indexing from N blocks earlier than the block at which the indexer was interrupted.

To avoid adding unnecessary features on the NEAR Indexer Framework side it was suggested to handle this on the Indexer for Explorer side.

We want to perform a trick when sync-from-interruption mode is passed:

  • read the explorer database to get the height of the latest block we know (LATEST; see the query below)
  • calculate START_FROM = LATEST - N
  • start the indexer, replacing sync-from-interruption with sync-from-block --height START_FROM under the hood
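Step one is a single query; a minimal sketch, assuming the blocks table keeps the height in a column named block_height (it may still be named height in older schema versions):

-- LATEST: the height of the latest block stored by the indexer
select max(block_height) as latest from blocks;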

This will allow us to automate indexer restarts in case of any failures.

/cc @chefsale

refactor: Update handling of transactions

The 0.2.x NEAR Indexer Framework returns IndexerChunkView with transactions of type IndexerTransactionWithOutcome, which includes the transaction itself and its ExecutionOutcome.

@frol proposes to add a receipt_id field to the transactions table, as a transaction's ExecutionOutcome always includes the id of the receipt the transaction was converted to.

Also, we have the status of the transaction execution right here; I think we can add it to this table as well, @frol?

Duplicated rows in DB

Try executing the following query:

select * from account_changes 
where account_changes.changed_in_block_hash = 'CFAAJTVsw5y4GmMKNmuTNybxFJtapKcrarsTh5TPUyQf';

Expected: 5 rows
Actual: 30 rows, which can easily be grouped into 5

A similar request to the RPC:


POST to https://archival-rpc.mainnet.near.org with body:
{
  "jsonrpc": "2.0",
  "id": "dontcare",
  "method": "EXPERIMENTAL_changes",
  "params": {
    "changes_type": "account_changes",
    "account_ids": ["nfvalidator1.near", "nfvalidator2.near", "nfvalidator3.near", "nfvalidator4.near", "near", "erm.near", "treasury.near"],
     "block_id": 9820214
  }
}

The result gives 5 rows.

Looks like we have an error in the unique constraint: we do not handle NULL values properly. See this article for more details:
https://www.enterprisedb.com/postgres-tutorials/postgresql-unique-constraint-null-allowing-only-one-null
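For illustration: in Postgres, a UNIQUE constraint treats NULLs as distinct, so rows that differ only in a NULL column all pass the constraint. A hedged sketch of one workaround — a unique index over an expression that maps NULL to a sentinel (the column list here is illustrative, not the actual constraint):

CREATE UNIQUE INDEX account_changes_uniq_idx
ON account_changes (
    affected_account_id,
    changed_in_block_hash,
    COALESCE(caused_by_receipt_id, '')
);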

[Icebox] Index function call arguments

Currently, function call arguments are stored in the transaction/receipt action args JSONB column in base64 encoding, so it is impossible to query them from SQL, since the FUNCTION_CALL action has the following structure:

{
  "method_name": "ping",
  "args_base64": "e30="
}
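Strictly for ad-hoc inspection, Postgres can decode the args at query time (far too slow to serve real queries, since nothing is indexed); a sketch, with the table name assumed:

-- decode args_base64 into readable JSON on the fly
SELECT convert_from(decode(args->>'args_base64', 'base64'), 'UTF8')::jsonb
FROM action_receipt_actions  -- assumed table name
WHERE args->>'method_name' = 'ping'
LIMIT 10;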

We may consider indexing those function calls which use JSON args into a separate table to enable querying their internals.

Alternatively, we could consider building a special service like TheGraph for custom filters/indexers.

Some rows are missing because of transaction hash collisions

In the accounts table, we have an implicit account that should still exist: ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088

But we do not have it in the RPC; here is the request body:

{
  "jsonrpc": "2.0",
  "id": "dontcare",
  "method": "query",
  "params": {
    "request_type": "view_account",
    "block_id": 33039470,
    "account_id": "ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088"
  }
}

More indices and renaming

The general theme about indices: add indices to most of the foreign key and compound primary key fields. Without these indices, most of the queries take at least 100ms and can be as bad as several seconds at the moment.

The field names need to be consistent and help to reason about the relations. Let's take an example: hash in the blocks table is probably fine, but when we have block_hash, chunk_hash, and hash in the transactions table it gets nasty, so I suggest we always indicate which hash we store; it would probably be even more helpful if we renamed block_hash in the transactions table to included_in_block_hash. What do you think? /cc @khorolets @icerove

blocks

  • rename height -> block_height
  • rename hash -> block_hash
  • rename prev_hash -> prev_block_hash
  • rename timestamp -> block_timestamp

chunks

  • index block_hash (#36)
  • rename hash -> chunk_hash

transactions

  • index block_hash (+ consider included_in_block_hash renaming applied to all the relevant use-cases)
  • index chunk_hash
  • index signer_account_id
  • index receiver_account_id
  • index public_key (+ rename to signer_public_key - double-check it)

transaction_actions

  • index transaction_hash
  • rename index to index_in_transaction

receipts

  • index block_hash
  • index chunk_hash
  • index predecessor_account_id
  • index receiver_account_id

receipt_data

  • index receipt_id (we need to rename it to something meaningful)

receipt_actions (meh, I need another go over this naming as I get constantly confused by it 😞)

  • consider index signer_account_id (I am not sure about its use at the moment)

receipt_action_actions

  • index receipt_id

receipt_action_output_data

  • index data_id (added in some recent PR)
  • index receipt_id
  • consider index receiver_account_id (I am not sure about its use at the moment)

receipt_action_input_data

  • index data_id
  • index receipt_id

execution_outcomes

  • index block_hash

Please suggest more if you encounter any.
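For reference, the renames above are plain column renames in a migration; a minimal sketch for the blocks table (the other tables follow the same pattern):

ALTER TABLE blocks RENAME COLUMN height TO block_height;
ALTER TABLE blocks RENAME COLUMN hash TO block_hash;
ALTER TABLE blocks RENAME COLUMN prev_hash TO prev_block_hash;
ALTER TABLE blocks RENAME COLUMN timestamp TO block_timestamp;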

Add index on `chunks` table for `block_hash`

Confirm that the following query becomes immediate (it currently takes around 2-4 minutes):

select * from chunks where block_hash = '51DkcKsnEfPgBwuBCbNTTEfHDR6P8dXWN9r2PfYBXnRE' limit 1;
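A minimal migration sketch to make it so (the index name is illustrative):

CREATE INDEX chunks_block_hash_idx ON chunks (block_hash);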

refactor: Handle ExecutionOutcomes

Since the 0.2.x NEAR Indexer Framework, we receive receipts' execution outcomes separately from transaction ones (see #3), so we need to handle ExecutionOutcomes properly (in a separate table, which should contain only outcomes for receipts).
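As a sketch only — all column names here are hypothetical, pending the actual design — the dedicated table could look something like:

-- hypothetical layout for receipt execution outcomes
CREATE TABLE execution_outcomes (
    receipt_id   text PRIMARY KEY,
    block_hash   text NOT NULL,
    gas_burnt    numeric NOT NULL,
    tokens_burnt numeric NOT NULL,
    status       text NOT NULL
);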

Do we miss some balance changes?

I sum up non-staked and staked balances based on the latest state change per account:

SELECT
    SUM(affected_account_nonstaked_balance),
    SUM(affected_account_staked_balance)
FROM (
    SELECT DISTINCT ON (affected_account_id) affected_account_nonstaked_balance, affected_account_staked_balance
    FROM account_changes
    ORDER BY affected_account_id, changed_in_block_timestamp DESC
) as t;

And here are the values:

                sum                |                sum
-----------------------------------+-----------------------------------
 617515305435595972472724261340813 | 402992430818950819988234479163231
(1 row)

The problem here is that this exceeds the total supply. Note that there are three accounts created at genesis (bo.near, registrar, wallet.pulse.near) which have never been touched, so they are not in the account_changes table (they hold just 60 NEAR).

>>> total_supply
1019905212545170492542259392171734

>>> 617515305435595972472724261340813 + 402992430818950819988234479163231
1020507736254546792460958740504044

>>> 617515305435595972472724261340813 + 402992430818950819988234479163231 + (50000000000000000000000000 + 10000000000000000000000000 + 999899913398562500000000)
1020507797254446705859521240504044

The difference is 602.5k NEAR

Any ideas? @telezhnaya @bowenwang1996

Setup continuous deployment

Here are the problems we currently have:

  • There is no automation around provisioning and auto-start for the Indexer nodes, so we manually configure the nodes and maintain their uptime
  • Sometimes indexer nodes fall out of sync due to a protocol upgrade to a version that is not supported by the nearcore we pulled in as a dependency

Currently, we run Indexer for Explorer manually on GCP nodes in a tmux session, and we bump the nearcore version once in a while, mostly to a recent nearcore commit instead of a proper release matching the rest of the network.

We need to follow the same provisioning strategy as we have for the archival RPC nodes. If we automate this process, we should be able to manually bump the version of nearcore in Cargo.toml, go through CI, merge to master, and deploy the new release to the nodes. Ideally, we should use separate releases for testnet, mainnet, and betanet deployments.

P.S. We don't have a betanet deployment at all, so we can start with it to debug any issues along the way (it will unblock near/near-wallet#1510).

Incorrect gas usage

Currently we display block gas usage by summing up the gas used from the chunks included in the block. However, the result is the gas used for the previous block. To correctly calculate gas used for the current block, we should find, for each shard, the next chunk that is included in some block and take the gas used there. Alternatively, if we have the indexer ready, we can index gas used from the result of applying the chunk and sum it up to get the total gas used for the block.
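A hedged sketch of the per-shard approach in SQL, assuming the chunks table carries shard_id, gas_used, and block columns with the names used below (these are assumptions, not the confirmed schema):

-- attribute to each block the gas used by the *next* chunk on each shard
SELECT included_in_block_hash, SUM(next_gas_used) AS block_gas_used
FROM (
    SELECT
        included_in_block_hash,
        LEAD(gas_used) OVER (
            PARTITION BY shard_id
            ORDER BY included_in_block_timestamp
        ) AS next_gas_used
    FROM chunks
) next_chunks
GROUP BY included_in_block_hash;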

Look up the holders of the NEAR tokens to understand if those match the expectations

There are several groups of token holders:

  • Foundation
  • Exchanges
  • Investors (individual or otherwise)

I wonder if "whales" group is reasonable: https://near.flipsidecrypto.com/

"whales" are accounts owning 100k+ NEAR (on Flipside dashboard, the delegated tokens are properly mapped to the owners, but let's ignore that fact, let's assume that all the staked tokens are delegated from whales, and currently 400M NEAR are staked [scroll to the bottom])

refactor: Extract AccessKeys from receipts (like we do in Indexer for Wallet)

We want to store AccessKey actions separately so we can query them in the future, like we do in Indexer for Wallet. In addition to the Indexer for Wallet logic, we also store receipts as-is (see #2). That's why we want to introduce some sort of handler we can attach to receipt processing to provide the additional logic.

Need to discuss with @frol before starting

Visualize the database structure relations

We discussed this and already had a draft of the visualization.

Send a PR with the change to the README (put the picture itself into the migrations folder or create a dedicated folder like docs in the root of the project -- both solutions look suboptimal to me, so if you have a better option, go for it).

Keep the `created_by_receipt_id` for implicit accounts pointing to the first TRANSFER action rather than updating it on every TRANSFER action

@khorolets we had this conversation before, but it seems that the current behavior confuses end-users (near/near-explorer#551). Updating the creation receipt id for implicit accounts causes confusion as to why there are transactions dated before the "creation" transaction. In the common case, people would expect the very first transaction to be the one that causes the creation. In our case, we should make sure that we handle account deletion properly: we should only update the creation receipt id when the deletion receipt id is not NULL (in which case we want to set the deletion receipt id to NULL and update the creation receipt id, as sketched below).
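A minimal sketch of that rule as an update guard, assuming the accounts columns named in this repo (the parameter names are illustrative):

-- re-point creation only if the implicit account was actually deleted
UPDATE accounts
SET
    created_by_receipt_id = :transfer_receipt_id,
    deleted_by_receipt_id = NULL
WHERE
    account_id = :implicit_account_id
    AND deleted_by_receipt_id IS NOT NULL;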

`accounts.last_update_block_height` does not show corresponding data

Just checked on the data:
accounts.last_update_block_height equals the height of the block with accounts.deleted_by_receipt_id if the account was deleted; otherwise, it equals the height of the block with accounts.created_by_receipt_id.
Intuitively, I expected the block height of the last update on this account. By the way, I am not sure we need this info, since it would produce many updates in the DB.

I suggest getting rid of this column and adding columns created_at_block_height and deleted_at_block_height.
We have to make 4 joins to find this info right now 😢

account_changes table: fill caused_by_transaction_hash field

If caused_by_receipt_id is filled, it means that we have a parent transaction. Even if it's not the direct parent, it could be extremely useful to see the transaction_hash.

I usually copy caused_by_receipt_id, run a select on the receipts table, take the transaction_hash, and then analyse the transaction. It would be cool to automate that (see the query below).
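The manual lookup above is a single join; a sketch, assuming the receipts table exposes the parent transaction hash (the exact column name may differ):

SELECT receipts.transaction_hash
FROM account_changes
JOIN receipts ON receipts.receipt_id = account_changes.caused_by_receipt_id
WHERE account_changes.caused_by_receipt_id = '<receipt id>';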

Denormalize `action_receipt_actions` table in order to improve common queries performance

  • add columns to the action_receipt_actions table and enable indices for them:

    • predecessor_account_id
    • receiver_account_id
    • included_in_block_timestamp

    The motivation is that JOINing with the receipts table while filtering like receiver_account_id = ? AND action_kind = ? AND args = ? is slow. @vgrichina Could you provide a minimal useful query?

    Ideally, we want cross-table indices, but Postgres does not support that. Postgres 12 and 13 claim to improve various aspects of index performance, though, so we may want to migrate our Postgres 11 database to Postgres 13 to see if that helps.

  • add an index on args->>'method_name' for FUNCTION_CALL actions (a conditional index in Postgres; see the sketch after this list)

  • try to decode the base64-encoded args->>'args_base64' into JSON (replacing the original base64 field with args->>'args_json' if parsing succeeds), so ultimately we will have either args_base64 (e.g. the Rainbow Bridge uses Borsh instead of JSON args) or args_json (commonly used by NEAR smart contracts)

    • consider adding indices on args->'args_json'->>'receiver_id' and args->'args_json'->>'amount' for FT and NFT
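A sketch of the conditional (partial) index from the second bullet above, using the action_kind column referenced earlier in this issue:

CREATE INDEX action_receipt_actions_method_name_idx
ON action_receipt_actions ((args->>'method_name'))
WHERE action_kind = 'FUNCTION_CALL';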

bug: deleted_by_receipt_id for access_keys of a deleted account should not be null

@vgrichina asked in Discord

I wonder what the expected behavior of the indexer is when an account is deleted?

I've just tried creating test-indexer-keys.vg with a single key (ed25519:FUMSeS84AsKAq8ydAWiE5io1CxC9NSdAW74EGPyxmKPL).

Then I deleted the account; however, deleted_by_receipt_id is still null. Is this by design?

I wonder if there is any way to determine from the indexer DB whether an access key is still active.

I have clarified the flow with @frol, and we agreed that the Indexer should set deleted_by_receipt_id for the access keys of the deleted account to the receipt id which deleted the account.

It will require an additional re-index from genesis to backfill the field for the missing access keys (on mainnet it will take ~3 days to complete).
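A minimal sketch of the agreed behavior at DeleteAccount handling time (parameter names are illustrative; the genesis re-index would backfill historical rows the same way):

-- mark all keys of the deleted account with the deleting receipt
UPDATE access_keys
SET deleted_by_receipt_id = :delete_account_receipt_id
WHERE account_id = :deleted_account_id
  AND deleted_by_receipt_id IS NULL;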

Resolve conflicting/duplicating CreateAccount/DeleteAccount actions

The problem is two-fold:

  1. Use the logic we have for access keys to resolve multiple updates to the same account (when getting multiple CreateAccount / DeleteAccount actions in a single chunk, we need to squash those changes into a single INSERT or UPDATE query)
  2. Come up with a solution to make the updates/inserts idempotent, to avoid the situation where two concurrent nodes may overwrite each other's changes

My proposed solution is to introduce last_update_block_height into the accounts/access_keys tables so we can rely on it. Instead of doing:

UPDATE access_keys
SET
    deleted_at_receipt_id = 'abcd'
WHERE
    account_id = 'frol.near'
    AND
    public_key = 'ed25519:...'

we do

UPDATE access_keys
SET
    deleted_at_receipt_id = 'abcd',
    last_update_block_height = :current_block_height
WHERE
    account_id = 'frol.near'
    AND
    public_key = 'ed25519:...'
    AND
    last_update_block_height < :current_block_height

This way, the query will successfully update the record if the last update happened before the current block height, and will silently skip the update if we are behind the latest state in the database.

NOTE: I have not put any thought into the INSERT ... ON CONFLICT statements that we need when we create the account, so you will need to design that part of the solution yourself.
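For illustration only (the note above deliberately leaves this open), one possible shape of an idempotent upsert built on the same last_update_block_height guard; the conflict target assumes (account_id, public_key) is the primary key:

INSERT INTO access_keys
    (account_id, public_key, created_by_receipt_id, last_update_block_height)
VALUES
    ('frol.near', 'ed25519:...', 'abcd', :current_block_height)
ON CONFLICT (account_id, public_key) DO UPDATE
SET
    created_by_receipt_id = EXCLUDED.created_by_receipt_id,
    deleted_at_receipt_id = NULL,
    last_update_block_height = EXCLUDED.last_update_block_height
WHERE
    access_keys.last_update_block_height < EXCLUDED.last_update_block_height;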

Investigate missing chunks

While syncing testnet from genesis, we've encountered a lot of retries for storing receipts; they are infinite because the receipts can't find their parents. Investigating the reason showed that the parent transactions for these receipts were not stored, and then we found out that the chunk which contains those parent transactions hadn't been added to the database.

For example, the chunk with hash H6MvjzT6JtqzuJiPNGennqBDhtqXfDcb2e8cL5NLNcgU (block height 22303582).

We need to investigate why the Indexer hasn't stored this chunk and similar ones, and fix it.

/cc @frol

Weird balance changes in the moment of VALIDATOR_ACCOUNTS_UPDATE

At the moment of the reward, the liquid amount for jazza.poolv1.near rose dramatically, and the stake decreased. I can't explain the process behind either of these changes. @bowenwang1996 could you please help?

select * from account_changes
where account_changes.changed_in_block_timestamp > 1615775643106397811 AND account_changes.changed_in_block_timestamp < 1615812599676806812 
AND account_changes.affected_account_id = 'jazza.poolv1.near'
order by account_changes.changed_in_block_timestamp;
           liquid            |              stake
-----------------------------+---------------------------------
 203513788946994521008902241 | 3277075738533977555810040143262
 741399745512700845188547240 | 3277075737545599543519640143262

Improve Access Key serialization in AddKey actions

Currently, we store the serialized action argument with AccessKeyView as:

{
  "access_key": {
    "nonce": 0,
    "permission": "FullAccess"
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

and

{
  "access_key": {
    "nonce": 0,
    "permission": {
      "FunctionCall": {
        "allowance": "100500",
        "receiver_id": "contract-name",
        "method_names": ["..."]
      }
    }
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

Notice that the permission value can be either a string or an object. We should have a unified structure using the serde tagged enum representation (and I would also prefer the constant strings to be in SCREAMING_SNAKE_CASE):

{
  "access_key": {
    "nonce": 0,
    "permission_kind": "FULL_ACCESS"
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

and

{
  "access_key": {
    "nonce": 0,
    "permission_kind": "FUNCTION_CALL",
    "permission_details": {
      "allowance": "100500",
      "receiver_id": "contract-name",
      "method_names": ["..."]
    }
  },
  "public_key": "ed25519:4Rx5XC1Nu1q4TvNatTpokM8mrWGLiCYtHccWpu3VaDJd"
}

The following config should do the trick:

#[serde(tag = "permission_kind", content = "permission_details", rename_all = "SCREAMING_SNAKE_CASE")]

However, we cannot apply it directly to the PermissionView enum, so we will need to duplicate some code and implement the From trait to convert the nearcore primitive structure into our custom Indexer implementation.
