
dune-bridge's Introduction

Data Backend Serving Dune Downloads

Data flow for the dune-bridge

There are 3 different data streams that need to be transferred between the backend and the dune queries:

  • appData hashes that were sent with transactions on the Ethereum chain: the backend needs these to look up the corresponding IPFS files in order to index the meta-data of an order.
  • appData-referral mapping: once the backend has parsed the appData and read the corresponding IPFS files, it can build a mapping from appData to referrals. This information is needed to build the correct dune queries for the dune download (a sketch of the mapping's shape follows this list).
  • main-dune-query-data: the main dune query calculates the trading volumes for the referrals, the usual trading volumes and some other key metrics. Since calculating and downloading the complete data for each user takes quite some time, the download is split into the current day and the rest of the entire history:
    • entire history download: this file is created only once and reflects the complete history of trading data up to a certain date. From that date on, only daily downloads are taken.
    • daily downloads: every 30 minutes, a new daily download with the newest data for the day is fetched from dune and the data in the backend is updated.
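
For illustration, the appData-referral mapping can be pictured as a JSON object keyed by appData hash (a minimal sketch; the file name and exact keys here are assumptions, not the backend's actual schema):

import json

# Hypothetical shape: each appData hash maps to the referrer address read from its IPFS file.
app_data_referrals = {
    "<appData hash>": "<referrer address>",
}
with open("app_data_referral_mapping.json", "w") as f:  # file name is illustrative
    json.dump(app_data_referrals, f, indent=2)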

In the first version, all the data is stored in simple JSON files. Later, we will consider building real databases. The data flows are driven by 2 different cronjobs. The first job updates and executes the queries for the appData and the main dune query for the daily download every 30 minutes. Then, 15 minutes later, a second job starts the download of the query results. The backend API continuously looks for new downloads from dune in a maintenance loop, reads the new data, serves it via an api and creates new appData-referral mappings.
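
To make the schedule concrete, here is a rough Python sketch of the maintenance loop (illustrative only: the real loop is part of the Rust backend, and the folder name and polling interval are assumptions):

import json
import time
from pathlib import Path

# Illustrative sketch of the backend maintenance loop; names are assumptions.
DATA_FOLDER = Path("data")  # in the real service this comes from an env var such as DUNE_DATA_FOLDER
seen_mtimes = {}

while True:
    for path in DATA_FOLDER.glob("*.json"):
        mtime = path.stat().st_mtime
        if seen_mtimes.get(path.name) != mtime:  # a cronjob dropped a new or updated download
            seen_mtimes[path.name] = mtime
            download = json.loads(path.read_text())
            # ...update the data served by the api and rebuild the
            # appData-referral mapping from the new download...
    time.sleep(60)  # polling interval is illustrative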

Instructions for getting data from dune

Installation

Preparations:

cd dune_api_scripts
python3 -m venv env
source ./env/bin/activate
pip install -r requirements.txt

Setting some envs:

cp .env.example .env

and adjust the values.

Download data:

Pulling new query results:

python -m dune_api_scripts.store_query_result_all_distinct_app_data
python -m dune_api_scripts.store_query_result_for_todays_trading_data
python -m dune_api_scripts.store_query_result_for_entire_history_trading_data

The last command might take a while, as it downloads the entire trading history.

Alternatively, the scripts can also be run via docker:

docker build -t fetch_script -f ./docker/Dockerfile.binary .
docker run -e DUNE_PASSWORD=<pwd> -e DUNE_USER=<dune account email> -e REFERRAL_DATA_FOLDER=/usr/src/app/data/ -v ./data/:/usr/src/app/data -ti fetch_script /bin/sh

Instructions for running the api

Running the api with the data from user_data.json:

cargo run

and then check the local endpoint like this:

http://127.0.0.1:8080/api/v1/profile/0xa4a6ef5c494091f6aeca7fa28a04a219dd0f31b5
or
http://127.0.0.1:8080/api/v1/profile/0xe7207afc5cd57625b88e2ddbc4fe9de794a76b0f
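
With the server running locally, the endpoint can also be checked from Python (a quick sanity check, not part of the repository):

import requests

resp = requests.get(
    "http://127.0.0.1:8080/api/v1/profile/0xa4a6ef5c494091f6aeca7fa28a04a219dd0f31b5"
)
resp.raise_for_status()
print(resp.json())  # profile metrics such as trading and referral volumes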

Alternatively, the code can also be run via docker:

  1. Running the api
docker build -t gpdata -f docker/Dockerfile.binary .
docker run -ti -e DUNE_DATA_FOLDER='/usr/src/app/data' gpdata gpdata

dune-bridge's Issues

Use api auction endpoint instead of deprecated solvable_orders

Currently this project is using the solvable_orders api endpoint

async fn load_current_app_data_of_solvable_orders() -> Result<Vec<H256>> {
    let urls = vec![
        "https://protocol-mainnet.dev.gnosisdev.com/api/v1/solvable_orders".to_string(),
        "https://protocol-mainnet.gnosis.io/api/v1/solvable_orders".to_string(),
    ];
    let mut app_data: Vec<H256> = Vec::new();
    for url in urls.iter() {
        let solvable_orders_body = &make_api_request_to_url(url).await;
        let new_app_data = match solvable_orders_body {
            Ok(body) => parse_app_data_from_api_body(body)?,
            Err(err) => {
                tracing::debug!("Could not get solvable orders, due to error: {:}", err);
                Vec::new()
            }
        };
        app_data.extend(new_app_data);
    }
    tracing::debug!("Newly fetched app data is: {:?}", app_data);
    Ok(app_data)
}

It should be using the get_auction endpoint instead: https://protocol-rinkeby.dev.gnosisdev.com/api/#/default/get_api_v1_auction
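
For illustration, the proposed change amounts to something like the following (a Python sketch only; the real code is Rust, and the assumption that the auction body carries an orders array with an appData field should be checked against the API docs):

import requests

def load_current_app_data_of_auction():
    url = "https://protocol-mainnet.gnosis.io/api/v1/auction"  # auction endpoint per the issue
    body = requests.get(url).json()
    # Assumed response shape: an "orders" array whose entries carry an "appData" field.
    return [order["appData"] for order in body.get("orders", [])]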

App Data Update (Prod/Staging)

Currently we are updating a single table in Dune. This means that if we run the app data script in both production and staging, the two different versions of the query will disagree with each other.

Proposed Solution

append a parameter to the table name

dune_user_generated.gnosis_protocol_v2_app_data_{environment}

where we pass staging/prod as parameter names.
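
A minimal sketch of how the table name could be parameterized (the helper below is hypothetical):

def app_data_table(environment: str) -> str:
    # Keep prod and staging writes from clashing by targeting separate tables.
    assert environment in ("prod", "staging")
    return f"dune_user_generated.gnosis_protocol_v2_app_data_{environment}"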

Dune API errors

Our cron-jobs updating Dune data are currently failing with:

Dune API Request failed with errors [{'extensions': {'path': '$', 'code': 'validation-failed'}, 'message': 'query is not allowed'}]

It would be good to investigate why the "query is not allowed". If this is an intermittent issue, then we need to add some sort of retry mechanism because the alerts are quite noisy.
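
If the failures turn out to be intermittent, wrapping the request in a simple retry with backoff would cut down the noise. A sketch, assuming a hypothetical execute_query callable:

import time

def execute_with_retries(execute_query, max_attempts=3, backoff_seconds=30):
    # Retry a flaky Dune request a few times before letting the alert fire.
    for attempt in range(1, max_attempts + 1):
        try:
            return execute_query()
        except Exception:  # e.g. the 'query is not allowed' validation error
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)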

Migrate Reusable Code to dune-client

Several pieces introduced in #37 can be refined, generalized and moved into the dependency package dune-client.

In particular:

There are a few reasons, outlined here, why these have not yet been moved into the client. In essence, they need to be refined a bit more before they can serve a useful general purpose. For example, it isn't clear that either QueryData or DuneFetcher is all that beneficial -- perhaps it's better to just extend the Query and DuneClient classes to include the additional fields added in these "extended classes". Would love to open up a discussion about this (maybe in the Dune Client).

E2e test

We have test plans describing how we are currently testing the code:

E.g. here is one example: #6

It would be great to automate this test and add it as an e2e test.

Idea - Add CID to AppData.

If the sync has already been deployed, this will require a schema change on Dune's side.

This is not a hard requirement, but could make data verification much easier from the User's perspective.

It is pretty easy to implement now if we think we may want it.

More context in Slack.
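
For reference, if the appData hash is the raw sha2-256 digest of the IPFS document (an assumption about the current encoding), a CIDv0 can be derived from it roughly like this:

import base58  # pip install base58

def cid_v0_from_app_data(app_data_hash: str) -> str:
    # Prefix the multihash header (0x12 = sha2-256, 0x20 = 32-byte length) and base58-encode.
    digest = bytes.fromhex(app_data_hash.removeprefix("0x"))
    return base58.b58encode(bytes([0x12, 0x20]) + digest).decode()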

[Chicken & Egg] BlockRange.block_from

Currently we get the "block_from" by opening a file in our own persisted storage; however, this value can also be obtained by querying the table in Dune itself. The table does not yet exist, but once it does we could experiment with fetching this value from one, or both, places...

# TODO - could be replaced by Dune Query on the app_data table (once available).
block_from,
block_to=int(
    # KeyError here means the query has been modified and column no longer exists
    # IndexError means no results were returned from query (which is unlikely).
    (await self.fetch(QUERIES["LATEST_APP_HASH_BLOCK"].query))[0][
        "latest_block"
    ]
),
)
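
A sketch of the "one or both places" idea, with a hypothetical state file name (the Dune-side query does not exist yet, as noted above):

import json
from pathlib import Path

def load_block_from(state_file: Path = Path("data/sync_block.json")) -> int:
    # Prefer the value persisted in our own storage; once the app_data table exists
    # on Dune, a query analogous to LATEST_APP_HASH_BLOCK above could cross-check it.
    if state_file.exists():
        return int(json.loads(state_file.read_text())["latest_block"])
    raise RuntimeError("no persisted block_from and no Dune app_data table yet")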

Blazing Fast V2 Referral & Trader Data Query

The following query should eliminate the need for the entire python portion of this project and also any component of the Rust project responsible for fetching and parsing App Data.

The API part of this project would then only refresh this query (responsibly), say once per hour, and be able to serve this data to the user account page on CoW.Fi.

Observe that the current endpoint is:

https://api.cow.fi/affiliate/api/v1/profile/0x8540f80fab2afcae8d8fd6b1557b1cf943a0999b

which can be compared with any of the trader addresses returned by the query:

V2 Referral Query

Note that, while working on this, I discovered what appears to be a bug in the existing API service/referral query.

Namely, for 0x8540f80fab2afcae8d8fd6b1557b1cf943a0999b (the address in the link above) the new query returns a non-zero referral count and volume. I went to look at the trade history of these referrals and found that the traders did, in fact, use this wallet as the referrer on their first trade. So I cannot understand why we would be neglecting these as valid referrals:

  1. Referral 1
  2. Referral 2
  3. Referral 3

cc @gentrexha and @josojo as something we should plan for in the beginning of January.

Dune Sync

Instructions from Dune:

AWS Stuff

  1. The data is written into Dune’s S3 buckets. We assume we host the data, but the data still belongs to you.
  2. Separate IAM users are created for the community data providers to write data. IAM roles created by data providers are added to the trust policy to allow them to assume IAM roles within our account. This security measure minimizes the number of different credentials maintained by data providers.
  3. Data providers should write objects with the bucket-owner-full-control canned ACL (see the sketch after this list). -- #44
  4. Pre-specified External IDs must be used when assuming the role.
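
A sketch of what the write path could look like with boto3 under points 2-4 (the role ARN, external ID and bucket name are placeholders):

import boto3

# Assume the provider role with the pre-specified external ID (points 2 and 4).
creds = boto3.client("sts").assume_role(
    RoleArn="arn:aws:iam::<dune-account-id>:role/<provider-role>",
    RoleSessionName="cow-dune-sync",
    ExternalId="<pre-specified external id>",
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Write with the bucket-owner-full-control canned ACL (point 3).
s3.put_object(
    Bucket="<dune bucket>",
    Key="<folder name>/cow_000000000000001.json",
    Body=b'{"example_column": "example_value"}\n',
    ACL="bucket-owner-full-control",
)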

File Formatting

  1. We use JSON format to write data into S3 buckets.
  2. The file names should contain a predefined constant prefix. (e.g. cow_{the rest of the filename}.json)
  3. We intend to keep filenames very simple. Filenames need to contain an increasing sequence number (e.g., cow_000000000000001.json, cow_000000000000002.json; see the example after this list). We are releasing a new update to be able to test timestamps instead of sequence numbers in the filenames.
  4. The data is written into JSON files as JSON objects.
  5. The data in JSON files should not be enclosed in array brackets.
  6. Generally, we avoid updates to the written files. If you are producing data every minute, you can write separate files.
  7. Written data should follow an append-only approach. We can discuss strategies for data updates and deletion separately.
  8. Name/value pairs in JSON files should correspond to the column names and data values in the target data table.
  9. If you plan to write data into several tables, you can write data for different tables in different folders in the S3 bucket. (e.g. s3:::{bucket name}/{folder name}) - #44
  10. We only support predefined schemas at the moment. Data providers should define schemas of the final tables in advance.
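
A small sketch of points 2-5 and 8: one JSON object per line, no enclosing array, keys matching the target table's columns, in a file named with the constant prefix and an increasing sequence number (the column names below are placeholders):

import json

def write_batch(rows, sequence_number, prefix="cow_"):
    filename = f"{prefix}{sequence_number:015d}.json"
    with open(filename, "w") as f:
        for row in rows:  # each dict's keys must match the target table's column names
            f.write(json.dumps(row) + "\n")
    return filename

# Example: write_batch([{"app_hash": "<appData hash>", "referrer": "<address>"}], 1)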
