
dune-sync

Components for syncing off-chain data with Dune Community Sources

Local Development

  1. Clone the repo: git clone git@github.com:cowprotocol/dune-sync.git
  2. If using VS Code, open the project in the devcontainer to ensure the same setup as the final container.
  3. Several Makefile commands are available via make <target>; the key targets are install, check, and test.

To execute the main binary (inside the container), run:

python3 -m src.main --sync-table <job>

Docker

Build

docker build -t local_dune_sync .

You must provide valid environment variables as specified in .env.sample.

Run Local

docker run -v ${PWD}/data:/app/data --env-file .env local_dune_sync

Run Remote

You will need to attach a volume and provide an env file. This example

  • mounts $PWD/data
  • assumes the .env file is in $PWD

docker run -v ${PWD}/data:/app/data --env-file .env ghcr.io/cowprotocol/dune-sync:latest

Breaking Changes

Whenever the schema changes, we must coordinate with Dune so that the data can be dropped and the table rebuilt. For this we provide the script scripts/empty_bucket.py, which deletes all data from their buckets and our backup volume. It should only be run in coordination with their team about the changes: they will "stop the stream", drop the table on their side, and restart the stream. If a hard reset is performed without proper coordination, duplicate records will likely appear in their production environment (i.e. the interface).

So, the process is:

  • Contact a Dune Team Member (@dsalv)
  • Mention that we need to rebuild table XYZ (because of a schema change)
  • Once they are aware and prepared, run:
docker run -v ${PWD}/data:/app/data \
    --env-file .env \
    ghcr.io/cowprotocol/dune-sync \
    --sync-table SYNC_TABLE

This will empty the buckets and repopulate them with the appropriate changes.
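
For reference, deleting every object from an S3 bucket with boto3 generally looks like the sketch below; this is not the actual scripts/empty_bucket.py, and the bucket name is an assumption.

import boto3

# Hedged sketch only; the real logic lives in scripts/empty_bucket.py.
# The bucket name below is illustrative, not the production bucket.
s3 = boto3.resource("s3")
bucket = s3.Bucket("dune-sync-data")
bucket.objects.all().delete()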

Contributors

harisang, fhenneke, bh2smith, fleupold, MartinquaXD


Issues

Simplify Async Function Logic

I know this wasn't implemented here, but I just remembered it. For discussion's sake: have you checked out the backoff package?

I believe it would make this code a lot shorter and easier to understand. Check it out and let me know what you think about it.

Originally posted by @gentrexha in #45 (comment)
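
For context, a minimal sketch of what backoff offers; the exception type and retry count here are illustrative, not taken from this repo.

import aiohttp
import backoff

# Replaces hand-rolled retry loops with a declarative decorator.
# The exception class and max_tries are illustrative choices.
@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=5)
async def fetch(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.text()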

Fix Genesis Block

genesis_block=16691686, # First Recorded Batch Reward block
# TODO - use correct genesis block here. (note this is not actually determined yet)

When ready, we should set the first sync block for this script. It doesn't matter if it's perfectly accurate, since we won't use earlier blocks, but it should start from a point where we know the data is correct.

Update code to work with latest version black

Version 24 of black wants to format some files differently, which makes our CI fail. We will pin black at version 23; see #74. At some point, though, we should switch to the newest version and fix all files.

Inconsistent types for order and batch rewards

We currently upload the amounts in batch rewards as integers (see here) and the amounts in order rewards as strings (see here). If there is a technical reason for this, we should add a comment in the code. If there is no such reason, we could change the code to be more consistent.
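
If consistency is preferred, one option, sketched here with illustrative field names, is to serialize every amount as a string before upload:

def normalize_amounts(record: dict) -> dict:
    # Cast all amount-like fields to strings so batch and order rewards
    # share one representation. The "_amount" suffix is illustrative;
    # the real field names live in this repo's models.
    return {
        key: str(value) if key.endswith("_amount") else value
        for key, value in record.items()
    }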

The syncing of quoting data is now failing due to changes in our database.

Otex mentioned this morning that we are no longer syncing which solver quoted each order on Dune, and this seems to have started after yesterday's weekly release was applied.

Not sure whether this is the only cause, but at least one change is in the order_execution table, where we now only have entries for limit orders but not for market orders. This means we need to adjust the queries here to properly recover the quoter of each order and continue pushing this information to Dune.

Delays when syncing data

After PR #71, which fixed the issue with the backend database change, it seems to me that we might now sync data with significant delay, as the syncing, I believe, depends on both prod and staging having at least one settlement every few hours. The reason, I think, is the min operator in this line:

return min(int(barn["latest"][0]), int(prod["latest"][0])) - REORG_THRESHOLD

I would propose that we actually change it to max now (see the sketch below). Let me know what you think @fhenneke
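
A small sketch of the difference; the dicts mimic the shape of the query results, and the REORG_THRESHOLD value is illustrative:

REORG_THRESHOLD = 65  # illustrative value

# Mimicking the shape of the barn/prod query results in the real code.
barn = {"latest": [17_000_000]}
prod = {"latest": [17_000_900]}

# Current: sync trails the *least* recently settled environment,
# so one quiet environment stalls both.
current = min(int(barn["latest"][0]), int(prod["latest"][0])) - REORG_THRESHOLD

# Proposed: follow the most recent settlement instead.
proposed = max(int(barn["latest"][0]), int(prod["latest"][0])) - REORG_THRESHOLD

print(current, proposed)  # 16999935 17000835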

All logs are going to `stderr`

dune-sync pods are currently flooding #alerts-prod because all logs are going to stderr.
This can be seen in our logs here.
This is true for batch-rewards, app-data, and probably all dune-sync pods.
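
A common remedy, sketched here with the standard library, is to route records below WARNING to stdout and keep stderr for warnings and errors; the level thresholds are an assumption about what the alerting expects.

import logging
import sys

def configure_logging() -> None:
    # INFO and below go to stdout; WARNING and above go to stderr,
    # so stderr-based alerting only fires on real problems.
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.addFilter(lambda record: record.levelno < logging.WARNING)

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.WARNING)

    logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler])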

Failing unit test

@fhenneke first pointed out that there is a test that is now failing. Since I am personally a bit ignorant about this, any help would be appreciated. One option would be to simply remove the test. cc @bh2smith @MartinquaXD

=================================== FAILURES ===================================
__________________________ TestIPFS.test_get_content ___________________________

self = <tests.unit.test_ipfs.TestIPFS testMethod=test_get_content>

def test_get_content(self):
    self.assertEqual(
        {
            "version": "0.1.0",
            "appCode": "CowSwap",
            "metadata": {
                "referrer": {
                    "version": "0.1.0",
                    "address": "0x424a46612794dbb8000194937834250Dc723fFa5",
                }
            },
        },
        Cid.old_schema(
            "3d876de8fcd70969349c92d731eeb0482fe8667ceca075592b8785081d630b9a"
        ).get_content(ACCESS_KEY, max_retries=10),
    )
    self.assertEqual(
        {
            "version": "1.0.0",
            "appCode": "CowSwap",
            "metadata": {
                "referrer": {
                    "kind": "referrer",
                    "referrer": "0x8c35B7eE520277D14af5F6098835A584C337311b",
                    "version": "1.0.0",
                }
            },
        },
        Cid.old_schema(
            "1FE7C5555B3F9C14FF7C60D90F15F1A5B11A0DA5B1E8AA043582A1B2E1058D0C"
        ).get_content(ACCESS_KEY),
    )

AssertionError: {'version': '1.0.0', 'appCode': 'CowSwap'[122 chars]0'}}} != None

tests/unit/test_ipfs.py:73: AssertionError
=========================== short test summary info ============================
FAILED tests/unit/test_ipfs.py::TestIPFS::test_get_content - AssertionError: {'version': '1.0.0', 'appCode': 'CowSwap'[122 chars]0'}}} != None

Remove Warehouse db

As we want to simplify slippage accounting, we have decided, for now, to stop running the internal imbalances job and to shut down the related database. Fixes to the tests in this repo are needed for any future PR to pass all tests.

Handle Race Condition

Currently, if this is run from two different machines, the state is stored on a local volume and we run the risk of uploading duplicated content. To resolve this, we should fetch our latest sync block from the AWS bucket instead of our own local volume.
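
A hedged sketch of the proposed fix with boto3; the bucket and key names are assumptions, not this repo's actual layout:

import boto3

def latest_sync_block(bucket: str = "dune-sync-data", key: str = "last_block.txt") -> int:
    # Read the shared high-water mark from S3 instead of a local volume,
    # so concurrent runners agree on where to resume. The default bucket
    # and key names are illustrative only.
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=bucket, Key=key)
    return int(response["Body"].read().decode().strip())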

[Epic] Solver Auction CIP

Solvers are changing their reward structure (yet again) and we will need to adapt what information is synced.

  • Create necessary tables in Orderbook Database (cowprotocol/services#1166)
  • Query for info to be streamed into Dune Community Sources (#26)
  • Construct appropriate Dune Schema for Batch Rewards (#28)
  • Deployment - Coordinate with Dune Team (schema and services)
  • Write Spell to unpack the JSON content being added (similar to this and sketched here)
  • Update solver-rewards repo to read from this. (cowprotocol/solver-rewards#199)

Last Sync - Improvements

@MartinquaXD raised some good (non-blocking) points of improvement for the already merged code in #12. Here I will try to summarize:

  1. No need for an optional block number. This is only used for testing purposes, and the tests could be adapted instead of the return type.
  2. Throw an error on invalid table names. The problem with this may be that our test tables intentionally write to a different directory, so as not to interfere with real data. I think we may want to keep this; not sure.
  3. Look at when KeyError occurs.
  4. Use the contract deployment as the default genesis block.

I think items 1 and 4 are the most straightforward to change first; I'm still not sure about the other two.

MyPy & Pandas Issue with dtype in `read_sql_query`

MyPy "Bug"

I believe there is a bug in mypy concerning the dtype field. They expect types that, I believe, match what we pass, but they don't provide a public interface to import those types (np.int64 doesn't work either).

src/fetch/orderbook.py:51: error: Argument "dtype" to "read_sql_query" has incompatible type "Dict[str, str]"; expected "Optional[Union[Union[ExtensionDtype, Union[str, dtype[generic], Type[str], Type[complex], Type[bool], Type[object]]], Dict[Any, Union[ExtensionDtype, Union[str, dtype[generic], Type[str], Type[complex], Type[bool], Type[object]]]]]]"  [arg-type]

We ignore types on this line only for now.

Originally posted by @bh2smith in #23 (comment)

Additional context:

When attempting to import the expected type declarations, like DType, we are faced with the private interface pandas._typing.
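
For illustration, the targeted suppression looks roughly like this; the table and column names are placeholders, not this repo's schema:

import sqlite3
import pandas as pd

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE rewards (block_number INTEGER, fee REAL)")

# mypy rejects Dict[str, str] for dtype even though pandas accepts it,
# hence the ignore on this line only.
df = pd.read_sql_query(
    "SELECT block_number, fee FROM rewards",
    connection,
    dtype={"block_number": "int64"},  # type: ignore[arg-type]
)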

Pandas "Bug"

Furthermore, the type issue arises from concatenating two similar data frames (one of which is empty).

  1. The data frame has two fields (one integer and one float).
  2. One of the two data frames is empty, so that (without specifying types) its columns default to the "object" dtype.
  3. After concatenation of the data frames, the float field remains float, but the integer field becomes "object" (see the reproduction sketch below).
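
A minimal reproduction of the described behavior; the exact resulting dtypes depend on the pandas version:

import pandas as pd

full = pd.DataFrame({"block": [1, 2], "fee": [0.1, 0.2]})
empty = pd.DataFrame(columns=["block", "fee"])  # columns default to object

combined = pd.concat([full, empty])
# Depending on the pandas version, "fee" stays float64 while "block"
# is upcast to "object", which is the inconsistency described above.
print(combined.dtypes)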

Error msg about DataFrame concatenation

For the past few weeks, we have observed this error showing up in the logs.

ERROR (pod: dune-sync-order-rewards): /app/src/fetch/orderbook.py:91: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

Although it doesn't seem to affect anything for now, we should address this soon.
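
The fix the warning itself suggests is to drop empty entries before concatenating, e.g. with a helper along these lines (the name is hypothetical):

import pandas as pd

def safe_concat(frames: list[pd.DataFrame]) -> pd.DataFrame:
    # Exclude empty frames so they no longer influence the result
    # dtypes, as the FutureWarning recommends.
    non_empty = [df for df in frames if not df.empty]
    return pd.concat(non_empty) if non_empty else pd.DataFrame()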

Update AppData Fetching

The backend is going to start encoding app data differently, and we will need to change the inverse mapping

appHash --> CID

@vkgnosis shared the Rust code that already does this here,

and it appears we will just have to change the prefix here.

However, we will need to continue to support the old style (forever), because old app hashes will always exist, unless they plan to migrate all the old files to the new schema.

In terms of support, there will be a block number at which the new content starts but is mixed with the old (deployed in staging), and another block after which we only support new app hashes. The code will need to know these blocks, say (left, right), so that we use the old schema on blocks below left, both schemas between left and right, and only the new one after right. It might actually be easier to just always check both (with priority on the new schema), as sketched below.
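
A sketch of the "always check both, new schema first" idea; all callables here are hypothetical stand-ins for this repo's real helpers:

from typing import Any, Callable, Optional

def resolve_app_data(
    app_hash: str,
    new_schema_cid: Callable[[str], str],  # hypothetical: appHash -> CID, new prefix
    old_schema_cid: Callable[[str], str],  # hypothetical: appHash -> CID, old prefix
    fetch_content: Callable[[str], Optional[Any]],  # hypothetical IPFS fetch
) -> Optional[Any]:
    # Try the new prefix first, then fall back to the old one, so both
    # eras of app hashes resolve without tracking the (left, right)
    # cutover blocks explicitly.
    for to_cid in (new_schema_cid, old_schema_cid):
        content = fetch_content(to_cid(app_hash))
        if content is not None:
            return content
    return None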
