
dune-sync

Components for syncing off-chain data with Dune Community Sources

Local Development

  1. Clone the repo: git clone git@github.com:cowprotocol/dune-sync.git
  2. If using VS Code, open the project in the devcontainer to ensure the same setup as the final container.
  3. Several Makefile commands are available via make <target>; the key targets are install, check, and test.

To execute the main binary (inside the container), run:

python3 -m src.main --sync-table <job>

Docker

Build

docker build -t local_dune_sync .

You must provide valid environment variables as specified in .env.sample.

Run Local

docker run -v ${PWD}/data:/app/data --env-file .env local_dune_sync

Run Remote

You will need to attach a volume and provide an env file. This example

  • mounts $PWD/data
  • assumes the .env file is in $PWD

docker run -v ${PWD}/data:/app/data --env-file .env ghcr.io/cowprotocol/dune-sync:latest

Breaking Changes

Whenever the schema changes, we must coordinate with Dune so that the data can be dropped and the table rebuilt. For this we provide the script scripts/empty_bucket.py, which deletes all data from their buckets and our backup volume. It should only be run in coordination with their team about the changes: they will "stop the stream", drop the table on their side, and restart the stream. If a hard reset is performed without proper coordination, duplicate records will likely appear in their production environment (i.e. the interface).

So, the process is:

  • Contact a Dune Team Member (@dsalv)
  • Mention that we need to rebuild table XYZ (because of a schema change)
  • Once they are aware and prepared, run:
docker run -v ${PWD}/data:/app/data \
    --env-file .env \
    ghcr.io/cowprotocol/dune-sync \
    --sync-table SYNC_TABLE

This will empty the buckets and repopulate them with the appropriate changes.
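
For reference, deleting every object from an S3 bucket with boto3 generally looks like the sketch below; this is not the actual scripts/empty_bucket.py, and the bucket name is an assumption.

import boto3

# Hedged sketch only; the real logic lives in scripts/empty_bucket.py.
# The bucket name below is illustrative, not the production bucket.
s3 = boto3.resource("s3")
bucket = s3.Bucket("dune-sync-data")
bucket.objects.all().delete()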

Contributors

harisang, fhenneke, bh2smith, fleupold, MartinquaXD


Issues

Simplify Async Function Logic

I know this wasn't implemented here, but I just remembered it. For discussion's sake: have you checked out the backoff package?

I believe it would make this code a lot shorter and easier to understand. Check it out and let me know what you think about it.

Originally posted by @gentrexha in #45 (comment)
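
For context, a minimal sketch of what backoff offers; the exception type and retry count here are illustrative, not taken from this repo.

import aiohttp
import backoff

# Replaces hand-rolled retry loops with a declarative decorator.
# The exception class and max_tries are illustrative choices.
@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=5)
async def fetch(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.text()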

Fix Genesis Block

genesis_block=16691686, # First Recorded Batch Reward block
# TODO - use correct genesis block here. (note this is not actually determined yet)

When ready, we should set the first sync block for this script. It doesn't matter if it's perfectly accurate, since we won't use earlier blocks, but it should start from a point where we know the data is correct.

Update code to work with latest version black

Version 24 of black wants to format some files differently, which makes our CI fail. We will pin black at version 23; see #74. At some point, though, we should switch to the newest version and fix all files.

Inconsistent types for order and batch rewards

We currently upload the amounts in batch rewards as integers (see here) and the amounts in order rewards as strings (see here). If there is a technical reason for this, we should add a comment in the code. If there is no such reason, we could change the code to be more consistent.
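
If consistency is preferred, one option, sketched here with illustrative field names, is to serialize every amount as a string before upload:

def normalize_amounts(record: dict) -> dict:
    # Cast all amount-like fields to strings so batch and order rewards
    # share one representation. The "_amount" suffix is illustrative;
    # the real field names live in this repo's models.
    return {
        key: str(value) if key.endswith("_amount") else value
        for key, value in record.items()
    }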

The syncing of quoting data is now failing due to changes in our database.

Otex mentioned this morning that we are no longer syncing which solver quoted each order on Dune, and this seems to have started after yesterday's weekly release was applied.

Not sure whether this is the only cause, but at least one change is in the order_execution table, where we now only have entries for limit orders but not for market orders. This means we need to adjust the queries here to properly recover the quoter of each order and continue pushing this information to Dune.

Delays when syncing data

After PR #71, which fixed the issue with the backend database change, it seems to me that we might now sync data with significant delay, as the syncing, I believe, depends on both prod and staging having at least one settlement every few hours. The reason, I think, is the min operator in this line:

return min(int(barn["latest"][0]), int(prod["latest"][0])) - REORG_THRESHOLD

I would propose that we actually change it to max now (see the sketch below). Let me know what you think @fhenneke
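
A small sketch of the difference; the dicts mimic the shape of the query results, and the REORG_THRESHOLD value is illustrative:

REORG_THRESHOLD = 65  # illustrative value

# Mimicking the shape of the barn/prod query results in the real code.
barn = {"latest": [17_000_000]}
prod = {"latest": [17_000_900]}

# Current: sync trails the *least* recently settled environment,
# so one quiet environment stalls both.
current = min(int(barn["latest"][0]), int(prod["latest"][0])) - REORG_THRESHOLD

# Proposed: follow the most recent settlement instead.
proposed = max(int(barn["latest"][0]), int(prod["latest"][0])) - REORG_THRESHOLD

print(current, proposed)  # 16999935 17000835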

All logs are going to `stderr`

dune-sync pods are currently flooding #alerts-prod because all logs are going to stderr.
This can be seen in our logs here.
This is true for batch-rewards, app-data, and probably all dune-sync pods.
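
A common remedy, sketched here with the standard library, is to route records below WARNING to stdout and keep stderr for warnings and errors; the level thresholds are an assumption about what the alerting expects.

import logging
import sys

def configure_logging() -> None:
    # INFO and below go to stdout; WARNING and above go to stderr,
    # so stderr-based alerting only fires on real problems.
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.addFilter(lambda record: record.levelno < logging.WARNING)

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.WARNING)

    logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler])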

Failing unit test

@fhenneke first pointed out that there is a test that is now failing. Since I am personally a bit ignorant about this, any help would be appreciated. One option would be to simply remove the test. cc @bh2smith @MartinquaXD

=================================== FAILURES ===================================
__________________________ TestIPFS.test_get_content ___________________________

self = <tests.unit.test_ipfs.TestIPFS testMethod=test_get_content>

def test_get_content(self):
    self.assertEqual(
        {
            "version": "0.1.0",
            "appCode": "CowSwap",
            "metadata": {
                "referrer": {
                    "version": "0.1.0",
                    "address": "0x424a46612794dbb8000194937834250Dc723fFa5",
                }
            },
        },
        Cid.old_schema(
            "3d876de8fcd70969349c92d731eeb0482fe8667ceca075592b8785081d630b9a"
        ).get_content(ACCESS_KEY, max_retries=10),
    )
    self.assertEqual(
        {
            "version": "1.0.0",
            "appCode": "CowSwap",
            "metadata": {
                "referrer": {
                    "kind": "referrer",
                    "referrer": "0x8c35B7eE520277D14af5F6098835A584C337311b",
                    "version": "1.0.0",
                }
            },
        },
        Cid.old_schema(
            "1FE7C5555B3F9C14FF7C60D90F15F1A5B11A0DA5B1E8AA043582A1B2E1058D0C"
        ).get_content(ACCESS_KEY),
    )

AssertionError: {'version': '1.0.0', 'appCode': 'CowSwap'[122 chars]0'}}} != None

tests/unit/test_ipfs.py:73: AssertionError
=========================== short test summary info ============================
FAILED tests/unit/test_ipfs.py::TestIPFS::test_get_content - AssertionError: {'version': '1.0.0', 'appCode': 'CowSwap'[122 chars]0'}}} != None

Remove Warehouse db

As we want to simplify slippage accounting, we have decided, for now, to stop running the internal imbalances job and to shut down the related database. Fixes to the tests in this repo are needed for any future PR to pass all tests.

Handle Race Condition

Currently, if this is run from two different machines, the state is stored on a local volume and we run the risk of uploading duplicated content. To resolve this, we should fetch our latest sync block from the AWS bucket instead of our own local volume.
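
A hedged sketch of the proposed fix with boto3; the bucket and key names are assumptions, not this repo's actual layout:

import boto3

def latest_sync_block(bucket: str = "dune-sync-data", key: str = "last_block.txt") -> int:
    # Read the shared high-water mark from S3 instead of a local volume,
    # so concurrent runners agree on where to resume. The default bucket
    # and key names are illustrative only.
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=bucket, Key=key)
    return int(response["Body"].read().decode().strip())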

[Epic] Solver Auction CIP

Solvers are changing their reward structure (yet again) and we will need to adapt what information is synced.

  • Create necessary tables in Orderbook Database (cowprotocol/services#1166)
  • Query for info to be streamed into Dune Community Sources (#26)
  • Construct appropriate Dune Schema for Batch Rewards (#28)
  • Deployment - Coordinate with Dune Team (schema and services)
  • Write Spell to unpack the JSON content being added (similar to this and sketched here)
  • Update solver-rewards repo to read from this. (cowprotocol/solver-rewards#199)

Last Sync - Improvements

@MartinquaXD raised some good (non-blocking) points of improvement for the already merged code in #12. Here I will try to summarize:

  1. No need for an optional block number. This is only used for testing purposes, and the tests could be adapted instead of the return type.
  2. Throw an error on invalid table names. The problem with this may be that our test tables intentionally write to a different directory, so as not to interfere with real data. I think we may want to keep this; not sure.
  3. Look at when KeyError occurs.
  4. Use the contract deployment as the default genesis block.

I think items 1 and 4 are the most straightforward to change first; I'm still not sure about the other two.

MyPy & Pandas Issue with dtype in `read_sql_query`

MyPy "Bug"

I believe there is a bug in mypy concerning the dtype field. They expect types that, I believe, match what we pass, but they don't provide a public interface to import those types (np.int64 doesn't work either).

src/fetch/orderbook.py:51: error: Argument "dtype" to "read_sql_query" has incompatible type "Dict[str, str]"; expected "Optional[Union[Union[ExtensionDtype, Union[str, dtype[generic], Type[str], Type[complex], Type[bool], Type[object]]], Dict[Any, Union[ExtensionDtype, Union[str, dtype[generic], Type[str], Type[complex], Type[bool], Type[object]]]]]]"  [arg-type]

We ignore types on this line only for now.

Originally posted by @bh2smith in #23 (comment)

Additional context:

When attempting to import the expected type declarations, like DType, we are faced with the private interface pandas._typing.
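
For illustration, the targeted suppression looks roughly like this; the table and column names are placeholders, not this repo's schema:

import sqlite3
import pandas as pd

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE rewards (block_number INTEGER, fee REAL)")

# mypy rejects Dict[str, str] for dtype even though pandas accepts it,
# hence the ignore on this line only.
df = pd.read_sql_query(
    "SELECT block_number, fee FROM rewards",
    connection,
    dtype={"block_number": "int64"},  # type: ignore[arg-type]
)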

Pandas "Bug"

Furthermore, the type issue arises from concatenating two similar data frames (one of which is empty).

  1. The data frame has two fields (one integer and one float).
  2. One of the two data frames is empty, so that (without specifying types) its columns default to the "object" dtype.
  3. After concatenation of the data frames, the float field remains float, but the integer field becomes "object" (see the reproduction sketch below).
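
A minimal reproduction of the described behavior; the exact resulting dtypes depend on the pandas version:

import pandas as pd

full = pd.DataFrame({"block": [1, 2], "fee": [0.1, 0.2]})
empty = pd.DataFrame(columns=["block", "fee"])  # columns default to object

combined = pd.concat([full, empty])
# Depending on the pandas version, "fee" stays float64 while "block"
# is upcast to "object", which is the inconsistency described above.
print(combined.dtypes)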

Error msg about DataFrame concatenation

For the past few weeks, we have observed this error showing up in the logs.

ERROR (pod: dune-sync-order-rewards): /app/src/fetch/orderbook.py:91: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

Although it doesn't seem to affect anything for now, we should address this soon.
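
The fix the warning itself suggests is to drop empty entries before concatenating, e.g. with a helper along these lines (the name is hypothetical):

import pandas as pd

def safe_concat(frames: list[pd.DataFrame]) -> pd.DataFrame:
    # Exclude empty frames so they no longer influence the result
    # dtypes, as the FutureWarning recommends.
    non_empty = [df for df in frames if not df.empty]
    return pd.concat(non_empty) if non_empty else pd.DataFrame()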

Update AppData Fetching

The backend is going to start encoding app data differently, and we will need to change the inverse mapping

appHash --> CID

@vkgnosis shared the Rust code that already does this here,

and it appears we will just have to change the prefix here.

However, we will need to continue to support the old style (forever), because old app hashes will always exist, unless they plan to migrate all the old files to the new schema.

In terms of support, there will be a block number at which the new content starts but is mixed with the old (deployed in staging), and another block after which we only support new app hashes. The code will need to know these blocks, say (left, right), so that we use the old schema on blocks below left, both schemas between left and right, and only the new one after right. It might actually be easier to just always check both (with priority on the new schema), as sketched below.
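
A sketch of the "always check both, new schema first" idea; all callables here are hypothetical stand-ins for this repo's real helpers:

from typing import Any, Callable, Optional

def resolve_app_data(
    app_hash: str,
    new_schema_cid: Callable[[str], str],  # hypothetical: appHash -> CID, new prefix
    old_schema_cid: Callable[[str], str],  # hypothetical: appHash -> CID, old prefix
    fetch_content: Callable[[str], Optional[Any]],  # hypothetical IPFS fetch
) -> Optional[Any]:
    # Try the new prefix first, then fall back to the old one, so both
    # eras of app hashes resolve without tracking the (left, right)
    # cutover blocks explicitly.
    for to_cid in (new_schema_cid, old_schema_cid):
        content = fetch_content(to_cid(app_hash))
        if content is not None:
            return content
    return None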
