
Comments (11)

ClaytonSmith commented on July 18, 2024

Out-of-sync partitions have been a huge issue for me. Not showing downstream partitions as out of sync has been deeply impactful (in a bad way), and not just when incrementing the code version but also when simply updating parent partitions.

from dagster.

ClaytonSmith commented on July 18, 2024

I just found out that Dagster IS aware of out-of-sync partitions.

Please, someone let me know what needs to happen to get this fixed. These green dots should be yellow. The ability to rebuild outdated partitions is so valuable.

sam-goodwin commented on July 18, 2024

I just tried updating the code version of first to "v2", and it still says everything is up to date:

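from dagster import asset

# test_partitions is defined elsewhere in the reporter's code (not shown in the thread).
@asset(
    code_version="v2",
    partitions_def=test_partitions,
)
def first():
    return "first"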
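For reference, here is a minimal plain-Python model of the behavior the commenters expect (it is not Dagster's implementation). Each materialized partition records the code version it was built with and the data versions of the parent partitions it consumed; a partition counts as out of sync if either has changed since, which covers both the code-version bump above and the "updating parent partitions" case from the first comment. PartitionRecord, stale_partitions, and the keys "p1"/"p2" are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionRecord:
    """Hypothetical record of one materialized partition of an asset."""
    partition_key: str
    code_version: str  # code_version the partition was materialized with
    # Data versions of the parent partitions it consumed, keyed by parent asset name.
    parent_data_versions: dict[str, str] = field(default_factory=dict)


def stale_partitions(
    records: list[PartitionRecord],
    current_code_version: str,
    current_parent_versions: dict[str, dict[str, str]],
) -> list[str]:
    """Return the keys of partitions that are out of sync.

    A partition counts as stale if it was built with a different code version
    or if any parent partition it consumed now has a different data version.
    """
    stale = []
    for record in records:
        parents_now = current_parent_versions.get(record.partition_key, {})
        code_changed = record.code_version != current_code_version
        parent_changed = any(
            parents_now.get(parent) != version
            for parent, version in record.parent_data_versions.items()
        )
        if code_changed or parent_changed:
            stale.append(record.partition_key)
    return stale


records = [
    PartitionRecord("p1", "v1", {"upstream": "d1"}),
    PartitionRecord("p2", "v1", {"upstream": "d1"}),
]
# The upstream partition for "p2" was re-materialized and now has data version "d2".
current_parents = {"p1": {"upstream": "d1"}, "p2": {"upstream": "d2"}}

print(stale_partitions(records, "v1", current_parents))  # ['p2']
print(stale_partitions(records, "v2", current_parents))  # ['p1', 'p2']
```

Under this model, bumping the code version marks every partition stale, which is what the UI was expected to show as yellow rather than green.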

sam-goodwin commented on July 18, 2024

I also get an error when I select "only backfill missing or failed assets" and click "preview".

sam-goodwin commented on July 18, 2024

I can confirm that everything behaves as expected for a non-partitioned job. However, I don't see this limitation mentioned anywhere in the docs.

garethbrickman commented on July 18, 2024

Could be related: #22553

sam-goodwin commented on July 18, 2024

Is it just the UI that's broken, or is there a fundamental problem in the backend?

sam-goodwin commented on July 18, 2024

This query returns the correct information, which shows that Dagster does know which assets are stale in each partition:

query AssetsByGroup($groupName: String!) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName:"__repository__",
    repositoryLocationName:"your_pkg.defs"
  }) {
    id
    assetKey {
      path
    }
    staleStatusByPartition(partitions:[
      "first",
      "second"
    ])
  }
}
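To run this query from a script, a minimal sketch using only the Python standard library might look like the following. It assumes the Dagster webserver is reachable at http://localhost:3000/graphql, that staleStatusByPartition returns one status per requested partition key in the same order, and that the status strings are values such as FRESH, STALE, and MISSING. post_graphql, parse_stale_status, and the sample response are hypothetical and written for illustration; the response is not real Dagster output.

```python
import json
import urllib.request

# Assumption: the Dagster webserver runs locally on its default port and serves GraphQL at /graphql.
GRAPHQL_URL = "http://localhost:3000/graphql"

# Same query as above, but with the partition keys passed in as a variable.
QUERY = """
query AssetsByGroup($groupName: String!, $partitionKeys: [String!]!) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName: "__repository__",
    repositoryLocationName: "your_pkg.defs"
  }) {
    assetKey { path }
    staleStatusByPartition(partitions: $partitionKeys)
  }
}
"""


def post_graphql(query: str, variables: dict, url: str = GRAPHQL_URL) -> dict:
    """POST a GraphQL request and return the decoded JSON response."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_stale_status(response: dict, partition_keys: list[str]) -> dict[str, dict[str, str]]:
    """Map "asset/key/path" -> {partition_key: status}.

    Assumes statuses come back in the same order as the requested partition keys.
    """
    result = {}
    for node in response["data"]["assetNodes"]:
        asset_key = "/".join(node["assetKey"]["path"])
        result[asset_key] = dict(zip(partition_keys, node["staleStatusByPartition"]))
    return result


# Hand-written sample response for illustration (not real Dagster output).
sample = {
    "data": {
        "assetNodes": [
            {"assetKey": {"path": ["first"]}, "staleStatusByPartition": ["FRESH", "STALE"]},
            {"assetKey": {"path": ["second"]}, "staleStatusByPartition": ["STALE", "MISSING"]},
        ]
    }
}
print(parse_stale_status(sample, ["p1", "p2"]))
```

Against a live instance, the call would be parse_stale_status(post_graphql(QUERY, {"groupName": ..., "partitionKeys": keys}), keys), with your own group name and partition keys filled in.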

What I'm unsure about is how to launch materializations for many partitions so that each run includes only the out-of-sync assets for its partition.
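One way to frame this is to invert the per-asset statuses into a per-partition list of assets that need rebuilding; each entry then describes the asset selection for one run. The sketch below is a hypothetical helper (plan_runs), assumes STALE and MISSING are the statuses worth rebuilding, and does not show how the runs themselves would be launched.

```python
from collections import defaultdict

# Statuses treated as "needs rebuilding" (assumed status strings).
REBUILD_STATUSES = {"STALE", "MISSING"}


def plan_runs(status_by_asset: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Invert {asset: {partition: status}} into {partition: [assets to rebuild]}.

    Each entry of the result describes the asset selection for one run.
    """
    plan: dict[str, list[str]] = defaultdict(list)
    for asset_key in sorted(status_by_asset):
        for partition_key, status in status_by_asset[asset_key].items():
            if status in REBUILD_STATUSES:
                plan[partition_key].append(asset_key)
    return dict(plan)


status = {
    "first": {"p1": "FRESH", "p2": "STALE"},
    "second": {"p1": "STALE", "p2": "STALE"},
}
print(plan_runs(status))  # {'p2': ['first', 'second'], 'p1': ['second']}
```

Partitions whose assets are all FRESH never appear in the plan, so no run would be launched for them.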

sam-goodwin commented on July 18, 2024

It looks like the GraphQL schema is strongly coupled to time-based partitions (which does not fit my system):

input PartitionsByAssetSelector {
  assetKey: AssetKeyInput!
  partitions: PartitionsSelector
}

input PartitionsSelector {
  range: PartitionRangeSelector!
}

When launching a backfill, you can't specify a list of partitions per asset, only a range. I'm seeing this overfitting to time-based partitions a lot in Dagster's design.
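To illustrate why a range-only selector is awkward here: an arbitrary set of stale partitions generally splits into several contiguous stretches of the partition ordering, and each stretch would need its own range. The sketch assumes the partition keys have a fixed order, which static partitions may not meaningfully have. to_ranges is a hypothetical helper.

```python
def to_ranges(ordered_keys: list[str], selected: set[str]) -> list[tuple[str, str]]:
    """Collapse selected keys into (first, last) ranges over ordered_keys."""
    ranges = []
    start = last = None
    for key in ordered_keys:
        if key in selected:
            if start is None:
                start = key  # open a new contiguous range
            last = key
        elif start is not None:
            ranges.append((start, last))  # a gap closes the current range
            start = None
    if start is not None:
        ranges.append((start, last))
    return ranges


ordered = ["a", "b", "c", "d", "e"]
stale = {"a", "c", "d"}
print(to_ranges(ordered, stale))  # [('a', 'a'), ('c', 'd')]
```

Three stale partitions already need two ranges; in the worst case (every other partition stale), the number of ranges equals the number of stale partitions.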

sam-goodwin commented on July 18, 2024

Oh, actually, it looks like I can use assetSelection and partitionNames, along with batching, to achieve this behavior.

Here's a prototype that materializes stale partitions of assets in a group: https://gist.github.com/sam-goodwin/d8dd76ad58a241cdb14deba9cb53c2bf

Note: the prototype assumes that every asset in the group uses the same partitioning scheme (this may not be true for you).
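The batching idea can be sketched independently of the gist (which is not reproduced here): group partitions that need exactly the same set of assets, then split each group into fixed-size batches, so that each (asset selection, partition names) pair could become one backfill request using the assetSelection and partitionNames fields mentioned above. batch_requests and the example batch size are hypothetical, and the request submission is not shown.

```python
from collections import defaultdict


def batch_requests(
    plan: dict[str, list[str]], batch_size: int
) -> list[tuple[tuple[str, ...], list[str]]]:
    """Group partitions that need the same assets, then split each group into batches.

    Each (asset_selection, partition_names) pair could become one backfill request.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    partitions_by_selection: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for partition_key, assets in plan.items():
        partitions_by_selection[tuple(sorted(assets))].append(partition_key)

    requests = []
    for selection, partition_keys in partitions_by_selection.items():
        for start in range(0, len(partition_keys), batch_size):
            requests.append((selection, partition_keys[start:start + batch_size]))
    return requests


plan = {
    "p1": ["second"],
    "p2": ["first", "second"],
    "p3": ["first", "second"],
    "p4": ["first", "second"],
}
for selection, partitions in batch_requests(plan, batch_size=2):
    print(selection, partitions)
# ('second',) ['p1']
# ('first', 'second') ['p2', 'p3']
# ('first', 'second') ['p4']
```

Grouping by identical asset sets keeps each request's selection exact, which sidesteps the same-partitioning-scheme assumption only as far as the selection is concerned; partitions of differently partitioned assets would still need separate handling.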

sam-goodwin commented on July 18, 2024

I just discovered that the following GraphQL query is extremely slow and can't be executed in parallel, because doing so crashes Dagster's SQL database:

query AssetStaleStatus(
    $groupName: String!,
    $assetKey: AssetKeyInput!,
    $partitionKeys: [String!]!,
    $repositoryLocation: String!
) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName:"__repository__",
    repositoryLocationName: $repositoryLocation
  }, assetKeys: [$assetKey]) {
    id
    staleStatusByPartition(partitions: $partitionKeys)
  }
}
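Since parallel requests overloaded the database in this report, one possible workaround (not tested against Dagster) is to issue the query strictly one request at a time over smaller slices of partition keys, optionally pausing between requests. fetch_in_chunks is hypothetical, fetch stands in for whatever function sends the query above for one slice, and the default chunk_size of 50 is arbitrary; whether smaller slices actually make each request faster is an open question.

```python
import time
from typing import Callable


def fetch_in_chunks(
    partition_keys: list[str],
    fetch: Callable[[list[str]], dict[str, str]],
    chunk_size: int = 50,
    pause_seconds: float = 0.0,
) -> dict[str, str]:
    """Call fetch() on consecutive slices of partition_keys, strictly one at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results: dict[str, str] = {}
    for start in range(0, len(partition_keys), chunk_size):
        chunk = partition_keys[start:start + chunk_size]
        results.update(fetch(chunk))  # next request starts only after this one returns
        if pause_seconds and start + chunk_size < len(partition_keys):
            time.sleep(pause_seconds)  # optional breathing room for the database
    return results


# Stand-in for a function that sends the query above for one slice of keys.
calls = []


def fake_fetch(chunk: list[str]) -> dict[str, str]:
    calls.append(chunk)
    return {key: "FRESH" for key in chunk}


keys = [f"p{i}" for i in range(5)]
print(fetch_in_chunks(keys, fake_fetch, chunk_size=2))
print(calls)  # [['p0', 'p1'], ['p2', 'p3'], ['p4']]
```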
