Comments (11)
Out-of-sync partitions have been a huge issue for me. Not showing downstream partitions as out of sync has been deeply impactful (in a bad way). It happens not just when incrementing the code version but also when simply updating parent partitions.
from dagster.
I just found out Dagster IS aware of out of sync partitions
Please, someone let me know what needs to happen to get a fix for this. These green dots must be yellow. The ability to rebuild outdated partitions is soooooo valuable.
from dagster.
I just tried updating first to v2 and it still says everything is up to date ...
@asset(
    code_version="v2",
    partitions_def=test_partitions,
)
def first():
    return "first"
from dagster.
I also get an error when selecting "only backfill missing or failed assets" and clicking "preview".
from dagster.
I can confirm that everything behaves as expected in a non-partitioned job. I don't see this limitation mentioned anywhere in the docs, however.
from dagster.
Could be related: #22553
from dagster.
Is it just the UI that's broken or is there a fundamental problem in the backend?
from dagster.
This query returns the correct information indicating that dagster does know which assets are stale for each partition.
query AssetsByGroup($groupName: String!) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName: "__repository__",
    repositoryLocationName: "your_pkg.defs"
  }) {
    id
    assetKey {
      path
    }
    staleStatusByPartition(partitions: [
      "first",
      "second"
    ])
  }
}
What I am unsure about is how to launch a materialization for many partitions and have each run only include the un-synced assets for each partition.
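One way to frame the problem, as a minimal sketch: invert the `assetNodes` payload from the query above into a map from partition to the assets that are not fresh for it. This assumes `staleStatusByPartition` returns one status per requested partition, in the same order as the `partitions` argument, and that up-to-date partitions are reported as "FRESH"; neither is confirmed in this thread. The function name, the asset name `downstream`, the partition names, and the sample data are hypothetical.

```python
from collections import defaultdict


def stale_assets_by_partition(asset_nodes, partitions, fresh_status="FRESH"):
    """Map each partition name to the asset keys that are not fresh for it.

    Assumes each node's staleStatusByPartition list lines up, index for
    index, with the `partitions` list sent in the query (an assumption).
    """
    result = defaultdict(list)
    for node in asset_nodes:
        key = "/".join(node["assetKey"]["path"])
        statuses = node["staleStatusByPartition"]
        if len(statuses) != len(partitions):
            raise ValueError(f"status count mismatch for asset {key}")
        for partition, status in zip(partitions, statuses):
            if status != fresh_status:
                result[partition].append(key)
    return dict(result)


# Hypothetical response data, shaped like the result of the query above.
sample_nodes = [
    {"assetKey": {"path": ["first"]}, "staleStatusByPartition": ["FRESH", "STALE"]},
    {"assetKey": {"path": ["downstream"]}, "staleStatusByPartition": ["STALE", "MISSING"]},
]
print(stale_assets_by_partition(sample_nodes, ["p1", "p2"]))
```

Each entry in the result is then a candidate run: one partition plus only the assets that need rebuilding for it.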
from dagster.
Looks like the GraphQL schema is built with a strong coupling to time-based partitions (which does not align with my system):
input PartitionsByAssetSelector {
  assetKey: AssetKeyInput!
  partitions: PartitionsSelector
}
input PartitionsSelector {
  range: PartitionRangeSelector!
}
When launching a backfill, you can't specify a list of partitions per asset. You can only specify a range. I am seeing this over-fitting to time-based partitions a lot in Dagster's design.
from dagster.
Oh actually, it looks like I can use assetSelection and partitionNames along with batching to achieve this behavior.
Here's a prototype that materializes stale partitions of assets in a group: https://gist.github.com/sam-goodwin/d8dd76ad58a241cdb14deba9cb53c2bf
Note: it assumes that every asset in a group uses the same partitioning scheme (this may not be true for you).
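The batching idea can be sketched roughly as follows; this is not the code from the gist, only an illustration. Partitions that share the same set of stale assets are grouped together, and each group is split into chunks of at most `batch_size` partitions. The dict keys mirror the assetSelection and partitionNames fields mentioned above, but the surrounding mutation input is not modeled, and the function name, the `batch_size` parameter, and the "/"-joined asset key convention are all hypothetical.

```python
from collections import defaultdict


def plan_backfill_batches(stale_by_partition, batch_size=100):
    """Group partitions that share the same stale assets, then chunk them.

    `stale_by_partition` maps a partition name to the asset keys (as
    "/"-joined paths, a convention assumed here) that need rebuilding.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    groups = defaultdict(list)
    for partition, assets in stale_by_partition.items():
        if assets:  # skip partitions with nothing to rebuild
            groups[tuple(sorted(assets))].append(partition)

    batches = []
    for assets, partitions in sorted(groups.items()):
        partitions.sort()
        for start in range(0, len(partitions), batch_size):
            batches.append(
                {
                    "assetSelection": [{"path": a.split("/")} for a in assets],
                    "partitionNames": partitions[start:start + batch_size],
                }
            )
    return batches
```

Grouping by identical asset sets keeps each batch from rebuilding assets that are already up to date for some of its partitions.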
from dagster.
Just discovered that the following GraphQL query is extremely slow and can't be executed in parallel because it will crash dagster's SQL database:
query AssetStaleStatus(
  $groupName: String!,
  $assetKey: AssetKeyInput!,
  $partitionKeys: [String!]!,
  $repositoryLocation: String!
) {
  assetNodes(group: {
    groupName: $groupName,
    repositoryName: "__repository__",
    repositoryLocationName: $repositoryLocation
  }, assetKeys: [$assetKey]) {
    id
    staleStatusByPartition(partitions: $partitionKeys)
  }
}
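Given that, a cautious client would issue this query strictly one request at a time, over small chunks of partition keys. The sketch below shows only that control flow. `run_query` is a hypothetical callable that executes the query above for one chunk and returns the matching list of statuses; whether smaller chunks actually ease the load on the database is untested here.

```python
def fetch_stale_status(run_query, partition_keys, chunk_size=50):
    """Collect stale statuses chunk by chunk, one request at a time.

    `run_query(chunk)` is a hypothetical callable: it runs the query for the
    given partition keys and returns one status per key, in order.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    statuses = {}
    for start in range(0, len(partition_keys), chunk_size):
        chunk = partition_keys[start:start + chunk_size]
        result = run_query(chunk)  # sequential on purpose: no parallel calls
        if len(result) != len(chunk):
            raise ValueError("status count does not match partition count")
        statuses.update(zip(chunk, result))
    return statuses
```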
from dagster.