Comments (5)
Hrm, so the very easy thing to do is to assert the result of a Python function on each block, without a barrier. In multi-threaded contexts this might get weird, though.
x = da.assert_(x, lambda block: block > 0, ValueError('x must be positive'))
Doing general full-array assertions is also doable (with a bit more complex graph magic) but would probably fill up cache space with the intermediate variables.
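For illustration, here is a minimal sketch of the per-block idea using only existing API. It assumes `map_blocks` is an acceptable stand-in for the proposed `da.assert_`, which as far as I know is a suggestion and not an existing Dask function. The helper name `assert_blocks` is hypothetical.

```python
import dask.array as da
import numpy as np


def assert_blocks(x, predicate, exc):
    """Hypothetical helper: lazily check `predicate` on every block of `x`."""

    def check(block):
        # Raise at compute time if any element of this block fails the check.
        if not np.all(predicate(block)):
            raise exc
        return block  # pass the data through unchanged

    # map_blocks adds one checking task per block; nothing runs until compute.
    return x.map_blocks(check, dtype=x.dtype)


x = da.from_array(np.array([1.0, 2.0, 3.0, 4.0]), chunks=2)
x = assert_blocks(x, lambda block: block > 0, ValueError("x must be positive"))
total = x.sum().compute()  # the checks run here, one per block
```

Because each check lives inside its own block's task, no barrier is needed. The trade-off is that the error only fires if something downstream actually computes the checked array.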
from dask.
@eric-czech brought this up today in a call. CC'ing him here.
@shoyer do you have any thoughts on how we would do this today? FWIW I think that @eric-czech is operating on Dask under Xarray.
👍
Yep, using Xarray over Dask. I'd love to be able to use lazy, runtime checks like that.
Can you say a bit more about your needs, @eric-czech? Do you mostly need elementwise checks, or something more complicated? How would you like to spell these checks?
Elementwise checks would cover the majority of cases I can think of as being useful. Checks on reductions would also be nice (e.g. sums along an axis equal 1), but not critical. I can see some value in making the assertion a terminal task as well, e.g.:
data = da.array(..., dtype=int)
mask = da.array(..., dtype=bool)
da.assertion(
    data[mask].min(),
    lambda v: v >= 0,
    lambda v: ValueError(f'Data values must be >= 0 (found min value {v})')
)
# do stuff with data and mask but not data[mask]
as opposed to:
res = da.assertion(
    data[mask].min(),
    lambda v: v >= 0,
    lambda v: ValueError(f'Data values must be >= 0 (found min value {v})')
)
# now I need to use `res` elsewhere in the graph for the
# assertion to fire, but I don't necessarily want to
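Something close to the terminal-task behaviour can be approximated today by wrapping the check in `dask.delayed` and computing it alongside the real result. This is only a sketch under assumptions: `da.assertion` is a proposed name, not an existing function, and `check_min` is a hypothetical helper. I also swap `data[mask].min()` for `da.where(mask, data, 0).min()` to avoid the unknown chunk sizes that boolean indexing produces. For a `>= 0` check the two are equivalent, since masked-out entries become 0 and can never cause a failure.

```python
import dask
import dask.array as da
import numpy as np


def check_min(v):
    # Hypothetical check: raise if the reduced value violates the invariant.
    if not v >= 0:
        raise ValueError(f"Data values must be >= 0 (found min value {v})")
    return True


data = da.from_array(np.array([3, -1, 4, 1, 5]), chunks=2)
mask = da.from_array(np.array([True, False, True, True, True]), chunks=2)

# Lazy reduction over the selected values, then a lazy check on its result.
min_val = da.where(mask, data, 0).min()
check = dask.delayed(check_min)(min_val)

# Downstream work that never touches the check.
result = (da.where(mask, data, 0) * 2).sum()

# Computing both in one call runs the check as its own terminal task.
total, ok = dask.compute(result, check)
```

The check still has to be passed to `dask.compute` explicitly, so this does not make the assertion fire on its own. It only avoids threading `res` through the rest of the graph.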