Comments (3)
Thanks for the report and the digging in. I'll put up a fix.
from dask.
Computing the dataframe works:
>>> table.compute()
x y genes
0 0.254246 0.977669 a
1 0.776803 0.776138 a
2 0.945980 0.821877 a
3 0.344795 0.886264 a
4 0.825084 0.915373 a
5 0.254027 0.482018 a
6 0.418374 0.077625 b
7 0.279991 0.692238 b
8 0.960852 0.192070 b
9 0.750886 0.973036 b
10 0.558514 0.742854 b
11 0.898733 0.855712 b
12 0.078307 0.143652 b
13 0.859781 0.869062 b
14 0.110986 0.262581 b
15 0.445537 0.669543 b
16 0.933542 0.471514 b
17 0.320354 0.295965 b
18 0.307094 0.974755 b
19 0.824354 0.553312 b
While computing the column 'genes' doesn't:
>>> table['genes'].compute()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_collection.py", line 475, in compute
out = out.optimize(fuse=fuse)
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_collection.py", line 590, in optimize
return new_collection(self.expr.optimize(fuse=fuse))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 94, in optimize
return optimize(self, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3032, in optimize
return optimize_until(expr, stage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 2993, in optimize_until
expr = expr.lower_completely()
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_core.py", line 444, in lower_completely
new = expr.lower_once(lowered)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_core.py", line 399, in lower_once
out = expr._lower()
^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_repartition.py", line 81, in _lower
if self.new_partitions < self.frame.npartitions:
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 398, in npartitions
return len(self.divisions) - 1
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/functools.py", line 995, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 382, in divisions
return tuple(self._divisions())
^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 2071, in _divisions
return super()._divisions()
^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 529, in _divisions
if not self._broadcast_dep(arg):
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 520, in _broadcast_dep
return dep.npartitions == 1 and dep.ndim < self.ndim
^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 398, in npartitions
return len(self.divisions) - 1
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/functools.py", line 995, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 382, in divisions
return tuple(self._divisions())
^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3374, in _divisions
if {df.npartitions for df in self.args} == {1}:
^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/functools.py", line 995, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3392, in args
return [op for op in dfs if not is_broadcastable(dfs, op)]
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3050, in is_broadcastable
and any(compare(s, df) for df in dfs if df.ndim == 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3050, in <genexpr>
and any(compare(s, df) for df in dfs if df.ndim == 2)
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3042, in compare
return s.divisions == (min(df.columns), max(df.columns))
^^^^^^^^^^^^^^^
ValueError: min() iterable argument is empty
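The last frame points at the root cause: `compare` evaluates `min(df.columns)` (and `max`) on an expression whose column list is empty, and Python's `min()` raises `ValueError` on an empty iterable when no `default` is supplied. A standalone sketch of the failure mode and a guarded variant (function names here are illustrative, not dask-expr's actual code):

```python
def compare_unguarded(divisions, columns):
    # Mirrors the failing line: min()/max() over a column list that may be empty
    return divisions == (min(columns), max(columns))


def compare_guarded(divisions, columns):
    # Checking for the empty-columns case first avoids the ValueError
    if len(columns) == 0:
        return False
    return divisions == (min(columns), max(columns))


try:
    compare_unguarded((0, 19), [])
except ValueError:
    print("min() raised ValueError on the empty column list")

print(compare_guarded((0, 19), []))  # False
```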
Note that the snippet below works; it's adding a column to an existing Dask DataFrame that seems to lead to the issue.
import dask.dataframe as dd
import numpy as np
import pandas as pd
# Generate random data for 'x' and 'genes'
x = np.random.rand(20)
genes = pd.Series(['a'] * 6 + ['b'] * 14, dtype='category')
# Create a Dask DataFrame with 'x' and 'genes' columns in a single call
table = dd.from_pandas(pd.DataFrame({'x': x, 'genes': genes}), npartitions=1)
# Both should work now
table.compute()
table['genes'].compute()
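For contrast, in plain pandas both construction orders behave identically, which supports the diagnosis that the failure sits in dask-expr's optimizer rather than in the data itself. A small sketch (pandas only, no dask required):

```python
import numpy as np
import pandas as pd

x = np.random.rand(20)
genes = pd.Series(['a'] * 6 + ['b'] * 14, dtype='category')

# Build the frame in one call (mirrors the working dask snippet above)
df_single = pd.DataFrame({'x': x, 'genes': genes})

# Add the categorical column after construction
# (the pattern that broke in dask)
df_assigned = pd.DataFrame({'x': x})
df_assigned['genes'] = genes

# In pandas the two orders produce the same column
print(df_single['genes'].equals(df_assigned['genes']))  # True
print(df_assigned['genes'].dtype)  # category
```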