Comments (3)
Thanks for the report and for digging in. I'll put up a fix.
from dask.
Computing the dataframe works:
>>> table.compute()
x y genes
0 0.254246 0.977669 a
1 0.776803 0.776138 a
2 0.945980 0.821877 a
3 0.344795 0.886264 a
4 0.825084 0.915373 a
5 0.254027 0.482018 a
6 0.418374 0.077625 b
7 0.279991 0.692238 b
8 0.960852 0.192070 b
9 0.750886 0.973036 b
10 0.558514 0.742854 b
11 0.898733 0.855712 b
12 0.078307 0.143652 b
13 0.859781 0.869062 b
14 0.110986 0.262581 b
15 0.445537 0.669543 b
16 0.933542 0.471514 b
17 0.320354 0.295965 b
18 0.307094 0.974755 b
19 0.824354 0.553312 b
Computing the column 'genes' on its own doesn't:
>>> table['genes'].compute()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_collection.py", line 475, in compute
out = out.optimize(fuse=fuse)
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_collection.py", line 590, in optimize
return new_collection(self.expr.optimize(fuse=fuse))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 94, in optimize
return optimize(self, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3032, in optimize
return optimize_until(expr, stage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 2993, in optimize_until
expr = expr.lower_completely()
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_core.py", line 444, in lower_completely
new = expr.lower_once(lowered)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_core.py", line 399, in lower_once
out = expr._lower()
^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_repartition.py", line 81, in _lower
if self.new_partitions < self.frame.npartitions:
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 398, in npartitions
return len(self.divisions) - 1
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/functools.py", line 995, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 382, in divisions
return tuple(self._divisions())
^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 2071, in _divisions
return super()._divisions()
^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 529, in _divisions
if not self._broadcast_dep(arg):
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 520, in _broadcast_dep
return dep.npartitions == 1 and dep.ndim < self.ndim
^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 398, in npartitions
return len(self.divisions) - 1
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/functools.py", line 995, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 382, in divisions
return tuple(self._divisions())
^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3374, in _divisions
if {df.npartitions for df in self.args} == {1}:
^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/functools.py", line 995, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3392, in args
return [op for op in dfs if not is_broadcastable(dfs, op)]
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3050, in is_broadcastable
and any(compare(s, df) for df in dfs if df.ndim == 2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3050, in <genexpr>
and any(compare(s, df) for df in dfs if df.ndim == 2)
^^^^^^^^^^^^^^
File "/opt/miniconda3/envs/ome/lib/python3.12/site-packages/dask_expr/_expr.py", line 3042, in compare
return s.divisions == (min(df.columns), max(df.columns))
^^^^^^^^^^^^^^^
ValueError: min() iterable argument is empty
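The last frame shows the direct cause: `compare` calls the built-in `min()` on `df.columns`, so one of the frames involved must have had no columns at that point. Below is a minimal sketch of that failure mode in plain pandas, independent of Dask internals. The helper name `column_bounds` is hypothetical, and the `default=` guard only illustrates how the exception can be avoided; it is not necessarily the fix that was applied.

```python
import pandas as pd


def column_bounds(df):
    """Return (min, max) of the column labels, or None if there are no columns."""
    # min()/max() raise ValueError on an empty iterable unless default= is given.
    lo = min(df.columns, default=None)
    hi = max(df.columns, default=None)
    return None if lo is None else (lo, hi)


empty = pd.DataFrame()  # a frame with no columns at all

try:
    min(empty.columns)  # same call shape as in the last traceback frame
    raised = False
except ValueError:
    raised = True

bounds_empty = column_bounds(empty)
bounds_xy = column_bounds(pd.DataFrame({"x": [1.0], "y": [2.0]}))
```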
Note that the following works; it is adding a column to an existing Dask DataFrame that seems to trigger the issue.
import dask.dataframe as dd
import numpy as np
import pandas as pd
# Generate random data for 'x' and 'genes'
x = np.random.rand(20)
genes = pd.Series(['a'] * 6 + ['b'] * 14, dtype='category')
# Create a Dask DataFrame with 'x' and 'genes' columns in a single call
table = dd.from_pandas(pd.DataFrame({'x': x, 'genes': genes}), npartitions=1)
# Both should work now
table.compute()
table['genes'].compute()