Comments (2)
Correct: each reader needs only the schema and its one row group. A ParquetFile in which row_groups is replaced with [row_group], i.e., just the one of interest, would be enough to specify everything.
In fact, read_row_group_file is passed the row_group anyway and doesn't access that list at all, only a few small attributes: the opener function, the filesystem separator, the set of partitions, and the schema/helper.
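A minimal sketch of the "replace row_groups with [row_group]" idea, to make the size argument concrete. It uses a plain `SimpleNamespace` as a hypothetical stand-in for a ParquetFile (it is not the fastparquet class), and `slim_for_row_group` is an illustrative helper name, not an existing function:

```python
import copy
import pickle
from types import SimpleNamespace

# Hypothetical stand-in for a ParquetFile: a few small attributes plus a
# potentially very long list of row-group metadata objects.
pf = SimpleNamespace(
    schema=["a", "b"],
    sep="/",
    cats={},
    row_groups=[{"id": i, "stats": "x" * 100} for i in range(1000)],
)


def slim_for_row_group(pf, row_group):
    """Return a shallow copy of pf that carries only the given row group."""
    slim = copy.copy(pf)           # small attributes are shared, not duplicated
    slim.row_groups = [row_group]  # rebinding affects only the copy
    return slim


slim = slim_for_row_group(pf, pf.row_groups[3])

# The serialized size is what matters when each task ships its own copy.
full_size = len(pickle.dumps(pf))
slim_size = len(pickle.dumps(slim))
print(f"full: {full_size} bytes, slim: {slim_size} bytes")
```

The original object is left untouched, so anything else that still needs the full list keeps working.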
from fastparquet.
In dask.dataframe.io.parquet, adding del pf.row_groups before L68 (the line with delayed)
might solve it very easily.
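A rough sketch of that suggestion, under stated assumptions: `pf` is again a hypothetical stand-in object rather than a real ParquetFile, `read_one` only mimics a reader that uses the row group it is handed plus small attributes, and plain tuples stand in for the delayed calls so the example runs without dask:

```python
import pickle
from types import SimpleNamespace


def read_one(pf, row_group):
    """Mimic a reader that uses only small attributes and its own row group."""
    return (pf.sep, row_group["id"])


def build_tasks(pf):
    """Build one task per row group without shipping the full list each time."""
    row_groups = list(pf.row_groups)  # keep a reference to every row group
    del pf.row_groups                 # drop the large list from the shared object
    # In dask this is where the delayed calls would be created.
    return [(read_one, pf, rg) for rg in row_groups]


pf = SimpleNamespace(
    schema=["a", "b"],
    sep="/",
    row_groups=[{"id": i, "stats": "x" * 100} for i in range(1000)],
)
full_size = len(pickle.dumps(pf))

tasks = build_tasks(pf)
func, task_pf, task_rg = tasks[2]
task_size = len(pickle.dumps((task_pf, task_rg)))  # payload of a single task
result = func(task_pf, task_rg)
```

Note that `del` mutates the shared object, so any later code that expects `pf.row_groups` would need the list passed in separately.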