Comments (14)
There are a number of changes required for py2 compatibility. If this is important to you, it can be the highest priority task.
from fastparquet.
@jbednar do you need Python 2 support? We were hoping to start avoiding maintaining this by default unless someone showed up willing to pay the bill for it. Maintaining Python2/3 support for data formats can be somewhat costly due to Python 2's management of bytes/str.
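To illustrate the bytes/str cost mentioned above, here is a minimal sketch of the kind of helper a dual-version codebase tends to need wherever strings cross a file-format boundary. The name `ensure_text` is hypothetical and is not taken from fastparquet.

```python
def ensure_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a byte string.

    Hypothetical helper for illustration only. On Python 2, `bytes` is an
    alias for `str`, so every plain string literal takes the decode branch;
    on Python 3 only real `bytes` objects do.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both calls yield the same text value on Python 3.
name_from_file = ensure_text(b"column_name")
name_from_user = ensure_text(u"column_name")
```

Every place where a column name, metadata key, or encoded value is read or written would need this kind of care, which is where the maintenance cost comes from.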
I've had a look, and I can get some tests working for py2 in my branch https://github.com/martindurant/fastparquet/tree/py2 , but covering all the cases where strings crop up would probably be annoying.
I do have a client project where we are using Python2, but we could conceivably migrate it. At the moment, I just need to benchmark the read times under different conditions, and if I'm going to make a fair comparison it's hard to be sure about the results if some of the tests have to use Python2 and some Python3. Currently my castra tests have to use Python2 because the data file isn't portable to Python3, but again I could convert that by exporting to something portable and recreating it using Python3. So I think for all my own use cases, I can work around it; it's just a pain rather than a showstopper. But the most important point is that, I would guess, it might give people pause if they want to store their data in Parquet format, because they would worry that people using Python2 couldn't read it.
In any case, pip should presumably not allow installation on Python2 if it's not supported, and the name of the wheel currently suggests that Python2 is supported.
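One way to get that behavior, sketched here as an assumption rather than as what fastparquet actually does, is the setuptools `python_requires` keyword, which recent versions of pip check before installing. The package name and version bound below are hypothetical.

```python
import sys

# Hypothetical metadata fragment: these keywords would be passed to
# setuptools.setup(). `python_requires` is a real setuptools keyword;
# the package name and the ">=3" bound are assumptions for illustration.
SETUP_KWARGS = {
    "name": "example-package",
    "python_requires": ">=3",
}


def is_supported(version_info, minimum=(3, 0)):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# A runtime guard for installers that ignore the metadata.
if not is_supported(sys.version_info):
    raise RuntimeError("This package requires Python 3")
```

The runtime guard is only a fallback: older pip versions do not read `python_requires`, so an explicit check gives a clearer error than a failed import.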
The performance limiting pieces here are not strongly Python version specific. You'll be much more bound by, say, LLVM version than Python version. I think that you can probably make the comparison relatively faithfully even if crossing versions.
If you find that Parquet is a good solution, then perhaps the client project could pay for some of the work to port fastparquet to Python 2?
I'm old, and I don't care to count how many times I've thought (or my students or collaborators have thought) that we had identified performance improvements or reductions, when in fact we were just being confused by seemingly irrelevant differences in setup that later turned out to be relevant. :-) So I've learned not to make any claims about performance if I'm not sure the two cases really are comparable, which is quite difficult to ensure between Python2 and Python3 (who knows which library might come into play for each one?). So while I too guess that Py2/3 doesn't matter in this case, I will always prefer to control for it instead.
As you like. I just don't think that Python 2 support is as high a priority as some other things. I would prefer to see us build out a well fleshed out Python 3 solution and then think about Python 2 support, ideally with the financial backing of someone willing to pay for it.
Just my preference though. I don't have a good cost metric on what it will take to migrate and maintain. My guess is that it is non-trivial.
Sure.
@martindurant @mrocklin, is fastparquet compatible with Python 2.7.12 at this time, or are you concentrating on Python 3? I need it for reading parquet files using dask dataframes. The error I get when I import fastparquet is: ImportError: cannot import name encoding
@freshforlife, PR #66 seems to work well; it just needs a little cleanup before merging. Feel free to use it and comment.
@martindurant Thanks! Where do I download it from? pip install seems to work and installs fastparquet==0.0.4.post1, but that version gives the above error.
To use that, you would need to clone this repository, check out the relevant branch, and install it from source. Pip should also be able to install directly from that git branch if you can get the correct syntax. I will be releasing a new version of fastparquet shortly after the PR is merged.
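For reference, pip's syntax for installing from a git branch is `git+<repo-url>@<branch>`. The sketch below builds that command; the helper name `pip_install_args` is hypothetical, and the repository URL and `py2` branch are taken from the link earlier in this thread, so whether PR #66 lives on that same branch is an assumption.

```python
def pip_install_args(repo_url, branch):
    """Return the argument list for installing a package from a git branch.

    Hypothetical helper for illustration. It only formats pip's
    `git+<repo-url>@<ref>` VCS requirement; it does not run pip.
    """
    return ["pip", "install", "git+{0}@{1}".format(repo_url, branch)]


# Assumed repository and branch, based on the link posted earlier.
args = pip_install_args(
    "https://github.com/martindurant/fastparquet.git", "py2"
)
print(" ".join(args))
```

Running the printed command in a shell performs the clone, checkout, and install in one step.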