Comments (7)

gerrymanoim commented on August 11, 2024

So you raise a good point - I don't necessarily care about 1.1 specifically being the minimum version (it's been a few years, we can move it to 1.3?), but I do think it is worth having some minimum version that is compatible with the library. The situation I worry about is that we decide to use some newer pandas feature without really thinking about it, forcing everyone to immediately upgrade to pandas 2.x, which is a pretty big lift.

I agree with you that we don't do this for other libraries (though arguably we should for numpy), but the few other libraries we directly depend on are probably much less painful to upgrade than doing a full pandas version bump. Unfortunately I can't find any pandas equivalent of https://numpy.org/neps/nep-0029-deprecation_policy.html.

Thoughts?

from exchange_calendars.

maread99 commented on August 11, 2024

I agree the ideal is to be able to declare the minimum supported version of major dependencies.

The tests were failing as pandas 1.1 wouldn't build (here).
I bumped the minimum to 1.3 and the same thing happened (here).
Bumped it to 1.4 and it built on all platforms (here), although it took an age - about 17 minutes on macOS! (Worth noting this isn't an issue with the later versions - the latest one builds fine; it seems to be something to do with the later v1 releases.)

@gerrymanoim, if you're happy with pandas>=1.4 I'll leave it there...?

gerrymanoim commented on August 11, 2024

Yep - fine with that.

Though clicking through those, the 1.3 failure seems to be a numpy error? https://github.com/gerrymanoim/exchange_calendars/actions/runs/6017394262/job/16323473168. It occurs while building pandas, but the issue seems to be in actually building numpy. The version of numpy there is also ancient: "Downloading numpy-1.19.3.zip (7.3 MB)". Looking at 1.1 again, it seems to be the same case: "Collecting numpy==1.17.3". I'm not surprised that these numpy versions don't build with Python 3.11.

Per the NEP 29 deprecation policy linked above, it looks like we should at minimum be using numpy 1.22. Maybe we just need to specify that so pandas doesn't try with the min version?

maread99 commented on August 11, 2024

Good spot on the lower numpy version.

It's nothing to do with any min numpy that we're specifying. The tests on Py 3.11 run using the dependencies defined in requirements_minpandas.txt (here), which has numpy pinned to 1.25.2. The workflow logs show that 1.25.2 is the version being collected. However, the pandas build then collects its own numpy, namely the minimum numpy version supported by the pandas version being built. So, building pandas 1.3 using Python 3.11 will use numpy 1.19.3, as determined by this pyproject.toml file for pandas 1.3.

So, to work out which minimum pandas version we should be specifying we have to work backwards from numpy...

  • The minimum numpy version for python 3.11 is 1.23.2 as defined here in the oldest-supported-numpy package (the package which pandas itself now uses in its builds to define its numpy dependency).
  • However, pandas 1.4 requires numpy <=1.22.4 (as defined on the 1.4 dev branch here). Although 1.4 is building for us, this explains the excessive build time and suggests we should go with a higher minimum pandas version.
  • Pandas 1.5 has (currently) no such restrictions, suggesting that our minimum required pandas version should be 1.5.
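The "work backwards from numpy" reasoning in the three bullets above can be sketched roughly as follows. This is only an illustration using the figures quoted in this comment; the helper names (`parse`, `lowest_compatible_pandas`) and the table layout are hypothetical and are not part of exchange_calendars or pandas.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn a plain release string such as '1.23.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Minimum numpy for Python 3.11, as quoted above from oldest-supported-numpy.
MIN_NUMPY_FOR_PY311 = parse("1.23.2")

# Upper numpy bound declared by each pandas release line (None = no cap).
# Only the two release lines discussed above are listed.
NUMPY_CAP_BY_PANDAS: Dict[str, Optional[Version]] = {
    "1.4": parse("1.22.4"),
    "1.5": None,
}


def lowest_compatible_pandas(
    min_numpy: Version, caps: Dict[str, Optional[Version]]
) -> Optional[str]:
    """Return the lowest pandas line whose numpy cap admits `min_numpy`."""
    compatible = [
        pandas_line
        for pandas_line, cap in caps.items()
        if cap is None or min_numpy <= cap
    ]
    return min(compatible, key=parse) if compatible else None


print(lowest_compatible_pandas(MIN_NUMPY_FOR_PY311, NUMPY_CAP_BY_PANDAS))
```

With these inputs, pandas 1.4 is excluded because its cap (1.22.4) is below the numpy minimum for Python 3.11 (1.23.2), leaving 1.5 as the lowest candidate.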

@gerrymanoim - let me know if you have any objections to setting it to 1.5.

(The following has ended up as a bit of a 'note to self' on managing dependencies)...

I think this all kind of serves as an example of why there's a reasonable argument for us to not define minimum versions of dependencies. Defining minimum versions is useful to advise clients that 'this won't work on versions <x', although the minimum version is often interpreted as it's defined, i.e. 'this will work on versions >=x'. But there can never be any guarantee that the package will work with any combination of dependencies except those defined in the requirements file. Keeping track of what works with what is a rabbit hole, and every minimum version will eventually start failing, often without obvious cause.

Whilst I think the ideal IS to define a minimum version of significant dependencies, in the FOSS world I don't think the value it offers clients is worth the effort required of maintainers to stay on top of it - it's a losing battle. Ideal but not practical.

Where my thinking is on this:

  • The limited time we have should be spent on keeping the package up to date with the latest versions of dependencies (rather than considering support for older versions).
  • Requirements files should always accompany releases and these form the contract. We guarantee it works with this combination of latest versions (at least on the Python versions and the OSs that it's tested on).
  • It's reasonable to require clients to maintain their code if they wish to use the latest version.

Interested to hear what anyone else thinks...?

@gerrymanoim, we're still in time to remove the min version for pandas if I've convinced you 😀

gerrymanoim commented on August 11, 2024

I think I still lean towards supporting some non-2.0 pandas version as an enforced testing minimum (for now), given that 1.5.3 was released in January and 2.0 was released in April. Perfectly happy for that to be 1.5 if that makes our lives easier. I'm happy to drop this guarantee in the future.

I even agree with all your thinking around FOSS. I think where I'm coming from is:

  1. Pandas is our main dependency
  2. Pandas 2.x is fairly new
  3. Pandas 1.x -> 2.x is a fairly big upgrade. Even minor versions can cause annoying-to-fix breaking changes for a large enough quantitative codebase. This means that people upgrade slowly/rarely.
  4. There's nothing in exchange_calendars that necessitates we use pandas 2.x functionality (or at least I haven't seen it).
  5. I want to avoid a scenario where we accidentally use a pandas 2.x only function or behavior such that clients are forced to upgrade without warning if they want any more calendar updates (or we then have to release a new minor version). I'm okay forcing this update in the future (similar to how we drop python versions from time to time), but think it is too soon to say "you must use latest pandas".

I'd even be okay with pinning min pandas and min numpy to some compatible versions where we don't build from source, but I care much more about supporting all versions of Python, so if py311 requires numpy 1.23.2, that kind of forces us to use at least pandas 1.5 (I don't really have any interest in maintaining Python-version-specific deps).
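As a rough sketch of what an "enforced testing minimum" could look like in practice, a minimum-version CI job might first confirm that the environment really is running the declared minimum pandas line before the test suite runs. Everything below is an assumption for illustration: the helper names and the `MIN_PANDAS` constant are hypothetical, and this is not how the exchange_calendars workflow is actually implemented.

```python
from importlib import metadata
from typing import Optional, Tuple

MIN_PANDAS = "1.5"  # hypothetical constant mirroring the minimum discussed here


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '1.5.3' -> (1, 5, 3).

    Parsing stops at the first component that is not purely numeric,
    so '2.1.0rc0' becomes (2, 1).
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_pinned_to_minimum(installed: str, minimum: str) -> bool:
    """True when `installed` is on the same release line as `minimum`."""
    want = release_tuple(minimum)
    return release_tuple(installed)[: len(want)] == want


if __name__ == "__main__":
    found = installed_version("pandas")
    if found is None:
        print("pandas is not installed in this environment")
    else:
        print(found, is_pinned_to_minimum(found, MIN_PANDAS))
```

If the min-version job ever resolves to a newer pandas than intended, a check like this would flag it, rather than letting the job silently test against the latest release.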

fwiw here are the pandas download stats by version from PyPI:

num_downloads	version
139179616	1.3.5
109924039	1.5.3
83627259	1.1.5
60940600	2.0.3
40190853	1.2.5
29904745	1.0.5
27627850	2.0.1
27423437	2.0.2
26081110	1.3.4
23006841	0.24.2
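To put the table in perspective, the share of downloads going to pandas 2.x can be computed from the ten rows listed. This is a small sketch over only those rows (not all pandas downloads), and the function names are made up for the example.

```python
from typing import Dict

# The ten rows from the table above (version -> num_downloads).
DOWNLOADS: Dict[str, int] = {
    "1.3.5": 139_179_616,
    "1.5.3": 109_924_039,
    "1.1.5": 83_627_259,
    "2.0.3": 60_940_600,
    "1.2.5": 40_190_853,
    "1.0.5": 29_904_745,
    "2.0.1": 27_627_850,
    "2.0.2": 27_423_437,
    "1.3.4": 26_081_110,
    "0.24.2": 23_006_841,
}


def major(version: str) -> int:
    """Return the major component of a release string, e.g. '2.0.3' -> 2."""
    return int(version.split(".")[0])


def share_at_or_above(downloads: Dict[str, int], major_version: int) -> float:
    """Fraction of the listed downloads on `major_version` or newer."""
    total = sum(downloads.values())
    newer = sum(n for v, n in downloads.items() if major(v) >= major_version)
    return newer / total


share_2x = share_at_or_above(DOWNLOADS, 2)
print(f"pandas 2.x share of listed downloads: {share_2x:.1%}")
```

Among these ten versions, 2.x accounts for roughly a fifth of downloads, which is consistent with the point that most users had not yet moved to pandas 2.x.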

maread99 commented on August 11, 2024

👍

I've changed the min pandas to 1.5 in #323. Tests all passed with build times the same for the latest pandas as for 1.5.

(I hope to add a commit next week to move from pytz to zoneinfo, then I'll release it all as 4.3. Cheers.)

maread99 commented on August 11, 2024

From 4.3 the minimum pandas version is 1.5 (implemented in #323).
