**Current status: fetching pypi packages.**
Packaging is traditionally a tricky business. Python has a long history of packaging solutions that I am going to skip. I'm only going to mention that the latest iteration is a merge of setuptools and distutils.
This latest iteration still uses executable code to specify dependencies. As much as I dislike Node.js for its choice of JavaScript, its packaging solution is relatively sane: a declarative JSON file, `package.json`, i.e. no if-statements that influence dependencies. `package.json` contains one section for runtime dependencies and one for development dependencies. That allows e.g. omitting the test tools from the released version.
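A minimal `package.json` illustrating that split (package names and versions are made-up placeholders):

```json
{
  "name": "example-app",
  "version": "1.0.0",
  "dependencies": {
    "liba": ">2.0.1"
  },
  "devDependencies": {
    "test-runner": "2.2.2"
  }
}
```

An installer can install just `dependencies` for a release, and both sections on a developer machine.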
Specifying dependencies is only half the battle though: almost all packaging systems can set constraints, e.g. "liba > 2.0.1" and "libz == 2.2.2". Installing one package in isolation usually works fine. Once you combine the constraints of all installed packages, things get trickier. Package A requires "liba > 2.0.1" and package B requires "liba > 2.2.0". This one is easy: just pick a version satisfying "liba > 2.2.0". In the most general case this is a satisfiability problem, so the common solution is to plug all constraints into a SAT solver and hope that it returns a solution before the universe ends.
In some cases there is no solution (e.g. package A depends on "liba == 2.0.1" and package B depends on "liba == 2.2.0"); we're going to skip these for now.
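The easy cases above can be sketched as intersecting predicates over candidate versions. This is a toy model (made-up version tuples, three operators only), not a real resolver — in reality each chosen version pulls in its own constraints transitively, which is what makes the general problem SAT-shaped:

```python
from typing import Callable

def parse(constraint: str) -> tuple[str, Callable[[tuple], bool]]:
    """Parse a constraint like "liba > 2.0.1" into (name, predicate)."""
    name, op, ver = constraint.split()
    v = tuple(int(x) for x in ver.split("."))
    ops = {
        ">":  lambda a: a > v,
        ">=": lambda a: a >= v,
        "==": lambda a: a == v,
    }
    return name, ops[op]

def resolve(candidates, constraints):
    """Per package, pick the newest candidate satisfying every constraint."""
    picked = {}
    for name, versions in candidates.items():
        preds = [p for (n, p) in map(parse, constraints) if n == name]
        ok = [v for v in versions if all(p(v) for p in preds)]
        if not ok:
            return None  # unsatisfiable, e.g. "liba == 2.0.1" vs "liba == 2.2.0"
        picked[name] = max(ok)
    return picked

# Package A wants "liba > 2.0.1", package B wants "liba > 2.2.0":
print(resolve({"liba": [(2, 0, 1), (2, 2, 0), (2, 3, 0)]},
              ["liba > 2.0.1", "liba > 2.2.0"]))  # {'liba': (2, 3, 0)}
```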
I'm going to move on despite having omitted a whole lot of problems, such as optional dependencies or Python 3.
Nix is a packaging system that defines each package with a purely functional language. A side-effect (hah) of that is that every package depends on all its build inputs. The creators have a better, longer description here.
I'm really interested in this because it allows me to build and test each package with all its dependencies in total isolation. There is no chance that a random system package will sneak in and provide an import that I didn't expect to be there.
Nix also comes with a large repository of existing packages that are not python, e.g. gcc, or libblas. A lot of python packages are actually bindings to c libraries. It's useful to be able to rely on nix's existing packages.
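To make this concrete, here is a sketch of what one generated expression might look like, using nixpkgs' `buildPythonPackage` and `fetchPypi`; the package name, hash, and dependency are placeholders, not output of this project:

```nix
{ python3Packages }:

python3Packages.buildPythonPackage rec {
  pname = "example";    # placeholder pypi name
  version = "1.0.0";
  src = python3Packages.fetchPypi {
    inherit pname version;
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
  # runtime deps, which is exactly the data this project needs to collect
  propagatedBuildInputs = [ python3Packages.requests ];
}
```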
A large number of packages don't specify their dependencies correctly. There is no way of knowing how many without explicitly checking; some low-N sampling makes me think it's at least half.
This project plans to build a nix expression for all versions of all python packages. There are some easy parts and some hard parts, but they are separable parts, i.e. we can divide and conquer. Here's the plan:
- Get a copy of pypi's metadata (done, ~1d)
- Build a large nix-derivation with all packages (tbd).
- Fetch all packages and extract their metadata (20%).
- Extract dependencies where provided (e.g. from `.egg/requires.txt`).
- Write a second, hand-curated datastore of dependencies that maps (pypi-name, version) -> build-dependencies.
- Write "smoke tests" that test some of the most basic imports etc. These wouldn't replace full unit testing, but they are a good indicator for whether the project has a chance to run.
- Build everything / update store accordingly.
- Extend nix to provide a sat-solver.
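The requires.txt extraction step above can be sketched as follows. In an egg-info `requires.txt`, lines before the first bracketed `[section]` header are unconditional requirements; later sections hold extras and environment-conditional dependencies. This toy parser keeps only the unconditional top section (sample data is made up):

```python
def parse_requires_txt(text: str) -> list[str]:
    """Return the unconditional requirements from an egg-info requires.txt."""
    deps = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):  # start of an extras/conditional section
            break
        if line:
            deps.append(line)
    return deps

sample = """\
requests>=2.0
six

[test]
pytest
"""
print(parse_requires_txt(sample))  # ['requests>=2.0', 'six']
```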
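For the smoke tests in the plan above, the most basic check is "does the module import at all". A minimal sketch, run in a subprocess so that a crashing C extension can't take the test harness down with it (the module names here are just examples):

```python
import subprocess
import sys

def smoke_test(module: str) -> bool:
    """Return True if `import <module>` succeeds in a fresh interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
    )
    return result.returncode == 0

print(smoke_test("json"))            # stdlib module imports fine: True
print(smoke_test("no_such_module"))  # missing module: False
```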
I would of course love to merge all that dependency data back into the actual python packages, but I have no community involvement. Maybe I can provide the database?