
fullpypi's Introduction

Building a full nix inventory of pypi (very WIP)

**Current status: fetching pypi packages.**

Packaging is traditionally a tricky business. Python has a long history of packaging solutions that I am going to skip. I'm only going to mention that the latest iteration is a merge of setuptools and distribute.

This latest iteration still uses executable code to specify dependencies. As much as I dislike Node.js for its choice of JavaScript, its packaging solution is relatively sane. It's a declarative JSON file, package.json, i.e. there are no if-statements that influence dependencies.

package.json contains one section for runtime dependencies (dependencies) and one for development dependencies (devDependencies). That allows e.g. omitting the test tools from the released version.
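To make "declarative" concrete, here is a minimal Python sketch that reads both sections from a package.json-style document. The manifest contents (package names, version ranges) and the helper name `read_dependencies` are made up for illustration. The point is only that the dependency list can be obtained by parsing data, without running any project code.

```python
import json

# A made-up manifest; the package names and versions are purely illustrative.
MANIFEST = """
{
  "name": "example-app",
  "version": "1.0.0",
  "dependencies": {"left-pad": "^1.3.0"},
  "devDependencies": {"mocha": "^10.0.0"}
}
"""


def read_dependencies(manifest_text, include_dev=False):
    """Return {name: version_range} by parsing data only, never running code."""
    manifest = json.loads(manifest_text)
    deps = dict(manifest.get("dependencies", {}))
    if include_dev:
        # Development-only tools (test runners etc.) live in their own section.
        deps.update(manifest.get("devDependencies", {}))
    return deps


runtime_deps = read_dependencies(MANIFEST)
all_deps = read_dependencies(MANIFEST, include_dev=True)
print(runtime_deps)
print(all_deps)
```

A setup.py, by contrast, has to be executed before its dependency list is known, and the result can differ between machines.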

Specifying dependencies is only half the battle though: almost all packaging systems can set constraints, e.g. "liba > 2.0.1" and "libz == 2.2.2". Installing one package in isolation usually works fine. If you combine the constraints of all installed packages, things tend to get trickier. Package A requires "liba > 2.0.1" and package B requires "liba > 2.2.0". This one is easy, just pick "liba > 2.2.0". In the most general case this is a satisfiability problem, so the common solution is to plug all constraints into a SAT solver and hope that it returns a solution before the universe ends.

In some cases there is no solution (e.g. package A depends on "liba == 2.0.1" and package B depends on "liba == 2.2.0"). We're going to skip these for now.
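The two liba examples can be played through with a small Python sketch. This is not a SAT solver: it brute-forces a single library by filtering a list of releases against every constraint. The release list, the helper names and the simple dotted-integer version format are all assumptions made for illustration.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def parse_version(text):
    """Turn '2.2.0' into (2, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraint):
    """Check one version against one (operator, bound) constraint."""
    op, bound = constraint
    return OPS[op](parse_version(version), parse_version(bound))


def pick_version(available, constraints):
    """Return the newest available version meeting every constraint, or None."""
    candidates = [
        v for v in available
        if all(satisfies(v, c) for c in constraints)
    ]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical releases of "liba".
LIBA_RELEASES = ["2.0.1", "2.1.0", "2.2.0", "2.3.0"]

# Package A wants liba > 2.0.1, package B wants liba > 2.2.0.
compatible = pick_version(LIBA_RELEASES, [(">", "2.0.1"), (">", "2.2.0")])

# Package A wants liba == 2.0.1, package B wants liba == 2.2.0.
conflicting = pick_version(LIBA_RELEASES, [("==", "2.0.1"), ("==", "2.2.0")])

print(compatible, conflicting)
```

With one library this is trivial. The hard part starts when the choice of a liba version changes which constraints apply to other libraries, which is where the SAT solver comes in.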

I'm going to move on despite having omitted a whole lot of problems, such as optional dependencies or Python 3.

Enter Nix

Nix is a packaging system that defines each package with a purely functional language. A side-effect (hah) of that is that every package depends on all its build inputs. The creators have a better, longer description here.

I'm really interested in this because it allows me to build and test each package with all its dependencies in total isolation. There is no chance that a random system package will sneak in and provide an import that I didn't expect to be there.

Nix also comes with a large repository of existing packages that are not Python, e.g. gcc or libblas. A lot of Python packages are actually bindings to C libraries, so it's useful to be able to rely on Nix's existing packages.

The sorry state of pypi

A large number of packages on pypi don't declare their dependencies correctly, so they import things they never asked for. There is no way of knowing how many without explicitly checking. A small, informal sample makes me think it's at least half.

The plan

This project plans to build a nix expression for all versions of all python packages. There are some easy and some hard parts, but there are parts, i.e. we can divide and conquer. Here's the plan.

  1. Get a copy of pypi's metadata (done, ~1d)
  2. Build a large nix-derivation with all packages (TBD).
  3. Fetch all packages and extract their metadata (20%).
  4. Extract dependencies where provided (e.g. from .egg-info/requires.txt).
  5. Write a second, hand-curated datastore of dependencies that maps (pypi-name, version) -> build-dependencies.
  6. Write "smoke tests" that test some of the most basic imports etc. These wouldn't replace full unit testing, but they are a good indicator for whether the project has a chance to run.
  7. Build everything / update store accordingly.
  8. Extend nix to provide a SAT solver.
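Steps 4 and 6 could look roughly like the following Python sketch. It assumes the usual requires.txt layout, where unconditional requirements come first and bracketed headers such as `[socks]` introduce extras or conditional groups. The sample file contents and the function names `parse_requires_txt` and `smoke_import` are hypothetical and not part of this project.

```python
import importlib


def parse_requires_txt(text):
    """Group requirement lines by section; the '' key holds unconditional ones."""
    sections = {"": []}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A header like [socks] starts a new group of requirements.
            current = line[1:-1]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


def smoke_import(module_name):
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Any failure at import time counts as a failed smoke test.
        return False
    return True


# Made-up file contents for illustration.
SAMPLE = """\
requests>=2.0
six

[socks]
PySocks>=1.5.6
"""

requirements = parse_requires_txt(SAMPLE)
print(requirements)
print(smoke_import("json"), smoke_import("surely_not_a_real_module_xyz"))
```

Run inside an isolated build, a failing `smoke_import` is a cheap hint that a dependency is missing from the extracted or hand-curated data.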

And then happily ever after

I would of course love to merge all that dependency data back into the actual Python packages, but I have no community involvement. Maybe I can provide the database?
