wheelodex's Introduction

Project Status: Active. The project has reached a stable, usable state and is being actively developed. License: MIT.

Packaged projects for the Python programming language are distributed in two main formats: sdists (archives of code and other files that require processing before they can be installed) and wheels (zipfiles of code ready for immediate installation). A project's wheel contains complete information about what modules, files, & commands the project installs, along with information about what other projects the project depends on, but the Python Package Index (PyPI), where wheels are distributed, doesn't expose any of this information! This is the problem that Wheelodex is here to solve.

Wheelodex scans PyPI for wheel files, analyzes them, and stores & displays the results. The site allows users to view the complete metadata inside wheels, search for wheels containing a given Python module or file, browse or search for wheels that define a given command or other entry point, and even find out projects' reverse dependencies.
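To make the "analyzes them" step concrete, here is a minimal sketch of pulling a few fields out of a wheel using only the standard library. It is not how Wheelodex itself does the analysis (the issues below mention wheel-inspect); the function name `read_wheel_metadata` and the in-memory "demo" wheel are assumptions made for illustration.

```python
import io
import zipfile
from email.parser import HeaderParser


def read_wheel_metadata(fp):
    """Return (name, version, requires_dist) from a wheel's METADATA file."""
    with zipfile.ZipFile(fp) as zf:
        # Look for METADATA files inside a top-level *.dist-info/ directory.
        candidates = [
            n
            for n in zf.namelist()
            if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError("no unique *.dist-info/ directory in wheel")
        text = zf.read(candidates[0]).decode("utf-8")
    # METADATA uses RFC 822-style headers, which the email package can parse.
    headers = HeaderParser().parsestr(text)
    return headers["Name"], headers["Version"], headers.get_all("Requires-Dist", [])


# Build a tiny in-memory wheel for a hypothetical project named "demo".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n"
        "Requires-Dist: attrs\n\nLong description.\n",
    )
    zf.writestr("demo/__init__.py", "")
buf.seek(0)
info = read_wheel_metadata(buf)
```

A real analysis would also read the WHEEL, RECORD, and entry_points.txt files and validate them against each other; this sketch only covers the dependency information mentioned above.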

Note that, in order to save disk space, Wheelodex only records data on wheels from the latest version of each PyPI project; wheels from older versions are periodically purged from the database. Projects' long descriptions aren't recorded at all.

Suggestions and pull requests are welcome.

wheelodex's Issues

Add a command for dumping information about wheel errors from the database

Organize the output by processing errors vs. wheel-inspect validation errors and their respective types.

Base the command on the following scripts I've been using for ad-hoc error dumping:

errors2dir.py
from pathlib import Path
import sqlalchemy as sa
from sqlalchemy.orm import Session
from wheelodex.models import Wheel

suberrors = {
    "no unique *.dist-info/ directory in wheel": "no-dist-info",
    "headerparser.errors.": "header-error",
    "Invalid name or filename": "invalid-name",
    "Invalid wheel filename": "invalid-name",
}

DIR = Path(__file__).with_name("errors")
DIR.mkdir(exist_ok=True)
for k, v in suberrors.items():
    d = DIR / v
    d.mkdir(exist_ok=True)
    suberrors[k] = d

engine = sa.create_engine("---URL REDACTED---")
session = Session(engine)

for whl in session.scalars(sa.select(Wheel).filter(Wheel.errors.any())):
    for e in whl.errors:
        for k, d in suberrors.items():
            if k in e.errmsg:
                edir = d
                break
        else:
            edir = DIR
        with (edir / f"whl{whl.id}-{e.id}.txt").open("w") as fp:
            print("Filename:", whl.filename, file=fp)
            print("URL:", whl.url, file=fp)
            print("Uploaded:", whl.uploaded, file=fp)
            print("Wheelodex-Version:", e.wheelodex_version, file=fp)
            print(file=fp)
            print(e.errmsg, file=fp)
invalids.py
from pathlib import Path
import sqlalchemy as sa
from sqlalchemy.orm import Session  # the script instantiates Session below, not sessionmaker
from wheelodex.models import WheelData

DIR = Path(__file__).with_name("invalid")
DIR.mkdir(exist_ok=True)

engine = sa.create_engine("---URL REDACTED---")
session = Session(engine)

for data in session.scalars(sa.select(WheelData).filter(WheelData.valid == False)):
    whl = data.wheel
    vdir = DIR / data.raw_data["validation_error"]["type"]
    vdir.mkdir(exist_ok=True)
    with open(str(vdir / f"whl{whl.id}.txt"), "w") as fp:
        print("Filename:", whl.filename, file=fp)
        print("URL:", whl.url, file=fp)
        print("Uploaded:", whl.uploaded, file=fp)
        print("Processed:", data.processed, file=fp)
        print("Size:", whl.size, file=fp)
        print("Wheel-Inspect-Version:", data.wheel_inspect_version, file=fp)
        print("Error-Type:", data.raw_data["validation_error"]["type"], file=fp)
        print("Message:", data.raw_data["validation_error"]["str"], file=fp)

Add "wheel" icon/logo

I'm thinking a blue tire-like wheel viewed from the side and tilted up slightly.

The icon should also be used as the picture for the wheelodex GitHub organization.

Add more tests

  • Tests for each command (process-orphan-wheels in particular)
  • More thorough tests for views

Restart nginx and/or uwsgi whenever PostgreSQL is updated

Currently, the deployment pins the PostgreSQL package to prevent unattended security updates, as updating PostgreSQL causes it to restart, breaking Wheelodex's database connection. This is obviously sub-optimal.

Possible resolution: Configure systemd to restart nginx and/or uwsgi whenever PostgreSQL is restarted. In addition, configure unattended upgrades to not run while Wheelodex jobs are running (Use systemd's Conflicts field?).

  • It seems the only way to prevent two systemd timer services from running at the same time without causing one of them to fail is to use the flock command.
    • Problem: unattended-upgrades is run as root, and the wheelodex jobs are run as the wheelodex user, so there will likely be permission errors if they both have the same lockfile.

Look into other possible resolutions, as well.
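As a rough illustration of the flock idea, here is a minimal Python sketch of two jobs sharing one lockfile. It assumes a Unix host; the helper names `exclusive_lock` and `try_lock` and the lockfile path are hypothetical. One point that may bear on the permission problem, and is worth verifying on the deployment host: flock(2) locks can be placed on a descriptor opened read-only, so a lockfile that is merely readable by both root and the wheelodex user might be enough.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    # Read-only access is sufficient for flock(); O_CREAT makes the file if missing.
    fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def try_lock(path):
    """Return True if the lock could be taken right now (it is released again)."""
    fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)


# Demonstration with a throwaway lockfile (hypothetical path).
lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
with exclusive_lock(lock_path):
    contended = try_lock(lock_path)  # a second opener cannot get the lock
free = try_lock(lock_path)  # once released, the lock is available again
```

The blocking form is what would let one timer service wait for the other instead of failing, which is the property the flock command provides on the shell side.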

Fill in descriptions for entry point groups

Wheelodex has a list of every entry point group defined by a wheel, each one linking to a list of the entry points defined for it. In order to make things less bland and to give people more information on what they're looking at, individual groups can have summaries displayed next to them in the groups list and descriptions displayed at the tops of their entry points list (example). However, this requires someone to write out summaries & descriptions for the entry point groups first, and that's where I could use some help.

If there's an entry point group you're familiar with that's lacking a description, you can add a description to it by creating a pull request modifying the wheelodex/data/entry_points.ini file. Add a section with the same name as the group (keeping the sections in lexicographic order), and give it summary and description fields whose values are CommonMark Markdown describing the group. The description should include what projects consume the entry point group, a brief idea of what defining an entry point in the group accomplishes, and (if it exists) a link to the consuming project's documentation on using the entry point group. See entry_points.ini for examples.
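For contributors who want to check an edit before opening a pull request, the following sketch parses text in the shape described above and verifies that the sections are in lexicographic order. The sample sections and their wording are illustrative assumptions, not the contents of the real entry_points.ini, and `load_groups` is a hypothetical helper rather than part of Wheelodex.

```python
from configparser import ConfigParser

# Hypothetical sample in the shape described above; the wording is made up.
SAMPLE = """\
[console_scripts]
summary = Command-line programs installed with the project
description = Each entry point becomes an executable command.
    Consumed by installers such as pip.

[pytest11]
summary = Plugins for pytest
description = Registers a module as a pytest plugin.
"""


def load_groups(text):
    """Map each group name to its (summary, description) pair."""
    cfg = ConfigParser(interpolation=None)  # leave any "%" in Markdown untouched
    cfg.read_string(text)
    names = cfg.sections()
    if names != sorted(names):
        raise ValueError("sections are not in lexicographic order")
    return {n: (cfg[n].get("summary"), cfg[n].get("description")) for n in names}


groups = load_groups(SAMPLE)
```

Indented continuation lines, as in the first description, are how the INI format carries multi-line Markdown values.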

Missing Documentation on how to build & execute

Really like what you built. Amazing idea.
Though it's really hard to be able to help if you don't provide a simple way to start running the service on a local machine / server.

If I may, a docker compose recipe would probably be a good start :)

Honor packaging yanking

Naïvely, if a release or asset is yanked, it should be deleted from the database — but what if the latest release (or all its wheels) is yanked, the previous release has already been purged, and the project doesn't make another release for some time? (This is similar to #32.)
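One possible direction, sketched under assumptions: instead of deleting immediately, detect when every wheel of a release is yanked and flag that state, leaving the policy decision open. The records below are hypothetical and only mimic the `yanked` and `yanked_reason` fields that PyPI's JSON API attaches to each file; `split_yanked` is not an existing Wheelodex function.

```python
def split_yanked(files):
    """Partition a release's wheel records into (available, yanked)."""
    wheels = [f for f in files if f["filename"].endswith(".whl")]
    available = [f for f in wheels if not f.get("yanked", False)]
    yanked = [f for f in wheels if f.get("yanked", False)]
    return available, yanked


# Hypothetical records shaped like file entries from PyPI's JSON API.
release_files = [
    {
        "filename": "demo-2.0-py3-none-any.whl",
        "yanked": True,
        "yanked_reason": "broken build",
    },
    {
        "filename": "demo-2.0.tar.gz",
        "yanked": True,
        "yanked_reason": "broken build",
    },
]
available, yanked = split_yanked(release_files)
# True when the release has wheels but none of them are still available.
fully_yanked = not available and bool(yanked)
```

A fully yanked latest release could then be marked as such on the project page rather than removed, which avoids the empty-project case described above.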

Update `wheel_sort_key()` for more modern wheel tags

The wheels provided by pydantic-core should be a good source of modern tags.

Incomplete list of new tag elements to sort:

  • Platforms:

    • macosx_11_0_{arch}
    • manylinux_2_5_{arch}
    • musllinux*
  • Architectures:

    • aarch64
    • arm64
    • ppc64le
    • s390x
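A first step toward sorting these could be splitting each platform tag into a family, a version, and an architecture. The sketch below is an assumption about how that might look and says nothing about the ordering the existing `wheel_sort_key()` uses; `parse_platform` is a hypothetical helper. The legacy aliases follow PEP 600 (manylinux1, manylinux2010, and manylinux2014 correspond to glibc 2.5, 2.12, and 2.17).

```python
import re

# Legacy manylinux names and the glibc versions they alias, per PEP 600.
LEGACY_MANYLINUX = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

PLATFORM_RGX = re.compile(r"(manylinux|musllinux|macosx)_(\d+)_(\d+)_(\w+)")


def parse_platform(tag):
    """Split a platform tag into (family, (major, minor), arch)."""
    m = PLATFORM_RGX.fullmatch(tag)
    if m:
        return (m[1], (int(m[2]), int(m[3])), m[4])
    for legacy, version in LEGACY_MANYLINUX.items():
        if tag.startswith(legacy + "_"):
            return ("manylinux", version, tag[len(legacy) + 1 :])
    # Unversioned platforms (e.g. "win_amd64", "any") are returned whole.
    return (tag, (0, 0), "")


parsed = [
    parse_platform(t)
    for t in ("manylinux_2_5_x86_64", "manylinux2014_aarch64", "macosx_11_0_arm64")
]
```

The resulting tuples compare naturally, so they could feed into a sort key once the desired ranking of families and architectures is decided.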

Distinguish extra-only project dependencies

Project dependencies that are only required for extras should be distinguished from non-extra dependencies somehow.

Ideas:

  • Make "Most Depended-On Projects" display two leaderboards, one taking extras into account and one not.

  • In addition to projects' "Reverse Dependencies" counts & lists, add a "Reverse Dependencies (No Extras)" (working title) count and list.
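Either idea needs a way to tell whether a dependency is tied to an extra. A crude sketch, assuming the input is a raw Requires-Dist string: look for `extra == "..."` in the environment marker. This is a simplification (a marker such as `extra == "a" or python_version < "3"` would be misclassified, and URL requirements containing a semicolon would be split wrongly), and `split_requirement` is a hypothetical helper; a real implementation would more likely evaluate the parsed marker.

```python
import re

# Matches `extra == "name"` or `extra == 'name'` inside an environment marker.
EXTRA_RGX = re.compile(r"""\bextra\s*==\s*["']([^"']+)["']""")


def split_requirement(req):
    """Return (specifier, extras) for a Requires-Dist string.

    `extras` is the set of extra names mentioned in the marker; an empty set
    means the dependency is treated as unconditional with respect to extras.
    """
    spec, _, marker = req.partition(";")
    return spec.strip(), set(EXTRA_RGX.findall(marker))


requirements = [
    "attrs>=20.1",
    "pytest>=6 ; extra == 'test'",
    'PySocks ; python_version >= "3.6" and extra == "socks"',
]
core = [r for r in requirements if not split_requirement(r)[1]]
extra_only = [r for r in requirements if split_requirement(r)[1]]
```

Storing the set of extras alongside each dependency row would let both leaderboards and both reverse-dependency lists be produced from the same data.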

What data do people want to see?

Wheels have a bunch of data in them, but Wheelodex isn't quite taking advantage of all of it — yet. What information do people want to be able to search, browse, or click on?

  • Browsing by keywords and/or licenses, similar to PyDigger?
  • Browsing by classifiers? (or does PyPI already cover that well enough?)
  • Browsing Platform fields?
  • Project-URL labels?
  • Statistics on wheel Generator fields?
  • Searching by namespace packages?
  • Listing which projects have the most reverse dependencies?
  • Other metadata?

I'll probably get around to implementing most of these eventually, but I'd like some feedback on what people want first.

Make Wheel's `md5` and `sha256` fields no longer nullable

It seems that PyPI's JSON API now always fills in the relevant digests, and there are no wheels in the database with missing digests. The nullability thus no longer serves a purpose.

Be sure to change all md5 and sha256 fields & arguments in the code to non-None.

Also remove the display of null digests as "[Unknown]" in the wheel_data template.

Try to speed up file searches with an index

Try to speed up file search queries with:

CREATE EXTENSION pg_trgm;
    -- ^^ Must be run inside the database by a superuser
CREATE INDEX files_path_idx ON files USING GIN (path gin_trgm_ops);

Do likewise for other columns queried with LIKE/ILIKE?

Note: The SQLAlchemy equivalent of the CREATE INDEX statement appears to be:

sa.Index(
    "files_path_idx",
    File.path,
    postgresql_using="gin",
    # The keyword is postgresql_ops (operator classes), keyed by column name.
    postgresql_ops={"path": "gin_trgm_ops"},
)

Monitor slow PostgreSQL queries

PostgreSQL is currently configured to log queries that take more than one second, but I'm not paying attention to the logs. The logs should be shipped to an ELK stack, Papertrail, Datadog, or whatever hip sysadmins are using these days.

Support browsing classifiers

  • In the METADATA displays, make each classifier into a hyperlink to a page listing all projects with that classifier.

  • Add a page listing all classifiers (linking to pages listing matching projects), alongside matching project counts.

Add footer to pages

Show the current Wheelodex version (and a link to Wheelodex's GitHub repo?) at the bottom of every page.

Support searching/filtering entry point groups

The "Browse Entry Point Groups" page should gain a search box for filtering down to just groups whose names match a given glob pattern. If only one group matches the given pattern, redirect to that group's page.
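The matching and redirect logic could look roughly like the sketch below, which filters an in-memory list with the standard library's `fnmatch`. The function name `filter_groups` and the sample group list are assumptions; in the actual site the pattern would presumably be translated into a database query instead.

```python
from fnmatch import fnmatchcase


def filter_groups(groups, pattern):
    """Return (matches, redirect) for a glob pattern.

    `redirect` is the sole matching group name when exactly one group
    matches, and None otherwise.
    """
    matches = sorted(g for g in groups if fnmatchcase(g, pattern))
    redirect = matches[0] if len(matches) == 1 else None
    return matches, redirect


all_groups = ["console_scripts", "gui_scripts", "pytest11"]
several = filter_groups(all_groups, "*_scripts")
single = filter_groups(all_groups, "pytest*")
```

`fnmatchcase` is used so that matching stays case-sensitive on every platform; whether group searches should be case-sensitive is a separate decision.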

Make things look good

Wheelodex is, at time of launch, severely lacking in the aesthetics department. Some sort of styling by someone with a vague grasp of web design and UX could make quite the difference.

Some specific areas in need of beautification include:

  • The list of wheels for a project can fill almost the entire page (example); some sort of collapsible display would be nice.

  • Header field names in METADATA and WHEEL files often get wrapped at hyphens; might want to keep that from happening.

  • File paths in RECORD files often get wrapped at slashes; might want to keep that from happening.

  • The search boxes on the main page should really be aligned. Not sure how to do that ...

  • Putting all the page content in a 500pt-wide box is probably a stupid thing to do

  • The "recently analyzed wheels" table gets stretched out too much by wheels with long names (e.g., just about anything for Mac OS X)

Make it easier to view reverse dependencies of projects without wheels

Because projects without wheels are excluded from search and the "Browse Projects" list, it can be difficult to reach their pages in order to view their reverse dependencies (which is the only thing their pages offer other than a link to PyPI). At the moment, the only ways to get to such a page are URL manipulation, clicking on a link from a project that depends on them, or clicking on a link in the reverse dependencies list of a project that they depend on.

Idea: Add a checkbox next to the project search input for also searching projects without wheels.

Add a command for purging "loose" projects

The command should delete any projects that don't have any versions and that aren't dependencies of anything.

The command should be run on a schedule, but it'd be fine if it only ran, say, once a week.

Support browsing & searching keywords

  • In the METADATA displays, make each keyword into a hyperlink to a page listing all projects with that keyword.

  • Add a "Browse Keywords" page listing keywords and their counts, with a search box for filtering by a given glob pattern.

  • See https://stackoverflow.com/q/18228994/744178 for how to keep a table of keywords and their counts up to date.

  • Should the database normalize all keywords to lower case?
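To get a feel for what lowercasing would change, here is a small in-memory sketch that counts how many projects use each keyword, with and without normalization. It assumes each project's Keywords field is a comma-separated string; `keyword_counts` is a hypothetical helper, and keeping counts current in the database (the subject of the Stack Overflow link) is a separate problem it does not address.

```python
from collections import Counter


def keyword_counts(keyword_fields, lowercase=True):
    """Count projects per keyword, given one Keywords string per project."""
    counts = Counter()
    for field in keyword_fields:
        words = {w.strip() for w in field.split(",")}
        if lowercase:
            words = {w.lower() for w in words}
        words.discard("")  # ignore empty fields and stray commas
        counts.update(words)  # a set, so each project counts once per keyword
    return counts


fields = ["HTTP, client", "http,REST", ""]
normalized = keyword_counts(fields)
as_written = keyword_counts(fields, lowercase=False)
```

With normalization, "HTTP" and "http" collapse into one entry; without it, they would appear as separate keywords on the browse page.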
