pypa / pip-audit

Audits Python environments, requirements files and dependency trees for known security vulnerabilities, and can automatically fix them

Home Page: https://pypi.org/project/pip-audit/

License: Apache License 2.0

security security-audit python pip supply-chain

pip-audit

pip-audit is a tool for scanning Python environments for packages with known vulnerabilities. It uses the Python Packaging Advisory Database (https://github.com/pypa/advisory-database) via the PyPI JSON API as a source of vulnerability reports.

This project is maintained in part by Trail of Bits with support from Google. This is not an official Google or Trail of Bits product.

Features

Support for auditing local environments and requirements-style files
Support for multiple vulnerability services (PyPI, OSV)
Support for emitting SBOMs in CycloneDX XML or JSON
Support for automatically fixing vulnerable dependencies (--fix)
Human and machine-readable output formats (columnar, Markdown, JSON)
Seamlessly reuses your existing local pip caches

Installation

pip-audit requires Python 3.8 or newer, and can be installed directly via pip:

python -m pip install pip-audit

Third-party packages

There are multiple third-party packages for pip-audit.

In particular, pip-audit can be installed via conda:

conda install -c conda-forge pip-audit

Third-party packages are not directly supported by this project. Please consult your package manager's documentation for more detailed installation guidance.

GitHub Actions

pip-audit has an official GitHub Action!

You can install it from the GitHub Marketplace, or add it to your CI manually:

jobs:
  pip-audit:
    steps:
      - uses: pypa/gh-action-pip-audit@<version>
        with:
          inputs: requirements.txt

See the action documentation for more details and usage examples.

`pre-commit` support

pip-audit has pre-commit support.

For example, using pip-audit via pre-commit to audit a requirements file:

  - repo: https://github.com/pypa/pip-audit
    rev: v2.7.3
    hooks:
      -   id: pip-audit
          args: ["-r", "requirements.txt"]

ci:
  # Leave pip-audit to only run locally and not in CI
  # pre-commit.ci does not allow network calls
  skip: [pip-audit]

Any pip-audit arguments documented below can be passed.

Usage

You can run pip-audit as a standalone program, or via python -m:

pip-audit --help
python -m pip_audit --help

usage: pip-audit [-h] [-V] [-l] [-r REQUIREMENT] [-f FORMAT] [-s SERVICE] [-d]
                 [-S] [--desc [{on,off,auto}]] [--aliases [{on,off,auto}]]
                 [--cache-dir CACHE_DIR] [--progress-spinner {on,off}]
                 [--timeout TIMEOUT] [--path PATH] [-v] [--fix]
                 [--require-hashes] [--index-url INDEX_URL]
                 [--extra-index-url URL] [--skip-editable] [--no-deps]
                 [-o FILE] [--ignore-vuln ID] [--disable-pip]
                 [project_path]

audit the Python environment for dependencies with known vulnerabilities

positional arguments:
  project_path          audit a local Python project at the given path
                        (default: None)

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -l, --local           show only results for dependencies in the local
                        environment (default: False)
  -r REQUIREMENT, --requirement REQUIREMENT
                        audit the given requirements file; this option can be
                        used multiple times (default: None)
  -f FORMAT, --format FORMAT
                        the format to emit audit results in (choices: columns,
                        json, cyclonedx-json, cyclonedx-xml, markdown)
                        (default: columns)
  -s SERVICE, --vulnerability-service SERVICE
                        the vulnerability service to audit dependencies
                        against (choices: osv, pypi) (default: pypi)
  -d, --dry-run         without `--fix`: collect all dependencies but do not
                        perform the auditing step; with `--fix`: perform the
                        auditing step but do not perform any fixes (default:
                        False)
  -S, --strict          fail the entire audit if dependency collection fails
                        on any dependency (default: False)
  --desc [{on,off,auto}]
                        include a description for each vulnerability; `auto`
                        defaults to `on` for the `json` format. This flag has
                        no effect on the `cyclonedx-json` or `cyclonedx-xml`
                        formats. (default: auto)
  --aliases [{on,off,auto}]
                        includes alias IDs for each vulnerability; `auto`
                        defaults to `on` for the `json` format. This flag has
                        no effect on the `cyclonedx-json` or `cyclonedx-xml`
                        formats. (default: auto)
  --cache-dir CACHE_DIR
                        the directory to use as an HTTP cache for PyPI; uses
                        the `pip` HTTP cache by default (default: None)
  --progress-spinner {on,off}
                        display a progress spinner (default: on)
  --timeout TIMEOUT     set the socket timeout (default: 15)
  --path PATH           restrict to the specified installation path for
                        auditing packages; this option can be used multiple
                        times (default: [])
  -v, --verbose         run with additional debug logging; supply multiple
                        times to increase verbosity (default: 0)
  --fix                 automatically upgrade dependencies with known
                        vulnerabilities (default: False)
  --require-hashes      require a hash to check each requirement against, for
                        repeatable audits; this option is implied when any
                        package in a requirements file has a `--hash` option.
                        (default: False)
  --index-url INDEX_URL
                        base URL of the Python Package Index; this should
                        point to a repository compliant with PEP 503 (the
                        simple repository API); this will be resolved by pip
                        if not specified (default: None)
  --extra-index-url URL
                        extra URLs of package indexes to use in addition to
                        `--index-url`; should follow the same rules as
                        `--index-url` (default: [])
  --skip-editable       don't audit packages that are marked as editable
                        (default: False)
  --no-deps             don't perform any dependency resolution; requires all
                        requirements are pinned to an exact version (default:
                        False)
  -o FILE, --output FILE
                        output results to the given file (default: stdout)
  --ignore-vuln ID      ignore a specific vulnerability by its vulnerability
                        ID; this option can be used multiple times (default:
                        [])
  --disable-pip         don't use `pip` for dependency resolution; this can
                        only be used with hashed requirements files or if the
                        `--no-deps` flag has been provided (default: False)

Environment variables

pip-audit allows users to configure some flags via environment variables instead:

Flag	Environment equivalent	Example
`--format`	`PIP_AUDIT_FORMAT`	`PIP_AUDIT_FORMAT=markdown`
`--vulnerability-service`	`PIP_AUDIT_VULNERABILITY_SERVICE`	`PIP_AUDIT_VULNERABILITY_SERVICE=osv`
`--desc`	`PIP_AUDIT_DESC`	`PIP_AUDIT_DESC=off`
`--progress-spinner`	`PIP_AUDIT_PROGRESS_SPINNER`	`PIP_AUDIT_PROGRESS_SPINNER=off`
`--output`	`PIP_AUDIT_OUTPUT`	`PIP_AUDIT_OUTPUT=/tmp/example`
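
As a rough sketch of how these variables might be set from a Python script (for example, in a CI wrapper), the helper below builds an environment mapping from flag names. The function name `pip_audit_env` and the `FLAG_TO_ENV` table are hypothetical; only the variable names themselves come from the table above.

```python
import os
from typing import Dict, Optional

# Flag name -> environment variable, copied from the table above.
FLAG_TO_ENV = {
    "format": "PIP_AUDIT_FORMAT",
    "vulnerability-service": "PIP_AUDIT_VULNERABILITY_SERVICE",
    "desc": "PIP_AUDIT_DESC",
    "progress-spinner": "PIP_AUDIT_PROGRESS_SPINNER",
    "output": "PIP_AUDIT_OUTPUT",
}


def pip_audit_env(
    settings: Dict[str, str], base: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with PIP_AUDIT_* set."""
    env = dict(os.environ if base is None else base)
    for flag, value in settings.items():
        if flag not in FLAG_TO_ENV:
            # Only the flags listed in the table have an environment equivalent.
            raise ValueError(f"no environment equivalent for --{flag}")
        env[FLAG_TO_ENV[flag]] = value
    return env
```

The result can be passed as the `env` argument of `subprocess.run(["pip-audit"], env=...)`.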

Exit codes

On completion, pip-audit will exit with a code indicating its status.

The current codes are:

0: No known vulnerabilities were detected.
1: One or more known vulnerabilities were found.

pip-audit's exit code cannot be suppressed. See Suppressing exit codes from pip-audit for supported alternatives.

Dry runs

pip-audit supports the --dry-run flag, which can be used to control whether an audit (or fix) step is actually performed.

On its own, pip-audit --dry-run skips the auditing step and prints the number of dependencies that would have been audited.
In fix mode, pip-audit --fix --dry-run performs the auditing step and prints out the fix behavior (i.e., which dependencies would be upgraded or skipped) that would have been performed.

Examples

Audit dependencies for the current Python environment:

$ pip-audit
No known vulnerabilities found

Audit dependencies for a given requirements file:

$ pip-audit -r ./requirements.txt
No known vulnerabilities found

Audit dependencies for a requirements file, excluding system packages:

$ pip-audit -r ./requirements.txt -l
No known vulnerabilities found

Audit dependencies for a local Python project:

$ pip-audit .
No known vulnerabilities found

pip-audit searches the provided path for various Python "project" files. At the moment, only pyproject.toml is supported.

Audit dependencies when there are vulnerabilities present:

$ pip-audit
Found 2 known vulnerabilities in 1 package
Name  Version ID             Fix Versions
----  ------- -------------- ------------
Flask 0.5     PYSEC-2019-179 1.0
Flask 0.5     PYSEC-2018-66  0.12.3

Audit dependencies including aliases:

$ pip-audit --aliases
Found 2 known vulnerabilities in 1 package
Name  Version ID             Fix Versions Aliases
----  ------- -------------- ------------ -------------------------------------
Flask 0.5     PYSEC-2019-179 1.0          CVE-2019-1010083, GHSA-5wv5-4vpf-pj6m
Flask 0.5     PYSEC-2018-66  0.12.3       CVE-2018-1000656, GHSA-562c-5r94-xh97

Audit dependencies including descriptions:

$ pip-audit --desc
Found 2 known vulnerabilities in 1 package
Name  Version ID             Fix Versions Description
----  ------- -------------- ------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Flask 0.5     PYSEC-2019-179 1.0          The Pallets Project Flask before 1.0 is affected by: unexpected memory usage. The impact is: denial of service. The attack vector is: crafted encoded JSON data. The fixed version is: 1. NOTE: this may overlap CVE-2018-1000656.
Flask 0.5     PYSEC-2018-66  0.12.3       The Pallets Project flask version Before 0.12.3 contains a CWE-20: Improper Input Validation vulnerability in flask that can result in Large amount of memory usage possibly leading to denial of service. This attack appear to be exploitable via Attacker provides JSON data in incorrect encoding. This vulnerability appears to have been fixed in 0.12.3. NOTE: this may overlap CVE-2019-1010083.

Audit dependencies in JSON format:

$ pip-audit -f json | python -m json.tool
Found 2 known vulnerabilities in 1 package
[
  {
    "name": "flask",
    "version": "0.5",
    "vulns": [
      {
        "id": "PYSEC-2019-179",
        "fix_versions": [
          "1.0"
        ],
        "aliases": [
          "CVE-2019-1010083",
          "GHSA-5wv5-4vpf-pj6m"
        ],
        "description": "The Pallets Project Flask before 1.0 is affected by: unexpected memory usage. The impact is: denial of service. The attack vector is: crafted encoded JSON data. The fixed version is: 1. NOTE: this may overlap CVE-2018-1000656."
      },
      {
        "id": "PYSEC-2018-66",
        "fix_versions": [
          "0.12.3"
        ],
        "aliases": [
          "CVE-2018-1000656",
          "GHSA-562c-5r94-xh97"
        ],
        "description": "The Pallets Project flask version Before 0.12.3 contains a CWE-20: Improper Input Validation vulnerability in flask that can result in Large amount of memory usage possibly leading to denial of service. This attack appear to be exploitable via Attacker provides JSON data in incorrect encoding. This vulnerability appears to have been fixed in 0.12.3. NOTE: this may overlap CVE-2019-1010083."
      }
    ]
  },
  {
    "name": "jinja2",
    "version": "3.0.2",
    "vulns": []
  },
  {
    "name": "pip",
    "version": "21.3.1",
    "vulns": []
  },
  {
    "name": "setuptools",
    "version": "57.4.0",
    "vulns": []
  },
  {
    "name": "werkzeug",
    "version": "2.0.2",
    "vulns": []
  },
  {
    "name": "markupsafe",
    "version": "2.0.1",
    "vulns": []
  }
]
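
The JSON report lends itself to post-processing. The following is a minimal sketch, assuming the report is a top-level list shaped like the example above; it also accepts an object that wraps the list in a `dependencies` key, in case a given pip-audit version emits that shape. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_dependencies(report_text: str) -> List[Dict[str, Any]]:
    """Parse a pip-audit JSON report into a list of dependency entries."""
    report = json.loads(report_text)
    # Assumption: the report is either a bare list (as in the example above)
    # or an object that wraps the list in a "dependencies" key.
    if isinstance(report, dict):
        return report.get("dependencies", [])
    return report


def vulnerable_packages(report_text: str) -> Dict[str, List[str]]:
    """Map each vulnerable package name to the IDs reported against it."""
    result: Dict[str, List[str]] = {}
    for dep in load_dependencies(report_text):
        ids = [vuln["id"] for vuln in dep.get("vulns", [])]
        if ids:  # packages with an empty "vulns" list are skipped
            result[dep["name"]] = ids
    return result
```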

Audit and attempt to automatically upgrade vulnerable dependencies:

$ pip-audit --fix
Found 2 known vulnerabilities in 1 package and fixed 2 vulnerabilities in 1 package
Name  Version ID             Fix Versions Applied Fix
----- ------- -------------- ------------ ----------------------------------------
flask 0.5     PYSEC-2019-179 1.0          Successfully upgraded flask (0.5 => 1.0)
flask 0.5     PYSEC-2018-66  0.12.3       Successfully upgraded flask (0.5 => 1.0)

Troubleshooting

Have you resolved a problem with pip-audit? Help us by contributing to this section!

`pip-audit` shows irrelevant vulnerability reports!

In a perfect world, vulnerability feeds would have an infinite signal-to-noise ratio: every vulnerability report would be (1) correct, and (2) applicable to every use of every dependency.

Unfortunately, neither of these is guaranteed: vulnerability feeds are not immune to extraneous or spam reports, and not all uses of a particular dependency map to all potential classes of vulnerabilities.

If your pip-audit runs produce vulnerability reports that aren't actionable for your particular application or use case, you can use the --ignore-vuln ID option to ignore specific vulnerability reports. --ignore-vuln supports aliases, so you can use a GHSA-xxx or CVE-xxx ID instead of a PYSEC-xxx ID if the report in question does not have a PYSEC ID.

For example, here is how you might ignore GHSA-w596-4wvx-j9j6, which is a common source of noisy vulnerability reports and false positives for users of pytest:

# Run the audit as normal, but exclude any reports that match GHSA-w596-4wvx-j9j6
$ pip-audit --ignore-vuln GHSA-w596-4wvx-j9j6

The --ignore-vuln ID option works with all other dependency resolution and auditing options, meaning that it should function correctly with requirements-style inputs, alternative vulnerability feeds, and so forth.

It can also be passed multiple times, to ignore multiple reports:

# Run the audit as normal, but exclude any reports that match these IDs
$ pip-audit --ignore-vuln CVE-XXX-YYYY --ignore-vuln CVE-ZZZ-AAAA
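
To illustrate what alias-aware ignoring means, here is a small sketch that filters vulnerability entries shaped like those in the JSON output (an `id` plus an optional `aliases` list). It is not pip-audit's actual implementation, and `filter_ignored` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List


def filter_ignored(
    vulns: List[Dict[str, Any]], ignored_ids: Iterable[str]
) -> List[Dict[str, Any]]:
    """Drop every report whose primary ID or any alias is in `ignored_ids`."""
    ignored = set(ignored_ids)
    kept = []
    for vuln in vulns:
        # A report is matched by its primary ID or by any of its aliases.
        known_ids = {vuln["id"], *vuln.get("aliases", [])}
        if known_ids.isdisjoint(ignored):
            kept.append(vuln)
    return kept
```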

`pip-audit` takes longer than I expect!

Depending on how you're using it, pip-audit may have to perform its own dependency resolution, which can take roughly as long as pip install does for a project. See the security model for an explanation.

You have two options for avoiding dependency resolution: audit a pre-installed environment, or ensure that your dependencies are already fully resolved.

If you know that you've already fully configured an environment equivalent to the one that pip-audit -r requirements.txt would audit, you can simply reuse it:

# Note the absence of any "input" arguments, indicating that the environment is used.
$ pip-audit

# Optionally filter out non-local packages, for virtual environments:
$ pip-audit --local

Alternatively, if your input is fully pinned (and optionally hashed), you can tell pip-audit to skip dependency resolution with either --no-deps (pinned without hashes) or --require-hashes (pinned including hashes).

The latter is equivalent to pip's hash-checking mode and is preferred, since it offers additional integrity.

# fails if any dependency is not fully pinned
$ pip-audit --no-deps -r requirements.txt

# fails if any dependency is not fully pinned *or* is missing hashes
$ pip-audit --require-hashes -r requirements.txt
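
Before relying on --no-deps, it can help to check that a requirements file really is fully pinned. The sketch below is a deliberately naive, regex-based check written for illustration; it is not how pip-audit or pip parse requirements, and `unpinned_lines` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Naive pattern: a project name (optionally with extras) followed by "==" and
# a version without wildcards. Real requirement parsing is far more involved.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;*,]+(?=\s|;|$)"
)


def unpinned_lines(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that are not pinned with `==`."""
    bad = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and option lines
            continue
        if not _PINNED.match(line):
            bad.append(line)
    return bad
```

An empty result suggests that the file is a plausible candidate for `pip-audit --no-deps -r requirements.txt`.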

`pip-audit` can't authenticate to my third-party index!

Authenticated third-party or private indices

pip-audit supports --index-url and --extra-index-url for configuring alternate or supplemental package indices, just like pip.

When unauthenticated, these indices should work as expected. However, when a third-party index requires authentication, pip-audit has a few additional restrictions on top of ordinary pip:

Interactive authentication is not supported. In other words: pip-audit will not prompt you for a username/password for the index.
pip's keyring authentication is supported, but in a limited fashion: pip-audit uses the subprocess keyring provider, since audits happen in isolated virtual environments. The subprocess provider in turn is subject to additional restrictions (such as a required username); pip's documentation explains these in depth.

In addition to the above, some third-party indices have required, hard-coded usernames. For example, for Google Artifact Registry, the hard-coded username is oauth2accesstoken. See #742 and pip#11971 for additional context.

Tips and Tricks

Running against a `pipenv` project

pipenv uses both a Pipfile and a Pipfile.lock file to track and freeze dependencies instead of a requirements.txt file. pip-audit cannot process Pipfile[.lock] files directly; however, they can be converted to a supported requirements.txt file that pip-audit can run against. pipenv has a built-in command to convert dependencies to a requirements.txt file (as of v2022.4.8):

$ pipenv run pip-audit -r <(pipenv requirements)

Suppressing exit codes from `pip-audit`

pip-audit intentionally does not support internally suppressing its own exit codes.

Users who need to suppress a failing pip-audit invocation can use one of the standard shell idioms for doing so:

pip-audit || true

or, to exit entirely:

pip-audit || exit 0

The exit code can also be captured and handled explicitly:

pip-audit
exitcode="${?}"
# do something with ${exitcode}

See Exit codes for a list of potential codes that need handling.
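
The same explicit handling can be done from Python. The sketch below runs an arbitrary command and interprets its exit status using the two codes documented under Exit codes. Treating any other code as an unexpected failure is an assumption, and `run_and_report` is a hypothetical name.

```python
import subprocess
import sys
from typing import List


def run_and_report(command: List[str]) -> int:
    """Run `command`, describe its exit status, and return the code unchanged."""
    completed = subprocess.run(command)
    if completed.returncode == 0:
        print("no known vulnerabilities were detected", file=sys.stderr)
    elif completed.returncode == 1:
        print("one or more known vulnerabilities were found", file=sys.stderr)
    else:
        # Assumption: any other code is treated as an unexpected failure.
        print(f"unexpected exit code {completed.returncode}", file=sys.stderr)
    return completed.returncode
```

A caller would invoke `run_and_report(["pip-audit", "-r", "requirements.txt"])` and then decide whether to pass the code on to `sys.exit`.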

Reporting only fixable vulnerabilities

In development workflows, you may want to ignore the vulnerabilities that haven't been remediated yet and only investigate them in your release process. pip-audit does not support ignoring unfixed vulnerabilities. However, you can export its output in JSON format and externally process it. For example, if you want to exit with a non-zero code only when the detected vulnerabilities have known fix versions, you can process the output using jq as:

test -z "$(pip-audit -r requirements.txt --format=json 2>/dev/null | jq '.dependencies[].vulns[].fix_versions[]')"

A simple (and inefficient) example of using this method would be:

test -z "$(pip-audit -r requirements.txt --format=json 2>/dev/null | jq '.dependencies[].vulns[].fix_versions[]')" || pip-audit -r requirements.txt

which runs pip-audit as usual and exits with a non-zero code only if there are fixed versions for the known vulnerabilities.
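
For readers who prefer not to depend on jq, the same filter can be sketched in Python. This assumes, like the jq expression above, that dependencies live under a top-level `dependencies` key; a bare list is accepted as a fallback. `fixable_vulnerabilities` is a hypothetical name.

```python
import json
from typing import List, Tuple


def fixable_vulnerabilities(report_text: str) -> List[Tuple[str, str, List[str]]]:
    """List (package, vulnerability ID, fix versions) for fixable reports."""
    report = json.loads(report_text)
    # Assumption: as in the jq filter above, dependencies live under a
    # top-level "dependencies" key; a bare list is accepted as a fallback.
    deps = report["dependencies"] if isinstance(report, dict) else report
    found = []
    for dep in deps:
        for vuln in dep.get("vulns", []):
            if vuln.get("fix_versions"):  # skip reports without a known fix
                found.append((dep["name"], vuln["id"], vuln["fix_versions"]))
    return found
```

A wrapper script could exit with a non-zero code only when this list is non-empty.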

Security Model

This section describes the security assumptions you can make, and those you must not make, when using pip-audit.

TL;DR: If you wouldn't pip install it, you should not pip audit it.

pip-audit is a tool for auditing Python environments for packages with known vulnerabilities. A "known vulnerability" is a publicly reported flaw in a package that, if uncorrected, might allow a malicious actor to perform unintended actions.

pip-audit can protect you against known vulnerabilities by telling you when you have them, and how you should upgrade them. For example, if you have somepackage==1.2.3 in your environment, pip-audit can tell you that it needs to be upgraded to 1.2.4.

You can assume that pip-audit will make a best effort to fully resolve all of your Python dependencies and either fully audit each or explicitly state which ones it has skipped, as well as why it has skipped them.

pip-audit is not a static code analyzer. It analyzes dependency trees, not code, and it cannot guarantee that arbitrary dependency resolutions occur statically. To understand why this is, refer to Dustin Ingram's excellent post on dependency resolution in Python.

As such: you must not assume that pip-audit will defend you against malicious packages. In particular, it is incorrect to treat pip-audit -r INPUT as a "more secure" variant of pip install -r INPUT. For all intents and purposes, pip-audit -r INPUT is functionally equivalent to pip install -r INPUT, with a small amount of non-security isolation to avoid conflicts with any of your local environments.

pip-audit is first and foremost an auditing tool for Python packages. You must not assume that pip-audit will detect or flag "transitive" vulnerabilities that might be exposed through Python packages, but are not actually part of the package itself. For example, pip-audit's vulnerability information sources are unlikely to include an advisory for a vulnerable shared library that a popular Python package might use, since the Python package's version is not strongly connected to the shared library's version.

Licensing

pip-audit is licensed under the Apache 2.0 License.

pip-audit reuses and modifies examples from resolvelib, which is licensed under the ISC license.

Contributing

See the contributing docs for details.

Code of Conduct

Everyone interacting with this project is expected to follow the PSF Code of Conduct.

pip-audit's Issues

Improve `pip-api`'s support for environment markers

As part of closing #1, we need to address di/pip-api#61, since real-world requirements.txts will likely contain environment markers.

Doing so should be pretty straightforward, since marker parsing is already present -- the dependencies just need to be correctly uniqued based on the satisfied markers.

Add a progress indicator to the CLI

When running pip-audit against a large environment or requirements file, querying the vulnerability service can take a decent amount of time. We should consider adding some sort of progress spinner or bar to the CLI so that the user knows that pip-audit hasn't hung or frozen.

Some considerations:

We should only render the progress indicator if we know for certain that we're on a TTY
The progress indicator's output must not interfere with the contents of stdout
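
Both considerations can be sketched in a few lines: check `isatty()` before drawing anything, and write to stderr so that stdout stays clean. This is only an illustration of the idea, not the indicator pip-audit ships; `show_progress` is a hypothetical name.

```python
import sys
from typing import TextIO


def show_progress(message: str, stream: TextIO = sys.stderr) -> bool:
    """Write a transient status line to `stream` only if it is a terminal."""
    if not stream.isatty():
        return False  # piped or redirected: stay silent
    # "\r" returns to the start of the line so the next update overwrites it.
    stream.write(f"\r{message}")
    stream.flush()
    return True
```

Because the default stream is stderr, output piped from stdout (for example into `python -m json.tool`) is unaffected.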

Handoff: Document architecture and data model

As part of the handoff, we should deliver documentation that explains our core design decisions:

Key architectural components (vuln service, dep collection, formatting interfaces) and how to use them (examples of implementing each)
An explanation of our data model (each dependency has multiple potential vulnerabilities, etc.)

`pip-audit -r <FILE>`

pip-audit -r <FILE> should parse the supplied requirements file instead of scanning the local environment.

Figure out whether we need to evaluate environment markers in dependency resolution

In this comment thread, there's a discussion about whether we need to filter out requirements in dependency resolution due to their markers.

As of now, we think that it's not necessary due to this change to pip-api to evaluate markers during requirements parsing. But once we get requirements parsing done end to end, we should come back and confirm that this is the case.

The WIP patch (not working as expected) I had to do the filtering looks like this. If we need to filter, we can use this as a starting point and debug it.

modified   pip_audit/dependency_source/resolvelib/pypi_wheel_provider.py
@@ -123,6 +123,12 @@ class PyPIProvider(AbstractProvider):

         bad_versions = {c.version for c in incompatibilities[identifier]}

+        requirements = [
+            req
+            for req in requirements
+            if req.marker is None or req.marker.evaluate({"extra": req.extras})
+        ]
+
         # Accumulate extras
         extras: Set[str] = set()
         for r in requirements:
modified   test/dependency_source/test_resolvelib.py
@@ -73,6 +73,31 @@ def test_resolvelib_extras():
     assert expected_deps == set(resolved_deps[req])


+def test_resolvelib_conditional_extras():
+    resolver = resolvelib.ResolveLibResolver()
+
+    # First check the dependencies without extras and as a basis for comparison
+    reqs = [
+        Requirement("requests>=2.8.1 ; python_version >= '2.7'"),
+        Requirement("requests[socks]>=2.8.1 ; python_version < '2.7'"),
+    ]
+    resolved_deps = dict(resolver.resolve_all(reqs))
+    print(resolved_deps)
+    assert len(resolved_deps) == 1
+    expected_deps = set(
+        [
+            Dependency("requests", Version("2.26.0")),
+            Dependency("charset-normalizer", Version("2.0.6")),
+            Dependency("idna", Version("3.2")),
+            Dependency("certifi", Version("2021.5.30")),
+            Dependency("urllib3", Version("1.26.7")),
+        ]
+    )
+    assert reqs[0] in resolved_deps
+    assert reqs[1] not in resolved_deps
+    assert expected_deps == set(resolved_deps[reqs[0]])
+
+
 def test_resolvelib_patched(monkeypatch):
     # In the following unit tests, we'll be mocking certain function calls to test corner cases in
     # the resolver. Before doing that, use the mocks to exercise the happy path to ensure that

Add support for source distributions in `PyPIProvider`

The current provider implementation only supports wheels, whereas we'll need to support source distributions. The relevant check is here.

@woodruffw had a useful comment here outlining what this will entail.

Provide example schema for PyPI vulnerability service

The current plan is to extend the release endpoint to provide vulnerability information for a given release. We should provide an example schema that has all the information that we need for pip-audit.

Make the CLI more responsive when auditing from requirements files

Currently, the CLI can block without any indicators for a decent period while collecting dependency information from requirements files.

We should figure out a way to make it responsive (or provide a working indicator) during that initial period.

Integration into `pip`

In the medium-to-long-term, it would be great to make pip-audit a subcommand of pip, i.e. pip audit.

This will involve coordination with pip as the upstream, and requires us to figure a few things out, including but not limited to:

If we merge into pip, should we drop our pip-api dependency and use pip's internal APIs directly?
Does it make sense to maintain a parallel "standalone" pip-audit that has functionality pip might not want (e.g. container scanning)?

Cache HTTP requests based on ETags

This tool should be configured to cache HTTP requests based on ETags. Once #11 is resolved, that API includes ETags and this may speed up repeated audits.
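
A minimal sketch of what ETag-based caching could look like follows. It relies on standard HTTP conditional requests (the client sends `If-None-Match` with the stored ETag, and the server answers `304 Not Modified` when the resource is unchanged). The `ETagCache` class and the `Fetcher` callable are hypothetical, and the actual HTTP call is left to the caller so that the example stays self-contained.

```python
from typing import Callable, Dict, Optional, Tuple

# A fetcher takes (url, etag_or_None) and returns (status, etag, body).
# It is expected to send `If-None-Match: <etag>` when an ETag is supplied.
Fetcher = Callable[[str, Optional[str]], Tuple[int, Optional[str], bytes]]


class ETagCache:
    """Tiny in-memory cache keyed by URL; hypothetical, for illustration."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str) -> bytes:
        cached = self._entries.get(url)
        status, etag, body = self._fetch(url, cached[0] if cached else None)
        if status == 304 and cached:
            return cached[1]  # server confirmed our copy is still current
        if etag is not None:
            self._entries[url] = (etag, body)  # remember the fresh response
        return body
```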

Support auditing sub-dependencies of individual projects

As a project maintainer, I'd like to be able to use pip-audit to audit the sub-dependencies of my project (likely by somehow evaluating my local source tree prior to building a distribution artifact).

E.g., I maintain https://github.com/pypa/sampleproject, which depends on peppercorn. A CVE is released for some version of peppercorn, and I need to adjust my sub-dependency specification to avoid installing affected versions.

Make the default value for `-r`/`--requirements` more clear

Currently it's not clear what the default value is from the help message:

  -r REQUIREMENTS, --requirement REQUIREMENTS
                        audit the given requirements file; this option can be used multiple times (default: [])

This should probably be something like ./requirements.txt.

Implement `--fix` for `PipSource`

For each of the available inputs, support adding a --fix flag which will automatically update the dependency specification or environment in question to exclude any found vulnerabilities.

For example:

if a vulnerability is found in a local environment, uninstall it and install a version with the fix
if a vulnerability is found in a requirements.txt file, update the version specification for the affected project to exclude the vulnerable version

This should be a meta-issue for determining the UX around this feature, and we should create sub-issues for each of the potential ways a fix can be applied.

#212
#215

Develop an `Auditor` class for both the CLI and public API

In order to isolate concerns and ensure that pip-audit can be used both as a CLI and as an API, we should create something like a pip_audit.Auditor class.

That Auditor class should be instantiated with the appropriate state:

A list of dependencies to audit
The vulnerability service to use
Any additional configurable options (e.g. whether to do a dry run, ignore system dependencies, etc.)

Testing: flag tests as "online"

Pick an SBOM format

Emitting a well-known SBOM format is a lower priority, but it's something we specified in the proposal.

Two good options are SPDX and CycloneDX; we should determine:

Which one(s) have good, maintained Python APIs
Which one(s) have community adoption

Design a generic adaptor for Python vulnerability services

Now that we've evaluated osv.dev (#2), we can start designing an abstract interface that individual vulnerability services can satisfy.

From discussion on Slack: we probably want something like this:

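from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Dict, List


class VulnerabilityService(ABC):
    @abstractmethod
    def query(self, spec: Dependency) -> List[VulnerabilityResult]:
        ...

    def query_all(self, specs: List[Dependency]) -> Dict[Dependency, List[VulnerabilityResult]]:
        # naive implementation that can be overridden if
        # a particular service supports bulk queries
        return {spec: self.query(spec) for spec in specs}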
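
To make the proposed interface concrete, here is a self-contained sketch of one possible implementation. It restates the interface so that it runs on its own. `Dependency` and `VulnerabilityResult` are hypothetical stand-ins for the real data model, and `StaticService` is a toy service backed by an in-memory table rather than a network API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, NamedTuple


class Dependency(NamedTuple):  # hypothetical stand-in for the real type
    name: str
    version: str


class VulnerabilityResult(NamedTuple):  # hypothetical stand-in
    id: str
    description: str


class VulnerabilityService(ABC):
    @abstractmethod
    def query(self, spec: Dependency) -> List[VulnerabilityResult]:
        """Return known vulnerabilities for a single dependency."""

    def query_all(
        self, specs: List[Dependency]
    ) -> Dict[Dependency, List[VulnerabilityResult]]:
        # Naive default: one query per dependency; override for bulk APIs.
        return {spec: self.query(spec) for spec in specs}


class StaticService(VulnerabilityService):
    """Toy service backed by an in-memory table instead of a network API."""

    def __init__(self, table: Dict[Dependency, List[VulnerabilityResult]]) -> None:
        self._table = table

    def query(self, spec: Dependency) -> List[VulnerabilityResult]:
        return self._table.get(spec, [])
```

A service with a bulk endpoint would override `query_all` instead of relying on the per-dependency default.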

Develop a dependency collection interface

Similarly to how we've developed interfaces for vulnerability services and reporting formats: we should have an interface that individual sources of dependencies should implement.

That, in turn, will make our Auditor (#18) implementation sufficiently generic and extensible.

Support other Python packaging formats

Outside of requirements.txt, there are a few other common Python packaging files:

A few different tools store requirements in pyproject.toml (#83)
poetry puts locked (i.e., frozen) dependencies in poetry.lock (#84)
pipenv uses Pipfile and Pipfile.lock (#85)

Each of these functionally boils down to a RequirementsSource, but with a bit of pre-processing to get them out of their dedicated formats.

Figure out how to represent the absence of a known good upgrade

There are (at least) two scenarios in which we'll want to report a vulnerability in a package, but won't be (naively) able to offer upgrade advice:

The package is on its latest release, and as such cannot be upgraded any further
The package can be upgraded to a newer release that fixes a vulnerability, but not without introducing another known vulnerability

In the first case, we should probably emit a special value indicating that we're incapable of offering upgrade advice.

The second case is trickier; we could:

Not offer upgrade advice at all
Offer upgrade advice if and only if the newer version is "less" vulnerable than the older
Offer upgrade advice unconditionally, but warn the user that they're just trading one vulnerability for another
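
One way to model the first scenario is an optional return value, where `None` is the special value meaning that no upgrade advice can be offered. The sketch below is only an illustration under simplifying assumptions: versions are purely numeric and dot-separated, and any listed fix version resolves its report. It does not address the second scenario, and `suggest_upgrade` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def _key(version: str) -> Tuple[int, ...]:
    # Assumption: versions are purely numeric and dot-separated ("0.12.3").
    return tuple(int(part) for part in version.split("."))


def suggest_upgrade(current: str, fixes_per_vuln: List[List[str]]) -> Optional[str]:
    """Return a version that covers every report, or None if there is none."""
    target: Optional[str] = None
    for fix_versions in fixes_per_vuln:
        newer = [v for v in fix_versions if _key(v) > _key(current)]
        if not newer:
            return None  # this report has no usable fix: no advice possible
        best = min(newer, key=_key)  # smallest upgrade that fixes this report
        if target is None or _key(best) > _key(target):
            target = best  # keep the highest of the per-report minimums
    return target
```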

Add a `RequirementSource` to our dependency source API.

#22 will implement DependencySource for pip list; we need a corresponding implementation for parsing and concretizing requirement files.

Test against 3.10

The CI should test pip-audit against Python 3.10.

Evaluate `pip-api`

pip-api will be our source of ground truth regarding the Python packaging environment; we'll use it both to determine what's currently installed and to parse any explicitly specified requirements files.

We need to determine:

Whether pip-api returns sufficient information for our use cases
Whether it has any known bugs related to either the environment's packages or parsing requirement files
Performance characteristics, compatibility with our expected development environment
- From local testing, listing installed distributions is reasonably fast (especially since we should only need to do it at most once per pip-audit invocation).
- pip-api supports Python 3.5 and newer, so we're fine on that front.

Handoff: Rewrite all URLs

There are various URLs in the README that will need to be rewritten when this repository is transferred.

Add CLI flag to toggle vulnerability description in columns format

We currently display vulnerability descriptions in the pip-audit output. However, these descriptions can be quite long and make the output hard to read.

In the case of columns (output designed to be read by humans), we should hide descriptions unless explicitly requested.
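A small sketch of how that could be wired up with `argparse`. The `--desc` flag name and the `render_row` helper are assumptions made for this example, not a settled interface.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pip-audit-sketch")
    # Hypothetical flag: descriptions stay hidden unless it is passed.
    parser.add_argument(
        "--desc",
        action="store_true",
        help="include vulnerability descriptions in the columns output",
    )
    return parser


def render_row(
    name: str, version: str, vuln_id: str, description: str, show_desc: bool
) -> str:
    """Build one row of the columns output."""
    cells: List[str] = [name, version, vuln_id]
    if show_desc:
        cells.append(description)
    return "  ".join(cells)
```

With this shape, `build_parser().parse_args([]).desc` is `False`, so the default output stays compact.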

Intermittent CI failures with `resolvelib`

These failures seem to occur intermittently on all unit tests that use resolvelib:

____________________ test_requirement_source_multiple_files ____________________

monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7fccbeeebb80>

    def test_requirement_source_multiple_files(monkeypatch):
        file1 = "requirements1.txt"
        file2 = "requirements2.txt"
        file3 = "requirements3.txt"
    
        source = requirement.RequirementSource(
            [Path(file1), Path(file2), Path(file3)],
            ResolveLibResolver(),
        )
    
        def read_file_mock(f):
            filename = f.name
            if filename == file1:
                return ["flask==2.0.1"]
            elif filename == file2:
                return ["requests==2.8.1"]
            else:
                assert filename == file3
                return ["pip-api==0.0.22\n", "packaging==21.0"]
    
        monkeypatch.setattr(_parse_requirements, "_read_file", read_file_mock)
    
>       specs = list(source.collect())

test/dependency_source/test_requirement.py:50: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pip_audit/dependency_source/requirement.py:33: in collect
    for _, deps in self.resolver.resolve_all(iter(req_values)):
pip_audit/dependency_source/interface.py:55: in resolve_all
    yield (req, self.resolve(req))
pip_audit/dependency_source/resolvelib/resolvelib.py:27: in resolve
    result = self.resolver.resolve([req])
env/lib/python3.8/site-packages/resolvelib/resolvers.py:482: in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
env/lib/python3.8/site-packages/resolvelib/resolvers.py:373: in resolve
    name = min(unsatisfied_names, key=self._get_preference)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <resolvelib.resolvers.Resolution object at 0x7fccbee6bf40>
name = 'flask'

    def _get_preference(self, name):
>       return self._p.get_preference(
            identifier=name,
            resolutions=self.state.mapping,
            candidates=IteratorMapping(
                self.state.criteria,
                operator.attrgetter("candidates"),
            ),
            information=IteratorMapping(
                self.state.criteria,
                operator.attrgetter("information"),
            ),
            backtrack_causes=self.state.backtrack_causes,
        )
E       TypeError: get_preference() got an unexpected keyword argument 'backtrack_causes'

env/lib/python3.8/site-packages/resolvelib/resolvers.py:178: TypeError
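Reading the traceback: the resolver passes a `backtrack_causes` keyword to the provider's `get_preference`, and the provider's signature does not accept it. The sketch below reproduces that mismatch with stand-in classes (none of this is resolvelib's or pip-audit's actual code) and shows a signature that tolerates the extra argument.

```python
class OldStyleProvider:
    """Stand-in provider whose signature predates `backtrack_causes`."""

    def get_preference(self, identifier, resolutions, candidates, information):
        return len(candidates)


class TolerantProvider:
    """Stand-in provider that also accepts `backtrack_causes`."""

    def get_preference(
        self, identifier, resolutions, candidates, information, backtrack_causes=None
    ):
        return len(candidates)


def call_like_resolver(provider):
    """Mimic the keyword-argument call shown in the traceback."""
    return provider.get_preference(
        identifier="flask",
        resolutions={},
        candidates={},
        information={},
        backtrack_causes=[],
    )
```

Calling `call_like_resolver(OldStyleProvider())` raises the same kind of `TypeError`; whether the failures are intermittent because different resolvelib versions get installed across CI runs is a guess that would need checking.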

Remove `virtualenv-api` dependency

In #48, I introduced a dependency to virtualenv-api to install source distributions in a virtual environment and figure out what their dependencies are.

We should remove this dependency and instead rely on builtin modules. The builtin venv module has less functionality (it's more about creating virtual environments than modifying or introspecting them) so we'll have to do some legwork to replace it. Specifically:

Creating the virtual environment with venv.
Shell out to Python and do something like:

source env/bin/activate
pip install -e sdist_dir/
pip install pip-api
python -c "import pip_api; print(pip_api.installed_distributions())"

Parse the shell output into package name and version pairs.
Filter out pip_api dependencies since those will turn up in the output also.
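A minimal sketch of the steps above using only builtin modules. It deliberately swaps in `pip list --format=json` for pip-api inside the environment, so there are no pip-api dependencies to filter out, and it calls the environment's interpreter directly instead of sourcing `activate`. All function names here are hypothetical.

```python
import json
import subprocess
import sys
import venv
from pathlib import Path
from typing import Dict, Set


def venv_python(env_dir: Path) -> Path:
    """Locate the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def parse_pip_list(output: str, ignore: Set[str]) -> Dict[str, str]:
    """Parse `pip list --format=json` output into {name: version}."""
    return {
        pkg["name"]: pkg["version"]
        for pkg in json.loads(output)
        if pkg["name"].lower() not in ignore
    }


def sdist_dependencies(sdist_dir: Path, env_dir: Path) -> Dict[str, str]:
    """Install an unpacked sdist into a fresh venv and list what got installed."""
    venv.EnvBuilder(with_pip=True).create(env_dir)
    python = str(venv_python(env_dir))
    subprocess.run([python, "-m", "pip", "install", str(sdist_dir)], check=True)
    listing = subprocess.run(
        [python, "-m", "pip", "list", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    # Drop the packaging tools that the venv may ship with by default.
    return parse_pip_list(listing.stdout, ignore={"pip", "setuptools", "wheel"})
```

Note that the result still includes the sdist's own package, which the caller would need to exclude.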

Support a "dry-run" mode

In "dry-run" mode, pip-audit should do everything it would normally do up to hitting the vulnerability service.

Emit results in a SBOM format

Blocked on #3.

Evaluate `resolvelib`

As a parallel or perhaps complementary track to #29: resolvelib provides an abstract interface for resolving dependencies that are produced by a client-supplied "provider" (presumably something like us querying the PyPI APIs).

This might be too generic of a library for our use case, but it might be useful if we go further into reworking pip-tools to supply a Python API.

PyPI Link: https://pypi.org/project/resolvelib/

Support for custom indices

For the time being, our MVP is scoped to just support for PyPI. However, it's worth considering what we'd require in order to support custom package indices (whether PyPI mirrors or entirely separate private indexes).

Some potential points of issue:

Our #23 adaptor will need to be sufficiently generic, adapting either PyPI or another index under the hood as configured.
We might need custom index support from pip-api? This is unclear, since our only use of pip-api in these contexts would be for requirements parsing, and the requirements file itself shouldn't make any references to the index.

Add type hints to `pip-api` and update our dependency

Title says it all: this will improve our confidence in our use of pip-api's APIs and will allow us to enable more MyPy features in our own code.

Evaluate osv.dev

Per @di: The final deliverable version of pip-audit will not use osv.dev, but instead should use a (hitherto unimplemented) REST API provided by PyPI.

Since we'll need to consume that API, we should evaluate osv.dev and determine what we'd like to be different.

Schematize the PyPI vulnerability API

Related to #72: the PyPI vulnerability API, which is part of Warehouse's codebase, is not currently schematized.

Schematizing it would increase our confidence in pip-audit's reliability, and would allow us to write more type/construction friendly interactions with the API, e.g. with Pydantic.

Consider parallelizing our `sdist` dependency collection

Right now, our dependency collection process (via resolvelib) looks like this:

Start at top, iteratively walk the dependency list
If the dependency has a wheel, grab its dependencies from the wheel metadata for the next iteration
If the dependency only has an sdist, install that sdist in a temporary venv to compute its dependencies for the next iteration

The third step is slow, since at each iterative step we might have N source distributions that each get their own virtual environment (and virtual environment setup is pretty slow). We could probably speed it up a decent amount by using a thread pool (with a reasonable cap) to create multiple environments and collect them as they complete.

Downsides: thread pools mean more complex error and interrupt (i.e. ^C) handling. We should figure out the tradeoffs for this before committing to this direction.

Related to #56, since doing this will (hopefully) make the CLI more responsive.
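A sketch of the thread pool idea using `concurrent.futures`. The `resolve_one` callable stands in for "install this sdist in a temporary venv and return its dependencies"; it and `collect_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def collect_parallel(
    sdists: Iterable[str],
    resolve_one: Callable[[str], List[str]],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Resolve each sdist on a capped thread pool, gathering results as they finish."""
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(resolve_one, sdist): sdist for sdist in sdists}
        for future in as_completed(futures):
            # .result() re-raises any exception raised inside the worker.
            results[futures[future]] = future.result()
    return results
```

This also shows the error-handling cost: the first failing worker raises out of the loop, but leaving the `with` block still waits for the workers that are already running.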

Use OSV 1.0 format

Hi!

I noticed that you're using the old OSV format fields in https://github.com/trailofbits/pip-audit/blob/b3fd732bf5f70311dcd9a32e15e26889597e3060/pip_audit/service/osv.py#L42.

This field is deprecated and will be removed soon, along with the top level "package" field. The new field which replaces this is "affected". Could you please use that instead? The 1.0 spec is described at https://ossf.github.io/osv-schema/.
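For reference, a rough sketch of reading fix versions out of the `affected` field, assuming the 1.0 layout of `affected[].package` plus `affected[].ranges[].events[]`. The helper name and the sample record are made up for illustration.

```python
from typing import Any, Dict, List


def fixed_versions(
    vuln: Dict[str, Any], package: str, ecosystem: str = "PyPI"
) -> List[str]:
    """Collect every `fixed` event for one package from an OSV record."""
    fixes: List[str] = []
    for affected in vuln.get("affected", []):
        pkg = affected.get("package", {})
        if pkg.get("name") != package or pkg.get("ecosystem") != ecosystem:
            continue
        for version_range in affected.get("ranges", []):
            if version_range.get("type") != "ECOSYSTEM":
                # Skip e.g. GIT ranges, whose events are commits, not versions.
                continue
            for event in version_range.get("events", []):
                if "fixed" in event:
                    fixes.append(event["fixed"])
    return fixes


# Made-up record for illustration.
record = {
    "id": "EXAMPLE-0000",
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "example-package"},
            "ranges": [
                {
                    "type": "ECOSYSTEM",
                    "events": [{"introduced": "0"}, {"fixed": "1.2.3"}],
                }
            ],
        }
    ],
}
print(fixed_versions(record, "example-package"))
```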

Evaluate `pip-tools`

We're probably going to need pip-tools to fully resolve any requirements files that we're passed, since we'll need to both resolve the entire dependency tree and concretize each dependency down to a specific version.

Link: https://pypi.org/project/pip-tools/

Link: https://github.com/jazzband/pip-tools/

Add support for extras in `PyPIProvider`

At the moment, the provider that we got from the resolvelib example doesn't support extras in the package requirement. We should make this work.

Once we add support, we should ensure that this unit test works end-to-end.

Emit a warning when the underlying `pip` is sufficiently old

pip-api does a valiant job of wrapping pip for us, but certain functionality fundamentally degrades beyond a few versions:

Earlier than 10.0.0b0: pip list -v --format=json doesn't include location information, which hampers our ability to filter "system" dependencies (#7)

We should still try our hardest with these older pips, but also emit a warning similar to the one emitted by pip itself when it's out of date.
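A minimal sketch of such a check. Since 10.0.0b0 is the first pre-release of the 10.x series, comparing the major version is enough for this particular threshold; the function names and the warning text are placeholders.

```python
import re
import warnings

# Oldest pip major version that reports location information.
MINIMUM_PIP_MAJOR = 10


def pip_major(version: str) -> int:
    """Extract the leading major version number from a pip version string."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized pip version: {version!r}")
    return int(match.group(1))


def warn_if_pip_is_old(version: str) -> bool:
    """Warn (and return True) when pip predates 10.0.0b0."""
    if pip_major(version) < MINIMUM_PIP_MAJOR:
        warnings.warn(
            f"pip {version} is very old; some pip-audit functionality "
            "may be degraded. Consider upgrading pip."
        )
        return True
    return False
```

If more thresholds get added later, a proper version comparison (for example via the `packaging` library) would be a better fit than a major-version check.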

Testing: flag tests as "online"

Our test suite currently includes "online" tests, i.e. tests that require network connectivity and access to services for their functionality.

We should mark these tests as "online," so that a user can disable them for local-only testing.

Pytest supports CLI extensions for this purpose, like in this SO post.
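A sketch of what that could look like in a `conftest.py`, following the usual pytest pattern of a custom option plus a marker. The `--skip-online` option name and the `online` marker name are assumptions for this example.

```python
import pytest


def pytest_addoption(parser):
    # Hypothetical option name for this sketch.
    parser.addoption(
        "--skip-online",
        action="store_true",
        default=False,
        help="skip tests that require network access",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "online: test requires network connectivity")


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--skip-online"):
        return
    skip_online = pytest.mark.skip(reason="--skip-online was given")
    for item in items:
        if "online" in item.keywords:
            item.add_marker(skip_online)
```

Tests would then be decorated with `@pytest.mark.online`, and `pytest --skip-online` would run everything else.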

Use JSON schemas for our vulnerability services, where possible

OSV and PyPI both provide JSON APIs, both of which are (probably?) schematized.

We should embed their schemas and generate models (maybe pydantic ones) from them, to give ourselves more confidence about the shape of the responses we expect and to better conform to correctness by construction.

Handoff: Change the `PYPI_TOKEN` secret

Once pip-audit is handed off, the PYPI_TOKEN secret should be replaced with a token belonging to someone who isn't me or ToB.

Support filtering "system" dependencies

pip_api.installed_distributions() returns every visible distribution, which can potentially include distributions provided by the system/system package manager.

Issuing messages for these might not be desirable default behavior, for a few reasons:

System-installed dependencies might be required by the system, and thus cannot be safely upgraded
System-installed dependencies might be installed via a mechanism other than pip, so issued guidance might not always be applicable.
System-installed dependencies might be patched by distribution maintainers to remove known vulnerabilities, without updating the version number.

As such, it probably makes sense for the CLI to have option(s) that allow the user to enable (or disable) filtering of dependencies that look like they're supplied by the system. This, in turn, requires us to come up with a reliable way of determining whether a given dependency is a "system" one.

Update default provider to be PyPI

Currently it's OSV:

  -s {osv,pypi}, --vulnerability-service {osv,pypi}
                        the vulnerability service to audit dependencies against (default: osv)

Support PyPI's vulnerability service via our service adaptor

Following #9 and #10: once it's available, we should support PyPI's vulnerability service via our adaptor.

Evaluate tools for introspecting container images

The syft tool supports generating a SBOM for a container image and has support for Python packages. We should check to see if we can leverage this to support container images in pip-audit.

cc: @di

Output formats for `pip-audit`

Apart from supporting a standard SBOM format (#3), pip-audit should have at least two output formats for the MVP:

A human-readable format for display on a terminal or logs
A JSON format for interpretation by other programs
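To make the two formats concrete, here is a rough sketch that renders the same results both ways. The result shape (package spec mapped to a list of vulnerability IDs), the function names and the JSON keys are all hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical result shape: package spec -> vulnerability IDs.
Results = Dict[str, List[str]]


def format_columns(results: Results) -> str:
    """Render results as aligned columns for a terminal or log."""
    rows = [("Package", "Vulnerabilities")]
    rows += [(name, ", ".join(ids)) for name, ids in sorted(results.items())]
    width = max(len(name) for name, _ in rows)
    return "\n".join(f"{name.ljust(width)}  {ids}" for name, ids in rows)


def format_json(results: Results) -> str:
    """Render results as JSON for consumption by other programs."""
    return json.dumps(
        [
            {"package": name, "vulnerabilities": ids}
            for name, ids in sorted(results.items())
        ]
    )
```

Keeping both behind the same input type means a later SBOM formatter could be added as a third function without changing the callers.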