
pubgrub's Introduction

PubGrub version solving algorithm

Version solving consists of efficiently finding a set of packages and versions that satisfy all the constraints of a given project's dependencies. In addition, when that is not possible, PubGrub tries to provide a very human-readable and clear explanation as to why it failed. The introductory blog post about PubGrub presents one such example of a failure explanation:

Because dropdown >=2.0.0 depends on icons >=2.0.0 and
  root depends on icons <2.0.0, dropdown >=2.0.0 is forbidden.

And because menu >=1.1.0 depends on dropdown >=2.0.0,
  menu >=1.1.0 is forbidden.

And because menu <1.1.0 depends on dropdown >=1.0.0 <2.0.0
  which depends on intl <4.0.0, every version of menu
  requires intl <4.0.0.

So, because root depends on both menu >=1.0.0 and intl >=5.0.0,
  version solving failed.

This pubgrub crate provides a Rust implementation of PubGrub. It is generic and works for any type of dependency system, as long as packages (P) and versions (V) implement the provided Package and Version traits.
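For a concrete feel, here is a minimal end-to-end sketch, assuming the v0.2-era API (OfflineDependencyProvider, NumberVersion, Range::between, resolve) that also appears in the issues below; the package names echo the failure example above:

use pubgrub::range::Range;
use pubgrub::solver::{resolve, OfflineDependencyProvider};
use pubgrub::version::NumberVersion;

fn main() {
    let mut dependency_provider = OfflineDependencyProvider::<&str, NumberVersion>::new();
    // root 1 depends on menu >=1 <2 and icons >=1 <2.
    dependency_provider.add_dependencies(
        "root",
        1,
        vec![("menu", Range::between(1, 2)), ("icons", Range::between(1, 2))],
    );
    // menu 1 depends on icons >=1 <2; icons 1 has no dependencies.
    dependency_provider.add_dependencies("menu", 1, vec![("icons", Range::between(1, 2))]);
    dependency_provider.add_dependencies("icons", 1, vec![]);

    // On success, the solution maps each selected package to its chosen version.
    match resolve(&dependency_provider, "root", 1) {
        Ok(solution) => println!("{:?}", solution),
        Err(err) => eprintln!("{:?}", err),
    }
}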

Using the pubgrub crate

A guide with both high-level explanations and in-depth algorithm details is available online. The API documentation is available on docs.rs. A version of the API docs for the unreleased functionality on the dev branch is also available for convenience.

Contributing

Discussion and development happens here on GitHub and on our Zulip stream. Please join in!

Remember to always be considerate of others, who may have different native languages, cultures, and experiences. We want everyone to feel welcome; let us know with a private message on Zulip if you don't feel that way.

PubGrub

PubGrub is a version solving algorithm, written in 2018 by Natalie Weizenbaum for the Dart package manager. It is designed to be very fast and to explain errors more clearly than the alternatives. An introductory blog post was published on Medium by its author.

The detailed explanation of the algorithm is provided on GitHub and complemented by the "Internals" section of our guide. The foundation of the algorithm is ASP (Answer Set Programming), as presented in the book "Answer Set Solving in Practice" by Martin Gebser, Roland Kaminski, Benjamin Kaufmann and Torsten Schaub.

pubgrub's People

Contributors

0xatticus, aleksator, baszalmstra, burntsushi, charliermarsh, dependabot[bot], eh2406, kianmeng, konstin, lpil, mpizenberg, njsmith, svanderbleek, tranzystorekk, xfbs, zanieb


pubgrub's Issues

make Solver::run not a trait implementation

This does bring up the related issue that Solver::run is a provided trait method, so theoretically someone could write:

impl<P: Package, V: Version + Hash> Solver<P, V> for BadSolver<P, V> {
    fn list_available_versions(&mut self, package: &P) -> Result<Vec<V>, Box<dyn Error>> {
        unimplemented!()
    }

    fn get_dependencies(
        &mut self,
        package: &P,
        version: &V,
    ) -> Result<Option<Map<P, Range<V>>>, Box<dyn Error>> {
        unimplemented!()
    }

    fn run(
        &mut self,
        package: P,
        version: impl Into<V>
    ) -> Result<Map<P, V>, PubGrubError<P, V>> {
        Ok(Map::new())
    }
}

I don't know why someone would, but our API lets them. Maybe run could be a free function that takes an impl Solver as an argument, or a method on anything that impls Solver.

Originally posted by @Eh2406 in #10 (comment)
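For illustration, a minimal sketch of the free-function alternative, reusing the Solver, Map, and PubGrubError names from the snippet above (hypothetical shape, not a concrete proposal):

// Hypothetical: the default algorithm moves out of the trait into a free
// function, so implementors of Solver can no longer override it.
fn run<P: Package, V: Version, S: Solver<P, V>>(
    solver: &mut S,
    package: P,
    version: impl Into<V>,
) -> Result<Map<P, V>, PubGrubError<P, V>> {
    // ... the solving loop previously in Solver::run goes here ...
    unimplemented!()
}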

In memory, no need to keep derivations when there is a decision

I realized this while writing the guide. Every usage of Memory we have either does something if there is a decision, or something else if there isn't one yet. The only method returning a mix of decisions and derivations is assignment_intersection, but in fact, if there is a decision, we don't need the derivation terms, since the intersection is directly the term of the decision.

So PackageAssignment should in fact be an enum instead of a struct, as sketched below. That would imply some changes in the API, and especially some algorithm changes regarding remove_decision, which should not be used anymore, but I believe this is doable.
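A minimal sketch of the proposed shape (illustrative names, not the crate's actual types):

// Once a decision exists, its term alone is the assignment intersection,
// so the accumulated derivations no longer need to be consulted.
enum PackageAssignment<Term> {
    Decision(Term),
    Derivations(Vec<Term>),
}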

Clickable documentation links

Use rustdoc syntax to produce clickable links in the documentation where possible.

For example, [Version] instead of `Version` if the linked item is in the same module.
If the link needs to reference an item from a different module, a path has to be provided:
[Version](crate::version::Version).

Open question:
Is it possible to automatically test that created links are valid?
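Regarding the open question: rustdoc ships a lint for broken intra-doc links (namespaced as rustdoc::broken_intra_doc_links on recent toolchains), so one option is to deny it in the crate root and let cargo doc fail in CI:

// In src/lib.rs: turn unresolved intra-doc links into hard errors,
// so `cargo doc` fails when a [link] does not resolve.
#![deny(rustdoc::broken_intra_doc_links)]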

feat: let the dependency provider choose a package

We already give the DependencyProvider the responsibility of ordering versions by preference. This provides a great deal of flexibility, since the provider can change that order to implement multiple strategies, such as newest versions first, oldest versions first, locked versions first, versions stored on the file system first, etc.

Recently, we have been discussing how to improve the performance of the solver by smartly picking the next package to evaluate (#32). It has been a very good improvement, but I can't help but wonder if that should be our choice. I don't think so. Just like with versions, giving that choice back to the DependencyProvider would bring more flexibility, I think. For correctness, we still need to pre-select the potential packages, but once that's done, we could let the DependencyProvider decide which one to choose from a provided iterator of potential packages (see the sketch below). Having the precomputed term intersection provided by the solver as in #37 would even result in cleaner code, since the partial_solution would no longer need to call list_available_versions; the DependencyProvider can do that on its own however it wants.
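To illustrate the kind of strategy a DependencyProvider could implement with such a hook, here is a toy sketch; the function shape and the (package, candidate-count) pairs are hypothetical, not the crate's API:

// Illustrative "fail fast" heuristic: pick the pre-selected package with the
// fewest candidate versions remaining, so conflicts surface early.
fn choose_package<P>(potential: impl Iterator<Item = (P, usize)>) -> Option<P> {
    potential.min_by_key(|(_, count)| *count).map(|(p, _)| p)
}

fn main() {
    let candidates = vec![("menu", 12), ("icons", 2), ("intl", 5)];
    assert_eq!(choose_package(candidates.into_iter()), Some("icons"));
}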

Non deterministic error reporting

I suppose that because iterating over a HashMap is not deterministic, the report produced when there is no solution is also not deterministic. You can run example/doc_interface_error.rs multiple times to experience it.

How strictly do we want reproducibility?

I am working on a whole-system proptest, like the one used in Cargo. One of the simplest properties I thought of is:

proptest! {
    #[test]
    fn prop_same_on_repeated_runs(
        (mut solver, cases) in registry_strategy()
    )  {
        for (name, ver) in cases {
            match (solver.run(name, ver), solver.run(name, ver)) {
                (Ok(l), Ok(r)) => assert_eq!(l, r),
                (Err(_), Err(_)) => (), // todo: how to compare Err?
                _ => panic!("not the same result")
            }
        }
    }
}

And it is failing. While I work to create a minimal example, I just wanted to be sure this is a test that we want.
Are there ways in which we are OK with there being differences between runs on the same input?

Project status?

This is an awesome project, but it's a little worrying that there hasn't been a commit since 2021. What are the plans, if any, for continued development?

Async API?

I've been playing with making a WebAssembly solver for the Elm ecosystem, with the aim of it being easily usable for tooling written in JavaScript. The wasm API I've settled on for the time being looks like this:

let wasm = require("elm-solve-deps-wasm");
let solution = wasm.solve_deps(
  rootPackage,
  fetchDependencies, // function
  listVersions // function
);

It works fine, but it requires that the fetchDependencies and listVersions functions passed as arguments be synchronous. However, in the NodeJS ecosystem, reading files and making HTTP requests is mainly done async. So this API forces the user to use sync versions of file IO, which is fine, and sync versions of HTTP requests, which is a bit more annoying.

To be able to write those two functions async, we also need an async version of the DependencyProvider trait, where getDependencies and choosePackageVersion would return futures instead of plain values. I'm not sure it's easily done, because from some of my readings async and traits are quite painful together, but I wanted to raise this use case.

Is the bump method on the Version trait actually necessary?

The ability to know the next version, expressed by the bump(&self) -> Self method on the Version trait, does not seem to be vital. In the code, it is only used twice. The first use is in the definition of Range::exact(Version) -> Range. The second is in the implementation of Display for an exact range, in order to produce more readable reports. There, it is used to check if we have a range like (v1, Some(v2)) where v2 == v1.bump(). In such a case, it is displayed as v1 instead of v1 <= v < v2.

In addition, requiring a bump method seems like something that would prevent the use of pre-releases, or of versions based on dates, floating point numbers, or other otherwise-valid version schemes.

If we were to remove the bump method, how could we represent ranges containing a single version that still play nicely with set operations like intersection and union? One direction is sketched below.
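A sketch with std's Bound (illustrative only, not the crate's API): with explicit inclusive/exclusive bounds, a singleton set needs no bump().

use std::ops::Bound;

// A segment with explicit bounds can represent { v } directly.
type Segment<V> = (Bound<V>, Bound<V>);

fn exact<V: Clone>(v: V) -> Segment<V> {
    (Bound::Included(v.clone()), Bound::Included(v))
}

fn main() {
    let only_one = exact(1u32);
    assert_eq!(only_one, (Bound::Included(1), Bound::Included(1)));
}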

don't panic: on bad dependencies

Minimizing from the panic found in #34 (comment), I get this small test case:

#[test]
fn should_always_find_a_satisfier() {
    let mut dependency_provider = OfflineDependencyProvider::<_, NumberVersion>::new();
    dependency_provider.add_dependencies("a", 0, vec![("b", Range::none())]);

    // Run the algorithm.
    let _ = resolve(&dependency_provider, "a", 0);
}

While this is a silly input, we should not panic on it.

Move incompatibility identifiers outside of the incompatibility struct?

I usually don't like it when things know how they are stored. Having an id inside the incompatibility type is one such case. At the time of introducing the incompatibility store, I actually tried three possible designs for the recursive incompatibility type: (1) id inside the struct, (2) id outside the struct, (3) using Rc or Box. (3) was eliminated when I ported the "no solution" reporting code, because having ids for that is very useful. But (1) and (2) are still viable options.

Current development has been done on top of (1), but I personally think (2) might be better, even though slightly more verbose. You can see the concrete difference between the two by comparing the incompat-indices-in and incompat-indices-out branches.

Conflict resolution improvements

Writing the detailed explanation on conflict resolution helped me better understand how it works. In particular I think we can improve the conflict resolution code in many ways, including:

  1. Avoiding the computation of self.partial_solution.relation(&root_cause) after conflict resolution. That also means avoiding one impossible Failure case, so that's great.
  2. Simplify prior cause computation.
  3. Doing only one pass instead of two by keeping the decision level of the last assignment that made a term of the incompatibility satisfied.

(1) I now understand that since the decision levels of the satisfier and the previous satisfier are different, backtracking to the previous decision level implies that only one term of the incompatibility will be unsatisfied, so the incompatibility is guaranteed to be almost satisfied. And we know that the unsatisfied package is that of the satisfier. So there is no need to redo that self.partial_solution.relation(&root_cause) call after conflict resolution; we can instead return that package term of the incompatibility in the output of conflict resolution.

(2) Currently, when computing the prior cause, we explicitly pass it the term of the derivation. But that term is guaranteed to be the negation of the term corresponding to that package in the cause of that derivation. So we can simply remove it from the prior_cause method arguments. Then, we can rewrite prior cause as:

diff --git a/src/internal/incompatibility.rs b/src/internal/incompatibility.rs
index 64c7a20..31a113a 100644
--- a/src/internal/incompatibility.rs
+++ b/src/internal/incompatibility.rs
@@ -207,15 +207,14 @@ impl<P: Package, V: Version> Incompatibility<P, V> {
         satisfier: &Term<V>,
     ) -> Self {
         let kind = Kind::DerivedFrom(incompat.id, satisfier_cause.id);
-        let mut t1 = incompat.package_terms.clone();
-        let mut t2 = satisfier_cause.package_terms.clone();
-        t1.remove(package);
-        t2.remove(package);
-        let mut prior_cause_terms = Self::intersection(&t1, &t2);
-        let term = incompat.package_terms.get(package).unwrap();
-        let p_term = term.union(&satisfier.negate());
-        if p_term != Term::any() {
-            prior_cause_terms.insert(package.clone(), p_term);
+        let mut incompat1 = incompat.package_terms.clone();
+        let mut incompat2 = satisfier_cause.package_terms.clone();
+        let t1 = incompat1.remove(package).unwrap();
+        let t2 = incompat2.remove(package).unwrap();
+        let mut prior_cause_terms = Self::intersection(&incompat1, &incompat2);
+        let term = t1.union(&t2);
+        if term != Term::any() {
+            prior_cause_terms.insert(package.clone(), term);
         }
         Self {
             id,

(3) When making progress through the history of assignments, we keep track of the terms of the incompatibility that get satisfied. This means that we could record the decision levels at which those events occur, and, when finally reaching the satisfier, compare its decision level with that of the last package term to get satisfied. Technically, this would not "exactly" be the previous satisfier, since the actual previous satisfier could correspond to the same package as the satisfier in some cases. But the guarantee of the incompatibility being almost satisfied would still hold! So I'd be super interested to see whether the algorithm would still work, or whether there is some special property of the previous satisfier that I haven't fully understood yet.

Allocating more than we need to in `DerivationTree::Derived`

In v0.2, DerivationTree::Derived uses a Box to point to its children, which makes it strictly a tree. However, any time there is a shared_id, the tree stores two full copies of the shared portion in memory. It is more accurately a DAG. We should consider storing each cause in an Rc (or maybe an Arc), so that nodes that have the same shared_id are stored in the same place in memory.
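An illustrative sketch of the change (the field names are hypothetical):

use std::rc::Rc;

// Sharing causes through Rc turns the tree into a DAG: a subtree reachable
// from two parents via the same shared_id is stored only once.
enum DerivationTree<E> {
    External(E),
    Derived {
        shared_id: Option<usize>,
        cause1: Rc<DerivationTree<E>>, // Box<...> in v0.2
        cause2: Rc<DerivationTree<E>>,
    },
}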

support for prereleases

What is the plan for handling prereleases? The example version::SemanticVersion does not currently support pre-releases at all.

It seems that the major challenge with prereleases -- the one worth taking into account because it would affect the design in general -- is that an end user will want to be able to influence the solve to ignore pre-releases entirely via some solver method. There doesn't seem to be an affordance for that currently.

Or am I thinking about this incorrectly? Is this really the responsibility of the DependencyProvider?

Idea to explore: incompatibility knowledge database

In some pathological cases, the solver may take much longer to find a solution. For any given dataset of packages and versions, and a given dependency provider, it is easy to find those edge cases by trying to solve all packages.

What's interesting is that once a package is solved, we have at our disposal the full set of incompatibilities that were recorded. And since incompatibilities are supposed to be always true (at least during one solve call), we could try to identify whether injecting one or some of these incompatibilities directly into the dependency solver could dramatically improve solving time. If so, it would be interesting to see how many of these could be stored in some kind of database available to the dependency solver. These "key" incompatibilities could be useful at the package level, or even at the ecosystem level.

Of course, if we want to record incompatibilities that outlive a single solve call, we'd have to be extra careful about what gets recorded. For example, we should avoid recording incompatibilities referencing any future version, as we don't know what the future may hold. But even those could be "truncated" if needed.

I just wanted to record this idea, for future self or anyone wanting to explore it.

Additional Constraints

I'm (still) working on using PubGrub to solve Conda packages. I've made a lot of progress, but I currently run into the problem of "constraints".

Conda packages can have two types of dependencies: regular dependencies and constraints. Regular dependencies are packages that need to be included in the solution; constraints are dependencies that don't have to be included, but that, if included (via the dependency of another package), have to adhere to the given constraint. I think this is similar to peer dependencies in NodeJS.

I initially modeled these as regular dependencies, which I could strip from the solution after solving. However, sometimes these constraints are used to ensure a certain other package is not selected. For instance, Python packages that are incompatible with PyPy often have the constraint pypy <0a0. No version is compatible with this, since it denotes the lowest possible version. Using this as a dependency results in a DependencyOnTheEmptySet error, which makes sense: an empty set is exactly what the user intends.

Is this something that could be supported with pubgrub? Is this something I could potentially add?

Rename v0.3 Range into BoundedRange

I'm currently writing the v0.2 -> v0.3 upgrade guide and realized we still have the name Range for our default impl of VersionSet. However, it is not exactly the same as the previous v0.2 Range.
So to prevent user confusion, and also to make the upgrade guide clearer, I suggest we rename it to BoundedRange.
That's also more precise, since it's a range built with inclusive/exclusive bounds.

Fix CI for conventional commits

This project insists on using conventional commits for everything on the dev and release branches, and we attempt to check this in CI. The current CI code checks that every commit in a PR, or on dev or release, follows the style. We also generally use a "squash and merge" merging strategy.

This doesn't work for at least three reasons:

  • PRs have red CI if their commits don't follow the style, even though those commits will be squashed away after merge.
  • When doing a squash and merge, the maintainer gets to pick a new name. But nothing checks that the new name follows conventional commits.
  • After squash and merge, CI flags nonconventional names. But at that point the only thing that can be done is force-pushing to dev, which is worse than the problem it's trying to solve.

I think this can be fixed by the following:

  1. Turn on merge queues. This feature is intended to allow testing multiple PRs at the same time, in the exact state they will be in after merging. We don't need the additional parallelism, but we do need CI to run after the commit has been squashed.
  2. Reconfigure CI to run on pushes to a PR and on being added to a merge queue.
  3. Set the conventional commits check to only run in a merge queue.

It would be wonderful to have someone test the proposed configuration in some other repo to make sure it works, and then submit a PR to change CI here.

If we don't find a good solution, I would like to remove the conventional commits checking. Force-pushing to dev is worse than having to build the first draft of the changelog manually.

Do not implement Package trait automatically

I've just received this by email and thought I'd put it here to have more feedback. What do you think?


Implementing a trait for all types that implement something else is a
really bad design decision, because it makes all types available to an
interface even though most types are not relevant for that interface.

That's especially bad in a case where the trait (here Package) has
functions associated with it. That is not the case here, as Package
is a simple marker trait, so it's not too bad.

Still, making users of the library implement this explicitly is much better.
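For reference, a sketch of the two styles; the bounds below mirror the shape of pubgrub's Package marker trait, to the best of my knowledge:

use std::fmt::{Debug, Display};
use std::hash::Hash;

// Current style: a blanket impl makes every eligible type a Package automatically.
pub trait Package: Clone + Eq + Hash + Debug + Display {}
impl<T: Clone + Eq + Hash + Debug + Display> Package for T {}

// Suggested alternative: drop the blanket impl so users opt in explicitly, e.g.
// impl Package for MyPackageType {}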

Guidance for adding graph location aware exclusions and overrides

I have reached out to the pub folks but thought I'd try you folks as well since you are very, very familiar with PubGrub.

I have ported the PubGrub solver from pub/Dart to Java with Maven and NPM artifacts in order to explore version solving in a personal project. I have the solver working and passing all relevant unit tests, including functional tests that go out to Maven Central and the NPM registry. What is missing is support for Maven's exclusions and NPM overrides. I plan on using these on the artifacts as filters before supplying the artifacts to the solver (i.e. incompatibilitiesFor). The problem is that these constructs depend on an artifact's path in the dependency graph. For example, an exclusion specified in dependency A applies to any transitive dependency of A, no matter how deep in the graph it occurs. So in order to know whether an exclusion applies to a transitive dependency, I need to know the location in the dependency graph that is causing that dependency to be considered. After much investigation into the source code, I still cannot figure out how to obtain that graph and path during solving. Obviously the graph, and thus the path, will change as new assignments are made and when backtracking. The PubGrub implementation in pub supports overrides, but they are global and independent of an artifact's location in the dependency graph. Any guidance on how to proceed would be greatly appreciated. I plan to make my repository public once I get this functionality implemented, so any help will likely help others as they consider using PubGrub in their projects.

Thanks in advance for any help!

Negative dependencies/conflicts?

How can conflicts be implemented via this pubgrub package?

E.g., package A can only be installed if package B isn't there; or package A can only be installed if package B is at least version 5, even though B isn't a dependency of package A.

perf: keep the intersection of derivations directly in Memory

Memory stores a Vec of derivations per package. If we are to evaluate potential packages in a smarter way (#32), we need the intersection of the derivation terms.

There are two methods in Memory that can return iterators over individual assignment terms: terms_for_package and potential_packages. The latter is the use case we are discussing for performance improvements in #32. The former is used to compute relations of incompatibilities, which eventually ends up computing the intersection of those terms. Considering that we seem to spend much time in relation computation (#27), this could be a non-negligible speedup. A sketch of the idea follows.
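A minimal sketch of the idea (illustrative, not the crate's actual types):

// Keep the running intersection cached next to the derivations, updated as
// each new derivation arrives, so callers never re-intersect the whole Vec.
struct PackageAssignments<Term> {
    derivations: Vec<Term>, // individual terms, still needed for error reporting
    intersection: Term,     // cached intersection of all derivation terms
}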

Large files in the repository

Currently, the compressed files stored in the repository amount to 1.3 MB of data. Uncompressed, that data is partly due to the source code of this repository, and partly due to assets such as those in the gh-pages branch. To understand the distribution of that data, let's have a look at the uncompressed data in the gh-pages and new-benchmark branches.

gh-pages

That branch contains the API docs generated by cargo doc, which we publish on https://pubgrub-rs.github.io/pubgrub/pubgrub/ so people can look at the current API docs of the dev branch without having to clone the repo. Excluding the .git/ directory, that data amounts to 2.6 MB.

That is significant compared to the text files of the source code. I'd prefer it if GitHub dealt with pages the way GitLab does, as build artifacts of CI. Unfortunately, it has to be in-tree. For this reason, I've set up the gh-pages action to always reset the gh-pages branch as an orphan branch and replace the commit instead of pushing new commits on top.

Another option would be to use an external service for our static pages, such as the documentation. I've used Netlify for a while now and really like them. I see that there are actions to deploy to Netlify, so we could do that instead: https://github.com/marketplace/actions/netlify-deploy.

new-benchmark

Currently, the uncompressed data amounts to 460 KB, with 184 KB being the ron example files, so a bit more than a third of it. Those files are only needed when working on optimizations, so they are not useful for most users and contributors. If we keep gh-pages in-tree, this is marginal. If we decide to move pages outside of the repo, however, it becomes non-negligible, and there are different ways we could deal with it.

First, we could just remove them and share them with each other by messages / emails. (crowd screaming in the back) Ok, well, not a very good idea ^^.

Alternatively, we could use a git submodule. This is another repo, declared in the dependencies of this repo, so if we clone with the --recursive option we also retrieve the submodules. Managing files in a submodule is a bit different than in the root repository, but if you are not familiar with git submodules and we choose this option at some point, I can give some advice on how to use them easily without having to manage two separate repositories on your file system.

Another option is using a git LFS provider. LFS (Large File Storage for git) seems to be the most adequate solution for this, since file versions can be tracked (as with submodules) but in a way that is optimized for large files. However, it can be challenging to use correctly in CI, and there is a 1 GB monthly bandwidth limit on GitHub's free tier. That bandwidth can easily be exceeded with multiple forks / clones and CI misconfigurations. So LFS isn't the right fit for us, I think.

Diversity of benchmarks

Whether or not we choose to move large benchmark files outside of the root repository, we should still evaluate whether those benchmarks are useful at all. In issue #19, I made an analysis of the shape of the benchmark data for the first large index. It revealed, for example, that the resolution mostly involved generating incompatibilities from dependencies, that there were very few conflicts, and other things I didn't put there, such as the fact that almost no dependencies could be regrouped (if a:1 depends on b:1, and a:2 depends on b:1, then a:1 union a:2 depends on b:1). It is possible that the two new benchmarks have exactly the same profile, and in that case, only one of the three is actually needed.

Being able to do some analysis of the registries generated by proptest is something that would be great to have before adding new benchmark data to the repository, in my opinion.

Unexpected display formatting for ranges

Hello!

I found the printing of this range surprising.

let one = MyVersion::new(1, 0, 0);
let two = MyVersion::new(2, 0, 0);
let three = MyVersion::new(3, 0, 0);
let range = Range::higher_than(one)
    .intersection(&Range::strictly_lower_than(two))
    .union(&Range::higher_than(three));
println!("{}", range);

The output here is [ 1.0.1, 2.0.0 [  [ 4.0.0, ∞ [, which is a syntax unfamiliar to me, and I think my users will find it confusing. I was expecting something closer to > 1.0.0 and < 2.0.0 or > 4.0.0.
What syntax is this? Would a more widely known syntax be preferable?

Thank you,
Louis

Other functionality from the Dart implementation

The Dart implementation has some functionality that is not in the blog post. I do not know the Dart implementation thoroughly, so if I missed something, feel free to add to these notes.

  • "features" (in cargo) are a form of build flag that can enable optional dependencies. They were added to the dart implementation but where never exposed in the user interface.
  • "replacements" (in cargo the patch section and the replace) are a way to substitute the code for one package with some other code. In dart it is called dependency-overrides.
  • "pre-release" (in cargo) is part of the semver spec that allows versions like 1.0.0-alpha. I don't know how dart supports this but I believe it does.
  • "lockfiles" is a way to encode and reuse the output from a previous resolution, and even if the dependencies have changed keep things as similar as possible. In Cargo this is done by a set of heuristics that are part of the dependency provider, and it is a mess. In dart this is somehow worked into the resolver.

This is all functionality that Cargo and Dart have, where the next step for exploring it is "look at how Dart did it". Even if Dart's way of doing things does not work for Cargo, it is probably worth making available in the library for other use cases.
I will be writing up a separate issue for functionality that Cargo has that Dart does not.

Support for concurrent metadata fetching

Fetching metadata synchronously and blocking on its resolution is bound to be extremely slow. In Orogene, our resolver is able to optimistically parallelize dependency metadata fetches before actual placement, so you can have e.g. 50 different packages looking up version information while the resolver works on the data that's already been fetched.

The perf benefits of this are enormous.

Anyway just creating this issue on @Eh2406's request :)

External::NotRoot never constructed.

Looking at some coverage data, it looks like we never hit:

Kind::NotRoot(package, version) => {
    DerivationTree::External(External::NotRoot(package.clone(), version.clone()))
}

This is because conflict_resolution loops until is_terminal is true:

/// Check if an incompatibility should mark the end of the algorithm
/// because it satisfies the root package.
pub fn is_terminal(&self, root_package: &P, root_version: &V) -> bool {

That is to say, we stop when we have a proof that the root package cannot be built, but do not include the last flourish of the proof: "we are required to build the root package because that's what we were asked to build". If I remember the original blog post correctly, skipping this piece of mostly redundant information was one of its recommendations, so we could decide that this is a feature and not a bug. If we do, the next breaking release should probably remove that type from the reporting infrastructure.

SPDX license identifiers

Currently our source files begin with the full notice recommended by Mozilla:

// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

I propose changing the headers to use an SPDX license identifier instead, which looks like this:

// SPDX-License-Identifier: MPL-2.0

This has the advantages of:

  1. Following an established standard (https://spdx.dev/).
  2. Being human and machine readable.
  3. Being shorter.

I can implement the change if there are no objections from the team.

Solving slows down dramatically when testing hundreds of versions

When testing many consecutive versions of a given package, the cost of computing compatible versions seems to increase non-linearly over time.

Concretely, running this example slows down noticeably for me at around 100 iterations, and grinds to a ~halt in the 200s:

use pubgrub::error::PubGrubError;
use pubgrub::range::Range;
use pubgrub::report::{DefaultStringReporter, Reporter};
use pubgrub::solver::{OfflineDependencyProvider, resolve};
use pubgrub::version::SemanticVersion;

fn main() {
    let mut dependency_provider = OfflineDependencyProvider::<&str, SemanticVersion>::new();

    // root depends on foo...
    dependency_provider.add_dependencies(
        "root", (1, 0, 0),
        vec![
            ("foo", Range::any()),
        ],
    );

    for i in 1..1000 {
        dbg!(i);

        // foo depends on bar...
        dependency_provider.add_dependencies(
            "foo", (i, 0, 0),
            vec![
                ("bar", Range::any()),
            ],
        );

        match resolve(&dependency_provider, "root", (1, 0, 0)) {
            Ok(sol) => println!("{:?}", sol),
            Err(PubGrubError::NoSolution(mut derivation_tree)) => {
                derivation_tree.collapse_no_versions();
                eprintln!("{}", DefaultStringReporter::report(&derivation_tree));
            }
            Err(err) => panic!("{:?}", err),
        };
    }
}

I suspect that this may have to do with the increasing size of the intersected version ranges. For example, after testing and rejecting Django 5.0a1, we produce a term like:

[ 0a0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [

We then test Django 4.2.6, which gives us:

[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, ∞ [

We then take the intersection of these terms, which gives us:

[ 0a0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [

This continues until we have a range like:

[ 0a0.dev0, 4.2rc1 [  [ 4.2rc1.post0.dev0, 4.2 [  [ 4.2.post0.dev0, 4.2.1 [  [ 4.2.1.post0.dev0, 4.2.2 [  [ 4.2.2.post0.dev0, 4.2.3 [  [ 4.2.3.post0.dev0, 4.2.4 [  [ 4.2.4.post0.dev0, 4.2.5 [  [ 4.2.5.post0.dev0, 4.2.6 [  [ 4.2.6.post0.dev0, 5.0a1 [  [ 5.0a1.post0.dev0, ∞ [

If you're testing hundreds of versions, the terms continue to grow, and I suspect this leads to some kind of quadratic behavior, since we're increasing the number of terms linearly with the number of tested versions.

The intersections aren't "wrong", and it's possible that we're doing something wrong with our version formulations -- but could we take advantage of the fact that we know there are no versions in these partial ranges?

For example, given [ 0a0.dev0, 5.0a1 [ [ 5.0a1.post0.dev0, ∞ [ and [ 0a0.dev0, 4.2.6 [ [ 4.2.6.post0.dev0, ∞ [, it would be preferable to reduce to [ 0a0.dev0, 4.2.6 [, if I'm understanding the syntax correctly. One possible shape of such a simplification is sketched below.
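One possible shape of such a simplification, as a self-contained sketch with integers standing in for versions (illustrative only, not the crate's API):

// Given segments sorted by start (half-open [start, end)), merge neighbors
// whenever no actually-available version falls in the gap between them.
fn simplify(segments: &[(u32, u32)], available: &[u32]) -> Vec<(u32, u32)> {
    let mut out: Vec<(u32, u32)> = Vec::new();
    for &(start, end) in segments {
        if let Some(last) = out.last_mut() {
            // The gap [last.1, start) contains no available version, so the
            // boundary is invisible to the solver and the segments can merge.
            if !available.iter().any(|&v| last.1 <= v && v < start) {
                last.1 = end;
                continue;
            }
        }
        out.push((start, end));
    }
    out
}

fn main() {
    // No available version lies in the gap [5, 6), so the segments merge.
    assert_eq!(simplify(&[(0, 5), (6, 100)], &[1, 2, 3, 10]), vec![(0, 100)]);
}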

Benchmarking tools

Performance improvements will need some benchmarking. Things are being set up for that, but for the time being we have one big registry generated for tests, which has been committed to the repository. So I'll use that one as support for the commands listed below.

Usually, performance improvement work follows a cycle like this one:

  1. Measure
  2. Profile
  3. Optimize
  4. back to 1.

To measure timings, we can set up benchmarks. Criterion is good for micro-benchmarks and is supported on stable. The default cargo bench is good enough for bigger benchmarks but requires nightly. We'll use the latter for our big registry.
Profiling gives a more granular view of which functions take the most time. We usually profile both CPU and memory. Once the culprits have been identified, it is time to do some optimization, and then back to measuring!

Measure

First we need to add debug symbols to our compilation profiles, so that the binary we generate for benchmarks can be profiled later. So we add the following to Cargo.toml:

# in Cargo.toml
[profile.release]
debug = true

[profile.bench]
debug = true

Then, we run the large_case benchmark available in the repo. It needs serde to load the registry configuration, which is stored as data in a Ron file.

cargo +nightly bench large_case --features=serde

It prints to stdout the name of the generated executable for the benchmark, something like:

     Running target/release/deps/large_case-96ac47771292fd35

It also prints the timing results; we should note them down to see if we can make improvements later.

Profiling

Using flamegraph

flamegraph -o flamegraph.svg target/release/deps/large_case-96ac47771292fd35


Using perf

sudo perf record target/release/deps/large_case-96ac47771292fd35
sudo perf report


Using valgrind + callgrind + KCachegrind

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes \
    --simulate-cache=yes target/release/deps/large_case-96ac47771292fd35

That generates a callgrind.out.<pid> file that we can then open with KCachegrind.


Error reporting summaries

While solving all elm packages, I stumbled upon quite a lot of unsolvable packages, for different reasons. I've classified those errors into different categories, and I think it would be really nice if the error reporter were able to give one such high-level class of problem before diving into the detailed tree. Here are some examples of potential categories:

  • Direct/Indirect dependency on a non-existing package
  • Indirect dependency on two incompatible versions of the same package
  • Indirect dependency on another version of itself

Better tools for testing `Ord` and `VersionSet`

Mostly a follow-up to this comment:
#108 (comment)

But we should provide some way for a user to run "all the tests we can think of" on their types. Not only that, we should be using those tests on our own types. And they should be documented in the user guide.

Maybe functions like tests_on_a_version<VS: VersionSet>(v: VS::V), tests_on_two_versions<VS: VersionSet>(v1: VS::V, v2: VS::V), tests_on_a_set<VS: VersionSet>(vs: VS), etc., that panic if they find a property that doesn't hold. On the other hand, that doesn't give a good error message about which property didn't hold.

So maybe a set of functions not_contained_in_empty<VS: VersionSet>(v: VS::V), contained_in_full<VS: VersionSet>(v: VS::V), etc., that panic if a particular property is violated. But then it can be annoying for our users to make one test per property. Maybe we can solve this with a macro somehow?
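A minimal sketch of the one-property-per-function style, with a stand-in trait mirroring the assumed shape of VersionSet (empty(), full(), contains()):

// Stand-in trait for the sketch; the real trait lives in the crate.
trait VersionSet {
    type V;
    fn empty() -> Self;
    fn full() -> Self;
    fn contains(&self, v: &Self::V) -> bool;
}

// One property per function, so a failure names the violated property directly.
fn not_contained_in_empty<VS: VersionSet>(v: &VS::V) {
    assert!(!VS::empty().contains(v), "empty() must contain no version");
}

fn contained_in_full<VS: VersionSet>(v: &VS::V) {
    assert!(VS::full().contains(v), "full() must contain every version");
}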

Panics and infinite loops

Hello! pubgrub is super cool, thanks for working on it! I hacked together a Python package resolver: https://github.com/njsmith/posy/blob/main/src/resolve.rs

It seems to work well for simple cases. However, I've been doing some experiments feeding my resolver intentionally conflicting constraints to see what happens, and I've been hitting panics and infinite loops inside pubgrub. I'm not sure if these are my fault or not, and I haven't managed to reproduce them with a more minimal setup yet, so it might not even be your bug. But I figured I'd post here in case you have any ideas :-)

Here's a request for trio == 0.19 and attrs == 19.1, which is impossible because trio v0.19 has a dependency on attrs >= 19.2.0. But pubgrub panics with "This must be a decision":

❯ RUST_BACKTRACE=1 cargo run "trio == 0.19" "attrs == 19.1"
    Finished dev [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/posy 'trio == 0.19' 'attrs == 19.1'`
user agent: posy/0.1.0 {"ci":null,"cpu":"x86_64","installer":{"name":"posy","version":"0.1.0"},"user_data":null}
----> pubgrub called choose_package_version
-> For <root>, allowed range is: 0
Chose package <root>; now let's decide which version
<---- decision: root package magic version 0
----> pubgrub called get_dependencies <root> v0
adding dependency: trio 0.19
adding dependency: attrs 19.1
<---- dependencies complete
----> pubgrub called choose_package_version
-> For attrs, allowed range is: 19.1
-> For trio, allowed range is: 0.19
Chose package attrs; now let's decide which version
Version 21.2.0 is out of range
Version 20.3.0 is out of range
Version 20.2.0 is out of range
Version 20.1.0 is out of range
Version 19.3.0 is out of range
Version 19.2.0 is out of range
Considering attrs v19.1.0
Fetching metadata from https://files.pythonhosted.org/packages/23/96/d828354fa2dbdf216eaa7b7de0db692f12c234f7ef888cc14980ef40d1d2/attrs-19.1.0-py2.py3-none-any.whl
<---- decision: attrs v19.1.0
----> pubgrub called get_dependencies attrs v19.1.0
<---- dependencies complete
----> pubgrub called choose_package_version
-> For trio, allowed range is: 0.19
Chose package trio; now let's decide which version
Considering trio v0.19.0
Fetching metadata from https://files.pythonhosted.org/packages/35/c3/5a4befc3812b3b606e0ae9338bfdd02da3ad0a90df27dc66c37315e94f5c/trio-0.19.0-py3-none-any.whl
<---- decision: trio v0.19.0
----> pubgrub called get_dependencies trio v0.19.0
adding dependency: attrs 19.2.0 <= v
adding dependency: sortedcontainers ∗
adding dependency: async-generator 1.9 <= v
adding dependency: idna ∗
adding dependency: outcome ∗
adding dependency: sniffio ∗
<---- dependencies complete
thread 'main' panicked at 'This must be a decision', /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:410:17
stack backtrace:
   0: std::panicking::begin_panic
             at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:519:12
   1: pubgrub::internal::partial_solution::PackageAssignments<P,V>::satisfier
             at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:410:17
   2: pubgrub::internal::partial_solution::PartialSolution<P,V>::find_satisfier
             at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:330:17
   3: pubgrub::internal::partial_solution::PartialSolution<P,V>::satisfier_search
             at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:283:29
   4: pubgrub::internal::core::State<P,V>::conflict_resolution
             at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/core.rs:173:58
   5: pubgrub::internal::core::State<P,V>::unit_propagation
             at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/core.rs:139:52
   6: pubgrub::solver::resolve
             at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/solver.rs:95:9
   7: posy::resolve::resolve
             at ./src/resolve.rs:29:18
   8: posy::main
             at ./src/main.rs:78:9
   9: core::ops::function::FnOnce::call_once
             at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Or, here's a very similar case -- trio >= 0.17, attrs == 19.1. Again, all versions of trio >= 0.17 have a dependency on attrs >= 19.2, so this is unsatisfiable. In this case, I get an infinite loop:

❯ cargo run "trio >= 0.17" "attrs == 19.1"
    Finished dev [unoptimized + debuginfo] target(s) in 0.10s
     Running `target/debug/posy 'trio >= 0.17' 'attrs == 19.1'`
user agent: posy/0.1.0 {"ci":null,"cpu":"x86_64","installer":{"name":"posy","version":"0.1.0"},"user_data":null}
----> pubgrub called choose_package_version
-> For <root>, allowed range is: 0
Chose package <root>; now let's decide which version
<---- decision: root package magic version 0
----> pubgrub called get_dependencies <root> v0
adding dependency: trio 0.17 <= v
adding dependency: attrs 19.1
<---- dependencies complete
----> pubgrub called choose_package_version
-> For attrs, allowed range is: 19.1
-> For trio, allowed range is: 0.17 <= v
Chose package attrs; now let's decide which version
Version 21.2.0 is out of range
Version 20.3.0 is out of range
Version 20.2.0 is out of range
Version 20.1.0 is out of range
Version 19.3.0 is out of range
Version 19.2.0 is out of range
Considering attrs v19.1.0
Fetching metadata from https://files.pythonhosted.org/packages/23/96/d828354fa2dbdf216eaa7b7de0db692f12c234f7ef888cc14980ef40d1d2/attrs-19.1.0-py2.py3-none-any.whl
<---- decision: attrs v19.1.0
----> pubgrub called get_dependencies attrs v19.1.0
<---- dependencies complete
----> pubgrub called choose_package_version
-> For trio, allowed range is: 0.17 <= v
Chose package trio; now let's decide which version
Considering trio v0.19.0
Fetching metadata from https://files.pythonhosted.org/packages/35/c3/5a4befc3812b3b606e0ae9338bfdd02da3ad0a90df27dc66c37315e94f5c/trio-0.19.0-py3-none-any.whl
<---- decision: trio v0.19.0
----> pubgrub called get_dependencies trio v0.19.0
adding dependency: attrs 19.2.0 <= v
adding dependency: sortedcontainers ∗
adding dependency: async-generator 1.9 <= v
adding dependency: idna ∗
adding dependency: outcome ∗
adding dependency: sniffio ∗
<---- dependencies complete
----> pubgrub called choose_package_version
-> For trio, allowed range is: [ 0.17, 0.19.0 [  [ 0.19.0.post1.dev0, ∞ [
Chose package trio; now let's decide which version
Version 0.19.0 is out of range
Considering trio v0.18.0
Fetching metadata from https://files.pythonhosted.org/packages/5e/ce/1a6e875838058e9df989247ee339daa3d79cec599182a1a836ee1aa74750/trio-0.18.0-py3-none-any.whl
<---- decision: trio v0.18.0
----> pubgrub called get_dependencies trio v0.18.0
adding dependency: attrs 19.2.0 <= v
adding dependency: sortedcontainers ∗
adding dependency: async-generator 1.9 <= v
adding dependency: idna ∗
adding dependency: outcome ∗
adding dependency: sniffio ∗
<---- dependencies complete
----> pubgrub called choose_package_version
-> For trio, allowed range is: [ 0.17, 0.18.0 [  [ 0.18.0.post1.dev0, 0.19.0 [  [ 0.19.0.post1.dev0, ∞ [
Chose package trio; now let's decide which version
Version 0.19.0 is out of range
Version 0.18.0 is out of range
Considering trio v0.17.0
Fetching metadata from https://files.pythonhosted.org/packages/c6/92/46157361bc005fa4a4d190b44ed60377f695dc38d53fc339631eb97fe78a/trio-0.17.0-py3-none-any.whl
<---- decision: trio v0.17.0
----> pubgrub called get_dependencies trio v0.17.0
adding dependency: attrs 19.2.0 <= v
adding dependency: sortedcontainers ∗
adding dependency: async-generator 1.9 <= v
adding dependency: idna ∗
adding dependency: outcome ∗
adding dependency: sniffio ∗
<---- dependencies complete
----> pubgrub called choose_package_version
-> For trio, allowed range is: [ 0.17.0.post1.dev0, 0.18.0 [  [ 0.18.0.post1.dev0, 0.19.0 [  [ 0.19.0.post1.dev0, ∞ [
Chose package trio; now let's decide which version
Version 0.19.0 is out of range
Version 0.18.0 is out of range
Version 0.17.0 is out of range
Version 0.16.0 is out of range
Version 0.15.1 is out of range
Version 0.15.0 is out of range
Version 0.14.0 is out of range
Version 0.13.0 is out of range
Version 0.12.1 is out of range
Version 0.12.0 is out of range
Version 0.11.0 is out of range
Version 0.10.0 is out of range
Version 0.9.0 is out of range
Version 0.8.0 is out of range
Version 0.7.0 is out of range
Version 0.6.0 is out of range
Version 0.5.0 is out of range
Version 0.4.0 is out of range
Version 0.3.0 is out of range
Version 0.2.0 is out of range
Version 0.1.0 is out of range
Version 0.0.0 is out of range
<---- decision: no versions of trio in range

[ program hangs until killed ]

Here are a few random backtraces taken from the infinite loop:

<---- decision: no versions of trio in range
^C
Program received signal SIGINT, Interrupt.
core::ptr::const_ptr::<impl *const T>::is_null (self=0x8)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/const_ptr.rs:37
37	    pub const fn is_null(self) -> bool {
(gdb) bt
#0  core::ptr::const_ptr::<impl *const T>::is_null (self=0x8)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/const_ptr.rs:37
#1  0x0000555555ad9353 in core::slice::iter::Iter<T>::new (slice=...)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/iter.rs:92
#2  0x0000555555acd8b4 in core::slice::<impl [T]>::iter (self=...)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/mod.rs:705
#3  0x00005555557c9e65 in <T as alloc::slice::hack::ConvertVec>::to_vec (s=..., 
    alloc=...)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/slice.rs:194
#4  0x00005555557afc5b in alloc::slice::hack::to_vec (s=..., alloc=...)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/slice.rs:163
#5  0x00005555557cb74b in alloc::slice::<impl [T]>::to_vec_in (self=..., alloc=...)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/slice.rs:477
#6  0x00005555556ef5f5 in <alloc::vec::Vec<T,A> as core::clone::Clone>::clone (
    self=0x7fffffff0128)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2336
#7  0x00005555557ca443 in <pep440::Version as core::clone::Clone>::clone (
    self=0x7fffffff0110)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pep440-0.1.1/src/lib.rs:101
#8  0x00005555557e2a32 in <posy::vocab::version::Version as core::clone::Clone>::clone (
    self=0x7fffffff0110) at src/vocab/version.rs:9
#9  0x00005555557c58c6 in <T as alloc::borrow::ToOwned>::to_owned (self=0x7fffffff0110)
    at /home/njs/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/borrow.rs:87
#10 0x00005555557d2144 in pubgrub::range::Range<V>::intersection (self=0x7fffffff06f8, 
    other=0x7fffffff00b8)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/range.rs:214
#11 0x00005555557335d7 in pubgrub::term::Term<V>::intersection (self=0x7fffffff06f0, 
    other=0x7fffffff0898)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/term.rs:84
#12 0x00005555557d661d in pubgrub::internal::partial_solution::PackageAssignments<P,V>::satisfier (self=0x555556467270, package=0x7ffff739a048, incompat_term=0x7ffff739a0a8, 
    start_term=..., store=0x7fffffff4330)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:391
#13 0x00005555557d4faf in pubgrub::internal::partial_solution::PartialSolution<P,V>::find_satisfier (incompat=0x7ffff7399e90, package_assignments=0x7fffffff4308, 
    store=0x7fffffff4330)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:330
#14 0x00005555557d53b6 in pubgrub::internal::partial_solution::PartialSolution<P,V>::satisfier_search (self=0x7fffffff4308, incompat=0x7ffff7399e90, store=0x7fffffff4330)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:283
#15 0x00005555557d7882 in pubgrub::internal::core::State<P,V>::conflict_resolution (
    self=0x7fffffff4218, incompatibility=...)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/core.rs:173
#16 0x00005555557d71ee in pubgrub::internal::core::State<P,V>::unit_propagation (
    self=0x7fffffff4218, package=...)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/core.rs:139
#17 0x0000555555733f0f in pubgrub::solver::resolve (dependency_provider=0x7fffffffba10, 
    package=..., version=...)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/solver.rs:95
#18 0x00005555557feee4 in posy::resolve::resolve (requirements=0x7fffffffd990, 
    env=0x7fffffffdad8, index=0x7fffffffd860, preferred_versions=0x7fffffffdb10, 
    consider_prereleases=...) at src/resolve.rs:29
#19 0x00005555557938c6 in posy::main () at src/main.rs:78
(gdb) 
#0  pubgrub::range::Range<V>::intersection (self=0x7fffffff0348, other=0x7fffffff08a0)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/range.rs:236
#1  0x000055555573364d in pubgrub::term::Term<V>::intersection (self=0x7fffffff06f0, 
    other=0x7fffffff0898)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/term.rs:87
#2  0x00005555557d661d in pubgrub::internal::partial_solution::PackageAssignments<P,V>::satisfier (self=0x5555564676e0, package=0x7ffff6c0f418, incompat_term=0x7ffff6c0f478, 
    start_term=..., store=0x7fffffff4330)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:391
#3  0x00005555557d4faf in pubgrub::internal::partial_solution::PartialSolution<P,V>::find_satisfier (incompat=0x7ffff6c0f410, package_assignments=0x7fffffff4308, 
    store=0x7fffffff4330)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:330
#4  0x00005555557d53b6 in pubgrub::internal::partial_solution::PartialSolution<P,V>::satisfier_search (self=0x7fffffff4308, incompat=0x7ffff6c0f410, store=0x7fffffff4330)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/partial_solution.rs:283
#5  0x00005555557d7882 in pubgrub::internal::core::State<P,V>::conflict_resolution (
    self=0x7fffffff4218, incompatibility=...)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/core.rs:173
#6  0x00005555557d71ee in pubgrub::internal::core::State<P,V>::unit_propagation (
    self=0x7fffffff4218, package=...)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/internal/core.rs:139
#7  0x0000555555733f0f in pubgrub::solver::resolve (dependency_provider=0x7fffffffba10, 
    package=..., version=...)
    at /home/njs/.cargo/registry/src/github.com-1ecc6299db9ec823/pubgrub-0.2.1/src/solver.rs:95
#8  0x00005555557feee4 in posy::resolve::resolve (requirements=0x7fffffffd990, 
    env=0x7fffffffdad8, index=0x7fffffffd860, preferred_versions=0x7fffffffdb10, 
    consider_prereleases=...) at src/resolve.rs:29
#9  0x00005555557938c6 in posy::main () at src/main.rs:78

Any suggestions?

Support for more complex version constraints

Perhaps this is (understandably) outside the scope of this crate, but I figured I'd ask.

I'm trying to implement a solver for PEP 440, which unfortunately does not quite boil down to simple ranges. The versions do have a total order, but the constraints have some extra requirements. For example, >1 matches 1.1, 2.0, etc., but it does not match 2.0+foo, despite that version being "greater". Is there any way to embed some additional arbitrary constraints into the version selection process?

Thanks for any help!

Don't require a clone in Solver's methods

Is there a way to avoid the clones and allocation in https://github.com/mpizenberg/elm-test-rs/blob/31596f874a6e9340350de3e7c783a5cde9ca396b/src/solver.rs#L439? In Cargo we use Rc<Vec<Summary>> (https://github.com/rust-lang/cargo/blob/07162dbaa84aaad97d6f3be5069265e784179bcd/src/cargo/core/resolver/dep_cache.rs#L35), so that a cache hit is an Rc clone instead of a Vec clone. I wonder if there is a trait bound that works for the algorithm and allows returning Vec<_>, Rc<Vec<_>>, im::Vector<_>, or even a leaked &'static [_]? One candidate bound is sketched below.

Originally brought up in #15 (comment)
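One candidate bound, as a self-contained sketch (illustrative only; note that Rc<Vec<V>> derefs to Vec<V> rather than [V], so a bound admitting every container on the wish list is exactly the open question):

use std::ops::Deref;
use std::rc::Rc;

// Anything that derefs to a slice of versions can be consumed without a Vec clone.
fn count_versions<V, C: Deref<Target = [V]>>(versions: C) -> usize {
    versions.len()
}

fn main() {
    let owned: Vec<u32> = vec![1, 2, 3];
    let shared: Rc<[u32]> = Rc::from(vec![4, 5]); // cheap to clone, like Rc<Vec<_>>
    let leaked: &'static [u32] = &[6];
    assert_eq!(count_versions(owned), 3);
    assert_eq!(count_versions(shared), 2);
    assert_eq!(count_versions(leaked), 1);
}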

SemanticVersion support for serde not documented

It would be great to document the serde feature for SemanticVersion.

Also, is there a reason for defining SemanticVersion as opposed to using the existing semver crate, which provides a full SemVer 2.0 implementation (SemanticVersion does not)?

Crazy Entity Component System (ECS) idea

Let me first acknowledge that I know nothing about what follows, and that this is mostly a crazy idea, induced by extended excitement, that I needed to get out of my head. As such, I'll close this issue just after opening it.

I've heard a lot of good things in game dev about ECS for flexible and efficient (thanks to data locality) modeling of game mechanics. Aaaand, if you squint a lot, we have a game loop, entity data (assignments and incompatibilities), and component processing units (conflict resolution, unit propagation)?

That may be very far-fetched, not practical, and actually not correct, since I don't know a thing about ECS beyond the vague concept, but hey, any idea can be useful, right?

Benchmark on crates.io index

It is a bit far off for now, but one thing that would help optimize "real-world" performance would be to use "real-world" package registries. For example, we could build the index of crates.io and make a benchmark that solves dependencies for the latest version of every crate. It would take into account the performance of both the pubgrub implementation and a dependency retriever implementation.

Solving the dependencies of crates.io is not possible for the time being, though, since we don't handle all the possibilities that cargo enables. But we could probably simplify the index in ways that enable benchmarking even if we don't support everything yet. Features could, for example, be removed from the index. We could consider only versions below 1.0 if we don't want to handle multiple major versions. We could remove packages using pre-releases or replacements, etc.

Alternatively, we could start with simpler indexes from other programming languages. I think, for example, that we could already solve elm packages. I don't know about others.

The ultimate test would probably be solving the dependencies of all NPM packages ahah. That could be a blog post XD.

Check the correctness of our errors in tests

PubGrub has good error messages; in fact, that's one of its main selling points. When resolution fails, you get a rich object with all the information needed to explain why the failure occurred. Specifically, it is a derivation tree that proves that the requested resolution is impossible. Each node makes some statement about a set of packages that cannot simultaneously be selected, and points to two other nodes that, when combined, prove this node's statement. The leaf nodes are statements provided to the solver by the dependency provider. The root node claims that "the requested package cannot be selected". If all of the base facts are true and all of the deductions are valid, then this is a valid proof of unsatisfiability of the requested resolution. But how can we verify that "if"?

Existing tests:

  • We have tests that model the entire resolution problem in a SAT solver. We use this to verify the high-order bit: if pubgrub found a solution, then the SAT solver accepts that solution; and if pubgrub did not find a solution, then the SAT solver also does not find one. This tells us that we errored on the right occasions, but nothing about the contents of our error object.

Upcoming tests:

  • #129 extends this by looking at our error object to see which facts it depends on, then providing only those facts to the SAT solver. It is careful to only remove facts where removing them makes more problems solvable. If the SAT solver cannot find a solution, then we know that the error object correctly used a set of facts sufficient to prove the resolution problem impossible. But it tells us nothing about whether the deductions are actually valid.

What else could we be testing:

  • "Each node make some statement about a set of packages that cannot simultaneously be selected" For every node this should be easy enough to check. In the SAT solver configured to represent the entire dependency resolution problem, for each node one at a time we can encode the term in the node and verify that it is unsat. This verifies that every term in our proof if in fact making a true statement of incompatibility.
  • "all of the base facts are true" this should also be easy to verify. Check with the dependency provider that the statement is true.
  • "all of the deductions are valid" this is the hardest to verify. I think we can do it. I think a valid deduction means that any selection that satisfies the node would also satisfy one of its direct children. For each node in a new SAT context encode the nodes statement and encode the statement for each of the nodes direct children, then ask the SAT solver if there's any way to satisfy the root without satisfying either of the children.

DependencyOnTheEmptySet

For the Conda solver I built, I implemented a VersionSet that builds a bitset of applicable versions from a Conda-specific range specifier. I can do this because I know upfront exactly which versions are available for any given package. This makes implementing complement, intersection, etc. very easy and fast. An empty set can also be expressed very easily: all bits set to zero. The sketch below illustrates the representation.
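A minimal sketch of that representation (illustrative, not tied to pubgrub's VersionSet trait):

// The universe of versions is fixed and known, so a set is one bit per version.
#[derive(Clone, PartialEq, Eq)]
struct BitSetVersions {
    bits: Vec<bool>, // bits[i] is true iff the version with index i is in the set
}

impl BitSetVersions {
    fn complement(&self) -> Self {
        Self { bits: self.bits.iter().map(|b| !b).collect() }
    }
    fn intersection(&self, other: &Self) -> Self {
        Self { bits: self.bits.iter().zip(&other.bits).map(|(a, b)| *a && *b).collect() }
    }
    fn is_empty(&self) -> bool {
        self.bits.iter().all(|b| !b)
    }
}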

However, sometimes I encounter a package version (let's call it A) that has a dependency range that doesn't select any package version. This is fine and should simply indicate that package version A must not be selected either. However, because the range is empty, I get the DependencyOnTheEmptySet error and the solve process aborts.

I was wondering whether that behavior is an optimization or something required by the algorithm. In my case, due to the way my VersionSet implementation behaves, it doesn't make sense. Is it something I could optionally turn off?
