Git Product home page Git Product logo

hashconsing's Introduction

crates.io Documentation ci

hashconsing

hashconsing is a Rust hash consing library.

It is based on Type-Safe Modular Hash-Consing by Filliâtre and Conchon. It is slightly less efficient as uses Rust's HashMaps, not a custom built structure.

For more details see the documentation.

Known projects using hashconsing

  • kinō, a model-checker for transition systems
  • hoice, a machine-learning-based predicate synthesizer for horn clauses

License

MIT/Apache-2.0

Contributors

hashconsing's People

Contributors

adrienchampion avatar alex-ozdemir avatar pat-lafon avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

hashconsing's Issues

Pool based allocation

Is there anyway to make the HConsed pointers allocate using some memory pool? For example the ArenaArc type in shared_arena looks pretty compatible with std::sync::Arc, and it would help a lot for applications where a large number of objects needs to be allocated and deallocated.

Rc variant and compressing pointer size

Hi,

I've been expermenting with hconsing with NBE for lambda calculus. This library is the primary inspiration, which I've mostly copied. Originally, I was using the library, but ran into some performance problems:

  1. I believe the uid is not needed, the internal NonNull pointer data in the Rc is enough to act as a unique reference. As every Rc is the spoke on a fuzzy ball surrounding a RcBox, even if the location of the RcBox were to somehow move, it must move in a way such that all constructed Rc's remain sound. Of course, the Weak stored in the HConsign has the same properties, because it references the same RcBox. Moreover, once the pointer is reclaimed, that is the exact condition in which the unique id would be reusable because the original Weak would fail to upgrade.
  2. The usage of Arc with no ability to switch to an Rc. In my case, there is no point in being multithreaded. Parallelization in evaluating lambda terms is not very good unless you take an optimal lambda evaluator route.

"Fixing" those two problems were about a 2-3x speedup for my benchmarks, especially because lambda terms are explosive in terms of size, so halving the pointer size ends up being a useful win.

Ideally, I'd like these changes to make it into this library, because then I can just pull it from cargo :)

Unrelated, I would prefer if the pointer types where called Hc and Ahc, but this is a minor thing that is not important to me, I can just rename the types in my project.

If you are interested I could draft a PR with the above changes.

HConsign's table should regularly garbage collect dropped weak pointers

First, thanks for writing this crate. It's been super useful!

When writing a fuzzing harness for code relying on hashconsing, I detected that HConsign increasingly accumulates state. I see tens of thousands of entries in HConsign.table even after dropping all the hashconsed elements.

I'm not sure what a good solution would be here, but options might include:

  • using https://docs.rs/weak-table/0.3.0/weak_table/. It garbage collects on resize, which is a great place to do it since the operation is already O(N) due to rehasing. However, this crate has its own implementation of a hash table, and it'll likely be quite a bit slower than the stdlib's one.
  • exposing a method to give users the ability to GC dropped entries whenever convenient to them.

Weak tables

Is there a way to use WHConsed as keys in a weak hash table like weak_table? If not I can write one and open a PR

Derive macro?

Hey! This is a super cool crate. I actually just had someone introduce me to this topic and it was cool to see a crate that does it. I was curious if there had been any thought into implementing a derive macro for this? I'm not 100% on the details but I'm imagining something like

#[derive(HConsed, Debug, Hash, Clone, PartialEq, Eq)]
enum Term {
    Var(usize),
    Lam(Term),
    App(Term, Term)
}

instead of currently

type Term = HConsed<ActualTerm> ;

#[derive(Debug, Hash, Clone, PartialEq, Eq)]
enum ActualTerm {
    Var(usize),
    Lam(Term),
    App(Term, Term)
}

HConsed's Send and Sync traits should be bounded on the contained type

Currently, HConsed implements the Send and Sync traits unconditionally:

hashconsing/src/lib.rs

Lines 354 to 355 in 1b91c14

unsafe impl<T> Sync for HConsed<T> {}
unsafe impl<T> Send for HConsed<T> {}

This is a possible soundness issue because it allows types T that aren't necessarily thread safe to be used across threads as long as they are wrapped in an HConsed<T>.

Sort of a contrived example but the following demonstrates a data-race that segfaults safe rust using this:

Click to expand example
#![forbid(unsafe_code)]

use hashconsing::{HConsign, HConsed, HashConsign};

use std::hash::{Hash, Hasher};
use std::cell::Cell;
use crossbeam_utils::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum RefOrInt<'a> {
    Ref(&'a u64),
    Int(u64),
}

#[derive(PartialEq, Eq)]
struct HashableCell<T: Eq + PartialEq + Copy> { cell: Cell<T> }
// Fake hashing function just so we can get a HConsed going.
impl<T: Eq + PartialEq + Copy> Hash for HashableCell<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        1024.hash(state);
    }
}

static SOME_INT: u64 = 123;

fn main() {
    let cell = Cell::new(RefOrInt::Ref(&SOME_INT));
    let hashable_cell = HashableCell { cell : cell };

    let mut factory: HConsign<_> = HConsign::empty();
    let hcons_cell_ref = factory.mk(&hashable_cell);
    thread::scope(|s| {
        s.spawn(move |_| {
            let smuggled_cell = &hcons_cell_ref.get().cell;

            loop {
                // Repeatedly write Ref(&addr) and Int(0xdeadbeef) into the cell.
                smuggled_cell.set(RefOrInt::Ref(&SOME_INT));
                smuggled_cell.set(RefOrInt::Int(0xdeadbeef));
            }
        });

        loop {
            if let RefOrInt::Ref(addr) = hashable_cell.cell.get() {
                // Hope that between the time we pattern match the object as a
                // `Ref`, it gets written to by the other thread.
                if addr as *const u64 == &SOME_INT as *const u64 {
                    continue;
                }

                // Due to the data race, obtaining Ref(0xdeadbeef) is possible
                println!("Pointer is now: {:p}", addr);
                println!("Dereferencing addr will now segfault: {}", *addr);
            }
        }
    });
}

This outputs:

Pointer is now: 0xdeadbeef

Return Code: -11 (SIGSEGV)

(Issue found by @sslab-gatech's Rust group)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.