Git Product home page Git Product logo

range-set-blaze's Introduction

range-set-blaze

github crates.io docs.rs

Integer sets as fast, sorted, integer ranges with full set operations

The integers can be any size (u8 to u128) and may be signed (i8 to i128). The set operations include union, intersection, difference, symmetric difference, and complement.

The crate's main struct is RangeSetBlaze, a set of integers. See the documentation for details.

Unlike the standard BTreeSet and HashSet, RangeSetBlaze does not store every integer in the set. Rather, it stores sorted & disjoint ranges of integers in a cache-efficient BTreeMap. It differs from other interval libraries -- that we know of -- by offering full set operations and by being optimized for sets of clumpy integers.

We can construct a RangeSetBlaze from unsorted & redundant integers (or ranges). When the inputs are clumpy, construction will be linear in the number of inputs and set operations will be sped up quadratically.

The crate's main trait is SortedDisjoint. It is implemented by iterators of sorted & disjoint ranges of integers. See the SortedDisjoint documentation for details.

With any SortedDisjoint iterator we can perform set operations in one pass through the ranges and with minimal (constant) memory. It enforces the "sorted & disjoint" constraint at compile time. This trait is inspired by the SortedIterator trait from the sorted_iter crate. SortedDisjoint differs from its inspiration by specializing on disjoint integer ranges.

The crate supports no_std, WASM, and embedded projects. Use the command:

cargo add range-set-blaze --features "alloc" --no-default-features

Benchmarks

See the benchmarks for performance comparisons with other range-related crates.

Generally, for many tasks involving clumpy integers and ranges, RangeSetBlaze is much faster than alternatives.

The benchmarks are in the benches directory. To run them, use cargo bench.

Articles

Examples

Example 1


Here we take the union (operator “|”) of two RangeSetBlaze's:

Example 1

use range_set_blaze::RangeSetBlaze;

 // a is the set of integers from 100 to 499 (inclusive) and 501 to 1000 (inclusive)
let a = RangeSetBlaze::from_iter([100..=499, 501..=999]);
 // b is the set of integers -20 and the range 400 to 599 (inclusive)
let b = RangeSetBlaze::from_iter([-20..=-20, 400..=599]);
// c is the union of a and b, namely -20 and 100 to 999 (inclusive)
let c = a | b;
assert_eq!(c, RangeSetBlaze::from_iter([-20..=-20, 100..=999]));

Example 2


In biology, suppose we want to find the intron regions of a gene but we are given only the transcription region and the exon regions.

Example 2

We create a RangeSetBlaze for the transcription region and a RangeSetBlaze for all the exon regions. Then we take the difference between the transcription region and exon regions to find the intron regions.

use range_set_blaze::RangeSetBlaze;

let line = "chr15   29370   37380   29370,32358,36715   30817,32561,37380";

// split the line on white space
let mut iter = line.split_whitespace();
let chr = iter.next().unwrap();

// Parse the start and end of the transcription region into a RangeSetBlaze
let trans_start: i32 = iter.next().unwrap().parse().unwrap();
let trans_end: i32 = iter.next().unwrap().parse().unwrap();
let trans = RangeSetBlaze::from_iter([trans_start..=trans_end]);
assert_eq!(trans, RangeSetBlaze::from_iter([29370..=37380]));

// Parse the start and end of the exons into a RangeSetBlaze
let exon_starts = iter.next().unwrap().split(',').map(|s| s.parse::<i32>());
let exon_ends = iter.next().unwrap().split(',').map(|s| s.parse::<i32>());
let exon_ranges = exon_starts
    .zip(exon_ends)
    .map(|(s, e)| s.unwrap()..=e.unwrap());
let exons = RangeSetBlaze::from_iter(exon_ranges);
assert_eq!(
    exons,
    RangeSetBlaze::from_iter([29370..=30817, 32358..=32561, 36715..=37380])
);

// Use 'set difference' to find the introns
let intron = trans - exons;
assert_eq!(intron, RangeSetBlaze::from_iter([30818..=32357, 32562..=36714]));
for range in intron.ranges() {
    let (start, end) = range.into_inner();
    println!("{chr}\t{start}\t{end}");
}

range-set-blaze's People

Contributors

bonsairobo avatar carlkcarlk avatar lrvdijk avatar mrothnet avatar striezel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

range-set-blaze's Issues

Consider being more conservative with trait bounds

The Integer trait has many and some very strict requirements, of which many seem to me like they aren't necessary for all the functionality, making it cumbersome or impossible to use the crate for a custom type. In the stdlib there's often no trait bounds on the struct at all, even though most of their methods need them (ref HashMap)!

Most concerning is the Copy bound, which prevents the use of dynamically sized integers like BigInt. There's also Debug and Display, which prevent the use of types used in cryptographic settings (though I can't think of a use case off the top of my head). The presence of most of the other bounds means that even someone with a simple newtype needs to implement a lot of unnecessary traits for it.

I also think that conceptually it should be possible to support certain methods for things like floating point where iteration doesn't really make sense without compromising your implementation. It should still be possible to do many things like take unions, intersections and check for membership, but maybe I'm missing something about your implementation details.

On a related note, the stdlib Step trait could have been nice, but sadly it hasn't been stabilized yet.

`no_std` support

It would be very useful for this crate to be usable in no_std environments. At a glance, the only hard requirement I see in the crate is really alloc, so it could be more lax about requiring the whole std support. That would allow uses in bare-metal and wasm.

'ui' tests are broken

tests in the ui folder need to be removed or fixed, this is probably not the intended output:


NOTE: writing the following output to `tests/ui/tee.stderr`.
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error[E0599]: the method `union` exists for struct `Tee<std::slice::Iter<'_, ({integer}, {integer})>>`, but its trait bounds were not satisfied
 --> tests/ui/tee.rs:7:17
  |
7 |     let _c = a0.union(a1);
  |                 ^^^^^ method cannot be called on `Tee<std::slice::Iter<'_, ({integer}, {integer})>>` due to unsatisfied trait bounds
  |
 ::: $CARGO/itertools-0.10.5/src/tee.rs
  |
  | pub struct Tee<I>
  | -----------------
  | |
  | doesn't satisfy `<_ as IntoIterator>::Item = &RangeSetBlaze<_>`
  | doesn't satisfy `<_ as IntoIterator>::Item = RangeSetBlaze<_>`
  | doesn't satisfy `_: range_set_blaze::MultiwayRangeSetBlaze<'_, _>`
  | doesn't satisfy `_: range_set_blaze::MultiwayRangeSetBlazeRef<_>`
  | doesn't satisfy `_: range_set_blaze::SortedDisjoint<_>`
  |
 ::: $RUST/core/src/slice/iter.rs
  |
  | pub struct Iter<'a, T: 'a> {
  | -------------------------- doesn't satisfy `_: range_set_blaze::SortedDisjoint<_>`
  |
  = note: the following trait bounds were not satisfied:
          `<Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = &RangeSetBlaze<_>`
          which is required by `Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlaze<'_, _>`
          `<Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = RangeSetBlaze<_>`
          which is required by `Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlazeRef<_>`
          `std::slice::Iter<'_, ({integer}, {integer})>: range_set_blaze::SortedDisjoint<_>`
          which is required by `Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::SortedDisjoint<_>`
          `<&Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = &RangeSetBlaze<_>`
          which is required by `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlaze<'_, _>`
          `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: IntoIterator`
          which is required by `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlaze<'_, _>`
          `<&Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = RangeSetBlaze<_>`
          which is required by `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlazeRef<_>`
          `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: IntoIterator`
          which is required by `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlazeRef<_>`
          `<&Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = _`
          which is required by `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwaySortedDisjoint<_, _>`
          `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: IntoIterator`
          which is required by `&Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwaySortedDisjoint<_, _>`
          `<&mut Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = &RangeSetBlaze<_>`
          which is required by `&mut Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlaze<'_, _>`
          `<&mut Tee<std::slice::Iter<'_, ({integer}, {integer})>> as IntoIterator>::Item = RangeSetBlaze<_>`
          which is required by `&mut Tee<std::slice::Iter<'_, ({integer}, {integer})>>: range_set_blaze::MultiwayRangeSetBlazeRef<_>`

warning: unused import: `range_set_blaze::prelude`
 --> tests/ui/tee.rs:2:5
  |
2 | use range_set_blaze::prelude::*;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

Make `AssumeSortedStarts` easier to use

I'm trying to replace my own range set implementation with this crate, and something I need to do often is merge sorted ranges that are not necessarily disjoint. It seems like UnionIter and AssumeSortedStarts are the right tools for this. There are a few small changes that I think would improve this use case:

  1. Document SortedStarts and AssumeSortedStarts. I find it odd that these have #[doc(hidden)]. It was not obvious that such APIs existed until I snooped around the code a bit.
  2. Add a constructor for AssumeSortedStarts which takes I: IntoIterator. Right now it only takes I: Iterator, so I have to manually call into_iter() when I use it.
  3. Maybe add an extension trait so I can do something like: [0..1, 0..2].assume_sorted() instead of AssumeSortedStarts::new([0..1, 0..2]).

I can probably make a PR with these changes if you like.

Consider using `[T; 2]` instead of `RangeInclusive<T>`

I think RangeInclusive is a poor choice for this crate for multiple reasons. In order of importance:

  1. RangeInclusive<T> has a significantly larger size than [T; 2], because RangeInclusive contains a bool field. This is a strong hint that Rust's range types were designed specifically to be used as iterators, not to be stored in data structures.
  2. Rust's range types were designed for two things: iteration and slice indexing. Neither of these are core use cases for this crate. Even when those use cases are relevant, it's very easy to convert from [T; 2] to RangeInclusive<T> as needed.

There are likely more reasons. See here. I'd be surprised if you hadn't already encountered some of these issues while writing this library.

I suggest either using [T; 2] or defining your own more ergonomic range type.

Add a into_inner and from_inner to would make serialization easier.

As per @bonsairobo's suggestion on issue #15

Add a into_inner and from_inner method that would return the underlying BTreeMap and construct from it.

They would not return or require the running length. Instead, the from_inner would validate the BTreeMap and compute the length (O(#of ranges)).

One complication: When the map RangeMapBlaze is ready, RangeSetBlaze may be re-defined as RangeMapBlaze<T,()) with the unit type as the value. This will change the type of the inner BTreeMap from BTreeMap<T,T> to something like BTreeMap<T,(T,())>, which means old, serialized data may be incompatible.

Build fails for target armv7-linux-androideabi

Hello there!

I recently tried to build a project of mine for Android that uses your wonderful library and encountered a few different categories of error:

error[E0437]: type `Output` is not a member of trait `Integer`
 --> src/integer.rs:9:5
  |
9 |     type Output = usize;
  |     ^^^^^^^^^^^^^^^^^^^^ not a member of trait `Integer`

Several of these for the various integer widths and then the following:

error[E0277]: the trait bound `i128: OverflowingSub` is not satisfied
   --> src/integer.rs:154:18
    |
154 | impl Integer for i128 {
    |                  ^^^^ the trait `OverflowingSub` is not implemented for `i128`
    |
    = help: the following other types implement trait `OverflowingSub`:
              i16
              i32
              i64
              i8
              isize
              u16
              u32
              u64
            and 2 others
note: required by a bound in `Integer`
   --> src/lib.rs:58:7
    |
45  | pub trait Integer:
    |           ------- required by a bound in this trait
...
58  |     + OverflowingSub
    |       ^^^^^^^^^^^^^^ required by this bound in `Integer`

error[E0277]: the trait bound `i128: NumCast` is not satisfied
   --> src/integer.rs:154:18
    |
154 | impl Integer for i128 {
    |                  ^^^^ the trait `NumCast` is not implemented for `i128`
    |
    = help: the following other types implement trait `NumCast`:
              f32
              f64
              i16
              i32
              i64
              i8
              isize
              u16
            and 4 others
note: required by a bound in `Integer`
   --> src/lib.rs:55:7
    |
45  | pub trait Integer:
    |           ------- required by a bound in this trait
...
55  |     + num_traits::NumCast
    |       ^^^^^^^^^^^^^^^^^^^ required by this bound in `Integer`

rror[E0277]: the trait bound `i128: Bounded` is not satisfied
   --> src/integer.rs:154:18
    |
154 | impl Integer for i128 {
    |                  ^^^^ the trait `Bounded` is not implemented for `i128`
    |
    = help: the following other types implement trait `Bounded`:
              f32
              f64
              i16
              i32
              i64
              i8
              isize
              u16
            and 4 others
note: required by a bound in `Integer`
   --> src/lib.rs:54:7
    |
45  | pub trait Integer:
    |           ------- required by a bound in this trait
...
54  |     + num_traits::Bounded
    |       ^^^^^^^^^^^^^^^^^^^ required by this bound in `Integer`

I also get the above for the u128 implementation of Integer.

I'm new to rust myself and could use some pointers chasing down the appropriate solutions for this. The first set of errors go away if you omit the associated type completely (via commenting is what I tried), but that doesn't affect the others and I'm not sure of the consequences of that. My plan had been to refine the #[cfg(...)] around them. As far as the others I think I'm at the "needs to know the root cause of the discrepancies" phase and that's not clear to me as a novice.

Steps to reproduce:

rustup target add armv7-linux-androideabi
cargo build --target armv7-linux-androideabi

Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.