kampersanda / sucds Goto Github PK
View Code? Open in Web Editor NEWCollection of succinct data structures in Rust
Home Page: https://docs.rs/sucds
License: Apache License 2.0
Collection of succinct data structures in Rust
Home Page: https://docs.rs/sucds
License: Apache License 2.0
The current version implements input iterators taking references such as EliasFano::from_bits
.
But, for primitive types, call-by-value is enough and call-by-reference would not be useful.
Docs should be generated automatically in CI.
Related to #13
There are several implementations of rank/select bit vector data structure,
and some of them have already been / will be implemented in this library.
Since there are data structures, like wavelet matrix
, uses rank/select bit vector as a building block,
it is better that implementing trait
for rank/select bit vector as below and let user be able to choose concrete implementations:
trait RsBitVector {
fn rank0(&self, pos: usize) -> usize;
fn rank1(&self, pos: usize) -> usize;
fn select0(&self, k: usize) -> usize;
fn select1(&self, k: usize) -> usize;
...
}
@kampersanda How do you think about above? If it is roughly OK, I will create PR to discuss this more concretely.
SSIA
This would be accomplished by porting https://github.com/ot/succinct/blob/master/bp_vector.hpp, although it is necessary to understand the data structure first.
nice library. I remember the original ot
code has compact_elias_fano codes where universe and num elems can be passed in during serialization to make the format more compact. this is especially important when we want to store many small lists where the overhead of storing these explicitly can be prohibitive.
Any plans for such structures?
The coverage of docs.rs should be 100%.
https://docs.rs/crate/sucds/latest
I ran into a stack overflow in my system since EliasFano
takes up 336 bytes on the stack! The main culprit is DArray
, which takes up 288 bytes. It looks like the main issue is structures like DArrayIndex
, which include three different heap allocations, which each take 24 bytes. Then, DArray
has two DArrayIndex
es and a Rank9SelIndex
.
I'm just Box
ing these structures in my system for now, but I'm curious if you'd be interested in PRs to make the structures more space efficient. Here are some ideas:
Vec<T>
to Box<[T]>
for immutable structures. This takes the per-allocation cost down from 24 to 16 bytes.EliasFano
into a single, large heap allocation. This will likely make the code uglier but (1) have better heap utilization, since we can make a single, precise allocation, and (2) dramatically reduce stack usage.Sucds can implement FM-index thanks to WaveletMatrix.
Not an issue just some praise...
This project is a fantastic hidden gem. Extremely useful for big data tasks. Thank you for the great work.
Basically, sucds
does not recommend to use unsafe approaches such as get_unchecked
because deserialization from any data is supported, making it easy to build data structures that contain invalid values.
However, unsafe approaches can be safely used to some parts by prechecking.
Current APIs contain some redundant functions.
For example, EliasFano::from_bits
and EliasFano::from_bitvec
can be generalized.
For maintenance cost, such functions should be unified.
Hello!
I've started implementing support for rkyv
, a zero-copy deserialization framework - my use-cases entail mmap
ing data in from disk, and rkyv
allows me to do that that quite efficiently. My initial work is here.
Would you be interested in a PR that adds support for rkyv
more generally? If so, I'd be happy to clean up my initial work and contribute it here for everyone to use. I've found it surprisingly easy to do so far (I think it speaks to the good design choices made here) and would be pleased to do so if it's a feature you want.
(If you do want this feature, I'd be interested in your thoughts as to the general direction I've taken with the initial bit of work; if that's not the general design pattern you'd like to see then I can adjust.)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.