Git Product home page Git Product logo

rrb-vector's Introduction

rbb-vector

An implementation of a Relaxed Radix Balanced Vector (RRB-Vector), an efficient sequence data structure. It supports fast indexing, iteration, concatenation and splitting.

For more information, see rrb-vector on Hackage.

rrb-vector's People

Contributors

konsumlamm avatar meooow25 avatar sgraf812 avatar

Stargazers

Preetham Gujjula avatar  avatar  avatar Fabrizio Ferrai avatar András Kovács avatar Yuriy Al. Shirokov avatar 题叶 avatar  avatar Heitor avatar Teo Camarasu avatar Mike Ledger avatar  avatar Kamil Adam avatar Andrejs Agejevs avatar Josh Miller avatar winterland avatar Rory Tyler Hayford avatar David Garland avatar Daniel Kahlenberg avatar Marco Z avatar

Watchers

Teo Camarasu avatar James Cloos avatar  avatar  avatar

rrb-vector's Issues

Reorder arguments to index, adjust, and adjust'

For other arrays, containers, strings, etc., including Sequence that users will often be converting from, index takes the container before the Int, and adjust and adjust' take the (a -> a) function as the first argument. In addition to being consistent with these cases, I think one can argue that the standard order is more often useful for currying.

I realize this is a breaking change, but I think it'd be much better to make it now, before there are too many users, rather than wait. Users that are surprised by the change should at least get a compiler error, plus you'll obviously bump the library's version number.

Thanks very much for this library!! Sorry to nit-pick.

Transient API

RRB trees are appealing because like vectors, they offer an even more efficient transient (as opposed to persistent) API based on unsafe mutable primitives.

Can we have such a module in this library? Plus maybe another module that wraps the unsafe API with linear types.

See also Ed Kmett's transients lib.

Pattern synonyms?

Hi, @konsumlamm, thanks for publishing rrb-vector. I'd like to try it out as a replacement for Seqs in a few projects I'm working on. When working with Seqs, I find using the bundled pattern synonyms quite convenient. Would you be interested in adding similar pattern synonyms for Vector (I didn't see any in the haddocks)? If you think it might be a good choice, I could try to implement them and open a PR. Thanks!

Question: Why sliceable arrays?

Why are the arrays stored in the RRB-tree nodes O(1)-sliceable?
It adds a memory cost of 2 words per node, but the benefit is not clear to me from looking through the code.
The original paper describing the tree and its Scala implementation use regular fixed-length arrays, so reading that didn't shed light on it either.

`unzip = unzipWith id`

It would be nice to define unzip simply as unzipWith id, but we need to make sure that this doesn't decrease its performance.

Sorting utilities

It would be useful to have sorting utilities that function directly on rrb vectors so conversions between types to get sorting aren't necessary; sortBy at the very least would be useful.

Add value-strict interface

I would find a value-strict interface to this data structure helpful.

The proposal is to add a new module called Data.RRBVector.Strict which exports a newtype wrapper around the existing type, and functions and typeclass instances that are all value-strict.

If this sounds OK I'd be happy to prepare a PR.

Property tests for invariants

I would like to add some tests that check that structure invariants of the tree are not violated, if that sounds ok.
Functions <|, ><, insertAt, deleteAt seem to be violating invariants.

lens type-classes instances

Supporting the appropriate type-classes from lens like Seq does could make transitioning from Seq to Vector as simple as just changing the used type constructor for code using lens.

Unboxed RRB-Vector

Originally proposed by u/sansboarders on reddit.

Using unboxed arrays in the tree structure would probably improve performance (note that I haven't done any benchmarks yet, so I'm not sure). There are a few different "kinds of unboxed values" to consider:

I don't know what the differences (in performance and ergonomics) between these three are or if it's possible to use these for the inner nodes (instead of only the leafs), so input on this is welcome.

One thing I'd like to avoid is duplicating too much code for this. One solution would be to implement a common typeclass, although this would probably make things slower (unless everything is specialized?) and type inference worse, so I'm not sure it's worth the trouble.

Improve `Arbitrary1` instance

Currently, the Arbitrary1 instance in the test suite only generates balanced and a specific kind of unbalanced trees. It would be nice to have more kinds of unbalanced trees or trees resulting from applying several operations to an initially balanced tree.

Optimized `replicate`

Currently, replicate n x can be implemented as fromList (replicate n x) (using replicate for lists).

However, this can be optimized by sharing as many nodes as possible, avoiding unnecessary allocations. This would not only be faster than using fromList, but also make the resulting vector consume less memory.

`cons` and `snoc` don't respect the offset of an existing Array

When you cons or snoc a vector that previously had its head removed, the resultant Vector has the head in it.

This is due to the copying in the cons and snoc functions ignoring the offset in the original Arrays. See here: https://hackage.haskell.org/package/rrb-vector-0.2.0.1/docs/src/Data.RRBVector.Internal.Array.html#cons

Reproduction below.

 > import Data.RRBVector
 > x = fromList "ab"
 > viewl x
 Just ('a',fromList "b")
 > let Just (_, y) = viewl x
 > y
 fromList "b"
 > 'a' <| y
 fromList "aa"

(Semi-)Sparse Vector?

Your folks' great work has inspired me to think about a sparse vector/array type, that is usually (much) more space efficient than IntMap. I'm thinking of a Word64 of bits for present/missing, together with a SmallArray of the non-missing elements (for dimension <= 64) or sparse subvectors. Hopefully with bit-counting instructions for countLeadingZeros or countTrailingZeros, indexing would be pretty fast.

My application is sparse linear algebra, where vectors vary gradually in sparseness. For example, one very large case from integer factoring I believe starts with an integer (or integer mod p) matrix with around 100,000 rows, each of which averages 20-30 nonzero entries. As one does (clever) Gaussian elimination, the rows steadily get more dense. Thus I'm thinking that for most of the algorithm, this data structure would be much better than either extreme of IntMap (good for very sparse vectors), or fully dense arrays.

For linear algebra at least, the basic operations are singleton, zipWith/merge (e.g. adding sparse vectors), map, and fold. For high-performance (exact) work, it'd be nice eventually to also have these types over unboxed integers mod p where p < 2^8, 2^16, 2^32, or 2^64 respectively.

What do you think? Thanks in advance for any advice or interest, and sorry if this is somewhat off-topic for rrb-vector.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.