Git Product home page Git Product logo

Comments (5)

bastrich avatar bastrich commented on May 27, 2024 1

Hi @richardstartin, sorry for my original post, I researched the source code a bit more and can see that my proposal doesn't work and doesn't make any sense (or at least is not full). Indeed, I have to work with containers and their layout. Will try to research more and implement some working solution.

from roaringbitmap.

bastrich avatar bastrich commented on May 27, 2024

For the case when the range is 0 - Long.MAX_VALUE, the memory overhead should be around 2Mb which looks big but may be acceptable for most of the cases.

from roaringbitmap.

lemire avatar lemire commented on May 27, 2024

@richardstartin Do you want to take this one?

from roaringbitmap.

richardstartin avatar richardstartin commented on May 27, 2024

hi @bastrich - there's a couple of things to consider

  1. the existing immutable implementation is used for range indexes on offline segments in Apache Pinot, and I don't want to risk performance regressions because it's used in some large systems. So I particularly like that your proposal introduces an entirely separate implementation of the appender.
  2. the existing implementation is somewhat over-optimised for the workflow of building indexes and mapping them from disk on offline segments in Pinot, and along with being immutable, the data structure is densely packed which means there is no space for containers to grow into. Combined with roaring container compression for regions of the bit slices, this makes the feasibility of non-append updates somewhat complicated, because flipping a single bit can have significant ramifications for layout. In the best case, flipping a bit in a bitmap container does not affect layout. In run or array containers the container may need to grow or shrink, which would require shifting all data after the site of the modification. If the container type has to change as a result of the modification, then the update is even more disruptive to layout. I'm not sure that your proposal deals with this complexity - but if I have missed something please let me know and we can discuss it further.

Despite the complexity, several users have now requested mutability in various issues, so I think it's worth implementing. Point 1 above warrants a separate implementation (leaving the existing code alone means users can't be affected).

Point 2 above requires a different in-memory layout using Container objects which can manage all the complexities of growing/changing locally without invalidating the rest of the data structure. A new class MutableRangeBitmap could have all of the same public instance methods as RangeBitmap (eq, lt and so on) but include methods to update values given a row index, and could be implemented in terms of a List<Container[]> or similar. This is very similar to the way ImmutableRoaringBitmap and RoaringBitmap are different implementations of the same data structure but the first one supports memory mapping at the expense of updates, and the second one is mutable but not mapped from disk.

At some point in the future we could consider bringing the two implementations under the same interface (which would probably end up being a breaking change because the best name for that interface might be RangeBitmap and given that there is only one major user, this can be managed). This would eventually lead to things like test reuse across implementations.

Do you want to have a go at implementing MutableRangeBitmap using List<Container[]> instead? I'm happy to discuss further and help with review.

from roaringbitmap.

bastrich avatar bastrich commented on May 27, 2024

Hi @richardstartin, thank you for the detailed response.

I didn't think of the solution you proposed but I will check it, thank you!
Also I will try to implement the solution I proposed to better show what I mean. Perhaps I'm wrong, then I can go only with the solution you proposed.

I can create 2 PRs so we can look at both options and compare them.

Are there any expected timelines?

from roaringbitmap.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.