Comments (5)
Hi @richardstartin, sorry for my original post, I researched the source code a bit more and can see that my proposal doesn't work and doesn't make any sense (or at least is not full). Indeed, I have to work with containers and their layout. Will try to research more and implement some working solution.
from roaringbitmap.
For the case when the range is 0 - Long.MAX_VALUE, the memory overhead should be around 2Mb which looks big but may be acceptable for most of the cases.
from roaringbitmap.
@richardstartin Do you want to take this one?
from roaringbitmap.
hi @bastrich - there's a couple of things to consider
- the existing immutable implementation is used for range indexes on offline segments in Apache Pinot, and I don't want to risk performance regressions because it's used in some large systems. So I particularly like that your proposal introduces an entirely separate implementation of the appender.
- the existing implementation is somewhat over-optimised for the workflow of building indexes and mapping them from disk on offline segments in Pinot, and along with being immutable, the data structure is densely packed which means there is no space for containers to grow into. Combined with roaring container compression for regions of the bit slices, this makes the feasibility of non-append updates somewhat complicated, because flipping a single bit can have significant ramifications for layout. In the best case, flipping a bit in a bitmap container does not affect layout. In run or array containers the container may need to grow or shrink, which would require shifting all data after the site of the modification. If the container type has to change as a result of the modification, then the update is even more disruptive to layout. I'm not sure that your proposal deals with this complexity - but if I have missed something please let me know and we can discuss it further.
Despite the complexity, several users have now requested mutability in various issues, so I think it's worth implementing. Point 1 above warrants a separate implementation (leaving the existing code alone means users can't be affected).
Point 2 above requires a different in-memory layout using Container
objects which can manage all the complexities of growing/changing locally without invalidating the rest of the data structure. A new class MutableRangeBitmap
could have all of the same public instance methods as RangeBitmap
(eq
, lt
and so on) but include methods to update values given a row index, and could be implemented in terms of a List<Container[]>
or similar. This is very similar to the way ImmutableRoaringBitmap
and RoaringBitmap
are different implementations of the same data structure but the first one supports memory mapping at the expense of updates, and the second one is mutable but not mapped from disk.
At some point in the future we could consider bringing the two implementations under the same interface (which would probably end up being a breaking change because the best name for that interface might be RangeBitmap
and given that there is only one major user, this can be managed). This would eventually lead to things like test reuse across implementations.
Do you want to have a go at implementing MutableRangeBitmap
using List<Container[]>
instead? I'm happy to discuss further and help with review.
from roaringbitmap.
Hi @richardstartin, thank you for the detailed response.
I didn't think of the solution you proposed but I will check it, thank you!
Also I will try to implement the solution I proposed to better show what I mean. Perhaps I'm wrong, then I can go only with the solution you proposed.
I can create 2 PRs so we can look at both options and compare them.
Are there any expected timelines?
from roaringbitmap.
Related Issues (20)
- Roaring64Bitmap : Overflow error when calculating ANDNOT between two 64-bitmaps HOT 6
- Can't check value 65535 using `contains(long minimum, long supremum)` HOT 9
- Can't find version 0.9.41 in the Maven repository HOT 3
- RoaringBitmap add an efficient `findOffset` method for the index of one assigned value. HOT 7
- Roaring Wavelet Trees? HOT 1
- Not able to stringify roaring bitmap. HOT 1
- Inconsistent Results between `RoaringBitmap.andNotCardinality(s1, s2)` and `RoaringBitmap.andNot(s1, s2).getCardinality()` HOT 3
- Seeing null element in highLowContainer HOT 1
- Inconsistent Results between `RoaringBitmap.andNotCardinality(s1, s2)` and `RoaringBitmap.andNot(s1, s2).getCardinality()` HOT 3
- How to and List<? extends RoaringBitmap> efficiently HOT 3
- Support for the Java module system? HOT 14
- intersect method throws ArrayIndexOutOfBoundsException after serialize/deserialize HOT 3
- Question about Roaring64Bitmap mapping and streaming HOT 1
- Implement roaring_bitmap_internal_validate HOT 2
- An overflow when infinitely iterating over run containers, ArrayIndexOutOfBoundException is thrown then HOT 3
- OSGi Bundles support HOT 1
- ImmutableBitmap's batchiterator.advanceIfNeeded skips over target in certain conditions HOT 3
- AddOffset for Roaring64Bitmap
- Or of two RunContainers can produce an under-sized BitmapContainer HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from roaringbitmap.