Comments (7)
I am a person who is interested in contributing to open source. Looks like this issue is a good one for a starter. May I work on this issue?
from velox.
Welcome, @amaliujia . Sure, feel free to pick this up. CC: @atanu1991 who is working on a related issue #153
from velox.
I want to discuss a bit on high level before start to implement.
MultiRange is a class that contains a list of Filter, with nan_allowed and null_allowed. So my current idea of the implementation is:
- Two MultiRange can merge only if they have the same nan_allowed and null_allowed.
- Two MultiRange can merge only if they have the same data type on all the Filters.
- If 1 and 2 are true, then simply have a nested loop to do
for f1 in MultiRangeA.getFilters():
for f2 in MultiRangeB.getFilters():
f1.mergeWith(f2);
Can you give me some suggestions?
from velox.
Rui,
MultiRange represents and OR of multiple filters. NaN values pass only if MultiRange::nanAllowed is true. Null values pass only if nullAllowed is true.
Merging two filters is combining the filters with an AND. A value passes the new filter only if it passes both the original filters.
Here is how I would represent a merge of two MultiRange filters (with just 2 filters in each for simplicity):
(a OR b) AND (c OR d) = (a AND c) OR (a AND d) OR (b AND c) OR (b AND d)
Therefore, what you have proposed is very close.
- Calculate nullAllowed for the combined filter:
bool bothNullAllowed = nullAllowed_ && other->testNull();
- Calculate nanAllowed for the combined filter: {same as above}
- Loop over all combinations of filters from left and right side and merge these.
- Construct a new filter using flags and list of filters from previous steps.
In step (3), make sure to check if a merge of two filter is kAlwaysFalse, kIsNull, kIsNotNull. In this case, the filter can be dropped from the result (null and NaN flags in the nested filters do not count for MultiRange filter; only top-level flags take effect).
Step (3) may result in an empty list of filters or a list of one. If the list is empty, return kAlwaysFalse, kIsNull, or kIsNotNull as appropriate (see nullOrFalse(bothNullAllowed) in Filter.cpp). If the list contains just one filter, return that filter after incorporating the null flag, e.g. if bothNullAllowed && !filter->testNull() return filter.mergeWith(IsNull) else return filter.
That should be it.
(Check out #233 that implements merge logic for BytesValues and BytesRanges.)
from velox.
Thanks @mbasmanova!
I prepared a WIP impl based on your suggestions: #299.
Can you comment on the PR to see if I am on the right track? I haven't added test cases and I can add those later.
Sorry to occupy too much of your time. I am a beginner so probably need more guidance at this moment (for example on how to deal with nullness in Volex)? Maybe later I could help to contribute a doc about nullness, nan, always true, always false, is null, is not null with Filter.
from velox.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
from velox.
Looks like this issue was resolved with #299 . Closing.
from velox.
Related Issues (20)
- Enhance WindowFuzzer to support k-range and k-rows frames with column boundary HOT 5
- Build issue with ABFS adapter HOT 2
- Velox write to ABFS storage fail with FileSink is not registered issue
- Add background daemon in Velox runtime for periodic runtime metric reporting HOT 3
- ‘FOLLY_MAYBE_UNUSED’ does not name a type
- Add Long Divide Spark function
- CI 'Ubuntu debug with resolve_dependency' is failing with The repository '...ubuntu/22.04...' is no longer signed HOT 2
- Native worker segmentation fault on makeExprSetFromFlag HOT 2
- date_format returns wrong results in Spark
- Parquet reading failed to decompress LZO files
- Failure in Fuzzer Jobs / Build (pull_request) HOT 3
- Add support for GenericViews in ContainerRowSerde HOT 2
- Support Spark Generate with outer HOT 8
- Extend Simple Function API to allow reporting errors without throwing HOT 9
- Velox can't open the ORC files created by spark-sql HOT 6
- Free uninitialized variable-width key column in abandon partial agg mode. HOT 3
- Query execution tracing and replay tool HOT 3
- Make evaluateOnce() work with extended types
- Complete tpch q2, q4 and q11 in TPC-H micro benchmark suite HOT 2
- ParquetReader needs to implement columnStatistics()
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from velox.