Comments (4)
> Should we also change the signature and return `Result<UInt64Array>`?

Yes, I think so.
I actually think using `new_null_array` is wrong here. It returns the wrong type: for example, if we are getting row counts for a `Utf8` column, `new_null_array` will return a `StringArray` rather than a `UInt64Array`.
I actually hit this issue when working on #10924, and I had to change it to the following to make a test pass:

```rust
let Some(parquet_index) = self.parquet_index else {
    let num_row_groups = metadatas.into_iter().count();
    return Ok(Arc::new(UInt64Array::from_iter(
        std::iter::repeat(None).take(num_row_groups),
    )));
};
```
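The fallback in the snippet above can be sketched without arrow, assuming `parquet_index` is an `Option<usize>` and using hypothetical stand-ins: `RowGroupMeta` is a trimmed-down model of row group metadata, and `Vec<Option<u64>>` plays the role of a `UInt64Array` (a `None` entry models a null). This is an illustration of the idea, not the DataFusion implementation.

```rust
/// Hypothetical, simplified stand-in for parquet row group metadata.
struct RowGroupMeta {
    num_rows: u64,
}

/// Returns one row count per row group. `Vec<Option<u64>>` models a
/// `UInt64Array`, and `None` models a null slot.
fn row_group_row_counts<'a, I>(
    parquet_index: Option<usize>,
    metadatas: I,
) -> Result<Vec<Option<u64>>, String>
where
    I: IntoIterator<Item = &'a RowGroupMeta>,
{
    // Column not found in the file: one null per row group, still typed as u64.
    let Some(_parquet_index) = parquet_index else {
        let num_row_groups = metadatas.into_iter().count();
        return Ok(std::iter::repeat(None).take(num_row_groups).collect());
    };
    // Column found: a row group's row count is the same for every column.
    Ok(metadatas.into_iter().map(|rg| Some(rg.num_rows)).collect())
}
```

Counting the metadata iterator first keeps the output length equal to the number of row groups even when no column can be matched, while the element type stays `u64` instead of following the column's own type.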
from arrow-datafusion.
> I actually think using `new_null_array` is wrong here. It returns the wrong type

I passed `&DataType::UInt64` into `new_null_array(...)`. However, your approach solves the issue of downcasting the result from `new_null_array`, so it makes more sense here. Thank you.
@alamb, just to confirm my understanding (regarding my second question): `row_group_row_counts` should also return a null array when we reference a non-existent column, so that the row group and data page row counts behave the same way when they encounter a missing column?
take
@alamb I have two questions here:

Should we also change the signature and return `Result<UInt64Array>`? This, however, would require us to `downcast_ref` the result from `new_null_array` and clone it, something like:
```rust
let Some(parquet_index) = self.parquet_index else {
    return Ok(self
        .make_null_array(&DataType::UInt64, metadatas)
        .as_any()
        .downcast_ref::<UInt64Array>()
        .expect("failed to downcast array")
        .clone());
};
```
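The `as_any().downcast_ref::<UInt64Array>()` step in the snippet above relies on `std::any::Any`, the same mechanism arrow's `Array::as_any` exposes. The std-only sketch below models it with hypothetical stand-ins (`DataType`, `Array`, `NullU64Array`, `NullStringArray`, `make_null_array` and `to_u64_array` are illustrative, not arrow's API): the downcast only succeeds when the concrete type matches, so a null array built for a `Utf8` column would fail it, and even on success `.cloned()` is needed to return an owned array.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical two-variant stand-in for arrow's `DataType`.
#[derive(Clone, Copy, Debug)]
enum DataType {
    UInt64,
    Utf8,
}

/// Hypothetical, trimmed-down type-erased array trait.
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

/// Stand-in for a UInt64 array (every slot is null here).
#[derive(Clone, Debug, PartialEq)]
struct NullU64Array {
    values: Vec<Option<u64>>,
}

/// Stand-in for a Utf8 string array (every slot is null here).
#[derive(Clone, Debug, PartialEq)]
struct NullStringArray {
    values: Vec<Option<String>>,
}

impl Array for NullU64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

impl Array for NullStringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Hypothetical analogue of `new_null_array`: the concrete type follows the
/// requested data type, so the caller only gets an `Arc<dyn Array>`.
fn make_null_array(data_type: DataType, len: usize) -> Arc<dyn Array> {
    match data_type {
        DataType::UInt64 => Arc::new(NullU64Array { values: vec![None; len] }),
        DataType::Utf8 => Arc::new(NullStringArray { values: vec![None; len] }),
    }
}

/// Downcasts the erased array and clones it into an owned value.
/// Yields `None` (where an `expect` would panic) if the concrete type
/// is not `NullU64Array`.
fn to_u64_array(array: &dyn Array) -> Option<NullU64Array> {
    array.as_any().downcast_ref::<NullU64Array>().cloned()
}
```

Building the `UInt64` values directly, as in the `UInt64Array::from_iter` version, avoids both the runtime type check and the extra `clone`.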
If we change the signature, we also have to make some changes downstream, e.g. in `fn null_counts` in `row_groups.rs`.
Another question: what is the correct behaviour in `data_page_row_counts` if we cannot find a matching `parquet_index`? Should we return a null array (as we already do), or return the row group row counts instead, since we cannot access the pages via `parquet_index` anyway? And is it even correct for `row_group_row_counts` to return something other than a null array when we can't match the column?
EDIT: I guess this answers the second question:

> If there is not a column, the `UInt64Array` should be all nulls
Related Issues (20)
- regression: query error with case when as the group field
- ScalarValue::new_ten error cites one not ten
- Support Map as a ScalarValue
- Consider using `Arc::clone` to clone Arcs to make it clear they aren't deep copies
- Support unparsing the Value Plan of Array (List) to SQL String
- Support `Dictionary` in Parquet Metadata Statistics
- Is pre-compile pattern string in regexp_match operation
- Implement physical plan serialization for COPY plans `CsvLogicalExtensionCodec`
- Remove special casting of `Min` / `Max` built in `AggregateFunctions`
- Remove `Min`/`Max` references from `AggregateExec::get_minmax_descr`
- Remove `Min`/`Max` references from `AggregateStatistics`
- Union columns can never be `NULL` (I think?)
- Allow sorting to improve `FixedSizeBinary` filtering
- Consolidate examples to make them easier to navigate
- Inconsistent behavior of '+-' arithmetic operator on Boolean type
- Optionally display schema in explain plan
- math functions precision with f64 doesn't quite vibe on FreeBSD
- Break datafusion-catalog code into its own crate
- Support `FixedSizedBinaryArray` Parquet Data Page Statistics
- Support `DictionaryArray` Parquet Data Page Statistics