Comments (12)
Thanks for capturing these failed tests, @williamhyun.
I did some investigation.
Prior to #1055, it was just a coincidence that these tests passed. After #1055 was fixed, the push-down evaluation actually continues to execute.
testIsNaN[format = orc]
This test no longer works as written.
When no Double.NaN has been written to the ORC file and the DoubleStatistics sum is a finite value, we can correctly push down no_nans = NaN and skip this stripe.
testNotNaN[format = orc]
This is still a valid test.
The reason it doesn't pass is that the statistics check for the Float type is missing.
orc/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
Lines 696 to 703 in 70c504c
category == TypeDescription.Category.DOUBLE || category == TypeDescription.Category.FLOAT
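To illustrate the reasoning above, here is a minimal sketch of the idea, not the actual RecordReaderImpl code. It assumes only that NaN propagates through addition, so a finite column sum means no NaN was written. The class NaNPushDownSketch, the Category enum, and canSkipForIsNaN are hypothetical names made up for this example.

```java
// Hypothetical sketch: these names are NOT part of the ORC API.
final class NaNPushDownSketch {
    // Stand-in for TypeDescription.Category, reduced to what the example needs.
    enum Category { INT, FLOAT, DOUBLE, STRING }

    /**
     * Returns true when a stripe can be skipped for an "is NaN" predicate.
     * Any NaN value makes the running sum NaN, so a finite sum implies
     * that the stripe holds no NaN. A non-finite sum proves nothing, so
     * the stripe must be read in that case.
     */
    static boolean canSkipForIsNaN(Category category, double sum) {
        // Both floating-point categories need the check; omitting FLOAT
        // is the kind of gap described in the comment above.
        boolean isFloatingPoint =
            category == Category.DOUBLE || category == Category.FLOAT;
        return isFloatingPoint && Double.isFinite(sum);
    }

    public static void main(String[] args) {
        System.out.println(canSkipForIsNaN(Category.DOUBLE, 6.0));
        System.out.println(canSkipForIsNaN(Category.FLOAT, 1.5));
        System.out.println(canSkipForIsNaN(Category.DOUBLE, Double.NaN));
    }
}
```

The check is deliberately conservative: an infinite sum caused by overflow also forces a read, even though the stripe may hold no NaN.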
from orc.
Thank you @guiyanakuang !
I am retesting on both Iceberg and Spark.
Thank you, I am rerunning the tests here:
Thank you for reporting, @williamhyun .
Oh, thank you, @guiyanakuang !
I also quickly re-checked the Iceberg test. TestMetricsRowGroupFilter passed except for the following one mentioned by @guiyanakuang.
shouldRead = shouldRead(isNaN("no_nans"));
Assert.assertTrue("Should read: NaN counts are not tracked in Parquet metrics", shouldRead);
Once you get the results, please share them on the dev mailing list, @williamhyun.
@williamhyun
ORC-1147 landed in main, branch-1.7, and branch-1.6. Could you rerun the integration tests, please?
Thank you for rerunning it. Yes, I saw Spark passed and Iceberg has only the following. If we comment out the test, it will pass cleanly.
org.apache.iceberg.data.TestMetricsRowGroupFilter > testIsNaN[format = orc] FAILED
java.lang.AssertionError: Should read: NaN counts are not tracked in Parquet metrics
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.apache.iceberg.data.TestMetricsRowGroupFilter.testIsNaN(TestMetricsRowGroupFilter.java:308)
BTW,
- If you feel OK with it, please close this issue and move forward, @williamhyun. :)
- Since the same patches are in branch-1.6, I'll trigger the integration tests for them separately as the release manager of Apache ORC 1.6.14, as part of #1081.
One more thing, @williamhyun. If you don't mind, please follow the style of #1081 and update your release issue, #1046. It will greatly help people track your efforts and progress.
Thank you for verifying and closing this.