This looks like an optimization for empty tables. The result above was produced when both t3 and t4 were empty. After inserting some values, the plan is displayed correctly.
> insert into t3 values (1,2), (2,3);
+-------+
| count |
+-------+
| 2 |
+-------+
1 row(s) fetched.
Elapsed 0.004 seconds.
> explain select count(*) from ((select distinct c1, c2 from t3 order by c1 ) union all (select distinct c2, c1 from t4 order by c1));
+---------------+-------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------------------------------+
| logical_plan | Aggregate: groupBy=[[]], aggr=[[COUNT(Int64(1)) AS COUNT(*)]] |
| | Union |
| | Projection: |
| | Sort: t3.c1 ASC NULLS LAST |
| | Projection: t3.c1 |
| | Aggregate: groupBy=[[t3.c1, t3.c2]], aggr=[[]] |
| | TableScan: t3 projection=[c1, c2] |
| | Projection: |
| | Sort: t4.c1 ASC NULLS LAST |
| | Projection: t4.c1 |
| | Aggregate: groupBy=[[t4.c2, t4.c1]], aggr=[[]] |
| | Projection: t4.c2, t4.c1 |
| | TableScan: t4 projection=[c1, c2] |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[COUNT(*)] |
| | CoalescePartitionsExec |
| | AggregateExec: mode=Partial, gby=[], aggr=[COUNT(*)] |
| | RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=28 |
| | UnionExec |
| | ProjectionExec: expr=[] |
| | AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1, c2@1 as c2], aggr=[] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=Hash([c1@0, c2@1], 14), input_partitions=14 |
| | RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1 |
| | AggregateExec: mode=Partial, gby=[c1@0 as c1, c2@1 as c2], aggr=[] |
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | ProjectionExec: expr=[] |
| | AggregateExec: mode=FinalPartitioned, gby=[c2@0 as c2, c1@1 as c1], aggr=[] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=Hash([c2@0, c1@1], 14), input_partitions=1 |
| | AggregateExec: mode=Partial, gby=[c2@0 as c2, c1@1 as c1], aggr=[] |
| | MemoryExec: partitions=1, partition_sizes=[0] |
| | |
+---------------+-------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.018 seconds.
> select count(*) from ((select distinct c1, c2 from t3 order by c1 ) union all (select distinct c2, c1 from t4 order by c1));
+----------+
| COUNT(*) |
+----------+
| 2 |
+----------+
1 row(s) fetched.
Elapsed 0.019 seconds.
from arrow-datafusion.
I believe that is correct.
You can see how the optimizer transforms the plan by running `EXPLAIN VERBOSE`.
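A minimal sketch of that suggestion, reusing the query from the transcript and assuming the same `t3` and `t4` tables. In `EXPLAIN VERBOSE` output, the `plan_type` column labels each intermediate plan (for example, the initial logical plan and the plan after individual optimizer rules). Comparing consecutive rows shows which rule changed the plan. Exact rule names and the number of rows vary between DataFusion versions.

```sql
-- Same query as in the transcript, but VERBOSE prints the intermediate
-- plans produced during planning and optimization, not only the final ones.
EXPLAIN VERBOSE
SELECT count(*)
FROM (
  (SELECT DISTINCT c1, c2 FROM t3 ORDER BY c1)
  UNION ALL
  (SELECT DISTINCT c2, c1 FROM t4 ORDER BY c1)
);
```

Running this once while the tables are empty and again after the `insert` makes it possible to compare how the plans differ in the two cases.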
Related Issues (20)
- Aggregate with `FILTER` clause and alias errors
- Incorrect LEFT JOIN evaluation result on OR conditions
- SMJ: incorrect result for filtered right outer join
- Convert `Regr` to UDAF
- Convert `Correlation` to UDAF
- SMJ producing different results than HashJoin when doing a semi join
- Construction of user-defined table functions (UDTFs) should be async to allow for async schemas
- Real-time streaming support
- Data set which is much bigger than RAM
- ci: clippy failed on main
- Convert `Grouping` to UDAF
- Convert `BitAnd`, `BitOr`, `BitXor` to UDAF
- CTE in a UNION query can escape its scope
- Unclear error message when calling a function with no parameters.
- [Epic] Implement support for `StringView` in DataFusion
- Implement equality `=` and inequality `<>` support for `StringView`
- Implement `arrow_cast` support for `StringView` and `BinaryView`
- use StringViewArray when reading String columns from Parquet
- [EPIC] Continued correct and improved extracting Parquet statistics into ArrayRefs
- Update ListingTable to use `StatisticsConverter`