Comments (11)
When -128.2::float
is treated as float64, there is some precision difference. It should not be related to average function.
> explain select avg(a) from (select -128.2::float a union all select 32768.3 union all select 27.3);
+---------------+--------------------------------------------------------+
| plan_type | plan |
+---------------+--------------------------------------------------------+
| logical_plan | Aggregate: groupBy=[[]], aggr=[[AVG(a)]] |
| | Union |
| | Projection: Float64(-128.1999969482422) AS a |
| | EmptyRelation |
| | Projection: Float64(32768.3) AS a |
| | EmptyRelation |
| | Projection: Float64(27.3) AS a |
| | EmptyRelation |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[AVG(a)] |
| | CoalescePartitionsExec |
| | AggregateExec: mode=Partial, gby=[], aggr=[AVG(a)] |
| | UnionExec |
| | ProjectionExec: expr=[-128.1999969482422 as a] |
| | PlaceholderRowExec |
| | ProjectionExec: expr=[32768.3 as a] |
| | PlaceholderRowExec |
| | ProjectionExec: expr=[27.3 as a] |
| | PlaceholderRowExec |
| | |
+---------------+--------------------------------------------------------+
from arrow-datafusion.
take
from arrow-datafusion.
good catch, we can check that
Even simple conversion gives weird result
let expr2 = ScalarValue::Float32(Some(128.2))
.cast_to(&DataType::Float64)
.unwrap()
;
println!("expr2: {:?}", expr2);
expr2: Float64(128.1999969482422)
Opposite conversion is ok though
let expr2 = ScalarValue::Float64(Some(128.2))
.cast_to(&DataType::Float32)
.unwrap();
println!("expr2: {:?}", expr2);
expr2: Float32(128.2)
For me looks like a conversion issue, especially strange its happening when we convert into larger precision.
from arrow-datafusion.
Filed rust-lang/rust#125777
from arrow-datafusion.
Thats weird.
> explain select -128.2::float union all select -128.2;
+---------------+-------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------+
| logical_plan | Union |
| | Projection: Float64(-128.1999969482422) AS (- Float64(128.2)) |
| | EmptyRelation |
| | Projection: Float64(-128.2) AS (- Float64(128.2)) |
| | EmptyRelation |
| physical_plan | UnionExec |
| | ProjectionExec: expr=[-128.1999969482422 as (- Float64(128.2))] |
| | PlaceholderRowExec |
| | ProjectionExec: expr=[-128.2 as (- Float64(128.2))] |
| | PlaceholderRowExec |
| | |
+---------------+-------------------------------------------------------------------+
So first values transforms into -128.1999969482422
, but the second is intact
from arrow-datafusion.
Thats weird.
explain select -128.2::float union all select -128.2;
I think it is because the first -128.2
is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened when DataFusion parses float literal and resolves its data type.
from arrow-datafusion.
I think it is because the first
-128.2
is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened when DataFusion parses float literal and resolves its data type.
Agreed. I also found it.
> explain select 128.2::float;
+---------------+------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------+
| logical_plan | Projection: Float32(128.2) AS Float64(128.2) |
| | EmptyRelation |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
| | PlaceholderRowExec |
| | |
+---------------+------------------------------------------------+
2 row(s) fetched.
Elapsed 0.004 seconds.
> explain select 128.2;
+---------------+------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------+
| logical_plan | Projection: Float64(128.2) |
| | EmptyRelation |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
| | PlaceholderRowExec |
| | |
+---------------+------------------------------------------------+
2 row(s) fetched.
Elapsed 0.004 seconds.
> explain select 128.2::double;
+---------------+------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------+
| logical_plan | Projection: Float64(128.2) |
| | EmptyRelation |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
| | PlaceholderRowExec |
| | |
+---------------+------------------------------------------------+
2 row(s) fetched.
Elapsed 0.003 seconds.
> explain select -128.2 union all select -128.2::double;
+---------------+----------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------+
| logical_plan | Union |
| | Projection: Float64(-128.2) AS Float64(-128.2) |
| | EmptyRelation |
| | Projection: Float64(-128.2) AS Float64(-128.2) |
| | EmptyRelation |
| physical_plan | UnionExec |
| | ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
| | PlaceholderRowExec |
| | ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
| | PlaceholderRowExec |
| | |
+---------------+----------------------------------------------------+
2 row(s) fetched.
Elapsed 0.008 seconds.
> explain select -128.2 union all select -128.2::float;
+---------------+----------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------+
| logical_plan | Union |
| | Projection: Float64(-128.2) AS Float64(-128.2) |
| | EmptyRelation |
| | Projection: Float64(-128.1999969482422) AS Float64(-128.2) |
| | EmptyRelation |
| physical_plan | UnionExec |
| | ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
| | PlaceholderRowExec |
| | ProjectionExec: expr=[-128.1999969482422 as Float64(-128.2)] |
| | PlaceholderRowExec |
| | |
+---------------+----------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.008 seconds.
If we cast 128.2
to double
, the value won't be changed. I guess the reason is that DataFusion performs type coercion between float32 and float64 when combining two select plans.
from arrow-datafusion.
I wanted to share something I found. I tried to observe the plan transformation step by step:
sql_to_plan -> analyze -> optimize
SQL: select 128.2 union all select 128.2::float
********
Original LogicalPlan:
Union
Projection: Float64(128.2) AS Float64(128.2)
EmptyRelation
Projection: CAST(CAST(Float64(128.2) AS Float32) AS Float64) AS Float64(128.2)
EmptyRelation
********
Do some analyze:
Union
Projection: Float64(128.2) AS Float64(128.2)
EmptyRelation
Projection: CAST(CAST(Float64(128.2) AS Float32) AS Float64) AS Float64(128.2)
EmptyRelation
********
Do some optimize:
Union
Projection: Float64(128.2) AS Float64(128.2)
EmptyRelation
Projection: Float64(128.1999969482422) AS Float64(128.2)
EmptyRelation
********
I found that CAST
is removed after the plan is optimized. I'm not sure which optimization rule does this—maybe unwrap_cast_in_comparison.rs
? I'm not sure. I'll keep tracking it tomorrow.
from arrow-datafusion.
After some testing, I found it could be an arrow-cast issue. When trying to cast a float32 to float64, the precision is missing.
We can easy to reproduce this case like:
> select (128.2::float)::double
;
+-------------------+
| Float64(128.2) |
+-------------------+
| 128.1999969482422 |
+-------------------+
1 row(s) fetched.
Elapsed 0.089 seconds.
I found it happens in SimplifyExpressions
rule. SimpleExpr
will try to evaluate the casting expression.
datafusion/datafusion/common/src/scalar/mod.rs
Line 2506 in 3d00760
cast_to
will invoke the arrow-cast to get the result.
It can also be reproduced like
let expr2 = ScalarValue::Float64(Some(128.2))
.cast_to(&DataType::Float32)
.unwrap()
.cast_to(&DataType::Float64)
.unwrap();
println!("expr2: {:?}", expr2);
------------------------------------
expr2: Float64(128.1999969482422)
It happens in arrow-cast-51.0.0/src/cast.rs
(I'm not sure where is the code placed).
(Float16, Float32) => cast_numeric_arrays::<Float16Type, Float32Type>(array, cast_options),
I think it maybe not a DataFusion issue but a arrow-cast issue.
@comphead @viirya What do you think?
from arrow-datafusion.
Indeed, it's a conversion issue. I tried to dig into the root cause and found that arrow-cast uses the crate num to handle casting problems. I tried to use it directly:
let value_32: f32 = 128.2;
println!("value_32: {:?}", value_32);
let value_64: f64 = num::cast(value_32).unwrap();
println!("value_64: {:?}", value_64);
-------------------
value_32: 128.2
value_64: 128.1999969482422
I have no idea how to fix it in DataFusion. Maybe we need to go back to the crate num
to fix this behavior?
@comphead WDYT?
from arrow-datafusion.
Well, that might be some representation issue.
println!("value_64: {:?}", 128.2_f32 as f64);
from arrow-datafusion.
Related Issues (20)
- Implement `hf://` / "hugging face" integration in datafusion-cli HOT 6
- Clippy failed on main: consider removing unnecessary double parentheses
- Convert builtin Sum aggregate function to UDAF
- FIRST/LAST_VALUE behavior changes HOT 3
- CLI cannot create external tables with format options
- `stride` arg of `array_slice()` should be optional HOT 2
- Precision/length parameter of varchar/char types is ignored HOT 6
- Feedback request for providing configurable UDF functions HOT 15
- DataFrame.except() does not work with structs in schema HOT 4
- Extract parquet statistics from `Time32` and `Time64` columns HOT 1
- Extract parquet statistics from `Interval` columns HOT 5
- Extract parquet statistics from `LargeBinary` columns
- Extract parquet statistics from `Duration` columns HOT 3
- Extract parquet statistics from `Decimal256` columns
- Extract parquet statistics from `LargeUtf8` columns HOT 2
- Extract parquet statistics from `f16` columns HOT 1
- Extract parquet statistics from timestamps with timezones HOT 1
- Repeat scalar function panics on negative repeat counts.
- Update split_part to support negative indexes vs failing
- Error `NamedStructField should be rewritten in OperatorToFunction with subquery` if query is wrapped in view HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from arrow-datafusion.