Git Product home page Git Product logo

Comments (11)

viirya avatar viirya commented on July 23, 2024 1

When -128.2::float is treated as float64, there is some precision difference. It should not be related to average function.

> explain select avg(a) from (select -128.2::float a union all select 32768.3 union all select 27.3);
+---------------+--------------------------------------------------------+
| plan_type     | plan                                                   |
+---------------+--------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[]], aggr=[[AVG(a)]]               |
|               |   Union                                                |
|               |     Projection: Float64(-128.1999969482422) AS a       |
|               |       EmptyRelation                                    |
|               |     Projection: Float64(32768.3) AS a                  |
|               |       EmptyRelation                                    |
|               |     Projection: Float64(27.3) AS a                     |
|               |       EmptyRelation                                    |
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[AVG(a)]       |
|               |   CoalescePartitionsExec                               |
|               |     AggregateExec: mode=Partial, gby=[], aggr=[AVG(a)] |
|               |       UnionExec                                        |
|               |         ProjectionExec: expr=[-128.1999969482422 as a] |
|               |           PlaceholderRowExec                           |
|               |         ProjectionExec: expr=[32768.3 as a]            |
|               |           PlaceholderRowExec                           |
|               |         ProjectionExec: expr=[27.3 as a]               |
|               |           PlaceholderRowExec                           |
|               |                                                        |
+---------------+--------------------------------------------------------+

from arrow-datafusion.

goldmedal avatar goldmedal commented on July 23, 2024 1

take

from arrow-datafusion.

comphead avatar comphead commented on July 23, 2024 1

good catch, we can check that

Even simple conversion gives weird result

        let expr2 = ScalarValue::Float32(Some(128.2))
            .cast_to(&DataType::Float64)
            .unwrap()
            ;
        println!("expr2: {:?}", expr2);

        expr2: Float64(128.1999969482422)

Opposite conversion is ok though

        let expr2 = ScalarValue::Float64(Some(128.2))
            .cast_to(&DataType::Float32)
            .unwrap();
        println!("expr2: {:?}", expr2);

        expr2: Float32(128.2)

For me looks like a conversion issue, especially strange its happening when we convert into larger precision.

from arrow-datafusion.

comphead avatar comphead commented on July 23, 2024 1

Filed rust-lang/rust#125777

from arrow-datafusion.

comphead avatar comphead commented on July 23, 2024

Thats weird.

> explain select -128.2::float union all select -128.2;
+---------------+-------------------------------------------------------------------+
| plan_type     | plan                                                              |
+---------------+-------------------------------------------------------------------+
| logical_plan  | Union                                                             |
|               |   Projection: Float64(-128.1999969482422) AS (- Float64(128.2))   |
|               |     EmptyRelation                                                 |
|               |   Projection: Float64(-128.2) AS (- Float64(128.2))               |
|               |     EmptyRelation                                                 |
| physical_plan | UnionExec                                                         |
|               |   ProjectionExec: expr=[-128.1999969482422 as (- Float64(128.2))] |
|               |     PlaceholderRowExec                                            |
|               |   ProjectionExec: expr=[-128.2 as (- Float64(128.2))]             |
|               |     PlaceholderRowExec                                            |
|               |                                                                   |
+---------------+-------------------------------------------------------------------+

So first values transforms into -128.1999969482422, but the second is intact

from arrow-datafusion.

viirya avatar viirya commented on July 23, 2024

Thats weird.

explain select -128.2::float union all select -128.2;

I think it is because the first -128.2 is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened when DataFusion parses float literal and resolves its data type.

from arrow-datafusion.

goldmedal avatar goldmedal commented on July 23, 2024

I think it is because the first -128.2 is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened when DataFusion parses float literal and resolves its data type.

Agreed. I also found it.

> explain select 128.2::float;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Projection: Float32(128.2) AS Float64(128.2)   |
|               |   EmptyRelation                                |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
|               |   PlaceholderRowExec                           |
|               |                                                |
+---------------+------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.004 seconds.

> explain select 128.2;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Projection: Float64(128.2)                     |
|               |   EmptyRelation                                |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
|               |   PlaceholderRowExec                           |
|               |                                                |
+---------------+------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.004 seconds.

> explain select 128.2::double;
+---------------+------------------------------------------------+
| plan_type     | plan                                           |
+---------------+------------------------------------------------+
| logical_plan  | Projection: Float64(128.2)                     |
|               |   EmptyRelation                                |
| physical_plan | ProjectionExec: expr=[128.2 as Float64(128.2)] |
|               |   PlaceholderRowExec                           |
|               |                                                |
+---------------+------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.003 seconds.

> explain select -128.2 union all select -128.2::double;
+---------------+----------------------------------------------------+
| plan_type     | plan                                               |
+---------------+----------------------------------------------------+
| logical_plan  | Union                                              |
|               |   Projection: Float64(-128.2) AS Float64(-128.2)   |
|               |     EmptyRelation                                  |
|               |   Projection: Float64(-128.2) AS Float64(-128.2)   |
|               |     EmptyRelation                                  |
| physical_plan | UnionExec                                          |
|               |   ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
|               |     PlaceholderRowExec                             |
|               |   ProjectionExec: expr=[-128.2 as Float64(-128.2)] |
|               |     PlaceholderRowExec                             |
|               |                                                    |
+---------------+----------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.008 seconds.

> explain select -128.2 union all select -128.2::float;
+---------------+----------------------------------------------------------------+
| plan_type     | plan                                                           |
+---------------+----------------------------------------------------------------+
| logical_plan  | Union                                                          |
|               |   Projection: Float64(-128.2) AS Float64(-128.2)               |
|               |     EmptyRelation                                              |
|               |   Projection: Float64(-128.1999969482422) AS Float64(-128.2)   |
|               |     EmptyRelation                                              |
| physical_plan | UnionExec                                                      |
|               |   ProjectionExec: expr=[-128.2 as Float64(-128.2)]             |
|               |     PlaceholderRowExec                                         |
|               |   ProjectionExec: expr=[-128.1999969482422 as Float64(-128.2)] |
|               |     PlaceholderRowExec                                         |
|               |                                                                |
+---------------+----------------------------------------------------------------+
2 row(s) fetched. 
Elapsed 0.008 seconds.

If we cast 128.2 to double, the value won't be changed. I guess the reason is that DataFusion performs type coercion between float32 and float64 when combining two select plans.

from arrow-datafusion.

goldmedal avatar goldmedal commented on July 23, 2024

I wanted to share something I found. I tried to observe the plan transformation step by step:
sql_to_plan -> analyze -> optimize

SQL: select 128.2 union all select 128.2::float
********
Original LogicalPlan:
 Union
  Projection: Float64(128.2) AS Float64(128.2)
    EmptyRelation
  Projection: CAST(CAST(Float64(128.2) AS Float32) AS Float64) AS Float64(128.2)
    EmptyRelation
********
Do some analyze:
 Union
  Projection: Float64(128.2) AS Float64(128.2)
    EmptyRelation
  Projection: CAST(CAST(Float64(128.2) AS Float32) AS Float64) AS Float64(128.2)
    EmptyRelation
********
Do some optimize:
 Union
  Projection: Float64(128.2) AS Float64(128.2)
    EmptyRelation
  Projection: Float64(128.1999969482422) AS Float64(128.2)
    EmptyRelation
********

I found that CAST is removed after the plan is optimized. I'm not sure which optimization rule does this—maybe unwrap_cast_in_comparison.rs? I'm not sure. I'll keep tracking it tomorrow.

from arrow-datafusion.

goldmedal avatar goldmedal commented on July 23, 2024

After some testing, I found it could be an arrow-cast issue. When trying to cast a float32 to float64, the precision is missing.
We can easy to reproduce this case like:

> select (128.2::float)::double
;
+-------------------+
| Float64(128.2)    |
+-------------------+
| 128.1999969482422 |
+-------------------+
1 row(s) fetched. 
Elapsed 0.089 seconds.

I found it happens in SimplifyExpressions rule. SimpleExpr will try to evaluate the casting expression.

let cast_arr = cast_with_options(&self.to_array()?, data_type, &cast_options)?;

cast_to will invoke the arrow-cast to get the result.
It can also be reproduced like

    let expr2 = ScalarValue::Float64(Some(128.2))
    .cast_to(&DataType::Float32)
    .unwrap()
    .cast_to(&DataType::Float64)
    .unwrap();
    println!("expr2: {:?}", expr2);
------------------------------------
expr2: Float64(128.1999969482422)

It happens in arrow-cast-51.0.0/src/cast.rs (I'm not sure where is the code placed).

(Float16, Float32) => cast_numeric_arrays::<Float16Type, Float32Type>(array, cast_options),

I think it maybe not a DataFusion issue but a arrow-cast issue.

@comphead @viirya What do you think?

from arrow-datafusion.

goldmedal avatar goldmedal commented on July 23, 2024

Indeed, it's a conversion issue. I tried to dig into the root cause and found that arrow-cast uses the crate num to handle casting problems. I tried to use it directly:

    let value_32: f32 = 128.2;
    println!("value_32: {:?}", value_32);
    let value_64: f64 = num::cast(value_32).unwrap();
    println!("value_64: {:?}", value_64);
-------------------
value_32: 128.2
value_64: 128.1999969482422

I have no idea how to fix it in DataFusion. Maybe we need to go back to the crate num to fix this behavior?
@comphead WDYT?

from arrow-datafusion.

comphead avatar comphead commented on July 23, 2024

Well, that might be some representation issue.

    println!("value_64: {:?}", 128.2_f32 as f64);

from arrow-datafusion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.