
Comments (4)

jpivarski avatar jpivarski commented on May 29, 2024

It's not surprising that some mass calculations return nan, especially if they're rare. The mass is a square root of a subtraction of two potentially large numbers, so a small fractional error in energy and momentum can make the argument of the square root slightly negative, and then you get nan.
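For instance, the cancellation is easy to reproduce with made-up float32 components (illustrative numbers, not taken from this issue): the true mass-squared survives in 64-bit arithmetic but is rounded away in 32-bit, and slightly unluckier values push it below zero, where np.sqrt returns nan.

```python
import numpy as np

# Illustrative float32 four-vector components (y omitted for brevity):
# the exact mass-squared is ~5.5e-4, tiny compared to t**2 ~ 4.2e6.
t = np.float32(2048.127197265625)   # exactly representable in float32
x = np.float32(3.0)
z = np.float32(2048.125)

# all-float32 arithmetic: rounding the large squares erases the remainder
m2_f32 = t * t - x * x - z * z

# same stored values, but the arithmetic done in 64-bit
m2_f64 = float(t) ** 2 - float(x) ** 2 - float(z) ** 2

print(m2_f32)   # 0.0
print(m2_f64)   # ~0.000554
```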

Is the odd thing that whether you get nan or not depends on the order in which you apply the [0], [msk], and .mass? That does sound wrong: numerical round-off is one thing, but the same numbers should be combined with each other in every case, so different ways of expressing the same quantity should either always give nan or never. If there's some sort of off-by-one error in which energy elements are applied to which momentum elements, that's a big issue and needs to be addressed.

Is it possible to make a reproducible example that doesn't require the original data files? Like, can you create Lorentz values that show this issue using fromiter or by explicitly building JaggedArrays? It looks like you've isolated the particular vectors that show the issue.

Even when the mass isn't nan, it's suspiciously close to 1× the muon mass. That's weird. Even if the muons are nearly collinear, you'd get 2× the muon mass...

from awkward-0.x.

mverzett avatar mverzett commented on May 29, 2024

Hi @jpivarski , thanks for looking into it. I tried to export back to ROOT (both float32 and float64), but that washes away the issue. Indeed, it seems to be numerical rounding plus something more.

I isolated the problematic vectors here. There are two vectors, v1 and v2, each with three elements. Their sum yields the issue in the second element. I left it like this in case the issue actually starts in the sum, and kept the other two values for reference.

Thanks again!

import awkward
ff = awkward.load('test.awkd')   # the isolated (float32) vectors
(ff['v1'] + ff['v2']).mass
# array([87.1220124 ,         nan, 65.57995346])
(ff['v1'] + ff['v2'])[1].mass
# 0.1089482379549958


jpivarski avatar jpivarski commented on May 29, 2024

Okay, first off: it's not a scary off-by-one error. Awkward/Uproot-Methods is not adding the wrong numbers together or anything like that.

It is a round-off error, and the reason you see a difference is that v12[1].mass creates a Python object with Python floats as its x/y/z/t attributes, while v12.mass[1] performs the operation on the NumPy arrays before extracting element [1]. Python floats are 64-bit; your NumPy arrays happen to be 32-bit. NumPy preserves the 32-bit precision through the calculation, and that makes the difference between a small mass, resulting from the subtraction of large energy and momentum, and an imaginary mass, which np.sqrt returns as nan.

See below—I'm removing the np.sqrt part of the calculation for clarity:

>>> v12 = (ff['v1'] + ff['v2'])
>>> v12.t[1]**2 - v12.x[1]**2 - v12.y[1]**2 - v12.z[1]**2
0.011869718553498387
>>> (v12.t**2 - v12.x**2 - v12.y**2 - v12.z**2)[1]
-0.008600915200076997

When the np.sqrt is applied, the first case returns a small value and the second returns nan.
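The same two code paths can be mimicked with plain NumPy (illustrative float32 components, not the vectors from test.awkd): doing the arithmetic on the arrays first keeps everything in 32-bit, while extracting the element first promotes it to a 64-bit Python float.

```python
import numpy as np

# float32 component arrays, standing in for the awkward-0.x storage
t = np.array([100.0, 2048.127197265625], dtype=np.float32)
x = np.array([10.0, 3.0], dtype=np.float32)
z = np.array([10.0, 2048.125], dtype=np.float32)

# like v12.mass[1]: float32 arithmetic on the arrays, then take element 1
m2_array_first = (t**2 - x**2 - z**2)[1]

# like v12[1].mass: element 1 becomes a 64-bit Python float, then arithmetic
m2_index_first = float(t[1])**2 - float(x[1])**2 - float(z[1])**2

print(m2_array_first, m2_index_first)   # 0.0 vs ~0.000554
```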

@henryiii This is a lesson for the Vector library going forward: single Lorentz-vector objects should have the same storage as arrays of Lorentz vectors. Their x/y/z/t values should probably be NumPy scalar views of the arrays they came from, so that operations on single vectors run at the same precision as operations on arrays of vectors.

Maybe Awkward1 needs to change: currently, it returns Python values if you select one element of an array. I kind of hate that feature of NumPy, though, because its scalars are not JSON-encodable, and you end up staring at the screen wondering why it can't turn the number 12 into JSON (because NumPy scalars get printed to the screen the same way as Python numbers).
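For example, a NumPy scalar prints exactly like a Python number but trips up the json module:

```python
import json

import numpy as np

x = np.float32(12.0)
print(x)   # prints 12.0, indistinguishable from a Python float

try:
    json.dumps(x)
except TypeError as err:
    print(err)   # Object of type float32 is not JSON serializable
```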

Alternatively, we can adopt a policy of doing all of our math in 64-bit precision. It's 2020 and that's not slower, even on most GPUs. 32-bit floats are important for data storage/packing on disk, but errors accumulate in calculations. If you want to move this issue over to Vector as a question of policy, that would be good.
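A minimal sketch of that policy (illustrative values and a hypothetical mass helper, not the actual Vector API): upcast the stored float32 arrays to float64 before the kinematic arithmetic.

```python
import numpy as np

# float32 storage, as read from disk (illustrative values; y omitted)
t32 = np.array([2048.127197265625], dtype=np.float32)
x32 = np.array([3.0], dtype=np.float32)
z32 = np.array([2048.125], dtype=np.float32)

def mass(t, x, z):
    # hypothetical helper, not the Vector API
    return np.sqrt(t**2 - x**2 - z**2)

# float32 math loses the tiny mass-squared entirely
print(mass(t32, x32, z32))   # [0.]

# upcasting to float64 before the arithmetic recovers it
print(mass(*(a.astype(np.float64) for a in (t32, x32, z32))))   # ~[0.0235]
```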


mverzett avatar mverzett commented on May 29, 2024

@jpivarski thanks for digging into that! Indeed, I was mostly scared by a possible off-by-one error or hidden function state; that's reassuring. My two cents would indeed be to move to 64-bit for calculations.

Thanks again!
