Comments (4)
It's not surprising that some mass calculations return nan, especially if they're rare. The mass is a square root of a subtraction of two potentially large numbers, so a small fractional error in energy and momentum can make the argument of the square root slightly negative, and then you get nan.
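A minimal sketch of that cancellation, using made-up numbers rather than anything from the data in this thread: a particle with mass 0.1 and momentum 1000 along z (arbitrary units). Stored as float32, the energy rounds to exactly the momentum, so the mass information is gone before the square root is ever taken. Rounding in the other direction would give a negative argument instead.

```python
import numpy as np

# Hypothetical values, chosen only for illustration (not the thread's data)
m_true = 0.1
pz64 = np.array([1000.0], dtype=np.float64)
e64 = np.sqrt(pz64 * pz64 + m_true**2)   # E = sqrt(p^2 + m^2) in 64-bit

# The same vector stored at 32-bit precision
pz32 = pz64.astype(np.float32)
e32 = e64.astype(np.float32)

# Mass squared: a difference of two numbers near 1e6
m2_64 = e64 * e64 - pz64 * pz64   # close to 0.01
m2_32 = e32 * e32 - pz32 * pz32   # exactly 0.0: float32 cannot hold E - p here

print("float64:", m2_64[0], "-> mass", np.sqrt(m2_64)[0])
print("float32:", m2_32[0], "-> mass", np.sqrt(m2_32)[0])

# A slightly negative argument turns into nan (warning silenced here)
with np.errstate(invalid="ignore"):
    mass_of_negative = np.sqrt(np.float32(-0.0086))
print("sqrt of a small negative number:", mass_of_negative)
```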
Is the odd thing that whether you get nan or not depends on what order you apply the [0], [msk], and .mass? That does sound wrong: numerical round-off is one thing, but the same numbers should be applied to each other in every case. You should either always or never get nan for different ways of expressing the same quantity. If there's some sort of off-by-one error in which energy elements are applied to which momentum elements, that's a big issue and needs to be addressed.
Is it possible to make a reproducible example that doesn't require the original data files? Like, can you create Lorentz values that show this issue using fromiter or by explicitly building JaggedArrays? It looks like you've isolated the particular vectors that show the issue.
Even when the mass isn't nan, it's suspiciously close to 1× the muon mass. That's weird. Even if the muons are nearly collinear, you'd get 2× the muon mass...
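For reference, here is why 2× the muon mass is the floor. This is a short derivation in natural units (c = 1) for two muons with four-momenta $(E_1, \vec p_1)$ and $(E_2, \vec p_2)$; the symbols are introduced here only for illustration.

```latex
\begin{aligned}
M^2 &= (E_1 + E_2)^2 - |\vec p_1 + \vec p_2|^2 \\
    &= 2 m_\mu^2 + 2\,\bigl(E_1 E_2 - \vec p_1 \cdot \vec p_2\bigr) \\
    &\ge 4 m_\mu^2 ,
\end{aligned}
```

The last step uses $E_1 E_2 - \vec p_1 \cdot \vec p_2 \ge m_\mu^2$, with equality only when the two momenta are identical. With $m_\mu \approx 0.1057$ GeV, the smallest possible pair mass is about 0.211 GeV, so a value near 0.109 (assuming the units are GeV) cannot be a real dimuon mass.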
from awkward-0.x.
Hi @jpivarski, thanks for looking into it. I tried to export back to ROOT (both float32 and float64), but that washes away the issue. Indeed, it seems to be numerical rounding plus something more.
I isolated the problematic vectors here. There are two vectors, v1 and v2, each with three elements. Summing them shows the issue in the second element. I left it like this in case the issue actually starts in the sum, and kept the other two values just for reference.
Thanks again!
import awkward
ff = awkward.load('test.awkd')
(ff['v1'] + ff['v2']).mass
# array([87.1220124 , nan, 65.57995346])
(ff['v1'] + ff['v2'])[1].mass
# 0.1089482379549958
Okay, first off: it's not a scary off-by-one error. Awkward/Uproot-Methods is not adding the wrong numbers together or anything like that.
It is a round-off error, and the reason you see a difference is that f12[1].mass creates a Python object with Python float types as x/y/z/t attributes, whereas f12.mass[1] performs an operation on the NumPy arrays before extracting value [1]. Python float types are 64-bit. Your NumPy arrays happen to be 32-bit. NumPy preserves the 32-bit precision through the calculation, and this makes the difference between a small mass, resulting from the subtraction of large energy and momentum, and an imaginary mass, which np.sqrt returns as nan.
See below; I'm removing the np.sqrt part of the calculation for clarity:
>>> v12 = (ff['v1'] + ff['v2'])
>>> v12.t[1]**2 - v12.x[1]**2 - v12.y[1]**2 - v12.z[1]**2
0.011869718553498387
>>> (v12.t**2 - v12.x**2 - v12.y**2 - v12.z**2)[1]
-0.008600915200076997
When the np.sqrt is applied, the first case returns a small value and the second returns nan.
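The same effect can be reproduced without the data file. The sketch below uses hand-picked float32 components (hypothetical, not the vectors from this thread) and emulates the two code paths with plain NumPy: extracting Python floats first and then doing the arithmetic, versus doing the arithmetic on the float32 arrays and then extracting.

```python
import numpy as np

# Hypothetical float32 components, hand-picked so that rounding matters
t = np.array([1000.0 + 11 * 2.0**-14], dtype=np.float32)
x = np.array([1000.0], dtype=np.float32)
y = np.array([0.0], dtype=np.float32)
z = np.array([1.15], dtype=np.float32)

# Path 1: extract first, compute second. float() gives 64-bit Python
# floats, standing in for the single-vector object's x/y/z/t attributes.
tf, xf, yf, zf = float(t[0]), float(x[0]), float(y[0]), float(z[0])
m2_scalar = tf * tf - xf * xf - yf * yf - zf * zf    # small and positive

# Path 2: compute on the float32 arrays, extract second.
m2_array = (t * t - x * x - y * y - z * z)[0]        # small and negative

print("extract, then compute:", m2_scalar)
print("compute, then extract:", m2_array)

with np.errstate(invalid="ignore"):
    print("masses:", np.sqrt(m2_scalar), np.sqrt(m2_array))
```

The inputs are bit-for-bit identical in both paths. Only the precision of the intermediate products differs: t * t is rounded to the nearest float32 near 1e6, where adjacent values are 0.0625 apart.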
@henryiii This is a lesson for the Vector library going forward: single-Lorentz vector objects should have the same storage as arrays of Lorentz vectors. Their x/y/z/t values should probably be NumPy scalar views of the arrays they came from, and hopefully that will ensure that the same precision of operations is performed on single vectors as arrays of vectors.
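A rough illustration of the NumPy-scalar idea, with plain NumPy and hypothetical hand-picked values (this is not Vector's API): elements pulled out of a float32 array stay np.float32, so arithmetic on them rounds the same way as arithmetic on the whole array.

```python
import numpy as np

# Hypothetical float32 components (hand-picked, not the thread's data)
t = np.array([1000.0 + 11 * 2.0**-14], dtype=np.float32)
x = np.array([1000.0], dtype=np.float32)
y = np.array([0.0], dtype=np.float32)
z = np.array([1.15], dtype=np.float32)

# Array arithmetic, then element extraction
m2_array = (t * t - x * x - y * y - z * z)[0]

# Element extraction as NumPy scalars (still float32), then arithmetic
ts, xs, ys, zs = t[0], x[0], y[0], z[0]
m2_npscalar = ts * ts - xs * xs - ys * ys - zs * zs

print(type(m2_npscalar).__name__, m2_npscalar == m2_array)
```

Both results agree, so the single-vector and array code paths would at least fail or succeed together.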
Maybe Awkward1 needs to change: currently, it returns Python values if you select one element of an array. I kind of hate that feature of NumPy, though, because its scalars are not JSON-encodable, and you end up staring at the screen wondering why it can't turn the number 12 into JSON (because NumPy scalars get printed to the screen the same way as Python numbers).
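A small demonstration of that JSON annoyance, using only the standard json module and NumPy; the variable names are illustrative.

```python
import json
import numpy as np

value = np.array([12], dtype=np.int64)[0]   # a NumPy scalar, not a Python int

print(value)   # prints 12, indistinguishable from a Python int on screen

try:
    json.dumps(value)
    encodable = True
except TypeError:
    # json only knows Python's built-in types; np.int64 is not one of them
    encodable = False

# Converting to a Python value first makes it encodable
as_python = value.item()
encoded = json.dumps(as_python)
print(encodable, encoded)
```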
Alternatively, we can have a policy of doing all our math in 64-bit precision. It's 2020 and that's not slower, even on most GPUs. 32-bit floats are important for data storage/packing on disk, but errors accumulate in a calculation. If you want to move this issue over to vector as a question of policy, that'd be good.
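One way such a policy could look in user code today, as a sketch with hypothetical hand-picked values: keep float32 for storage, but upcast with astype before doing any arithmetic.

```python
import numpy as np

# Hypothetical components as they might be stored: 32-bit
t = np.array([1000.0 + 11 * 2.0**-14], dtype=np.float32)
x = np.array([1000.0], dtype=np.float32)
y = np.array([0.0], dtype=np.float32)
z = np.array([1.15], dtype=np.float32)

# Arithmetic carried out in the storage precision
m2_32 = (t * t - x * x - y * y - z * z)[0]

# Upcast once, then do all arithmetic in 64-bit
t64, x64, y64, z64 = (a.astype(np.float64) for a in (t, x, y, z))
m2_64 = (t64 * t64 - x64 * x64 - y64 * y64 - z64 * z64)[0]

print("32-bit math:", m2_32)                          # negative, so mass is nan
print("64-bit math:", m2_64, "mass", np.sqrt(m2_64))  # positive, real mass
```

The upcast cannot recover precision already lost when the values were stored as float32, but it stops further rounding from accumulating during the calculation.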
@jpivarski thanks for digging into that! Indeed, I was mostly scared by a possible off-by-one or hidden function state failure, so that's reassuring. My two cents would indeed be to move to 64 bits for calculations.
Thanks again!