pigimonaco / pinocchio
PINpointing Orbit Crossing Collapsed Hierarchical Objects
Home Page: http://adlibitum.oats.inaf.it/monaco/pinocchio.html
License: GNU General Public License v2.0
Hello,
As discussed in our online meeting last week, we would like to use the new Pinocchio version 5 (currently in the fivedotzero branch) for our simulations related to the Square Kilometer Array, but we find it to be less stable than version 4.1.3 (current master).
In runs on the HPC cluster, the crashes are not 100% reproducible, but they tend to happen more often in large runs where we (naturally) max out the memory per node. This instability makes us hesitant to use v5 instead of v4.
I have noticed that, when I turn on the Address Sanitizer at compile time and run the small example that ships with Pinocchio (in the example folder) locally, it flags parts of the code where crashes often occur in the large runs on the HPC cluster. So I think it is worth taking a closer look.
In the interest of reproducibility, I'm attaching a Dockerfile and Makefile that I used to diagnose possible memory errors. Remove the .txt ending after downloading those files; GitHub wouldn't let me upload them without the extension. When on the fivedotzero branch, put them into the project root folder.
Build Pinocchio inside the Docker container with:
docker build --tag pinocchio_v5 .
Then run the example with:
docker run --interactive --tty --rm --volume $(pwd):/cwd --workdir /cwd/example pinocchio_v5 pinocchio parameter_file
This Docker container is very similar to the one in which I deploy Pinocchio on the cluster, via Sarus. It uses an MPI implementation (MPICH 3.1.4) that is ABI compatible with what's installed natively (i.e. outside the container) on the cluster itself. The Makefile overrides the one in the src folder, and ultimately just compiles with the debug flags that include the Address Sanitizer (-fsanitize=address).
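To confirm that a given binary was really built with the sanitizer (for example when several images are in use), a compile-time check along the following lines can help. This is only a sketch and not part of Pinocchio: the names BUILT_WITH_ASAN, built_with_asan and report_asan_status are hypothetical. It relies on GCC defining __SANITIZE_ADDRESS__ and on Clang's __has_feature(address_sanitizer) when -fsanitize=address is passed.

```c
#include <stdio.h>

/* Clang exposes sanitizer state through __has_feature. Provide a
   fallback so the check below also compiles where it is missing. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* GCC defines __SANITIZE_ADDRESS__ under -fsanitize=address. */
#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#else
#define BUILT_WITH_ASAN 0
#endif

/* Hypothetical helper: returns 1 if this translation unit was
   compiled with the Address Sanitizer, 0 otherwise. */
int built_with_asan(void)
{
    return BUILT_WITH_ASAN;
}

/* Hypothetical helper: prints one status line, e.g. next to a
   program's start-up banner. */
void report_asan_status(FILE *out)
{
    fprintf(out, "AddressSanitizer: %s\n",
            built_with_asan() ? "enabled" : "disabled");
}
```

Calling report_asan_status(stdout) once at start-up would make it visible in the log whether a run was instrumented.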
Here is the (shortened) output from running the example:
❯ docker run --interactive --tty --rm --volume $(pwd):/cwd --workdir /cwd/example pinocchio_v5 pinocchio parameter_file
[Wed Oct 25 2023 08:58:38] This is pinocchio V5.0, running on 1 MPI tasks
This version uses 3LPT displacements
Radiation is included in the Friedmann equations
Ellipsoidal collapse will be computed as Monaco (1995)
Reading parameters from file parameter_file
Flag for this run: example
…
[Wed Oct 25 2023 08:59:39] Storing velocities
[Wed Oct 25 2023 08:59:39] Done computing velocities, cpu time = 0.179195 s
[Wed Oct 25 2023 08:59:40] Number of collapsed particles to z=0: 3343409
[Wed Oct 25 2023 08:59:40] Finishing fmax, total fmax cpu time = 55.598078
IO : 0.000000 ( 55.598077 total time without I/O)
FFT : 15.093578
COLLAPSE : 25.301670
[Wed Oct 25 2023 08:59:40] Second part: fragmentation of the collapsed medium
[Wed Oct 25 2023 08:59:40] Task 0 reallocated memory for 1.117586 Gb
[Wed Oct 25 2023 08:59:40] Creating map of needed particles
=================================================================
==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f7563d580b0 at pc 0x7f75c25d4681 bp 0x7fff09fa0c30 sp 0x7fff09fa03e0
WRITE of size 1000000 at 0x7f7563d580b0 thread T0
#0 0x7f75c25d4680 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
#1 0x564414111fd3 in create_map /source/Pinocchio/fragment.c:801
#2 0x56441410e13b in fragment /source/Pinocchio/fragment.c:299
#3 0x56441410de20 in fragment_driver /source/Pinocchio/fragment.c:135
#4 0x5644140d5d77 in main /source/Pinocchio/pinocchio.c:212
#5 0x7f75c1d3d1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x271c9)
#6 0x7f75c1d3d284 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27284)
#7 0x5644140d5820 in _start (/usr/bin/pinocchio+0x19820)
0x7f7563d580b0 is located 0 bytes to the right of 1199999152-byte region [0x7f751c4ef800,0x7f7563d580b0)
allocated by thread T0 here:
#0 0x7f75c26448d5 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:85
#1 0x564414106171 in reallocate_memory_for_fragmentation /source/Pinocchio/allocations.c:555
#2 0x56441410de10 in fragment_driver /source/Pinocchio/fragment.c:132
#3 0x5644140d5d77 in main /source/Pinocchio/pinocchio.c:212
#4 0x7f75c1d3d1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x271c9)
SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799 in __interceptor_memset
Shadow bytes around the buggy address:
0x0fef2c7a2fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fef2c7a2fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fef2c7a2fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fef2c7a2ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0fef2c7a3000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0fef2c7a3010: 00 00 00 00 00 00[fa]fa fa fa fa fa fa fa fa fa
0x0fef2c7a3020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fef2c7a3030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fef2c7a3040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fef2c7a3050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0fef2c7a3060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
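Here, the Address Sanitizer outright aborts the run due to a heap buffer overflow: according to the trace, the memset called from create_map (fragment.c:801) writes past the end of the buffer allocated in reallocate_memory_for_fragmentation (allocations.c:555). This is the piece of code it points to:
Lines 800 to 801 in e5d3780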
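One way to narrow down an overflow like this in builds without the sanitizer is to route suspicious memset calls through a bounds check against the known allocation. The following is a minimal sketch under that assumption, not Pinocchio code: range_fits and checked_memset are hypothetical names, and the caller is assumed to know the base pointer and byte size of the allocated region.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if [ptr, ptr + len) lies entirely inside
   [base, base + size), and 0 otherwise. */
int range_fits(const void *base, size_t size, const void *ptr, size_t len)
{
    uintptr_t b = (uintptr_t)base;
    uintptr_t p = (uintptr_t)ptr;

    if (p < b)
        return 0;                 /* starts before the region */

    size_t off = (size_t)(p - b);
    if (off > size)
        return 0;                 /* starts past the end of the region */

    return len <= size - off;     /* must also end inside the region */
}

/* memset that refuses to write outside the region.
   Returns 0 on success and -1 if the write would not fit. */
int checked_memset(void *base, size_t size, void *ptr, int value, size_t len)
{
    if (!range_fits(base, size, ptr, len))
        return -1;
    memset(ptr, value, len);
    return 0;
}
```

In the report above, the first invalid byte sits exactly at the end of the 1199999152-byte region, which is the situation the final comparison in range_fits rejects. A -1 return could then be logged with the offset and length to show which size computation is off.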
By contrast, with v4 (current master) and a few straightforward fixes to make it compile inside that container, it runs until the very end ("Pinocchio done!") and only afterwards reports a few "detected memory leaks", which are not critical.