Comments (9)
I can confirm that the current master works fine with and without MPI if I comment out this section:
# Compute the total luminosity in the 9 GAMA bands
Star_internal_property_names=Luminosities,Luminosities,Luminosities,Luminosities,Luminosities,Luminosities,Luminosities,Luminosities,Luminosities,
Star_internal_property_index_in_file=0,1,2,3,4,5,6,7,8,
Star_internal_property_input_output_unit_conversion_factors=1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
Star_internal_property_calculation_type=aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,
Star_internal_property_output_units=unitless,unitless,unitless,unitless,unitless,unitless,unitless,unitless,unitless,
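Each Star_internal_property_* option above is a comma-separated list with one entry per property (here nine, one per GAMA band), so the lists presumably have to line up entry by entry. Below is a minimal C++ sketch of that consistency check. split_config_list and count_extra_fields are hypothetical helpers, not VELOCIraptor's config parser. Treating the resulting count as the number of extra per-particle fields exchanged between MPI ranks is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not VELOCIraptor's parser): split a comma-separated
// config value such as "0,1,2,3,4,5,6,7,8," into its entries. Empty tokens
// are skipped, so the trailing comma used in the config does not add an entry.
std::vector<std::string> split_config_list(const std::string &value) {
    std::vector<std::string> entries;
    std::stringstream stream(value);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (!token.empty()) {
            entries.push_back(token);
        }
    }
    return entries;
}

// Hypothetical consistency check: all Star_internal_property_* lists should
// describe the same number of properties. Returns that number, or 0 if the
// lists disagree (or none were given).
std::size_t count_extra_fields(const std::vector<std::string> &lists) {
    if (lists.empty()) {
        return 0;
    }
    const std::size_t expected = split_config_list(lists.front()).size();
    for (const std::string &list : lists) {
        if (split_config_list(list).size() != expected) {
            return 0;
        }
    }
    return expected;
}
```

Fed the five values from the config above, count_extra_fields would return 9.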
from velociraptor-stf.
I've reproduced this problem with the latest master. Different runs fail in different ways (double free, invalid pointer, corrupted double-linked list, etc.), which points to some kind of memory corruption.
After a short debugging session I think I've spotted the problem. In MPISendReceiveFOFStarInfoBetweenThreads the receive buffer proprecvbuff is resized to numrecv, but it then receives numrecv * numextrafields values:
VELOCIraptor-STF/src/mpiroutines.cxx
Lines 2751 to 2755 in 8f380fc
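The following minimal C++ sketch makes the failure mode concrete without involving MPI. The hypothetical simulated_receive stands in for the receive call: like a real receive, it writes as many elements as its count argument says and trusts that the destination has room for them. If proprecvbuff is sized to numrecv, writing numrecv * numextrafields values overruns the heap allocation. That would be consistent with the varying double-free and corrupted-list errors above. The float element type and both function names are assumptions for illustration, not the code in mpiroutines.cxx.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the MPI receive (hypothetical, no MPI involved): copies
// `count` values into the memory starting at `dest`, trusting that `dest`
// has room for them, just as a receive call trusts its count argument.
void simulated_receive(float *dest, const std::vector<float> &incoming,
                       std::size_t count) {
    assert(incoming.size() >= count);
    std::copy(incoming.begin(),
              incoming.begin() + static_cast<std::ptrdiff_t>(count), dest);
}

// Receive the extra properties of `numrecv` particles, each carrying
// `numextrafields` values. Calling resize(numrecv) here, as the star, BH and
// extra-DM routines reportedly do at 8f380fc, would let simulated_receive
// write past the end of the allocation; the buffer must hold the product.
std::vector<float> receive_extra_fields(const std::vector<float> &incoming,
                                        std::size_t numrecv,
                                        std::size_t numextrafields) {
    const std::size_t count = numrecv * numextrafields;
    std::vector<float> proprecvbuff;
    proprecvbuff.resize(count);  // the buggy version used resize(numrecv)
    simulated_receive(proprecvbuff.data(), incoming, count);
    return proprecvbuff;
}
```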
Similar code is correct in MPISendReceiveFOFHydroInfoBetweenThreads, though:
VELOCIraptor-STF/src/mpiroutines.cxx
Lines 2699 to 2703 in 8f380fc
Then it's wrong again in MPISendReceiveFOFBHInfoBetweenThreads and MPISendReceiveFOFExtraDMInfoBetweenThreads:
VELOCIraptor-STF/src/mpiroutines.cxx
Lines 2803 to 2807 in 8f380fc
VELOCIraptor-STF/src/mpiroutines.cxx
Lines 2855 to 2859 in 8f380fc
From the git history it seems this problem has been present ever since these routines were introduced (April 2020). The one routine that behaves correctly does so only because I fixed it in an earlier commit, 082ff68, for a similar problem reported in #54. Back then I didn't realise the bug affected more than one routine. Now that it's clear, I'll go and fix them all. I'll also try to unify the code a bit, without being too disruptive.
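Here is one possible shape for that unification, sketched in C++ under assumptions. A single templated helper sizes the receive buffer, and a second one splits it back into per-particle values. The gas, star, BH and extra-DM routines could then share the same numrecv * numextrafields logic instead of each repeating it. The row-major layout (value j of particle i at index i * numextrafields + j) is an assumption. The names make_extra_field_buffer and unpack_extra_fields are hypothetical and do not reflect the layout or API actually used in mpiroutines.cxx.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared helper: the one place where the receive buffer for
// per-particle extra properties is sized, whatever the particle type.
template <typename T>
std::vector<T> make_extra_field_buffer(std::size_t numrecv,
                                       std::size_t numextrafields) {
    return std::vector<T>(numrecv * numextrafields);
}

// Hypothetical unpacking step, assuming a row-major layout in which value j
// of particle i is stored at index i * numextrafields + j.
template <typename T>
std::vector<std::vector<T>> unpack_extra_fields(const std::vector<T> &buffer,
                                                std::size_t numrecv,
                                                std::size_t numextrafields) {
    assert(buffer.size() == numrecv * numextrafields);
    std::vector<std::vector<T>> per_particle(numrecv);
    for (std::size_t i = 0; i < numrecv; ++i) {
        const auto first =
            buffer.begin() + static_cast<std::ptrdiff_t>(i * numextrafields);
        per_particle[i].assign(
            first, first + static_cast<std::ptrdiff_t>(numextrafields));
    }
    return per_particle;
}
```

In such a design, each MPISendReceiveFOF*InfoBetweenThreads routine would call make_extra_field_buffer rather than resizing its own buffer. The sizing bug could then only exist, and be fixed, in one place.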
That all makes sense. Thanks.
Indeed, it's not related to the latest master. Yesterday I tried going back in history to find a version that works, but couldn't. That got me really confused, so I took a break.
But one thing that is different, and that your test reveals, is that we are now using the extra star properties in the config file. That is something we have not used much.
I added these config options recently to our EAGLE setup and tested it without MPI, so the problem went unnoticed.
Josh then tried the same setup in MPI-only mode (as he still gets hit by the negative density bug when using OpenMP), and that's when it all went wrong.
Hopefully the solution you had worked out for the hydro case can be relatively easily transplanted here.
@MatthieuSchaller I pushed the relevant changes to the issue-87 branch. I tested them against the dataset/config you provided, and the code now gets past the original problem, so the issue seems to be gone.
However, at dataset writing time there is a new crash caused by an invalid size being passed to the HDF5 routines. I'll look into that separately in a different issue. It appears to be specific to parallel HDF5 writing, so deactivating that option might be a workaround. Given this additional problem, we can't confirm that the issue in this ticket is really gone until we have a successful run, so I'll hold off merging until we have a fix for the second issue.
Great. Let me know if there is anything I can help with or test. I suppose there isn't much point before the I/O-time crash is solved, but I can try if needed.
@MatthieuSchaller, if the resulting files are useful to you, one test you could try is to run the latest issue-87 branch without parallel HDF5 writing, which should hopefully avoid #88.
Yes, it all works smoothly.
The version with parallel HDF5 hangs while writing.
Thanks @MatthieuSchaller for confirming the fix works. Since we have both seen it working independently, I've now merged it into the master branch and am closing this issue. I'll focus on #88 next.
Related Issues
- Incorrectly sized buffer given for MPI_Bcast reception
- Writing parallel properties file in hydro builds is broken
- SO list offsets are wrong/counterintuitive
- Inconsistent array names between properties files
- SO list output too large and possibly wrong
- Error in writing HDF5 outputs
- Improve VR's memory usage for extra data in Particles
- DMO Zoom on-the-fly with SWIFT segfault
- OpenMP bug in temperature calculations.
- Memory usage blowing up in large DMO runs
- Differences in halo masses when switching on/off substructure search
- Mistakes in metallity calculations
- Apparently wrong output when using Star_internal_property options
- HIGHRES needs undocumented Extensive_interloper_properties_output config option
- Can't write more than 2GB HDF5 datasets in parallel
- More potential issues hidden in MPISendReceive*InfoBetweenThreads functions
- Uninitialised variables in PropData class
- Compiling with DVR_USE_GAS=ON but no other options doesn't work
- Buffer overflow in PotentialTree with OpenMP