icrar / velociraptor-stf
This project forked from pelahi/velociraptor-stf
Galaxy/(sub)Halo finder for N-body simulations
License: MIT License
Describe the bug
Running VR with MPI switched on, OMP switched off and hydro switched on breaks on EAGLE boxes.
To Reproduce
Build with VR_USE_HYDRO=ON, VR_MPI=ON and VR_OPENMP=OFF, then run:
mpirun -np 16 stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i eagle_0036 -o halos_mpi_0036 -I 2
This is a standard XL snapshot with our standard config file.
The code segfaults after printing
[0000] [ 346.430] [ info] search.cxx:352 Finished linking across MPI domains in 1.258 [min]
somewhere in an MPI call trying to clear a vector of size 10^16 (!!).
The input can be found here if necessary: /snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_ICs/EAGLE_25/eagle_0036.hdf5
and /snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_ICs/EAGLE_25/vrconfig_3dfof_subhalos_SO_hydro.cfg.
The same setup works either:
Note that the snapshot is made of one single file if that is relevant.
(duplicated from the public repo)
From the config file we can currently request (among others):
Aperture totals
Average mass weighted
It would be great if we could add an "aperture mass-weighted average" to the list. This would then let us trivially construct, for instance, HI mass functions.
While reading the code I found that Options.gas_sfr_threshold is never initialised: it doesn't have a default value (so it cannot be assumed to take a fixed value at startup), and there's no place in the code that writes into it (either from the command line, the configuration file, or the input data files). On the other hand, the value is used in several places, particularly in a few of the functions in substructureproperties.cxx.
$> grep -RIn gas_sfr_threshold src/
src/substructureproperties.cxx:616: if (SFR>opt.gas_sfr_threshold) pdata[i].M_gas_sf+=mval;
src/substructureproperties.cxx:641: if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:693: if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:871: if (SFR>opt.gas_sfr_threshold){
src/substructureproperties.cxx:1473: if (SFR>opt.gas_sfr_threshold) pdata[i].M_gas_sf+=mval;
src/substructureproperties.cxx:1546: if (SFR > opt.gas_sfr_threshold) {
src/substructureproperties.cxx:1754: if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:1809: if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:5637: if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:5841: if (SFR>opt.gas_sfr_threshold) EncMassGasSF+=mass;
src/substructureproperties.cxx:5891: if (SFR>opt.gas_sfr_threshold) oldrc_gas_sf=rc;
src/substructureproperties.cxx:5993: if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:6071: if (SFR>opt.gas_sfr_threshold) EncMassGasSF+=mass;
src/substructureproperties.cxx:6116: if (SFR>opt.gas_sfr_threshold) oldrc_gas_sf=rc;
src/substructureproperties.cxx:6202: if (Pval->GetSFR()>opt.gas_sfr_threshold)
src/substructureproperties.cxx:6238: if (sfrval>opt.gas_sfr_threshold)
src/substructureproperties.cxx:6274: if (Pval->GetSFR()>opt.gas_sfr_threshold)
src/substructureproperties.cxx:6311: if (sfrval>opt.gas_sfr_threshold)
src/allvars.h:775: Double_t gas_sfr_threshold;
Given the variable is uninitialised, some results (essentially whatever happens inside those if statements) cannot be guaranteed to be consistent across different compilations/settings/platforms/etc.
The fix should be easy: first, make sure the variable has a default value; second, we'll probably need a way to assign arbitrary values to this threshold.
@MatthieuSchaller, I have two questions about this: 0 seems like a good default value, so I'll go with that. As for setting arbitrary values: do you reckon they should be given via the configuration file (via a new setting), or should they somehow be computed from the input data?
(copied over from the public github repo)
There seems to be something not quite correct with the field tage_star
in the output catalogues.
The data is in the range 10^8 - 10^9, but the units are reported as internal time units, which would typically already be something like 9*10^9 years.
I don't know whether the issue is in the calculation of the values or whether the units displayed are incorrect.
Also, just to be sure, how are the ages defined? Is there any weighting of any kind for instance?
Compilation of the hot_gas_properties branch fails with the following errors:
/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx(593): error: class "NBody::Particle" has no member "GetTemperature"
temp = Pval->GetTemperature();
^
/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx(1418): error: class "NBody::Particle" has no member "GetTemperature"
temp=Pval->GetTemperature();
^
/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx(5140): error: class "NBody::Particle" has no member "GetTemperature"
temp = Pval->GetTemperature()*mass;
^
/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/hdfio.cxx(1609): error: class "NBody::Particle" has no member "SetTemperature"
for (int nn=0;nn<nchunk;nn++) Part[count++].SetTemperature(doublebuff[nn]);
^
/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/hdfio.cxx(1627): error: class "NBody::Particle" has no member "SetTemperature"
for (int nn=0;nn<nchunk;nn++) Pbaryons[bcount++].SetTemperature(doublebuff[nn]);
^
compilation aborted for /cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/substructureproperties.cxx.o] Error 2
make[2]: *** Waiting for unfinished jobs....
compilation aborted for /cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/hdfio.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/hdfio.cxx.o] Error 2
make[1]: *** [src/CMakeFiles/velociraptor.dir/all] Error 2
make: *** [all] Error 2
I am using a fresh and updated clone of the repo and compiling with the following modules on cosma7:
module purge
module load cmake/3.18.1
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5
I am using the following cmake flags:
cmake . -DVR_USE_HYDRO=ON \
-DVR_USE_SWIFT_INTERFACE=OFF \
-DCMAKE_CXX_FLAGS="-fPIC" \
-DCMAKE_BUILD_TYPE=Release \
-DVR_ZOOM_SIM=ON \
-DVR_MPI=OFF
make -j
Thank you in advance for looking into this! Please, let me know if I can provide further info for understanding the issue.
I am looking at adding H_2 and HI masses to the catalogues, and especially to the aperture measurements.
Some advice would be nice here.
I could:
In spirit, it's similar to the SFR we accumulate but is more specific to SWIFT-EAGLE++.
@pelahi any thoughts on what the best choice would be to keep things tidy before I start typing?
As mentioned in a couple of comments in #15, when trying to write data for extra properties (gas/bh/stars) VR crashes. This problem happens not only for BH or star data, but also for gas data.
To reproduce follow the same steps outlined in #15. Locally I'm running an OpenMP-disabled build with the following command line:
./stf -C ~/icrar/vr/EAGLE-XL/vrconfig_3dfofbound_subhalos_SO_hydro.cfg -i ~/icrar/vr/EAGLE-XL/colibre_2729 -o output -I 2 -s 16
In this example I have only the following extra properties on:
Gas_internal_property_names=ElementMassFractions,SpeciesFractions,SpeciesFractions,SpeciesFractions,
Gas_internal_property_index_in_file=0,0,1,2,
Gas_internal_property_input_output_unit_conversion_factors=1.0,1.0,1.0,1.0
Gas_internal_property_calculation_type =averagemassweighted,averagemassweighted,averagemassweighted,averagemassweighted,
Gas_internal_property_output_units=unitless,unitless,unitless,unitless,
Almost at the end of the VR execution the program fails:
[...]
0 Sort particles and compute properties of 5502 objects
0 Calculate properties using minimum potential particle as reference
0 Sort particles by binding energy
Memory report, func = SortAccordingtoBindingEnergy--line--4661 task = 0 : Average = 6.480034 GB, Data = 6.343040 GB, Dirty = 0.000000 GB, Library = 0.000000 GB, Peak = 6.958450 GB, Resident = 6.329998 GB, Shared = 0.004063 GB, Size = 6.531044 GB, Text = 0.002087 GB,
0 getting CM
0 Done getting CM in 0.095405
Done FOF masses 0.000417
0 getting energy
0 Have calculated potentials 208.703257
0Done getting energy in 208.885781
0 getting bulk properties
terminate called after throwing an instance of 'std::bad_array_new_length'
what(): std::bad_array_new_length
Thread 1 "stf" received signal SIGSEGV, Segmentation fault.
__libc_signal_block_app (set=0x7fffffffaad8) at ../sysdeps/unix/sysv/linux/internal-signals.h:75
75 ../sysdeps/unix/sysv/linux/internal-signals.h: No such file or directory.
(gdb) bt
#0 __libc_signal_block_app (set=0x7fffffffaad8) at ../sysdeps/unix/sysv/linux/internal-signals.h:75
#1 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:40
#2 0x00007ffff7118859 in __GI_abort () at abort.c:79
#3 0x00007ffff739e951 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff73aa47c in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff73aa4e7 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff73aa799 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff739e426 in __cxa_throw_bad_array_new_length () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x0000555555590b5a in Math::GMatrix::operator= (m=<error reading variable: Cannot access memory at address 0xffffffff7fffffff>, this=0x80000000) at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:5906
#9 CalcPhaseSigmaTensor (n=<optimised out>, p=<optimised out>, I=..., itype=<optimised out>) at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:3718
#10 0x00005555556fa7c2 in GetProperties (opt=..., nbodies=<optimised out>, Part=0x7fff69434010, ngroup=5502, pfof=<optimised out>, numingroup=@0x7fffffffb398: 0x5556607bcb90, pdata=@0x7fffffffb390: 0x5556554b8ed8, noffset=@0x7fffffffb3a8: 0x555660396210)
at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:1050
#11 0x000055555570396f in SortAccordingtoBindingEnergy (opt=..., nbodies=13289344, Part=0x7fff69434010, ngroup=5502, pfof=@0x7fffffffb508: 0x7fff5c96c010, numingroup=<optimised out>, pdata=<optimised out>, ioffset=0) at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:4671
#12 0x000055555559efc5 in main (argc=<optimised out>, argv=<optimised out>) at /home/rtobar/scm/git/VELOCIraptor-STF/src/main.cxx:516
So far the only detail I can add is that in my environment this does not occur when using low levels of optimization (cmake -DCMAKE_BUILD_TYPE=Debug). Because of this it's getting a bit difficult to debug: when running with optimizations on, many variables/stack frames are not properly visible or get mixed up, but when building for debugging the problem goes away. I'm currently trying to find a middle ground that lets me get more information.
Describe the bug
Latest master.
Adding
Gas_internal_property_names=XXX,
Gas_internal_property_index_in_file=0,
Gas_internal_property_input_output_unit_conversion_factors=1.0e10,
Gas_internal_property_calculation_type=aperture_total,
Gas_internal_property_output_units=solar_mass,
for any existing field XXX breaks: the code does not read in the field and dies rapidly when trying to write the config.
If instead I use
Gas_internal_property_calculation_type=max,
then everything works. The same is true for any of the options apart from aperture_total and aperture_average, which break in the same way.
If I change the value of CALCQUANTITYAPERTURETOTAL
to 19 in allvars.h (was -1) then the fields get read in.
(That whole section feels like it should really be an enum type!)
I don't know whether that is a proper fix though as further down the line, the extra calculation type is used modulo some quantity (e.g. in ExtraPropInitValue() from substructureproperties.cxx).
This is an issue on my MacBook Pro after upgrading to HDF5 1.12. The compiler is clang 12. It appears that an additional parameter is now required by H5Oget_info and H5Oget_info_by_name.
After the most basic invocation of cmake (to make sure the code compiled after updating a number of libraries), I ran make and hit the following error:
[ 45%] Building CXX object src/CMakeFiles/velociraptor.dir/hdfio.cxx.o
In file included from /Users/cpower/Codes/VELOCIraptor-STF/src/hdfio.cxx:28:
/Users/cpower/Codes/VELOCIraptor-STF/src/hdfitems.h:161:9: error: no matching
function for call to 'H5Oget_info_by_name3'
H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, ...
^~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/hdf5/1.12.0_1/include/H5version.h:772:31: note: expanded from
macro 'H5Oget_info_by_name'
#define H5Oget_info_by_name H5Oget_info_by_name3
^~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/hdf5/1.12.0_1/include/H5Opublic.h:188:15: note: candidate
function not viable: requires 5 arguments, but 4 were provided
H5_DLL herr_t H5Oget_info_by_name3(hid_t loc_id, const char *name, H5O_i...
I fixed this by updating the instances in hdfitems.h; namely, at line 161, changing
H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, H5P_DEFAULT);
to
H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, H5O_INFO_ALL, H5P_DEFAULT);
and at line 203, changing
H5Oget_info(id, &object_info);
to
H5Oget_info(id, &object_info, H5O_INFO_ALL);
I used the following to identify the appropriate fixes:
https://portal.hdfgroup.org/display/HDF5/H5O_GET_INFO_BY_NAME3
https://portal.hdfgroup.org/display/HDF5/H5O_GET_INFO3
This hasn't affected versions of VR running on e.g. NCI Gadi - there the version of HDF5 is still 1.10.
VR is crashing in WriteSOCatalog
with the following message:
[0000] [1445.514] [ info] io.cxx:1292 Saving SO particle lists to halos-2.catalog_SOlist.0
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/var/slurm/slurmd/job2998665/slurm_script: line 4: 16915 Aborted builds/53/stf -i /cosma/home/dc-borr1/c7dataspace/XL_wave_1/runs_correct_dt/Run_0/eagle_0000 -C /cosma/home/dc-borr1/c7dataspace/XL_wave_1/runs_correct_dt/vrconfig_3dfof_subhalos_SO_hydro.cfg -I 2 -o halos-2
This feels like a bug introduced with #57, but it will require a bit of investigation.
The ifdef on VELOCIraptor-STF/src/hdfitems.h line 846 (commit bab7659) seems to be incorrect. It should likely be USEHDFCOMPRESSION.
In #87 it was found that the four MPISendReceiveFOF*InfoBetweenThreads
functions all had a bug in their logic (which was duplicated across all functions) where an input buffer was not sized correctly for the amount of data it received via MPI. A fix was issued for this that both solved the buffer size issue and also removed the code duplication by providing a single function that performed the data exchange. It was also found that #54 had fixed one of those functions already, but had failed to identify the broader problem affecting all four functions.
After fixing #87 I went and had another look at the rest of the functions in this file (mpiroutines.cxx
). I realised there are several MPISendReceive*InfoBetweenThreads
families of functions, namely:
MPISendReceive<component>InfoBetweenThreads
MPISendReceiveBuffWith<component>InfoBetweenThreads
MPISendReceiveFOF<component>InfoBetweenThreads
(<component>
are Hydro
, Star
, BH
and ExtraDM
)
From this list, the last item covers the functions fixed in #87. The rest, however, seem to follow a similar structure, and from a quick overview they also contain a copy of the same data exchange pattern (some with and some without the same buffer sizing bug) that was fixed and consolidated into a reusable function in #87. We should revisit these functions and try to reuse the new function where possible, thus minimizing the chances of running into memory corruption issues again.
Describe the bug
There appear to be some memory leaks when running VR via on-the-fly calls within SWIFT. I have been using GCC's address sanitizer to get more information about possible allocations that are not being freed.
To Reproduce
Steps to reproduce the behavior:
Results
Without MPI, the run completes and the memory sanitizer reports this:
=================================================================
==169982==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 344 byte(s) in 2 object(s) allocated from:
#0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
#1 0x564ec73ad936 in BuildPGList(long long, long long, long long*, long long*, NBody::Particle*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/buildandsortarrays.cxx:71
#2 0x564ec746f09d in SearchSubset(Options&, long long, long long, NBody::Particle*, long long&, long long, long long*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:1400
#3 0x564ec748e1df in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2991
#4 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
#5 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
#6 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
#7 0x564ec70dce49 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:453
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
#1 0x564ec73ad936 in BuildPGList(long long, long long, long long*, long long*, NBody::Particle*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/buildandsortarrays.cxx:71
#2 0x564ec746f09d in SearchSubset(Options&, long long, long long, NBody::Particle*, long long&, long long, long long*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:1400
#3 0x564ec748e1df in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2991
#4 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
#5 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
#6 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
#7 0x564ec70dced4 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:400
Direct leak of 128 byte(s) in 8 object(s) allocated from:
#0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
#1 0x564ec74ba063 in CleanAndUpdateGroupsFromSubSearch(Options&, long long&, NBody::Particle*, long long*&, long long&, long long*&, long long**&, long long&, long long*&, long long*&, long long&, long long&) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2772
#2 0x564ec748e383 in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2993
#3 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
#4 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
#5 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
#6 0x564ec70dced4 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:400
Direct leak of 80 byte(s) in 5 object(s) allocated from:
#0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
#1 0x564ec74ba063 in CleanAndUpdateGroupsFromSubSearch(Options&, long long&, NBody::Particle*, long long*&, long long&, long long*&, long long**&, long long&, long long*&, long long*&, long long&, long long&) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2772
#2 0x564ec748e383 in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2993
#3 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
#4 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
#5 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
#6 0x564ec70dce49 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:453
(And also two SWIFT-related calls which are not leaks; these are just not easy to free, so we don't clean them up at the end of a run.)
Not much is leaked here by the VR invocations, but it could nevertheless be a sign of a larger problem in bigger runs.
I don't know enough about these sections of the VR code to know whether these are big problems or not.
I am running VR (version 1.60) on low-resolution SWIFT runs; this particular box is 100^3 Mpc^3 with 180^3 particles (M_dm = 5.45e9, M_gas = 1.02e9). When analysing one of the observables we are interested in, namely the baryon/gas fractions of halos, the baryon fraction was a lot higher than expected.
Upon further inspection it appears the values obtained for M_200_crit_star (and possibly also M_200_mean_star, M_500_star, etc.) are systematically higher than expected.
To show this I compared the M_200 obtained from VR for the different components with the naive method of summing the mass of all particles of that component found within the R_200_crit obtained from VR, as can be seen in the figure below.
For DM the results show some difference but are relatively close to the black line (these particles are also a lot more likely to be sufficiently sampled). For stars, and to a much lesser extent also the gas, the values obtained from the sum are systematically lower. For gas the difference might just be the systematics of my naive method, but for stars the difference is very large.
Here are some examples of halos with very high baryons fractions and large differences between the sum of stellar mass in R_200_crit and M_200_crit_star
log(M_200) [Msun] of this halo: 12.182202934684451
log(M_200_gas) [Msun] of this halo: 11.629046093260513
log(M_200_stars) [Msun] of this halo: 11.64550349357618
r_200 [kpc] of this halo: 237.24077771516048
Dark matter mass (Sum) [log10 Msun] = 12.249682
Gas mass (Sum) [log10 Msun] = 11.431889
Stellar mass (Sum) [log10 Msun] = 11.05762
log(M_200) [Msun] of this halo: 11.282095177295409
log(M_200_gas) [Msun] of this halo: 10.755906558809153
log(M_200_stars) [Msun] of this halo: 10.783129379073438
r_200 [kpc] of this halo: 118.89221547906023
Dark matter mass (Sum) [log10 Msun] = 11.440651
Gas mass (Sum) [log10 Msun] = 10.576609
Stellar mass (Sum) [log10 Msun] = 9.912582
If you need more information please let me know.
Describe the bug
Compiling with VR_USE_GAS=ON (but without the rest of VR_USE_HYDRO=ON) doesn't work.
To Reproduce
On latest master:
cmake .. -DVR_USE_GAS=ON
make -j
Expected behavior
The build works.
Log files
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/bgfield.cxx(7):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/haloproperties.cxx(9):
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/fofalgo.cxx(5):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/mpihdfio.cxx(9):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/mpigadgetio.cxx(9):
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/mpiramsesio.cxx(9):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.cxx(5):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/ramsesio.cxx(20):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/localbgcomp.cxx(7):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/exceptions.cxx(9):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/hdfio.cxx(26):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(7):
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.cxx(5):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
aperture_M_gas_highT[i]*=opt.h;
^
[the same allvars.h(2488) error is repeated once per translation unit that includes stf.h or logging.h: mpiroutines.cxx, nchiladaio.cxx, mpinchiladaio.cxx, omproutines.cxx, unbind.cxx, mpitipsyio.cxx, swiftinterface.cxx, localfield.cxx, io.cxx, mpivar.cxx, gadgetio.cxx, tipsyio.cxx, buildandsortarrays.cxx, utilities.cxx, ui.cxx, search.cxx]
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3106): error: class "NBody::Particle" has no member "GetSFR"
auto sfr = Pval->GetSFR();
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3119): error: class "NBody::Particle" has no member "GetZmet"
pdata[i].Z_mean_gas_highT_incl += massval * Pval->GetZmet();
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3171): error: class "PropData" has no member "M_gas_nsf"
pdata[hostindex].M_gas_nsf_incl += pdata[i].M_gas_nsf;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3172): error: class "PropData" has no member "M_gas_sf"
pdata[hostindex].M_gas_sf_incl += pdata[i].M_gas_sf;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3189): error: class "PropData" has no member "M_gas_nsf"
pdata[i].M_gas_nsf_incl += pdata[i].M_gas_nsf;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3190): error: class "PropData" has no member "M_gas_sf"
pdata[i].M_gas_sf_incl += pdata[i].M_gas_sf;
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3415): error: class "NBody::Particle" has no member "GetSFR"
sfr[j] = Part[taggedparts[j]].GetSFR();
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3417): error: class "NBody::Particle" has no member "GetZmet"
Zgas[j] = Part[taggedparts[j]].GetZmet();
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3483): error: class "NBody::Particle" has no member "GetSFR"
sfr[offset+j] = PartDataGet[taggedparts[j]].GetSFR();
^
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3485): error: class "NBody::Particle" has no member "GetZmet"
Zgas[offset+j] = PartDataGet[taggedparts[j]].GetZmet();
^
Describe the bug
The SO lists produced by the current master appear to be about a factor of 10 larger than those produced by older versions for the same snapshot. I have a particular example where the number of particles in the SO list is almost 8e8, for a simulation that has fewer than 1e8 particles. Unsurprisingly, the majority of the particles are reported to be part of multiple SOs (9 on average, with an extreme case of a particle belonging to 53 SOs).
I manually checked the distances of the reported SO particles to the SO centre and found that the maximum distance is a factor 10-15 larger than the maximum SO radius (in this case R_100_rhocrit
). I wonder whether this has an impact on the reported SO properties, since I am unable to reproduce the various mass values using the particles from the SO list within the various SO radii (they typically are off by a factor 3).
By comparing legacy snapshots processed with different versions, I managed to constrain the problem to the following diff: ICRAR:64348794522c16f96ef4890ee94a08615fbd06c4...ICRAR:a43325cec3108d40f61214bab0cab50069dcb258. The oldest version produces sensible SO lists for which we were able to manually confirm that the SO properties are correct, while the newest one produces the same results as current master which might or might not be correct, but definitely have excessively large SO lists. This is based on the commit hashes reported in the .configuration
file; I was never actually able to run any of these versions myself due to various errors.
I tried this both with and without MPI enabled (all single node, single thread runs) and the results look similar (not exactly the same). The MPI-enabled output file is 3 times larger, but I guess this is due to compression.
If VELOCIraptor is built with parallel HDF5 support, and MPI_number_of_tasks_per_write>1 and Calculate_radial_profiles=1 are set in the .cfg file, then the code crashes with the following message:
HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 15:
#000: H5Dio.c line 322 in H5Dwrite(): could not get a validated dataspace from file_space_id
major: Invalid arguments to routine
minor: Bad value
#001: H5S.c line 254 in H5S_get_validated_dataspace(): selection + offset not within extent
major: Dataspace
minor: Out of range
Failed to write dataset: Npart_profile
The problem seems to be in write_dataset_nd() in hdfitems.h. When parallel HDF5 is enabled, one MPI communicator is created for each output file, and all of the tasks in a communicator write to the same output file. Each task needs to calculate the offset at which it should write its data, and this offset calculation is wrong: it results in tasks trying to write beyond the bounds of the dataset.
The offset is stored in dims_offset and calculated as follows (this occurs several times in hdfitems.h):
MPI_Allgather(dims_single.data(), rank, MPI_UNSIGNED_LONG_LONG, mpi_hdf_dims.data(), rank, MPI_UNSIGNED_LONG_LONG, comm);
MPI_Allreduce(dims_single.data(), mpi_hdf_dims_tot.data(), rank, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
for (auto i=0;i<rank;i++) {
    dims_offset[i] = 0;
    if (flag_first_dim_parallel && i > 0) continue;
    for (auto j=1;j<=ThisWriteTask;j++) {
        dims_offset[i] += mpi_hdf_dims[i*NProcs+j-1];
    }
}
The dimensions of the arrays on all tasks are gathered in mpi_hdf_dims and used to compute the offset in each dimension for the data to be written by this task (dims_offset). Here I think the index into mpi_hdf_dims is wrong. I think
for (auto j=1;j<=ThisWriteTask;j++) {
    dims_offset[i] += mpi_hdf_dims[i*NProcs+j-1];
}
should really be
for (auto j=0;j<ThisWriteTask;j++) {
    dims_offset[i] += mpi_hdf_dims[j*rank+i];
}
because after the allgather mpi_hdf_dims contains rank elements from each MPI task: j loops over the lower-numbered MPI tasks and i is the index within the block received from each task.
This would mean that any multidimensional output arrays will either be corrupted or cause a crash.
I'm running Swift with on-the-fly VELOCIraptor on the EAGLE_low_z/EAGLE_12 example in the Swift repository, using the vrconfig_3dfof_subhalos_SO_hydro.cfg parameter file. My VR configuration is
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_FLAGS_RELEASE="-O3 -xAVX -g" \
-DCMAKE_C_FLAGS_RELEASE="-O3 -xAVX -g" \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_C_COMPILER=icc \
-DCMAKE_CXX_COMPILER=icpc \
-DVR_USE_SWIFT_INTERFACE=ON \
-DVR_USE_HYDRO=ON
and for Swift I use
../configure \
--enable-ipo \
--enable-debug \
--with-hdf5 \
--with-fftw \
--with-parmetis \
--with-gsl \
--with-tbbmalloc \
--with-hydro=sphenix --with-kernel=wendland-C2 --with-subgrid=EAGLE-XL \
--with-velociraptor=`pwd`/../../VELOCIraptor-STF/build/src
and I run it with
mpirun -np 2 ../../../build/examples/swift_mpi \
--cosmology --eagle --velociraptor \
--threads=16 eagle_12.yml
This crashes on the first VR invocation in substructureproperties.cxx line 6024:
x = Pval[i].GetHydroProperties();
At this point Pval[i].hydro is a null pointer, which I think is what causes the crash because GetHydroProperties just does "return *hydro".
Looking at swiftinterface.cxx, Part.hydro is only set if the parameter swift_gas_parts was passed to InvokeVelociraptorHydro() and was not null:
#ifdef GASON
    if (swift_gas_parts != NULL)
    {
        for (auto i=0; i<num_hydro_parts; i++)
        {
            index = swift_gas_parts[i].index;
            parts[index].SetHydroProperties(hydro);
        }
        free(swift_gas_parts);
    }
#endif
It looks like output of extra properties has only been partially implemented for on-the-fly runs. Things that are missing:
If I comment out the extra properties in the .cfg file then it survives a VR invocation without crashing.
As described in #54 (comment) by @MatthieuSchaller:
If it helps, there is smaller test case here:
/snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_low_z/EAGLE_6/eagle_0000.hdf5
/snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_low_z/EAGLE_6/vrconfig_3dfof_subhalos_SO_hydro.cfg
This one crashes about 20s after start so might be easier.
Config is: cmake ../ -DVR_USE_HYDRO=ON -DCMAKE_BUILD_TYPE=Debug
Run command line is: stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i eagle_0000 -o halos_0000 -I 2
Problem happens with gcc or ICC.
Running with -DVR_OPENMP=OFF also crashes in the same way.
Running without VR_MPI_REDUCE crashes in a different way; there the crash happens while reading in the data.
This issue is to keep track of the last sentence. Indeed when running with -DVR_MPI_REDUCE=OFF the following crash happens:
[bolano:21822] Read -1, expected 50000000, errno = 14
[bolano:21822] *** Process received signal ***
[bolano:21822] Signal: Segmentation fault (11)
[bolano:21822] Signal code: Invalid permissions (2)
[bolano:21822] Failing at address: 0x7f75ac021000
[bolano:21822] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14bb0)[0x7f75baef6bb0]
[bolano:21822] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1851d3)[0x7f75baaee1d3]
[bolano:21822] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_convertor_unpack+0x85)[0x7f75ba7f01c5]
[bolano:21822] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x1bf)[0x7f75b8c1c5df]
[bolano:21822] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f75b8c42ed5]
[bolano:21822] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x53a3)[0x7f75b8c433a3]
[bolano:21822] [ 6] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f75ba7de854]
[bolano:21822] [ 7] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xb5)[0x7f75ba7e5315]
[bolano:21822] [ 8] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv+0x833)[0x7f75b8c0eff3]
[bolano:21822] [ 9] /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Recv+0xf5)[0x7f75bb2f93e5]
[bolano:21822] [10] builds/54/stf(_Z34MPIReceiveParticlesFromReadThreadsR7OptionsRPN5NBody8ParticleES3_RPiS6_S6_RPxRPP14ompi_request_tS4_+0x2a5)[0x55b35d3cd27b]
[bolano:21822] [11] builds/54/stf(_Z7ReadHDFR7OptionsRSt6vectorIN5NBody8ParticleESaIS3_EExRPS3_x+0xd240)[0x55b35d4df0cc]
[bolano:21822] [12] builds/54/stf(_Z8ReadDataR7OptionsRSt6vectorIN5NBody8ParticleESaIS3_EExRPS3_x+0x378)[0x55b35d36a87c]
[bolano:21822] [13] builds/54/stf(main+0xba7)[0x55b35d2b1311]
[bolano:21822] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf2)[0x7f75ba991cb2]
[bolano:21822] [15] builds/54/stf(_start+0x2e)[0x55b35d2b046e]
[bolano:21822] *** End of error message ***
Hi, I recently started working with cosmological zooms of Milky Way-mass haloes and I wanted to apply VELOCIraptor to the data in the same way as I do it for full cosmological boxes.
When I run VELOCIraptor on zooms to obtain the catalogues, I always find that VR struggles to correctly compute the properties of the central halo and of one of the subhaloes.
What I keep finding instead is that in all the zoom simulations,
Below is a simple python script that shows what is going on in the VR catalogue in one of the zoom simulations. The object with problem A (problem B) is the last one in the last (first) row of the output.
In [1]: import numpy as np
In [2]: import velociraptor as vr
In [3]: data = vr.load("halo_halo_10_0037.properties") # Load VR catalogue
In [4]: halo_mass = data.masses.mass_200crit # Fetch halo masses
In [5]: stellar_mass = data.apertures.mass_star_30_kpc # Fetch stellar masses
In [6]: r200 = data.radii.r_200crit # Get r200
In [7]: sort_idx = np.argsort(halo_mass) # Sort according to halo mass
In [8]: stellar_mass.to("Msun")[sort_idx] # Show stellar masses
Out[8]:
unyt_array([0.00000000e+00, 0.00000000e+00, 5.70441663e+09,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
2.72257719e+06, 0.00000000e+00, 2.57739048e+05,
1.74264642e+06, 2.02618167e+08, 5.25602289e+08], 'Msun') # [The largest halo has too low stellar mass]
In [9]: r200.to("kpc")[sort_idx] # Show r200
Out[9]:
unyt_array([-1.00000000e+00, -1.00000000e+00, -6.91574142e-02,
-1.94654432e+02, 8.12557225e+01, 2.44313447e+01,
1.32149940e+01, 6.28237402e+00, 8.00385528e+01,
2.38922979e+01, 3.74907068e+00, 3.73819116e+01,
1.03565331e+01, 1.64424960e+01, 1.15966419e+01,
4.34248021e+01, 2.39968011e+01, 2.79605862e+01,
3.40829283e+01, 3.33680927e+01, -2.75381537e+01,
1.10373186e+02, -2.33397677e+01, 5.57266117e+02,
3.56556864e+02, 3.61359039e+02, 1.77724087e+03,
1.29139616e+03, 8.10693203e+02, 4.50728670e+03], 'kpc') # [r200 seem to be very large for all (sub)haloes]
In [10]: data.structure_type.structuretype[sort_idx] # Show structure types (to see what is central and what is not)
Out[10]:
unyt_array([10, 10, 10,
10, 20, 20,
20, 25, 20,
35, 20, 30,
20, 30, 30,
20, 20, 30,
30, 30, 10,
30, 10, 20,
25, 25, 15,
15, 15, 20], dtype=int32, units='dimensionless') # [the largest halo is not a central]
Visualisation of the problematic (sub)haloes
The gas surface density and dark matter mass surface density of the object with problem A are
The gas surface density and dark matter mass surface density of the object with problem B are
To reproduce the bug
/cosma7/data/dp004/dc-chai1/zooms/batch1_12_05_21/halo_10/snapshot_halo_10_0037.hdf5
/cosma7/data/dp004/dc-chai1/vrconfig_3dfof_subhalos_SO_hydro_final2_zoom.cfg
/cosma7/data/dp004/dc-chai1/VR_ICRAR_27_March/VELOCIraptor-STF/build/stf
(version 64de17bff6925f47f3ebe8f8195108801d661d95, March 24, 2021)
VR was compiled as:
cmake -DVR_USE_GAS=ON -DVR_USE_STAR=ON -DVR_USE_BH=ON -DVR_MPI=NO
I tried compiling with -DVR_ZOOM_SIM but that didn't help.
I think that the problems occur because I am using a VR config that was originally designed for full cosmological boxes (not zooms). The solution will therefore most likely involve tweaking a few parameters in the VR config.
As mentioned in #57 (comment) there are a number of datasets that are widely different between runs of VR (standalone, but probably also with SWIFT) that are OpenMP-enabled and OpenMP-disabled.
A full list of the datasets that are different can be derived by comparing the two different output files mentioned in the comment linked above. I did the following:
vr_compare_datasets() {
file1="$1"
file2="$2"
dataset="$3"
lines=15
diff -Naur <(h5dump -d $dataset "$file1" | head --lines $lines) <(h5dump -d $dataset "$file2" | head --lines $lines)
}
for dataset in `h5dump -n /cosma7/data/dp004/jlvc76/BAHAMAS/Roi_run/halos_omp_0036.properties.0 | grep SO_ | awk '{print $2}'`; do
vr_compare_datasets /cosma7/data/dp004/jlvc76/BAHAMAS/Roi_run/halos_{omp_,}0036.properties.0 $dataset
done > so_quantities.diff
The resulting file shows the differences for all SO_*
datasets between the two files. The following is a summary of the situation:
Only a few values are slightly different, the rest are the same
Most values are different, but only after 2 or 3 decimal places.
Also all equivalent Ly and Lz datasets.
Most values are different, even already at the most significant decimal place. Some values are the same though.
Describe the bug
When plotting the FOF halo mass vs. R200 one expects a one-to-one relationship, but the code currently returns something far from that (with a lot of scatter; see the attached plot from Chris Power https://user-images.githubusercontent.com/27806527/57223099-7fbb3c00-7037-11e9-94f6-d99961ad2f90.png).
To Reproduce
Expected behavior
A clean relation between MFOF and R200 with no scatter
Describe the bug
When using the hdf5 output but not parallel-hdf5, some arrays have a different name in the .0
file compared to all the others.
The three problematic fields are:
SubgridMasses_average_bh in the 0th file and SubgridMasses_index_0_average_bh in the others.
SubgridMasses_max_bh in the 0th file and SubgridMasses_index_0_max_bh in the others.
SubgridMasses_min_bh in the 0th file and SubgridMasses_index_0_min_bh in the others.
These are three of the four "extra BH properties" listed in our VR config file. The fourth quantity
is computed in apertures and somehow ends up with the same name in all the files.
extract of config:
# Collect the BH subgrid masses and compute the max, min, average and total mass in apertures
BH_internal_property_names=SubgridMasses,SubgridMasses,SubgridMasses,SubgridMasses,
BH_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,
BH_internal_property_calculation_type=max,min,average,aperture_total,
BH_internal_property_output_units=solar_mass,solar_mass,solar_mass,solar_mass,
So, only extra properties computed at the level of the whole group seem affected.
I don't know whether the same problem appears for gas or star extra properties since all the ones we use
are also computed in apertures, not over the whole group.
To Reproduce
Steps to reproduce the behavior:
VR_HDF5=ON VR_ALLOWPARALLELHDF5=OFF VR_OPENMP=OFF VR_MPI=ON VR_USE_HYDRO=ON
(the openMP bit is likely irrelevant)
Expected behavior
The name should be the same in all files. The name in the .0
file is the one that follows the other fields' convention.
Log files
Not relevant.
Environment (please complete the following information):
Not relevant
Describe the bug
I get the following warning when building on my laptop:
/Users/mphf18/Documents/swift/VELOCIraptor-STF/src/ui.cxx:811:22: note: use '=='
to turn this assignment into an equality comparison
if (j=line.find(sep)){
^
==
Using
Apple clang version 11.0.0 (clang-1100.0.33.8)
Target: x86_64-apple-darwin20.2.0
cmake -DVR_USE_SWIFT_INTERFACE=ON -DVR_USE_HYDRO=OND -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/libomp/include" -DOpenMP_CXX_LIB_NAMES="omp" -DOpenMP_omp_LIBRARY=/usr/local/opt/libomp/lib/libomp.dylib ..
Might be worth checking out.
Would it be possible to time the I/O part of the code? Even better if we could individually time the writing of the properties catalogues, the parttype files and the rest.
Describe the bug
In short, when I introduce a new gas internal property field to the VELOCIraptor config file, VELOCIraptor begins to zero all fields containing the metallicity of star-forming gas.
More precisely, I have two VR config files, where the diff between the two is
177,181c177,181
< Gas_internal_property_names=DensitiesAtLastSupernovaEvent,GraphiteMasses,SilicatesMasses,AtomicHydrogenMasses,IonisedHydrogenMasses,MolecularHydrogenMasses,HydrogenMasses,HeliumMasses,
< Gas_internal_property_index_in_file=0,0,0,0,0,0,0,0,
< Gas_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,
< Gas_internal_property_calculation_type=max,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,
< Gas_internal_property_output_units=solar_mass/MPc3,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,
---
> Gas_internal_property_names=DensitiesAtLastSupernovaEvent,GraphiteMasses,SilicatesMasses,AtomicHydrogenMasses,IonisedHydrogenMasses,MolecularHydrogenMasses,HydrogenMasses,HeliumMasses,IronOverHydrogenMasses,
> Gas_internal_property_index_in_file=0,0,0,0,0,0,0,0,0,
> Gas_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,
> Gas_internal_property_calculation_type=max,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,
> Gas_internal_property_output_units=solar_mass/MPc3,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,
i.e. compared to config 1, config 2 has an additional field IronOverHydrogenMasses
.
After I run VR
../VR_ICRAR_27_March/VELOCIraptor-STF/stf -C vrconfig_3dfof_subhalos_SO_hydro_1.cfg -i colibre_0023 -o halo_v1_0023 -I 2
../VR_ICRAR_27_March/VELOCIraptor-STF/stf -C vrconfig_3dfof_subhalos_SO_hydro_2.cfg -i colibre_0023 -o halo_v2_0023 -I 2
and look into the output fields, I find
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from velociraptor import load
In [2]: import numpy as np
In [3]: output_from_config1 = load("halo_v1_0023.properties.0")
In [4]: output_from_config2 = load("halo_v2_0023.properties.0")
In [5]: np.max(output_from_config1.apertures.zmet_gas_sf_100_kpc)
Out[5]: unyt_quantity(0.01275099, '83.33*dimensionless')
In [6]: np.max(output_from_config2.apertures.zmet_gas_sf_100_kpc)
Out[6]: unyt_quantity(0., '83.33*dimensionless')
where one can see that in the latter case (config 2), the field apertures.zmet_gas_sf_100_kpc
has no positive values. And this is true not only for the 100-kpc apertures but for all types of apertures.
To Reproduce
/cosma7/data/dp004/dc-chai1/VR_ICRAR_27_March/VELOCIraptor-STF
(version 64de17bff6925f47f3ebe8f8195108801d661d95
)/cosma7/data/dp004/dc-chai1/test_VR/colibre_0023.hdf5
/cosma7/data/dp004/dc-chai1/test_VR/vrconfig_3dfof_subhalos_SO_hydro_?.cfg
, where ? is 1 for config 1 and 2 for config 2.
Follow the exact same instructions in #63, but including the fixes in #64. After VR finishes it reports the following memory leak:
[2157.023] [ info] main.cxx:26 Finished running VR
[2157.023] [ info] main.cxx:27 Memory report at main.cxx:27@void finish_vr(Options&): Average: 20.006 [TiB] Data: 20.006 [TiB] Dirty: 0 [B] Library: 0 [B] Peak: 20.020 [TiB] Resident: 7.437 [GiB] Shared: 8.539 [MiB] Size: 20.006 [TiB] Text: 4.375 [MiB]
=================================================================
==19245==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x7f23f9635b07 in operator new[](unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:102
#1 0x79d311 in ReadHDF(Options&, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long, NBody::Particle*&, long long) /cosma/home/dp004/dc-toba1/scm/git/VELOCIraptor-STF/src/hdfio.cxx:739
#2 0x58eae5 in ReadData(Options&, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long, NBody::Particle*&, long long) /cosma/home/dp004/dc-toba1/scm/git/VELOCIraptor-STF/src/io.cxx:113
#3 0x47661e in main /cosma/home/dp004/dc-toba1/scm/git/VELOCIraptor-STF/src/main.cxx:254
#4 0x7f23f7433554 in __libc_start_main (/lib64/libc.so.6+0x22554)
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
(END)
A bit of a mix between bug and feature request.
When compiling the code with MPI all the output files have an .0
appended at the end of their name whilst the version of the code compiled without MPI does not. I think this is a left-over from the pre-parallel-hdf5 era where there were many files in the output.
Maybe the difference is intended behaviour though.
Another useful thing would be to append a .hdf5
and .txt
to the file names to make humans (and hdf5 tools!) happier when looking at files in a directory.
When I run VR in single precision mode, I find that the timers produce negative numbers; for example,
0: finished FOF search in total time of -27.7514
Describe the bug
Code crashes with a segfault apparently in the (empty!) destructor of the hydro particles when computing some of the sub-structure properties.
To Reproduce
cmake ../ -DVR_USE_HYDRO=ON -DCMAKE_BUILD_TYPE=Debug -DVR_OPENMP=OFF
mpirun -np 16 stf -I 2 -i colibre_0023 -o haloes -C test.cfg
Crashes after ~350s. Last message printed (verbose = 0):
[0001] [ 246.855] [ info] search.cxx:3777 Done
[0001] [ 246.855] [ info] main.cxx:439 Baryon search with 1 threads finished in 11.813 [s]
[0001] [ 246.862] [ info] substructureproperties.cxx:5025 Sort particles and compute properties of 1100 objects
NbodyLib version: fcf1c17
Older versions
31ae376 and associated NBodyLib works.
When running with the inputs and configuration from #87/#88, with 4 OpenMP threads and 1 MPI rank, and compiled with the clang address sanitizer, I got the following output:
[0000] [ 87.191] [debug] unbind.cxx:284 Unbinding 1521 groups ...
=================================================================
==182466==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020001e99a8 at pc 0x000000a457dc bp 0x7ff6f2c55470 sp 0x7ff6f2c55468
WRITE of size 8 at 0x6020001e99a8 thread T5
#0 0xa457db in .omp_outlined._debug__.7 /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1240:19
#1 0xa46cfb in .omp_outlined..8 /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1228:1
#2 0x7ff7114e3ca2 in __kmp_invoke_microtask (/usr/lib/x86_64-linux-gnu/libomp.so.5+0xabca2)
#3 0x7ff7114789c2 (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x409c2)
#4 0x7ff7114775f9 (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x3f5f9)
#5 0x7ff7114cb149 (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x93149)
#6 0x7ff71153f58f in start_thread nptl/pthread_create.c:463:8
#7 0x7ff71134c222 in clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
0x6020001e99a8 is located 16 bytes to the right of 8-byte region [0x6020001e9990,0x6020001e9998)
allocated by thread T0 here:
#0 0x53ca8d in operator new[](unsigned long) (/home/rtobar/scm/git/VELOCIraptor-STF/builds/88-debug-addrsan/stf+0x53ca8d)
#1 0xa40b32 in PotentialTree(Options&, long long, NBody::Particle*&, NBody::KDTree*&) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1185:11
#2 0xa3db16 in Potential(Options&, long long, NBody::Particle*) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:960:5
#3 0xa5357e in CalculatePotentials(Options&, NBody::Particle**, long long&, long long*) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:434:13
#4 0xa38e98 in Unbind(Options&, NBody::Particle**, long long&, long long*, long long*, long long**, int) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:793:5
#5 0xa48efd in CheckUnboundGroups(Options, long long, NBody::Particle*, long long&, long long*&, long long*, long long**, int, long long*) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:350:18
#6 0x8b040f in SearchBaryons(Options&, long long&, NBody::Particle*&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, int, int, PropData*) /home/rtobar/scm/git/VELOCIraptor-STF/src/search.cxx:3840:13
#7 0x547208 in main /home/rtobar/scm/git/VELOCIraptor-STF/src/main.cxx:463:13
#8 0x7ff71125bcb1 in __libc_start_main csu/../csu/libc-start.c:314:16
Thread T5 created by T0 here:
#0 0x4f770a in pthread_create (/home/rtobar/scm/git/VELOCIraptor-STF/builds/88-debug-addrsan/stf+0x4f770a)
#1 0x7ff7114ca823 (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x92823)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1240:19 in .omp_outlined._debug__.7
Shadow bytes around the buggy address:
0x0c04800352e0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c04800352f0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c0480035300: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c0480035310: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c0480035320: fa fa fd fa fa fa fd fa fa fa fd fa fa fa 00 fa
=>0x0c0480035330: fa fa 00 fa fa[fa]fa fa fa fa fa fa fa fa fa fa
0x0c0480035340: fa fa fa fa fa fa fa fa fa fa fa fa fa fa 00 fa
0x0c0480035350: fa fa 00 fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0480035360: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0480035370: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0480035380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==182466==ABORTING
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[57249,1],0]
Exit code: 1
--------------------------------------------------------------------------
This define has the same typo as was fixed in a previous issue. Maybe worth fixing here for consistency.
VELOCIraptor-STF/src/hdfitems.h
Line 103 in cb4336d
Also, I'm not sure what that whole block is supposed to achieve in general.
This has been observed in several simulations but one quick way to reproduce it is the examples/SmallCosmoVolume/SmallCosmoVolume_DM example in the Swift repository. The problem appears to have been introduced in VR commit b5e1a8c.
I'm using the latest ICRAR VR master (a497fe3) configured with:
cmake .. \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_C_COMPILER=icc \
-DCMAKE_CXX_COMPILER=icpc \
-DVR_USE_SWIFT_INTERFACE=ON \
-DVR_USE_GAS=OFF
and the latest Swift master (5fab1cdb81e869fd3adf198a7e0509f5e87eb093) configured with:
../configure \
--enable-debug \
--with-velociraptor=`pwd`/../../VELOCIraptor-STF/build/src
The parameter files I'm using are from the Swift repository - see examples/SmallCosmoVolume/SmallCosmoVolume_DM/small_cosmo_volume_dm.yml and examples/SmallCosmoVolume/SmallCosmoVolume_DM/vrconfig_3dfof_subhalos_SO_hydro.cfg.
To run it:
cd swiftsim/examples/SmallCosmoVolume/SmallCosmoVolume_DM
./getIC.sh
mpirun -np 1 ../../../build/examples/swift_mpi --cosmology --self-gravity --velociraptor --threads=16 small_cosmo_volume_dm.yml
It runs for a few minutes and crashes when it gets to a redshift ~3:
0: finished FOF search in total time of 3.17214
[login7c:219197:0:219197] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
/cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/search.cxx: [ AdjustStructureForPeriod() ]
...
976 for (i=0;i<nbodies;i++) {
977 if (pfof[i]==0) continue;
978 if (irefpos[pfof[i]] != -1) continue;
==> 979 refpos[pfof[i]] = Coordinate(Part[i].GetPosition());
980 irefpos[pfof[i]] = i;
981 }
982 #ifdef USEOPENMP
==== backtrace (tid: 219197) ====
0 0x000000000071d5c2 AdjustStructureForPeriod() /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/search.cxx:979
1 0x0000000000715e4b SearchFullSet() /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/search.cxx:482
2 0x000000000058e652 InvokeVelociraptorHydro() /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/swiftinterface.cxx:615
3 0x000000000058c837 InvokeVelociraptor() /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/swiftinterface.cxx:398
4 0x00000000005686ba velociraptor_invoke..0() /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/velociraptor_interface.c:1026
5 0x00000000005824a0 engine_check_for_dumps() /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/engine.c:2974
6 0x000000000041331a engine_step() /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/engine.c:2819
7 0x000000000041331a engine_step() /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/engine.c:2828
8 0x000000000040f1cd main() /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../examples/main.c:1490
9 0x0000000000022555 __libc_start_main() ???:0
10 0x000000000040cda9 _start() ???:0
=================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 219197 RUNNING AT login7c.pri.cosma7.alces.network
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
The problem is that at line 979, pfof[i]=4627676946019562717, which is clearly not a valid array index. At this point i=262144, and the simulation has 262144 particles. The pfof array has more entries than this (I think because particles are duplicated so that threads have whole groups to work on), and it seems to be the extra elements beyond the first 262144 that hold the problematic values.
I haven't been able to reproduce this running standalone VR on the same simulation.
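The huge pfof value suggests the duplicated extra entries are never initialised. A defensive range check of the following kind would at least turn the wild index into a diagnosable error; `pfof` and the group count are VR names, but this helper is purely a hypothetical hardening sketch, not the actual fix:

```cpp
#include <vector>

// Hypothetical hardening of the AdjustStructureForPeriod() loop: treat any
// pfof[i] outside [0, ngroup] as corrupt instead of indexing refpos with it.
// Returns the index of the first corrupt entry, or -1 if all IDs are valid.
long long first_bad_group_id(const std::vector<long long>& pfof, long long ngroup)
{
    for (std::size_t i = 0; i < pfof.size(); ++i) {
        if (pfof[i] < 0 || pfof[i] > ngroup)
            return static_cast<long long>(i);
    }
    return -1;
}
```

Running such a check on the pfof array right before the loop would show whether only the duplicated tail entries (i >= nbodies) are affected.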
The function GetMilliCount() in utilities.cxx makes use of the ftime() system (POSIX) function, which is now deprecated: https://man7.org/linux/man-pages/man3/ftime.3.html
GCC 9.x reports this as a warning, and the function may in principle be absent on some systems.
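A replacement based on std::chrono sidesteps the deprecated call entirely. This is only a sketch; the actual GetMilliCount() signature and expected epoch in utilities.cxx may differ:

```cpp
#include <chrono>

// Sketch of a portable replacement for the ftime()-based GetMilliCount().
// Uses a monotonic clock, so it is only meaningful for measuring elapsed
// intervals (the absolute value is not wall-clock time, unlike ftime()).
long long GetMilliCount()
{
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count();
}
```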
Hi all, recently I have been running VR on SWIFT output containing black hole particles with subgrid properties. VR crashes while it is computing the BH subgrid properties. The VR output with the crash looks as follows:
Opening group PartType0: Data set Coordinates
Opening group PartType1: Data set Coordinates
Opening group PartType4: Data set Coordinates
Opening group PartType5: Data set Coordinates
Opening group PartType0: Data set Velocities
Opening group PartType1: Data set Velocities
Opening group PartType4: Data set Velocities
Opening group PartType5: Data set Velocities
Opening group PartType0: Data set ParticleIDs
Opening group PartType1: Data set ParticleIDs
Opening group PartType4: Data set ParticleIDs
Opening group PartType5: Data set ParticleIDs
Opening group PartType0: Data set Masses
Opening group PartType1: Data set Masses
Opening group PartType4: Data set Masses
Opening group PartType5: Data set DynamicalMasses
Opening group PartType0: Data set InternalEnergies
Opening group PartType0: Data set StarFormationRates
Opening group PartType0: Data set MetalMassFractions
Opening group PartType4: Data set MetalMassFractions
Opening group PartType4: Data set BirthScaleFactors
Opening group PartType0: Data set ElementMassFractions
Opening group PartType0: Data set SpeciesFractions
Opening group PartType0: Data set SpeciesFractions
Opening group PartType0: Data set SpeciesFractions
Opening group PartType5: Data set SubgridMasses
HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 0:
#000: H5S.c line 921 in H5Sget_simple_extent_ndims(): not a dataspace
major: Invalid arguments to routine
minor: Inappropriate type
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append
vel_rap_exact.sh: line 123: 185245 Aborted VR_ICRAR/VELOCIraptor-STF/build/stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i /snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN_L006N188_00/colibre_2729 -o /snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN_L006N188_00/halo_2729 -I 2
If I comment out the lines
BH_internal_property_names=SubgridMasses,SubgridMasses,SubgridMasses,SubgridMasses,
BH_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,
BH_internal_property_calculation_type=max,min,average,aperture_total,
BH_internal_property_output_units=solar_mass,solar_mass,solar_mass,solar_mass,
in the VR config file, the bug disappears and VR runs smoothly.
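The std::length_error is consistent with H5Sget_simple_extent_ndims() failing on the invalid dataspace (it returns a negative value on error) and a garbage size then being fed to vector::resize(), which is what the _M_default_append abort indicates. A guard of the following shape, applied before allocating, would fail with a clearer message; the function name and types here are hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical guard for sizes derived from an HDF5 dataspace:
// a negative ndims (H5Sget_simple_extent_ndims() error return) or element
// count must never reach vector::resize(), where it becomes a huge
// unsigned size and triggers std::length_error / _M_default_append.
std::vector<double> alloc_dataset_buffer(int ndims, long long nelem)
{
    if (ndims < 0 || nelem < 0)
        throw std::runtime_error("invalid HDF5 dataspace for requested dataset");
    return std::vector<double>(static_cast<std::size_t>(nelem));
}
```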
cmake -DVR_USE_HYDRO=ON
gsl/2.4, intel_mpi/2018, cmake/3.18.1, intel_comp/2018, parallel_hdf5/1.10.3
/cosma7/data/dp004/dc-chai1/vrconfig_3dfof_subhalos_SO_hydro.cfg
/snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN_L006N188_00_iso/colibre_2729.hdf5
SubgridMasses field.
Describe the bug
I've been trying to run the latest(ish) master of VR on some SWIFT outputs (on COSMA7), and I've been getting a couple of odd crashes.
To Reproduce
Version cb4336d.
Ran on snapshots under /snap7/scratch/dp004/dc-borr1/new_randomness_runs/runs/Run_*
Log files
STDOUT:
...
[ 528.257] [debug] search.cxx:2716 Substructure at sublevel 1 with 955 particles
[ 528.257] [debug] unbind.cxx:284 Unbinding 1 groups ...
[ 528.257] [debug] unbind.cxx:379 Finished unbinding in 1 [ms]. Number of groups remaining: 2
STDERR:
terminate called after throwing an instance of 'std::runtime_error'
what(): Particle density not positive, cannot continue
Environment (please complete the following information):
cmake .. -DCMAKE_CXX_FLAGS="-O3 -march=native" -DVR_MPI=OFF -DVR_HDF5=ON -DVR_ALLOWPARALLELHDF5=ON -DVR_USE_HYDRO=ON
Currently Loaded Modulefiles:
1) python/3.6.5 5) parallel_hdf5/1.8.20
2) ffmpeg/4.0.2 6) gsl/2.4(default)
3) intel_comp/2018(default) 7) fftw/3.3.7(default)
4) intel_mpi/2018 8) parmetis/4.0.3(default)
We are in the process of upgrading our compilation tool stack to Intel MPI 2020 (2020.2 to be specific). The code compiles without problems (cmake ../ -DVR_USE_HYDRO=ON -DCMAKE_BUILD_TYPE=Release -DVR_MPI=off), but at runtime I get the following messages:
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
OMP: Info #274: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
The code keeps running and seems to produce correct results nevertheless.
This happens in between
First build tree ...
and
0: finished building 64 domains and trees
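Assuming VR only uses the nested API to switch nesting on or off, the deprecated calls can be swapped for their max-active-levels equivalents, which silences the Info #274 messages. The helper names below are hypothetical:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// The old boolean nesting flag maps onto the new level count:
//   omp_set_nested(1)  ->  omp_set_max_active_levels(n) with n > 1
//   omp_set_nested(0)  ->  omp_set_max_active_levels(1)
int nesting_to_levels(bool enable_nested)
{
    return enable_nested ? 2 : 1;
}

// Hypothetical replacement call site for the deprecated omp_set_nested().
void configure_nesting(bool enable_nested)
{
#ifdef _OPENMP
    omp_set_max_active_levels(nesting_to_levels(enable_nested));
#else
    (void)enable_nested;   // no-op when built without OpenMP
#endif
}
```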
After fixing #88 and #100 (so same build configuration, inputs, etc.), and when running with parallel HDF5 writing with 8 ranks and 2 nodes, I've been running into the following problem:
[0000] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0002] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0001] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0003] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0005] [ 944.348] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0004] [ 944.348] [[0006] [ 944.[0007] [ 944.348] [ info] info] io.cxx:1646 Saving property data to lala.properties.0348] [ info] io.cxx:1646 Saving property data to lala.properties.0
io.cxx:1646 Saving property data to lala.properties.0
[0001] [ 954.686] [ info] [0002] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 12.316 [s]
[0004] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in [0003] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 16.645 [s]
[0005] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 10.341 [s]
io.cxx:2907 Wrote lala.properties.0 in 16.898 [s]
[0006] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 13.529 [s]
HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 0:
#000: H5O.c line 120 in H5Oopen(): unable to open object
major: Object header
minor: Can't open object
#001: H5Oint.c line 596 in H5O__open_name(): unable to open object
major: Object header
minor: Can't open object
#002: H5Oint.c line 551 in H5O_open_name(): object not found
major: Object header
minor: Object not found
#003: H5Gloc.c line 422 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#004: H5Gtraverse.c line 851 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#005: H5Gtraverse.c line 627 in H5G__traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#006: H5Gloc.c line 378 in H5G__loc_find_cb(): object 'ID' doesn't exist
major: Symbol table
minor: Object not found
Unable to open object to write attribute: Dimension_Mass[0007] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 12.479 [s]
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
16.452 [s]
This could be a race-condition with rank 0 trying to write an attribute while other ranks haven't closed their HDF5 file descriptors yet.
Lines 2883 to 2902 in 4cdbd52
Indeed, in the (somewhat scrambled) output shown above there are only 7 logs with "Wrote lala.properties.0", which would support this theory.
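If that is the cause, an explicit synchronisation point between the per-rank file closes and the rank-0 attribute writes should cure it. The thread-based model below only demonstrates the ordering argument (every "rank" signals that its handle is closed before the writer proceeds); the atomic counter stands in for an MPI_Barrier, and the close/write calls are stand-ins for the real H5Fclose()/attribute code in io.cxx:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Thread-based model of the proposed fix: the attribute writer ("rank 0")
// waits until every rank has closed its file handle before writing, which
// is what an MPI_Barrier between H5Fclose() and the attribute writes does.
bool safe_attribute_write(int nranks)
{
    std::atomic<int> closed{0};
    std::atomic<bool> all_closed_at_write{false};

    std::vector<std::thread> ranks;
    for (int r = 0; r < nranks; ++r) {
        ranks.emplace_back([&, r] {
            closed.fetch_add(1);                  // stands in for H5Fclose()
            if (r == 0) {
                while (closed.load() != nranks)   // stands in for MPI_Barrier
                    std::this_thread::yield();
                all_closed_at_write = true;       // attribute write is now safe
            }
        });
    }
    for (auto &t : ranks) t.join();
    return all_closed_at_write.load();
}
```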
Describe the bug
Segfault when linking FOF fragments over MPI:
[0000] [ 264.245] [ info] search.cxx:353 Finished linking across MPI domains in 2.409 [min]
*** Error in `/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf': munmap_chunk(): invalid pointer: 0x000000009108a0f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7fb7e6311474]
/cosma/local/Intel/Parallel_Studio_XE_2018/impi/2018.2.199//lib64/libmpi.so.12(+0x15b2fd)[0x7fb7e83902fd]
/cosma/local/Intel/Parallel_Studio_XE_2018/impi/2018.2.199//lib64/libmpi.so.12(MPI_Sendrecv+0x779)[0x7fb7e869ec69]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z39MPISendReceiveFOFStarInfoBetweenThreadsR7OptionsP8fofid_inRSt6vectorIxSaIxEERS3_IfSaIfEEiiRi+0x2d5)[0x5cbbe5]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z16MPIGroupExchangeR7OptionsxPN5NBody8ParticleERPx+0x1491)[0x5ca311]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z13SearchFullSetR7OptionsxRSt6vectorIN5NBody8ParticleESaIS3_EERx+0x8491)[0x5fd581]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(main+0x10c6)[0x432936]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb7e62b4555]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf[0x4317a9]
======= Memory map: ========
To Reproduce
Snapshot: /cosma7/data/dp004/jlvc76/SWIFT/May21_runs/swiftsim/examples/EAGLE_ICs/EAGLE_25_mpi/eagle_0036.hdf5
Config file: /cosma7/data/dp004/jlvc76/SWIFT/May21_runs/swiftsim/examples/EAGLE_ICs/EAGLE_25_mpi/vrconfig_3dfof_subhalos_SO_hydro.cfg
Code compilation options:
-DVR_MPI=ON -DVR_OPENMP=ON -DVR_USE_HYDRO=ON
(nothing else, e.g. no ZOOM)
Compiler: Intel 2021.1.0
MPI: Intel MPI 2018
Run command:
mpirun -np 8 stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i eagle_0036 -o halos_new_VR_0036 -I 2
Note that this leads to 3 OpenMP threads per rank.
Log files
Last few lines of output:
iB]
[0004] [ 256.663] [ info] [0003] [ 256.667] [[0001] [ 256.669] [ info] [0000] [ 256.673] [mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
[0006] [ 256.676] [ info] mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
info] [0002] [ 256.694] [ info] [0005] [ 256.690] [ info] mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
[0007] [ 256.703] [mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
info] info] mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
[0000] [ 258.263] [ info] [0001] [ 258.263] [ info] [0002] [ 258.271] [[0003] [ 258.266] [ info] search.cxx:316 Finished local search, nexport/nimport = 76718 88213 in 29.965 [s]
[0003] [ 258.266] [ info] search.cxx:317 MPI search will require extra memory of 31.458 [MiB]
[0004] [ 258.263] [ info] [0005] [ 258.262] [ info] [0006] [ 258.262] [ info] [0007] [ 258.271] [ info] info] search.cxx:316 Finished local search, nexport/nimport = 164413 165337 in 1.622 [s]
[0007] [ 258.281] [[0003] [ 258.277] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search
search.cxx:316 Finished local search, nexport/nimport = 62422 62962 in 2.309 [min]
[0000] [ 258.277] [ info] info] search.cxx:316 Finished local search, nexport/nimport = 49708 47752 in 2.446 [min]
[0004] [ 258.277] [ info] search.cxx:316 Finished local search, nexport/nimport = 64596 64910 in 2.217 [min]
[0001] [ 258.277] [ info] search.cxx:316 Finished local search, nexport/nimport = 111153 102204 in 1.361 [min]
[0005] [ 258.276] [ info] search.cxx:316 Finished local search, nexport/nimport = 101480 100349 in 1.973 [min]
[0006] [ 258.277] [ info] search.cxx:317 MPI search will require extra memory of 62.895 [MiB]
search.cxx:317 MPI search will require extra memory of 18.589 [MiB]
search.cxx:317 MPI search will require extra memory of 40.695 [MiB]
search.cxx:317 MPI search will require extra memory of 23.915 [MiB]
search.cxx:317 MPI search will require extra memory of 24.701 [MiB]
search.cxx:316 Finished local search, nexport/nimport = 74975 73738 in 56.573 [s]
[0002] [ 258.301] [search.cxx:317 MPI search will require extra memory of 38.496 [MiB]
[0004] [ 258.296] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search
info] [0007] [ 258.318] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search
search.cxx:317 MPI search will require extra memory of 28.365 [MiB]
[0001] [ 258.346] [ info] [0000] [ 258.349] [ info] [0005] [ 258.347] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search
mpiroutines.cxx:3292 Now building exported particle list for FOF search
mpiroutines.cxx:3292 Now building exported particle list for FOF search
[0006] [ 258.363] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search
[0002] [ 258.392] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search
[0000] [ 259.952] [debug] search.cxx:339 [0001] [ 259.952] [debug] search.cxx:339 [0002] [ 259.960] [debug] search.cxx:339 [0003] [ 259.954] [debug] search.cxx:339 [0004] [ 259.952] [debug] search.cxx:339 [0005] [ 259.951] [debug] search.cxx:339 [0006] [ 259.951] [debug] search.cxx:339 [0007] [ 259.960] [debug] Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 4.911 [GiB] Data: 5.059 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 5.221 [GiB] Resident: 5.028 [GiB] Shared: 13.777 [MiB] Size: 5.221 [GiB] Text: 3.871 [MiB]
[0000] [ 259.952] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 4.240 [GiB] Data: 4.538 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 4.701 [GiB] Resident: 4.452 [GiB] Shared: 9.504 [MiB] Size: 4.701 [GiB] Text: 3.871 [MiB]
[0001] [ 259.952] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 7.818 [GiB] Data: 8.338 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 8.500 [GiB] Resident: 8.202 [GiB] Shared: 9.680 [MiB] Size: 8.500 [GiB] Text: 3.871 [MiB]
[0002] [ 259.960] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 3.484 [GiB] Data: 3.624 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 3.786 [GiB] Resident: 3.503 [GiB] Shared: 9.500 [MiB] Size: 3.786 [GiB] Text: 3.871 [MiB]
[0003] [ 259.955] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 2.913 [GiB] Data: 3.004 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 3.195 [GiB] Resident: 2.906 [GiB] Shared: 9.469 [MiB] Size: 3.166 [GiB] Text: 3.871 [MiB]
[0004] [ 259.952] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 5.717 [GiB] Data: 6.019 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 6.181 [GiB] Resident: 5.942 [GiB] Shared: 9.676 [MiB] Size: 6.181 [GiB] Text: 3.871 [MiB]
[0005] [ 259.951] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 5.911 [GiB] Data: 6.351 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 6.513 [GiB] Resident: 6.236 [GiB] Shared: 9.664 [MiB] Size: 6.513 [GiB] Text: 3.871 [MiB]
[0006] [ 259.951] [ info] search.cxx:341 Starting to linking across MPI domains
search.cxx:339 Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 9.796 [GiB] Data: 10.618 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 10.780 [GiB] Resident: 10.538 [GiB] Shared: 9.707 [MiB] Size: 10.780 [GiB] Text: 3.871 [MiB]
[0007] [ 259.973] [ info] search.cxx:341 Starting to linking across MPI domains
[0000] [ 264.245] [ info] search.cxx:353 Finished linking across MPI domains in 2.409 [min]
Additional context
The exact same code version, compiled in an identical fashion, runs without any problems with just OMP parallelisation.
When running the code with a similar setup to that from #87 under valgrind I get the following errors:
==25566== Thread 1:
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x375E3E: CalculateSphericalOverdensityExclusive(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&) (substructureproperties.cxx:7335)
==25566== by 0x388960: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:434)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x375FD2: CalculateSphericalOverdensityExclusive(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&) (substructureproperties.cxx:7339)
==25566== by 0x388960: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:434)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x376080: CalculateSphericalOverdensityExclusive(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&) (substructureproperties.cxx:7348)
==25566== by 0x388960: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:434)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x376572: SetSphericalOverdensityMasstoTotalMassExclusive(Options&, PropData&) (substructureproperties.cxx:7397)
==25566== by 0x388998: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:435)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x37657C: SetSphericalOverdensityMasstoTotalMassExclusive(Options&, PropData&) (substructureproperties.cxx:7397)
==25566== by 0x388998: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:435)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x52348DC: sqrt (w_sqrt_compat.c:31)
==25566== by 0x3876D7: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:515)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x375314: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7236)
==25566== by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x375362: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7237)
==25566== by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x3753B0: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7238)
==25566== by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x3753FE: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7239)
==25566== by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
==25566== Conditional jump or move depends on uninitialised value(s)
==25566== at 0x37544C: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7240)
==25566== by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566== by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566== by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566== by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566== by 0x1BA40E: main (main.cxx:555)
==25566==
These all seem to stem from the fact that the PropData class has many members with uninitialised variables that are read in various places:
substructureproperties.cxx:7335, :7339 and :7397 read gRvir_excl.
substructureproperties.cxx:7236 and :515 read gM200c.
substructureproperties.cxx:7348 reads gMvir_excl.
substructureproperties.cxx:7237 reads gM200m.
substructureproperties.cxx:7238 reads gMvir.
substructureproperties.cxx:7239 and :7240 also read uninitialised members, so those will require more diagnosis.
There seems to be a mismatch between the population of the array hdf_parts->names and its usage.
On line 1588 of hdfio.cxx, the code attempts to access hdf_parts[k]->names[HDFGASTEMP].
HDFGASTEMP is defined to be 99 on line 53 of hdfitems.h. However, the names array is only 39 entries long, and the field we actually want to access is in position 8.
Setting HDFGASTEMP to 8 solves the immediate problem. However, that seems a bit suspicious, as it means the array was not constructed using the named fields of hdfitems.h and could hence lead to other problems of the same kind down the line.
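A toy sketch of one way to prevent this kind of drift (the field names and indices below are made up for illustration; the real list lives in hdfitems.h): derive both the index constants and the names array from a single enumeration, so a constant can never point past the end of the array it indexes.

```python
from enum import IntEnum

# Hypothetical field indices for illustration only; the real list in
# hdfitems.h has 39 entries and different contents.
class HDFField(IntEnum):
    COORDINATES = 0
    VELOCITIES = 1
    PARTICLEIDS = 2
    GASTEMP = 3  # position 8 in the real code, 99 in the buggy define

# Size the array from the enum so every index is in range by construction.
names = [""] * len(HDFField)
names[HDFField.COORDINATES] = "Coordinates"
names[HDFField.VELOCITIES] = "Velocities"
names[HDFField.PARTICLEIDS] = "ParticleIDs"
names[HDFField.GASTEMP] = "InternalEnergy"

# Startup sanity check: every field got a name, every index is valid.
assert all(names), "some field name was never filled in"
assert all(0 <= f < len(names) for f in HDFField)
```

The equivalent in the C++ sources would be sizing the names array from the last enumerator and asserting that at startup, which would have turned this silent out-of-bounds read into an immediate, diagnosable error.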
Hi all,
In a few of my cosmological runs at z=0, VELOCIraptor produces objects with M_star (100 kpc) / M200crit > 1. Interestingly, these larger-than-unity ratios seem to occur only if the stellar mass aperture is set to 100 kpc; for 30-kpc apertures, I find no objects with M_star (30 kpc) / M200crit > 1.
The problem is best illustrated with stellar mass vs. halo mass plots. Below are two such plots for the same run at z=0: the top one uses 30-kpc apertures, while the bottom one uses 100-kpc apertures. I emphasise that in the bottom plot there are objects above the one-to-one line.
Fig 1. Stellar mass vs halo mass. The stellar mass is computed in 30-kpc apertures
Fig 2. Same as Fig. 1 but for 100-kpc apertures
Below I am displaying one of the objects with M_star (100 kpc) / M200crit > 1 from the plot above. Interestingly, this object has R200crit = 2.4 kpc, which is much smaller than R_aperture = 100 kpc. Can the latter explain M_star (100 kpc) / M200crit >> 1?
Fig 3. Dark-matter projected density of one of the objects with M_star (100 kpc) / M200crit > 1 from Fig. 2
Fig 4. Same as Fig. 3 but the colour traces stellar projected density
The fact that there are SMHM ratios > 1 indicates that there could be a bug in the code. If this is expected behaviour, it would still be great to know what causes these unrealistically high ratios, and also how exactly the 100-kpc-aperture stellar mass is computed in the case that an object has R200crit << 100 kpc
.
Note that the dark-matter particle mass in my runs is 1.2 \times 10^6 M_\odot.
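As a sanity check on that suspicion, here is a toy Python sketch (made-up particle distribution, not data from the actual run) showing that an aperture mass can trivially exceed a mass measured within R200crit once most particles sit outside that radius:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy object: 1000 star particles spread uniformly out to 50 kpc.
r = rng.uniform(0.0, 50.0, 1000)      # radii [kpc]
m = np.full(r.size, 1.2e6)            # particle mass [Msun], as in the runs

R200crit = 2.4                        # kpc, as for the object shown above
M_in_R200 = m[r < R200crit].sum()     # crude stand-in for an SO mass
M_star_100kpc = m[r < 100.0].sum()    # the aperture catches every particle

# Far more mass lands inside the 100 kpc aperture than inside R200crit.
assert M_star_100kpc > M_in_R200
```

This is only an illustration of the geometry, not of how VELOCIraptor actually assigns particles; the question of which particles enter the aperture sum for such a compact object still stands.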
commit 7e4683963354bcf2b2730d5f8576ac931f7da43d
Author: Rodrigo Tobar <[email protected]>
Date: Sun Oct 25 20:49:02 2020 +0800
All my data, which I used to make the plots above, can be found on cosma
/snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN5_L006188_00_CHIMES_NoESO_M16/colibre_2729.hdf5
/snap7/scratch/dp004/dcchai1/my_cosmological_box/AGN5_L006188_00_CHIMES_NoESO_M16/halo_2729.*
/cosma7/data/dp004/dc-chai1/vrconfig_m16.cfg
Trying to compile Swift+VR on Cosma7 using the new modules. This is a minimal example that I'm having trouble getting to work.
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5
Configured VR using:
cmake -DCMAKE_BUILD_TYPE=Release -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" ..
Configured Swift using:
./configure --with-velociraptor=/vrdir/
Gives the error:
ipo: error #11021: unresolved gsl_multifit_nlinear_trs_lmaccel
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_trust
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_default_parameters
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_alloc
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_residual
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_position
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_init
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_driver
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_free
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_icc22Mhex.o
icc: error #10014: problem during multi-file optimization compilation (code 1)
make[2]: *** [swift] Error 1
make[2]: *** Waiting for unfinished jobs....
ipo: error #11021: unresolved gsl_multifit_nlinear_trs_lmaccel
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_trust
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_default_parameters
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_alloc
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_residual
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_position
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_init
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_driver
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_free
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_icc2P8zjg.o
icc: error #10014: problem during multi-file optimization compilation (code 1)
make[2]: *** [swift_mpi] Error 1
make[2]: Leaving directory '/cosma7/data/dp004/rttw52/swift_runs/runs/Sibelius/swiftsim/examples'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/cosma7/data/dp004/rttw52/swift_runs/runs/Sibelius/swiftsim'
make: *** [all] Error 2
After HDF5 reading finishes, the names of the extra output fields are broadcast from rank 0 to the rest of the MPI communicator. For example:
VELOCIraptor-STF/src/hdfio.cxx
Lines 444 to 452 in 7c66ef3
This code is repeated for each of the lists with extra fields.
This code is also incorrect: the reception buffer is sized (on the stack) using the local value size, but the value sent from rank 0 can be longer, leading to potential stack corruption and crashes.
This indeed happened while trying to verify the fix for #88. For some reason nothing went wrong when running with 8 ranks on a single node, but when running on two nodes this broke:
Fatal error in PMPI_Bcast: Invalid buffer pointer, error stack:
PMPI_Bcast(2667).........: MPI_Bcast(buf=0x7ffd4b89a480, count=26, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804)....: fail failed
MPIR_Bcast(1832).........: fail failed
I_MPIR_Bcast_intra(2056).: Failure during collective
I_MPIR_Bcast_intra(2043).: fail failed
MPIR_Bcast_advanced(2135): fail failed
MPIR_Bcast_intra(1670)...: Failure during collective
MPIR_Bcast_intra(1638)...: fail failed
MPIR_Bcast_knomial(2338).: Failure during collective
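For reference, the usual safe pattern is to broadcast the string length from the root first, size the receive buffer from that broadcast value, and only then broadcast the payload. Sketched here in Python with a stand-in for MPI_Bcast (the real fix belongs in the C++ code in hdfio.cxx):

```python
def mock_bcast(value_on_root):
    """Stand-in for MPI_Bcast: every rank ends up with the root's value."""
    return value_on_root

def receive_field_name(root_name: str, local_name: str) -> str:
    # Buggy pattern: buffer sized from len(local_name). If the root's
    # string is longer, the real MPI_Bcast writes past the buffer end
    # (stack corruption in the C++ version).
    # Safe pattern: learn the root's length before allocating.
    n = mock_bcast(len(root_name))
    buf = bytearray(n)                    # sized from the broadcast length
    buf[:] = mock_bcast(root_name.encode())
    return buf.decode()

# A rank with a shorter (or empty) local name still receives the full string.
assert receive_field_name("MetalMassFractions", "") == "MetalMassFractions"
```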
I am unable to get VELOCIraptor to compile successfully on NCI Gadi (and have verified the same behaviour on my MacBook Pro).
I am using
commit 7d185b065b8d8c6f58533bfb612a85e70e1edd84 (HEAD -> master, origin/master, origin/HEAD)
Author: Rodrigo Tobar <[email protected]>
Date: Thu Oct 22 13:35:53 2020 +0800
On Gadi, I have the following modules loaded;
- pbs
- openmpi/4.0.2 (default)
- fftw3/3.3.8
- hdf5/1.10.5p
- gsl/2.6
- intel-mkl/2020.0.166
- intel-compiler/2020.0.166
- intel-tbb/2020.0.166
- parmetis/4.0.3-i8r8
- metis/5.1.0-i8r8
I am using the following on Gadi;
cmake ../ -DHDF5_C_LIBRARY_hdf5:FILEPATH="/apps/hdf5/1.10.5p/lib/ompi3/libhdf5_hl.a" -DVR_USE_SWIFT_INTERFACE:BOOL=ON -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DNBODY_SINGLE_PRECISION=ON
This produces the following output;
> -- The CXX compiler identification is Intel 19.1.0.20191121
> -- The C compiler identification is Intel 19.1.0.20191121
> -- Check for working CXX compiler: /apps/intel-ct/wrapper/icpc
> -- Check for working CXX compiler: /apps/intel-ct/wrapper/icpc -- works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Check for working C compiler: /apps/intel-ct/wrapper/icc
> -- Check for working C compiler: /apps/intel-ct/wrapper/icc -- works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Found PkgConfig: /bin/pkg-config (found version "1.4.2")
> -- Found GSL: /apps/gsl/2.6/include (found version "2.6")
> -- Found MPI_C: /apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi.so (found version "3.1")
> -- Found MPI_CXX: /apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi_cxx.so (found version "3.1")
> -- Found MPI: TRUE (found version "3.1")
> -- HDF5: Using hdf5 compiler wrapper to determine C configuration
> -- Found HDF5: /apps/hdf5/1.10.5p/lib/ompi3/libhdf5_hl.a;/apps/szip/2.1.1/lib/libsz.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so (found version "1.10.5") found components: C
> -- Found OpenMP_CXX: -qopenmp (found version "5.0")
> -- Found OpenMP: TRUE (found version "5.0") found components: CXX
>
> NBodyLib successfully configured with the following settings:
>
> Dependencies
> ------------
>
> OpenMP Yes
>
> Types
> -----
>
> All calculations/properties stored as float Yes
> All integeres are long int Yes
>
> Particle data
> -------------
>
> Do not store mass, all particles are the same mass No
> Use single precision to store positions, velocities, other props No
> Use unsigned particle PIDs No
> Use unsigned particle IDs No
> Activate gas No
> Activate stars No
> Activate black holes/sinks No
> Activate extra dm properties No
> Extra input info stored No
> Extra FOF info stored No
> Large memory KDTree No
> Particle compiled for SWIFT Yes
>
> Compilation
> -----------
>
> Include directories: /apps/gsl/2.6/include;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Analysis;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Cosmology;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/InitCond;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/KDTree;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Math;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/NBody
> Macros defined: SINGLEPRECISION;LONGINT;SWIFTINTERFACE;HAVE_GSL22;USEOPENMP;USEOMP
> Libs: /apps/gsl/2.6/lib/libgsl.so;/apps/gsl/2.6/lib/libgslcblas.so
> C++ flags: -qopenmp
> Link flags: -qopenmp
>
>
> VELOCIraptor successfully configured with the following settings:
>
> File formats
> ------------
>
> HDF5 Yes
> Compressed HDF5 No
> Parallel HDF5 Yes
> nchilada No
>
> Precision-specifics
> -------------------
>
> Long Integers Yes
>
> OpenMP-specifics
> ----------------
>
> OpenMP support Yes
>
> MPI-specifics
> -------------
>
> MPI support Yes
> Reduce MPI memory overhead at the cost of extra CPU cycles Yes
> Use huge MPI domains No
>
> Gadget
> ------
>
> Use longs IDs No
> Use double precision pos and vel No
> Use single precision mass No
> Use header type 2 No
> Use extra SPH information No
> Use extra star information No
> Use extra black hole information No
>
> Particle-specifics
> ------------------
>
> Activate gas (& associated physics, properties calculated) No
> Activate stars (& associated physics, properties calculated) No
> Activate black holes (& associated physics, properties calculated) No
> Activate extra dark matter properties (& associated properties) No
> Mass not stored (for uniform N-Body sims, reduce mem footprint) No
> Large memory KDTree to handle > max 32-bit integer entries per tree No
>
> Simulation-specifics
> --------------------
>
> Used to run against simulations with a high resolution region No
> Build library for integration into SWIFT Sim code Yes
>
> Others
> ------
>
> Calculate local density dist. only for particles in field objects Yes
> Like above, but use particles inside field objects only for calclation No
>
> Compilation
> -----------
>
> Include dirs: /home/571/cxp571/Codes/VELOCIraptor-STF/src;/apps/gsl/2.6/include;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi/opal/mca/event/libevent2022/libevent;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi/opal/mca/event/libevent2022/libevent/include;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include;/apps/hdf5/1.10.5p/include;/apps/szip/2.1.1/include;/apps/gsl/2.6/include;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Analysis;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Cosmology;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/InitCond;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/KDTree;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Math;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/NBody
> Macros defined: LONGINT;USEOPENMP;MPIREDUCEMEM;STRUCDEN;SWIFTINTERFACE;USEMPI;USEHDF;USEPARALLELHDF;SINGLEPRECISION;LONGINT;SWIFTINTERFACE;HAVE_GSL22;USEOPENMP;USEOMP
> Libs: /apps/gsl/2.6/lib/libgsl.so;/apps/gsl/2.6/lib/libgslcblas.so;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi_cxx.so;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi.so;/apps/hdf5/1.10.5p/lib/ompi3/libhdf5_hl.a;/apps/szip/2.1.1/lib/libsz.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/apps/gsl/2.6/lib/libgsl.so;/apps/gsl/2.6/lib/libgslcblas.so
> C++ flags: -qopenmp -fPIC
> Link flags: -qopenmp
>
> -- Adding doc target for directories: /home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/doc;/home/571/cxp571/Codes/VELOCIraptor-STF/doc
> -- Configuring done
> -- Generating done
> -- Build files have been written to: /home/571/cxp571/Codes/VELOCIraptor-STF/build_swift
>
This results in the following error during compilation;
[ 50%] Building CXX object src/CMakeFiles/velociraptor.dir/endianutils.cxx.o
In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(226): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
LittleDouble_t=DoubleNoSwap;
^
In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(227): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
BigDouble_t=DoubleSwap;
^
In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(268): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
LittleDouble_t=DoubleSwap;
^
In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(269): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
BigDouble_t=DoubleNoSwap;
^
compilation aborted for /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/build.make:102: src/CMakeFiles/velociraptor.dir/endianutils.cxx.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:597: src/CMakeFiles/velociraptor.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
I can get past this point if I edit out lines 226, 227, 268, and 269 in endianutils.h:
LittleDouble_t=DoubleNoSwap;
BigDouble_t=DoubleSwap;
LittleDouble_t=DoubleSwap;
BigDouble_t=DoubleNoSwap;
As far as I can see, these aren't used anywhere else in the source code.
However, I then hit the issue,
[ 45%] Building CXX object src/CMakeFiles/velociraptor.dir/substructureproperties.cxx.o
/home/571/cxp571/Codes/VELOCIraptor-STF/src/substructureproperties.cxx(4886): error: cannot overload functions distinguished by return type alone
double CalcGravitationalConstant(Options &opt) {
^
/home/571/cxp571/Codes/VELOCIraptor-STF/src/substructureproperties.cxx(4890): error: cannot overload functions distinguished by return type alone
double CalcHubbleUnit(Options &opt) {
^
compilation aborted for /home/571/cxp571/Codes/VELOCIraptor-STF/src/substructureproperties.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/build.make:375: src/CMakeFiles/velociraptor.dir/substructureproperties.cxx.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:597: src/CMakeFiles/velociraptor.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
There is a mismatch between the definitions in substructureproperties.cxx and proto.h:
proto.h:Double_t CalcGravitationalConstant(Options &opt);
substructureproperties.cxx:double CalcGravitationalConstant(Options &opt) {
proto.h:Double_t CalcHubbleUnit(Options &opt);
substructureproperties.cxx:double CalcHubbleUnit(Options &opt) {
and so I have made the following changes;
substructureproperties.cxx:Double_t CalcGravitationalConstant(Options &opt) {
substructureproperties.cxx:Double_t CalcHubbleUnit(Options &opt) {
The code now compiles.
Can you please verify that you see similar behaviour, and that these changes are valid?
Describe the bug
The offsets stored in the SO list output seem not to be consistent with the SO sizes in the same file. My (possibly wrong) expectation is that the particle IDs belonging to SO i are stored in pIDs[offset[i] : offset[i]+size[i]], but that is not the case.
To Reproduce
The problem can be best illustrated through the following Python snippet that can be applied to any HDF5 SO list output:
import h5py
SOfile = h5py.File("<RANDOM SO_LIST FILE.hdf5>", "r")
ofs = SOfile["Offset"][:]
siz = SOfile["SO_size"][:]
pIDs = SOfile["Particle_IDs"][:]
if not ofs[-1]+siz[-1] == pIDs.shape[0]:
    print("Wrong final size ({0}=/={1})!".format(ofs[-1]+siz[-1], pIDs.shape[0]))
for i in range(1, ofs.shape[0]):
    if not ofs[i-1]+siz[i-1] == ofs[i]:
        print("Wrong offset ({0} {1}, {2} {3})!".format(ofs[i-1], siz[i-1], ofs[i], siz[i]))
        exit(1)
This will produce two error messages. The first one because the final offset is not separated from the end of the particle ID list by the size of the final SO, and the second because the next offset does not match the previous offset plus the previous size.
Expected behavior
ofs[i] == ofs[i-1]+siz[i-1]
Put differently, I would expect ofs to be equivalent to
ofs = numpy.cumsum(siz)
ofs[1:] = ofs[:-1]
ofs[0] = 0
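That exclusive prefix sum can be written and checked in one go (toy sizes, for illustration):

```python
import numpy as np

siz = np.array([3, 5, 2, 4])                      # toy SO sizes
ofs = np.concatenate(([0], np.cumsum(siz)[:-1]))  # exclusive prefix sum

# Every offset is the previous offset plus the previous size...
assert all(ofs[i] == ofs[i - 1] + siz[i - 1] for i in range(1, len(ofs)))
# ...and the last SO ends exactly at the end of the particle ID list.
assert ofs[-1] + siz[-1] == siz.sum()
print(list(ofs))  # [0, 3, 8, 10]
```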
Log files
Not applicable.
Environment (please complete the following information):
Irrelevant for this problem.
Additional context
I think the problem is situated in io.cxx:1450, where the value of SOpids[0] seems to be (incorrectly) omitted from the loop.
Describe the bug
Latest master. Configure with VR_USE_GAS but not VR_USE_HYDRO.
Error
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/buildandsortarrays.cxx:5:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/haloproperties.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.cxx:5:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/logging.h:7,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/bgfield.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/logging.h:7,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/hdfio.cxx:26:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/fofalgo.cxx:5:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/io.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
| aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/gadgetio.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
2630 | aperture_M_gas_highT[i]*=opt.h;
| ^~~~~~~~~~~~~~~~~~~~
Should be easy to fix; most likely an incorrect #ifdef choice in these files when treating the newly added quantities from #57.
When using the inputs and configuration from #87, and after running with the fix for the original issue, VR crashes with the following problem:
[0000] [1123.397] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0001] [1123.397] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0003] [1123.403] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0004] [1123.395] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0005] [1123.395] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0006] [1123.403] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0007] [1123.403] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0002] [1123.409] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
HDF5-DIAG: Error detected in HDF5 (1.8.20) MPI-process 2:
#000: H5Dio.c line 322 in H5Dwrite(): can't prepare for writing data
major: Dataset
minor: Write failed
#001: H5Dio.c line 403 in H5D__pre_write(): can't write data
major: Dataset
minor: Write failed
#002: H5Dio.c line 846 in H5D__write(): can't write data
major: Dataset
minor: Write failed
#003: H5Dmpio.c line 527 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
major: Low-level I/O
minor: Write failed
#004: H5Dmpio.c line 1397 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
major: Low-level I/O
minor: Can't get value
#005: H5Dmpio.c line 1441 in H5D__final_collective_io(): optimized write failed
major: Dataset
minor: Write failed
#006: H5Dmpio.c line 295 in H5D__mpio_select_write(): can't finish collective parallel write
major: Low-level I/O
minor: Write failed
#007: H5Fio.c line 169 in H5F_block_write(): write through metadata accumulator failed
major: Low-level I/O
minor: Write failed
#008: H5Faccum.c line 823 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 572724, size=18446744071687627232, eoa=12286439660
major: Invalid arguments to routine
minor: Address overflowed
HDF5-DIAG: Error detected in HDF5 (1.8.20) MPI-process 7:
#000: H5Dio.c line 322 in H5Dwrite(): can't prepare for writing data
major: Dataset
minor: Write failed
#001: H5Dio.c line 403 in H5D__pre_write(): can't write data
major: Dataset
minor: Write failed
#002: H5Dio.c line 846 in H5D__write(): can't write data
major: Dataset
minor: Write failed
#003: H5Dmpio.c line 527 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
major: Low-level I/O
minor: Write failed
#004: H5Dmpio.c line 1397 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
major: Low-level I/O
minor: Can't get value
#005: H5Dmpio.c line 1441 in H5D__final_collective_io(): optimized write failed
major: Dataset
minor: Write failed
#006: H5Dmpio.c line 295 in H5D__mpio_select_write(): can't finish collective parallel write
major: Low-level I/O
minor: Write failed
#007: H5Fio.c line 169 in H5F_block_write(): write through metadata accumulator failed
major: Low-level I/O
minor: Write failed
#008: H5Faccum.c line 823 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 572724, size=18446744072925879424, eoa=12286439660
major: Invalid arguments to routine
minor: Address overflowed
Failed to write dataset: Particle_IDs
Failed to write dataset: Particle_IDs
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 7
Describe the bug
I am trying to run stf stand-alone on a DMO snapshot, using the zoom configuration. The process fails with the error
0 Beginning substructure search
Error, net size 0 with row,col=0,0
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append
Aborted
The same behaviour also appears when running stf on the fly with SWIFT. Note: SWIFT itself completed the run successfully, and the snapshots do not seem to be corrupted in any apparent way (I checked for existing datasets and dataset shapes).
To Reproduce
Steps to reproduce the behavior:
/cosma/home/dp004/dc-alta2/data7/xl-zooms/dmo/L0300N0564_VR93 on Cosma 7.
module purge
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5
../VELOCIraptor-STF/stf -I 2 -i snapshots/L0300N0564_VR93_0199 -o L0300N0564_VR93_0199 -C config/vr_config_zoom_dmo.cfg
0 Beginning substructure search
Error, net size 0 with row,col=0,0
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append
Aborted
Expected behavior
Given the arguments parsed, I expected the usual output files to be generated in the pwd (e.g. the L0300N0564_VR93_0199.properties file).
Log files
Logs can be displayed to the console, but they are also available in the $(pwd)/stf directory.
Environment (please complete the following information):
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5
**Additional context**
I also tried running with higher verbosity in the `.cfg` file, but no further info is shown.
Thanks in advance for your help!
Describe the bug
This is the next step in the leak-finding exercise. Now running the stand-alone code on larger boxes to identify problems in the sub-structure search.
I get a segfault when running the code with `-O0 -fsanitize=address` using GCC 10.
To Reproduce
Steps to reproduce the behavior:
Use /cosma7/data/dp004/jlvc76/SWIFT/master/swiftsim/examples/EAGLE_DMO_low_z/EAGLE_DMO_50/vrconfig_3dfof_subhalos_SO_hydro.cfg (i.e. our standard EAGLE setup but with hydro switched off).
stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i /cosma7/data/dp004/jlvc76/SWIFT/master/swiftsim/examples/EAGLE_ICs/EAGLE_25/eagle_0036 -o haloes -I 2
(The input is a very standard box.)
**Crash**
We get this error:
==244350==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000030 (pc 0x00000068f8fa bp 0x7fff6ff33a00 sp 0x7fff6ff32e80 T0)
==244350==The signal is caused by a READ memory access.
==244350==Hint: address points to the zero page.
#0 0x68f8fa in GetSOMasses(Options&, long long, NBody::Particle*, long long, long long*&, PropData*&) /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/substructureproperties.cxx:3583
#1 0x6a1709 in SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/substructureproperties.cxx:5100
#2 0x4731bd in main /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/main.cxx:530
#3 0x7f8ea5371554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#4 0x46e7c8 (/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf+0x46e7c8)
If I configure with OpenMP, I get a crash at basically the same place:
==72314==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000050 (pc 0x0000006f8328 bp 0x7fe48e4ebdd0 sp 0x7fe48e4eb5e0 T105)
==72314==The signal is caused by a READ memory access.
==72314==Hint: address points to the zero page.
AddressSanitizer:DEADLYSIGNAL
#0 0x6f8328 in GetSOMasses(Options&, long long, NBody::Particle*, long long, long long*&, PropData*&) [clone ._omp_fn.1] /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/substructureproperties.cxx:3583
#1 0x7fe62f2e6a05 in gomp_thread_start ../../../libgomp/team.c:123
#2 0x7fe62eea0ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
#3 0x7fe62ebc996c in clone (/lib64/libc.so.6+0xfe96c)
I don't quite know what to make of this yet. If I switch off the sanitizer, the code runs happily. It may hence be a false positive, but cleaning this up might help the run get through and identify genuine leaks.
What is also interesting is that it's happening in a section of code related to our good friend the SO_xxx properties (#62).