velociraptor-stf's People

Contributors

cdplagos, jborrow, jchelly, jesmigel, matthieuschaller, nikthecosmoguy, pelahi, rhyspoulton, rtobar

velociraptor-stf's Issues

VR does not run in MPI+hydro mode

Describe the bug

Running VR with MPI switched on, OMP switched off and hydro switched on breaks on EAGLE boxes.

To Reproduce

  1. Compile the latest master (cb4336d) with VR_USE_HYDRO=ON, VR_MPI=ON and VR_OPENMP=OFF.
  2. Run mpirun -np 16 stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i eagle_0036 -o halos_mpi_0036 -I 2

This is a standard XL snapshot with our standard config file.
The code segfaults after printing

[0000] [ 346.430] [ info] search.cxx:352 Finished linking across MPI domains in 1.258 [min]

somewhere in an MPI call trying to clear a vector of size 10^16 (!!).

The input can be found here if necessary: /snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_ICs/EAGLE_25/eagle_0036.hdf5 and /snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_ICs/EAGLE_25/vrconfig_3dfof_subhalos_SO_hydro.cfg.

The same setup works in either of the following cases:

  • On 1 MPI rank
  • Without VR_USE_HYDRO

Note, in case it is relevant, that the snapshot consists of a single file.

Get an option for mass-weighted aperture-based extra fields

(duplicated from the public repo)

From the config file we can currently request (among others):

Aperture totals
Average mass weighted

It would be great if we could add an "aperture mass-weighted average" to the list. This would then let us trivially construct HI mass functions, for instance.
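As a sketch of the requested quantity: an aperture mass-weighted average is just sum(q_i * m_i) / sum(m_i) over the particles inside the aperture. The function name below is illustrative, not VR's API:

```cpp
#include <cstddef>
#include <vector>

// Mass-weighted average of a per-particle quantity q over the particles
// (already selected to lie inside the aperture), weighted by their masses m.
inline double ApertureMassWeightedAverage(const std::vector<double> &q,
                                          const std::vector<double> &m) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < q.size() && i < m.size(); i++) {
        num += q[i] * m[i];  // accumulate q_i * m_i
        den += m[i];         // accumulate total mass in the aperture
    }
    return den > 0.0 ? num / den : 0.0;  // guard against empty apertures
}
```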

Uninitialised gas SFR threshold

While reading the code I found that Option.gas_sfr_threshold is never initialised: it doesn't have a default value (so it cannot be assumed to take a fixed value at startup), and there's no place in the code that writes into it (either from the command line, the configuration file, or the input data files). On the other hand, the value is used in several places in the code, particularly in a few of the functions in substructureproperties.cxx.

$> grep -RIn gas_sfr_threshold src/
src/substructureproperties.cxx:616:                if (SFR>opt.gas_sfr_threshold) pdata[i].M_gas_sf+=mval;
src/substructureproperties.cxx:641:                if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:693:                if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:871:                    if (SFR>opt.gas_sfr_threshold){
src/substructureproperties.cxx:1473:                if (SFR>opt.gas_sfr_threshold) pdata[i].M_gas_sf+=mval;
src/substructureproperties.cxx:1546:                if (SFR > opt.gas_sfr_threshold) {
src/substructureproperties.cxx:1754:                    if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:1809:            if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:5637:            if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:5841:            if (SFR>opt.gas_sfr_threshold) EncMassGasSF+=mass;
src/substructureproperties.cxx:5891:            if (SFR>opt.gas_sfr_threshold) oldrc_gas_sf=rc;
src/substructureproperties.cxx:5993:                if (SFR>opt.gas_sfr_threshold) {
src/substructureproperties.cxx:6071:                if (SFR>opt.gas_sfr_threshold) EncMassGasSF+=mass;
src/substructureproperties.cxx:6116:                if (SFR>opt.gas_sfr_threshold) oldrc_gas_sf=rc;
src/substructureproperties.cxx:6202:        if (Pval->GetSFR()>opt.gas_sfr_threshold)
src/substructureproperties.cxx:6238:        if (sfrval>opt.gas_sfr_threshold)
src/substructureproperties.cxx:6274:        if (Pval->GetSFR()>opt.gas_sfr_threshold)
src/substructureproperties.cxx:6311:        if (sfrval>opt.gas_sfr_threshold)
src/allvars.h:775:    Double_t gas_sfr_threshold;

Given the variable is uninitialised, this potentially means some results (basically whatever happens within those ifs) cannot be guaranteed to be consistent across different compilations/settings/platforms/etc.

The fix should be easy: firstly, make sure the variable has a default value, and secondly, we'll probably need a way to assign arbitrary values for this threshold.

@MatthieuSchaller, I have two questions about this:

  • Is this something that might explain some of the issues found lately (#75, #78)?
  • I assume 0 is a good default value, so I'll go with that. For setting arbitrary values: do you reckon they are supposed to be given via the configuration file (via a new setting), or should they somehow be computed from the input data?
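A minimal sketch of that two-part fix, assuming a 0 default and a new configuration entry; the struct, function, and option names below are illustrative, not VR's actual ones:

```cpp
#include <cstdlib>
#include <string>

// Illustrative stand-in for the Options struct in allvars.h: give the
// threshold an in-class default so it is never read uninitialised.
struct OptionsSketch {
    double gas_sfr_threshold = 0.0;  // default: any positive SFR is star-forming
};

// Hypothetical config-parser hook: map a new configuration-file entry onto
// the option, leaving the default untouched when the entry is absent.
inline void SetSFRThreshold(OptionsSketch &opt, const std::string &key,
                            const std::string &value) {
    if (key == "Gas_star_forming_SFR_threshold")
        opt.gas_sfr_threshold = std::atof(value.c_str());
}
```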

Strange units and values for `tage_star`

(copied over from the public github repo)

There seems to be something not quite correct with the field tage_star in the output catalogues.

The data is in the range 10^8 - 10^9 but the units are reported as being internal time units, which would typically be something like 9*10^9 years already.

I don't know whether the issue is in the calculation of the values or whether the units displayed are incorrect.

Also, just to be sure, how are the ages defined? Is there any weighting of any kind for instance?

Hot gas properties compilation errors

Compilation of the hot_gas_properties branch fails with the following errors:

/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx(593): error: class "NBody::Particle" has no member "GetTemperature"
                  temp = Pval->GetTemperature();
                               ^

/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx(1418): error: class "NBody::Particle" has no member "GetTemperature"
                temp=Pval->GetTemperature();
                           ^

/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx(5140): error: class "NBody::Particle" has no member "GetTemperature"
        temp = Pval->GetTemperature()*mass;
                     ^

/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/hdfio.cxx(1609): error: class "NBody::Particle" has no member "SetTemperature"
                      for (int nn=0;nn<nchunk;nn++) Part[count++].SetTemperature(doublebuff[nn]);
                                                                  ^

/cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/hdfio.cxx(1627): error: class "NBody::Particle" has no member "SetTemperature"
                        for (int nn=0;nn<nchunk;nn++) Pbaryons[bcount++].SetTemperature(doublebuff[nn]);
                                                                         ^

compilation aborted for /cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/substructureproperties.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/substructureproperties.cxx.o] Error 2
make[2]: *** Waiting for unfinished jobs....
compilation aborted for /cosma/home/dp004/dc-alta2/data7/xl-zooms/hydro/VELOCIraptor-STF/src/hdfio.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/hdfio.cxx.o] Error 2
make[1]: *** [src/CMakeFiles/velociraptor.dir/all] Error 2
make: *** [all] Error 2

I am using a fresh and updated clone of the repo and compiling with the following modules on cosma7:

module purge
module load cmake/3.18.1
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5

I am using the following cmake flags:

  cmake . -DVR_USE_HYDRO=ON \
    -DVR_USE_SWIFT_INTERFACE=OFF \
    -DCMAKE_CXX_FLAGS="-fPIC" \
    -DCMAKE_BUILD_TYPE=Release \
    -DVR_ZOOM_SIM=ON \
    -DVR_MPI=OFF
  make -j

Thank you in advance for looking into this! Please, let me know if I can provide further info for understanding the issue.

Discussion: How best to add cold gas props.

I am looking at adding H_2 and HI masses in the catalogs, and especially to the aperture measurements.

Some advice would be nice here.

I could:

  • Add a H_2 mass to the Particle object and just fill it in when reading in the snapshot. This is made of a combination of snapshot fields (mass * H_frac * H2_frac * 2).
  • Add a species map<> to the HydroProperties in the Particle object and store in there the H2_frac. Then when summing the mass of H2 for a given structure compute H2_mass from the species and already existing mass and H element fraction.

In spirit it's similar to the SFR we accumulate, but it is more specific to SWIFT-EAGLE++.

@pelahi any thoughts on what the best choice would be to keep things tidy before I start typing?
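For what it's worth, the second option above could look something like this minimal sketch; all names are illustrative, not VR's actual API:

```cpp
#include <map>
#include <string>

// Derive the H2 mass on the fly from the particle mass, the hydrogen element
// fraction, and an H2 species fraction stored in a species map<> on the
// particle's hydro properties (option 2 in the list above).
inline double H2Mass(double mass, double h_frac,
                     const std::map<std::string, double> &species) {
    auto it = species.find("H2");
    if (it == species.end()) return 0.0;
    // mass * H_frac * H2_frac * 2, the snapshot-field combination given above
    return mass * h_frac * it->second * 2.0;
}
```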

VR crashes when trying to output extra properties

As mentioned in a couple of comments in #15, when trying to write data for extra properties (gas/bh/stars) VR crashes. This problem happens not only for BH or star data, but also for gas data.

To reproduce follow the same steps outlined in #15. Locally I'm running an OpenMP-disabled build with the following command line:

./stf -C ~/icrar/vr/EAGLE-XL/vrconfig_3dfofbound_subhalos_SO_hydro.cfg -i ~/icrar/vr/EAGLE-XL/colibre_2729 -o output -I 2 -s 16

In this example I have only the following extra properties on:

Gas_internal_property_names=ElementMassFractions,SpeciesFractions,SpeciesFractions,SpeciesFractions,
Gas_internal_property_index_in_file=0,0,1,2,
Gas_internal_property_input_output_unit_conversion_factors=1.0,1.0,1.0,1.0
Gas_internal_property_calculation_type =averagemassweighted,averagemassweighted,averagemassweighted,averagemassweighted,
Gas_internal_property_output_units=unitless,unitless,unitless,unitless,

Almost at the end of the VR execution the program fails:

[...]
0 Sort particles and compute properties of 5502 objects 
0 Calculate properties using minimum potential particle as reference 
0 Sort particles by binding energy
Memory report, func = SortAccordingtoBindingEnergy--line--4661 task = 0 : Average = 6.480034 GB, Data = 6.343040 GB, Dirty = 0.000000 GB, Library = 0.000000 GB, Peak = 6.958450 GB, Resident = 6.329998 GB, Shared = 0.004063 GB, Size = 6.531044 GB, Text = 0.002087 GB, 
0 getting CM
0 Done getting CM in 0.095405
Done FOF masses 0.000417
0 getting energy
0 Have calculated potentials 208.703257
0Done getting energy in 208.885781
0 getting bulk properties
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length

Thread 1 "stf" received signal SIGSEGV, Segmentation fault.
__libc_signal_block_app (set=0x7fffffffaad8) at ../sysdeps/unix/sysv/linux/internal-signals.h:75
75      ../sysdeps/unix/sysv/linux/internal-signals.h: No such file or directory.
(gdb) bt
#0  __libc_signal_block_app (set=0x7fffffffaad8) at ../sysdeps/unix/sysv/linux/internal-signals.h:75
#1  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:40
#2  0x00007ffff7118859 in __GI_abort () at abort.c:79
#3  0x00007ffff739e951 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff73aa47c in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff73aa4e7 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff73aa799 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff739e426 in __cxa_throw_bad_array_new_length () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x0000555555590b5a in Math::GMatrix::operator= (m=<error reading variable: Cannot access memory at address 0xffffffff7fffffff>, this=0x80000000) at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:5906
#9  CalcPhaseSigmaTensor (n=<optimised out>, p=<optimised out>, I=..., itype=<optimised out>) at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:3718
#10 0x00005555556fa7c2 in GetProperties (opt=..., nbodies=<optimised out>, Part=0x7fff69434010, ngroup=5502, pfof=<optimised out>, numingroup=@0x7fffffffb398: 0x5556607bcb90, pdata=@0x7fffffffb390: 0x5556554b8ed8, noffset=@0x7fffffffb3a8: 0x555660396210)
    at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:1050
#11 0x000055555570396f in SortAccordingtoBindingEnergy (opt=..., nbodies=13289344, Part=0x7fff69434010, ngroup=5502, pfof=@0x7fffffffb508: 0x7fff5c96c010, numingroup=<optimised out>, pdata=<optimised out>, ioffset=0) at /home/rtobar/scm/git/VELOCIraptor-STF/src/substructureproperties.cxx:4671
#12 0x000055555559efc5 in main (argc=<optimised out>, argv=<optimised out>) at /home/rtobar/scm/git/VELOCIraptor-STF/src/main.cxx:516

So far the only detail I can add is that in my environment this does not occur when using low levels of optimization (cmake -DCMAKE_BUILD_TYPE=Debug). Because of this it's getting a bit difficult to debug: when running with optimizations on, many variables/stack frames are not properly visible or get mixed up, but when building for debugging the problem goes away. I'm currently trying to find a middle ground that lets me get more information.

Aperture quantities for extra quantity broken for `aperture_total`

Describe the bug
Latest master.

Adding

Gas_internal_property_names=XXX,
Gas_internal_property_index_in_file=0,
Gas_internal_property_input_output_unit_conversion_factors=1.0e10,
Gas_internal_property_calculation_type=aperture_total,
Gas_internal_property_output_units=solar_mass,

for any existing field XXX breaks: the code does not read in the field and dies rapidly when trying to write the config.

If instead I use

Gas_internal_property_calculation_type=max,                                                                                                                                                                         

then everything works. The same holds for all the other options; only aperture_total and aperture_average break in this way.

If I change the value of CALCQUANTITYAPERTURETOTAL to 19 in allvars.h (was -1) then the fields get read in.
(That whole section feels like it should really be an enum type!)

I don't know whether that is a proper fix though as further down the line, the extra calculation type is used modulo some quantity (e.g. in ExtraPropInitValue() from substructureproperties.cxx).
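For reference, the enum suggested above might look like the following sketch. The names and values are illustrative, not VR's actual constants; the only constraint taken from the observation above is that aperture types get non-negative values so they survive the modulo-style dispatch:

```cpp
// Illustrative replacement for the calculation-type constants in allvars.h.
enum class CalcQuantity : int {
    Total = 0,
    Average = 1,
    AverageMassWeighted = 2,
    Max = 3,
    Min = 4,
    ApertureTotal = 19,    // non-negative, cf. the -1 value that breaks reading
    ApertureAverage = 20,
};

// With distinct named values, dispatch like ExtraPropInitValue() can test the
// type explicitly instead of relying on modulo arithmetic over magic numbers.
inline bool IsApertureType(CalcQuantity c) {
    return c == CalcQuantity::ApertureTotal || c == CalcQuantity::ApertureAverage;
}
```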

Won't compile with HDF5 1.12 (with proposed fix).

This is an issue on my MacBook Pro after upgrading to HDF5 1.12. The compiler is clang 12. It appears that an additional parameter is now required by H5Oget_info and H5Oget_info_by_name.

After the most basic invocation of cmake (to make sure the code compiled after updating a number of libraries), I ran make and hit the following error:

[ 45%] Building CXX object src/CMakeFiles/velociraptor.dir/hdfio.cxx.o
In file included from /Users/cpower/Codes/VELOCIraptor-STF/src/hdfio.cxx:28:
/Users/cpower/Codes/VELOCIraptor-STF/src/hdfitems.h:161:9: error: no matching
function for call to 'H5Oget_info_by_name3'
H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, ...
^~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/hdf5/1.12.0_1/include/H5version.h:772:31: note: expanded from
macro 'H5Oget_info_by_name'
#define H5Oget_info_by_name H5Oget_info_by_name3
^~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/hdf5/1.12.0_1/include/H5Opublic.h:188:15: note: candidate
function not viable: requires 5 arguments, but 4 were provided
H5_DLL herr_t H5Oget_info_by_name3(hid_t loc_id, const char *name, H5O_i...

I fixed this by updating the instances in hdfitems.h - namely, at line 161, changing

H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, H5P_DEFAULT);
to

H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, H5O_INFO_ALL, H5P_DEFAULT);

and at line 203, changing

H5Oget_info(id, &object_info);

to

H5Oget_info(id, &object_info,H5O_INFO_ALL);

I used the following to identify the appropriate fixes:
https://portal.hdfgroup.org/display/HDF5/H5O_GET_INFO_BY_NAME3
https://portal.hdfgroup.org/display/HDF5/H5O_GET_INFO3

This hasn't affected versions of VR running on e.g. NCI Gadi - there the version of HDF5 is still 1.10.
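The two call-site fixes can be combined with HDF5's own H5_VERSION_GE macro so that the code still builds against 1.10 (e.g. on NCI Gadi). The version macros are stubbed below so the sketch compiles standalone; in VR they come from the HDF5 headers (H5public.h):

```cpp
// Stand-ins for HDF5's version macros, stubbed here at 1.12.0 so this sketch
// is self-contained; H5_VERSION_GE itself is a real HDF5 macro.
#define H5_VERS_MAJOR 1
#define H5_VERS_MINOR 12
#define H5_VERS_RELEASE 0
#define H5_VERSION_GE(Maj, Min, Rel)                                           \
    (((H5_VERS_MAJOR == Maj) && (H5_VERS_MINOR == Min) && (H5_VERS_RELEASE >= Rel)) || \
     ((H5_VERS_MAJOR == Maj) && (H5_VERS_MINOR > Min)) || (H5_VERS_MAJOR > Maj))

// The call sites in hdfitems.h could then be guarded like this:
//
// #if H5_VERSION_GE(1, 12, 0)
//     H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info,
//                         H5O_INFO_ALL, H5P_DEFAULT);
//     H5Oget_info(id, &object_info, H5O_INFO_ALL);
// #else
//     H5Oget_info_by_name(ids.back(), parts[0].c_str(), &object_info, H5P_DEFAULT);
//     H5Oget_info(id, &object_info);
// #endif

inline bool hdf5_is_1_12_or_newer() { return H5_VERSION_GE(1, 12, 0); }
```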

std::bad_alloc in WriteSOCatalog

VR is crashing in WriteSOCatalog with the following message:

[0000] [1445.514] [ info] io.cxx:1292 Saving SO particle lists to halos-2.catalog_SOlist.0
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/var/slurm/slurmd/job2998665/slurm_script: line 4: 16915 Aborted             builds/53/stf -i /cosma/home/dc-borr1/c7dataspace/XL_wave_1/runs_correct_dt/Run_0/eagle_0000 -C /cosma/home/dc-borr1/c7dataspace/XL_wave_1/runs_correct_dt/vrconfig_3dfof_subhalos_SO_hydro.cfg -I 2 -o halos-2

This feels like a bug introduced with #57, but it will require a bit of investigation.

More potential issues hidden in MPISendReceive*InfoBetweenThreads functions

In #87 it was found that the four MPISendReceiveFOF*InfoBetweenThreads functions all had a bug in their logic (duplicated across all four) where an input buffer was not sized correctly for the amount of data it received via MPI. A fix was issued that both solved the buffer-sizing issue and removed the code duplication by providing a single function that performs the data exchange. It was also found that #54 had already fixed one of those functions, but had failed to identify the broader problem affecting all four.

After fixing #87 I went and had another look at the rest of the functions in this file (mpiroutines.cxx). I realised there are several MPISendReceive*InfoBetweenThreads families of functions, namely:

  • MPISendReceive<component>InfoBetweenThreads
  • MPISendReceiveBuffWith<component>InfoBetweenThreads
  • MPISendReceiveFOF<component>InfoBetweenThreads

(<component> are Hydro, Star, BH and ExtraDM)

From this list, the last item corresponds to the functions fixed in #87. The rest, however, seem to follow a similar structure, and from a quick overview they also contain a copy of the same data-exchange pattern (some with, and some without, the same buffer-sizing bug) that was fixed and consolidated into a reusable function in #87. We should revisit these functions and try to reuse the new function where possible, thus minimizing the chances of running into memory corruption issues again.
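The shape of the consolidated exchange is sketched below, with the MPI calls replaced by a plain in-process copy purely to keep the sketch self-contained. The point it illustrates is the invariant the real function must keep: the receive buffer is sized from the transmitted count before any payload lands in it, which is exactly what the buggy copies failed to do:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative stand-in for the reusable exchange introduced in #87.
// In the real code, steps 1 and 3 are MPI_Sendrecv calls between ranks.
template <typename T>
void exchange_info(const std::vector<T> &send, std::vector<T> &recv) {
    // Step 1: exchange element counts first (a single MPI_Sendrecv of a size).
    std::size_t incoming = send.size();
    // Step 2: size the receive buffer from the *received* count, never a guess.
    recv.resize(incoming);
    // Step 3: exchange the payload, now guaranteed to fit the buffer.
    if (incoming > 0)
        std::memcpy(recv.data(), send.data(), incoming * sizeof(T));
}
```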

Memory leaks when running on-the-fly in SWIFT

Describe the bug
There appear to be some memory leaks when running VR via on-the-fly calls within SWIFT.
I have been using GCC's address sanitizer to get more info about possible allocations
that are not being freed.

To Reproduce
Steps to reproduce the behavior:

  1. Get latest VR master and SWIFT master
  2. Compile VR using GCC with -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" and the additional CFLAG -fsanitize=address
  3. Compile SWIFT and link VR to it. Note that line 1283 of configure.ac needs -fsanitize=address added.
  4. Run examples/SmallCosmoVolume/SmallCosmoVolume_VELOCIraptor/run.sh
    (Can also remove the --hydro in the run.sh to reduce the problem to a pure gravity-only run)

Results
Without MPI, the run completes and the leak sanitizer reports this:

=================================================================
==169982==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 344 byte(s) in 2 object(s) allocated from:
    #0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
    #1 0x564ec73ad936 in BuildPGList(long long, long long, long long*, long long*, NBody::Particle*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/buildandsortarrays.cxx:71
    #2 0x564ec746f09d in SearchSubset(Options&, long long, long long, NBody::Particle*, long long&, long long, long long*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:1400
    #3 0x564ec748e1df in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2991
    #4 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
    #5 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
    #6 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
    #7 0x564ec70dce49 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:453

Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
    #1 0x564ec73ad936 in BuildPGList(long long, long long, long long*, long long*, NBody::Particle*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/buildandsortarrays.cxx:71
    #2 0x564ec746f09d in SearchSubset(Options&, long long, long long, NBody::Particle*, long long&, long long, long long*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:1400
    #3 0x564ec748e1df in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2991
    #4 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
    #5 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
    #6 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
    #7 0x564ec70dced4 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:400

Direct leak of 128 byte(s) in 8 object(s) allocated from:
    #0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
    #1 0x564ec74ba063 in CleanAndUpdateGroupsFromSubSearch(Options&, long long&, NBody::Particle*, long long*&, long long&, long long*&, long long**&, long long&, long long*&, long long*&, long long&, long long&) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2772
    #2 0x564ec748e383 in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2993
    #3 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
    #4 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
    #5 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
    #6 0x564ec70dced4 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:400

Direct leak of 80 byte(s) in 5 object(s) allocated from:
    #0 0x7fbf1aad5b47 in operator new[](unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
    #1 0x564ec74ba063 in CleanAndUpdateGroupsFromSubSearch(Options&, long long&, NBody::Particle*, long long*&, long long&, long long*&, long long**&, long long&, long long*&, long long*&, long long&, long long&) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2772
    #2 0x564ec748e383 in SearchSubSub(Options&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, PropData*) /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/search.cxx:2993
    #3 0x564ec71b164d in InvokeVelociraptorHydro /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:537
    #4 0x564ec71adf23 in InvokeVelociraptor /home/matthieu/Desktop/VELOCIraptor/VELOCIraptor-STF/src/swiftinterface.cxx:283
    #5 0x564ec711c969 in velociraptor_invoke /home/matthieu/Desktop/Swift-git/io/swiftsim/src/velociraptor_interface.c:1018
    #6 0x564ec70dce49 in engine_check_for_dumps /home/matthieu/Desktop/Swift-git/io/swiftsim/src/engine_io.c:453

(And also two SWIFT-related calls which are not leaks. These are just not easy to free so we don't clean up at the end of a run.)

Not much is leaked here by the VR invocations, but it could nevertheless be a sign of a larger problem in bigger runs.

I don't know enough about these code sections in VR to know whether these are big problems or not.
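One common way to make leaks like the ones reported above structurally impossible is to own the new[]'d group lists through std::unique_ptr, so every exit path from the search routines releases them rather than relying on matching delete[] calls. A sketch, with types and names that are illustrative rather than VR's:

```cpp
#include <memory>

// Illustrative owning alias for a group-particle list such as the one
// allocated with new[] in BuildPGList (buildandsortarrays.cxx:71).
using PGList = std::unique_ptr<long long[]>;

// The array is freed automatically when the owner goes out of scope,
// on every return path and on exceptions alike.
inline PGList BuildListSketch(long long n) {
    PGList list(new long long[n]);
    for (long long i = 0; i < n; i++) list[i] = i;  // fill with dummy indices
    return list;  // ownership moves to the caller, still leak-free
}
```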

Possible problem with M_200_star

I am running VR (version 1.60) on low-resolution SWIFT runs; this particular box is (100 Mpc)^3 with 180^3 particles (M_dm = 5.45e9, M_gas = 1.02e9). When analysing one of the observables we are interested in, namely the baryon/gas fractions of halos, the baryon fraction was a lot higher than expected.

Upon further inspection it appears the values obtained for M_200_crit_star (and possibly also M_200_mean_star, M_500_star, etc.) are systematically higher than would be expected.
To show this I compared the M_200 obtained from VR for the different components with the naive method of summing the mass of all particles of that component found within the R_200_crit obtained from VR, as can be seen in the figure below.
[figure: MvsM]
For DM the results show some difference but are relatively close to the black line (DM halos are also a lot more likely to be sufficiently sampled). For stars, and to a much lesser extent also the gas, the values obtained from the sum are systematically lower. For gas the difference might just be the systematics of my naive method, but for stars the difference is very large.

Here are some examples of halos with very high baryons fractions and large differences between the sum of stellar mass in R_200_crit and M_200_crit_star
[figure: High_SM_halo_1]
log(M_200) [Msun] of this halo: 12.182202934684451
log(M_200_gas) [Msun] of this halo: 11.629046093260513
log(M_200_stars) [Msun] of this halo: 11.64550349357618
r_200 [kpc] of this halo: 237.24077771516048
Dark matter mass (Sum) [log10 Msun] = 12.249682
Gas mass (Sum) [log10 Msun] = 11.431889
Stellar mass (Sum) [log10 Msun] = 11.05762
[figure: High_SM_halo_2]
log(M_200) [Msun] of this halo: 11.282095177295409
log(M_200_gas) [Msun] of this halo: 10.755906558809153
log(M_200_stars) [Msun] of this halo: 10.783129379073438
r_200 [kpc] of this halo: 118.89221547906023
Dark matter mass (Sum) [log10 Msun] = 11.440651
Gas mass (Sum) [log10 Msun] = 10.576609
Stellar mass (Sum) [log10 Msun] = 9.912582

If you need more information please let me know.
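The naive cross-check described above amounts to the following sketch; the struct and function names are illustrative:

```cpp
#include <cmath>
#include <vector>

// Minimal particle record for the check: position and mass of one component.
struct HaloParticle { double x, y, z, mass; };

// Sum the mass of every particle of one component that lies within
// R_200_crit of the halo centre (cx, cy, cz).
inline double MassWithinR200(const std::vector<HaloParticle> &parts,
                             double cx, double cy, double cz, double r200) {
    double m = 0.0;
    for (const auto &p : parts) {
        const double dx = p.x - cx, dy = p.y - cy, dz = p.z - cz;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) <= r200)
            m += p.mass;
    }
    return m;
}
```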

Compiling with -DVR_USE_GAS=ON but no other options doesn't work

Describe the bug
Compiling with VR_USE_GAS=ON (but without the rest of the options implied by VR_USE_HYDRO=ON) doesn't work.

To Reproduce
Latest master.
cmake .. -DVR_USE_GAS=ON
make -j

Expected behavior
The build works.

Log files

                 from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/bgfield.cxx(7):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
  		aperture_M_gas_highT[i]*=opt.h;
  		^

In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
                 from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/haloproperties.cxx(9):
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
                 from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/fofalgo.cxx(5):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
  		aperture_M_gas_highT[i]*=opt.h;
  		^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
  		aperture_M_gas_highT[i]*=opt.h;
  		^

In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/logging.h(7),
                 from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/mpihdfio.cxx(9):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
  		aperture_M_gas_highT[i]*=opt.h;
  		^

In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
                 from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/mpigadgetio.cxx(9):
In file included from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/stf.h(8),
                 from /cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/mpiramsesio.cxx(9):
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/allvars.h(2488): error: identifier "aperture_M_gas_highT" is undefined
  		aperture_M_gas_highT[i]*=opt.h;
  		^

[the same error at allvars.h(2488), `error: identifier "aperture_M_gas_highT" is undefined`, repeats for every translation unit that includes allvars.h: ramsesio.cxx, logging.cxx, localbgcomp.cxx, exceptions.cxx, hdfio.cxx, substructureproperties.cxx, allvars.cxx, mpiroutines.cxx, nchiladaio.cxx, mpinchiladaio.cxx, omproutines.cxx, unbind.cxx, mpitipsyio.cxx, swiftinterface.cxx, localfield.cxx, io.cxx, mpivar.cxx, gadgetio.cxx, tipsyio.cxx, buildandsortarrays.cxx, utilities.cxx, ui.cxx, search.cxx]
/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3106): error: class "NBody::Particle" has no member "GetSFR"
  		   auto sfr = Pval->GetSFR();
  		                    ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3119): error: class "NBody::Particle" has no member "GetZmet"
                         pdata[i].Z_mean_gas_highT_incl += massval * Pval->GetZmet();
                                                                           ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3171): error: class "PropData" has no member "M_gas_nsf"
  	    pdata[hostindex].M_gas_nsf_incl += pdata[i].M_gas_nsf;
  	                                                ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3172): error: class "PropData" has no member "M_gas_sf"
  	    pdata[hostindex].M_gas_sf_incl += pdata[i].M_gas_sf;
  	                                               ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3189): error: class "PropData" has no member "M_gas_nsf"
              pdata[i].M_gas_nsf_incl += pdata[i].M_gas_nsf;
                                                  ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3190): error: class "PropData" has no member "M_gas_sf"
              pdata[i].M_gas_sf_incl += pdata[i].M_gas_sf;
                                                 ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3415): error: class "NBody::Particle" has no member "GetSFR"
                 sfr[j] = Part[taggedparts[j]].GetSFR();
                                               ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3417): error: class "NBody::Particle" has no member "GetZmet"
                 Zgas[j] = Part[taggedparts[j]].GetZmet();
                                                ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3483): error: class "NBody::Particle" has no member "GetSFR"
                              sfr[offset+j] = PartDataGet[taggedparts[j]].GetSFR();
                                                                          ^

/cosma/home/dc-borr1/c6dataspace/SPHENIX_Tests/swiftsim/curl_3d/examples/nIFTyCluster/Baryonic/VELOCIraptor-STF/src/substructureproperties.cxx(3485): error: class "NBody::Particle" has no member "GetZmet"
                              Zgas[offset+j] = PartDataGet[taggedparts[j]].GetZmet();
                                                                           ^

SO list output too large and possibly wrong

Describe the bug
The SO lists produced by the current master appear to be about a factor of 10 larger than those produced by older versions for the same snapshot. I have a particular example where the number of particles in the SO list is almost 8e8, for a simulation that has fewer than 1e8 particles. Unsurprisingly, the majority of the particles are reported to be part of multiple SOs (on average 9, with an extreme case of a particle being part of 53 SOs).

I manually checked the distances of the reported SO particles to the SO centre and found that the maximum distance is a factor of 10-15 larger than the maximum SO radius (in this case R_100_rhocrit). I wonder whether this has an impact on the reported SO properties, since I am unable to reproduce the various mass values using the particles from the SO list within the various SO radii (they are typically off by a factor of 3).

By comparing legacy snapshots processed with different versions, I managed to constrain the problem to the following diff: ICRAR:64348794522c16f96ef4890ee94a08615fbd06c4...ICRAR:a43325cec3108d40f61214bab0cab50069dcb258. The oldest version produces sensible SO lists for which we were able to manually confirm that the SO properties are correct, while the newest one produces the same results as the current master, which may or may not be correct but definitely come with excessively large SO lists. This is based on the commit hashes reported in the .configuration file; I was never actually able to run any of these versions myself due to various errors.

I tried this both with and without MPI enabled (all single node, single thread runs) and the results look similar (not exactly the same). The MPI-enabled output file is 3 times larger, but I guess this is due to compression.
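The manual distance check described above can be sketched as a small function (loading the SO group list and particle positions with h5py is omitted; the function name and arguments are illustrative, not part of VR):

```python
import numpy as np

def fraction_outside(centre, positions, radius, boxsize):
    """Fraction of supposed SO member particles lying outside the SO radius,
    using the minimum-image convention for the periodic box."""
    dx = positions - centre
    dx -= boxsize * np.round(dx / boxsize)  # wrap displacements into the box
    r = np.sqrt((dx * dx).sum(axis=1))
    return float((r > radius).mean())

# e.g. one particle at r=1, one wrapped to r=0.5, one at r=4, radius 2:
frac = fraction_outside(np.zeros(3),
                        np.array([[1.0, 0, 0], [9.5, 0, 0], [4.0, 0, 0]]),
                        radius=2.0, boxsize=10.0)
print(frac)  # -> 0.333... (one of three particles lies outside the SO radius)
```

For a healthy catalogue this fraction should be zero for every SO; in the case above it is far from it.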

Crash when running with parallel HDF5 and Calculate_radial_profiles=1

If Velociraptor is built with parallel HDF5 support and we have MPI_number_of_tasks_per_write>1 and Calculate_radial_profiles=1 in the .cfg file then the code crashes with the following message:

HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 15:
  #000: H5Dio.c line 322 in H5Dwrite(): could not get a validated dataspace from file_space_id
    major: Invalid arguments to routine
    minor: Bad value
  #001: H5S.c line 254 in H5S_get_validated_dataspace(): selection + offset not within extent
    major: Dataspace
    minor: Out of range
Failed to write dataset: Npart_profile

The problem seems to be in write_dataset_nd() in hdfitems.h. When parallel HDF5 is enabled one MPI communicator is created for each output file. All of the tasks in one communicator write to the same output file. Each task needs to calculate the offset at which it should write its data and this offset calculation is wrong - it results in tasks trying to write beyond the bounds of the dataset.

The offset is stored in dims_offset and calculated as follows (this occurs several times in hdfitems.h):

            MPI_Allgather(dims_single.data(), rank, MPI_UNSIGNED_LONG_LONG, mpi_hdf_dims.data(), rank, MPI_UNSIGNED_LONG_LONG, comm);
            MPI_Allreduce(dims_single.data(), mpi_hdf_dims_tot.data(), rank, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
            for (auto i=0;i<rank;i++) {
                dims_offset[i] = 0;
                if (flag_first_dim_parallel && i > 0) continue;
                for (auto j=1;j<=ThisWriteTask;j++) {
                    dims_offset[i] += mpi_hdf_dims[i*NProcs+j-1];
                }
            }

The dimensions of the arrays on all tasks are gathered in mpi_hdf_dims and used to compute the offset in each dimension for the data to be written by this task (dims_offset). Here I think the index into mpi_hdf_dims is wrong. I think

                for (auto j=1;j<=ThisWriteTask;j++) {
                    dims_offset[i] += mpi_hdf_dims[i*NProcs+j-1];
                }

should really be

                for (auto j=0;j<ThisWriteTask;j++) {
                    dims_offset[i] += mpi_hdf_dims[j*rank+i];
                }

because after the allgather mpi_hdf_dims contains rank elements from each MPI task. j is looping over the lower numbered MPI tasks and i is the index within the block received from each task.

This would mean that any multidimensional output arrays will either be corrupted or cause a crash.
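The indexing difference can be illustrated with a small sketch that mimics the row-major layout of the allgather buffer (task counts and dimensions are made up; this is not VR's actual code):

```python
# After MPI_Allgather, mpi_hdf_dims holds `rank` values per task, laid out
# task-by-task: task 0's dims first, then task 1's, and so on.
rank = 2                   # dimensionality of the dataset being written
dims_per_task = [[3, 4],   # task 0 writes a 3x4 block
                 [5, 4],   # task 1 writes a 5x4 block
                 [2, 4]]   # task 2 writes a 2x4 block
mpi_hdf_dims = [d for task in dims_per_task for d in task]  # flattened buffer

def offset_for(this_write_task):
    # proposed fix: element i of task j lives at index j*rank + i
    return [sum(mpi_hdf_dims[j * rank + i] for j in range(this_write_task))
            for i in range(rank)]

print(offset_for(2))  # -> [8, 8]: task 2 starts after tasks 0 and 1 (3+5 and 4+4)
```

With the original `i*NProcs+j-1` indexing, task 2 would instead read elements 1 and 3 of the buffer for dimension 0, i.e. other tasks' dimension-1 values, which is what pushes the hyperslab beyond the dataset extent.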

Crash in on-the-fly Swift hydro run (extra property output not fully implemented?)

I'm running Swift with on the fly velociraptor on the EAGLE_low_z/EAGLE_12 example in the Swift repository using the vrconfig_3dfof_subhalos_SO_hydro.cfg parameter file. My VR configuration is

cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_FLAGS_RELEASE="-O3 -xAVX -g" \
    -DCMAKE_C_FLAGS_RELEASE="-O3 -xAVX -g" \
    -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
    -DCMAKE_C_COMPILER=icc \
    -DCMAKE_CXX_COMPILER=icpc \
    -DVR_USE_SWIFT_INTERFACE=ON \
    -DVR_USE_HYDRO=ON

and for Swift I use

../configure \
    --enable-ipo \
    --enable-debug \
    --with-hdf5 \
    --with-fftw \
    --with-parmetis \
    --with-gsl \
    --with-tbbmalloc \
    --with-hydro=sphenix --with-kernel=wendland-C2 --with-subgrid=EAGLE-XL \
    --with-velociraptor=`pwd`/../../VELOCIraptor-STF/build/src

and I run it with

mpirun -np 2 ../../../build/examples/swift_mpi \
    --cosmology --eagle --velociraptor \
    --threads=16 eagle_12.yml

This crashes on the first VR invocation in substructureproperties.cxx line 6024:

        x = Pval[i].GetHydroProperties();

At this point Pval[i].hydro is a null pointer, which I think is what causes the crash because GetHydroProperties just does "return *hydro".

Looking at swiftinterface.cxx, Part.hydro is only set if the parameter swift_gas_parts was passed to InvokeVelociraptorHydro() and was not null:

#ifdef GASON
    if (swift_gas_parts != NULL)
    {
        for (auto i=0; i<num_hydro_parts; i++)
        {
            index = swift_gas_parts[i].index;
            parts[index].SetHydroProperties(hydro);
        }
        free(swift_gas_parts);
    }
#endif

It looks like output of extra properties has been only partially implemented for on the fly runs. Things that are missing:

  • InvokeVelociraptorHydro() is never called with swift_[gas/bh/star]_parts set in Swift or VR
  • There's no code in Swift's velociraptor_interface.c to generate the contents of swift_[gas/star/bh]_parts
  • VR's swiftinterface.cxx calls Part.SetHydroProperties() with an uninitialised HydroProperties instance so the data wouldn't get copied to the velociraptor particles anyway

If I comment out the extra properties in the .cfg file then it survives a VR invocation without crashing.

Running with VR_MPI_REDUCE=OFF, VR_USE_HYDRO=ON crashes

As described in #54 (comment) by @MatthieuSchaller:

If it helps, there is smaller test case here:

/snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_low_z/EAGLE_6/eagle_0000.hdf5
/snap7/scratch/dp004/jlvc76/SWIFT/EoS_tests/swiftsim/examples/EAGLE_low_z/EAGLE_6/vrconfig_3dfof_subhalos_SO_hydro.cfg
This one crashes about 20s after start so might be easier.

Config is: cmake ../ -DVR_USE_HYDRO=ON -DCMAKE_BUILD_TYPE=Debug
Run command line is: stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i eagle_0000 -o halos_0000 -I 2

Problem happens with gcc or ICC,
Running with -DVR_OPENMP=OFF also crashes in the same way,
Running without VR_MPI_REDUCE crashes in a different way. There the crash happens when reading in stuff.

This issue is to keep track of the last sentence. Indeed when running with -DVR_MPI_REDUCE=OFF the following crash happens:

[bolano:21822] Read -1, expected 50000000, errno = 14
[bolano:21822] *** Process received signal ***
[bolano:21822] Signal: Segmentation fault (11)
[bolano:21822] Signal code: Invalid permissions (2)
[bolano:21822] Failing at address: 0x7f75ac021000
[bolano:21822] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14bb0)[0x7f75baef6bb0]
[bolano:21822] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1851d3)[0x7f75baaee1d3]
[bolano:21822] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_convertor_unpack+0x85)[0x7f75ba7f01c5]
[bolano:21822] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x1bf)[0x7f75b8c1c5df]
[bolano:21822] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f75b8c42ed5]
[bolano:21822] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x53a3)[0x7f75b8c433a3]
[bolano:21822] [ 6] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f75ba7de854]
[bolano:21822] [ 7] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xb5)[0x7f75ba7e5315]
[bolano:21822] [ 8] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv+0x833)[0x7f75b8c0eff3]
[bolano:21822] [ 9] /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Recv+0xf5)[0x7f75bb2f93e5]
[bolano:21822] [10] builds/54/stf(_Z34MPIReceiveParticlesFromReadThreadsR7OptionsRPN5NBody8ParticleES3_RPiS6_S6_RPxRPP14ompi_request_tS4_+0x2a5)[0x55b35d3cd27b]
[bolano:21822] [11] builds/54/stf(_Z7ReadHDFR7OptionsRSt6vectorIN5NBody8ParticleESaIS3_EExRPS3_x+0xd240)[0x55b35d4df0cc]
[bolano:21822] [12] builds/54/stf(_Z8ReadDataR7OptionsRSt6vectorIN5NBody8ParticleESaIS3_EExRPS3_x+0x378)[0x55b35d36a87c]
[bolano:21822] [13] builds/54/stf(main+0xba7)[0x55b35d2b1311]
[bolano:21822] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf2)[0x7f75ba991cb2]
[bolano:21822] [15] builds/54/stf(_start+0x2e)[0x55b35d2b046e]
[bolano:21822] *** End of error message ***

Unexpected behaviour when VR is applied to cosmological zooms with baryons

Hi, I recently started working with cosmological zooms of Milky Way-mass haloes and I wanted to apply VELOCIraptor to the data in the same way as I do it for full cosmological boxes.

When I run VELOCIraptor on zooms to obtain the catalogues, I always find that VR struggles to correctly compute the properties of the central halo and of one of the subhaloes.

What I keep finding instead is that in all the zoom simulations,

  • Problem A. an object with the largest halo mass in the simulation, which is for some reason not classified as central, has too low stellar mass and too large r200.
  • Problem B. an object with a very small halo mass, which should not but is classified as central, has far too large stellar mass. This object also has a negative r200. [UPD: this problem is now gone]

Below is a simple python script that shows what is going on in the VR catalogue in one of the zoom simulations. The object with problem A (problem B) is the last one in the last (first) row of the output.

In [1]: import numpy as np

In [2]: import velociraptor as vr

In [3]: data = vr.load("halo_halo_10_0037.properties") # Load VR catalogue

In [4]: halo_mass = data.masses.mass_200crit  # Fetch halo masses

In [5]: stellar_mass = data.apertures.mass_star_30_kpc # Fetch stellar masses

In [6]: r200 = data.radii.r_200crit # Get r200

In [7]: sort_idx = np.argsort(halo_mass) # Sort according to halo mass

In [8]: stellar_mass.to("Msun")[sort_idx] # Show stellar masses
Out[8]: 
unyt_array([0.00000000e+00, 0.00000000e+00, 5.70441663e+09,  
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
            2.72257719e+06, 0.00000000e+00, 2.57739048e+05,
            1.74264642e+06, 2.02618167e+08, 5.25602289e+08], 'Msun')  # [The largest halo has too low stellar mass]

In [9]: r200.to("kpc")[sort_idx] # Show r200
Out[9]:
unyt_array([-1.00000000e+00, -1.00000000e+00, -6.91574142e-02,
            -1.94654432e+02,  8.12557225e+01,  2.44313447e+01,
             1.32149940e+01,  6.28237402e+00,  8.00385528e+01,
             2.38922979e+01,  3.74907068e+00,  3.73819116e+01,
             1.03565331e+01,  1.64424960e+01,  1.15966419e+01,
             4.34248021e+01,  2.39968011e+01,  2.79605862e+01,
             3.40829283e+01,  3.33680927e+01, -2.75381537e+01,
             1.10373186e+02, -2.33397677e+01,  5.57266117e+02,
             3.56556864e+02,  3.61359039e+02,  1.77724087e+03,
             1.29139616e+03,  8.10693203e+02, 4.50728670e+03], 'kpc') # [r200 seem to be very large for all (sub)haloes]

In [10]: data.structure_type.structuretype[sort_idx] # Show structure types (to see what is central and what is not)
Out[10]: 
unyt_array([10, 10, 10, 
            10, 20, 20, 
            20, 25, 20, 
            35, 20, 30, 
            20, 30, 30,
            20, 20, 30, 
            30, 30, 10, 
            30, 10, 20, 
            25, 25, 15, 
            15, 15, 20],  dtype=int32, units='dimensionless')  # [the largest halo is not a central]

Visualisation of the problematic (sub)haloes

The gas surface density and dark matter mass surface density of the object with problem A are

[attached images: 6_dens_faceon, 6_dm_faceon]

The gas surface density and dark matter mass surface density of the object with problem B are

[attached images: 0_dens_faceon, 0_dm_faceon]

To reproduce the bug

  • Path to the snapshot /cosma7/data/dp004/dc-chai1/zooms/batch1_12_05_21/halo_10/snapshot_halo_10_0037.hdf5
  • Path to VR config /cosma7/data/dp004/dc-chai1/vrconfig_3dfof_subhalos_SO_hydro_final2_zoom.cfg
  • Path to VR /cosma7/data/dp004/dc-chai1/VR_ICRAR_27_March/VELOCIraptor-STF/build/stf
  • VR version: master branch (64de17bff6925f47f3ebe8f8195108801d661d95, March 24, 2021)

VR was compiled as:

cmake -DVR_USE_GAS=ON -DVR_USE_STAR=ON -DVR_USE_BH=ON -DVR_MPI=NO

I tried compiling with -DVR_ZOOM_SIM but that didn't help.

I think that the problems occur because I am using a VR config that was originally designed for full cosmological boxes (not zooms). The solution will therefore most likely involve tweaking a few parameters in the VR config.

@MatthieuSchaller

OpenMP version of SO quantities

As mentioned in #57 (comment) there are a number of datasets that are widely different between runs of VR (standalone, but probably also with SWIFT) that are OpenMP-enabled and OpenMP-disabled.

A full list of the datasets that are different can be derived by comparing the two different output files mentioned in the comment linked above. I did the following:

vr_compare_datasets() {
        file1="$1"
        file2="$2"
        dataset="$3"
        lines=15
        diff -Naur <(h5dump -d $dataset "$file1" | head --lines $lines) <(h5dump -d $dataset "$file2" | head --lines $lines)
}

for dataset in `h5dump -n /cosma7/data/dp004/jlvc76/BAHAMAS/Roi_run/halos_omp_0036.properties.0 | grep SO_ | awk '{print $2}'`; do
    vr_compare_datasets /cosma7/data/dp004/jlvc76/BAHAMAS/Roi_run/halos_{omp_,}0036.properties.0 $dataset
done > so_quantities.diff

The resulting file shows the differences for all SO_* datasets between the two files. The following is a summary of the situation:

Equal

  • /SO_Mass_highT_0.100000_times_500.000000_rhocrit
  • /SO_Mass_highT_0.250000_times_500.000000_rhocrit
  • /SO_Mass_highT_0.500000_times_500.000000_rhocrit
  • /SO_Mass_highT_0.750000_times_500.000000_rhocrit
  • /SO_Mass_highT_1.000000_times_500.000000_rhocrit

Almost equal

Only a few values are slightly different, the rest are the same

  • /SO_Mass_1000_rhocrit
  • /SO_Mass_100_rhocrit
  • /SO_Mass_200_rhocrit
  • /SO_Mass_2500_rhocrit
  • /SO_Mass_500_rhocrit
  • /SO_R_1000_rhocrit
  • /SO_R_100_rhocrit
  • /SO_R_200_rhocrit
  • /SO_R_2500_rhocrit
  • /SO_R_500_rhocrit

Minor differences

Most values are different, but only after 2 or 3 decimal places.

  • /SO_Lx_1000_rhocrit
  • /SO_Lx_100_rhocrit
  • /SO_Lx_200_rhocrit
  • /SO_Lx_2500_rhocrit
  • /SO_Lx_500_rhocrit
  • /SO_Lx_gas_1000_rhocrit
  • /SO_Lx_gas_100_rhocrit
  • /SO_Lx_gas_200_rhocrit
  • /SO_Lx_gas_2500_rhocrit
  • /SO_Lx_gas_500_rhocrit
  • /SO_Lx_star_1000_rhocrit
  • /SO_Lx_star_100_rhocrit
  • /SO_Lx_star_200_rhocrit
  • /SO_Lx_star_2500_rhocrit
  • /SO_Lx_star_500_rhocrit

Also all equivalent Ly and Lz datasets.

Huge differences

Most values are different, even already at the most significant decimal place. Some values are the same though.

  • /SO_Mass_gas_1000_rhocrit
  • /SO_Mass_gas_100_rhocrit
  • /SO_Mass_gas_200_rhocrit
  • /SO_Mass_gas_2500_rhocrit
  • /SO_Mass_gas_500_rhocrit
  • /SO_Mass_gas_highT_0.100000_times_500.000000_rhocrit
  • /SO_Mass_gas_highT_0.250000_times_500.000000_rhocrit
  • /SO_Mass_gas_highT_0.500000_times_500.000000_rhocrit
  • /SO_Mass_gas_highT_0.750000_times_500.000000_rhocrit
  • /SO_Mass_gas_highT_1.000000_times_500.000000_rhocrit
  • /SO_Mass_star_1000_rhocrit
  • /SO_Mass_star_100_rhocrit
  • /SO_Mass_star_200_rhocrit
  • /SO_Mass_star_2500_rhocrit
  • /SO_Mass_star_500_rhocrit
  • /SO_T_gas_highT_0.100000_times_500.000000_rhocrit
  • /SO_T_gas_highT_0.500000_times_500.000000_rhocrit
  • /SO_T_gas_highT_0.750000_times_500.000000_rhocrit
  • /SO_T_gas_highT_1.000000_times_500.000000_rhocrit
  • /SO_Zmet_gas_highT_0.100000_times_500.000000_rhocrit
  • /SO_Zmet_gas_highT_0.250000_times_500.000000_rhocrit
  • /SO_Zmet_gas_highT_0.500000_times_500.000000_rhocrit
  • /SO_Zmet_gas_highT_0.750000_times_500.000000_rhocrit
  • /SO_Zmet_gas_highT_1.000000_times_500.000000_rhocrit
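Bucketing like the above can be reproduced programmatically instead of eyeballing h5dump output; a sketch (the tolerance is an assumption of mine, not what was used for the table above):

```python
import numpy as np

def classify_difference(a, b, minor_tol=1e-2):
    """Rough bucketing of the difference between the same dataset in two runs:
    exactly equal, different only in trailing decimals, or wildly different."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if np.array_equal(a, b):
        return "equal"
    # symmetric relative difference, guarded against division by zero
    rel = np.abs(a - b) / np.maximum(np.abs(a) + np.abs(b), 1e-30)
    if np.nanmax(rel) < minor_tol:
        return "minor differences"
    return "huge differences"
```

Applied to each SO_* dataset read from the two .properties files, this reproduces the equal / minor / huge split without manual diffing.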

R200 doesn't make sense

Describe the bug
When plotting the FOF halo mass vs. R200 one expects a tight one-to-one relationship, but currently the code returns something that looks far from that (with a lot of scatter - see attached plot from Chris Power https://user-images.githubusercontent.com/27806527/57223099-7fbb3c00-7037-11e9-94f6-d99961ad2f90.png).

To Reproduce

  1. Run the latest version of VR
  2. Using the config file vrconfig_3dfof_dmonly.cfg and Halo_linking_length_factor=1

Expected behavior
A clean relation between MFOF and R200 with no scatter
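For reference, the expected relation follows from the spherical-overdensity definition M200 = (4/3) pi R200^3 * 200 rho_crit; a sketch of the implied R200 (the cosmology values are illustrative assumptions, pick the run's actual H0):

```python
import numpy as np

G = 4.30091e-9     # gravitational constant in Mpc (km/s)^2 / Msun
H0 = 67.77         # Hubble constant in km/s/Mpc (assumed cosmology)
rho_crit = 3 * H0**2 / (8 * np.pi * G)   # critical density in Msun / Mpc^3

def r200_from_m200(m200_msun):
    """R200 in Mpc implied by M200 = (4/3) pi R200^3 * 200 rho_crit."""
    return (3 * m200_msun / (4 * np.pi * 200 * rho_crit)) ** (1.0 / 3.0)
```

Since MFOF with b=0.2 tracks M200 closely, plotting r200_from_m200(MFOF) against the catalogue's R200 should fall on the diagonal with little scatter; the attached plot clearly does not.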

Inconsistent array names between properties files

Describe the bug

When using the hdf5 output but not parallel-hdf5, some arrays have a different name in the .0 file compared to all the others.

The three problematic fields are:

  • SubgridMasses_average_bh in the 0th file and SubgridMasses_index_0_average_bh in the others.
  • SubgridMasses_max_bh in the 0th file and SubgridMasses_index_0_max_bh in the others.
  • SubgridMasses_min_bh in the 0th file and SubgridMasses_index_0_min_bh in the others.

These are three of the four "extra BH properties" listed in our VR config file. The fourth quantity
is computed in apertures and somehow ends up with the same name in all the files.

extract of config:

# Collect the BH subgrid masses and compute the max, min, average and total mass in apertures                                                                                                                                                                                             
BH_internal_property_names=SubgridMasses,SubgridMasses,SubgridMasses,SubgridMasses,
BH_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,
BH_internal_property_calculation_type=max,min,average,aperture_total,
BH_internal_property_output_units=solar_mass,solar_mass,solar_mass,solar_mass,

So, only extra properties computed at the level of the whole group seem affected.

I don't know whether the same problem appears for gas or star extra properties since all the ones we use
are also computed in apertures, not over the whole group.

To Reproduce
Steps to reproduce the behavior:

  1. Latest master
  2. Compile with VR_HDF5=ON VR_ALLOWPARALLELHDF5=OFF VR_OPENMP=OFF VR_MPI=ON VR_USE_HYDRO=ON (the openMP bit is likely irrelevant)
  3. Run over MPI on a swift-eagle snapshot
  4. The distributed properties file use different names.

Expected behavior
The name should be the same in all files. The name in the .0 file is the one that follows the other fields' convention.

Log files
Not relevant.

Environment (please complete the following information):
Not relevant

Concerning warning when building

Describe the bug
I get the following warning when building on my laptop:

/Users/mphf18/Documents/swift/VELOCIraptor-STF/src/ui.cxx:811:22: note: use '=='
      to turn this assignment into an equality comparison
                if (j=line.find(sep)){
                     ^
                     ==

Using

Apple clang version 11.0.0 (clang-1100.0.33.8)
Target: x86_64-apple-darwin20.2.0
cmake -DVR_USE_SWIFT_INTERFACE=ON -DVR_USE_HYDRO=OND -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/libomp/include" -DOpenMP_CXX_LIB_NAMES="omp" -DOpenMP_omp_LIBRARY=/usr/local/opt/libomp/lib/libomp.dylib ..

Might be worth checking out.

Time the i/o section

Would it be possible to time the I/O part of the code? Even better if we could individually time the writing of the properties catalogues, the parttype files and the rest.

VELOCIraptor zeros all fields with metallicity of star-forming gas

Describe the bug
In short, when I introduce a new gas internal property field in the VELOCIraptor config file, VELOCIraptor zeroes all fields containing the metallicity of star-forming gas.

More precisely, I have two VR config files, where the diff between the two is:

177,181c177,181
< Gas_internal_property_names=DensitiesAtLastSupernovaEvent,GraphiteMasses,SilicatesMasses,AtomicHydrogenMasses,IonisedHydrogenMasses,MolecularHydrogenMasses,HydrogenMasses,HeliumMasses,
< Gas_internal_property_index_in_file=0,0,0,0,0,0,0,0,
< Gas_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,
< Gas_internal_property_calculation_type=max,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,
< Gas_internal_property_output_units=solar_mass/MPc3,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,
---
> Gas_internal_property_names=DensitiesAtLastSupernovaEvent,GraphiteMasses,SilicatesMasses,AtomicHydrogenMasses,IonisedHydrogenMasses,MolecularHydrogenMasses,HydrogenMasses,HeliumMasses,IronOverHydrogenMasses,
> Gas_internal_property_index_in_file=0,0,0,0,0,0,0,0,0,
> Gas_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,1.0e10,
> Gas_internal_property_calculation_type=max,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,aperture_total,
> Gas_internal_property_output_units=solar_mass/MPc3,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,solar_mass,

i.e. compared to config 1, config 2 has an additional field IronOverHydrogenMasses.

After I run VR

../VR_ICRAR_27_March/VELOCIraptor-STF/stf -C vrconfig_3dfof_subhalos_SO_hydro_1.cfg  -i colibre_0023 -o halo_v1_0023 -I 2
../VR_ICRAR_27_March/VELOCIraptor-STF/stf -C vrconfig_3dfof_subhalos_SO_hydro_2.cfg  -i colibre_0023 -o halo_v2_0023 -I 2

and look into the output fields, I find


IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from velociraptor import load

In [2]: import numpy as np

In [3]: output_from_config1 = load("halo_v1_0023.properties.0")

In [4]: output_from_config2 = load("halo_v2_0023.properties.0")

In [5]: np.max(output_from_config1.apertures.zmet_gas_sf_100_kpc)
Out[5]: unyt_quantity(0.01275099, '83.33*dimensionless')

In [6]: np.max(output_from_config2.apertures.zmet_gas_sf_100_kpc)
Out[6]: unyt_quantity(0., '83.33*dimensionless')

where one can see that in the latter case (config 2) the field apertures.zmet_gas_sf_100_kpc has no positive values. This is true not only for the 100-kpc aperture but for all aperture sizes.

To Reproduce

  • Path to VR I use /cosma7/data/dp004/dc-chai1/VR_ICRAR_27_March/VELOCIraptor-STF (version 64de17bff6925f47f3ebe8f8195108801d661d95)
  • Path to SWIFT snapshot /cosma7/data/dp004/dc-chai1/test_VR/colibre_0023.hdf5
  • Path to the two VR configs: /cosma7/data/dp004/dc-chai1/test_VR/vrconfig_3dfof_subhalos_SO_hydro_?.cfg, where ? is 1 for config 1 and 2 for config 2
  • The VR output files I used to draw this conclusion can be found in the same folder.

@MatthieuSchaller

Minor memory leak reported in VR

Follow the exact same instructions in #63, but including the fixes in #64. After VR finishes it reports the following memory leak:

[2157.023] [ info] main.cxx:26 Finished running VR
[2157.023] [ info] main.cxx:27 Memory report at main.cxx:27@void finish_vr(Options&): Average: 20.006 [TiB] Data: 20.006 [TiB] Dirty: 0 [B] Library: 0 [B] Peak: 20.020 [TiB] Resident: 7.437 [GiB] Shared: 8.539 [MiB] Size: 20.006 [TiB] Text: 4.375 [MiB] 

=================================================================
==19245==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x7f23f9635b07 in operator new[](unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x79d311 in ReadHDF(Options&, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long, NBody::Particle*&, long long) /cosma/home/dp004/dc-toba1/scm/git/VELOCIraptor-STF/src/hdfio.cxx:739
    #2 0x58eae5 in ReadData(Options&, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long, NBody::Particle*&, long long) /cosma/home/dp004/dc-toba1/scm/git/VELOCIraptor-STF/src/io.cxx:113
    #3 0x47661e in main /cosma/home/dp004/dc-toba1/scm/git/VELOCIraptor-STF/src/main.cxx:254
    #4 0x7f23f7433554 in __libc_start_main (/lib64/libc.so.6+0x22554)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
(END)

Output file names inconsistent between MPI and non-MPI runs

A bit of a mix between bug and feature request.

When compiling the code with MPI all the output files have an .0 appended at the end of their name whilst the version of the code compiled without MPI does not. I think this is a left-over from the pre-parallel-hdf5 era where there were many files in the output.

Maybe the difference is intended behaviour though.

It would also be useful to append .hdf5 and .txt extensions to the files, to make humans (and hdf5 tools!) happier when looking at files in a directory.

Segfault in destructor when computing sub-structure properties

Describe the bug
The code crashes with a segfault, apparently in the (empty!) destructor of the hydro particle class, while computing some of the sub-structure properties.

To Reproduce

  • Input file: /cosma7/data/dp004/jlvc76/COLIBRE/cold_gas/colibre_0023.hdf5
  • Config file: /cosma7/data/dp004/jlvc76/COLIBRE/cold_gas/test.cfg
  • Version: 5eb6351
  • Compilation line: cmake ../ -DVR_USE_HYDRO=ON -DCMAKE_BUILD_TYPE=Debug -DVR_OPENMP=OFF
  • Runtime command: mpirun -np 16 stf -I 2 -i colibre_0023 -o haloes -C test.cfg

Crashes after ~350s. Last message printed (verbose = 0):

[0001] [ 246.855] [ info] search.cxx:3777 Done
[0001] [ 246.855] [ info] main.cxx:439 Baryon search with 1 threads finished in 11.813 [s]
[0001] [ 246.862] [ info] substructureproperties.cxx:5025 Sort particles and compute properties of 1100 objects

NbodyLib version: fcf1c17

Older versions
Commit 31ae376 and the associated NBodyLib work.

Buffer overflow in PotentialTree with OpenMP

When running with the inputs and configuration from #87/#88, with 4 OpenMP threads and 1 MPI rank, and compiling with the clang address sanitizer, I got the following output:

[0000] [  87.191] [debug] unbind.cxx:284 Unbinding 1521 groups ...
=================================================================
==182466==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020001e99a8 at pc 0x000000a457dc bp 0x7ff6f2c55470 sp 0x7ff6f2c55468
WRITE of size 8 at 0x6020001e99a8 thread T5
    #0 0xa457db in .omp_outlined._debug__.7 /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1240:19
    #1 0xa46cfb in .omp_outlined..8 /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1228:1
    #2 0x7ff7114e3ca2 in __kmp_invoke_microtask (/usr/lib/x86_64-linux-gnu/libomp.so.5+0xabca2)
    #3 0x7ff7114789c2  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x409c2)
    #4 0x7ff7114775f9  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x3f5f9)
    #5 0x7ff7114cb149  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x93149)
    #6 0x7ff71153f58f in start_thread nptl/pthread_create.c:463:8
    #7 0x7ff71134c222 in clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

0x6020001e99a8 is located 16 bytes to the right of 8-byte region [0x6020001e9990,0x6020001e9998)
allocated by thread T0 here:
    #0 0x53ca8d in operator new[](unsigned long) (/home/rtobar/scm/git/VELOCIraptor-STF/builds/88-debug-addrsan/stf+0x53ca8d)
    #1 0xa40b32 in PotentialTree(Options&, long long, NBody::Particle*&, NBody::KDTree*&) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1185:11
    #2 0xa3db16 in Potential(Options&, long long, NBody::Particle*) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:960:5
    #3 0xa5357e in CalculatePotentials(Options&, NBody::Particle**, long long&, long long*) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:434:13
    #4 0xa38e98 in Unbind(Options&, NBody::Particle**, long long&, long long*, long long*, long long**, int) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:793:5
    #5 0xa48efd in CheckUnboundGroups(Options, long long, NBody::Particle*, long long&, long long*&, long long*, long long**, int, long long*) /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:350:18
    #6 0x8b040f in SearchBaryons(Options&, long long&, NBody::Particle*&, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle> >&, long long*&, long long&, long long&, int, int, PropData*) /home/rtobar/scm/git/VELOCIraptor-STF/src/search.cxx:3840:13
    #7 0x547208 in main /home/rtobar/scm/git/VELOCIraptor-STF/src/main.cxx:463:13
    #8 0x7ff71125bcb1 in __libc_start_main csu/../csu/libc-start.c:314:16

Thread T5 created by T0 here:
    #0 0x4f770a in pthread_create (/home/rtobar/scm/git/VELOCIraptor-STF/builds/88-debug-addrsan/stf+0x4f770a)
    #1 0x7ff7114ca823  (/usr/lib/x86_64-linux-gnu/libomp.so.5+0x92823)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/rtobar/scm/git/VELOCIraptor-STF/src/unbind.cxx:1240:19 in .omp_outlined._debug__.7
Shadow bytes around the buggy address:
  0x0c04800352e0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x0c04800352f0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x0c0480035300: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x0c0480035310: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x0c0480035320: fa fa fd fa fa fa fd fa fa fa fd fa fa fa 00 fa
=>0x0c0480035330: fa fa 00 fa fa[fa]fa fa fa fa fa fa fa fa fa fa
  0x0c0480035340: fa fa fa fa fa fa fa fa fa fa fa fa fa fa 00 fa
  0x0c0480035350: fa fa 00 fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480035360: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480035370: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0480035380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==182466==ABORTING
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[57249,1],0]
  Exit code:    1
--------------------------------------------------------------------------

Segmentation fault in AdjustStructureForPeriod when running on the fly in Swift

This has been observed in several simulations, but one quick way to reproduce it is the examples/SmallCosmoVolume/SmallCosmoVolume_DM example in the Swift repository. The problem appears to have been introduced in VR commit b5e1a8c.

I'm using the latest ICRAR VR master (a497fe3) configured with:

cmake .. \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_C_COMPILER=icc \
    -DCMAKE_CXX_COMPILER=icpc \
    -DVR_USE_SWIFT_INTERFACE=ON \
    -DVR_USE_GAS=OFF

and the latest Swift master (5fab1cdb81e869fd3adf198a7e0509f5e87eb093) configured with:

../configure \
    --enable-debug \
    --with-velociraptor=`pwd`/../../VELOCIraptor-STF/build/src

The parameter files I'm using are from the Swift repository - see examples/SmallCosmoVolume/SmallCosmoVolume_DM/small_cosmo_volume_dm.yml and examples/SmallCosmoVolume/SmallCosmoVolume_DM/vrconfig_3dfof_subhalos_SO_hydro.cfg.

To run it:

cd swiftsim/examples/SmallCosmoVolume/SmallCosmoVolume_DM
./getIC.sh
mpirun -np 1 ../../../build/examples/swift_mpi --cosmology --self-gravity --velociraptor --threads=16 small_cosmo_volume_dm.yml

It runs for a few minutes and crashes when it gets to redshift ~3:

0: finished FOF search in total time of 3.17214
[login7c:219197:0:219197] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))

/cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/search.cxx: [ AdjustStructureForPeriod() ]
      ...
      976     for (i=0;i<nbodies;i++) {
      977         if (pfof[i]==0) continue;
      978         if (irefpos[pfof[i]] != -1) continue;
==>   979         refpos[pfof[i]] = Coordinate(Part[i].GetPosition());
      980         irefpos[pfof[i]] = i;
      981     }
      982 #ifdef USEOPENMP

==== backtrace (tid: 219197) ====
 0 0x000000000071d5c2 AdjustStructureForPeriod()  /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/search.cxx:979
 1 0x0000000000715e4b SearchFullSet()  /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/search.cxx:482
 2 0x000000000058e652 InvokeVelociraptorHydro()  /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/swiftinterface.cxx:615
 3 0x000000000058c837 InvokeVelociraptor()  /cosma7/data/dp004/jch/Swift/repro_bug/VELOCIraptor-STF/src/swiftinterface.cxx:398
 4 0x00000000005686ba velociraptor_invoke..0()  /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/velociraptor_interface.c:1026
 5 0x00000000005824a0 engine_check_for_dumps()  /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/engine.c:2974
 6 0x000000000041331a engine_step()  /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/engine.c:2819
 7 0x000000000041331a engine_step()  /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../src/engine.c:2828
 8 0x000000000040f1cd main()  /cosma7/data/dp004/jch/Swift/repro_bug/swiftsim/build/examples/../../examples/main.c:1490
 9 0x0000000000022555 __libc_start_main()  ???:0
10 0x000000000040cda9 _start()  ???:0
=================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 219197 RUNNING AT login7c.pri.cosma7.alces.network
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

The problem is that on line 979 pfof[i]=4627676946019562717, which is clearly not a valid array index. The index i=262144 at this point, and the simulation has 262144 particles. The array pfof has more elements than this (I think because particles are duplicated so that threads have whole groups to work on), and it seems to be the extra elements beyond the first 262144 that have problematic values.

I haven't been able to reproduce this running standalone VR on the same simulation.

VR crashes when computing black hole subgrid masses

Hi all, recently I have been running VR on SWIFT output containing black hole particles with subgrid properties. VR crashes while it is computing the BH subgrid properties. The VR output with the crash looks as follows:

Opening group PartType0: Data set Coordinates
Opening group PartType1: Data set Coordinates
Opening group PartType4: Data set Coordinates
Opening group PartType5: Data set Coordinates
Opening group PartType0: Data set Velocities
Opening group PartType1: Data set Velocities
Opening group PartType4: Data set Velocities
Opening group PartType5: Data set Velocities
Opening group PartType0: Data set ParticleIDs
Opening group PartType1: Data set ParticleIDs
Opening group PartType4: Data set ParticleIDs
Opening group PartType5: Data set ParticleIDs
Opening group PartType0: Data set Masses
Opening group PartType1: Data set Masses
Opening group PartType4: Data set Masses
Opening group PartType5: Data set DynamicalMasses
Opening group PartType0: Data set InternalEnergies
Opening group PartType0: Data set StarFormationRates
Opening group PartType0: Data set MetalMassFractions
Opening group PartType4: Data set MetalMassFractions
Opening group PartType4: Data set BirthScaleFactors
Opening group PartType0: Data set ElementMassFractions
Opening group PartType0: Data set SpeciesFractions
Opening group PartType0: Data set SpeciesFractions
Opening group PartType0: Data set SpeciesFractions
Opening group PartType5: Data set SubgridMasses
HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 0:
  #000: H5S.c line 921 in H5Sget_simple_extent_ndims(): not a dataspace
    major: Invalid arguments to routine
    minor: Inappropriate type
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
vel_rap_exact.sh: line 123: 185245 Aborted                 VR_ICRAR/VELOCIraptor-STF/build/stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i /snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN_L006N188_00/colibre_2729 -o /snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN_L006N188_00/halo_2729 -I 2

If I comment out the lines

BH_internal_property_names=SubgridMasses,SubgridMasses,SubgridMasses,SubgridMasses,
BH_internal_property_input_output_unit_conversion_factors=1.0e10,1.0e10,1.0e10,1.0e10,
BH_internal_property_calculation_type=max,min,average,aperture_total,
BH_internal_property_output_units=solar_mass,solar_mass,solar_mass,solar_mass,

in the VR config file, the bug disappears and VR runs smoothly.

  • I am using the latest (18.08.20) version of VR from the Master branch of ICRAR
  • I compiled VR as: cmake -DVR_USE_HYDRO=ON
  • On cosma7, I loaded the following modules gsl/2.4, intel_mpi/2018, cmake/3.18.1, intel_comp/2018, parallel_hdf5/1.10.3
  • On cosma, the VR config file I am using can be accessed via /cosma7/data/dp004/dc-chai1/vrconfig_3dfof_subhalos_SO_hydro.cfg
  • The SWIFT snapshot file I am running VR on resides at: /snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN_L006N188_00_iso/colibre_2729.hdf5
  • I double-checked that the snapshot contains the SubgridMasses field.

Negative densities / other crashes with latest master

Describe the bug

I've been trying to run the latest(ish) master of VR on some SWIFT outputs (on COSMA7), and I've been getting a couple of odd crashes.

To Reproduce

Version cb4336d.

Ran on snapshots under /snap7/scratch/dp004/dc-borr1/new_randomness_runs/runs/Run_*

Log files

STDOUT:

...
[ 528.257] [debug] search.cxx:2716 Substructure at sublevel 1 with 955 particles
[ 528.257] [debug] unbind.cxx:284 Unbinding 1 groups ...
[ 528.257] [debug] unbind.cxx:379 Finished unbinding in 1 [ms]. Number of groups remaining: 2

STDERR:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Particle density not positive, cannot continue

Environment (please complete the following information):

cmake .. -DCMAKE_CXX_FLAGS="-O3 -march=native" -DVR_MPI=OFF -DVR_HDF5=ON -DVR_ALLOWPARALLELHDF5=ON -DVR_USE_HYDRO=ON

Currently Loaded Modulefiles:
 1) python/3.6.5               5) parallel_hdf5/1.8.20     
 2) ffmpeg/4.0.2               6) gsl/2.4(default)         
 3) intel_comp/2018(default)   7) fftw/3.3.7(default)      
 4) intel_mpi/2018             8) parmetis/4.0.3(default)  

OMP warning at runtime when compiling with Intel MPI 2020.2

We are in the process of upgrading our compilation tool stack to Intel MPI 2020 (2020.2 to be specific). The code compiles without problems (cmake ../ -DVR_USE_HYDRO=ON -DCMAKE_BUILD_TYPE=Release -DVR_MPI=off) but at runtime, I get the following messages:

OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
OMP: Info #274: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

The code keeps running and seems to produce correct results nevertheless.

This happens in between

First build tree ... 

and

0: finished building 64 domains and trees

Writing parallel properties file in hydro builds is broken

After fixing #88 and #100 (so same build configuration, inputs, etc.), and when running with parallel HDF5 writing with 8 ranks and 2 nodes, I've been running into the following problem:

[0000] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0002] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0001] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0003] [ 944.345] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0005] [ 944.348] [ info] io.cxx:1646 Saving property data to lala.properties.0
[0004] [ 944.348] [[0006] [ 944.[0007] [ 944.348] [ info]  info] io.cxx:1646 Saving property data to lala.properties.0348] [ info] io.cxx:1646 Saving property data to lala.properties.0
io.cxx:1646 Saving property data to lala.properties.0

[0001] [ 954.686] [ info] [0002] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 12.316 [s]
[0004] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in [0003] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 16.645 [s]
[0005] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 10.341 [s]
io.cxx:2907 Wrote lala.properties.0 in 16.898 [s]
[0006] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 13.529 [s]
HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 0:
  #000: H5O.c line 120 in H5Oopen(): unable to open object
    major: Object header
    minor: Can't open object
  #001: H5Oint.c line 596 in H5O__open_name(): unable to open object
    major: Object header
    minor: Can't open object
  #002: H5Oint.c line 551 in H5O_open_name(): object not found
    major: Object header
    minor: Object not found
  #003: H5Gloc.c line 422 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #004: H5Gtraverse.c line 851 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 627 in H5G__traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5Gloc.c line 378 in H5G__loc_find_cb(): object 'ID' doesn't exist
    major: Symbol table
    minor: Object not found
Unable to open object to write attribute: Dimension_Mass[0007] [ 954.686] [ info] io.cxx:2907 Wrote lala.properties.0 in 12.479 [s]

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
16.452 [s]

This could be a race condition, with rank 0 trying to write an attribute while other ranks haven't yet closed their HDF5 file descriptors.

VELOCIraptor-STF/src/io.cxx

Lines 2883 to 2902 in 4cdbd52

#endif
    if (opt.ibinaryout!=OUTHDF) Fout.close();
#ifdef USEHDF
    else Fhdf.close();
#endif
    //write the units as metadata for each data set
#ifdef USEHDF
    Fhdf.append(string(fname), H5F_ACC_RDWR, 0, false);
#ifdef USEPARALLELHDF
    if (ThisWriteTask==0) {
#endif
        for (auto ientry=0;ientry<head.headerdatainfo.size();ientry++) {
            WriteHeaderUnitEntry(opt, Fhdf, head.headerdatainfo[ientry], head.unitdatainfo[ientry]);
        }
#ifdef USEPARALLELHDF
    }
#endif
    Fhdf.close();
#endif

Indeed, in the (somewhat scrambled) output shown above there are only 7 logs with "Wrote lala.properties.0", which would support this theory.

Crash in latest master with full hydro run over MPI

Describe the bug

Segfault when linking FOF fragments over MPI:

[0000] [ 264.245] [ info] search.cxx:353 Finished linking across MPI domains in 2.409 [min]
*** Error in `/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf': munmap_chunk(): invalid pointer: 0x000000009108a0f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7fb7e6311474]
/cosma/local/Intel/Parallel_Studio_XE_2018/impi/2018.2.199//lib64/libmpi.so.12(+0x15b2fd)[0x7fb7e83902fd]
/cosma/local/Intel/Parallel_Studio_XE_2018/impi/2018.2.199//lib64/libmpi.so.12(MPI_Sendrecv+0x779)[0x7fb7e869ec69]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z39MPISendReceiveFOFStarInfoBetweenThreadsR7OptionsP8fofid_inRSt6vectorIxSaIxEERS3_IfSaIfEEiiRi+0x2d5)[0x5cbbe5]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z16MPIGroupExchangeR7OptionsxPN5NBody8ParticleERPx+0x1491)[0x5ca311]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z13SearchFullSetR7OptionsxRSt6vectorIN5NBody8ParticleESaIS3_EERx+0x8491)[0x5fd581]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(main+0x10c6)[0x432936]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb7e62b4555]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf[0x4317a9]
======= Memory map: ========

To Reproduce
Snapshot: /cosma7/data/dp004/jlvc76/SWIFT/May21_runs/swiftsim/examples/EAGLE_ICs/EAGLE_25_mpi/eagle_0036.hdf5
Config file: /cosma7/data/dp004/jlvc76/SWIFT/May21_runs/swiftsim/examples/EAGLE_ICs/EAGLE_25_mpi/vrconfig_3dfof_subhalos_SO_hydro.cfg

Code compilation options:
-DVR_MPI=ON -DVR_OPENMP=ON -VR_USE_HYDRO (nothing else, e.g. no ZOOM)

Compiler: Intel 2021.1.0
MPI: Intel MPI 2018

Run command:

mpirun -np 8 stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i eagle_0036 -o halos_new_VR_0036 -I 2

Note that this leads to 3 OMP threads per rank.

Log files
Last few lines of output:

iB] 
[0004] [ 256.663] [ info] [0003] [ 256.667] [[0001] [ 256.669] [ info] [0000] [ 256.673] [mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
[0006] [ 256.676] [ info] mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
 info] [0002] [ 256.694] [ info] [0005] [ 256.690] [ info] mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
[0007] [ 256.703] [mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
 info]  info] mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
mpiroutines.cxx:3118 Finding number of particles to export to other MPI domains...
[0000] [ 258.263] [ info] [0001] [ 258.263] [ info] [0002] [ 258.271] [[0003] [ 258.266] [ info] search.cxx:316 Finished local search, nexport/nimport = 76718 88213 in 29.965 [s]
[0003] [ 258.266] [ info] search.cxx:317 MPI search will require extra memory of 31.458 [MiB]
[0004] [ 258.263] [ info] [0005] [ 258.262] [ info] [0006] [ 258.262] [ info] [0007] [ 258.271] [ info]  info] search.cxx:316 Finished local search, nexport/nimport = 164413 165337 in 1.622 [s]
[0007] [ 258.281] [[0003] [ 258.277] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search 
search.cxx:316 Finished local search, nexport/nimport = 62422 62962 in 2.309 [min]
[0000] [ 258.277] [ info]  info] search.cxx:316 Finished local search, nexport/nimport = 49708 47752 in 2.446 [min]
[0004] [ 258.277] [ info] search.cxx:316 Finished local search, nexport/nimport = 64596 64910 in 2.217 [min]
[0001] [ 258.277] [ info] search.cxx:316 Finished local search, nexport/nimport = 111153 102204 in 1.361 [min]
[0005] [ 258.276] [ info] search.cxx:316 Finished local search, nexport/nimport = 101480 100349 in 1.973 [min]
[0006] [ 258.277] [ info] search.cxx:317 MPI search will require extra memory of 62.895 [MiB]
search.cxx:317 MPI search will require extra memory of 18.589 [MiB]
search.cxx:317 MPI search will require extra memory of 40.695 [MiB]
search.cxx:317 MPI search will require extra memory of 23.915 [MiB]
search.cxx:317 MPI search will require extra memory of 24.701 [MiB]
search.cxx:316 Finished local search, nexport/nimport = 74975 73738 in 56.573 [s]
[0002] [ 258.301] [search.cxx:317 MPI search will require extra memory of 38.496 [MiB]
[0004] [ 258.296] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search 
 info] [0007] [ 258.318] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search 
search.cxx:317 MPI search will require extra memory of 28.365 [MiB]
[0001] [ 258.346] [ info] [0000] [ 258.349] [ info] [0005] [ 258.347] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search 
mpiroutines.cxx:3292 Now building exported particle list for FOF search 
mpiroutines.cxx:3292 Now building exported particle list for FOF search 
[0006] [ 258.363] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search 
[0002] [ 258.392] [ info] mpiroutines.cxx:3292 Now building exported particle list for FOF search 
[0000] [ 259.952] [debug] search.cxx:339 [0001] [ 259.952] [debug] search.cxx:339 [0002] [ 259.960] [debug] search.cxx:339 [0003] [ 259.954] [debug] search.cxx:339 [0004] [ 259.952] [debug] search.cxx:339 [0005] [ 259.951] [debug] search.cxx:339 [0006] [ 259.951] [debug] search.cxx:339 [0007] [ 259.960] [debug] Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 4.911 [GiB] Data: 5.059 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 5.221 [GiB] Resident: 5.028 [GiB] Shared: 13.777 [MiB] Size: 5.221 [GiB] Text: 3.871 [MiB] 
[0000] [ 259.952] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 4.240 [GiB] Data: 4.538 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 4.701 [GiB] Resident: 4.452 [GiB] Shared: 9.504 [MiB] Size: 4.701 [GiB] Text: 3.871 [MiB] 
[0001] [ 259.952] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 7.818 [GiB] Data: 8.338 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 8.500 [GiB] Resident: 8.202 [GiB] Shared: 9.680 [MiB] Size: 8.500 [GiB] Text: 3.871 [MiB] 
[0002] [ 259.960] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 3.484 [GiB] Data: 3.624 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 3.786 [GiB] Resident: 3.503 [GiB] Shared: 9.500 [MiB] Size: 3.786 [GiB] Text: 3.871 [MiB] 
[0003] [ 259.955] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 2.913 [GiB] Data: 3.004 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 3.195 [GiB] Resident: 2.906 [GiB] Shared: 9.469 [MiB] Size: 3.166 [GiB] Text: 3.871 [MiB] 
[0004] [ 259.952] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 5.717 [GiB] Data: 6.019 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 6.181 [GiB] Resident: 5.942 [GiB] Shared: 9.676 [MiB] Size: 6.181 [GiB] Text: 3.871 [MiB] 
[0005] [ 259.951] [ info] search.cxx:341 Starting to linking across MPI domains
Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 5.911 [GiB] Data: 6.351 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 6.513 [GiB] Resident: 6.236 [GiB] Shared: 9.664 [MiB] Size: 6.513 [GiB] Text: 3.871 [MiB] 
[0006] [ 259.951] [ info] search.cxx:341 Starting to linking across MPI domains
search.cxx:339 Memory report at search.cxx:339@long long *SearchFullSet(Options &, long long, std::vector<NBody::Particle, std::allocator<NBody::Particle>> &, long long &): Average: 9.796 [GiB] Data: 10.618 [GiB] Dirty: 0 [B] Library: 0 [B] Peak: 10.780 [GiB] Resident: 10.538 [GiB] Shared: 9.707 [MiB] Size: 10.780 [GiB] Text: 3.871 [MiB] 
[0007] [ 259.973] [ info] search.cxx:341 Starting to linking across MPI domains
[0000] [ 264.245] [ info] search.cxx:353 Finished linking across MPI domains in 2.409 [min]
*** Error in `/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf': munmap_chunk(): invalid pointer: 0x000000009108a0f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7fb7e6311474]
/cosma/local/Intel/Parallel_Studio_XE_2018/impi/2018.2.199//lib64/libmpi.so.12(+0x15b2fd)[0x7fb7e83902fd]
/cosma/local/Intel/Parallel_Studio_XE_2018/impi/2018.2.199//lib64/libmpi.so.12(MPI_Sendrecv+0x779)[0x7fb7e869ec69]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z39MPISendReceiveFOFStarInfoBetweenThreadsR7OptionsP8fofid_inRSt6vectorIxSaIxEERS3_IfSaIfEEiiRi+0x2d5)[0x5cbbe5]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z16MPIGroupExchangeR7OptionsxPN5NBody8ParticleERPx+0x1491)[0x5ca311]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(_Z13SearchFullSetR7OptionsxRSt6vectorIN5NBody8ParticleESaIS3_EERx+0x8491)[0x5fd581]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf(main+0x10c6)[0x432936]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb7e62b4555]
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf[0x4317a9]

Additional context
The exact same code version, compiled in an identical fashion, runs without any problems with just OMP parallelisation.

Uninitialised variables in PropData class

When running the code with a similar setup to that from #87 under valgrind I get the following errors:

==25566== Thread 1:
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x375E3E: CalculateSphericalOverdensityExclusive(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&) (substructureproperties.cxx:7335)
==25566==    by 0x388960: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:434)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x375FD2: CalculateSphericalOverdensityExclusive(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&) (substructureproperties.cxx:7339)
==25566==    by 0x388960: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:434)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x376080: CalculateSphericalOverdensityExclusive(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&) (substructureproperties.cxx:7348)
==25566==    by 0x388960: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:434)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x376572: SetSphericalOverdensityMasstoTotalMassExclusive(Options&, PropData&) (substructureproperties.cxx:7397)
==25566==    by 0x388998: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:435)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x37657C: SetSphericalOverdensityMasstoTotalMassExclusive(Options&, PropData&) (substructureproperties.cxx:7397)
==25566==    by 0x388998: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:435)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x52348DC: sqrt (w_sqrt_compat.c:31)
==25566==    by 0x3876D7: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:515)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x375314: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7236)
==25566==    by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x375362: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7237)
==25566==    by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x3753B0: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7238)
==25566==    by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x3753FE: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7239)
==25566==    by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 
==25566== Conditional jump or move depends on uninitialised value(s)
==25566==    at 0x37544C: CalculateSphericalOverdensitySubhalo(Options&, PropData&, long long&, NBody::Particle*, double&, double&, double&, double&, double&, std::vector<double, std::allocator<double> >&, int) (substructureproperties.cxx:7240)
==25566==    by 0x388A94: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) [clone ._omp_fn.1] (substructureproperties.cxx:430)
==25566==    by 0x55648E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==25566==    by 0x353ECA: GetProperties(Options&, long long, NBody::Particle*, long long, long long*&, long long*&, PropData*&, long long*&) (substructureproperties.cxx:414)
==25566==    by 0x3674B5: SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) (substructureproperties.cxx:5069)
==25566==    by 0x1BA40E: main (main.cxx:555)
==25566== 

These all seem to stem from the fact that the PropData class has many uninitialised members that are read in various places:

  • substructureproperties.cxx:7335, :7339 and :7397 read gRvir_excl.
  • substructureproperties.cxx:7236 and :515 read gM200c.
  • substructureproperties.cxx:7348 reads gMvir_excl.
  • substructureproperties.cxx:7237 reads gM200m.
  • substructureproperties.cxx:7238 reads gMvir.
  • It's not immediately obvious what's wrong with substructureproperties.cxx:7239 and :7240, so those will require more diagnosis.

Crash when reading in temperatures in the branch `hot_gas_properties`

There seems to be a mismatch between the population of the array hdf_parts->names and its usage.

On line 1588 of hdfio.cxx, the code attempts to access hdf_parts[k]->names[HDFGASTEMP].
HDFGASTEMP is defined to be 99 on line 53 of hdfitems.h. However, the names array is only 39 entries long, and the field we actually want to access is at position 8.
Setting HDFGASTEMP to 8 solves the immediate problem. However, this seems a bit suspicious, as it means the array was not constructed using the named fields of hdfitems.h and could hence lead to other problems of the same kind down the line.

Haloes with Mstar (100 kpc apertures) > M200crit

1. The problem

Hi all,

In a few of my cosmological runs at z=0, VELOCIraptor produces objects with M_star (100 kpc) / M200crit > 1. Interestingly, these larger-than-unity ratios seem to occur only if the stellar mass aperture is set to 100 kpc: for 30-kpc apertures, I am finding no objects with M_star (30 kpc) / M200crit > 1.

The problem can be best illustrated using stellar mass vs. halo mass plots. Below are two such plots, for the same run at z=0. The top one has 30 kpc apertures, while the bottom one is with 100 kpc apertures. I emphasise that in the bottom plot, there are objects above the one-to-one line.

Fig 1. Stellar mass vs halo mass. The stellar mass is computed in 30-kpc apertures

stellar_mass_halo_mass_M200_all_30
Fig 2. Same as Fig. 1 but for 100-kpc apertures
stellar_mass_halo_mass_M200_all_100

Below I am displaying one of the objects with M_star (100 kpc) / M200crit > 1 from the plot above. Interestingly, this object has R200crit = 2.4 kpc, which is much smaller than R_aperture = 100 kpc. Can the latter explain M_star (100 kpc) / M200crit >> 1?

Fig 3. Dark-matter projected density of one of the object with M_star (100 kpc) / M200crit > 1 from Fig. 2

4_dm_faceon
Fig 4. Same as Fig. 3 but the colour traces stellar projected density

4_star_faceon

The fact that there are SMHM ratios > 1 indicates that there could be a bug in the code. If this is expected behaviour, it would still be great to know what causes these unrealistically high ratios, and also how exactly the 100-kpc-aperture stellar mass is computed in the case that an object has R200crit << 100 kpc.

Note that the dark-matter particle mass in my runs is 1.2 \times 10^6 M_\odot.

2. I am using

  • VR from ICRAR repository
  • Master branch
  • Version
commit 7e4683963354bcf2b2730d5f8576ac931f7da43d
Author: Rodrigo Tobar <[email protected]>
Date:   Sun Oct 25 20:49:02 2020 +0800

3. Reproduce this behaviour

All my data, which I used to make the plots above, can be found on cosma:

  • Path to the SWIFT snapshot that I fed on VR
    /snap7/scratch/dp004/dc-chai1/my_cosmological_box/AGN5_L006188_00_CHIMES_NoESO_M16/colibre_2729.hdf5
  • Path to VR output
    /snap7/scratch/dp004/dcchai1/my_cosmological_box/AGN5_L006188_00_CHIMES_NoESO_M16/halo_2729.*
  • Path to VR config
    /cosma7/data/dp004/dc-chai1/vrconfig_m16.cfg

@MatthieuSchaller

Swift+VR on Cosma7 with new modules

Trying to compile Swift+VR on Cosma7 using the new modules. This is a minimal example that I'm having trouble getting working.

module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5

Configured VR using:

cmake -DCMAKE_BUILD_TYPE=Release -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" ..

Configured Swift using:

./configure --with-velociraptor=/vrdir/

Gives the error:

ipo: error #11021: unresolved gsl_multifit_nlinear_trs_lmaccel
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_trust
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_default_parameters
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_alloc
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_residual
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_position
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_init
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_driver
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_free
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_icc22Mhex.o
icc: error #10014: problem during multi-file optimization compilation (code 1)
make[2]: *** [swift] Error 1
make[2]: *** Waiting for unfinished jobs....
ipo: error #11021: unresolved gsl_multifit_nlinear_trs_lmaccel
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_trust
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_default_parameters
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_alloc
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_residual
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_position
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_init
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_driver
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: error #11021: unresolved gsl_multifit_nlinear_free
Referenced in libvelociraptor.a(Fitting.cxx.o)
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_icc2P8zjg.o
icc: error #10014: problem during multi-file optimization compilation (code 1)
make[2]: *** [swift_mpi] Error 1
make[2]: Leaving directory `/cosma7/data/dp004/rttw52/swift_runs/runs/Sibelius/swiftsim/examples'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/cosma7/data/dp004/rttw52/swift_runs/runs/Sibelius/swiftsim'
make: *** [all] Error 2
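The unresolved symbols are all from GSL's nonlinear least-squares interface (gsl_multifit_nlinear, added in GSL 2.2), which suggests libgsl is missing from swift's final link line rather than a compiler problem. A possible workaround (untested here; the GSL_HOME path is illustrative and should point at the loaded gsl/2.5 module) is to pass the GSL libraries explicitly when configuring swift:

```shell
# Hypothetical workaround: add GSL to swift's link line explicitly.
./configure --with-velociraptor=/vrdir/ \
    LDFLAGS="-L${GSL_HOME}/lib" LIBS="-lgsl -lgslcblas"
```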

Incorrectly sized buffer given for MPI_Bcast reception

After HDF5 reading finishes, the names of the extra output fields are broadcasted from rank 0 to the rest of the MPI communicator. For example:

if (opt.gas_internalprop_names.size()>0){
    for (auto &x:opt.gas_internalprop_output_names) {
        size_t n = x.size();
        char char_array[n+1];
        strcpy(char_array, x.c_str());
        MPI_Bcast(char_array, n+1, MPI_CHAR, 0, MPI_COMM_WORLD);
        if (ThisTask > 0) x=string(char_array);
    }
}

This code is repeated for each of the lists with extra fields.

This code is also incorrect: the reception buffer is sized (on the stack) from the receiver's local string length, but the value sent from rank 0 can be longer, leading to potential stack corruption and crashes.

This indeed happened while trying to verify the fix for #88. For some reason nothing went wrong when running with 8 ranks on a single node, but running across two nodes broke with:

Fatal error in PMPI_Bcast: Invalid buffer pointer, error stack:
PMPI_Bcast(2667).........: MPI_Bcast(buf=0x7ffd4b89a480, count=26, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1804)....: fail failed
MPIR_Bcast(1832).........: fail failed
I_MPIR_Bcast_intra(2056).: Failure during collective
I_MPIR_Bcast_intra(2043).: fail failed
MPIR_Bcast_advanced(2135): fail failed
MPIR_Bcast_intra(1670)...: Failure during collective
MPIR_Bcast_intra(1638)...: fail failed
MPIR_Bcast_knomial(2338).: Failure during collective

Will not compile with -DNBODY_SINGLE_PRECISION=ON

I am unable to get VELOCIraptor to compile successfully on NCI Gadi (and have verified the same failure on my MacBook Pro).

I am using

commit 7d185b065b8d8c6f58533bfb612a85e70e1edd84 (HEAD -> master, origin/master, origin/HEAD)
Author: Rodrigo Tobar <[email protected]>
Date:   Thu Oct 22 13:35:53 2020 +0800

On Gadi, I have the following modules loaded;

  1. pbs
  2. openmpi/4.0.2(default)
  3. fftw3/3.3.8
  4. hdf5/1.10.5p
  5. gsl/2.6
  6. intel-mkl/2020.0.166
  7. intel-compiler/2020.0.166
  8. intel-tbb/2020.0.166
  9. parmetis/4.0.3-i8r8
  10. metis/5.1.0-i8r8

I am using the following on Gadi;

cmake ../ -DHDF5_C_LIBRARY_hdf5:FILEPATH="/apps/hdf5/1.10.5p/lib/ompi3/libhdf5_hl.a" -DVR_USE_SWIFT_INTERFACE:BOOL=ON -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DNBODY_SINGLE_PRECISION=ON

This produces the following output;

> -- The CXX compiler identification is Intel 19.1.0.20191121
> -- The C compiler identification is Intel 19.1.0.20191121
> -- Check for working CXX compiler: /apps/intel-ct/wrapper/icpc
> -- Check for working CXX compiler: /apps/intel-ct/wrapper/icpc -- works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Check for working C compiler: /apps/intel-ct/wrapper/icc
> -- Check for working C compiler: /apps/intel-ct/wrapper/icc -- works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Found PkgConfig: /bin/pkg-config (found version "1.4.2") 
> -- Found GSL: /apps/gsl/2.6/include (found version "2.6") 
> -- Found MPI_C: /apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi.so (found version "3.1") 
> -- Found MPI_CXX: /apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi_cxx.so (found version "3.1") 
> -- Found MPI: TRUE (found version "3.1")  
> -- HDF5: Using hdf5 compiler wrapper to determine C configuration
> -- Found HDF5: /apps/hdf5/1.10.5p/lib/ompi3/libhdf5_hl.a;/apps/szip/2.1.1/lib/libsz.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so (found version "1.10.5") found components:  C 
> -- Found OpenMP_CXX: -qopenmp (found version "5.0") 
> -- Found OpenMP: TRUE (found version "5.0") found components:  CXX 
> 
> NBodyLib successfully configured with the following settings:
> 
> Dependencies
> ------------
> 
>  OpenMP                                                                      Yes
> 
> Types
> -----
> 
>  All calculations/properties stored as float                                 Yes
>  All integeres are long int                                                  Yes
> 
> Particle data
> -------------
> 
>  Do not store mass, all particles are the same mass                          No
>  Use single precision to store positions, velocities, other props            No
>  Use unsigned particle PIDs                                                  No
>  Use unsigned particle IDs                                                   No
>  Activate gas                                                                No
>  Activate stars                                                              No
>  Activate black holes/sinks                                                  No
>  Activate extra dm properties                                                No
>  Extra input info stored                                                     No
>  Extra FOF info stored                                                       No
>  Large memory KDTree                                                         No
>  Particle compiled for SWIFT                                                 Yes
> 
> Compilation
> -----------
> 
>  Include directories: /apps/gsl/2.6/include;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Analysis;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Cosmology;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/InitCond;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/KDTree;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Math;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/NBody
>  Macros defined: SINGLEPRECISION;LONGINT;SWIFTINTERFACE;HAVE_GSL22;USEOPENMP;USEOMP
>  Libs: /apps/gsl/2.6/lib/libgsl.so;/apps/gsl/2.6/lib/libgslcblas.so
>  C++ flags: -qopenmp
>  Link flags: -qopenmp
> 
> 
> VELOCIraptor successfully configured with the following settings:
> 
> File formats
> ------------
> 
>  HDF5                                                                        Yes
>  Compressed HDF5                                                             No
>  Parallel HDF5                                                               Yes
>  nchilada                                                                    No
> 
> Precision-specifics
> -------------------
> 
>  Long Integers                                                               Yes
> 
> OpenMP-specifics
> ----------------
> 
>  OpenMP support                                                              Yes
> 
> MPI-specifics
> -------------
> 
>  MPI support                                                                 Yes
>  Reduce MPI memory overhead at the cost of extra CPU cycles                  Yes
>  Use huge MPI domains                                                        No
> 
> Gadget
> ------
> 
>  Use longs IDs                                                               No
>  Use double precision pos and vel                                            No
>  Use single precision mass                                                   No
>  Use header type 2                                                           No
>  Use extra SPH information                                                   No
>  Use extra star information                                                  No
>  Use extra black hole information                                            No
> 
> Particle-specifics
> ------------------
> 
>  Activate gas (& associated physics, properties calculated)                  No
>  Activate stars (& associated physics, properties calculated)                No
>  Activate black holes (& associated physics, properties calculated)          No
>  Activate extra dark matter properties (& associated properties)             No
>  Mass not stored (for uniform N-Body sims, reduce mem footprint)             No
>  Large memory KDTree to handle > max 32-bit integer entries per tree         No
> 
> Simulation-specifics
> --------------------
> 
>  Used to run against simulations with a high resolution region               No
>  Build library for integration into SWIFT Sim code                           Yes
> 
> Others
> ------
> 
>  Calculate local density dist. only for particles in field objects           Yes
>  Like above, but use particles inside field objects only for calclation      No
> 
> Compilation
> -----------
> 
>  Include dirs: /home/571/cxp571/Codes/VELOCIraptor-STF/src;/apps/gsl/2.6/include;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi/opal/mca/event/libevent2022/libevent;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include/openmpi/opal/mca/event/libevent2022/libevent/include;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/include;/apps/hdf5/1.10.5p/include;/apps/szip/2.1.1/include;/apps/gsl/2.6/include;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Analysis;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Cosmology;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/InitCond;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/KDTree;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/Math;/home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/src/NBody
>  Macros defined: LONGINT;USEOPENMP;MPIREDUCEMEM;STRUCDEN;SWIFTINTERFACE;USEMPI;USEHDF;USEPARALLELHDF;SINGLEPRECISION;LONGINT;SWIFTINTERFACE;HAVE_GSL22;USEOPENMP;USEOMP
>  Libs: /apps/gsl/2.6/lib/libgsl.so;/apps/gsl/2.6/lib/libgslcblas.so;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi_cxx.so;/apps/openmpi-mofed4.7-pbs19.2/4.0.2/lib/libmpi.so;/apps/hdf5/1.10.5p/lib/ompi3/libhdf5_hl.a;/apps/szip/2.1.1/lib/libsz.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/apps/gsl/2.6/lib/libgsl.so;/apps/gsl/2.6/lib/libgslcblas.so
>  C++ flags: -qopenmp -fPIC
>  Link flags: -qopenmp
> 
> -- Adding doc target for directories: /home/571/cxp571/Codes/VELOCIraptor-STF/NBodylib/doc;/home/571/cxp571/Codes/VELOCIraptor-STF/doc
> -- Configuring done
> -- Generating done
> -- Build files have been written to: /home/571/cxp571/Codes/VELOCIraptor-STF/build_swift
> 

This results in the following error during compilation:

[ 50%] Building CXX object src/CMakeFiles/velociraptor.dir/endianutils.cxx.o
In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(226): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
      LittleDouble_t=DoubleNoSwap;
                    ^

In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(227): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
      BigDouble_t=DoubleSwap;
                 ^

In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(268): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
      LittleDouble_t=DoubleSwap;
                    ^

In file included from /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx(5):
/home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.h(269): error: a value of type "double (*)(double)" cannot be assigned to an entity of type "Double_t={float} (*)(Double_t={float})"
      BigDouble_t=DoubleNoSwap;
                 ^

compilation aborted for /home/571/cxp571/Codes/VELOCIraptor-STF/src/endianutils.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/build.make:102: src/CMakeFiles/velociraptor.dir/endianutils.cxx.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:597: src/CMakeFiles/velociraptor.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

I can get past this point if I edit out lines 226, 227, 268, and 269 in endianutils.h:

    LittleDouble_t=DoubleNoSwap;
    BigDouble_t=DoubleSwap;

   LittleDouble_t=DoubleSwap;
   BigDouble_t=DoubleNoSwap;

As far as I can see, these aren't used anywhere else in the source code.

However, I then hit the following issue:

[ 45%] Building CXX object src/CMakeFiles/velociraptor.dir/substructureproperties.cxx.o
/home/571/cxp571/Codes/VELOCIraptor-STF/src/substructureproperties.cxx(4886): error: cannot overload functions distinguished by return type alone
  double CalcGravitationalConstant(Options &opt) {
         ^

/home/571/cxp571/Codes/VELOCIraptor-STF/src/substructureproperties.cxx(4890): error: cannot overload functions distinguished by return type alone
  double CalcHubbleUnit(Options &opt) {
         ^

compilation aborted for /home/571/cxp571/Codes/VELOCIraptor-STF/src/substructureproperties.cxx (code 2)
make[2]: *** [src/CMakeFiles/velociraptor.dir/build.make:375: src/CMakeFiles/velociraptor.dir/substructureproperties.cxx.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:597: src/CMakeFiles/velociraptor.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

There is a mismatch between the definitions in substructureproperties.cxx and proto.h:

proto.h:Double_t CalcGravitationalConstant(Options &opt);
substructureproperties.cxx:double CalcGravitationalConstant(Options &opt) {

proto.h:Double_t CalcHubbleUnit(Options &opt);
substructureproperties.cxx:double CalcHubbleUnit(Options &opt) {

and so I have made the following changes:

substructureproperties.cxx:Double_t CalcGravitationalConstant(Options &opt) {
substructureproperties.cxx:Double_t CalcHubbleUnit(Options &opt) {

The code now compiles.

Can you please verify that you see similar behaviour, and that these changes are valid?

SO list offsets are wrong/counterintuitive

Describe the bug
The offsets stored in the SO list output seem inconsistent with the SO sizes in the same file. My (possibly wrong) expectation is that the particle IDs belonging to SO i are stored in pIDs[offset[i] : offset[i]+size[i]], but that is not the case.

To Reproduce
The problem can be best illustrated through the following Python snippet that can be applied to any HDF5 SO list output:

import h5py
SOfile = h5py.File("<RANDOM SO_LIST FILE.hdf5>", "r")
ofs = SOfile["Offset"][:]
siz = SOfile["SO_size"][:]
pIDs = SOfile["Particle_IDs"][:]
if not ofs[-1]+siz[-1] == pIDs.shape[0]:
  print("Wrong final size ({0}=/={1})!".format(ofs[-1]+siz[-1], pIDs.shape[0]))
for i in range(1,ofs.shape[0]):
  if not ofs[i-1]+siz[i-1] == ofs[i]:
    print("Wrong offset ({0} {1}, {2} {3})!".format(ofs[i-1], siz[i-1], ofs[i], siz[i]))
    exit(1)

This will produce two error messages. The first one because the final offset is not separated from the end of the particle ID list by the size of the final SO, and the second because the next offset does not match the previous offset plus the previous size.

Expected behavior
ofs[i] == ofs[i-1]+siz[i-1]
Put differently, I would expect ofs to be equivalent to

ofs = numpy.cumsum(siz)
ofs[1:] = ofs[:-1]
ofs[0] = 0
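For concreteness, here is a minimal sketch of the expected layout (the sizes are made up for illustration; this is not actual VR output):

```python
import numpy

# Hypothetical SO sizes, for illustration only
siz = numpy.array([3, 5, 2, 4])

# Expected offsets: an exclusive prefix sum of the sizes, so that
# pIDs[ofs[i] : ofs[i] + siz[i]] selects the particle IDs of SO i
ofs = numpy.concatenate(([0], numpy.cumsum(siz)[:-1]))

print(ofs.tolist())  # [0, 3, 8, 10]
```

With this layout, the last offset plus the last size equals the total number of particle IDs, which is exactly the consistency check that fails on the actual output.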

Log files
Not applicable.

Environment (please complete the following information):
Irrelevant for this problem.

Additional context
I think the problem is situated in io.cxx:1450, where the value of SOpids[0] seems to be (incorrectly) omitted from the loop.

Compilation without HYDRO but with GAS broken

Describe the bug
Latest master. Configure with VR_USE_GAS but not VR_USE_HYDRO.

Error

In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/buildandsortarrays.cxx:5:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/haloproperties.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.cxx:5:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/logging.h:7,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/bgfield.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/logging.h:7,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/hdfio.cxx:26:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/fofalgo.cxx:5:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/io.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~
      |   aperture_Z_gas
In file included from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/stf.h:8,
                 from /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/gadgetio.cxx:7:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h: In member function ‘void PropData::ConverttoComove(Options&)’:
/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/allvars.h:2630:3: error: ‘aperture_M_gas_highT’ was not declared in this scope; did you mean ‘aperture_Z_gas’?
 2630 |   aperture_M_gas_highT[i]*=opt.h;
      |   ^~~~~~~~~~~~~~~~~~~~

Should be easy to fix; most likely an incorrect #ifdef choice in these files when treating the newly added quantities from #57.

Can't write more than 2GB HDF5 datasets in parallel

When using the inputs and configuration from #87, and after running with the fix for the original issue, VR crashes with the following problem:

[0000] [1123.397] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0001] [1123.397] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0003] [1123.403] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0004] [1123.395] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0005] [1123.395] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0006] [1123.403] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0007] [1123.403] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
[0002] [1123.409] [ info] io.cxx:1292 Saving SO particle lists to lala.catalog_SOlist.0
HDF5-DIAG: Error detected in HDF5 (1.8.20) MPI-process 2:
  #000: H5Dio.c line 322 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 403 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 846 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 527 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
    major: Low-level I/O
    minor: Write failed
  #004: H5Dmpio.c line 1397 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #005: H5Dmpio.c line 1441 in H5D__final_collective_io(): optimized write failed
    major: Dataset
    minor: Write failed
  #006: H5Dmpio.c line 295 in H5D__mpio_select_write(): can't finish collective parallel write
    major: Low-level I/O
    minor: Write failed
  #007: H5Fio.c line 169 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #008: H5Faccum.c line 823 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 572724, size=18446744071687627232, eoa=12286439660
    major: Invalid arguments to routine
    minor: Address overflowed
HDF5-DIAG: Error detected in HDF5 (1.8.20) MPI-process 7:
  #000: H5Dio.c line 322 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 403 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 846 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 527 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
    major: Low-level I/O
    minor: Write failed
  #004: H5Dmpio.c line 1397 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #005: H5Dmpio.c line 1441 in H5D__final_collective_io(): optimized write failed
    major: Dataset
    minor: Write failed
  #006: H5Dmpio.c line 295 in H5D__mpio_select_write(): can't finish collective parallel write
    major: Low-level I/O
    minor: Write failed
  #007: H5Fio.c line 169 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #008: H5Faccum.c line 823 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 572724, size=18446744072925879424, eoa=12286439660
    major: Invalid arguments to routine
    minor: Address overflowed
Failed to write dataset: Particle_IDsFailed to write dataset: Particle_IDs

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 7
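The absurd size=18446744071687627232 in the trace is a negative byte count reinterpreted as unsigned, consistent with a signed 32-bit overflow: many MPI-IO implementations cannot issue a single collective write larger than 2 GiB. A possible workaround (a sketch only, not VR's actual fix; write_slabs is a hypothetical helper) is to split a large dataset write into sub-2 GiB slabs:

```python
# Sketch (not VR's actual code): split a large write into slabs that
# each stay below the 2 GiB-per-call limit of many MPI-IO implementations.
MAX_BYTES = 2**31 - 1  # largest signed 32-bit byte count

def write_slabs(n_elements, element_size):
    """Yield (start, count) pairs whose byte size fits in a signed 32-bit int."""
    max_count = MAX_BYTES // element_size
    start = 0
    while start < n_elements:
        count = min(max_count, n_elements - start)
        yield (start, count)
        start += count

# e.g. 3e9 eight-byte particle IDs -> a dozen sub-2 GiB writes
slabs = list(write_slabs(3_000_000_000, 8))
```

Each (start, count) pair would then be written as a separate hyperslab selection, keeping every H5Dwrite call below the 32-bit byte limit.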

DMO breaks on substructure search

Describe the bug
I am trying to run stf stand-alone on a DMO snapshot, using the zoom configuration. The process fails with the error

0 Beginning substructure search
Error, net size 0 with row,col=0,0
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
Aborted

The same behavior also appears when running stf on the fly with SWIFT. Note: SWIFT itself completed the run successfully and the snapshots do not seem to be corrupted in any apparent way (I checked for existing datasets and dataset shapes).

To Reproduce
Steps to reproduce the behavior:

  1. Go to /cosma/home/dp004/dc-alta2/data7/xl-zooms/dmo/L0300N0564_VR93 on Cosma 7.
  2. Load the following modules:
module purge
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5
  3. Run VR as
../VELOCIraptor-STF/stf -I 2 -i snapshots/L0300N0564_VR93_0199 -o L0300N0564_VR93_0199 -C config/vr_config_zoom_dmo.cfg
  4. See error
0 Beginning substructure search
Error, net size 0 with row,col=0,0
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
Aborted

Expected behavior
Given the arguments parsed, expected to generate the usual output files in the pwd (e.g. the L0300N0564_VR93_0199.properties file).

Log files
Logs can be displayed to console, but they are also available in the $(pwd)/stf directory.

Environment (please complete the following information):

  • VR version: fresh installation (yesterday) from the master branch, compiled and run with the following modules:
  • Libraries:
module load intel_comp/2020-update2
module load intel_mpi/2020-update2
module load ucx/1.8.1
module load parmetis/4.0.3-64bit
module load parallel_hdf5/1.10.6
module load fftw/3.3.8cosma7
module load gsl/2.5

Additional context
I also tried running with higher verbosity in the .cfg file, but no further info is shown.

Thanks in advance for your help!

Memory segfault in GetSOMasses()

Describe the bug
This is the next step in the leak-finding exercise. Now running the stand-alone code on larger boxes to identify problems in the sub-structure search.

I get a segfault when running the code with -O0 -fsanitize=address using GCC 10.

To Reproduce
Steps to reproduce the behavior:

  1. Code: Latest master
  2. Config: /cosma7/data/dp004/jlvc76/SWIFT/master/swiftsim/examples/EAGLE_DMO_low_z/EAGLE_DMO_50/vrconfig_3dfof_subhalos_SO_hydro.cfg (i.e. our standard EAGLE setup but with hydro switched off)
  3. command line: stf -C vrconfig_3dfof_subhalos_SO_hydro.cfg -i /cosma7/data/dp004/jlvc76/SWIFT/master/swiftsim/examples/EAGLE_ICs/EAGLE_25/eagle_0036 -o haloes -I 2 (The input is a very standard box)

Crash
We get this error:

==244350==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000030 (pc 0x00000068f8fa bp 0x7fff6ff33a00 sp 0x7fff6ff32e80 T0)
==244350==The signal is caused by a READ memory access.
==244350==Hint: address points to the zero page.
    #0 0x68f8fa in GetSOMasses(Options&, long long, NBody::Particle*, long long, long long*&, PropData*&) /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/substructureproperties.cxx:3583
    #1 0x6a1709 in SortAccordingtoBindingEnergy(Options&, long long, NBody::Particle*, long long, long long*&, long long*, PropData*, long long) /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/substructureproperties.cxx:5100
    #2 0x4731bd in main /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/main.cxx:530
    #3 0x7f8ea5371554 in __libc_start_main (/lib64/libc.so.6+0x22554)
    #4 0x46e7c8  (/cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/build/stf+0x46e7c8)

If I configure with OpenMP, I get a crash at basically the same place:

==72314==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000050 (pc 0x0000006f8328 bp 0x7fe48e4ebdd0 sp 0x7fe48e4eb5e0 T105)
==72314==The signal is caused by a READ memory access.
==72314==Hint: address points to the zero page.
AddressSanitizer:DEADLYSIGNAL
    #0 0x6f8328 in GetSOMasses(Options&, long long, NBody::Particle*, long long, long long*&, PropData*&) [clone ._omp_fn.1] /cosma7/data/dp004/jlvc76/VELOCIraptor/VELOCIraptor-STF/src/substructureproperties.cxx:3583
    #1 0x7fe62f2e6a05 in gomp_thread_start ../../../libgomp/team.c:123
    #2 0x7fe62eea0ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
    #3 0x7fe62ebc996c in clone (/lib64/libc.so.6+0xfe96c)

I don't know what to make of this quite yet. If I switch off the sanitizer, the code runs happily. It may hence be a false positive, but cleaning this up might help the run get through and identify proper leaks.
What is also interesting is that it's happening in a section of code related to our good friend the SO_xxx properties (#62).
