
abacusutils


abacusutils is a package for reading and manipulating data products from the Abacus N-body project. In particular, these utilities are intended for use with the AbacusSummit suite of simulations. The package focuses on the Python 3 API, but there is also a language-agnostic Unix pipe interface to some of the functionality.

These interfaces are documented here: https://abacusutils.readthedocs.io

Press the GitHub "Watch" button in the top right and select "Custom->Releases" to be notified about bug fixes and new features!

Installation

The Python abacusutils package is hosted on PyPI and can be installed with pip:

pip install abacusutils

or, to include all optional dependencies:

pip install abacusutils[all]

For more information, see https://abacusutils.readthedocs.io/en/latest/installation.html.

Usage

abacusutils has multiple interfaces, summarized here and at https://abacusutils.readthedocs.io/en/latest/usage.html.

Specific examples of how to use abacusutils to work with AbacusSummit data will soon be given at the AbacusSummit website: https://abacussummit.readthedocs.io

Python

The abacusutils PyPI package contains a Python package called abacusnbody. This is the name to import (not abacusutils, which is just the name of the PyPI package). For example, to import the compaso_halo_catalog module, use

import abacusnbody.data.compaso_halo_catalog
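
For instance, a minimal sketch of loading a halo catalog with that module (the path and field names here are illustrative):

from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

# Load two columns from one snapshot (illustrative path)
cat = CompaSOHaloCatalog('AbacusSummit_base_c000_ph000/halos/z0.100',
                         fields=['N', 'x_com'])
print(cat.halos[:5])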

Unix Pipes

The pipe_asdf Python script reads columns from ASDF files and pipes them to stdout. For example:

    $ pipe_asdf halo_info_000.asdf -f N -f x_com | ./client


abacusutils's Issues

Problem while trying to run the short example of AbacusHOD

Hi,

I'm running AbacusHOD through the new BinderHub.

First, I tried to run the first part of the process, running the prepare_sim code for z=0.500.

The first time, it took a few hours to reach slab number 33, producing two output files:
halos_xcom_32_seed600_abacushod_oldfenv_new.h5
particles_xcom_32_seed600_abacushod_oldfenv_new.h5

The next time, it reached slab 31 and produced:
halos_xcom_30_seed600_abacushod_oldfenv_new.h5
particles_xcom_30_seed600_abacushod_oldfenv_new.h5

I also repeated for z = 0.200 and 0.100.

Now, when I run the short example, I receive this error:

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = '.../output/subsamples/AbacusSummit_base_c000_ph000/z0.100/halos_xcom_0_seed600_abacushod_oldfenv_new.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Also, it creates empty folders in the output directory for galaxies.
.../output/galalxies/AbacusSummit_base_c000_ph000/z0.500

Add tutorials and reformat docs

To support tutorials and more readable docs, we should probably migrate to an Executable Book template and start writing notebook-style tutorials there.

Parallel_Numpy_Rng not found

Hello,

I am trying to use AbacusHOD but keep running into problems. I am following the examples provided here:

https://abacusutils.readthedocs.io/en/latest/hod.html

but when I run the command:

from abacusutils.abacusnbody.hod.abacus_hod import AbacusHOD

it tells me there is no module named 'parallel_numpy_rng'. I've tried looking online for resources about this package but haven't been able to find any. Where is this module originally from (i.e., where is it getting called from), and what are some ways to fix this issue? I understand what I am writing is pretty vague, so I will be happy to elaborate further if necessary.

Best,
Weston

missing info in metadata module

  1. c021 and c022: They are missing because they only exist as highbase boxes. Should be easy to fix.
  2. power spectra: would need to compress, limit the k-range, or reduce the k-resolution
  3. A_s: missing from the abacus params, so need to read from CLASS.ini

Thanks to @adematti for the report on item 1!

Sample AbacusHOD: index out of range

Hello again,

I am still working on getting AbacusHOD to work and I am stuck on the example where we construct an AbacusHOD object. When I run the provided cell, I get an error when the code tries to run the 'newBall' line. When I run this, the error message it gives me is:

~/.local/lib/python3.6/site-packages/abacusnbody/hod/abacus_hod.py in staging(self)
    351 halo_info_fns =
    352     list((sim_dir / simname / 'halos' / ('z%4.3f'%self.z_mock) / 'halo_info').glob('*.asdf'))
--> 353 f = asdf.open(halo_info_fns[0], lazy_load=True, copy_arrays=False)
    354 header = f['header']
    355

IndexError: list index out of range

I opened abacus_hod.py to check the variables being called and that led me to abacus.yaml. Currently the path to my asdf files is:

AbacusSummit_base_c000_ph000/halos/z0.000/halo_info

And the beginning of my abacus_hod.yaml file looks like:

# Simulation parameters
sim_params:
    sim_name: 'AbacusSummit_base_c000_ph000'
    sim_dir: 'AbacusSummit_base_c000_ph000/halos/z0.100/halo_info/'
    output_dir: 'AbacusSummit_base_c000_ph000/mock_summit/'
    subsample_dir: 'AbacusSummit_base_c000_ph000/cleaned_summit/'
    z_mock: 0.1
    cleaned_halos: True

I've tried other variations of this, but I still get the same error message. At this point I am unsure whether the error is due to defining these variables incorrectly or if it's due to something different. As always, please let me know what further information you need in order to help.

Thank you in advance,
Weston

More descriptive error message when missing files

If the user requests subsample files that are not on disk, provide a more informative error message than "File not found". E.g., load_subsamples=True but no field particles on disk, a common situation!
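
A minimal sketch of the kind of check this could become (the helper name and message wording are hypothetical):

from pathlib import Path

def open_subsample_file(fn, kind='field'):
    # Fail early, describing what was requested rather than just "File not found"
    fn = Path(fn)
    if not fn.is_file():
        raise FileNotFoundError(
            f'{kind} particle subsample file {fn} is not on disk, but '
            f'load_subsamples requested it. Were {kind} particles written '
            'for this simulation?')
    return open(fn, 'rb')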

Unpack cleaned subsamples directly into subsample table

Currently, we build a concatenated table of all cleaned particles, reindex it, merge it with the original subsamples, then do the RVint unpacking on the whole table. We may be able to achieve better performance by never constructing the concatenated cleaned particle table and instead do the RVint unpacking directly into the final location in the combined particle table.
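
A minimal sketch of the idea, assuming the posout/velout out-arguments of abacusnbody.data.bitpacked.unpack_rvint; the block list, row slices, box size, and total row count are hypothetical inputs:

import numpy as np

from abacusnbody.data.bitpacked import unpack_rvint

def unpack_cleaned_into(rvint_blocks, row_slices, boxsize, ntot):
    # Preallocate the final columns once, then decode each cleaned-particle
    # RVint block directly into its slice, skipping the concatenated table
    pos = np.empty((ntot, 3), dtype=np.float32)
    vel = np.empty((ntot, 3), dtype=np.float32)
    for rvint, (start, stop) in zip(rvint_blocks, row_slices):
        unpack_rvint(rvint, boxsize, posout=pos[start:stop], velout=vel[start:stop])
    return pos, vel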

add function to read multiple particle files

Currently, we expose a function to read a single particle file (read_abacus.read_asdf()). We should add a higher-level function that can read multiple files into a single table.

Basically, we want a smarter version of this snippet:

from pathlib import Path
from abacusnbody.data import read_abacus
import astropy.table

allp = []
for fn in Path('AbacusSummit_small_c000_ph3000/halos/z1.100/').glob('*_rv_*/*.asdf'):
    allp += [read_abacus.read_asdf(fn, load=['pos'])]
allp = astropy.table.vstack(allp)
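
A minimal sketch of what that higher-level function might look like (the name read_asdf_multi and its signature are hypothetical):

from pathlib import Path

import astropy.table

from abacusnbody.data import read_abacus

def read_asdf_multi(fns, load=None):
    # Read each file, then stack the per-file tables into one
    tables = [read_abacus.read_asdf(fn, load=load) for fn in sorted(fns)]
    return astropy.table.vstack(tables)

# Usage, mirroring the snippet above
allp = read_asdf_multi(
    Path('AbacusSummit_small_c000_ph3000/halos/z1.100/').glob('*_rv_*/*.asdf'),
    load=['pos'])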

Several features and bug fixes for AbacusHOD and light cones

Features:

  • output in the HOD light cone galaxy files the RA, DEC and CZ (also Zobs and Zreal) for each galaxy (or make that an option?)
  • when doing redshift space distortions on the light cone, it currently takes the redshift of the entire halo catalog (more or less the median redshift of all halos in it), but it would be more accurate to use the redshift_interp field (not sure if it's worth it or whether it would make AbacusHOD more cumbersome)
  • output in all the HOD galaxy files whether they were produced using light cones or cubic box
  • add scripts for applying arbitrary mangle masks to fits and dat AbacusHOD outputs and generating randoms
  • add scripts for computing convergence and shear maps.

Possible bug fix:

  • the particle subsampling in AbacusHOD is (I believe) resolution-dependent; we need to fix that before using the high and huge boxes. (Also, note that currently only subsampleA particles are used; does this raise any issues with lower-mass samples?)

Documentation:

  • add documentation and reference to the abacus_lc_cat repo

load_subsamples syntax

The "command string" syntax for the CompaSO load_subsamples is not intuitive. Should just let that argument accept a dict like dict(A=True, B=False, halo=True, field=False, pos=True, vel=False).

implement power spectrum tests against nbodykit

@boryanah added power spectrum computation routines in #65 (thanks!). We should add tests against nbodykit to the repo. Since #65 is closed, we can discuss and track progress here.

Responding to #65 (comment):

The code already looks fairly modular to me, but I agree the Pk stuff could go in its own file. And we could probably even make a new abacusnbody/clustering/ directory for the CF and Pk files, since they're in the hod dir right now but are useful beyond just HOD.

For the tests, you can just make test_power.py where the rest of the tests are. The mini sim has halos, particle subsamples, and particle slices; you can use any of these as the input data set.

test_power.py will look like:

from pathlib import Path

import pytest
from astropy.table import Table
import numpy as np

from common import check_close

curdir = Path(__file__).parent
refdir = curdir / 'ref_power'
EXAMPLE_SIM = curdir / 'Mini_N64_L32'
NBODYKIT_POWER = refdir / 'nbodykit_power.csv'

def test_power():
    '''Compare the power spectrum against the saved nbodykit result'''
    # TODO: this is all pseudo-ish code
    ref_power = Table.read(NBODYKIT_POWER)

    from abacusnbody.hod.power import compute_power

    pos = ...  # TODO: load the input data set (halos, subsamples, or slices)
    pk_params = {}  # TODO: delta-k, etc.; must match the reference binning

    abacus_power = compute_power(pos, **pk_params)

    assert check_close(abacus_power['k'], ref_power['k'])
    n_mubins = len(ref_power)  # assuming one reference row per mu bin
    for i in range(n_mubins):
        assert check_close(ref_power[i]['power'], abacus_power[i]['power'])

You can just run your nbodykit script locally and then save the result in a CSV file in the repo. That way the CI doesn't have to install/run nbodykit. But you can save it as tests/generate_nbodykit_power.py so someone else can run it in the future if needed.
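
A minimal sketch of what tests/generate_nbodykit_power.py could contain (nbodykit usage from memory; the input file and binning parameters are placeholders and must be matched to what compute_power uses):

import numpy as np
from nbodykit.lab import ArrayCatalog, FFTPower

# Hypothetical: (N, 3) particle positions extracted from the mini sim
pos = np.load('mini_sim_pos.npy')

cat = ArrayCatalog({'Position': pos})
mesh = cat.to_mesh(Nmesh=64, BoxSize=32.0, resampler='tsc', compensated=True)
result = FFTPower(mesh, mode='2d', Nmu=4, kmin=0.0)

# Flatten the (k, mu) grid and save as the CSV reference for test_power.py
p = result.power
np.savetxt('ref_power/nbodykit_power.csv',
           np.column_stack([p['k'].ravel(), p['power'].real.ravel(), p['modes'].ravel()]),
           delimiter=',', header='k,power,modes')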

I agree, it's probably worth matching the binnings between the codes. I usually find that checking the number of modes in each bin is the best way to confirm that the binnings are the same. In theory, the results should match very closely!

As you point out in the comments, there are opportunities for optimization/parallelization in the future, so it's important to get the tests in place before that work!

prepare_sim.main issues

Hi all,

I am trying to use AbacusHOD, and to do so I must first run:

python -m abacusnbody.hod.prepare_sim --path2config /script/hod/config/abacus_hod.yaml

However when I do so, I get an error saying:

PermissionError: [Errno 13] Permission denied: '/mnt/marvin1'

I don't understand where this error is coming from or how to fix it. Do I have the correct .yaml file, or is there a different one I am supposed to use? I've also tried uninstalling and reinstalling abacusutils, but that doesn't seem to help. As always, if more info is needed to address this issue, I will be happy to provide it.

Best,
Weston

Fix numba parallel segfault in `compute_Menv`

In #45, we found that using Numba parallel was causing a segfault in compute_Menv and calc_fenv in prepare_sim. It doesn't appear to be a problem with the functions themselves, but rather something at the library level. Probably related to using Numba parallel inside multiprocessing (even though we are using the nominally fork-safe workqueue backend).

Debugging this is probably a heavy lift, so for now we've just disabled Numba parallel for those two functions.

Add changelog checker

We should add a changelog bot/GitHub Action that gives us a green checkmark when we add a changelog entry with the right PR number.

Multi-threaded file IO for loading halo catalogs

On network file systems, we can probably get better IO performance in CompaSOHaloCatalog by reading multiple files in parallel. Hopefully the time-intensive parts of the IO release the GIL, so we could just spin up Python-level threads and use a queue to pass the loaded arrays back to the main thread.

Note that each IO thread will additionally spin off 1 to 4 Blosc threads to do the decompression.
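
A minimal sketch of that pattern using a thread pool instead of an explicit queue (the column names are illustrative, and it assumes asdf/Blosc release the GIL during reads):

from concurrent.futures import ThreadPoolExecutor

import asdf

def _read_one(fn):
    # If decompression releases the GIL, several of these can run concurrently
    with asdf.open(fn, lazy_load=True) as af:
        return {col: af['data'][col][:] for col in ('N', 'x_com')}

def read_halo_info_parallel(fns, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_read_one, fns))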

Memory requirement for prepare_sim

Hi all,

I was wondering if any of you have successfully generated light cone subsamples with prepare_sim on Perlmutter for z >= 0.8. The lower redshifts work fine, but for z = 0.8 the memory requirement hits the ~500 GB limit of a Perlmutter CPU node and the code chokes at this step:

compiling compaso halo catalogs into subsampled catalogs
processing slab  0
loading halo catalog 
total number of halos,  40906121 keeping  9132292
masked randoms =  12.467233937923373
Building and querying trees for mass env calculation

My configuration file uses

prepare_sim:
    Nparallel_load: 1

(but I don't think this helps since we are processing a single slab anyway).

Is there any workaround for this? I guess one could try decreasing the number of randoms for the environment calculation, but this is already low relative to the number of haloes, so I don't know how safe that would be...

Cheers,
Enrique

Missing key in halo_data.

I'm trying to run AbacusHOD on the new "DESI2" simulation that Lehman has generated (Abacus_DESI2_c000_ph300). During construction of the AbacusHOD object it throws a missing key error:

newBall = AbacusHOD(sim_params, HOD_params, clustering_params)
  File "/global/homes/m/mwhite/.conda/envs/abacus/lib/python3.9/site-packages/abacusnbody/hod/abacus_hod.py", line 153, in __init__
    np.vstack((np.log10(self.halo_data['hmass']), self.halo_data['hdeltac'], self.halo_data['hfenv'])).T,
KeyError: 'hdeltac'

where

sim_params: {'sim_name': 'Abacus_DESI2_c000_ph300', 'sim_dir': '/global/cfs/cdirs/desi/cosmosim/Abacus/', 'output_dir': './', 'subsample_dir': './', 'z_mock': 2.5, 'cleaned_halos': True}

Editing logM_cut in abacus_hod.yaml

Hi All,

I am trying to run the HOD module with an edited abacus_hod.yaml file, specifically with an edited LRG logM_cut value. So far I've tried directly editing the logM_cut value in abacus_hod.yaml, but when I generate correlation functions from the output .hdf5 files, the plot is identical to the one made with the default logM_cut value. Are there more files I need to edit to generate modified .hdf5 files, or is this an issue with my correlation function pipeline?

Some issues from NERSC

A few issues and suggestions from my tests of AbacusHOD at NERSC. This was with v0.4.0, so possibly slightly out of date.

  • Nthread_load of 7 runs out of memory on Haswell nodes (128G). 4 works (maybe 5 or 6 would, too). Should lower the default; most users will be at NERSC.
  • The recommended value of mem/20 in the docs may not be conservative enough
  • Maybe put Nthread_load in a different YAML section since it only applies to the prep stage? Like prepare_sim: nthreads: 4?
  • Maybe nparallel instead of nthreads, trying to emphasize that it's not just threads but also memory that will go up?
  • The distinction between subsample_dir and scratch_dir is not super clear. Maybe rename them to prep_dir and output_dir, or something like that? Or prep_dir could live under prepare_sim: output_dir: '...', nthreads: ...
  • Is prepare_sim a clear enough name? Maybe preprocess_sim or sim_staging?
  • There's a lot of output that flies by when running these scripts; it's hard for an end user to tell if all is well. Maybe show a limited amount of progress information to the user, and hide the rest behind a --verbose flag?
  • Do we need a config directory if it's just going to hold one file?
  • prepare_sims.py config_file would be more idiomatic than prepare_sims.py --path2config config_file, just because config_file is the primary argument. It can still be optional.
  • Can remove the >>> in front of the scripts in the ReadTheDocs, it makes it hard to copy-paste. It's usually only needed to interleave code and output.

Some functions in ZCV broken after power spectrum code changes

line 1145, in apply_zcv_xi
    r_binc, binned_poles_zcv, Npoles = pk_to_xi(asdf.open(power_cv_tr_fn)['data']['P_k3D_tr_tr_zcv'], self.lbox, r_bins, poles=config['power_params']['poles'])
  File "/global/homes/m/mwhite/.conda/envs/abacus/lib/python3.9/site-packages/abacusnbody/analysis/power_spectrum.py", line 595, in pk_to_xi
    _, _, binned_poles, Npoles, r_avg = bin_kmu(nmesh, Lbox, r_bins, muedges=muedges, weights=Xi, poles=poles, space='real')
TypeError: some keyword arguments unexpected

adopt superslab terminology

We want to refer to the CompaSO files as "superslabs", describing the concatenation of multiple slabs. Right now the code uses "chunk" in a few places; let's change that to "superslab" to be consistent with our papers.

Metadata in mock catalog files?

From #28 (comment):

One small thing I noticed while looking at the tests is that the ELG.dat and LRG.dat files don't flag that they came from halo light cones. In fact, they don't contain any simulation or cosmology information, just the HOD parameter information (Acen, Asat, etc). I've kind of lost track of what file format we're using to distribute AbacusSummit mocks, but we might want to check that we are echoing the simulation/cosmology information to those files. Maybe we're already doing that though, just not in the tests; @SandyYuan, do you know?

Improvements to the power spectrum and zcv module

The power spectrum code can be optimized to run faster. While we do not want to enable MPI on it for now, it is worth implementing multithreading/parallelization improvements. Several quick wins include (but we need not limit ourselves to these):

  • Swapping scipy.fft for pyfftw, which is multithreaded.
  • Painting the TSC/CIC grid can be done using multithreading straightforwardly.
  • For more robust performance (i.e. avoiding numerical roundoff errors), we can do the power spectrum binning in integer wavemode indices rather than floats (see the sketch after this list).
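
A minimal sketch of the integer-binning idea (a standalone illustration, not the module's actual code): every wavevector on an nmesh^3 grid is (2*pi/Lbox) times an integer triple (nx, ny, nz), so modes can be grouped exactly by the integer nx^2 + ny^2 + nz^2, with no floating-point edge ambiguity.

import numpy as np

def integer_n2(nmesh):
    # Integer mode indices along each axis, in FFT layout (rfft on the last axis)
    n = np.fft.fftfreq(nmesh, d=1.0 / nmesh).astype(np.int64)
    nz = np.arange(nmesh // 2 + 1, dtype=np.int64)
    # Exact squared wavenumber index; physically, k = 2*pi*sqrt(n2)/Lbox
    return n[:, None, None]**2 + n[None, :, None]**2 + nz[None, None, :]**2

# e.g., count the modes falling in each exact n^2 shell
counts = np.bincount(integer_n2(32).ravel())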

On the side of the ZCV module:

  • Currently the zenbu calls are very slow, so there is room to improve those.
  • Also, it would be great to rewrite the parts that use Joe DeRose's code and add comments to them (i.e. tools_jdr.py)

Finally, it would be helpful to add notebooks with examples of how to use the power spectrum and the ZCV module.

Factorize dependencies

From cosmodesi/cosmodesiconda#1: if abacusutils is being folded into the DESI software stack, we ought to think about our dependencies and make sure they're all required. Furthermore, we might separate them into multiple categories: the base dependencies one needs to import the code, and "extras" needed to run the examples/tests/scripts.

Trouble importing abacusnbody

Hi there,

I am trying to import abacusnbody into jupyter-notebook but am having troubles. When I run the command:

from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

I get an error message saying:

No module named 'abacusnbody.version'

In the error message I see it's trying to go to the abacusnbody directory where __init__.py is (which I've checked I have), yet the import still fails. I am very much not tech savvy, so this may be an easy fix. If so, I apologize for the waste of time but will still appreciate the help.

Best,
Weston
