LST AGN Analysis

This project tries to handle all the steps needed for a typical AGN analysis (or, really, for any point source):

  • Select runs observing the source and matching some quality criteria
  • Apply models to dl1 files / produce dl2
  • Calculate IRFs
  • Produce dl3
  • Run the gammapy data reduction to dl4
  • Perform the spectral fit, calculate flux points, calculate the light curve, ...
  • Perform multiple gammapy analyses with the same dl3

What it does NOT handle:

  • Low-level calibrations. We start at DL1
  • Non-standard MC. We use the standard all-sky MC
  • Perform a 3D analysis
  • Perform any "non-standard" high-level analysis (pulsars etc)

Conceptually there are three main steps (you could define more, but these are the rough splits in the workflow):

  • Select runs via the datacheck files
  • Link runs and MC to the build directory. This makes rules easier to reason about. These are only links, not copies (!) and show up as such with ls -l; see the sketch after this list.
  • Run the analysis using lstchain and gammapy
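
As a rough illustration of the linking step, here is a minimal pathlib sketch (the helper name and paths are illustrative, not the actual workflow code):

from pathlib import Path

def link_run(dl1_file: Path, build_dir: Path) -> None:
    """Expose a selected run under build/ as a symlink, not a copy."""
    target = build_dir / "dl1" / dl1_file.name
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        target.unlink()  # replace stale or outdated links
    target.symlink_to(dl1_file.resolve())  # shows up as a link via `ls -l`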

Usage

Prerequisites

You need configuration files. Check our example repository for how to use them, and clone it next to this repo, e.g.:

git clone https://github.com/nbiederbeck/lst-analysis-config.git ../lst-analysis-config

You need snakemake and astropy installed. If you do not have them, you can create an environment with only these packages like this:

mamba env create -f workflow/envs/snakemake.yml
mamba activate snakemake

For development, you should also install at least pre-commit. The environment in workflow/envs/data-selection should contain everything (don't forget to run pre-commit install afterwards). This is not a dedicated development environment, though. Cleaning up the envs is definitely a DEV-TODO, as things got a bit messy after merging what were originally two projects.

You also need the source catalogue. Since this requires credentials (the standard LST ones), it is not downloaded automatically. This is needed only once (or whenever you want to use new runs). You can use this command (replace <username> and <password>):

curl --user <username>:<password> \
    https://lst1.iac.es/datacheck/lstosa/LST_source_catalog.html \
    -o runlist.html

Config

Adapt these to your liking:

  • Data selection: ../lst-analysis-config/data-selection.json
  • MCs to use, lstchain env and number of off regions: ../lst-analysis-config/lst_agn.json (gammapy does not handle energy-dependent cuts automatically, so we need to work around this)
  • IRFs (lstchain): ../lst-analysis-config/irf_tool_config.json
  • gammapy: analysis.yaml and models.yaml in subdirectories ../lst-analysis-config/analysis_*. These all use the same dl3, but produce their own dl4

If you want to use configs from a different place, call make as make CONFIG_DIR=/path/to/configs instead.

Run the analysis

Basically, just run make. If you run it without a specific target, everything should be resolved. If you give a target such as make build/plots/analysis-one/flux_points.pdf, the linking part might not run again. This is clearly suboptimal; for now, you need to run make link first IF YOU CHANGED SOMETHING in the data selection. Since in that case one probably wants to rerun everything anyway, this was not high on our priority list. It is related to #26

Local Usage

If you have run snakemake on the cluster, you can create the plots and tex files locally (using your own matplotlibrc, for example). We separate the calculation of metrics from the plotting to make sure you can fine-tune plots later on without needing to run the expensive steps on your local machine. The tables for this are saved as either fits.gz or h5.
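
Such a table can then be read locally with astropy, e.g. (the path is illustrative):

from astropy.table import Table

# astropy infers the format from the file extension
table = Table.read("build/tables/flux_points.fits.gz")
print(table.colnames)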

For the data-selection plots, you need to download build/dl1-datachecks-masked.h5, e.g.:

mkdir -p build
scp <host>:<path-to>/lst-data-selection/build/dl1-datachecks-masked.h5 build/

Afterwards:

make -f local.mk

DEV-TODO: Do the same for the other plots (#29). If you do some cp **/*.fits.gz shenanigans, beware that the dl3 files are saved with the extension fits.gz as well.

Issues

Run multiple sources at once / in one repository

Do we want another level in the configuration, so that the analysis runs all configured sources at once?
Also, we should move the existing configs so that users can add their own without removing the existing ones and running into git conflicts on rebases.

Select declination line

It always uses the Mrk declination line right now.
This should probably just be set in the config; it is unclear how one would select it automatically.

Think about missing features

Once I have this running for M87, I will make a list of things that I had in my own analysis and that are not implemented here.

IRFs are linked *and* calculated

Both are pretty quick, but maybe it's worth skipping the linking. The workflow no longer uses the linked ones, so they are just noise polluting the filesystem.

Remove Makefile

I would like to have a solution that is completely independent of the Makefile (except as an entry point for snakemake).

I don't know whether this is feasible.

I think I'm ok with telling the user to issue two commands, e.g.:

snakemake --snakefile workflow/data-selection.smk
snakemake --snakefile workflow/analysis.smk

Make unlinking more robust

There is a small gotcha with symlinks:
if the link points to the wrong destination, or the actual file was moved or deleted, pathlib will still recognize it as a symlink, but .exists() refers to the destination and returns False.

We check link.exists() and link.is_symlink() in multiple places before unlinking, which breaks in that case.

This is mostly annoying during development, when you occasionally screw up the path to the MCs.
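
A minimal sketch of a more robust check (the helper name is hypothetical):

from pathlib import Path

def safe_unlink(link: Path) -> None:
    # exists() follows the symlink, so it returns False for a dangling link;
    # checking is_symlink() first catches that case as well.
    if link.is_symlink() or link.exists():
        link.unlink()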

DL2 OOM issues

The lstchain dl2 script uses enormous amounts of memory because it loads all models and data into RAM at once. There is nothing we can really do about that here. We had OOM issues even with 64G due to the larger models on the Crab declination line, so we probably need to increase the request even further.
This might slow things down on the Slurm side, but it is really a (known) lstchain problem.
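
If we raise the request, one place to do it would be per-rule resources in the Snakefile; a sketch (the value and the command are placeholders, not the actual rule):

rule dl2:
    resources:
        mem_mb=96_000  # 64G already hit OOM on the Crab declination line
    shell:
        "lstchain_dl1_to_dl2 ..."  # actual inputs/outputs omitted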

Rules broken

MissingInputException in rule dl2 in line 277 of /fefs/aswg/workspace/lukas.nickel/lst-agn-analysis/workflow/Snakefile:
Missing input files for rule dl2:
    output: build/dl2/dl2_LST-1.Run03218.h5
    wildcards: run_id=03218
    affected files:
        build/dl1/dl1_LST-1.Run03218.h5
        build/models/model_Run03218/lstchain_config.json

There is one model hierarchy in here

Standardise some things / Clean up

  • run id formatting is annoying (:05d); see the sketch below
  • plotting should be in one snakefile in order to fix the rule orders and so on (we have a slightly different structure now)
  • configs, envs and other "shared variables" in the rules should be collected in one place
  • consistent logging and use of log files. We probably don't need explicit ones, since snakemake handles that? At least when using Slurm (?)
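
For the run-id point, the formatting could live in a single helper instead of repeating the format spec everywhere (a sketch; the helper name is made up):

def format_run_id(run_id: int) -> str:
    """Zero-pad run ids as used in the file names, e.g. 3218 -> '03218'."""
    return f"{run_id:05d}"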

Pydantic user error

Apparently we now use deprecated syntax in the data-selection config code, as I get this error:

pydantic.errors.PydanticUserError: If you use `@root_validator` with pre=False (the default) you MUST specify `skip_on_failure=True`. Note that `@root_validator` is deprecated and should be replaced with `@model_validator`.

We need to either pin the pydantic version or adapt to the new syntax.
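
A sketch of the migration to the pydantic v2 syntax (the model and field names are made up, not the actual config code):

from pydantic import BaseModel, model_validator

class DataSelectionConfig(BaseModel):
    min_zenith: float = 0.0
    max_zenith: float = 60.0

    # replaces the deprecated @root_validator
    @model_validator(mode="after")
    def check_zenith_range(self):
        if self.min_zenith >= self.max_zenith:
            raise ValueError("min_zenith must be below max_zenith")
        return self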

Stack datasets with identical IRFs

Regardless of the stack/no-stack discussion for analyses: can we already stack datasets that have identical IRFs? There is no information loss in that step.

For Mrk421 we have 164 runs and 12 different IRFs, and I assume that number stays the same even when using more runs.

Since we still do observation plots etc., we might introduce another output level, e.g. dl4/{runs,stacked_per_irf,stacked} or something better.
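
A sketch of how this could look with gammapy, assuming some function that maps a dataset to its IRF group (the grouping key is hypothetical):

from gammapy.datasets import Datasets

def stack_per_irf(datasets: Datasets, irf_key) -> Datasets:
    """Stack only datasets that share identical IRFs (no information loss)."""
    groups = {}
    for ds in datasets:
        groups.setdefault(irf_key(ds), []).append(ds)
    return Datasets([Datasets(group).stack_reduce() for group in groups.values()])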

Edit: If we do this, we need to keep in mind downstream in the pipeline that we stacked somewhere; e.g., Bayesian blocks should be run on the un-stacked datasets.

Edit 2: This is also not really useful after #65.

Handle non-detection

Right now some plot scripts fail if there is no flux.
This has always annoyed me with gammapy plots...
We should make sure there are non-NaNs in the data to avoid failing jobs.
Not a huge deal, as they are called at the end, but still...
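
A minimal guard before plotting could look like this (a sketch, not the actual scripts):

import numpy as np

def has_valid_flux(flux) -> bool:
    """True if at least one flux value is finite, i.e. there is something to plot."""
    return bool(np.isfinite(np.asarray(flux, dtype=float)).any())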

calculate cuts based on sigma intervals

From the other repo:

Currently we use the cuts that are configured. I want the option to use the configured sigma intervals instead.

I think the best way to do this is to use None or floats to disable/enable the cuts or intervals.
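
A sketch of that switch (the names and the interval definition are assumptions):

import numpy as np

def resolve_cut(fixed_cut, sigma, values):
    """Use the fixed cut if configured, otherwise derive one from a sigma interval."""
    if fixed_cut is not None:
        return fixed_cut
    if sigma is not None:
        return float(np.mean(values) + sigma * np.std(values))
    raise ValueError("either fixed_cut or sigma must be set")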

Split into "on cluster" and "local".

There are things that have to be done on the cluster where all the data lies.

Other things, e.g., plots, can then be created from reduced files. This makes it simpler to include in other projects.
