davidt3 / daxa

Democratising Archival X-ray Astronomy (DAXA) is an easy-to-use Python module for downloading multi-mission X-ray telescope data and processing it into usable archives. Users can acquire entire archives, or filter observations based on ID/positions/time. Supports XMM; partial support for eROSITA, Chandra, NuSTAR, Swift, Suzaku, ASCA, ROSAT, and INTEGRAL.

License: BSD 3-Clause "New" or "Revised" License

Python 98.72% TeX 1.28%
astronomy astrophysics python x-ray-astronomy xmm chandra erosita xga archival-astronomy nustar

daxa's Introduction

Documentation Status

What is Democratising Archival X-ray Astronomy (DAXA)?

DAXA is a Python module designed to make the acquisition and processing of archives of X-ray astronomy data as painless as possible. It provides a consistent interface to the downloading and cleaning processes of each telescope, allowing the user to easily create multi-mission X-ray archives and helping the community make better use of archival X-ray data. This process can be as simple or as in-depth as the user requires; if the default settings are used, data can be acquired and processed into an archive in only a few lines of code.

As the missions (i.e. telescopes) that should be included in the archive are defined, the user can filter the desired observations based on a unique identifier (i.e. observation ID), on whether observations are near a coordinate (or set of coordinates), and on the time frame in which the observations were taken. As such, it is possible to very quickly identify what archival data might be available for a set of objects you wish to study. It is also possible to place no filters on the desired observations, and so process every observation available for a set of missions.
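As an illustration, here is a minimal sketch of that workflow; the class and method names below follow the DAXA documentation, but treat the exact signatures as indicative rather than definitive:

from astropy.coordinates import SkyCoord
from daxa.mission import XMMPointed
from daxa.archive import Archive

# Declare the mission(s) whose data should be included in the archive
xmm = XMMPointed()

# Keep only observations near the coordinates of interest
xmm.filter_on_positions(SkyCoord([149.59, 150.23], [7.00, 7.65], unit='deg'))

# Build an archive from whatever observations survived the filtering
arch = Archive('my_first_archive', [xmm])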

Documentation is available on ReadTheDocs, and can be found here, or accessed by clicking on the documentation build status at the top of the README. The source for the documentation can be found in the 'docs' directory in this repository.

Installing DAXA

We strongly recommend that you make use of Python virtual environments, or (even better) Conda/Mamba virtual environments when installing DAXA.

DAXA is available on the popular Python Package Index (PyPI), and can be installed like this:

pip install daxa

You can also fetch the current working version from the git repository, and install it (this method has replaced 'python setup.py install'):

git clone https://github.com/DavidT3/DAXA
cd DAXA
python -m pip install .

Alternatively, you could use the 'editable' option (this has replaced running setup.py and passing 'develop'), so that any changes you pull from the remote repository are reflected without having to reinstall DAXA.

git clone https://github.com/DavidT3/DAXA
cd DAXA
python -m pip install --editable .

Which missions are supported?

DAXA is still in a relatively early stage of development, and as such the support for local re-processing is limited; however, support for the acquisition and use of pre-processed data is implemented for a wide selection of telescopes:

  • XMM-Newton Pointed
  • eROSITA Commissioning
  • eROSITA All-Sky Survey DR1 (German Half)
  • [Under Development - data acquisition implemented] NuSTAR
  • [Under Development - data acquisition implemented] Chandra
  • [Under Development - RASS/pointed data acquisition implemented] ROSAT
  • [Under Development - XRT/BAT/UVOT data acquisition implemented] Swift
  • [Under Development - data acquisition implemented] Suzaku
  • [Under Development - data acquisition implemented] ASCA
  • [Under Development - data acquisition implemented] INTEGRAL

If you would like to help with any of the telescopes above, or adding another X-ray telescope, please get in contact!

Required telescope-specific software

DAXA makes significant use of existing processing software released by the telescope teams, and as such there are some specific non-Python dependencies that need to be installed if a given mission is to be included in a DAXA-generated archive.

An alternative to installing the dependencies yourself

[Under Development] - A Docker image containing the relevant telescope-specific software is being created. The built image will be released on DockerHub (or some other convenient platform), and the dockerfile used for building the image will also be released for anyone to use/modify. The dockerfile is heavily inspired by/based on the HEASoft Docker image.

XMM-Newton

Science Analysis System (SAS) - v14 or higher

Analysing the processed archives

Once an archive of cleaned X-ray data has been created, it can be analysed in all the standard ways; however, you may also wish to consider X-ray: Generate and Analyse (XGA), a companion module to DAXA.

XGA is also completely open source, and is a generalised tool for the analysis of X-ray emission from astrophysical sources. The software operates on a 'source based' paradigm, where the user declares sources or samples of objects which are analogous to astrophysical sources in the sky, with XGA determining which data (if any) are relevant to a particular source, and providing a powerful (but easy to use) interface for the generation and analysis of data products. The module is fully documented, with tutorials and API documentation available (support for telescopes other than XMM is still under development).

Problems and Questions

If you encounter a bug, or would like to make a feature request, please use the GitHub issues page; it really helps to keep track of everything.

However, if you have further questions, or just want to make doubly sure I notice the issue, feel free to send me an email at [email protected]

daxa's People

Contributors

davidt3, dependabot[bot], guptaagr, jessicapilling, tobywallage


daxa's Issues

Should design a DAXA-specific cleaning process at some point

This would be mission-agnostic, and ideally support any of the telescopes which DAXA ends up being able to reduce data for. This would be an alternative to the mission-specific methods I am implementing first (i.e. the SAS cleaning methods for XMM).

Add the basic structure to the documentation

Set up the general structure, with an installation guide, intro section, contact section etc.

Don't need to make it perfect for this issue, or write any tutorials, but sketch out the framework.

Set up convenience functions to easily create Archives in particular circumstances

I.e. the simplest could be 'process all available observations from XMM', or 'process all available observations from XMM, Chandra, and eROSITA' (once support for other telescopes is added).

That would provide an archive instance which could be passed into processing functions, both the telescope-specific processing that is generally provided by a particular telescope's software suite, and the planned mission-agnostic processing that I will eventually add to DAXA (issue #17).

Other examples of convenience functions like this could be ones that would assemble an archive from multiple telescopes for observations relevant to some particular sources.
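A minimal sketch of the simplest such helper, where the function name and body are hypothetical rather than an existing DAXA API:

from daxa.mission import XMMPointed
from daxa.archive import Archive

def process_all_xmm(archive_name):
    # Hypothetical convenience function - declare the mission with no
    # filters at all, so every available XMM observation is included
    xmm = XMMPointed()
    return Archive(archive_name, [xmm])

The returned Archive instance could then be handed straight to the processing functions discussed above.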

Add combined sky-coverage calculation capabilities

This should be able both to assess how much of the sky is covered by a particular set of data, and to produce coverage maps which can be stored alongside the processed datasets to allow for the identification of data relevant to a particular source.
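A hedged sketch of the coverage-fraction half of this, approximating each observation's field of view as a circle on a HEALPix grid (healpy is an assumed dependency here, and the pointing list and FoV radius are illustrative):

import numpy as np
import healpy as hp

def sky_coverage_fraction(ra_deg, dec_deg, fov_radius_deg, nside=1024):
    # Collect the set of HEALPix pixels that fall inside any pointing's FoV
    covered = set()
    for ra, dec in zip(ra_deg, dec_deg):
        vec = hp.ang2vec(ra, dec, lonlat=True)
        covered.update(hp.query_disc(nside, vec, np.radians(fov_radius_deg)))
    # Fraction of the whole sky covered by at least one observation
    return len(covered) / hp.nside2npix(nside)

# e.g. two overlapping XMM-like pointings with a ~15 arcmin radius FoV
print(sky_coverage_fraction([150.0, 150.2], [7.0, 7.1], 0.25))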

PN non-imaging mode sub-exposures

As mentioned in issue #34, DAXA cannot currently parse XMM ODF summary files. As such it is difficult to efficiently identify which exposures are in which observing mode, and other details about them.

Currently only PN imaging mode data will be processed by DAXA (though hopefully that will change at some point), but as issue #34 is not implemented, and I don't want to be reading headers for thousands of FITS files if I can avoid it, epchain will currently attempt to process every sub-exposure as an imaging mode observation.

This will cause an error for other data modes (like timing, for instance).

At the moment I am just going to let them fail, and then catch them further down the line, rather than identifying them a priori and never running the commands in the first place.

XMM data weirdness

This isn't really a question to me, but just XMM in general.

On XSA the observation 0001730401 shows only RGS data being available - the quality report just indicates RGS as well, no EPIC data.

However, when I acquire the ODFs I find unscheduled PN observations and scheduled MOS observations - so what gives??

This is being left here mostly as a reminder to myself to try and solve this mystery.

The documentation is not building on RTD

I will add more information as I explore the issue, but every build of the DAXA documentation on Read the Docs has failed thus far.

I think it's a dependency versioning problem.

Add anomalous CCD state checking for MOS

I am basically following the eSAS guide at this point, but checking for CCDs in anomalous states is going to be a good idea.

This should enable filtering based on what states the user considers acceptable as well.

The choice of acceptable states should of course be recorded for the archive.

Implement a wrapper for the eSAS espfilter soft-proton filter function

Again following the example of the XMM eSAS manual, I will be using espfilter to find bad time intervals with high levels of soft proton flaring, courtesy of the Sun.

In the currently released version of eSAS there is a script called PN-FILTER (and an equivalent MOS implementation) that calls espfilter, but the upcoming version of eSAS (per the unreleased manual I found) has removed it, so that eSAS adds functions rather than processing scripts to SAS.

I will be attempting to make DAXA compatible with as many versions of SAS/eSAS as possible (whilst remaining consistent) by not using PN-FILTER, and instead making an espfilter function for DAXA that supports both PN and MOS.

Downloading specific instruments for XMM currently downloads then deletes irrelevant data

I intended the downloading of specific instruments to minimise disk/bandwidth usage by not downloading data that a user considers irrelevant to their use case, or that can't (yet) be processed by DAXA. Unfortunately for XMM, downloading ODFs (observation data files) via the AIO URLs (and thus the AstroQuery interface) for specific instruments is currently impossible; regardless of the specified instrument, all instrument ODFs are downloaded.

This is happening on the XSA end, and I've sent in a ticket asking if this is intended behaviour, but whatever the answer ends up being, I have to deal with it for the time being. As such, the XMMPointed class download behaviour will acquire all instrument data for a given observation.

Then (assuming that this doesn't break any pre-built data processing tasks downstream) it will delete those ODF files which relate to instruments that have NOT been selected by the user.
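A rough sketch of that clean-up step; the assumption that the instrument codes (PN, M1, M2, R1, R2, OM) appear verbatim in ODF file names should be verified before relying on this:

import os
from glob import glob

def remove_unselected_odfs(odf_dir, chosen_insts):
    # Assumed naming convention: each ODF file name contains its instrument code
    all_insts = {'PN', 'M1', 'M2', 'R1', 'R2', 'OM'}
    for inst in all_insts - set(chosen_insts):
        for file_path in glob(os.path.join(odf_dir, '*{}*'.format(inst))):
            os.remove(file_path)

remove_unselected_odfs('raw_data/0099280101/odf', chosen_insts={'PN'})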

Add previous-process awareness to XMM processing tasks

Need to ensure that things are run in the right order - for instance cif_build must be run before everything else, odf_ingest must be run before basically everything else, etc.

Currently I just rely on the user doing that, but that won't be a permanent state of affairs - I'll make use of the process_success property of Archive to check whether dependencies have been run, and whether they were successful.
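A minimal sketch of the sort of guard I have in mind, assuming (and this structure is an assumption) that process_success maps process names to per-ObsID success flags:

def check_dependencies(archive, required=('cif_build', 'odf_ingest')):
    # Hypothetical guard - refuse to continue if a prerequisite process
    # either hasn't been run at all, or failed for every observation
    for proc in required:
        success = archive.process_success.get(proc)
        if success is None:
            raise ValueError("{} has not been run yet".format(proc))
        if not any(success.values()):
            raise ValueError("{} failed for every observation".format(proc))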

Ensure that a new CCF is created if a different analysis date is used

In the case where CCFs already exist but cif_build is run again with a different analysis date set, make sure they are overwritten. The date information should be stored somewhere as well.

This will be integrated into the backend database I suspect, in some way that I have yet to figure out.

If a CCF is re-created, then presumably reduction should be re-run to be completely valid?
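One possible way to track the date, sketched with a hypothetical JSON sidecar file next to the CCFs (this is not how DAXA currently stores anything):

import os
import json

def ccf_needs_rebuild(ccf_dir, analysis_date):
    # Hypothetical sidecar recording the analysis date the CCF was built with
    record = os.path.join(ccf_dir, 'ccf_analysis_date.json')
    if not os.path.exists(record):
        return True
    with open(record) as rec_file:
        return json.load(rec_file).get('analysis_date') != analysis_date

If this returns True, cif_build should be re-run (and, presumably, everything downstream of it).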

cleaned_evt_lists fails for 0099280101 because of emanom and calclosed

Currently no checks are performed at any stage to identify the filter value of a particular sub-exposure of an observation, and as such everything is blindly thrown into emanom (if the user chooses to run it). This method will fail for any CalClosed filter data, which then carries through to cleaned_evt_lists, because DAXA tries to create cleaned versions of those event lists as well and expects there to be an emanom log file, even though CalClosed data are not useful observation data for us.
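A hedged sketch of an a priori check that could avoid this, reading the FILTER keyword from each event list header with astropy; the keyword name and the 'CalClosed' value follow this issue's description and standard XMM headers, but should be verified against real files:

from astropy.io import fits

def is_cal_closed(evt_path):
    # Read the FILTER keyword from the primary header of the event list;
    # CalClosed sub-exposures should be skipped rather than fed to emanom
    return fits.getval(evt_path, 'FILTER', ext=0).strip().lower() == 'calclosed'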

Add an Archive class

Instances of the Archive class will be capable of storing and accessing multiple missions, and will probably be the most user-facing class of this module. They will contain a bunch of convenience methods, and probably the planned sky coverage generation capabilities (issue #18).

Start work on the DAXA paper for JOSS

It won't need to be very long, and some liberties can be taken in terms of writing about features that don't exist yet, because this won't be going on arXiv until they do exist.

SAS v21's (upcoming not released) eSAS implementation is quite different from previous versions

I am currently implementing an eSAS-based XMM processing method, and have accidentally found the eSAS v21 manual indexed on Google. It indicates that many of the eSAS tools have had their inputs changed considerably to better resemble normal SAS functions (i.e. they'll take arguments to point to specific event lists etc.).

This is obviously great, more control is better, but it does mean that there will be a significant difference in behaviour. As I do not want to lock people into one specific version of SAS if it can be avoided (especially considering that version isn't even out yet) I will have to build two different approaches (though within the same Python function) for SAS v21 and any lower SAS version (though I don't think I will allow any SAS version below v14).

Hopefully this won't be too difficult; considering I already identify the installed SAS version in the find_sas function, it'll just be some extra work.

XMM scheduled and unscheduled observations

I want to ensure that any unscheduled (with U in their exposure identifier rather than S) PN observations are processed by epchain, but it's not clear to me whether that is true by default.

You can set the 'schedule' flag in epchain to S or U, but it only triggers if odfaccess=odf rather than oal, which is not explained...

To be honest, exactly what an 'unscheduled' observation is isn't really explained either.

Failed to find or open the following file: (ffopen) toto.in.mos[1]

This is happening during emchain runs, and at another point in the stderr output there is 'sh: lcurve: command not found'.

I suspect they might be connected.

lcurve is part of the xronos section of HEASoft, which I may not have selected for my laptop install of HEASoft. This could help me learn which parts of HEASoft are actually required for SAS to work in its entirety.

My ICER install of HEASoft is the whole thing, so I can test running emchain on there to see if the same problem pops up.

I should normalise how DAXA calls emchain and epchain as much as possible

Currently emchain will loop through all available sub-exposures, including unscheduled observations, without any extra intervention. As such the processing of an entire ObsID-MOSX set of data happens as one process.

As epchain has to have the sub-exposures manually specified, each sub-exposure of each observation is processed separately. As such it gets its own success/log/error entry in the Archive records - considerably more granular.

I think I should change emchain's behaviour in DAXA so it is more comparable to how epchain behaves. I can address separate sub-exposures by themselves in emchain (using the exposure argument) - this will also make it easier to check that a particular process for a particular sub-exposure did work when it comes to looking for anomalous CCD states in MOS observations.
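Something like the per-sub-exposure invocation below is what I have in mind; the emchain argument names here follow my reading of the SAS documentation, so double-check them (and note that the usual SAS environment, e.g. SAS_ODF/SAS_CCF, is assumed to already be set up):

from subprocess import run

def single_exp_emchain(instrument, exp_id):
    # Process one MOS sub-exposure at a time, mirroring how epchain
    # is already called once per sub-exposure
    cmd = "emchain instruments={i} exposures={e}".format(i=instrument, e=exp_id)
    run(cmd, shell=True, check=True)

single_exp_emchain('M1', 'S001')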

NOTE ON ExceptionGroup IN JUPYTER NOTEBOOK

Very limited parts of DAXA use a new Python feature (introduced in 3.11, and backported by the exceptiongroup module) that allows me to raise a set of exceptions together.

Specifically, this is used when Python errors occur during the parallel tasks that run command-line SAS tools (and possibly other telescope-specific command-line tools in the future) - to be clear, Python errors shouldn't happen in those parallelised tasks, but if they do, an ExceptionGroup is used.

It seems that, at the moment (this is true on my setup on the date this issue was created), Jupyter notebooks do not show the tracebacks properly for ExceptionGroup. For instance, in a notebook a test-raised ExceptionGroup gives this traceback:

ExceptionGroup: pythony errors (3 sub-exceptions)

Whereas in a script run from terminal this is what you get (and should get):

  + Exception Group Traceback (most recent call last):
    | File "/Users/dt237/code/test_daxa/testo.py", line 12, in
    | success, errors, outs = cif_build(arch)
    | ^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 209, in wrapper
    | raise ExceptionGroup("pythony errors", python_errors)
    | ExceptionGroup: pythony errors (3 sub-exceptions)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/opt/anaconda3/envs/daxa_dev/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    | result = (True, func(*args, **kwds))
    | ^^^^^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 89, in execute_cmd
    | print(boi)
    | ^^^
    | NameError: name 'boi' is not defined
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    | File "/opt/anaconda3/envs/daxa_dev/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    | result = (True, func(*args, **kwds))
    | ^^^^^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 89, in execute_cmd
    | print(boi)
    | ^^^
    | NameError: name 'boi' is not defined
    +---------------- 3 ----------------
    | Traceback (most recent call last):
    | File "/opt/anaconda3/envs/daxa_dev/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    | result = (True, func(*args, **kwds))
    | ^^^^^^^^^^^^^^^^^^^
    | File "/Users/dt237/code/DAXA/daxa/process/xmm/_common.py", line 89, in execute_cmd
    | print(boi)
    | ^^^
    | NameError: name 'boi' is not defined
    +------------------------------------

So just be aware of that!
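For anyone wanting to reproduce this, here is a self-contained snippet that raises a comparable ExceptionGroup; running it in a notebook and then as a terminal script should show the difference:

import sys

# On Python < 3.11 ExceptionGroup is not built in, so use the backport
if sys.version_info < (3, 11):
    from exceptiongroup import ExceptionGroup

errs = [NameError("name 'boi' is not defined") for _ in range(3)]
raise ExceptionGroup("pythony errors", errs)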

Small-window mode PN processing errors

When running epchain on small-window mode data without some extra configuration, it will throw errors when it finds that most of the CCD IME files are missing (small-window mode just uses one CCD). These errors aren't fatal to the epchain process, but they do contaminate the stderr output, which DAXA parses to try to find any truly fatal errors.

As such we should identify which CCDs are available a priori and pass that list to the ccds parameter of epchain. Ideally this will eventually be done by parsing the SAS summary file (issue #34), but for now I think I can just search through files in the ODF directory.
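As a stop-gap, something like the sketch below could work; the assumed file-name pattern (instrument code, schedule flag, exposure number, then a two-digit CCD number before 'IME') should be checked against real ODFs before trusting it:

import os
import re

def find_pn_ccds(odf_dir, exp_id):
    # Assumed pattern: e.g. ..._PNS00304IME.FIT, where '04' is the CCD number
    patt = re.compile(r'PN{}(\d{{2}})IME\.FIT'.format(exp_id))
    ccds = set()
    for file_name in os.listdir(odf_dir):
        match = patt.search(file_name)
        if match:
            ccds.add(int(match.group(1)))
    return sorted(ccds)

# The resulting list could then be passed to epchain's ccds parameter
print(find_pn_ccds('raw_data/0099280101/odf', 'S003'))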

Support the acquisition and reduction of proprietary data

What it says on the tin really. For XMM for instance you need to provide a login and password, and I'll also have to make sure that the proprietary data belonging to a particular user are marked as usable in the fetch_obs_info method, as currently all proprietary observations are marked as unusable.

Process logging storage keys

Currently the logs, errors, processed errors, and warnings are stored in archives under either an ObsID or an ObsID+instrument+sub exposure ID combo.

This is somewhat at odds with what the docstrings in the Archive class say, as they state either an ObsID or an ObsID+instrument key combo.

I should consider having lower level instrument and then sub-exposure dictionaries to store the results/logs in, rather than ObsID+instrument+exposure ID. I intend to implement some sort of lookup method that can grab all results for an ObsID, or a specific ObsID instrument combo, and that would probably be easier with more distinct layers of dictionaries.
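The nested layout I am considering would look something like the illustrative (not currently implemented) structure below:

# Hypothetical nested storage: ObsID -> instrument -> sub-exposure -> log
process_logs = {
    '0099280101': {
        'PN': {'S003': 'epchain output for this sub-exposure...'},
        'M1': {'S001': 'emchain output for this sub-exposure...'},
    },
}

# A lookup method could then grab everything for an ObsID, or narrow down
all_for_obs = process_logs['0099280101']
just_pn = process_logs['0099280101']['PN']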
