
pwv_kpno's Introduction



pwv_kpno is a Python package for modeling the atmospheric absorption due to H2O. For a full description of this package, including documentation and usage examples, please see https://mwvgroup.github.io/pwv_kpno/

pwv_kpno's People

Contributors

djperrefort


pwv_kpno's Issues

Convenience function to return PWV for array of MJD[, airmass]

Is your feature request related to a problem? Please describe.
I'd like to use pwv_kpno to match to a specific observation. I generally have observations in MJD.

Describe the solution you'd like
I'd like to be able to retrieve an array of PWV values from an array of MJDs for a given site. If an array of airmass values of the same size as MJD is given, then apply those airmass corrections.

Describe alternatives you've considered
This can currently be done by converting the list of MJDs to datetime objects.

from astropy.time import Time
times = Time(mjd_obs, format='mjd')
datetimes = times.to_datetime()

pwv_values = [pwv_atm.pwv_date(dt) for dt in datetimes]

But that's a little awkward and ends up with a list of tuples instead of a table. It would be nice to not have to convert externally. And it would be nice to be able to just pass the array instead of looping.
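A minimal sketch of what such a convenience wrapper could look like, using only the standard library for the MJD conversion. The names `mjd_to_datetime` and `pwv_for_mjds` are hypothetical, and `pwv_func` stands in for any callable that takes a datetime and an airmass and returns a `(pwv, pwv_err)` pair. Whether `pwv_atm.pwv_date` can be passed directly is an assumption that has not been checked against the package. The conversion treats MJD as a plain UTC day count and ignores leap seconds.

```python
from datetime import datetime, timedelta, timezone

# MJD 0 corresponds to 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a timezone-aware UTC datetime."""
    return MJD_EPOCH + timedelta(days=float(mjd))


def pwv_for_mjds(mjds, pwv_func, airmasses=None):
    """Return two lists (pwv, pwv_err), one entry per MJD.

    pwv_func is a hypothetical callable: (datetime, airmass) -> (pwv, pwv_err).
    If airmasses is omitted, an airmass of 1 is used for every date.
    """
    if airmasses is None:
        airmasses = [1.0] * len(mjds)
    elif len(airmasses) != len(mjds):
        raise ValueError("airmasses must have the same length as mjds")

    results = [pwv_func(mjd_to_datetime(m), x) for m, x in zip(mjds, airmasses)]
    pwv = [r[0] for r in results]
    pwv_err = [r[1] for r in results]
    return pwv, pwv_err
```

Returning two parallel lists rather than a list of tuples makes it straightforward to build a table from the result. The loop is still present internally, so this addresses the interface but not performance.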

Small typo in the docs

In Installation and Setup, section 1.2, there's an extra settings in line 8 of the example code:

 >>> from pwv_kpno.package_settings import settings
 >>>
 >>> sites_to_backup = settings.available_sites
 >>> general_path = './site_backups/{}.ecsv'
 >>> for site in sites_to_backup:
 >>>     settings.set_site(site)
 >>>     out_path = general_path.format(site)
 >>>     settings.settings.export_site_config(out_path)

which returns AttributeError: 'Settings' object has no attribute 'settings'. Just thought I'd let you know! :)

Compare atm models to technical and engineering night data

The modeled transmission function returned by transmission should be compared against results previously found using a standard star. This is different from a comparison via equivalent widths, but the differences between the results may be insightful.

Include secondary weather data when fitting for PWV

pwv_kpno currently uses a linear model to predict primary PWV values from secondary receivers, but does not consider any non-PWV data (e.g., surface pressure, RH, and temperature). We should explore how including this data can improve our predictions.

Action items:

  • Perform a PCA to understand what parameters from our data have the biggest impact
  • Try a few different fitting techniques / models (multivariate regression, SVR, etc.)
  • If feeling ambitious, explore some additional machine learning options
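As a starting point for the first action item, here is a rough sketch of a PCA computed with NumPy's SVD. The function name `pca_explained_variance` is hypothetical, and the pressure, temperature, and humidity columns are synthetic stand-ins generated on the spot, not real SuomiNet measurements. Columns are standardized first because the quantities have different units. A constant column would cause a division by zero and would need to be dropped beforehand.

```python
import numpy as np


def pca_explained_variance(data):
    """Return (explained_variance_ratio, components) for an
    (n_samples, n_features) array, after standardizing each column."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)

    # Rows of `components` are the principal axes, ordered by singular value.
    _, singular_values, components = np.linalg.svd(scaled, full_matrices=False)
    variance = singular_values ** 2 / (len(scaled) - 1)
    return variance / variance.sum(), components


# Synthetic stand-in data only (not real weather measurements).
rng = np.random.default_rng(0)
pressure = rng.normal(800.0, 5.0, 200)
temperature = rng.normal(15.0, 8.0, 200)
humidity = 50.0 - 1.5 * (temperature - 15.0) + rng.normal(0.0, 2.0, 200)

features = np.column_stack([pressure, temperature, humidity])
ratios, components = pca_explained_variance(features)
print(ratios)
```

The explained-variance ratios show how many independent directions the weather data really has. The component rows show which measured parameters contribute to each direction.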

Clarify PWV_los vs PWV_effective (change definition of PWV_los)

Describe the bug
The PWV returned by, e.g., pwv_date already has the airmass saturation applied. It should not.

In practice, I've just been calling pwv_date with no explicit airmass, so it's just an airmass of 1, and this hasn't been an issue. But reading the code just now at

https://github.com/mwvgroup/pwv_kpno/blob/master/pwv_kpno/pwv_atm.py#L194

I see that it is applying the Wade+Horne correction for effective transparency.

But the line-of-sight PWV should just be mm of water vapor.

The fact that the transparency effect of this scales as airmass**0.6 because of the saturating lines is a separate issue. One could refer to this as something else, such as the effective PWV_los (pwv_los_eff?).

Separately, I think Wade+Horne were incorrect to say that it's PWV * (X**0.6). It should be (PWV_los)**0.6 = (PWV * X)**0.6

Expected behavior
I expect to get mm of PWV back along the line of sight, not already scaled for the fraction of saturating lines.
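To make the distinction concrete, the three quantities discussed in this issue can be written as small functions. This is only an illustration of the arithmetic as described above. The function names are hypothetical and none of this is taken from the pwv_kpno source.

```python
def pwv_line_of_sight(pwv_zenith, airmass):
    """Line-of-sight PWV in mm: zenith PWV multiplied by airmass X."""
    return pwv_zenith * airmass


def pwv_eff_airmass_scaled(pwv_zenith, airmass, exponent=0.6):
    """PWV * X**0.6, the form this issue attributes to Wade+Horne."""
    return pwv_zenith * airmass ** exponent


def pwv_eff_los_scaled(pwv_zenith, airmass, exponent=0.6):
    """(PWV * X)**0.6, the form proposed in this issue."""
    return (pwv_zenith * airmass) ** exponent


# Compare the three quantities for 5 mm of zenith PWV at an airmass of 2.
print(pwv_line_of_sight(5.0, 2.0))
print(pwv_eff_airmass_scaled(5.0, 2.0))
print(pwv_eff_los_scaled(5.0, 2.0))
```

At an airmass of 1 the first two functions both return the zenith PWV unchanged, which is why the behavior goes unnoticed when no airmass is passed.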

Memory usage grows to 100s of GB when calculating 100,000s of dates with `pwv_date`

Describe the bug
When calculating large numbers of dates, the memory used grows without bound.
It was at 100s of GB when my laptop crawled to a halt, having filled up the disk with swap space.

Running more carefully and monitoring, I see that it quickly builds up to 10s of GB.

To Reproduce
The following snippet, run at N=1000, builds to 3.7 GB of memory (N=100: 200 MB; N=2000: 7.8 GB).

import numpy as np
from pwv_kpno import pwv_atm

N = 1000
mjd_obs = np.linspace(57500, 58200, N)
pwv, pwv_err = pwv_atm.pwv_date(mjd_obs, format='mjd')

Expected behavior
I expect memory usage to grow to no more than a few hundred MB, even when calculating 600,000 rows.

Version
v 1.2.0 as downloaded through pip. Python 3.7. numpy 1.18.1.
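For anyone trying to reproduce the scaling with N, one possible way to record peak memory per call is the standard-library `tracemalloc` module. The helper name `peak_memory_mb` is hypothetical. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so the figures may differ from the process-level numbers quoted above.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


# Self-contained demonstration with a function that allocates about 20 MB.
_, peak = peak_memory_mb(lambda: len(bytearray(20_000_000)))
print(f"peak: {peak:.1f} MB")
```

Wrapping the `pwv_atm.pwv_date` call from the snippet above in `peak_memory_mb` for several values of N would show whether memory grows linearly or faster with the number of dates.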

Struggling with pwv_kpno.pwv_atm.trans_for_pwv()

I'm having trouble retrieving a transmission function for a new site I configured. Here's exactly what I've done, following along with the documentation:

>>> from pwv_kpno.package_settings import ConfigBuilder, settings
>>> from pwv_kpno import pwv_atm
>>> from datetime import datetime
>>> import pytz
>>> 
>>> new_config = ConfigBuilder(
...     site_name="apache_point",
...     primary_rec="P027",
...     sup_rec=[],
... )
>>> new_config.save_to_ecsv('./apache_point.ecsv')
>>> settings.import_site_config('./apache_point.ecsv')
>>> 
>>> settings.set_site('apache_point')
>>> 
>>> pwv_atm.update_models(years=[2018])
>>> 
>>> obsv_date = datetime(year=2018, month=5, day=15)
>>> # this works fine and returns a table as expected:
>>> pwv_atm.measured_pwv(year=2018, month=5, day=15)
>>> # here's where it breaks:
>>> pwv_atm.trans_for_date(date=obsv_date, airmass=1.5)

which returns the error message:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-21-3ee644195537> in <module>()
      4 obsv_date = datetime(year=2018, month=5, day=15)
      5 # pwv_atm.measured_pwv(year=2018, month=5, day=15)
----> 6 pwv_atm.trans_for_date(date=obsv_date, airmass=1.5)

//anaconda/lib/python3.5/site-packages/pwv_kpno/pwv_atm.py in trans_for_date(date, airmass, bins)
    525     """
    526 
--> 527     return _trans_for_date(date, airmass, bins)
    528 
    529 

//anaconda/lib/python3.5/site-packages/pwv_kpno/pwv_atm.py in _trans_for_date(date, airmass, bins, test_model)
    503     """
    504 
--> 505     pwv, pwv_err = _pwv_date(date, airmass, test_model)
    506     return trans_for_pwv(pwv, pwv_err, bins)
    507 

//anaconda/lib/python3.5/site-packages/pwv_kpno/pwv_atm.py in _pwv_date(date, airmass, test_model)
    186 
    187     if test_model is None:
--> 188         pwv_model = Table.read(settings._pwv_modeled_path)
    189 
    190     else:

//anaconda/lib/python3.5/site-packages/astropy/table/table.py in read(cls, *args, **kwargs)
   2548         # RST table and inserts at the end of the docstring.  DO NOT REMOVE.
   2549 
-> 2550         out = io_registry.read(cls, *args, **kwargs)
   2551         # For some readers (e.g., ascii.ecsv), the returned `out` class is not
   2552         # guaranteed to be the same as the desired output `cls`.  If so,

//anaconda/lib/python3.5/site-packages/astropy/io/registry.py in read(cls, format, *args, **kwargs)
    500                     try:
    501                         ctx = get_readable_fileobj(args[0], encoding='binary')
--> 502                         fileobj = ctx.__enter__()
    503                     except OSError:
    504                         raise

//anaconda/lib/python3.5/contextlib.py in __enter__(self)
     57     def __enter__(self):
     58         try:
---> 59             return next(self.gen)
     60         except StopIteration:
     61             raise RuntimeError("generator didn't yield") from None

//anaconda/lib/python3.5/site-packages/astropy/utils/data.py in get_readable_fileobj(name_or_obj, encoding, cache, show_progress, remote_timeout)
    191                 name_or_obj, cache=cache, show_progress=show_progress,
    192                 timeout=remote_timeout)
--> 193         fileobj = io.FileIO(name_or_obj, 'r')
    194         if is_url and not cache:
    195             delete_fds.append(fileobj)

FileNotFoundError: [Errno 2] No such file or directory: '/anaconda/lib/python3.5/site-packages/pwv_kpno/site_data/apache_point/modeled_pwv.csv'

Going to ~/site-packages/pwv_kpno/site_data/apache_point/, I see there are three files:
atm_model.csv config.json measured_pwv.csv

Is there something else I have to run to get modeled_pwv.csv? On a related note, is it possible to model the transmission with the measured pwv instead of the modeled?

Handling years without Kitt Peak data

The PWV at Kitt Peak for years where the KITT receiver was not functioning is currently given by an average over the other receivers. This needs to be changed to a model relating the readout of the off site receivers to the receiver at Kitt Peak.

Finding average PWV between arrays

The PWV data for each GPS receiver is stored in structured numpy arrays with column names 'date' and 'pwv'. There is a need to average the PWV values in these arrays per date. Currently the average is found by looping over all the dates for which there is measured data. An alternative using numpy methods would most likely be more efficient.

The current method for averaging is as follows:

# example data
import numpy as np
list_of_arrays = [np.array([('date1', 1), ("date2", 2)],
                           dtype=[('date', (np.str_, 16)), ('pwv', float)]),

                  np.array([('date1', 1.5), ("date3", 3.5)],
                           dtype=[('date', (np.str_, 16)), ('pwv', float)])
                 ]

# the following code should be improved
combined_data = []
for date in ['date1', 'date2', 'date3']:
    pwv_vals = []
    for array in list_of_arrays:
        index = np.where(array['date'] == date)[0]
        if len(index):
            pwv_vals.append(array['pwv'][index[0]])

    # only record dates that have at least one measurement
    if pwv_vals:
        avg = np.mean(pwv_vals)
        combined_data.append((date, avg))

which yields the result

[('date1', 1.25), ('date2', 2.0), ('date3', 3.5)]

Can this be written without the use of nested for loops?
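One possible loop-free approach is to concatenate the arrays, map each date to an integer group with `np.unique`, and then sum and count per group with `np.bincount`. This is a sketch that assumes all arrays share the same structured dtype. The function name `average_pwv_by_date` is hypothetical.

```python
import numpy as np


def average_pwv_by_date(list_of_arrays):
    """Average the 'pwv' field per unique 'date' across structured arrays."""
    stacked = np.concatenate(list_of_arrays)

    # `inverse` maps every row to the index of its date in the sorted `dates`.
    dates, inverse = np.unique(stacked['date'], return_inverse=True)

    sums = np.bincount(inverse, weights=stacked['pwv'])
    counts = np.bincount(inverse)
    return list(zip(dates.tolist(), (sums / counts).tolist()))


# Example data from the issue.
dtype = [('date', (np.str_, 16)), ('pwv', float)]
list_of_arrays = [
    np.array([('date1', 1), ('date2', 2)], dtype=dtype),
    np.array([('date1', 1.5), ('date3', 3.5)], dtype=dtype),
]

print(average_pwv_by_date(list_of_arrays))
# [('date1', 1.25), ('date2', 2.0), ('date3', 3.5)]
```

Because `np.unique` sorts its output, the dates come back in sorted order rather than in order of first appearance. Every date in the result has at least one measurement by construction.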

Creating fit functions takes too long

The most time consuming part of modeling the PWV is padding the data (see the pad_data and get_padded_data functions in predicting_pwv.ipynb). An alternative method should be considered for combining the data arrays. One possibility is to retain the date time information in the data array for each site.

Compatibility issue with Python 2.7

Commit 9f8ac6a required users to specify timezone information when using the 'transmission' function. This was accomplished by using datetime.timezone, which wasn't implemented until Python 3. Using pytz as an alternative should restore compatibility with Python 2.7.

Check PWV Fit Functions

The fit functions in predicting_pwv.ipynb relate the PWV between offsite receivers and the Kitt Peak receiver. Each fit function has a slope of 1 and an intercept of zero. These parameters do not make physical sense. The model_pwv function needs to be checked to make sure it works correctly.

Unequal wavelengths in atm models

The atmospheric models generated by generate_atm_mosels.py list wavelengths in unequal increments. The reason why should be investigated as a check for possible coding mistakes.

Improve interface for handling multiple sites by transitioning to OOD

Switching between sites and managing multiple receivers is clunky and doesn't allow multiple collections of sites to be considered simultaneously. Part of this is due to a preference for functional over OO design when the package was written.

Proposed changes:

It would be more intuitive to have a GPSReceiver-like object that can be instantiated with a list of receivers and data cuts. The data management functions can be trivially moved into a class, but the modeling might require some changes since models are currently cached to file. One possibility is to lazy-fit the model when required and then cache the parameters in memory.
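A rough sketch of the lazy-fit idea, to make the proposal easier to discuss. The class name, its attributes, and the `fit_func` callable are all hypothetical and do not correspond to existing pwv_kpno code. The point is only that the fit runs on first request and the parameters are then served from an in-memory dictionary.

```python
class GPSReceiverCollection:
    """Hypothetical container for a primary receiver and its secondaries."""

    def __init__(self, primary, secondaries, fit_func, data_cuts=None):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.data_cuts = data_cuts or {}
        # fit_func is an assumed callable: (primary, secondary) -> fit parameters
        self._fit_func = fit_func
        self._fit_cache = {}

    def fit_params(self, secondary):
        """Return fit parameters for a secondary receiver, fitting on first use."""
        if secondary not in self.secondaries:
            raise ValueError(f"unknown receiver: {secondary}")

        if secondary not in self._fit_cache:
            self._fit_cache[secondary] = self._fit_func(self.primary, secondary)
        return self._fit_cache[secondary]

    def clear_cache(self):
        """Discard cached fits, e.g. after new data has been downloaded."""
        self._fit_cache.clear()
```

Since each instance holds its own cache, several collections of sites could exist side by side without overwriting each other's models.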

Tests under "TestSuomiNetDataDownload" fail on Windows 10

Tests for downloading data from SuomiNet in version 0.10.0 pass on OS 10.11 and 10.12 but fail on Windows 10. Trying to connect to SuomiNet raises a series of HTTP errors that ultimately end with a 508 (Loop Detected). Interestingly, the function being tested works as intended when run normally (not called as part of the test suite).

To recreate, run python setup.py tests.

Check for nearby PWV measurements when interpolating models

The transmission function uses scipy.interpolate to estimate the PWV level at Kitt Peak from the PWV model. However, it does not check if the model has PWV values near the user-specified datetime. If there are no values in the PWV model within a preset time interval, an error (or at least a warning) should be raised.
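A minimal sketch of such a check, written against a plain sorted list of datetimes rather than the package's actual model table. The function name `check_nearby_measurement` and the one-day default for `max_gap` are assumptions chosen for illustration.

```python
import bisect
from datetime import datetime, timedelta


def check_nearby_measurement(model_times, target, max_gap=timedelta(days=1)):
    """Return the gap to the nearest model time, or raise ValueError
    if no model time lies within max_gap of target.

    model_times must be a sorted list of datetimes.
    """
    if not model_times:
        raise ValueError("PWV model contains no data")

    # Only the entries just before and just after the target can be nearest.
    idx = bisect.bisect_left(model_times, target)
    neighbors = model_times[max(idx - 1, 0): idx + 1]
    gap = min(abs(t - target) for t in neighbors)

    if gap > max_gap:
        raise ValueError(
            f"No modeled PWV within {max_gap} of {target} (nearest is {gap} away)"
        )
    return gap
```

Calling this before interpolating would turn a silent extrapolation across a data gap into an explicit error. Swapping the `raise` for `warnings.warn` would give the softer behavior mentioned above.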
