Git Product home page Git Product logo

rascal's Introduction

Rascal: RANSAC Assisted Spectral CALibration

Python package Coverage Status Readthedocs Status PyPI version Downloads DOI Code style: black

Rascal is a library for automated spectrometer wavelength calibration. It has been designed primarily for astrophysics applications, but should be usable with spectra captured from any similar spectrometer.

Given a set of peaks located in your spectrum, Rascal will attempt to determine a model for your spectrometer to convert between pixels and wavelengths.

Unlike other calibration methods, rascal does not require you to manually select lines in your spectrum. Ideally you should know approximate parameters about your system, namely:

  • What arc lamp was used (e.g. Xe, Hg, Ar, CuNeAr)
  • What the dispersion of your spectrometer is (i.e. angstroms/pixel)
  • The spectral range of your system, and the starting wavelength

You don't need to know the dispersion and start wavelength exactly. Often this information is provided by the observatory, but if you don't know it, you can take a rough guess. The closer you are to the actual system settings, the more likely it is that Rascal will be able to solve the calibration. Blind calibration, where no parameters are known, is possible but challenging currently. If you don't know the lamp, you can try iterating over the various combinations of sources. Generally when you do get a correct fit, with most astronomical instruments the errors will be extremely low.

More background information can be referred to this arXiv article.

Dependencies

  • python >= 3.7
  • numpy>=1.16,<1.24
  • scipy>=1.3.3
  • pynverse>=0.1.4
  • matplotlib>=3.0.3
  • tqdm>=4.48.0

Optional Dependencies

Installation

Instructions can be found here.

Reporting issues/feature requests

Please use the issue tracker to report any issues or support questions.

Getting started

The quickstart guide will show you how to reduce the example dataset.

Contributing Code/Documentation

If you are interested in contributing code to the project, thank you! For those unfamiliar with the process of contributing to an open-source project, you may want to read through Github’s own short informational section on how to submit a contribution or send me a message.

Style -- we now use black for formatting, you can easily set this up using a pre-commit hook.

pip install pre-commit
pre-commit install

Disclaimer

We duplicate some of the relevant metadata, but we do not process the raw metadata. Some of the metadata this software creates contain full path to the files in your system, which most likely includes a user name on your machine. Please be advised it is your responsibility to be compliant with the privacy law(s) that you are oblidged to follow, and it is your responsibility to remove any metadata that may reveal personal information and/or provide information that can reveal any computing vulunerability.

rascal's People

Contributors

cylammarco avatar jveitchmichaelis avatar focusjosh avatar

Stargazers

Charlie Warren avatar wolfi3 avatar  avatar Hans Moritz Günther avatar Guy Freeman avatar Yuming Fu avatar  avatar  avatar Allan Laal avatar Paul Ross McWhirter avatar James Tocknell avatar  avatar

Watchers

James Cloos avatar  avatar

rascal's Issues

Add a test set of example spectra with known arc line/peak correspondences.

We should have a small set of example spectra from appropriate instruments in a reduced form (e.g. 1D signals). Along with these signals we should have a list of:

  • Extracted peak locations
  • Associated arc line wavelengths
  • Sky lines if visible
  • An actual "gold" fit made by a human (e.g. 4D polynomial coefficients)

`plot_fit()` failed to generate diagnostic plot

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\rascal\calibrator.py in plot_fit(self, fit_coeff, spectrum, tolerance, plot_atlas, log_spectrum, savefig, filename, json, renderer)
2244 ax1.plot(wave, spectrum)
2245 ax1.vlines(self.polyval(self.peaks, fit_coeff),
-> 2246 spectrum[self.pix_to_rawpix(self.peaks).astype('int')],
2247 vline_max,
2248 linestyles='dashed',
TypeError: only integer scalar arrays can be converted to a scalar index

Using the same line from a line list multiple times

I found a behaviour in the arc fitting that I had not expected. It is a little counter intuitive but may be a natural consequence of the way that rascal works. I should stress that I was using RASCAL via ASPIRED (https://github.com/cylammarco/ASPIRED), so it is possible I should file this query on that repository instead? Let me know if you want me to move it. Looking at the ASPIRED code, the wrapper to the Calibrator.fit() seems fairly thin so I am just guessing the issue more likely relates to here.

In ASPIRED I use
initialise_calibrator()
set_hough_properties()
do_hough_transform()
fit()

The attached plots are output from aspired.OneDSpec.fit() which I think is calling rascal.Calibrator.fit().

A single line in the arc line list can get used twice in the final fit for two different peaks in the arc spectrum. It appears that if two peaks fall within the specified tolerances it uses both. Naively I would normally expect a fit to match the arc line to the peak that fits best and ignore the other so that any one line from the line list only ever gets used once. What it does in practice is create two points in the final output fit. One has a residual very close to zero (the correct match) and the other has a large residual (an erroneously matched, nearby arc feature).

In the attached examples:

  • There are arc peaks detected at 4520 and 4495. Both have been matched to the 4521A feature in my supplied line list.
  • There are arc peaks detected at 7280 and 7260. Both have been matched to the 7284A feature in my supplied line list.

I have not done a lot yet to try and fine tune the input parameters to prevent this. I am primarily asking if this is expected behaviour and if there is an option to prevent it. I could just drop the problematic line from my line lists but that loses a point from the fit and since one of the two matches does appear to be correct, it would be good to keep them.

Screenshot 2021-08-17 at 3 37 44 PM

Screenshot 2021-08-17 at 3 36 09 PM

refine_peaks() sometimes return duplicated peaks

When the peaks are too closely spaced or confused, refine_peaks() can return duplicated peaks because it is done by fitting a Gaussian over the confused peaks and only 1 minimum is returned.

fit() can get completely stuck

fit() can get completely stuck at

0%| | 0/1000 [00:00<?, ?it/s]
upon ctrl+c :

Traceback (most recent call last):
  File ".\reduce_floyds_data.py", line 348, in <module>
    onedspec_blue = calibrate_blue(science_blue, standard_blue, standard_name)
  File ".\reduce_floyds_data.py", line 267, in calibrate_blue
    blue_onedspec.fit(max_tries=1000,
  File "c:\users\cylam\git\aspired\aspired\onedspec.py", line 2506, in fit
    self.standard_wavecal.fit(max_tries=max_tries,
  File "c:\users\cylam\git\aspired\aspired\wavelengthcalibration.py", line 876, in fit
    self.spectrum1D.calibrator.fit(
  File "c:\users\cylam\git\rascal\rascal\calibrator.py", line 1892, in fit
    self._solve_candidate_ransac(
  File "c:\users\cylam\git\rascal\rascal\calibrator.py", line 862, in _solve_candidate_ransac
    y_temp = np.random.choice(y_choice)

support both matplotlib and plotly

How can we do it more elegantly than the current implementation?

plotly_imported = False
matplotlib_imported = False
try:
import matplotlib.pyplot as plt
matplotlib_imported = True
except:
warnings.warn(
'matplotlib package not available.')
matplotlib_imported = False
try:
import plotly.graph_objects as go
import plotly.io as pio
plotly_imported = True
except ImportError:
if matplotlib_imported:
warnings.warn(
'plotly is not present, only matplotlib can be used.')
else:
warnings.warn(
'Plot cannot be generated.')

Use NIST wavelength list

Use NIST wavelength list instead of individual elements wavelength list copied from various sources.

Atlas filtering based on estimated dispersion

Currently we just extract all atlas lines between two wavelengths. While it's useful to store all catalogue lines, it can confuse the algorithm:

  • if line separation is lower than the resolution of the instrument
  • during preliminary peak/arc line matching there is a higher uncertainty in matching multiple close atlas lines to a single peak (particularly if only one peak in a group is extracted)

We can't necessarily filter by relative intensity because this is instrument specific and small peaks at the edge of the spectrum can be critical for accurate fitting.

Proposal:

  • Optional (default true) ignore atlas lines which are closer together than the lowest possible resolution we expect (e.g. ((end_wavelength + tolerance) - (start_wavelength - tolerance))/n_pix)
  • Once a good coarse calibration has been obtained, fit again using much stricter matching criteria (could repeat this until the final set of matches doesn't change).
  • Could initially use high relative intensity lines + peaks for RANSAC, and then relax this criteria for later fitting.

robust fitting is still not robust enough

The two ends of the polynomial fit can run away (see only bottom plot):

figure_6_heatmap

from each pixel, 0 to 1000, it is showing the histogram of the distribution of the deviation of the wavelength solution from the median value at that pixel.

the spectrograph is only usable from 4020 A onwards (see top x-axis), meaning, we have some data off by less than 10A at the usable wavelength while SPRAT has a resoution of 9.2A. From 4400 to 8000 A, the deviation is less than 1A, which is better than 0.1 pixel i.e. robust fit is good for almost all of it, except it can go run-away at the extreme ends when they are unconstrained.

Refining the polynomial solution should first focus on the constant term

For a given instrument, the major difference between the wavelength calibrations of two arcs taken at different times is a linear drift of the spectrum (due to, for example, flexure of the instrument). So, in polynomial terms, it is the constant term that is changed, while all the higher-order terms are mostly unaffected.

I propose a separate function to match_peaks_to_atlas() to focus on refining a really-good initial set of polynomial fit that is mostly affected by a linear shift or distortion over only a small range of the calibration.

matched peaks returned by fit is not accurate

self.matched_peaks is updated every iteration of RANSAC, which means that when the solve_candidate_ransac terminates, the matched peak list reflects the most recent iteration, not the best one.

Using Rascal with GMOS multi-object spectra

Hi, I would like to try using Rascal to do the wavelength calibration for some GMOS-S multi-object spectra to see if it does a better job than the gswavelength code provided by Gemini. If I have an arc calibration file that has been processed though the Gemini IRAF software can I open the MEF file with astopy.io fits and loop over each individual slit and run Rascal on it?

Thanks
Liz Buckley-Geer

Calibrator.fit() does not work in example

When trying to use the calibration_test_sprat.py (after fixing the path and file name issues with spectrum and atlas)

`

IndexError Traceback (most recent call last)
in ()
----> 1 best_p = c.fit(3)
2 final_p = c.match_peaks_to_atlas(best_p, 0.25)
3
4 print(atlas)
5 print(final_p)

/Users/marcolam/git/rascal/calibrator.pyc in fit(self, top_n)
43 def fit(self, top_n=5):
44
---> 45 self.accumulator = self._hough_points(self.pairs[:,0], self.pairs[:,1])
46
47 h, lines = self._get_top_lines(self.accumulator, bins=100, top_n=top_n)

IndexError: too many indices for array
`

Fix API with _set_peaks()

The first parameter of _set_peaks() is not used. Potentially a formerly free parameter that has now become a property of Calibrator.

self._set_peaks(self.peaks)
def _set_peaks(self, peaks):
# Create a list of all possible pairs of detected peaks and lines from atlas
self._generate_pairs()
# Include user supplied pairs that are always fitted to within a tolerance
self.set_guess_pairs()
# Include user supplied pairs that are always fitted as given
self.set_known_pairs()

best_error sometimes get trapped at 0.0000

During fit, the best_error sometimes gets stuck at 0.0000 with the number of best inliners equal to sample_size, which is clearly a bad over-fit. We should have some defensive statement to avoid "perfect fit" to overtake good fit with high peak utilisation fraction.

Constrain polyfit for matching peaks

During the final peak/line fitting (after RANSAC) the polyfit is not constrained (e.g. by wavelength range) which can cause really weird solutions.

Should be straightforward using one of scipy's minimisation routines that allows parameter bounds.

Test robustness of refine_fit()

Possibly by some Monte-Carlo method to quantify the probability of successful refit as a function of the fractional deviation of the true pfit.

Need a function to manually remove atlas

An extra function to remove atlas. Perhaps give each index an unique ID?

rascal/rascal/calibrator.py

Lines 1426 to 1447 in 622489f

def list_atlas(self):
'''
List all the lines loaded to the Calibrator.
'''
for i in range(len(self.atlas)):
print('Element ' + str(self.atlas_elements[i]) + ' at ' +
str(self.atlas[i]) + ' with intensity ' +
str(self.atlas_intensities[i]))
def clear_atlas(self):
'''
Remove all the lines loaded to the Calibrator.
'''
self.atlas_elements = []
self.atlas = []
self.atlas_intensities = []

CI

Should we use Travis-CI?

Fix API with _get_atlas()

Maybe an inconsistency? The parameters in Calibrator._get_atlas() are not all used, are some of them meant to be passed to the future development of the util.load_calibration_lines()

rascal/rascal/calibrator.py

Lines 509 to 512 in 9e0dddf

def _get_atlas(self, elements, min_wavelength, max_wavelength, min_intensity, min_distance):
self.atlas_elements, self.atlas, self.atlas_intensities = load_calibration_lines(elements,
min_wavelength,
max_wavelength)

rascal/rascal/util.py

Lines 90 to 93 in 9e0dddf

def load_calibration_lines(elements,
min_wavelength=1000.,
max_wavelength=10000.,
include_first_order=False):

Background of RANSAC

Include background information of RANSAC in the doc/source/background/ransac.rst

peak_utilisation fraction is sometimes off

I think there is confusion between peak utilisation fraction and line utilisation fraction: the former is in pixel, the latter is in wavelengths. We should have two separate quantities for diagnostics.

Allow accumulation of multiple Hough transform

Line spacing can vary significantly at different wavelength ranges. A low-resolution hough space can miss the densely packed pixel-wavelength pairs, a high-resolution one can amplify noise in the sparsely populated region.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.