jveitchmichaelis / rascal Goto Github PK

RAnsac Assisted Spectral CALibration

License: BSD 3-Clause "New" or "Revised" License

Python 52.99% Jupyter Notebook 47.01%

spectroscopy calibration astronomy astrophysics

rascal's Introduction

Rascal: RANSAC Assisted Spectral CALibration

Rascal is a library for automated spectrometer wavelength calibration. It has been designed primarily for astrophysics applications, but should be usable with spectra captured from any similar spectrometer.

Given a set of peaks located in your spectrum, Rascal will attempt to determine a model for your spectrometer to convert between pixels and wavelengths.

Unlike other calibration methods, rascal does not require you to manually select lines in your spectrum. Ideally you should know approximate parameters about your system, namely:

What arc lamp was used (e.g. Xe, Hg, Ar, CuNeAr)
What the dispersion of your spectrometer is (i.e. angstroms/pixel)
The spectral range of your system, and the starting wavelength

You don't need to know the dispersion and start wavelength exactly. Often this information is provided by the observatory, but if you don't know it, you can take a rough guess. The closer you are to the actual system settings, the more likely it is that Rascal will be able to solve the calibration. Blind calibration, where no parameters are known, is possible but challenging currently. If you don't know the lamp, you can try iterating over the various combinations of sources. Generally when you do get a correct fit, with most astronomical instruments the errors will be extremely low.

More background information can be referred to this arXiv article.

Dependencies

python >= 3.7
numpy>=1.16,<1.24
scipy>=1.3.3
pynverse>=0.1.4
matplotlib>=3.0.3
tqdm>=4.48.0

Optional Dependencies

plotly >= 4.0

Installation

Instructions can be found here.

Reporting issues/feature requests

Please use the issue tracker to report any issues or support questions.

Getting started

The quickstart guide will show you how to reduce the example dataset.

Contributing Code/Documentation

If you are interested in contributing code to the project, thank you! For those unfamiliar with the process of contributing to an open-source project, you may want to read through Github’s own short informational section on how to submit a contribution or send me a message.

Style -- we now use black for formatting, you can easily set this up using a pre-commit hook.

pip install pre-commit
pre-commit install

Disclaimer

We duplicate some of the relevant metadata, but we do not process the raw metadata. Some of the metadata this software creates contain full path to the files in your system, which most likely includes a user name on your machine. Please be advised it is your responsibility to be compliant with the privacy law(s) that you are oblidged to follow, and it is your responsibility to remove any metadata that may reveal personal information and/or provide information that can reveal any computing vulunerability.

rascal's People

Contributors

Stargazers

Watchers

Forkers

adam-broussard pierfra-rocci

rascal's Issues

refine_peaks() cannot handle nan

This function should first detect non-numeric values before refining.

Add a test set of example spectra with known arc line/peak correspondences.

We should have a small set of example spectra from appropriate instruments in a reduced form (e.g. 1D signals). Along with these signals we should have a list of:

Extracted peak locations
Associated arc line wavelengths
Sky lines if visible
An actual "gold" fit made by a human (e.g. 4D polynomial coefficients)

`plot_fit()` failed to generate diagnostic plot

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\rascal\calibrator.py in plot_fit(self, fit_coeff, spectrum, tolerance, plot_atlas, log_spectrum, savefig, filename, json, renderer)
2244 ax1.plot(wave, spectrum)
2245 ax1.vlines(self.polyval(self.peaks, fit_coeff),
-> 2246 spectrum[self.pix_to_rawpix(self.peaks).astype('int')],
2247 vline_max,
2248 linestyles='dashed',
TypeError: only integer scalar arrays can be converted to a scalar index

'Calibrator' object has no attribute 'load_user_atlas'

ignore, this issue.

Use logging instead of print statements

Then allow users to set log level (or set log level based on silence).

Using the same line from a line list multiple times

I found a behaviour in the arc fitting that I had not expected. It is a little counter intuitive but may be a natural consequence of the way that rascal works. I should stress that I was using RASCAL via ASPIRED (https://github.com/cylammarco/ASPIRED), so it is possible I should file this query on that repository instead? Let me know if you want me to move it. Looking at the ASPIRED code, the wrapper to the Calibrator.fit() seems fairly thin so I am just guessing the issue more likely relates to here.

In ASPIRED I use
initialise_calibrator()
set_hough_properties()
do_hough_transform()
fit()

The attached plots are output from aspired.OneDSpec.fit() which I think is calling rascal.Calibrator.fit().

A single line in the arc line list can get used twice in the final fit for two different peaks in the arc spectrum. It appears that if two peaks fall within the specified tolerances it uses both. Naively I would normally expect a fit to match the arc line to the peak that fits best and ignore the other so that any one line from the line list only ever gets used once. What it does in practice is create two points in the final output fit. One has a residual very close to zero (the correct match) and the other has a large residual (an erroneously matched, nearby arc feature).

In the attached examples:

There are arc peaks detected at 4520 and 4495. Both have been matched to the 4521A feature in my supplied line list.
There are arc peaks detected at 7280 and 7260. Both have been matched to the 7284A feature in my supplied line list.

I have not done a lot yet to try and fine tune the input parameters to prevent this. I am primarily asking if this is expected behaviour and if there is an option to prevent it. I could just drop the problematic line from my line lists but that loses a point from the fit and since one of the two matches does appear to be correct, it would be good to keep them.

SyntheticSpectrum test keeps failing

Need to investigate whether the generation of a synthetic spectrum is erroneous, the test is or the wavelength calibration is.

refine_peaks() sometimes return duplicated peaks

When the peaks are too closely spaced or confused, refine_peaks() can return duplicated peaks because it is done by fitting a Gaussian over the confused peaks and only 1 minimum is returned.

Readthedoc List of Modules are empty

The List of Modules section is empty.

User supplied element and wavelengths

Allow user to "install" their supplied elements and the wavelengths of the lines.

fit() can get completely stuck

fit() can get completely stuck at

0%| | 0/1000 [00:00<?, ?it/s]
upon ctrl+c :

Traceback (most recent call last):
  File ".\reduce_floyds_data.py", line 348, in <module>
    onedspec_blue = calibrate_blue(science_blue, standard_blue, standard_name)
  File ".\reduce_floyds_data.py", line 267, in calibrate_blue
    blue_onedspec.fit(max_tries=1000,
  File "c:\users\cylam\git\aspired\aspired\onedspec.py", line 2506, in fit
    self.standard_wavecal.fit(max_tries=max_tries,
  File "c:\users\cylam\git\aspired\aspired\wavelengthcalibration.py", line 876, in fit
    self.spectrum1D.calibrator.fit(
  File "c:\users\cylam\git\rascal\rascal\calibrator.py", line 1892, in fit
    self._solve_candidate_ransac(
  File "c:\users\cylam\git\rascal\rascal\calibrator.py", line 862, in _solve_candidate_ransac
    y_temp = np.random.choice(y_choice)

support both matplotlib and plotly

How can we do it more elegantly than the current implementation?

rascal/rascal/calibrator.py

Lines 11 to 32 in 9e0dddf

 plotly_imported = False 

 matplotlib_imported = False 

 try: 

 import matplotlib.pyplot as plt 

 matplotlib_imported = True 

 except: 

 warnings.warn( 

 'matplotlib package not available.') 

 matplotlib_imported = False 

 try: 

 import plotly.graph_objects as go 

 import plotly.io as pio 

 plotly_imported = True 

 except ImportError: 

 if matplotlib_imported: 

 warnings.warn( 

 'plotly is not present, only matplotlib can be used.') 

 else: 

 warnings.warn( 

 'Plot cannot be generated.')

Use NIST wavelength list

Use NIST wavelength list instead of individual elements wavelength list copied from various sources.

Add documentation

with sphinx and readthedoc.io?

Atlas filtering based on estimated dispersion

Currently we just extract all atlas lines between two wavelengths. While it's useful to store all catalogue lines, it can confuse the algorithm:

if line separation is lower than the resolution of the instrument
during preliminary peak/arc line matching there is a higher uncertainty in matching multiple close atlas lines to a single peak (particularly if only one peak in a group is extracted)

We can't necessarily filter by relative intensity because this is instrument specific and small peaks at the edge of the spectrum can be critical for accurate fitting.

Proposal:

Optional (default true) ignore atlas lines which are closer together than the lowest possible resolution we expect (e.g. ((end_wavelength + tolerance) - (start_wavelength - tolerance))/n_pix)
Once a good coarse calibration has been obtained, fit again using much stricter matching criteria (could repeat this until the final set of matches doesn't change).
Could initially use high relative intensity lines + peaks for RANSAC, and then relax this criteria for later fitting.

Double check existing atlases against NIST

We seem to be inconsistent - some lines agree, some don't - which may be causing errors in fits.

robust fitting is still not robust enough

The two ends of the polynomial fit can run away (see only bottom plot):

from each pixel, 0 to 1000, it is showing the histogram of the distribution of the deviation of the wavelength solution from the median value at that pixel.

the spectrograph is only usable from 4020 A onwards (see top x-axis), meaning, we have some data off by less than 10A at the usable wavelength while SPRAT has a resoution of 9.2A. From 4400 to 8000 A, the deviation is less than 1A, which is better than 0.1 pixel i.e. robust fit is good for almost all of it, except it can go run-away at the extreme ends when they are unconstrained.

Refining the polynomial solution should first focus on the constant term

For a given instrument, the major difference between the wavelength calibrations of two arcs taken at different times is a linear drift of the spectrum (due to, for example, flexure of the instrument). So, in polynomial terms, it is the constant term that is changed, while all the higher-order terms are mostly unaffected.

I propose a separate function to match_peaks_to_atlas() to focus on refining a really-good initial set of polynomial fit that is mostly affected by a linear shift or distortion over only a small range of the calibration.

generate_hough_points_brute_force() some times cannot yield good solution

generate_hough_points_brute_force() by definition contains all the possible pairs of (gradient, intercept), but the fit doesn't usually give a good solution. It could be related to the 2D-binning after Hough Transform before fitting.

Support vacuum to air conversion more generally

Function for vacuum->air is already in the code, but we should make sure all the arc lists are in vacuum (most are not).

Update PyPI

As title.

fit() and match_peaks() should return results in identical format

Both functions should return:

fit_coeff
matched_peaks
matched_atlas
rms
residual
peak_utilisation
atlas_utilisation

matched peaks returned by fit is not accurate

self.matched_peaks is updated every iteration of RANSAC, which means that when the solve_candidate_ransac terminates, the matched peak list reflects the most recent iteration, not the best one.

Cleanup arc lines into separate files

Easier to deal with I think.

Using Rascal with GMOS multi-object spectra

Hi, I would like to try using Rascal to do the wavelength calibration for some GMOS-S multi-object spectra to see if it does a better job than the gswavelength code provided by Gemini. If I have an arc calibration file that has been processed though the Gemini IRAF software can I open the MEF file with astopy.io fits and loop over each individual slit and run Rascal on it?

Thanks
Liz Buckley-Geer

'legendre' and 'chebyshev' fit are not working

rascal/rascal/calibrator.py

Lines 773 to 778 in e3778b0

 elif fittype == 'legendre': 

 self.polyfit = np.polynomial.legendre.legfit 

 self.polyval = np.polynomial.legendre.legval 

 elif fittype == 'chebyshev': 

 self.polyfit = np.polynomial.chebyshev.chebfit 

 self.polyval = np.polynomial.chebyshev.chebval

Calibrator.fit() does not work in example

When trying to use the calibration_test_sprat.py (after fixing the path and file name issues with spectrum and atlas)

`

IndexError Traceback (most recent call last)
in ()
----> 1 best_p = c.fit(3)
2 final_p = c.match_peaks_to_atlas(best_p, 0.25)
3
4 print(atlas)
5 print(final_p)

/Users/marcolam/git/rascal/calibrator.pyc in fit(self, top_n)
43 def fit(self, top_n=5):
44
---> 45 self.accumulator = self._hough_points(self.pairs[:,0], self.pairs[:,1])
46
47 h, lines = self._get_top_lines(self.accumulator, bins=100, top_n=top_n)

IndexError: too many indices for array
`

Fix API with _set_peaks()

The first parameter of _set_peaks() is not used. Potentially a formerly free parameter that has now become a property of Calibrator.

rascal/rascal/calibrator.py

Lines 89 to 100 in 9e0dddf

 self._set_peaks(self.peaks) 

 def _set_peaks(self, peaks): 

 # Create a list of all possible pairs of detected peaks and lines from atlas 

 self._generate_pairs() 

 # Include user supplied pairs that are always fitted to within a tolerance 

 self.set_guess_pairs() 

 # Include user supplied pairs that are always fitted as given 

 self.set_known_pairs()

best_error sometimes get trapped at 0.0000

During fit, the best_error sometimes gets stuck at 0.0000 with the number of best inliners equal to sample_size, which is clearly a bad over-fit. We should have some defensive statement to avoid "perfect fit" to overtake good fit with high peak utilisation fraction.

new name for match_peaks_to_atlas() and extend the function?

match_peaks_to_atlas() is somewhat "refining the solution", can we rename this to something more obvious?

Automate PyPI publishing

https://packaging.python.org/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

or use some third party Github Actions:
https://github.com/marketplace/actions/pypi-publish
https://github.com/marketplace/actions/pypi-deployment

Manually adding, removing and listing arc lines (wavelength)

Converts the computationally heavy part to C/C++ for speedup

Constrain polyfit for matching peaks

During the final peak/line fitting (after RANSAC) the polyfit is not constrained (e.g. by wavelength range) which can cause really weird solutions.

Should be straightforward using one of scipy's minimisation routines that allows parameter bounds.

Test robustness of refine_fit()

Possibly by some Monte-Carlo method to quantify the probability of successful refit as a function of the fractional deviation of the true pfit.

Experiment with other RANSAC variants

See: https://dspace5.zcu.cz/bitstream/11025/6869/1/Hast.pdf

May reduce some of the stochastic-ness that we see when using low RANSAC iterations.

scikit-learn also has a generic RANSAC model: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RANSACRegressor.html

Various here: http://theia-sfm.org/ransac.html

Manually adding, removing and listing Peaks (Pixels)

Need an option for users to add or remove peaks for calibration.

Tidy up RTD and include version log

Tidy up and add the new features in RTD.

Allow manual removal of pixel-wavelength pairs after fitting

Allow manual refinement of the fitted solution.

Need a function to manually remove atlas

An extra function to remove atlas. Perhaps give each index an unique ID?

rascal/rascal/calibrator.py

Lines 1426 to 1447 in 622489f

 def list_atlas(self): 

 ''' 

  List all the lines loaded to the Calibrator. 

  ''' 

 for i in range(len(self.atlas)): 

 print('Element ' + str(self.atlas_elements[i]) + ' at ' + 

 str(self.atlas[i]) + ' with intensity ' + 

 str(self.atlas_intensities[i])) 

 def clear_atlas(self): 

 ''' 

  Remove all the lines loaded to the Calibrator. 

  ''' 

 self.atlas_elements = [] 

 self.atlas = [] 

 self.atlas_intensities = []

Put it on Zenodo

Then, update index.rst and README.md (https://gist.github.com/seignovert/ae6771f400ca464d294261f42900823a)

CI

Should we use Travis-CI?

wavelength solution of the first and the last pixels should be checked against the given min/max wavelength range

Even though ransac is checking that pix-wave solution is within the min/max wavelength range provided, the solution can still explode at the two ends of the solution.

We should check that the first and the last pixels are both fitted within the min/max wavelength range provided.

Simulated spectra, unit testing and test for robustness

We should look simulated spectra for functional testing, and to characterise how robust the fits are.

What I'm not sure about is how to generate sensible values of curvature in the polynomial - @cylammarco any ideas?

Change master to main

Rename branch name to the new GitHub default naming.

Fix pip and poetry conflict

pip installation runs into a dependency problem when .toml file is present.

Fix API with _get_atlas()

Maybe an inconsistency? The parameters in Calibrator._get_atlas() are not all used, are some of them meant to be passed to the future development of the util.load_calibration_lines()

rascal/rascal/calibrator.py

Lines 509 to 512 in 9e0dddf

 def _get_atlas(self, elements, min_wavelength, max_wavelength, min_intensity, min_distance): 

 self.atlas_elements, self.atlas, self.atlas_intensities = load_calibration_lines(elements, 

 min_wavelength, 

 max_wavelength)

rascal/rascal/util.py

Lines 90 to 93 in 9e0dddf

 def load_calibration_lines(elements, 

 min_wavelength=1000., 

 max_wavelength=10000., 

 include_first_order=False):

	plotly_imported = False
	matplotlib_imported = False

	try:
	import matplotlib.pyplot as plt
	matplotlib_imported = True
	except:
	warnings.warn(
	'matplotlib package not available.')
	matplotlib_imported = False

	try:
	import plotly.graph_objects as go
	import plotly.io as pio
	plotly_imported = True
	except ImportError:
	if matplotlib_imported:
	warnings.warn(
	'plotly is not present, only matplotlib can be used.')
	else:
	warnings.warn(
	'Plot cannot be generated.')

	elif fittype == 'legendre':
	self.polyfit = np.polynomial.legendre.legfit
	self.polyval = np.polynomial.legendre.legval
	elif fittype == 'chebyshev':
	self.polyfit = np.polynomial.chebyshev.chebfit
	self.polyval = np.polynomial.chebyshev.chebval

	self._set_peaks(self.peaks)

	def _set_peaks(self, peaks):

	# Create a list of all possible pairs of detected peaks and lines from atlas
	self._generate_pairs()

	# Include user supplied pairs that are always fitted to within a tolerance
	self.set_guess_pairs()

	# Include user supplied pairs that are always fitted as given
	self.set_known_pairs()

	def list_atlas(self):
	'''
	List all the lines loaded to the Calibrator.

	'''

	for i in range(len(self.atlas)):

	print('Element ' + str(self.atlas_elements[i]) + ' at ' +
	str(self.atlas[i]) + ' with intensity ' +
	str(self.atlas_intensities[i]))

	def clear_atlas(self):
	'''
	Remove all the lines loaded to the Calibrator.

	'''

	self.atlas_elements = []
	self.atlas = []
	self.atlas_intensities = []

	def _get_atlas(self, elements, min_wavelength, max_wavelength, min_intensity, min_distance):
	self.atlas_elements, self.atlas, self.atlas_intensities = load_calibration_lines(elements,
	min_wavelength,
	max_wavelength)

	def load_calibration_lines(elements,
	min_wavelength=1000.,
	max_wavelength=10000.,
	include_first_order=False):

jveitchmichaelis / rascal Goto Github PK

rascal's Introduction

Rascal: RANSAC Assisted Spectral CALibration

Dependencies

Optional Dependencies

Installation

Reporting issues/feature requests

Getting started

Contributing Code/Documentation

Disclaimer

rascal's People

Contributors

Stargazers

Watchers

Forkers

rascal's Issues

`

Recommend Projects

Recommend Topics

Recommend Org