Comments (20)
To move forward on this, I propose fixing the convention for the LumiSpy-specific parts of the metadata in the documentation, based on parts of the discussion above:
#109
from lumispy.
I recently contributed a selective inclusion of CL metadata from Gatan files to HyperSpy: hyperspy/hyperspy#2590
Unfortunately, Gatan uses a very different metadata structure depending on the acquisition mode (PMT/CCD), and yet another one for spectral images. I tried to include the most important fields in metadata.Acquisition_instrument.CL
However, so far it does not include post-processing done in Digital Micrograph, which would eventually be good to have in view of #51. Multiple post-processing steps, though, lead to a recursive metadata structure that would have to be parsed properly.
As it might not always be possible to enforce the same metadata hierarchy for all instrument types (though desirable), I contributed the metadata search capability in hyperspy/hyperspy#2633
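Conceptually, such a search walks the whole metadata tree and returns every path whose leaf matches a requested key, so users can find a value regardless of where a given vendor put it. A minimal pure-Python sketch of the idea, using a plain nested dict in place of HyperSpy's DictionaryTreeBrowser (this is an illustration, not the actual HyperSpy implementation):

```python
def find_metadata(tree, key, path=""):
    """Recursively collect all (dotted_path, value) pairs whose leaf name matches `key`."""
    hits = []
    for name, value in tree.items():
        full = f"{path}.{name}" if path else name
        if isinstance(value, dict):
            hits.extend(find_metadata(value, key, full))
        elif name == key:
            hits.append((full, value))
    return hits

# A tiny metadata tree in the shape discussed in this thread.
meta = {
    "Acquisition_instrument": {
        "CL": {"exposure": 30.0, "CCD": {"binning": (1, 100)}},
    }
}

find_metadata(meta, "exposure")  # [('Acquisition_instrument.CL.exposure', 30.0)]
```

The point of searching by leaf name is that the call keeps working even if one reader nests the field under `CL.CCD` and another under `Spectrometer`.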
from lumispy.
That's very nice!
My one comment is: should the CL-specific metadata structure be exclusive to LumiSpy?
I feel that the metadata search capability definitely belongs in HyperSpy, but maybe the CL metadata fields should be part of LumiSpy only?
What are your thoughts?
from lumispy.
The DigitalMicrograph reader was already parsing metadata for EELS and EDS, so I thought it made sense to add the functionality for CL metadata there as well.
from lumispy.
The long-term aim for HyperSpy is that all the IO functions will move to an extension package at some point.
To me it makes more sense to handle the metadata when the file is loaded; otherwise one has to run an extra function on the data, or we again need a wrapper for the load function, no?
However, I do not know how it is for the reader that you use; could the metadata parsing be added there?
Regardless of the search functionality, we should in any case try to stick to some basic conventions on where to save the essential metadata in the tree.
from lumispy.
The loader I use is mainly for .sur files. That one parses all of s.original_metadata very nicely. However, it is in my opinion a bit annoying to have to go back to the original metadata every time I need a value (because it has too many values). That was the idea behind my comments.
My question is: where should the function that parses original_metadata into metadata be placed? In the IO reader?
As an aside, I feel there is no convention on how to organise metadata for luminescence. HyperSpy already declares the Acquisition_instrument.SEM structure, but what is common to any luminescence data are the Spectrometer and Camera acquisition instruments.
Should we think of a structure for those two pieces of equipment?
Based on the SEM structure in HyperSpy...
├── Acquisition_instrument
│ ├── SEM
│ │ ├── Detector
│ │ │ ├── detector_type
│ │ │ └── EDS
│ │ │ ├── azimuth_angle (º)
│ │ │ ├── elevation_angle (º)
│ │ │ ├── energy_resolution_MnKa (eV)
│ │ │ ├── live_time (s)
│ │ │ └── real_time (s)
│ │ ├── beam_current (nA)
│ │ ├── beam_energy (keV)
│ │ ├── probe_area (nm²)
│ │ ├── convergence_angle (mrad)
│ │ ├── magnification
│ │ ├── microscope
│ │ ├── Stage
│ │ │ ├── rotation (º)
│ │ │ ├── tilt_alpha (º)
│ │ │ ├── tilt_beta (º)
│ │ │ ├── x (mm)
│ │ │ ├── y (mm)
│ │ │ └── z (mm)
│ │ └── working_distance (mm)
Here's an idea:
├── Acquisition_instrument
│ ├── SPECTROMETER
│ │ ├── model_name
│ │ ├── Grating
│ │ │ ├── groove_density (gr/mm)
│ │ │ ├── blazing_angle (º)
│ │ ├── central_wavelength (nm)
│ │ └── entrance_slit_width (mm)
│ └── CCD
│ │ ├── model_name
│ │ ├── life_time (ms)
│ │ ├── real_time (ms)
│ │ ├── binning
│ │ ├── signal_amplification (xN)
│ │ ├── readout_rate (MHz)
│ │ └── pixel_width (mm)
Please copy this comment and add any other key parameters for luminescence that I may be missing.
Then we can create documentation for the metadata structure.
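Whatever structure we settle on, the original_metadata → metadata step boils down to a key-mapping table plus a helper that creates nested nodes. A sketch with purely hypothetical vendor key names on the left (the real .sur/DM keys differ per manufacturer):

```python
def set_item(tree, dotted_path, value):
    """Create intermediate dicts along a dotted path and set the leaf value."""
    *nodes, leaf = dotted_path.split(".")
    for node in nodes:
        tree = tree.setdefault(node, {})
    tree[leaf] = value

# Hypothetical vendor keys (left) mapped onto the proposed tree (right).
KEY_MAP = {
    "spectro_grating": "Acquisition_instrument.Spectrometer.Grating.groove_density",
    "spectro_center": "Acquisition_instrument.Spectrometer.central_wavelength",
    "ccd_exposure": "Acquisition_instrument.CCD.real_time",
}

def map_metadata(original, key_map=KEY_MAP):
    """Copy only the whitelisted fields from original_metadata into metadata."""
    metadata = {}
    for src, dst in key_map.items():
        if src in original:
            set_item(metadata, dst, original[src])
    return metadata

original = {"spectro_grating": 300, "ccd_exposure": 0.05, "ignored": "..."}
md = map_metadata(original)
```

Keeping the mapping as a flat table like `KEY_MAP` would make it easy to maintain one table per reader while sharing the target structure.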
from lumispy.
I think the parsing has to be in the IO reader; at least that seems to be how it is handled in HyperSpy so far. It should not be necessary to run an extra function. An alternative may be to add it in the __init__ of the CLSpectrum class, but then it would have to include support for a range of metadata formats in a single file. How do you do it for pyXEM @dnjohnstone?
I second that we should agree on a common structure though.
The current implementation I chose for DM files is the following, depending on the detector configuration and what metadata Digital Micrograph saves. As it is a rather recent addition, if we agree on a different structure, it can still be adapted.
PMT spectrum:
├── Acquisition_instrument
│ └── CL
│ ├── acquisition_mode = Serial dispersive
│ ├── detector_type = linear
│ ├── dispersion_grating = 1200
│ ├── dwell_time = 1.0
│ ├── start_wavelength = 166.233642578125
│ └── step_size = 0.5
CCD spectrum:
├── Acquisition_instrument
│ ├── CL
│ │ ├── CCD
│ │ │ ├── binning = (1, 100)
│ │ │ ├── processing = Dark Subtracted
│ │ │ └── read_area = (0, 0, 100, 1336)
│ │ ├── acquisition_mode = Parallel dispersive
│ │ ├── central_wavelength = 949.9741821289062
│ │ ├── dispersion_grating = 300.0
│ │ ├── exposure = 30.0
│ │ ├── frame_number = 1
│ │ ├── integration_time = 30.0
│ │ └── saturation_fraction = 0.01861908845603466
Spectrum image:
├── Acquisition_instrument
│ ├── CL
│ │ ├── CCD
│ │ │ ├── binning = (1, 100)
│ │ │ ├── processing = Dark Subtracted
│ │ │ └── read_area = (0, 0, 100, 1336)
│ │ ├── SI
│ │ │ ├── drift_correction_periodicity = 1
│ │ │ ├── drift_correction_units = second(s)
│ │ │ └── mode = LineScan
│ │ ├── acquisition_mode = Parallel dispersive
│ │ ├── central_wavelength = 869.9838256835938
│ │ ├── dispersion_grating = 600.0
│ │ ├── exposure = 0.05000000074505806
│ │ ├── frame_number = 1
│ │ └── saturation_fraction <list>
from lumispy.
In principle, I prefer to have one general node for the CL system, as it forms one acquisition instrument, and to include both the spectrometer and the CCD/PMT related metadata there. My idea was to have what you call Spectrometer in the main CL node and the detector-specific metadata in a sub-node. I'm happy to rename things like dispersion_grating to groove_density; I just took the names that Gatan used.
from lumispy.
I am not sure having all of these under a CL node makes sense. I was thinking of a broader metadata structure that could also work for PL people. At the end of the day, the only difference between our setups and PL setups is the presence of a laser/light source as opposed to an electron beam. All the other elements are the same (spectrometer and detector). That's why I still think keeping them as separate units makes sense.
Having said that, the HyperSpy way to do this is as you are suggesting. However, there should be a common CL structure regardless of the navigation axis (e.g. a structure that contains all the features from the two CL examples you have above).
If we do take this HyperSpy metadata route, then the PL and CL metadata would share a common Detector sub-node (similar to the common EDS node for SEM and TEM in HyperSpy). While there would be some repetition, it may make more sense to stick to the HyperSpy structure.
from lumispy.
Good point, CL/PL/EL would be encoded in Signal.signal_type anyway.
Along a similar line is the HyperSpy issue hyperspy/hyperspy#2095: it proposes that the SEM/TEM nodes be merged into a single microscope node, as they have a similar structure, and the acquisition_mode field or another extra field like microscope_type would distinguish the type. Unfortunately, it has not seen much traction.
from lumispy.
Still, how about having CCD/PMT as well as SI as sub-nodes of Spectrometer? The important thing, I would say, is the overall structure; every instrument may still have variations of the available parameters.
The names should not contain spaces or special characters, so the units would either have to follow a fixed convention or be added as an additional metadata field such as life_time_units.
├── Acquisition_instrument
│ ├── Spectrometer
│ │ ├── model_name
│ │ ├── acquisition_mode
│ │ ├── central_wavelength (nm)
│ │ ├── entrance_slit_width (mm)
│ │ ├── Grating
│ │ │ ├── groove_density (gr/mm)
│ │ │ └── blazing_angle (º)
│ │ ├── CCD
│ │ │ ├── model_name
│ │ │ ├── life_time (ms)
│ │ │ ├── real_time (ms)
│ │ │ ├── binning
│ │ │ ├── processing
│ │ │ ├── signal_amplification (xN)
│ │ │ ├── readout_rate (MHz)
│ │ │ └── pixel_width (mm)
│ │ └── SI
│ │ │ ├── drift_correction_periodicity = 1
│ │ │ ├── drift_correction_units = second(s)
│ │ │ └── mode = LineScan
In case of PMT it would become e.g.:
├── Acquisition_instrument
│ ├── Spectrometer
│ │ ├── model_name
│ │ ├── entrance_slit_width (mm)
│ │ ├── exit_slit_width (mm)
│ │ ├── acquisition_mode
│ │ ├── Grating
│ │ │ ├── groove_density (gr/mm)
│ │ │ └── blazing_angle (º)
│ │ ├── PMT
│ │ │ ├── model_name
│ │ │ ├── dwell_time (ms)
│ │ │ └── real_time (ms)
│ │ ├── start_wavelength = 166.233642578125
│ │ └── step_size = 0.5
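The companion-field idea mentioned above (a life_time_units field next to life_time, for when units cannot follow a fixed convention) could look like the following; the helper name is made up for illustration:

```python
def set_with_units(node, name, value, units=None):
    """Store a metadata value and, optionally, a companion `<name>_units` field."""
    node[name] = value
    if units is not None:
        node[f"{name}_units"] = units

ccd = {}
set_with_units(ccd, "life_time", 30.0, "ms")
# ccd == {'life_time': 30.0, 'life_time_units': 'ms'}
```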
from lumispy.
How do you distinguish between life_time and real_time? Is it calculated automatically? Gatan does not have such a distinction.
from lumispy.
> How do you distinguish between life_time and real_time? Is it calculated automatically? Gatan does not have such a distinction.
Attolight provides us with real_time, which is slightly different from dwell_time due to binning.
I would keep units to a fixed convention (as HyperSpy does).
I am happy to go ahead with a PR in which I will add a small guide on the structure of luminescence-specific metadata.
Once we have that, what would be the next step? Change the IO plugins? Or create an io-util that can parse the original_metadata into the structure we set? I'm not too sure.
from lumispy.
Please do so.
Once we have the structure, I could adapt the DM reader to change the metadata structure it fills with the CL data accordingly (introduced in hyperspy/hyperspy#2590; as it is only part of the dev branch so far, it does not hurt to change it again now).
The .sur reader only creates a minimalistic metadata object in the _build_metadata function so far. One could add the parsing there as well.
Though, in view of the extension framework that HyperSpy is aiming at, it might indeed be preferable to parse the metadata from the original metadata in LumiSpy instead of in the file readers. In that case, we would have to create a _build_metadata function in cl_spectrum.py that is called from an __init__ function we would have to add to the CLSpectrum class. Any opinions on which direction to head, @ericpre @LMSC-NTappy?
from lumispy.
Dear all,
> I am happy to go ahead with a PR in which I will add a small guide on the structure of luminescence-specific metadata.
That sounds great. If you could post the PR# in this thread once it's created, I'd be happy to participate in the discussion.
> The .sur reader only creates a minimalistic metadata object in the _build_metadata function so far. One could add the parsing there as well. Though, in view of the extension framework that HyperSpy is aiming at, it might indeed be preferable to parse the metadata from the original metadata in LumiSpy instead of in the file readers.
Both could work in my opinion. When coding, keep in mind that in a .sur file the data will be spectral only if s.original_metadata.Object_0_Channel_0.Header.H05_Object_Type equals 20 (Spectrum) or 21 (HYPCard), for a spectrum or a spectral image respectively. Then, the relevant experiment-setup content in s.original_metadata.Object_0_Channel_0.Parsed is specific to the client using the .sur file format, and this is where it can get a bit tricky, because nothing guarantees that the data inside obey a certain structure. The [CHANNELS,SCAN,SEM,...] structure is purely Attolight's convention.
from lumispy.
> Though, in view of the extension framework that HyperSpy is aiming at, it might indeed be preferable to parse the metadata from the original metadata in LumiSpy instead of in the file readers. In that case, we would have to create a _build_metadata function in cl_spectrum.py that is called from an __init__ function we would have to add to the CLSpectrum class. Any opinions on which direction to head, @ericpre @LMSC-NTappy?
I have been wondering what to do with the metadata for some time... Parsing the metadata at class initialisation is indeed an interesting idea, but it has the following drawbacks:
- the mapping of original_metadata to metadata needs to be maintained in the signal class, and this mapping is mainly reader dependent: for a specific signal you can read different formats, which means different mappings
- even if this is a good pattern, not all readers use the mapping from original_metadata to metadata
- this may add some undesirable overhead when creating many signals with large original_metadata; see hyperspy/hyperspy#2623
Since the metadata are mainly reader dependent, it makes sense to keep the mapping in the reader itself. The issue is how to define the metadata in a consistent manner and centralise this information in one place! What @jlaehne suggested made me think that the metadata could be defined in the signal definition in hyperspy_extension.yaml
For example, the metadata of the EDS signal could be defined as:
EDSSEMSpectrum:
    signal_type: EDS_SEM
    signal_dimension: 1
    dtype: real
    lazy: False
    module: hyperspy._signals.eds_sem
    metadata:
        Acquisition_instrument:
            SEM:
                Detector:
                    EDS:
                        azimuth_angle:
                            - type: float
                            - units: º
                            - description: The azimuth angle of the detector in degrees. If the azimuth is zero, the detector is perpendicular to the tilt axis.
                        elevation_angle:
                            - type: float
                            - units: º
                            - description: The elevation angle of the detector in degrees. The detector is perpendicular to the surface at an angle of 90.
This would have the following advantages:
- each library can define its metadata in a consistent manner
- it would be easy to parse this information, centralise it in a single place, and run a daily cron job to build the documentation for the release and development versions
I think the key point is to have something in place that makes it easy to find what metadata already exist, in order to encourage interoperability and consistency.
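One benefit of such a machine-readable spec is that metadata could also be validated against it automatically. A sketch using a plain dict in place of the YAML (the field paths are taken from the example above; the validator itself is hypothetical):

```python
# A fragment of the proposed spec, flattened to dotted paths for brevity.
SPEC = {
    "Acquisition_instrument.SEM.Detector.EDS.azimuth_angle": {"type": float, "units": "º"},
    "Acquisition_instrument.SEM.Detector.EDS.elevation_angle": {"type": float, "units": "º"},
}

def get_item(tree, dotted_path):
    """Walk a nested dict along a dotted path; raises KeyError if absent."""
    for node in dotted_path.split("."):
        tree = tree[node]
    return tree

def validate(metadata, spec=SPEC):
    """Return a list of (path, problem) for fields that violate the spec."""
    problems = []
    for path, rules in spec.items():
        try:
            value = get_item(metadata, path)
        except KeyError:
            continue  # missing fields are allowed; only check what is present
        if not isinstance(value, rules["type"]):
            problems.append((path, f"expected {rules['type'].__name__}"))
    return problems

md = {"Acquisition_instrument": {"SEM": {"Detector": {"EDS": {"azimuth_angle": "45"}}}}}
validate(md)  # [('Acquisition_instrument.SEM.Detector.EDS.azimuth_angle', 'expected float')]
```

A CI job running such a check against example files from each reader would catch drift between readers and the agreed structure.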
from lumispy.
That sounds like a fair strategy, and I agree with your arguments favouring the reader. Parsing in the signal definition could not be universal either, but would have to include specific variations for every type of format, because manufacturers do not follow any systematic scheme.
At the same time, defining the metadata scheme per signal type in hyperspy_extensions.yaml gives the necessary flexibility to extensions without having all of that in HyperSpy directly, which would counteract the extension scheme; the readers can then follow these formats for the signal types they support.
from lumispy.
Thinking further about this, many signals will share parts of the metadata structure, and we need mechanisms to easily ensure consistency. So I was wondering about possibilities for inheritance of metadata and for defining blocks of metadata that can be imported into a signal: a kind of modular system. Any idea how to go about that @ericpre?
For example, in LumiSpy all signal types will share metadata about the spectrometer plus metadata for the detectors (CCD/PMT/streak camera), while the CL signals will also need SEM/TEM metadata.
To this end, it could also be easier to have the EDS/EELS metadata one level higher in the hierarchy (not as a sub-node of SEM/TEM).
from lumispy.
I was thinking that the most common ones should be defined in base classes; for example, the Stage metadata could go in the BaseSignal definition.
But maybe it is better not to tie it to the signal definition then... and simply define the metadata at the root of the hyperspy_extensions.yaml file? I don't think there is any benefit to having the metadata defined with the signal; it actually makes the signal definition less readable...
With a separate metadata entry in the hyperspy_extensions.yaml file:
signals:
    MySignal:
        ...
components1D:
    ...
components2D:
    ...
metadata:
    Acquisition_instrument:
        Detector:
            EDS:
                azimuth_angle:
                    - type: float
                    - units: º
                    - description: The azimuth angle of the detector in degree. If the azimuth is zero...
                elevation_angle:
                    - type: float
                    - units: º
                    - description: The elevation angle of the detector in degree. The detector is...
    ...
The most common metadata could go in the hyperspy repository.
from lumispy.
I think the cleanest way of tackling the "metadata babel" issue is to fully separate the metadata translation from HyperSpy, the extensions, and even the readers. I am thinking of a sort of pandoc for metadata: pandoc defines its own markdown specification and includes bidirectional translators for many file formats. We could do the same: a package that translates metadata from (and to) any supported format to a standard specification. In this way, the reader's role is simply reading the metadata, nothing else; the translation is outsourced to a separate package with its own (fast) release cycle and versioning.
In order to maximise the contributions to the standard specification and the mappings between the different formats, we should think about how to simplify the task of amending them. Ideally, it should not require any programming skills. I like @ericpre's yaml proposal above. However, the file may easily get uncomfortably big and complex if we put everything in one place. So what about recreating the tree structure using folders? In this way the specification could consist of directories and yaml files. In @ericpre's example above, Acquisition_instrument would be a folder containing a Detector folder, which itself contains EDS, which contains azimuth_angle.yaml. That file could also contain the mappings to other metadata structures. This is of course inefficient for the translation task, but we could then "compile" those files into an efficient mapping automatically using CI, to keep everybody (humans and machines) happy.
What do you think?
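The "compile the per-field files into an efficient mapping" step could be sketched as follows; the vendor names and keys here are illustrative only, standing in for the content of the per-field yaml files:

```python
# Per-field mapping definitions (in practice loaded from the yaml files);
# each standard key lists the vendor-specific names it corresponds to.
MAPPINGS = {
    "groove_density": {"gatan": "dispersion_grating"},
    "real_time": {"attolight": "real_time"},
}

def compile_tables(mappings):
    """Build vendor->standard and standard->vendor lookup tables."""
    to_standard = {}
    from_standard = {}
    for std_key, vendors in mappings.items():
        from_standard[std_key] = dict(vendors)
        for vendor, vendor_key in vendors.items():
            to_standard[(vendor, vendor_key)] = std_key
    return to_standard, from_standard

to_std, from_std = compile_tables(MAPPINGS)
to_std[("gatan", "dispersion_grating")]  # 'groove_density'
```

Having CI emit these flat tables keeps the human-editable files small and readable while the translation itself stays a constant-time lookup.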
from lumispy.