jlaehne commented on August 18, 2024

To move forward on this, I propose to fix the convention for the LumiSpy-specific parts of the metadata in the documentation, based on parts of the above discussion:
#109

jlaehne commented on August 18, 2024

I recently contributed a selective inclusion of CL metadata from Gatan files to HyperSpy: hyperspy/hyperspy#2590

Gatan, unfortunately, has a very different metadata structure depending on the acquisition mode (PMT/CCD), and again for spectral images. I tried to include the most important fields in metadata.Acquisition_instrument.CL.

However, so far it does not include post-processing done in Digital Micrograph, which would eventually be good to have in view of #51. Multiple post-processing steps, though, lead to a recursive metadata structure that would have to be properly parsed.

As it might not always be possible to enforce the same metadata hierarchy for all instrument types (though desirable), I contributed the metadata search capability in hyperspy/hyperspy#2633.
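
For illustration, a minimal version of such a search could look like the sketch below; this is not necessarily the API added in hyperspy/hyperspy#2633, just a plain recursion over the dictionary returned by as_dictionary():

def find_metadata_key(tree, name, path=""):
    """Yield (dotted_path, value) for every item whose key equals name."""
    for key, value in tree.items():
        full_path = f"{path}.{key}" if path else key
        if key == name:
            yield full_path, value
        if isinstance(value, dict):
            yield from find_metadata_key(value, name, full_path)

# e.g. on a loaded signal s (hypothetical):
# list(find_metadata_key(s.metadata.as_dictionary(), "exposure"))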

jordiferrero commented on August 18, 2024

That's very nice!
I guess my one comment is: should the CL-specific metadata structure be exclusive to lumispy?

I feel that the metadata search capability is definitely something for hyperspy, but maybe the CL metadata fields should be part of lumispy only?

What are your thoughts?

jlaehne commented on August 18, 2024

The DigitalMicrograph reader was already parsing metadata for EELS and EDS, so I thought it made sense to add the functionality for CL metadata there as well.

jlaehne commented on August 18, 2024

The long-term aim for HyperSpy is that all the IO functions will move to an extension package at some point.

For me it makes more sense to handle the metadata when the file is loaded, otherwise one has to run an extra function on the data, or we again need a wrapper for the load function, no?

However, I do not know how it is for the reader that you use; could the metadata parsing be added there?

Regardless of the search functionality, we should in any case try to stick to some basic conventions on where to save the essential metadata in the tree.

jordiferrero commented on August 18, 2024

The loader I use is for .sur files mainly.
That one parses all the s.original_metadata very nicely.
However, it is in my opinion a bit annoying to have to go back to the original metadata every time I need a value (because it has too many values).

That was the idea of my comments.

My question is: where should the metadata parsing function (original metadata --> metadata) be placed? In the IO reader?

As a side note, I feel there is no convention on how to organise metadata for luminescence.
HyperSpy already has the Acquisition_instrument.SEM structure declared, but what is common to any luminescence data are the Spectrometer and Camera acquisition instruments.
Should we think of a structure for those two instruments?

Based on the SEM structure in hyperspy...

├── Acquisition_instrument
│   ├── SEM
│   │   ├── Detector
│   │   │   ├── detector_type
│   │   │   └── EDS
│   │   │       ├── azimuth_angle (º)
│   │   │       ├── elevation_angle (º)
│   │   │       ├── energy_resolution_MnKa (eV)
│   │   │       ├── live_time (s)
│   │   │       └── real_time (s)
│   │   ├── beam_current (nA)
│   │   ├── beam_energy (keV)
│   │   ├── probe_area (nm²)
│   │   ├── convergence_angle (mrad)
│   │   ├── magnification
│   │   ├── microscope
│   │   ├── Stage
│   │   │   ├── rotation (º)
│   │   │   ├── tilt_alpha (º)
│   │   │   ├── tilt_beta (º)
│   │   │   ├── x (mm)
│   │   │   ├── y (mm)
│   │   │   └── z (mm)
│   │   └── working_distance (mm)

Here's an idea:

├── Acquisition_instrument
│   ├── SPECTROMETER
│   │   ├── model_name
│   │   ├── Grating
│   │   │   ├── groove_density (gr/mm)
│   │   │   └── blazing_angle (º)
│   │   ├── central_wavelength (nm)
│   │   └── entrance_slit_width (mm)
│   └── CCD
│       ├── model_name
│       ├── life_time (ms)
│       ├── real_time (ms)
│       ├── binning
│       ├── signal_amplification (xN)
│       ├── readout_rate (MHz)
│       └── pixel_width (mm)

Please copy this comment and add any other parameters that may be key for luminescence that I may be missing.

Then we can create documentation for the metadata structure.
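
As a concrete (hypothetical) illustration of how such a tree could be filled on a HyperSpy signal, using set_item, which creates intermediate nodes as needed; the node names follow the proposal above and all values are placeholders:

import numpy as np
import hyperspy.api as hs

s = hs.signals.Signal1D(np.zeros(1024))
# Spectrometer node
s.metadata.set_item("Acquisition_instrument.Spectrometer.model_name", "ExampleSpectrograph")
s.metadata.set_item("Acquisition_instrument.Spectrometer.central_wavelength", 750.0)
s.metadata.set_item("Acquisition_instrument.Spectrometer.Grating.groove_density", 300)
# CCD node
s.metadata.set_item("Acquisition_instrument.CCD.binning", 2)
s.metadata.set_item("Acquisition_instrument.CCD.readout_rate", 3.0)
print(s.metadata.Acquisition_instrument)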

jlaehne commented on August 18, 2024

I think the parsing has to be in the io-reader; at least that seems to be how it is handled in HyperSpy so far. It should not be necessary to run an extra function. An alternative may be to add it in the __init__ of the CLSpectrum class, but then it would have to include support for a range of metadata formats in a single file. How do you do it for pyXEM @dnjohnstone ?

I second that we should agree on a common structure though.

The current implementation I chose for DM files is the following, depending on the detector configuration and what metadata Digital Micrograph saves. As it is a rather recent addition, it can still be adapted if we agree on a different structure.

PMT spectrum:

├── Acquisition_instrument
│   └── CL
│       ├── acquisition_mode = Serial dispersive
│       ├── detector_type = linear
│       ├── dispersion_grating = 1200
│       ├── dwell_time = 1.0
│       ├── start_wavelength = 166.233642578125
│       └── step_size = 0.5

CCD spectrum:

├── Acquisition_instrument
│   ├── CL
│   │   ├── CCD
│   │   │   ├── binning = (1, 100)
│   │   │   ├── processing = Dark Subtracted
│   │   │   └── read_area = (0, 0, 100, 1336)
│   │   ├── acquisition_mode = Parallel dispersive
│   │   ├── central_wavelength = 949.9741821289062
│   │   ├── dispersion_grating = 300.0
│   │   ├── exposure = 30.0
│   │   ├── frame_number = 1
│   │   ├── integration_time = 30.0
│   │   └── saturation_fraction = 0.01861908845603466

Spectrum image:

├── Acquisition_instrument
│   ├── CL
│   │   ├── CCD
│   │   │   ├── binning = (1, 100)
│   │   │   ├── processing = Dark Subtracted
│   │   │   └── read_area = (0, 0, 100, 1336)
│   │   ├── SI
│   │   │   ├── drift_correction_periodicity = 1
│   │   │   ├── drift_correction_units = second(s)
│   │   │   └── mode = LineScan
│   │   ├── acquisition_mode = Parallel dispersive
│   │   ├── central_wavelength = 869.9838256835938
│   │   ├── dispersion_grating = 600.0
│   │   ├── exposure = 0.05000000074505806
│   │   ├── frame_number = 1
│   │   └── saturation_fraction <list>
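
For reference, after loading such a file the fields above can be read directly from the metadata tree (the file name below is hypothetical):

import hyperspy.api as hs

s = hs.load("cl_spectrum_image.dm4")
print(s.metadata.Acquisition_instrument.CL)  # prints the whole CL node
exposure = s.metadata.get_item("Acquisition_instrument.CL.exposure")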

jlaehne commented on August 18, 2024

In principle, I prefer to have one general node for the CL system, as it forms one acquisition instrument, and include both the spectrometer and CCD/PMT related metadata there. My idea was to have what you call Spectrometer in the main CL node and the detector-specific metadata in a sub-node. I'm happy to rename things like 'dispersion_grating' to groove_density; I just took the names that Gatan used.

jordiferrero commented on August 18, 2024

I am not sure if having all of these under a CL node makes sense. I was thinking of a broader metadata structure that could also work for PL people. At the end of the day, the only difference between our setups and PL setups is the presence of a laser/light source as opposed to the electron beam. All the other elements are the same (spectrometer and detector). That's why I still think keeping them as separate units makes sense.

Having said that, the hyperspy way to do this is as you are suggesting. However, there should be a common CL structure regardless of the navigation axis (e.g. a structure that contains all the features from the two CL examples you have above).
If we do take this hyperspy metadata approach, then the PL and CL metadata would share a common Detector sub-node (similar to the common EDS node for SEM and TEM in hyperspy).

While there would be some repetition, it may make more sense to stick to the hyperspy structure.

jlaehne commented on August 18, 2024

Good point, CL/PL/EL would be encoded in Signal.signal_type anyway.

Along a similar line is this HyperSpy issue hyperspy/hyperspy#2095 - it proposes that the SEM/TEM nodes should be merged into a microscope node, as they have a similar structure, and e.g. the acquisition_mode field or another extra field like microscope_type would just distinguish the type. Unfortunately, it has not seen much resonance.

jlaehne commented on August 18, 2024

Still, how about having CCD/PMT as well as SI as sub-nodes of Spectrometer? The important thing, I would say, is the overall structure. Every instrument may still have variations of the available parameters.

The names should not contain spaces or special characters, so the units would have to either follow a fixed convention or be added as an additional metadata field such as life_time_units.

├── Acquisition_instrument
│   ├── Spectrometer
│   │   ├── model_name
│   │   ├── acquisition_mode
│   │   ├── central_wavelength (nm)
│   │   ├── entrance_slit_width (mm)
│   │   ├── Grating
│   │   │   ├── groove_density (gr/mm)
│   │   │   └── blazing_angle (º)
│   │   ├── CCD
│   │   │   ├── model_name
│   │   │   ├── life_time (ms)
│   │   │   ├── real_time (ms)
│   │   │   ├── binning
│   │   │   ├── processing
│   │   │   ├── signal_amplification (xN)
│   │   │   ├── readout_rate (MHz)
│   │   │   └── pixel_width (mm)
│   │   └── SI
│   │       ├── drift_correction_periodicity = 1
│   │       ├── drift_correction_units = second(s)
│   │       └── mode = LineScan

In case of PMT it would become e.g.:

├── Acquisition_instrument
│   ├── Spectrometer
│   │   ├── model_name
│   │   ├── entrance_slit_width (mm)
│   │   ├── exit_slit_width (mm)
│   │   ├── acquisition_mode
│   │   ├── Grating
│   │   │   ├── groove_density (gr/mm)
│   │   │   └── blazing_angle (º)
│   │   ├── PMT
│   │   │   ├── model_name
│   │   │   ├── dwell_time (ms)
│   │   │   └── real_time (ms)
│   │   ├── start_wavelength = 166.233642578125
│   │   └── step_size = 0.5

jlaehne commented on August 18, 2024

How do you distinguish between life_time and real_time? Is it automatically calculated? Gatan does not have such a distinction.

jordiferrero commented on August 18, 2024

How do you distinguish between life_time and real_time? Is it automatically calculated? Gatan does not have such a distinction.

Attolight provides us with the real_time, which is slightly different from the dwell_time due to binning.

I would keep units to a fixed convention (as hyperspy does).

I am happy to go ahead with a PR in which I will add a small guide on the structure of luminescence-specific metadata.
Once we have that, what would be the next step? Change the IO plugins? Or create an io-util that can parse the original_metadata into the structure we set? Not too sure.

jlaehne commented on August 18, 2024

Please do so.

Once we have the structure, I could adapt the DM reader to change the metadata structure it fills with the CL data accordingly (introduced in hyperspy/hyperspy#2590 - but as it is only part of the dev branch so far, it does not hurt to change it again now).

The .sur reader only creates a minimalistic metadata object in the _build_metadata function so far. One could add the parsing there as well.

Though, in view of the extension framework that HyperSpy is aiming at, it might indeed be preferable to parse the metadata from the original metadata in LumiSpy instead of in the file readers. In that case, we would have to create a function _build_metadata in cl_spectrum.py which is called from an __init__ function that we would have to create for the CLSpectrum class. Any opinions on which direction to head @ericpre @LMSC-NTappy ?
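
To make that second option concrete, a rough sketch of what it could look like; this is hypothetical and not the current LumiSpy implementation, and both the base class and the mapped fields are illustrative only:

from hyperspy.signals import Signal1D


class CLSpectrum(Signal1D):
    _signal_type = "CL"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._build_metadata()

    def _build_metadata(self):
        """Map known original_metadata entries into metadata.Acquisition_instrument."""
        om = self.original_metadata
        # One branch per supported vendor convention, e.g. an Attolight-style .sur layout:
        if om.has_item("Object_0_Channel_0.Parsed"):
            parsed = om.Object_0_Channel_0.Parsed
            # Copy selected fields, e.g. (placeholder path and value):
            # self.metadata.set_item(
            #     "Acquisition_instrument.Spectrometer.central_wavelength", ...)
            pass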

LMSC-NTappy commented on August 18, 2024

Dear all,

I am happy to go ahead with a PR in which I will add a small guide on the structure of luminescence specific metadata.

That sounds great. If you could post the PR # in this thread once it's created, I'd be happy to participate in the discussion.

The .sur reader only creates a minimalistic metadata object in the _build_metadata function so far. One could add the parsing there as well.
Though, in view of the extension framework that HyperSpy is aiming at, it might indeed be preferable to parse the metadata from the original metadata in LumiSpy instead of in the file readers.

Both could work in my opinion. When coding, keep in mind that in a .sur file the data will be spectral only if s.original_metadata.Object_0_Channel_0.Header.H05_Object_Type is equal to 20 (Spectrum) or 21 (HYPCard), for a spectrum or a spectral image respectively. Then, the relevant experiment setup content in s.original_metadata.Object_0_Channel_0.Parsed is specific to the client using the .sur file format, and this is where it can get a bit tricky, because nothing guarantees that the data inside obey a certain structure. The [CHANNELS,SCAN,SEM,...] structure is purely Attolight's convention.
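
As a sketch, the dispatch described above could look like this (the metadata paths and type codes are taken from the comment; the function itself is hypothetical):

def parse_sur_metadata(s):
    """Dispatch on the .sur object type before mapping the Parsed entries."""
    obj_type = s.original_metadata.get_item(
        "Object_0_Channel_0.Header.H05_Object_Type")
    if obj_type == 20:    # Spectrum
        ...  # map the spectrum-specific Parsed entries
    elif obj_type == 21:  # HYPCard (spectral image)
        ...  # map the spectral-image-specific Parsed entries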

ericpre commented on August 18, 2024

Though, in view of the extension framework that HyperSpy is aiming at, it might indeed be preferable to parse the metadata from the original metadata in LumiSpy instead of in the file readers. In that case, we would have to create a function _build_metadata in cl_spectrum.py which is called from an __init__ function that we would have to create for the CLSpectrum class. Any opinions on which direction to head @ericpre @LMSC-NTappy ?

I have been wondering what to do with the metadata for some time... Parsing the metadata at the class initialisation is indeed an interesting idea, but it has the following drawbacks:

  • the mapping of original_metadata to metadata needs to be maintained in the signal class, and this mapping is mainly reader dependent: for a specific signal, you can read different formats, which means different mappings.
  • even if this is a good pattern, not all readers use the mapping from original_metadata to metadata.
  • this may add some undesirable overhead when creating many signals with large original_metadata - see hyperspy/hyperspy#2623.

Since the metadata are mainly reader dependent, it makes sense to keep the mapping in the reader itself. The issue is how to define the metadata in a consistent manner and centralise this information in one place! What @jlaehne suggested made me think that the metadata could be defined in the signal definition in hyperspy_extension.yaml.

For example, the metadata of an EDS signal could be defined as:

  EDSSEMSpectrum:
    signal_type: EDS_SEM
    signal_dimension: 1
    dtype: real
    lazy: False
    module: hyperspy._signals.eds_sem
    metadata:
      Acquisition_instrument:
        SEM:
          Detector:
            EDS:
              azimuth_angle:
                 - type: float
                 - units: º
                 - description: The azimuth angle of the detector in degree. If the azimuth is zero, the detector is perpendicular to the tilt axis.
              elevation_angle:
                 - type: float
                 - units: º
                 - description: The elevation angle of the detector in degree. The detector is perpendicular to the surface with an angle of 90.

This would have the following advantages:

  • each library can define its metadata in a consistent manner
  • it would be easy to parse this information, centralise it in a single place, and run a daily cron job to build the documentation for the release and development versions

I think that the key point is to have something in place which makes it easy to find what metadata already exists, in order to encourage interoperability and consistency.

jlaehne commented on August 18, 2024

That sounds like a fair strategy. I agree with your arguments favoring the reader. Parsing in the signal definition could not be universal either, but would have to include specific variations for every type of format, because manufacturers do not follow any systematic scheme.

At the same time, defining the metadata scheme per signal type in hyperspy_extension.yaml gives the necessary flexibility to extensions without having all of that in HyperSpy directly, which would counteract the extension scheme - the readers can then follow these formats for the signal types they support.

jlaehne commented on August 18, 2024

Thinking further about this, many signals will share parts of the metadata structure, and we need mechanisms to easily ensure consistency. So I was wondering about possibilities for inheritance of metadata and for defining blocks of metadata that can be imported into a signal definition - a kind of modular system. Any idea how to go about that @ericpre?

For example, in LumiSpy, all signal types will share metadata about the spectrometer + metadata of detectors (CCD/PMT/streak camera), while the CL signals will also need SEM/TEM metadata.

To this end, it could also be easier to have the EDS/EELS metadata one level higher in the hierarchy (not as a subnode of SEM/TEM).

ericpre commented on August 18, 2024

I was thinking that the most common ones should be defined in base classes; for example, the stage metadata could go in the BaseSignal definition.
But maybe it is better not to tie it to the signal definition at all... and simply define the metadata at the root of the hyperspy_extension.yaml file? I don't think that there is any benefit to having the metadata defined with the signal. It actually makes the signal definition less readable...

With a separate metadata entry in the hyperspy_extension.yaml file:

signals:
  MySignal:
    ...

components1D:
  ...

components2D:
  ...

metadata:
  Acquisition_instrument:
    Detector:
      EDS:
        azimuth_angle:
          - type: float
          - units: º
          - description: The azimuth angle of the detector in degree. If the azimuth is zero...
        elevation_angle:
          - type: float
          - units: º
          - description: The elevation angle of the detector in degree. The detector is...
  ...

The most common metadata could go in the hyperspy repository.
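
To illustrate the "parse and centralise" idea, a small script along these lines could walk such a root-level metadata section and list every defined field together with its specification (assuming the hypothetical layout above):

import yaml

def collect_fields(node, path=""):
    """Yield (dotted_path, spec) for every leaf of the metadata tree."""
    for key, value in node.items():
        full_path = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            yield from collect_fields(value, full_path)
        else:
            # In the layout above, a leaf is a list of type/units/description entries.
            yield full_path, value

with open("hyperspy_extension.yaml") as f:
    spec = yaml.safe_load(f)

for dotted_path, field_spec in collect_fields(spec.get("metadata", {})):
    print(dotted_path, field_spec)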

francisco-dlp commented on August 18, 2024

I think that the cleanest way of tackling the "metadata babel" issue is to fully separate the metadata translation from hyperspy, the extensions and even the readers. I am thinking of a sort of pandoc for metadata. Pandoc defines its own markdown specification and includes bidirectional translators for many file formats. We could do the same: a package that translates metadata from (and to) any supported format to a standard specification. In this way, the reader's role is simply reading the metadata, nothing else: the translation is outsourced to a separate package with its own (fast) release cycle and versioning.

In order to maximize the contributions to the standard specification and the mapping between the different formats, we should think about how to simplify the task of amending them. Ideally it should not require any programming skills. I like @ericpre's yaml proposal above. However, the file may easily get uncomfortably big and complex if we put everything in one place. So what about recreating the tree structure using folders? In this way the specification could consist of directories and yaml files. In @ericpre's example above, Acquisition_instrument would be a folder containing Detector (a folder), which itself contains EDS, which contains azimuth_angle.yaml. That file could also contain the mappings to other metadata structures. This is of course inefficient for the translation task, but we could then "compile" those files into an efficient mapping automatically using CI to keep everybody (humans and machines) happy.
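
A sketch of that "compile" step under the same assumption (a hypothetical spec/ directory mirroring the tree, e.g. spec/Acquisition_instrument/Detector/EDS/azimuth_angle.yaml):

from pathlib import Path

import yaml

def compile_spec(root):
    """Merge per-field yaml files into one nested mapping keyed by the folder path."""
    root = Path(root)
    mapping = {}
    for path in root.rglob("*.yaml"):
        node = mapping
        for part in path.relative_to(root).parent.parts:
            node = node.setdefault(part, {})
        node[path.stem] = yaml.safe_load(path.read_text())
    return mapping

compiled = compile_spec("spec")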

What do you think?
