
essnmx's Introduction


ESSnmx

About

Data reduction for NMX at the European Spallation Source.

Installation

python -m pip install essnmx

essnmx's People

Contributors

yoosunyoung, jokasimr, simonheybrock, justin-bergmann, dependabot[bot], mriduls, pre-commit-ci-lite[bot], jl-wynen, nvaytet


Forkers

mriduls

essnmx's Issues

Save results into nexus format.

The result of the first data reduction step should be saved in NeXus format so it can be fed to the next process.

It can be implemented once #15 is finished.

Required Fields

  • Sample name
  • Distance from source to sample
  • Other source information
  • Detector origin (?)
  • Fast axes (in beamline frame(?))
  • Slow axes (in beamline frame(?))
  • Proton charge (?)
  • Detector counts (grouped weights/counts, gzip compressed)
  • Time bins (in [ms])
  • Pixel IDs
  • Crystal Rotation

Questions

  • Is there anything that can already be done in the first reduction step instead...?
  • Many fields used to be hard-coded, and it is not straightforward where to retrieve them from in the McStas file.
    Should users be able to update them manually somehow...?

McStas 2.7 and 3.4 version auto-detect in the loader.

The files are almost the same, but there are some differences we should handle.

| | McStas 2.7 | McStas 3.4 |
| --- | --- | --- |
| Event data | All in one bank | N banks |
| Geometry | May have to strip some blanks to parse numbers | Tidy format, hence parsing may be easier |
| Version info | May not be available; if it is, in the root attrs, `creator` | Should be available in the root attrs, `creator` |
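
A minimal detection sketch based on the table above. It assumes the version string can be read from the root `creator` attribute; the attribute name comes from the table, but the exact string contents and the fallback for 2.7 files are assumptions:

```python
import h5py


def detect_mcstas_version(path: str) -> str:
    """Best-effort McStas version detection from the root ``creator`` attribute."""
    with h5py.File(path, "r") as f:
        creator = f.attrs.get("creator")
    if creator is None:
        # 2.7 files may not carry version info at all (assumed fallback).
        return "2.7"
    if isinstance(creator, bytes):
        creator = creator.decode()
    return "3.4" if "3." in creator else "2.7"
```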

Geometry files look different due to the tidier format,
but the contents and the structure are the same, so we can reuse the 2.7 parser for 3.4.

Here are example geometry XML files extracted from sample files:
mcstas_geometry.zip

TODO:

  • fix #26 together (we couldn't fix it at the time, but we can fix it now!)

Compression options for saving output.

NMXReducedData keeps a pixel-id coordinate, and there are >1e6 pixels,
so it can carry relatively large data (>1 GB) in the end.

Therefore we would like to compress the large datasets when we save them into an HDF5 or NeXus file.
The compression option is currently hard-coded, and it makes saving very slow.
Some advanced users, e.g. the IDS, might want to turn it off when they have to debug the workflow.

Therefore we should

  • Find the optimal compression options by comparing speed against output size.
  • Make it possible to lower (or disable) the compression level, with the optimal setting as default; see the sketch below.
  • Document compression in the workflow user guide.
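
As a sketch of the second point above, a configurable compression option could look like the following; the function name and signature are hypothetical, only the h5py calls are real:

```python
import h5py
import numpy as np


def save_counts(path: str, counts: np.ndarray, compression_level: int | None = 4) -> None:
    """Save counts with a tunable gzip level; ``None`` disables compression."""
    with h5py.File(path, "w") as f:
        if compression_level is None:
            # Fast, uncompressed writes for debugging runs.
            f.create_dataset("counts", data=counts)
        else:
            f.create_dataset(
                "counts",
                data=counts,
                compression="gzip",
                compression_opts=compression_level,  # 1 (fast) .. 9 (small)
                chunks=True,  # chunking is required for compressed datasets
            )
```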

Memory monitoring.

NMX uses a relatively large amount of raw data, so we might need an option to monitor/log memory consumption.
E.g., the original script from the instrument owner used tracemalloc to monitor memory.

It can probably be done in a similar way to the beamlime benchmarking...?
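
A minimal sketch of tracemalloc-based monitoring, similar to what the original script reportedly did (the reduction step itself is a placeholder):

```python
import tracemalloc

tracemalloc.start()

# ... run a reduction step here ...

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```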

Select subset of data by detector names (or IDs).

Related to #31.

In McStas 3 data, the event dataset is stored in separate data banks,
so it is possible to load individual banks per detector.
For real measurement data, the banks will be separate per detector as well.

For now, the loader loads all banks at once (McStas 2) or merges all banks after loading (McStas 3).
This feature will be implemented once we have general agreement
about how to handle detectors in other ESS packages as well.

One idea was to use a Sciline parameter table; a rough sketch is below.
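
This sketch assumes the param-table API of the Sciline releases current at the time (`ParamTable`, `set_param_table`, `Series`); all domain types and bank names here are hypothetical, and newer Sciline releases may have replaced this API:

```python
from typing import NewType

import sciline as sl

BankId = NewType("BankId", int)      # hypothetical row-index type
BankName = NewType("BankName", str)  # hypothetical domain types
BankEvents = NewType("BankEvents", object)


def load_bank(name: BankName) -> BankEvents:
    """Stand-in provider that would load a single detector bank."""
    ...


pl = sl.Pipeline([load_bank])
pl.set_param_table(sl.ParamTable(BankId, {BankName: ["bank01", "bank02"]}))
# Computing a Series gathers the per-bank results from the workflow graph.
events_per_bank = pl.compute(sl.Series[BankId, BankEvents])
```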

Check accidental overriding.

The NMX domain type NMXData is not created with typing.NewType; it directly inherits from scipp.DataGroup.
It has some properties as convenient handles for certain fields, such as crystal_rotation or sample_position.
This way we get the DataGroup HTML repr in JupyterLab and also type hints from those type-hinted properties.
The current implementation assumes that the property names are all NMX-specific, so they don't override any methods of scipp.DataGroup.
But maybe there should be a simple check that they do not actually override anything.

This is the solution I came up with while implementing other things.

```python
from typing import TypeVar

import pytest

T = TypeVar("T", bound=type)


def nooverriding(cls: T) -> T:
    """Decorator to prevent overriding parent methods in a subclass.

    It does not check private names (those starting with '_').
    """
    for parent in cls.__bases__:
        preserved_names = [
            name for name in dir(parent) if not name.startswith('_')
        ]
        for name in preserved_names:
            # Identity check: if the attribute looked up on ``cls`` is not
            # the same object as on ``parent``, the subclass overrode it.
            if getattr(parent, name) is not getattr(cls, name):
                raise ValueError(f"{parent}.{name} is overridden in {cls}")

    return cls


def test_no_overriding() -> None:
    class A:
        def foo(self):
            return 'A.foo'

    @nooverriding
    class AA(A):
        def foo2(self):
            return 'AA.foo2'

    a = AA()
    assert AA.foo is A.foo
    assert a.foo() == 'A.foo'
    assert a.foo2() == 'AA.foo2'


def test_no_overriding_raises() -> None:
    class A:
        def foo(self):
            return 'A.foo'

    with pytest.raises(ValueError):
        @nooverriding
        class AAA(A):
            def foo(self):
                return 'AAA.foo'
```

Handle banks via workflow graph instead of via extra dimension

The current McStas loader loads all banks and concatenates the events:

```python
def _retrieve_raw_event_data(file: snx.File) -> sc.Variable:
    """Retrieve events from the nexus file."""
    bank_names = _retrieve_event_list_names(file["entry1/data"].keys())
    banks = [
        file["entry1/data/" + bank_name]["events"][()].rename_dims({'dim_0': 'event'})
        # ``dim_0``: event index, ``dim_1``: property index.
        for bank_name in bank_names
    ]
    return sc.concat(banks, 'event')
```

and subsequently folds after grouping by pixel:

```python
return grouped.fold(dim='id', sizes={'panel': num_panels, 'id': -1})
```

  • Change this so only a single bank is handled in the code (loading, reduction steps); see the sketch after this list.
  • Multiple banks should be handled via Sciline, i.e., in the workflow graph. This will, e.g., allow for processing only a single bank at a time, saving memory.
    • We can use a param table to define the detector banks that should be processed.
  • Saving to NeXus may need to change slightly. Need to check with the IDS.
    • I do not understand the current code: the panel dimension is first removed in

      ```python
      counts: sc.DataArray = nmx_data.weights.flatten(dims=['panel', 'id'], to='id').hist(
          t=time_bin_step
      )
      ```

      and then a panel dim of length 1 is added back into the dataset when saving, in

      ```python
      var=self.counts.fold(
          'id', sizes={'panel': 1, 'id': self.counts.sizes['id']}
      ),
      ```
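
For reference, a hypothetical single-bank variant of the loader quoted above, where the bank name becomes a parameter instead of being looped over inside the function; the function name and signature are illustrative only:

```python
import scipp as sc
import scippnexus as snx


def _retrieve_single_bank_events(file: snx.File, bank_name: str) -> sc.Variable:
    """Retrieve the events of one detector bank only."""
    # Same renaming as in the current loader: ``dim_0`` is the event index.
    return file["entry1/data/" + bank_name]["events"][()].rename_dims(
        {'dim_0': 'event'}
    )
```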

Geometric information parsing (McStas loader mainly)

NMX data loader should be able to retrieve...

  • detector rotation (x, y, z)
  • detector position
  • relative (x, y, z) list (?)
  • source position
  • sample position
  • fast vector list
  • slow vector list

In the McStas simulation file, there is an HDF5 group instrument/instrument_xml/ that contains geometry information of the instrument.
The McStas loader should be able to parse this XML to retrieve the information above.
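
A rough sketch of that parsing step, assuming the XML is stored as a string dataset inside the group; the entry path, dataset name, and XML tag names are guesses rather than the actual McStas schema:

```python
import xml.etree.ElementTree as ET

import h5py

with h5py.File("mcstas_output.h5", "r") as f:  # hypothetical file name
    xml_text = f["entry1/instrument/instrument_xml/data"][()]
if isinstance(xml_text, bytes):
    xml_text = xml_text.decode()

root = ET.fromstring(xml_text)
# Iterate over components and read their location child elements.
for component in root.iter("component"):  # tag name is an assumption
    location = component.find(".//location")
    if location is not None:
        print(component.get("type"), location.attrib)
```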

TODO:

  • Compare the instrument view with Mantid

[Requirement] Scaling/normalization/correction workflow

Executive summary

We need to derive and apply scaling functions per wavelength and per position.

Context and background knowledge

There is a scaling step in the overall workflow.
It could be done by LSCALE, but LSCALE is not maintained anymore, so we need to implement this part ourselves.

It is for the scaling and normalization of Laue intensity data to yield fully corrected structure amplitudes.[1]

Inputs

MTZ files that contain

  • HKL
  • LAMBDA
  • Intensity
  • Sigma of Intensity
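
For illustration, reading these columns with gemmi might look like the following; the file name and the intensity/sigma column labels are assumptions, since MTZ label conventions vary:

```python
import gemmi

mtz = gemmi.read_mtz_file("reflections.mtz")  # hypothetical file name
h = mtz.column_with_label("H").array
k = mtz.column_with_label("K").array
l = mtz.column_with_label("L").array
lam = mtz.column_with_label("LAMBDA").array
intensity = mtz.column_with_label("I").array  # label may differ per file
sigma = mtz.column_with_label("SIGI").array   # label may differ per file
```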

Methodology

This functionality was under development in pyscale and there is a notebook/script in the repository.

Outputs

MTZ file containing ...

Which interfaces are required?

Python module / function, Python script, Jupyter notebook

Test cases

Also available in pyscale with sample data.

Comments

pyscale is not open, so you'll need to ask for access: https://github.com/mlund/

Footnotes

  1. LSCALE

NXmx format for NMX data

Executive summary

NMX data output should be in the NeXus NXmx format.

Context and background knowledge

In order to comply with modern data handling and archiving standards for macromolecular crystallography (MX) data, the data output should be in the NeXus NXmx format (https://manual.nexusformat.org/classes/applications/NXmx.html). This format is already used at most synchrotron X-ray MX beamlines around the world, and NMX should comply with these standards as much as possible. The NXmx format is source-agnostic and is inclusive of neutron MX data. It is part of the MX "Gold Standard" for data, adopted in 2020 after a long deliberation process: https://journals.iucr.org/m/issues/2020/05/00/ti5018/index.html

Inputs

All data properties are detailed here: https://manual.nexusformat.org/classes/applications/NXmx.html

Note that not all fields are required, and which fields apply is experiment-specific. Some fields that are only labeled "recommended" are nevertheless absolutely necessary for NMX data processing. Determining which fields are required for NMX data is ongoing.

Methodology

For the most part, this will require accessing data from e.g. EPICS PVs, parsing it, and putting it in the correct format. Some calculations might be required to determine the orientation; this is in progress.
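
As a rough illustration of "putting them in the correct format", here is a minimal NXmx-style skeleton written with h5py; only a few groups are shown and all values are placeholders, so this is not a complete or validated NXmx file:

```python
import h5py

with h5py.File("reduced_nmx.h5", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    entry["definition"] = "NXmx"  # marks the application definition
    instrument = entry.create_group("instrument")
    instrument.attrs["NX_class"] = "NXinstrument"
    detector = instrument.create_group("detector")
    detector.attrs["NX_class"] = "NXdetector"
    sample = entry.create_group("sample")
    sample.attrs["NX_class"] = "NXsample"
```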

Outputs

Data arranged in the correct format (see above).

Which interfaces are required?

Integrated into reduction workflow, Python module / function

Test cases

We have simulation data from McStas that we can use for testing.

Comments

No response

Space group description overrides info from files?

The docstring says

```
spacegroup_desc:
    The space group description to use if not found in the mtz files.
    If None, :attr:`~DEFAULT_SPACE_GROUP_DESC` is used.
```

but the implementation

```python
if spacegroup_desc is not None:  # Use the provided space group description
    return SpaceGroup(gemmi.SpaceGroup(spacegroup_desc))
elif len(space_groups) > 1:
    raise ValueError(f"Multiple space groups found: {space_groups}")
elif len(space_groups) == 1:
    return SpaceGroup(space_groups.popitem()[1])
else:
    raise ValueError(
        "No space group found and no space group description provided."
    )
```

seems to override any info from the files with the provided description.

Furthermore, the claim that DEFAULT_SPACE_GROUP_DESC is used is incorrect.

If I naively change this to match what the docstring claims, I get ValueError: Multiple space groups found: [<gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>] when running the docs workflow, I suppose because all files have the same space group but the gemmi.SpaceGroup objects are not deduplicated.
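
A hypothetical reordering that would match the docstring, letting file information take precedence and deduplicating identical space groups by their Hermann-Mauguin symbol (assuming, as the error above suggests, that the gemmi.SpaceGroup objects do not compare equal):

```python
# space_groups: mapping of file names to gemmi.SpaceGroup, as in the snippet above.
unique = {sg.hm: sg for sg in space_groups.values()}
if len(unique) > 1:
    raise ValueError(f"Multiple space groups found: {list(unique.values())}")
elif len(unique) == 1:
    return SpaceGroup(unique.popitem()[1])
elif spacegroup_desc is not None:
    return SpaceGroup(gemmi.SpaceGroup(spacegroup_desc))
else:
    return SpaceGroup(gemmi.SpaceGroup(DEFAULT_SPACE_GROUP_DESC))
```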

Docs build fails with instrument view.

An instrument view for NMX is not needed right now, but it was useful for the McStas XML parser,
so I tried to add it to the documentation as well.

But the docs build failed due to the cell output.
It fails complaining about the number of lines; it seems to be related to this issue.

```
ERROR: Line 404 exceeds the line-length-limit.
WARNING: Each notebook should have at least one section title
```

This leads to a failure to link the notebook page to the main page, since the title of the notebook couldn't be parsed.
It happened locally and also in the CI build. (The docs build test itself doesn't fail, but the notebook is not included in the main page.)

scippneutron doesn't have this issue, so there must be a solution.
For now the instrument view is included as a code snippet, not as an executed cell.

Validate exported nexus file.

essnmx should save reduced data as a NeXus file for other analysis software.
This feature was done in #21, but it does not check whether the NeXus file is valid.
There should be a test or an option to check that the result file is valid.
Maybe chexus can be used here; a chexus-independent fallback is sketched below.

The file has a relatively simple structure, so it shouldn't be difficult to implement.
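
Independently of chexus, a minimal structural check could look like this; the group names are assumptions about the exported layout, not the actual essnmx output:

```python
import h5py


def check_exported_file(path: str) -> None:
    """Raise if an expected group is missing from the exported file."""
    with h5py.File(path, "r") as f:
        for required in ("entry", "entry/instrument", "entry/sample"):
            if required not in f:
                raise ValueError(f"missing group: {required}")
```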

Update diagram to mermaid or LaTeX.

The diagram of the higher-level workflow design is currently a PNG.
But since it needs some memos and annotations regarding the status of some of the software, we'd better keep it in a more editable format.

The IDS owner also said mermaid is fine, but I had trouble making it work in the documentation build framework...
There is LaTeX code for this diagram, so we can probably use LaTeX instead.

LaTeX Version

[Figure: NMX_work_flow diagram]

Mermaid Version

stateDiagram
    direction TB
    Step0: Starting Experiment
    Step1: Setting up Measurement Strategy
    Step0 --> Step1
    state ECDC {
        Step2: Data Readout from Detectors
    }
    Step1 --> ECDC
    state SCIPP {
        Step3.0: TOA to TOF
    }
    ECDC --> SCIPP
    state DIALS {
        direction LR
        Step3.1: 1. Spot Finding
        Step3.2: 2. TOF to &lambda;
        Step3.3: 3. Indexing
        Step3.4: 4. Refine Indexing
        Step3.5: 5. Spot Integration
        Step3.1 --> Step3.2
        Step3.2 --> Step3.3
        Step3.3 --> Step3.4
        Step3.4 --> Step3.5
    }
    SCIPP --> DIALS
    state pyscale {
        Step3.6: Scaling (temporarily using LSCALE)
    }
    DIALS --> pyscale
    state CCP4 {
        direction LR
        Step3.7: Merging, AIMLESS
        Step3.8: I to SFs
        Step3.7 --> Step3.8
    }
    pyscale --> CCP4
    state PHENIX {
        direction LR
        Step4.0: Phasing
        Step4.0 --> Step4.1
        state MODELCOMPLETED {
            state refine <<choice>>
            state map <<choice>>
            Step4.1: Model Completion
            Step4.2: Refinement
            Step4.3: MapCalculation
            Step4.4: Done
            Step4.1 --> Step4.2
            Step4.2 --> refine
            refine --> Step4.4
            refine --> Step4.3
            Step4.3 --> map
            map --> Step4.4
            map --> Step4.1
        }
    }
    CCP4 --> PHENIX
    Step5: Finished Experiment
    PHENIX --> Step5

Additional explanation of the last step:

The final iteration between model completion, refinement, and map calculation is clearer in the mermaid diagram, but the choice nodes may need clearer annotation.

Document whole data reduction steps.

NMX data is partially reduced with scipp, and the rest of the steps are done by external software.
Still, we would like to have all the information in one place.

Here is some information that the IDS (Justin) provided:

We also received slides about these steps through the Slack channel. :D

TODO

  • Use the pipeline HTML wrapper of Sciline (>24.1.1) instead of parsing the function signature!

Loaders for simulation data and measurement data.

There is a slight difference between simulation data and measurement data.
In measurement data, event data is provided per detector, while simulation data has all event data in one place.
So for measurement data, there is one more step: combining the data sets from all detectors.

We also need to make a small McStas simulation dataset for testing.

Add unit-tests for scaling workflow steps.

The scaling workflow steps and test datasets are still being investigated and updated relatively frequently.
We have tests for the file-IO functionality, but we still don't have tests for the individual steps.
Once the mandatory steps are fixed, we should write proper tests for them.
I'm opening this issue so that we don't forget.
