
essnmx's Introduction


ESSnmx

About

Data reduction for NMX at the European Spallation Source.

Installation

python -m pip install essnmx

essnmx's People

Contributors

yoosunyoung, jokasimr, simonheybrock, justin-bergmann, dependabot[bot], mriduls, pre-commit-ci-lite[bot], jl-wynen, nvaytet


Forkers

mriduls

essnmx's Issues

Save results into nexus format.

The result of the first data reduction step should be saved in NeXus format so it can be fed to the next process.

It can be implemented once #15 is finished.

Required Fields

  • Sample name
  • Distance from source to sample
  • Other source information
  • Detector origin (?)
  • Fast axes (in beamline frame(?))
  • Slow axes (in beamline frame(?))
  • Proton charge (?)
  • Detector counts (grouped weights/counts, gzip compressed)
  • Time bins (in [ms])
  • Pixel IDs
  • Crystal Rotation

Questions

  • Is there anything that can already be done in the first reduction step instead...?
  • Many fields used to be hard-coded, and it is not straightforward where to retrieve them from in the McStas file.
    Should users be able to update them manually somehow...?

McStas 2.7 and 3.4 version auto-detect in the loader.

The files are almost the same, but there are some differences we should handle.

| | McStas 2.7 | McStas 3.4 |
| --- | --- | --- |
| Event data | All in one bank | N banks |
| Geometry | May have to strip some blanks to parse numbers | Tidy format, hence parsing may be easier |
| Version info | May not be available; if it is, in the root attrs, `creator` | Should be available in the root attrs, `creator` |
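
A minimal detection sketch based on the table above. It assumes the version string can be read from the root `creator` attribute; the attribute name comes from the table, but the exact string contents and the fallback for 2.7 files are assumptions:

```python
import h5py


def detect_mcstas_version(path: str) -> str:
    """Best-effort McStas version detection from the root ``creator`` attribute."""
    with h5py.File(path, "r") as f:
        creator = f.attrs.get("creator")
    if creator is None:
        # 2.7 files may not carry version info at all (assumed fallback).
        return "2.7"
    if isinstance(creator, bytes):
        creator = creator.decode()
    return "3.4" if "3." in creator else "2.7"
```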

Geometry files look different due to the tidier format,
but the contents and the structure are the same, so we can reuse the 2.7 parser for 3.4.

Here are example geometry XML files extracted from sample files:
mcstas_geometry.zip

TODO:

  • fix #26 together (we couldn't fix it at the time, but we can fix it now!)

Compression options for saving output.

NMXReducedData keeps a pixel-id coordinate, and there are >1e6 pixels,
so it can carry relatively large data (>1 GB) in the end.

Therefore we would like to compress the large datasets when we save them into an HDF5 or NeXus file.
The compression option is currently hard-coded, and it makes saving very slow.
Some advanced users, e.g. the IDS, might want to turn it off when they have to debug the workflow.

Therefore we should

  • Find the optimal compression options by comparing speed against output size.
  • Make it possible to lower (or disable) the compression level, with the optimal setting as default; see the sketch below.
  • Document compression in the workflow user guide.
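
As a sketch of the second point above, a configurable compression option could look like the following; the function name and signature are hypothetical, only the h5py calls are real:

```python
import h5py
import numpy as np


def save_counts(path: str, counts: np.ndarray, compression_level: int | None = 4) -> None:
    """Save counts with a tunable gzip level; ``None`` disables compression."""
    with h5py.File(path, "w") as f:
        if compression_level is None:
            # Fast, uncompressed writes for debugging runs.
            f.create_dataset("counts", data=counts)
        else:
            f.create_dataset(
                "counts",
                data=counts,
                compression="gzip",
                compression_opts=compression_level,  # 1 (fast) .. 9 (small)
                chunks=True,  # chunking is required for compressed datasets
            )
```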

Memory monitoring.

NMX uses a relatively large amount of raw data, so we might need an option to monitor/log memory consumption.
E.g., the original script from the instrument owner used tracemalloc to monitor memory.

It can probably be done in a similar way to the beamlime benchmarking...?
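
A minimal sketch of tracemalloc-based monitoring, similar to what the original script reportedly did (the reduction step itself is a placeholder):

```python
import tracemalloc

tracemalloc.start()

# ... run a reduction step here ...

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```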

Select subset of data by detector names (or IDs).

Related to #31.

In McStas 3 data, the event dataset is stored in separate data banks,
so it is possible to load individual banks per detector.
For real measurement data, the banks will be separate per detector as well.

For now, the loader loads all banks at once (McStas 2) or merges all banks after loading (McStas 3).
This feature will be implemented once we have general agreement
about how to handle detectors in other ESS packages as well.

One idea was to use a Sciline parameter table; a rough sketch is below.
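
This sketch assumes the param-table API of the Sciline releases current at the time (`ParamTable`, `set_param_table`, `Series`); all domain types and bank names here are hypothetical, and newer Sciline releases may have replaced this API:

```python
from typing import NewType

import sciline as sl

BankId = NewType("BankId", int)      # hypothetical row-index type
BankName = NewType("BankName", str)  # hypothetical domain types
BankEvents = NewType("BankEvents", object)


def load_bank(name: BankName) -> BankEvents:
    """Stand-in provider that would load a single detector bank."""
    ...


pl = sl.Pipeline([load_bank])
pl.set_param_table(sl.ParamTable(BankId, {BankName: ["bank01", "bank02"]}))
# Computing a Series gathers the per-bank results from the workflow graph.
events_per_bank = pl.compute(sl.Series[BankId, BankEvents])
```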

Check accidental overriding.

The NMX domain type NMXData is not created with typing.NewType; it directly inherits from scipp.DataGroup.
It has some properties as convenient handles for certain fields, such as crystal_rotation or sample_position.
This way we get the DataGroup HTML repr in JupyterLab and also type hints from those type-hinted properties.
The current implementation assumes that the property names are all NMX-specific, so they don't override any methods of scipp.DataGroup.
But maybe there should be a simple check that they do not actually override anything.

This is the solution I came up with while implementing other things.

```python
from typing import TypeVar

import pytest

T = TypeVar("T", bound=type)


def nooverriding(cls: T) -> T:
    """Decorator to prevent overriding parent methods in a subclass.

    It does not check private names (those starting with '_').
    """
    for parent in cls.__bases__:
        preserved_names = [
            name for name in dir(parent) if not name.startswith('_')
        ]
        for name in preserved_names:
            # Identity check: if the attribute looked up on ``cls`` is not
            # the same object as on ``parent``, the subclass overrode it.
            if getattr(parent, name) is not getattr(cls, name):
                raise ValueError(f"{parent}.{name} is overridden in {cls}")

    return cls


def test_no_overriding() -> None:
    class A:
        def foo(self):
            return 'A.foo'

    @nooverriding
    class AA(A):
        def foo2(self):
            return 'AA.foo2'

    a = AA()
    assert AA.foo is A.foo
    assert a.foo() == 'A.foo'
    assert a.foo2() == 'AA.foo2'


def test_no_overriding_raises() -> None:
    class A:
        def foo(self):
            return 'A.foo'

    with pytest.raises(ValueError):
        @nooverriding
        class AAA(A):
            def foo(self):
                return 'AAA.foo'
```

Handle banks via workflow graph instead of via extra dimension

The current McStas loader loads all banks and concatenates the events:

```python
def _retrieve_raw_event_data(file: snx.File) -> sc.Variable:
    """Retrieve events from the nexus file."""
    bank_names = _retrieve_event_list_names(file["entry1/data"].keys())
    banks = [
        file["entry1/data/" + bank_name]["events"][()].rename_dims({'dim_0': 'event'})
        # ``dim_0``: event index, ``dim_1``: property index.
        for bank_name in bank_names
    ]
    return sc.concat(banks, 'event')
```

and subsequently folds after grouping by pixel:

```python
return grouped.fold(dim='id', sizes={'panel': num_panels, 'id': -1})
```

  • Change this so only a single bank is handled in the code (loading, reduction steps); see the sketch after this list.
  • Multiple banks should be handled via Sciline, i.e., in the workflow graph. This will, e.g., allow for processing only a single bank at a time, saving memory.
    • We can use a param table to define the detector banks that should be processed.
  • Saving to NeXus may need to change slightly. Need to check with the IDS.
    • I do not understand the current code: the panel dimension is first removed in

      ```python
      counts: sc.DataArray = nmx_data.weights.flatten(dims=['panel', 'id'], to='id').hist(
          t=time_bin_step
      )
      ```

      and then a panel dim of length 1 is added back into the dataset when saving, in

      ```python
      var=self.counts.fold(
          'id', sizes={'panel': 1, 'id': self.counts.sizes['id']}
      ),
      ```
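
For reference, a hypothetical single-bank variant of the loader quoted above, where the bank name becomes a parameter instead of being looped over inside the function; the function name and signature are illustrative only:

```python
import scipp as sc
import scippnexus as snx


def _retrieve_single_bank_events(file: snx.File, bank_name: str) -> sc.Variable:
    """Retrieve the events of one detector bank only."""
    # Same renaming as in the current loader: ``dim_0`` is the event index.
    return file["entry1/data/" + bank_name]["events"][()].rename_dims(
        {'dim_0': 'event'}
    )
```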

Geometric information parsing (McStas loader mainly)

NMX data loader should be able to retrieve...

  • detector rotation (x, y, z)
  • detector position
  • relative (x, y, z) list (?)
  • source position
  • sample position
  • fast vector list
  • slow vector list

In the McStas simulation file, there is an HDF5 group instrument/instrument_xml/ that contains geometry information of the instrument.
The McStas loader should be able to parse this XML to retrieve the information above.
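
A rough sketch of that parsing step, assuming the XML is stored as a string dataset inside the group; the entry path, dataset name, and XML tag names are guesses rather than the actual McStas schema:

```python
import xml.etree.ElementTree as ET

import h5py

with h5py.File("mcstas_output.h5", "r") as f:  # hypothetical file name
    xml_text = f["entry1/instrument/instrument_xml/data"][()]
if isinstance(xml_text, bytes):
    xml_text = xml_text.decode()

root = ET.fromstring(xml_text)
# Iterate over components and read their location child elements.
for component in root.iter("component"):  # tag name is an assumption
    location = component.find(".//location")
    if location is not None:
        print(component.get("type"), location.attrib)
```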

TODO:

  • Compare the instrument view with Mantid

[Requirement] Scaling/normalization/correction workflow

Executive summary

We need to derive and apply scaling functions per wavelength and per position.

Context and background knowledge

There is a scaling step in the overall workflow.
It could be done by LSCALE, but LSCALE is not maintained anymore, so we need to implement this part ourselves.

It is for the scaling and normalization of Laue intensity data to yield fully corrected structure amplitudes.[1]

Inputs

MTZ files that contain

  • HKL
  • LAMBDA
  • Intensity
  • Sigma of Intensity
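
For illustration, reading these columns with gemmi might look like the following; the file name and the intensity/sigma column labels are assumptions, since MTZ label conventions vary:

```python
import gemmi

mtz = gemmi.read_mtz_file("reflections.mtz")  # hypothetical file name
h = mtz.column_with_label("H").array
k = mtz.column_with_label("K").array
l = mtz.column_with_label("L").array
lam = mtz.column_with_label("LAMBDA").array
intensity = mtz.column_with_label("I").array  # label may differ per file
sigma = mtz.column_with_label("SIGI").array   # label may differ per file
```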

Methodology

This functionality was under development in pyscale and there is a notebook/script in the repository.

Outputs

MTZ file containing ...

Which interfaces are required?

Python module / function, Python script, Jupyter notebook

Test cases

Also available in pyscale with sample data.

Comments

pyscale is not open, so you'll need to ask for access: https://github.com/mlund/

Footnotes

  1. LSCALE

NXmx format for NMX data

Executive summary

NMX data output should be in the NeXus NXmx format.

Context and background knowledge

In order to comply with modern data handling and archiving standards for macromolecular crystallography (MX) data, the data output should be in the NeXus NXmx format (https://manual.nexusformat.org/classes/applications/NXmx.html). This format is already used at most synchrotron X-ray MX beamlines around the world, and NMX should comply with these standards as much as possible. The NXmx format is source-agnostic and is inclusive of neutron MX data. It is part of the MX "Gold Standard" for data, adopted in 2020 after a long deliberation process: https://journals.iucr.org/m/issues/2020/05/00/ti5018/index.html

Inputs

All data properties are detailed here: https://manual.nexusformat.org/classes/applications/NXmx.html

Note that not all fields are required, and which fields apply is experiment-specific. Some fields that are only labeled "recommended" are nevertheless absolutely necessary for NMX data processing. Determining which fields are required for NMX data is ongoing.

Methodology

For the most part, this will require accessing data from e.g. EPICS PVs, parsing it, and putting it in the correct format. Some calculations might be required to determine the orientation; this is in progress.
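
As a rough illustration of "putting them in the correct format", here is a minimal NXmx-style skeleton written with h5py; only a few groups are shown and all values are placeholders, so this is not a complete or validated NXmx file:

```python
import h5py

with h5py.File("reduced_nmx.h5", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    entry["definition"] = "NXmx"  # marks the application definition
    instrument = entry.create_group("instrument")
    instrument.attrs["NX_class"] = "NXinstrument"
    detector = instrument.create_group("detector")
    detector.attrs["NX_class"] = "NXdetector"
    sample = entry.create_group("sample")
    sample.attrs["NX_class"] = "NXsample"
```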

Outputs

Data arranged in the correct format (see above).

Which interfaces are required?

Integrated into reduction workflow, Python module / function

Test cases

We have simulation data from McStas that we can use for testing.

Comments

No response

Space group description overrides info from files?

The docstring says

```
spacegroup_desc:
    The space group description to use if not found in the mtz files.
    If None, :attr:`~DEFAULT_SPACE_GROUP_DESC` is used.
```

but the implementation

```python
if spacegroup_desc is not None:  # Use the provided space group description
    return SpaceGroup(gemmi.SpaceGroup(spacegroup_desc))
elif len(space_groups) > 1:
    raise ValueError(f"Multiple space groups found: {space_groups}")
elif len(space_groups) == 1:
    return SpaceGroup(space_groups.popitem()[1])
else:
    raise ValueError(
        "No space group found and no space group description provided."
    )
```

seems to override any info from the files with the provided description.

Furthermore, the claim that DEFAULT_SPACE_GROUP_DESC is used is incorrect.

If I naively change this to match what the docstring claims, I get ValueError: Multiple space groups found: [<gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>] when running the docs workflow, I suppose because all files have the same space group but the gemmi.SpaceGroup objects are not deduplicated.
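
A hypothetical reordering that would match the docstring, letting file information take precedence and deduplicating identical space groups by their Hermann-Mauguin symbol (assuming, as the error above suggests, that the gemmi.SpaceGroup objects do not compare equal):

```python
# space_groups: mapping of file names to gemmi.SpaceGroup, as in the snippet above.
unique = {sg.hm: sg for sg in space_groups.values()}
if len(unique) > 1:
    raise ValueError(f"Multiple space groups found: {list(unique.values())}")
elif len(unique) == 1:
    return SpaceGroup(unique.popitem()[1])
elif spacegroup_desc is not None:
    return SpaceGroup(gemmi.SpaceGroup(spacegroup_desc))
else:
    return SpaceGroup(gemmi.SpaceGroup(DEFAULT_SPACE_GROUP_DESC))
```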

Docs build fails with instrument view.

An instrument view for NMX is not needed right now, but it was useful for the McStas XML parser,
so I tried to add it to the documentation as well.

But the docs build failed due to the cell output.
It fails complaining about the number of lines; it seems to be related to this issue.

```
ERROR: Line 404 exceeds the line-length-limit.
WARNING: Each notebook should have at least one section title
```

This leads to a failure to link the notebook page to the main page, since the title of the notebook couldn't be parsed.
It happened locally and also in the CI build. (The docs build test itself doesn't fail, but the notebook is not included in the main page.)

scippneutron doesn't have this issue, so there must be a solution.
For now the instrument view is included as a code snippet, not as an executed cell.

Validate exported nexus file.

essnmx should save reduced data as a NeXus file for other analysis software.
This feature was done in #21, but it does not check whether the NeXus file is valid.
There should be a test or an option to check that the result file is valid.
Maybe chexus can be used here; a chexus-independent fallback is sketched below.

The file has a relatively simple structure, so it shouldn't be difficult to implement.
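
Independently of chexus, a minimal structural check could look like this; the group names are assumptions about the exported layout, not the actual essnmx output:

```python
import h5py


def check_exported_file(path: str) -> None:
    """Raise if an expected group is missing from the exported file."""
    with h5py.File(path, "r") as f:
        for required in ("entry", "entry/instrument", "entry/sample"):
            if required not in f:
                raise ValueError(f"missing group: {required}")
```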

Update diagram to mermaid or LaTeX.

The diagram of the higher-level workflow design is currently a PNG.
But since it needs some memos and annotations regarding the status of some of the software, we'd better keep it in a more editable format.

The IDS owner also said mermaid is fine, but I had trouble making it work in the documentation build framework...
There is LaTeX code for this diagram, so we can probably use LaTeX instead.

LaTeX Version

[Figure: NMX_work_flow diagram]

Mermaid Version

stateDiagram
    direction TB
    Step0: Starting Experiment
    Step1: Setting up Measurement Strategy
    Step0 --> Step1
    state ECDC {
        Step2: Data Readout from Detectors
    }
    Step1 --> ECDC
    state SCIPP {
        Step3.0: TOA to TOF
    }
    ECDC --> SCIPP
    state DIALS {
        direction LR
        Step3.1: 1. Spot Finding
        Step3.2: 2. TOF to &lambda;
        Step3.3: 3. Indexing
        Step3.4: 4. Refine Indexing
        Step3.5: 5. Spot Integration
        Step3.1 --> Step3.2
        Step3.2 --> Step3.3
        Step3.3 --> Step3.4
        Step3.4 --> Step3.5
    }
    SCIPP --> DIALS
    state pyscale {
        Step3.6: Scaling (temporarily using LSCALE)
    }
    DIALS --> pyscale
    state CCP4 {
        direction LR
        Step3.7: Merging, AIMLESS
        Step3.8: I to SFs
        Step3.7 --> Step3.8
    }
    pyscale --> CCP4
    state PHENIX {
        direction LR
        Step4.0: Phasing
        Step4.0 --> Step4.1
        state MODELCOMPLETED {
            state refine <<choice>>
            state map <<choice>>
            Step4.1: Model Completion
            Step4.2: Refinement
            Step4.3: MapCalculation
            Step4.4: Done
            Step4.1 --> Step4.2
            Step4.2 --> refine
            refine --> Step4.4
            refine --> Step4.3
            Step4.3 --> map
            map --> Step4.4
            map --> Step4.1
        }
    }
    CCP4 --> PHENIX
    Step5: Finished Experiment
    PHENIX --> Step5

Additional explanation of the last step:

The final iteration between model completion, refinement, and map calculation is clearer in the mermaid diagram, but the choice nodes may need clearer annotation.

Document whole data reduction steps.

NMX data is partially reduced with scipp, and the rest of the steps are done by external software.
Still, we would like to have all the information in one place.

Here is some information that the IDS (Justin) provided:

We also received slides about these steps through the Slack channel. :D

TODO

  • Use the pipeline HTML wrapper of Sciline (>24.1.1) instead of parsing the function signature!

Loaders for simulation data and measurement data.

There is a slight difference between simulation data and measurement data.
In measurement data, event data is provided per detector, while simulation data has all event data in one place.
So for measurement data, there is one more step: combining the data sets from all detectors.

We also need to make a small McStas simulation dataset for testing.

Add unit-tests for scaling workflow steps.

The scaling workflow steps and test datasets are still being investigated and updated relatively frequently.
We have tests for the file-IO functionality, but we still don't have tests for the individual steps.
Once the mandatory steps are fixed, we should write proper tests for them.
I'm opening this issue so that we don't forget.
