Data reduction for NMX at the European Spallation Source.
python -m pip install essnmx
Home Page: https://scipp.github.io/essnmx/
License: BSD 3-Clause "New" or "Revised" License
The result of the first data reduction step should be saved in NeXus format so it can be fed to the next process.
It can be implemented once #15 is finished.
The files are almost the same, but there are some differences that we should handle.
| | McStas 2.7 | McStas 3.4 |
|---|---|---|
| Event Data | All in one bank | N banks |
| Geometry | May have to strip some blanks to parse numbers | Tidy format, hence parsing may be easier |
| Version Info | May not be available, or should be available in the root attrs, `creator` | Should be available in the root attrs, `creator` |
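Since the version info, when present, lives in the root attrs, version detection could be sketched roughly as follows (a minimal sketch; the function name is hypothetical, and only the `creator` attribute name comes from the table above):

```python
# Hedged sketch of detecting the McStas version from the root attrs.
# Assumes the attribute is named `creator`; absence means "unknown" (McStas 2.7 case).
import h5py


def mcstas_creator(path: str):
    """Return the root ``creator`` attribute if present, else None."""
    with h5py.File(path, "r") as f:
        creator = f.attrs.get("creator")
    if creator is None:
        return None
    return creator.decode() if isinstance(creator, bytes) else str(creator)
```

A loader could then fall back to the McStas 2.7 code path whenever this returns `None`.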
The geometry files look different due to the tidier format, but the contents and the structure are the same, so we can keep the 2.7 parsing for 3.4.
Here are example geometry XML files extracted from sample files:
mcstas_geometry.zip
TODO:
NMXReducedData keeps the pixel-id coordinate, and there are >1e6 pixels, so it could carry relatively large data (>GB) in the end.
Therefore we would like to compress the large Dataset when we save it into an h5 or NeXus file.
The compression option is now hard-coded, but it makes saving very slow.
Some advanced users, e.g. the IDS, might want to turn it off when they have to debug the workflow.
Therefore we should make the compression option configurable.
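A configurable compression option could be sketched like this (a minimal sketch; the function and argument names are hypothetical, not the actual essnmx API):

```python
# Hedged sketch: expose the compression option instead of hard-coding it,
# so debugging runs can skip the slow gzip step.
import h5py
import numpy as np


def save_reduced(path: str, name: str, data: np.ndarray, compress: bool = True) -> None:
    """Save one array; gzip compression can be disabled for faster debug runs."""
    options = {"compression": "gzip", "compression_opts": 4} if compress else {}
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=data, **options)
```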
NMX uses a relatively large amount of raw data, so we might need an option to monitor/log memory consumption.
E.g. the original script from the owner was using tracemalloc to monitor memory.
It can probably be done in a similar way to the beamlime benchmarking...?
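A tracemalloc-based monitor could be sketched like this (a minimal sketch similar in spirit to the original script; the wrapper name is hypothetical):

```python
# Hedged sketch: run a workflow step and log the current/peak memory
# traced by the standard-library tracemalloc module.
import tracemalloc


def run_with_memory_log(func, *args, **kwargs):
    """Run ``func`` and print current/peak traced memory in MB."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
    finally:
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
    return result
```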
Investigate whether the workflow(s) are performant enough to handle the event rates expected for this technique.
See P1 in the guidelines
Related to #31.
In McStas 3 data, the event dataset is stored in separate data banks, so it is possible to load individual banks per detector.
For real measurement data, the data banks will be separate per detector as well.
For now, the loader loads all banks at once (McStas 2) or merges all banks after loading (McStas 3).
This feature will be implemented once we have a general agreement about how to handle detectors in the other ESS packages as well.
One idea was to use a sciline param table.
The loader accepts many arguments, so it had better accept keyword arguments only.
Once sciline is released with the keyword-arguments feature, we can turn the loader into a keyword-only function.
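A keyword-only signature is just a bare `*` in the parameter list, as in this sketch (the parameter names below are illustrative, not the exact essnmx loader signature):

```python
# Hedged sketch: a bare `*` forces all arguments to be passed by keyword,
# so positional calls raise TypeError instead of silently mismatching.
def load_mcstas_nexus(*, file_path: str, max_probability: float = 1.0, entry: str = "entry1") -> dict:
    """Toy loader; real loading logic omitted."""
    return {"file_path": file_path, "max_probability": max_probability, "entry": entry}
```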
The nmx domain type NMXData is not made with typing.NewType; it directly inherits from scipp.DataGroup.
It has some properties as convenient handles for certain fields, such as crystal_rotation or sample_position.
Therefore we can use the DataGroup HTML repr in JupyterLab and also get type hints from those type-hinted properties.
The current implementation assumes that the property names are all NMX-specific, so they don't override any methods of scipp.DataGroup.
But maybe there should be a simple check that they do not actually override any methods.
This is the solution I came up with while implementing other things:
```python
from typing import TypeVar

import pytest

T = TypeVar("T")


def nooverriding(cls: T) -> T:
    """Decorator to prevent overriding a method in a subclass.

    It does not check private methods (those starting with '_').
    """
    for parent in cls.__bases__:
        preserved_names = [
            name for name in dir(parent) if not name.startswith('_')
        ]
        for name in preserved_names:
            if getattr(parent, name) is not getattr(cls, name):
                raise ValueError(f"{parent}.{name} is overridden in {cls}")
    return cls


def test_no_overriding() -> None:
    class A:
        def foo(self):
            return 'A.foo'

    @nooverriding
    class AA(A):
        def foo2(self):
            return 'AA.foo2'

    a = AA()
    assert AA.foo is A.foo
    assert a.foo() == 'A.foo'
    assert a.foo2() == 'AA.foo2'


def test_no_overriding_raises() -> None:
    class A:
        def foo(self):
            return 'A.foo'

    with pytest.raises(ValueError):
        @nooverriding
        class AAA(A):
            def foo(self):
                return 'AAA.foo'
```
The current McStas loader loads all banks and concats the events:
- essnmx/src/ess/nmx/mcstas_loader.py, lines 50 to 60 in c5a2520
- essnmx/src/ess/nmx/mcstas_loader.py, line 134 in c5a2520

The `panel` dimension is first removed in:
- essnmx/src/ess/nmx/reduction.py, lines 245 to 247 in c5a2520
- essnmx/src/ess/nmx/reduction.py, lines 151 to 153 in c5a2520
The NMX data loader should be able to retrieve...
In the McStas simulation file, there is an h5 group instrument/instrument_xml/ that contains geometry information of the instruments.
The McStas loader should be able to parse the XML to retrieve the information above.
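Reading and parsing that embedded XML could be sketched like this (a minimal sketch; the group path comes from the text above, but the dataset name `data` inside it is an assumption):

```python
# Hedged sketch: read the instrument XML stored in the McStas h5 file
# and parse it with the standard library. The dataset name under
# instrument/instrument_xml/ is assumed to be "data".
import xml.etree.ElementTree as ET

import h5py


def read_instrument_xml(path: str, dataset: str = "instrument/instrument_xml/data") -> ET.Element:
    """Return the parsed root element of the embedded instrument XML."""
    with h5py.File(path, "r") as f:
        raw = f[dataset][()]
    text = raw.decode() if isinstance(raw, bytes) else str(raw)
    return ET.fromstring(text)
```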
TODO:
We need to derive and apply scaling functions per wavelength and per position.
There is a scaling step in the whole workflow. It could be done by LSCALE, but LSCALE is not maintained anymore, so we need to implement this part ourselves.
It is for the scaling and normalization of Laue intensity data to yield fully corrected structure amplitudes. [1]
MTZ files that contain
This functionality was under development in pyscale, and there is a notebook/script in that repository.
MTZ file containing ...
Python module / function, Python script, Jupyter notebook
Also available in pyscale with sample data.
pyscale is not open, so you'll need to ask for access to https://github.com/mlund/
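The per-wavelength half of such a scaling could, as a toy sketch, look like the following (this is a placeholder illustration only, not the LSCALE/pyscale algorithm; function names and the binned-mean model are assumptions):

```python
# Hedged toy sketch: derive a multiplicative scale factor per wavelength bin
# from the mean intensity in that bin, then divide it out. The real scaling
# model (per wavelength AND per position) would be considerably richer.
import numpy as np


def wavelength_scale(wavelength, intensity, bins):
    """Mean intensity per wavelength bin, normalized so the average scale is 1."""
    idx = np.digitize(wavelength, bins) - 1
    scale = np.array([intensity[idx == i].mean() for i in range(len(bins) - 1)])
    return scale / scale.mean()


def apply_scale(wavelength, intensity, bins, scale):
    """Divide each intensity by the scale of its wavelength bin."""
    idx = np.digitize(wavelength, bins) - 1
    return intensity / scale[idx]
```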
NMX Data output should be in the nexus-nxmx format
In order to comply with modern data handling and archiving standards for macromolecular crystallography (MX) data, the data output should be in the nexus nxmx format (https://manual.nexusformat.org/classes/applications/NXmx.html). This format is already used at most synchrotron X-ray MX beamlines around the world, and NMX should comply with these standards as much as is possible. The nxmx format is generic to source and is inclusive of neutron MX data. This format is part of the MX "Gold standard" for data, adopted in 2020 after a long deliberation process. https://journals.iucr.org/m/issues/2020/05/00/ti5018/index.html
All data properties are detailed here: https://manual.nexusformat.org/classes/applications/NXmx.html
Note that not all fields are required, and which fields apply is experiment-specific. Others that are labeled "recommended" are absolutely necessary for NMX data processing. Determining which fields are required for NMX data is ongoing.
For the most part, this will require accessing data from e.g. EPICS PVs, parsing them, and putting them in the correct format. Some calculations might be required to determine the orientation; this is in progress.
Data arranged in the correct format (see above).
Integrated into reduction workflow, Python module / function
We have simulation data from McStas that we can use for testing.
When sciline is released we can also merge #67 .
The code that uses sciline.Series should be replaced with map from the new sciline.
The docstring says:
Lines 173 to 175 in 611c89a
Lines 194 to 203 in 611c89a
Furthermore, the claim that DEFAULT_SPACE_GROUP_DESC is used is incorrect.
If I naively change this to match what the docstring claims, I get ValueError: Multiple space groups found: [<gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>, <gemmi.SpaceGroup("P 21 21 21")>]
when running the docs workflow, presumably because all files have the same space group.
An instrument view for NMX is not needed right now, but it was useful for the McStas XML parser, so I tried to add it to the documentation as well.
But the documentation build failed due to the output.
It fails complaining about the number of lines. It seems to be related to this issue.
ERROR: Line 404 exceeds the line-length-limit.
WARNING: Each notebook should have at least one section title
This leads to a failure to link the notebook page to the main page, since the title of the notebook couldn't be parsed.
It happened locally and also in the CI build. (The docs build test itself doesn't fail, but the notebook is not included in the main page.)
scippneutron doesn't have this issue, so there must be a solution.
For now the instrument view is included as a code snippet, not as a cell.
essnmx should save reduced data as a NeXus file for other analysis software.
This feature is done by #21, but it does not check whether the resulting NeXus file is valid.
It should have a test or an option to check whether the result file is valid.
Maybe chexus can be used here.
The file has a relatively simple structure, so it shouldn't be difficult to implement.
Is this blocked by anything, or can we go ahead?
@jokasimr suggested trying different loss functions for fitting.
It might also be convenient if a user could choose the loss function.
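Exposing the loss function as a user option could be sketched like this (a minimal sketch assuming the fit uses scipy.optimize.least_squares, which accepts `loss` values such as `'linear'`, `'soft_l1'`, `'huber'`, `'cauchy'`, and `'arctan'`; the linear model below is a toy example, not the actual NMX fit):

```python
# Hedged sketch: let the caller choose the robust loss used by
# scipy.optimize.least_squares when fitting a toy linear model.
import numpy as np
from scipy.optimize import least_squares


def fit_line(x, y, loss: str = "linear"):
    """Fit y = a*x + b, minimizing residuals under the chosen loss."""
    def residuals(params):
        a, b = params
        return a * x + b - y

    return least_squares(residuals, x0=[1.0, 0.0], loss=loss)
```

A robust loss like `'soft_l1'` mainly changes the result when outliers are present; on clean data it converges to the same line.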
The diagram of the higher-level workflow design is currently in PNG format.
But since it needs some memos and annotations regarding the status of some of the software, we'd better keep it in a more editable form.
The IDS owner also said mermaid is fine, but I had trouble making it work in the documentation build framework...
There is LaTeX code for this diagram, so we could probably use LaTeX instead.
```mermaid
stateDiagram
    direction TB
    Step0: Starting Experiment
    Step1: Setting up Measurement Strategy
    Step0 --> Step1
    state ECDC {
        Step2: Data Readout from Detectors
    }
    Step1 --> ECDC
    state SCIPP {
        Step3.0: TOA to TOF
    }
    ECDC --> SCIPP
    state DIALS {
        direction LR
        Step3.1: 1. Spot Finding
        Step3.2: 2. TOF to &lambda;
        Step3.3: 3. Indexing
        Step3.4: 4. Refine Indexing
        Step3.5: 5. Spot Integration
        Step3.1 --> Step3.2
        Step3.2 --> Step3.3
        Step3.3 --> Step3.4
        Step3.4 --> Step3.5
    }
    SCIPP --> DIALS
    state pyscale {
        Step3.6: Scaling (temporarily using LSCALE)
    }
    DIALS --> pyscale
    state CCP4 {
        direction LR
        Step3.7: Merging, AIMLESS
        Step3.8: I to SFs
        Step3.7 --> Step3.8
    }
    pyscale --> CCP4
    state PHENIX {
        direction LR
        Step4.0: Phasing
        Step4.0 --> Step4.1
        state MODELCOMPLETED {
            state refine <<choice>>
            state map <<choice>>
            Step4.1: Model Completion
            Step4.2: Refinement
            Step4.3: Map Calculation
            Step4.4: Done
            Step4.1 --> Step4.2
            Step4.2 --> refine
            refine --> Step4.4
            refine --> Step4.3
            Step4.3 --> map
            map --> Step4.4
            map --> Step4.1
        }
    }
    CCP4 --> PHENIX
    Step5: Finished Experiment
    PHENIX --> Step5
```
The final iteration between model completion, refinement, and map calculation is clearer in the mermaid diagram, but the choice nodes may need clearer annotation.
NMX data is partially reduced with scipp, and the rest of the steps are done by external software.
Still, we would like to have all the information in one place.
Here is some information that the IDS (Justin) provided:
Also received slides about these steps through the Slack channel :D
TODO
See S.6 in the guidelines.
Investigate whether this is a problem, and if it is, fix it.
There is a slight difference between simulation data and measurement data.
In the measurement data, event data is provided per detector, while simulation data has all event data in one place.
So for the measurement data, there is one more step to combine the data sets from all detectors.
We also need to make small McStas simulation data for testing.
NMX already has a working script for data reduction written by the IDS owner (Justin).
We can build a workflow with sciline from the script.
Check adherence to https://scipp.github.io/essreduce/user-guide/reduction-workflow-guidelines.html.
Without having looked closely, I think the main changes that will be required are (1) handling each detector bank separately, (2) adding NeXus loaders (based on scipp/essreduce#8), and (3) eliminating data structures that follow McStas specifics.
Task for now: survey the package and list what needs to be changed.
Once we drop Python 3.9, we can use Pipeline._repr_html_.
Currently it's broken since it does not show the real name of the NewType in Python 3.9.
The scaling workflow steps and test datasets are still being investigated and updated relatively frequently.
We have tests for the file IO functionality, but we still don't have tests for each step.
Once we fix the mandatory steps, we shall write proper tests for them.
I'm opening this issue so that we don't forget...