
ChessAnalysisPipeline (CHAP)

CHAP is a package that provides a framework for executing data analysis pipelines. The package can be found on PyPI and conda-forge.

Subpackages

There are several subpackages within CHAP that contain specialized items to handle specific types of data processing in the CHAP framework. Dependencies for these subpackages can be found in CHAP/<subpackage_name>/environment.yml.

Documentation

Documentation for the latest version can be found on this project's github pages site.

Galaxy

The galaxy-tools/ directory contains a set of CHAP-based tools for use in the Galaxy science gateway.

Contributors

keara-soloway, rolfverberg, vkuznet, arwoll


Issues

Request option to set detector config defaults in EDD Calibration

The new / recommended workflow for EDD detector calibration calls for an Energy calibration step followed by a tth calibration. Currently, the output file from Energy calibration includes defaults for, e.g., tth_initial_guess, which in general will be among the first things the user will want to specify.

The current Energy config file reader permits these defaults to be set only at the per-detector level, as in the snippet below.

pipeline:
  # Perform energy calibration first
  - edd.MCAEnergyCalibrationProcessor:
      # Ce lines: Ka2=34.28, Ka1=34.72, Kb1=39.257, Kb2=40.236
      peak_energies: [34.72, 34.28,  39.257, 40.236]
      max_peak_index: 0
      fit_index_ranges: [[650, 850]]
      config:
        spec_file: raw_data/edd23-char-1/spec.log
        scan_number: 87
        scan_step_index: 40
        flux_file: reduced_data/analysis_ARW/flux.dft
        detectors:
          - detector_name: 0
            tth_initial_guess: 7.105
            include_energy_ranges: [[50, 100]]
            tth_max: 10.0
          - detector_name: 2
            tth_initial_guess: 7.105
            include_energy_ranges: [[50, 100]]
            tth_max: 10.0
...

Proposal: permit specification of defaults for all detectors. For instance, any item in the "detectors" section that is NOT "detector_name" could be interpreted as a default for all detectors, UNLESS overwritten in the detector-specific location, e.g.:

pipeline:
  # Perform energy calibration first
  - edd.MCAEnergyCalibrationProcessor:
      # Ce lines: Ka2=34.28, Ka1=34.72, Kb1=39.257, Kb2=40.236
      peak_energies: [34.72, 34.28,  39.257, 40.236]
      max_peak_index: 0
      fit_index_ranges: [[650, 850]]
      config:
        spec_file: raw_data/edd23-char-1/spec.log
        scan_number: 87
        scan_step_index: 40
        flux_file: reduced_data/analysis_ARW/flux.dft
        detectors:
          - tth_initial_guess: 7.105
          - include_energy_ranges: [[50, 100]]
          - tth_max: 10.0
          - detector_name: 0             # <----  detector "0" uses all three defaults above
          - detector_name: 2             # <---- detector "2" uses the defaults for tth_initial_guess and tth_max, but overrides the energy range, e.g. due to some errant peak in that particular case
            include_energy_ranges: [[50, 90]]
...
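For illustration, here is a minimal Python sketch of the proposed merging logic (an assumption about one possible implementation, not existing CHAP code): entries in the detectors list without a detector_name act as shared defaults, and detector-specific entries override them.

def merge_detector_defaults(detectors):
    """Fold default entries (those without a 'detector_name') into every
    detector-specific entry; per-detector values win over the defaults."""
    defaults, merged = {}, []
    for entry in detectors:
        if 'detector_name' in entry:
            merged.append({**defaults, **entry})  # entry overrides defaults
        else:
            defaults.update(entry)
    return merged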

TOMO: detector pixels frame of reference

TOMO: report all detector pixel coordinates relative to the detector frame, instead of to the cropped image bounds. This applies to all figures, but also to NeXus output quantities like center_rows.
Also add local/long names to relevant NeXus output fields.
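As a one-line sketch of the requested convention (crop_row_offset is a hypothetical name for the first detector row kept after cropping):

# Shift a quantity reported in cropped-image coordinates back into the
# full detector frame:
center_row_detector = crop_row_offset + center_row_cropped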

Use MapReader with SMB-style .par files

MapReader should be able to accept a full map config dict, or accept a .par file and a list of columns to be used as independent dimensions and construct a MapConfig from those (see the sketch below). See edd.models.StrainAnalysisConfig for an example.
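A hedged sketch of the .par-file half of this (illustrative names, not the actual CHAP API): each selected column becomes one independent-dimension entry of the kind MapConfig already expects.

def par_columns_to_independent_dimensions(columns):
    """Turn a list of SMB .par column names into MapConfig-style
    independent_dimensions entries ('units' would still need to be
    supplied per column; omitted here)."""
    return [{'label': col, 'data_type': 'smb_par', 'name': col}
            for col in columns]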

Previously working EDD map pipeline now fails

Describe the bug
A previously working EDD map pipeline, constructed in December, now fails.

To Reproduce
Provide a minimal pipeline configuration in which the bug appears:

config:
  root: /nfs/chess/auxiliary/cycles/2023-3/id1a3/ko-3538-b/
  inputdir: raw_data/edd23-char-1/
  outputdir: reduced_data/analysis_ARW/
  log_level: debug

pipeline:
  - common.MapReader:
      map_config:
        title: edd-char-1-87
        station: id1a3
        experiment_type: EDD
        sample:
          name: ceria
        spec_scans:
        - spec_file: spec.log
          scan_numbers: 
          - 87
        independent_dimensions:
        - label: samy
          units: mm
          data_type: spec_motor
          name: sampYcp
        # ADD ANOTHER IND. DIM. IF NEEDED
      detector_names: [0,2,3,5,6,7,8,10,13,14,16,17,18,19,21,22]
  - common.NexusWriter:
      filename: edd-char-1-87-TEST.nxs
      force_overwrite: true

Expected behavior
Should produce the nexus file listed above, edd-char-1-87-TEST.nxs.

Screenshots
(CHAP_edd) [aw30@lnx-id3b-2 analysis_ARW]$ CHAP edd-char-1-87-ceria-map-pipeline.yaml
CHAP.runner : INFO: Input pipeline configuration: [{'common.MapReader': {'map_config': {'title': 'edd-char-1-87', 'station': 'id1a3', 'experiment_type': 'EDD', 'sample': {'name': 'ceria'}, 'spec_scans': [{'spec_file': 'spec.log', 'scan_numbers': [87]}], 'independent_dimensions': [{'label': 'samy', 'units': 'mm', 'data_type': 'spec_motor', 'name': 'sampYcp'}]}, 'detector_names': [0, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 17, 18, 19, 21, 22]}}, {'common.NexusWriter': {'filename': 'edd-char-1-87-TEST.nxs', 'force_overwrite': True}}]

CHAP.runner : INFO: Loaded <CHAP.common.reader.MapReader object at 0x7fdb7a015510>
CHAP.runner : INFO: Loaded <CHAP.common.writer.NexusWriter object at 0x7fdb7a0156f0>
CHAP.runner : INFO: Loaded <CHAP.pipeline.Pipeline object at 0x7fdb7a0153c0> with 2 items

CHAP.runner : INFO: Calling "execute" on <CHAP.pipeline.Pipeline object at 0x7fdb7a0153c0>
Pipeline : INFO: Executing "execute"

Pipeline : INFO: Calling "execute" on <CHAP.common.reader.MapReader object at 0x7fdb7a015510>
MapReader : DEBUG: Executing "read" with {'inputdir': '/nfs/chess/previousid1a3/2023-3/ko-3538-b/edd23-char-1', 'map_config': {'title': 'edd-char-1-87', 'station': 'id1a3', 'experiment_type': 'EDD', 'sample': {'name': 'ceria'}, 'spec_scans': [{'spec_file': 'spec.log', 'scan_numbers': [87]}], 'independent_dimensions': [{'label': 'samy', 'units': 'mm', 'data_type': 'spec_motor', 'name': 'sampYcp'}]}, 'detector_names': [0, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 17, 18, 19, 21, 22]}
MapReader : INFO: Executing "read"
Traceback (most recent call last):
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/models/map.py", line 308, in validate_for_spec_scans
    self.get_value(scans, scan_number, index)
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/models/map.py", line 387, in get_value
    return get_spec_motor_value(spec_scans.spec_file,
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/models/map.py", line 446, in get_spec_motor_value
    scanparser.get_spec_scan_motor_vals(
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/utils/scanparsers.py", line 1213, in get_spec_scan_motor_vals
    raise NotImplementedError('Only relative motor values are available.')
NotImplementedError: Only relative motor values are available.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/chess/user/aw30/miniconda3/envs/CHAP_edd/bin/CHAP", line 33, in <module>
    sys.exit(load_entry_point('ChessAnalysisPipeline', 'console_scripts', 'CHAP')())
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/runner.py", line 100, in main
    runner(run_config, pipeline_config)
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/runner.py", line 115, in runner
    run(pipeline_config,
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/runner.py", line 218, in run
    pipeline.execute()
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/pipeline.py", line 44, in execute
    data = item.execute(data=data, **kwargs)
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/pipeline.py", line 168, in execute
    data = method(**args)
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/reader.py", line 121, in read
    map_config = MapConfig(**map_config, inputdir=inputdir)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1076, in pydantic.main.validate_model
  File "pydantic/fields.py", line 895, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 928, in pydantic.fields.ModelField._validate_sequence_like
  File "pydantic/fields.py", line 1094, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 898, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 1151, in pydantic.fields.ModelField._apply_validators
  File "pydantic/class_validators.py", line 339, in pydantic.class_validators._generic_validator_basic.lambda14
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/models/map.py", line 568, in validate_data_source_for_map_config
    return _validate_data_source_for_map_config(data_source, values)
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/models/map.py", line 566, in _validate_data_source_for_map_config
    data_source.validate_for_spec_scans(values.get('spec_scans'))
  File "/nfs/chess/user/aw30/Git_Repos/ChessAnalysisPipeline/CHAP/common/models/map.py", line 310, in validate_for_spec_scans
    raise RuntimeError(
RuntimeError: Could not find data for sampYcp (data_type "spec_motor") on scan number 87 for index 0 in spec file /nfs/chess/previousid1a3/2023-3/ko-3538-b/edd23-char-1/spec.log

Environment (please complete the following information):
This was run on the cluster using the current version of the edd-spring2024 branch.

Additional context
I briefly thought the path was not properly resolving: my pipeline specifies

root: /nfs/chess/auxiliary/cycles/2023-3/id1a3/ko-3538-b/ 
inputdir: raw_data/edd23-char-1/

whereas the Traceback complains about the location:

/nfs/chess/previousid1a3/2023-3/ko-3538-b/edd23-char-1/spec.log

but these are clearly resolving to the same file.

Based on the error message "Only relative motor values are available" I suspect that this error is caused by changes made to accommodate the current EDD workflow.

Link to galaxy tool

Create a generic tool (in common or utils? as a processor or a writer?) to create a history and upload files from the command line, given a map and potentially additional input files.

See /nfs/chess/user/rv43/Tomo/workflow/link_to_galaxy.py for the former Tomo tool that does something similar.

EDD Energy / tth calibration option to specify & combine more than one scan point for better statistics

In the current EDD calibration workflow, the option exists to calibrate using a specific point of a scan, e.g. as follows:

pipeline:
  # Perform energy calibration first
  - edd.MCAEnergyCalibrationProcessor:
      # Ce lines: Ka2=34.28, Ka1=34.72, Kb1=39.257, Kb2=40.236
      peak_energies: [34.72, 34.28,  39.257, 40.236]
      max_peak_index: 0
      fit_index_ranges: [[650, 850]]
      config:
        spec_file: raw_data/edd23-char-1/spec.log
        scan_number: 87
        scan_step_index: 40
        flux_file: reduced_data/analysis_ARW/flux.dft
        detectors:
          - detector_name: 0
            tth_initial_guess: 7.105
            include_energy_ranges: [[50, 100]]
            tth_max: 10.0
...

Propose change "scan_step_index" to "scan_step_indices" to allow the user to specify performing the calibration with a SUM of spectra from multiple points in scan, e.g.:

pipeline:
...
      config:
        spec_file: raw_data/edd23-char-1/spec.log
        scan_number: 87
        scan_step_indices: [38,39,40,41,42]
...

Or better yet, something functionally equivalent to

pipeline:
...
      config:
        spec_file: raw_data/edd23-char-1/spec.log
        scan_number: 87
        scan_step_indices: list(range(30,51))
...

This would permit calibration with better statistics.
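As a minimal sketch of the summing this would enable (function and argument names here are illustrative, not the CHAP API):

import numpy as np

def summed_spectrum(spectra, scan_step_indices):
    """Sum MCA spectra over the selected scan steps for better statistics.
    `spectra` has shape (num_scan_steps, num_channels)."""
    return np.asarray(spectra)[scan_step_indices].sum(axis=0)

# e.g. summed_spectrum(spectra, [38, 39, 40, 41, 42])
# or   summed_spectrum(spectra, list(range(30, 51)))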

Bugs in docs/installation.md

The values supplied to options for the literalinclude directive need to be enclosed in quotes.

Language identifiers for code blocks should not be in braces.

CHAP.runner.parser should be a function that returns an argparse.ArgumentParser

Broken action: "Deploy Sphinx documentation to Pages"

Github's new deployment protection rule breaks the "Deploy Sphinx documentation to Pages" action.

From github's blog:

"...We are also preventing tags with the same name as a protected branch from deploying to the environments with branch policies around protected branches."

conda-forge and galaxy-dev ci/cd

There are still manual steps in the CHAP release -> galaxy tool pipeline:

  1. Approval of merge request from the conda-forge regro-cf-autotick-bot (usually happens after ~2 hours)
  2. Installation of galaxy tools on galaxy-dev (must happen only after 1. is done)

According to Installing Tools into Galaxy:

Automated installation - The process of installing tools from Tool Shed can be performed in an automated way using a set of scripts. This is particularly useful if you are trying to install a large number of tools. The required scripts are available as an Ansible playbook from here. Please see that page for complete instructions.

Energy Calibration output image files need "detector_name" included.

In the spring 2024 branch, Rolf helped me change line 1266 from this:

            figfile = os.path.join(outputdir, 'energy_calibration_fit.png')

to this:

            figfile = os.path.join(outputdir, f'energy_calibration_fit_{detector.detector_name}.png')

to disambiguate output files when fitting more than one detector. (I forgot to make a new branch prior to making this change so am suggesting it here rather than with a pull request...)

EDD fall 2023

Remaining tasks for the fall 2023 EDD workflow:

  • Add new Processor to refine lattice parameters.

StrainAnalysisProcessor & MCACeriaCalibrationProcessor:

  • select mask & HKLs for fitting in one interactive plot (not two separate ones).

DiffractionVolumeLengthProcessor:

  • Include text annotation on final plot: measured DVL
  • Include in results: parameters of the gaussian fit
  • Account for sample thickness

StrainAnalysisProcessor:

  • new interaction point: materials parameters (lattice params & space group) selection
  • allow for a variable tth angle to be used at each point in the map
  • show a flattened map (2d image) of all MCA data underneath the 1D reference spectrum shown when selecting HKLs for fitting
  • indicate the mask used for calibration when selecting the mask to use for strain analysis
  • place fit metadata, results, residuals, chisq values, etc. into the structure returned by get_nxroot. Include: goodness of fit value (redchi), success / failure flag.
  • include extra dataset in resulting nexus structure: map of integral of each MCA spectrum
  • outstanding question for CB, PK, KN: how to handle "jagged" maps or data taken at duplicate coordinates from input par files?
  • Generate plot: resulting strains
  • Generate plots: raw data & best fit at each map point
  • Fix bug: Material name in GUI for selecting material parameters is always "Ni"
  • Fix bug: in the returned nexus structure, initial guesses for centers on the unconstrained fits are not being recorded properly (uniform fits are okay, though)
  • Indicate that the microstrains are calculated from the unconstrained fits in the resulting nexus structure
  • Be more specific about what the reference spectrum shown is in the mask & HKL selection window
  • (non-urgent) add option to click on a location on the y-axis on the lower 2D reference map & make that the reference spectrum shown on the upper 1D plot
  • zero-pad the frame_n.png filenames

CI/CD for galaxy tools

Add a github workflow that uses planemo to deploy the CHAP galaxy tools to the usual toolshed. On every commit? Or on every new release?

CHAP.version

Add a CHAP.version attribute that holds the version string.
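A possible sketch, assuming the installed distribution is named ChessAnalysisPipeline (the PyPI name mentioned in the README above); reading the string from the installed package metadata avoids maintaining it in two places:

# CHAP/__init__.py (sketch)
from importlib.metadata import version as _distribution_version

version = _distribution_version('ChessAnalysisPipeline')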

PipelineItem.get_configs

Add a get_configs method to PipelineItem. Use it to replace the nearly identical get_configs methods currently duplicated across many of the implemented Processors.

Roughly:

def get_configs(self, data, schema):
    """Look through `data` for an item whose value for the `'schema'`
    key matches `schema`. Convert the value for that item's `'data'`
    key into the configuration `BaseModel` identified by `schema` and
    return it.

    :param data: input data from a previous `PipelineItem`
    :type data: list[PipelineData]
    :param schema: name of the `BaseModel` class to match in `data` & return
    :type schema: str
    :raises ValueError: if there's no match for `schema` in `data`
    :return: matching configuration model
    :rtype: BaseModel
    """
    # Sketch of one possible body (an assumption, not settled API): resolve
    # a schema like 'edd.MCACeriaCalibrationConfig' to a class exported by
    # the corresponding CHAP subpackage, then validate the payload with it.
    from importlib import import_module
    for item in data:
        if item.get('schema') == schema:
            module_name, class_name = schema.rsplit('.', 1)
            model_class = getattr(
                import_module(f'CHAP.{module_name}'), class_name)
            return model_class(**item.get('data'))
    raise ValueError(f'No match for schema {schema} in data')

NB: This would impact pipeline configuration: schema will now need to give the full module path to the BaseModel of interest. For example, in examples/edd/pipeline.yaml, L14:
schema: MCACeriaCalibrationConfig
would need to become
schema: edd.MCACeriaCalibrationConfig
assuming we also add the members of CHAP/edd/models.py to CHAP/edd/__init__.py.

Refactor Tomo from the NXTomo structure to a MapConfig compatible structure

Refactor Tomo from the NXTomo structure to a MapConfig structure, including using independent_dimensions for SMB. This has some issues with the theta dimension that would need a few hard-wired code parts (get_spec_scan_npts and getting theta associated with a fake motor mnemonic), but otherwise looks pretty straightforward.

For SMB you would have in the map yaml something like:
independent_dimensions:
  - label: theta
    units: degrees
    data_type: spec_motor
    name: ome
  - label: horizontal_shift
    units: mm
    data_type: smb_par
    name: labx
  - label: vertical_shift
    units: mm
    data_type: smb_par
    name: labz
scalar_data:
  - label: theta_start
    units: degrees
    data_type: smb_par
    name: omestart
  - label: theta_end
    units: degrees
    data_type: smb_par
    name: omeend
  - label: num_theta
    units: '-'
    data_type: smb_par
    name: nframes

unit tests

Add unit tests for the CHAP module; integrate their execution with CI/CD.

  • Enumerate testable entities
  • Decide what metrics need to be tested on each one
  • Implement

Flexible spec_numbers input in pipeline.yaml

Right now you enter spec info in the pipeline.yaml like:

spec_scans:
- spec_file: set2_c1-1/spec.log
  scan_numbers: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

Create an option where you can enter the scan_numbers to include as a range, along the lines of:

spec_scans:
- spec_file: set2_c1-1/spec.log
  scan_number_range: [1, 13]

Or something similar (one possible sketch follows).
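One possible sketch with pydantic v1 (which the tracebacks in this tracker show in use); the class and field names are assumptions, not the existing CHAP schema:

from typing import List, Optional, Tuple
from pydantic import BaseModel, root_validator

class SpecScans(BaseModel):
    spec_file: str
    scan_numbers: List[int] = []
    scan_number_range: Optional[Tuple[int, int]] = None

    @root_validator
    def expand_scan_number_range(cls, values):
        # Expand an inclusive [first, last] range into explicit scan numbers.
        rng = values.get('scan_number_range')
        if rng and not values.get('scan_numbers'):
            first, last = rng
            values['scan_numbers'] = list(range(first, last + 1))
        return values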

Galaxy tool tests

Make sure every galaxy tool .xml file has something in its <tests> section.

CI/CD for pylint

Add a github action to run pylint on the CHAP module on every commit. Use the existing .pylintrc file to configure pylint, but change the fail-under parameter on line 42 to 8.

Docs

Add docs, build w/ sphinx & deploy to github pages w/ CI/CD.

Add cursory galaxy docs for users to the CLASSE computing wiki.

Add configuration pipeline item

Describe intended use of the requested PipelineItem
To do proper integration with CHAPBook and allow portable configuration and location of pipeline workflows, it will be useful to develop a configuration module (with some defaults) which will perform the following:

  • set up the CHAP ROOT location, which will be used by the CHAP tool and pipelines to load various workflows
  • define the location of the workflows area
  • optionally define additional flags, like profile, etc.

Then, we can use such a configuration pipeline module/item in every configuration, e.g.

# configuration
config:
  root: /path/workflows
  profile: true
  verbose_level: 1
  interactive_prompt: true
  input: /path/to/input/location
  output: /path/to/output/location

pipeline:
  # Collect map data
  - common.YAMLReader:
      filename: examples/saxswaxs/map_1d.yaml
      schema: MapConfig

In this example, root defines the main root area CHAP will use, and all other pipeline files will use it, i.e. examples/saxswaxs/map_1d.yaml will be resolved not within the current working directory but with respect to the root path. The profile option will run the profiler, so that we do not need to specify this parameter at the CLI, etc. We may have further extensions to the configuration module, e.g. run an interactive pipeline, or use a verbosity level.
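A minimal sketch of the root-relative resolution described above (the function name is hypothetical):

import os

def resolve_path(path, root):
    """Interpret `path` relative to the configured CHAP root unless it is
    already absolute."""
    return path if os.path.isabs(path) else os.path.join(root, path)

# resolve_path('examples/saxswaxs/map_1d.yaml', '/path/workflows')
# -> '/path/workflows/examples/saxswaxs/map_1d.yaml'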

Additional context
Such a configuration module will allow integration with the CHAPBook service and/or relocation of some parts of CHAP, e.g. examples, workflows, etc., to a different location.

CHAP.common.StrainAnalysisProcessor

Implement CHAP.common.StrainAnalysisProcessor. This needs to be ready to go for the 2023-3 run (first day of users: Wednesday, October 18th).

Issue templates

Create issue templates for this project -- bug report, feature request, etc.
