
dcm-classifier's Introduction

Introduction

In this work, we developed a robust, easily extensible classification framework that extracts key features from well-characterized DICOM header fields to identify image modality and acquisition plane. The tool eliminates error-prone human interaction and enables automation, increasing the reliability and efficiency of imaging applications. We used Random Forest and Decision Tree algorithms to determine the image modality and orientation. We trained on header metadata from over 49000 scan volumes across multiple studies and achieved over 99% prediction accuracy on image modality and acquisition plane classification.

This project was supported by several funding sources including:

  • UCSF SCOUTS RO1
  • NIH-NINDS R01NS114405 and NINDS R01 NS119896
  • Botimageai.

Paper

Click here to view the published paper.

Citing

Please reference the manuscript:

Michal Brzus, Cavan J. Riley, Joel Bruss, Aaron Boes, Randall Jones, Hans J. Johnson, "DICOM sequence selection for medical imaging applications," Proc. SPIE 12931, Medical Imaging 2024: Imaging Informatics for Healthcare, Research, and Applications, 1293108 (2 April 2024); https://doi.org/10.1117/12.3006568

Additionally, please reference the citations located in the citations directory.

Instructions

Below are instructions for installing and using the package as a user and developer.

Documentation

The documentation for the package can be found here

Tutorials

Tutorial notebooks are provided in the scripts directory for training and using the classifier along with all the necessary scripts for training a custom model.

User Instructions

Pip install

The package can be installed with pip, which installs the classifier, all dependencies needed to run it, and the pretrained model for classification.

$ pip install dcm-classifier

Clone the repository

If you prefer to clone the git repository:

$ git clone https://github.com/BRAINSia/dcm-classifier.git

Navigate to the cloned repo

$ cd <repo path>

Setup virtual environment

$ python3 -m venv <venv_path> && source <venv_path>/bin/activate

Install required packages

$ pip install -r requirements.txt

Developer Instructions

For development, clone the repository and install the developer requirements in a virtual environment. Development allows for training of new models using the scripts directory.

$ pip install -r requirements_dev.txt

Install pre-commit hooks

$ pre-commit install

Run pre-commit hooks to ensure code quality

$ pre-commit run -a

Run the classify study script. The path to a model can be omitted, in which case the default model provided in the package is used.

$ python3 <path_to_scripts_directory>/classify_study.py -d <path_to_dicom_session>

or pass the path to a separate model

$ python3 <path_to_scripts_directory>/classify_study.py -m models/ova_rf_classifier.onnx -d <path_to_dicom_session>
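
The optional model argument can be pictured with a small argument parser. This is a minimal sketch, not the actual source of classify_study.py: only the -m and -d flags come from the commands above, while the function names, the dest names, and the choice of None to mean "use the packaged default model" are assumptions made for illustration.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -m / -d flags shown above."""
    parser = argparse.ArgumentParser(description="Classify one DICOM session")
    # -d is required: the directory holding the DICOM session.
    parser.add_argument("-d", dest="session_directory", required=True)
    # -m is optional: None stands for "fall back to the packaged default model".
    parser.add_argument("-m", dest="model", default=None)
    return parser


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None) with the hypothetical parser."""
    return build_parser().parse_args(argv)
```

With this layout, omitting -m leaves the model value as None, and the calling code can then decide to load the default model.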

Testing

Testing in the dcm-classifier package is done using pytest. To run the tests, navigate to the root directory of the package.

The testing data is stored in Git LFS, so the following commands are needed before running pytest:

  git lfs fetch
  git lfs checkout

Then run the tests:

  pytest
  # or to fail on warnings
  python3 -Werror::FutureWarning -m pytest

Coverage Analysis

To run coverage analysis, navigate to the root directory of the package and run the following commands:

 coverage run --concurrency=multiprocessing --parallel-mode -m pytest tests --junitxml=tests/pytest.xml
 coverage combine
 coverage report --format=text -m |tee tests/pytest-coverage.txt
 coverage xml -o tests/coverage.xml

Contributing

We welcome contributions from the community! Before getting started, please take a moment to review our Contribution Guidelines for instructions on how to contribute to this project. Whether you're fixing a bug, implementing a new feature, or improving documentation, your contributions are greatly appreciated!

FAQs

  1. What is the purpose of this package?

    The purpose of this package is to provide a tool for classifying DICOM images based on their header information. This tool can be used to automate the classification process and eliminate human error.

  2. What are the key features of this package?

    The key features of this package include:

    • Classification of DICOM images based on header information
    • Automated classification process
    • Elimination of human error
  3. What are the future plans for this package and how can I contribute?

    The future plans for this package include:

    • Adding support for more image modalities
    • Improving the classification accuracy
    • Adding support for more DICOM header fields

Authors

  1. Michal Brzus

    github: mbrzus

  2. Hans J. Johnson

    github: BRAINSia

  3. Cavan Riley

    github: CavRiley

dcm-classifier's People

Contributors

mbrzus, cavriley, hjmjohnson, ivanjohnson-lab, joslinsome, saanbe16

dcm-classifier's Issues

Replace failing flair test

The test

@pytest.mark.skip(reason="This test is failing, and can not be confirmed to be correct")
def test_flair_dcm_series_modality(mock_flair_series):
    for series in mock_flair_series:
        assert series.get_modality() == "flair"
    # for series_number, series in mock_series_study.series_dictionary.items():
    #     if series_number == 7:
    #         assert series.get_modality() == "flair"

needs to be verified and made to pass.

Update Testing

Is this feature request related to a problem? Please describe.

The testing suite currently has a couple of areas where changes would make the environment more developer-friendly. These include implementing Git LFS for testing data and removing skipped tests.

Alternatives Considered If Applicable

N/A

Rationale

  • Git LFS reduces the storage requirements for the repository
  • Removing skipped tests reduces clutter in testing files and improves pytest output

Implementation Ideas

  • Adding the testing data to git lfs
  • Reviewing skipped tests and deciding whether they will be implemented later or removed

Additional Context

N/A

Add GitHub Actions

Is this feature request related to a problem? Please describe.

This feature request addresses the need for automating various tasks, such as testing, linting, and deployment, within the GitHub repository. Currently, these tasks are performed manually, leading to inconsistencies, delays, and potential errors in the development workflow. By adding GitHub Actions, we can automate these tasks, streamline the development process, and ensure the reliability and consistency of our codebase.

Alternatives Considered If Applicable

N/A

Rationale

The addition of GitHub Actions offers several benefits:

  • Automated Workflow: GitHub Actions allow us to define custom workflows to automate repetitive tasks, such as running tests, checking code quality, and deploying releases.
  • Increased Efficiency: Automating tasks reduces manual effort and minimizes the time required for code review and deployment, enabling faster iteration and delivery of features.
  • Enhanced Reliability: By automating testing and deployment processes, we can catch bugs early and ensure that changes are deployed consistently across different environments, leading to a more reliable and stable codebase.

Implementation Ideas

  • Define GitHub Actions workflows using YAML syntax to specify the sequence of tasks to be executed, such as running tests, linting code, and building releases.
  • Utilize pre-existing GitHub Actions provided by the community or create custom actions tailored to the specific needs of the project.
  • Integrate GitHub Actions with existing CI/CD tools, version control systems, and issue trackers to create a seamless development workflow.
  • Leverage GitHub Actions features such as triggers, conditions, and environment variables to customize and optimize workflows based on project requirements.
  • Document the GitHub Actions workflows and usage guidelines to facilitate collaboration and onboarding for contributors.

Additional Context

Integrating GitHub Actions into the project aligns with our goal of adopting modern development practices and improving the efficiency and reliability of our software delivery process. By automating repetitive tasks and standardizing our workflows, we can accelerate development cycles, reduce errors, and enhance the overall quality of our codebase. Additionally, GitHub Actions provide a flexible and scalable platform for automating various aspects of the development lifecycle, enabling us to adapt and evolve our processes as the project grows.

Update README For Updates in Classifier

Documentation Location

The documentation issue is located in the README file of the project repository.

Type of Documentation Issue

This is an enhancement request to improve the clarity and completeness of the README documentation.

Issue Description

The current README lacks detailed instructions on how to install and use the package effectively. Additionally, it does not provide information on the prerequisites required for running the package.

Suggested Changes

  • Add a comprehensive installation guide with clear steps for installing the package via pip.
  • Include instructions on how to import and use the package in various environments.
  • Provide details on any dependencies or system requirements necessary for running the package.
  • Incorporate examples or code snippets to illustrate common usage scenarios.

Additional Context

The current README provides a basic overview of the project but lacks sufficient information for users to effectively utilize the package. Enhancing the documentation will improve the user experience and reduce confusion for new users.

Proposed Solution

To address this issue, I propose updating the README with detailed installation instructions, usage guidelines, and information on prerequisites. Additionally, including examples and troubleshooting tips will further enhance the usability of the documentation.

Additional Comments

Improving the README documentation is crucial for ensuring that users can easily understand and utilize the features offered by the package. Clear and concise instructions will help users get started quickly and minimize the need for additional support.

Add Pre-Commit Hooks

Is this feature request related to a problem? Please describe.

This feature request addresses the problem of maintaining code quality and consistency across contributions to the repository. Currently, there is no mechanism in place to enforce coding standards and best practices before committing changes. As a result, the codebase may suffer from inconsistencies, style violations, and potential errors introduced by contributors. Implementing pre-commit hooks will help mitigate these issues by automatically running code checks and formatting tasks before allowing commits, ensuring that all code adheres to the defined standards.

Alternatives Considered If Applicable

N/A

Rationale

The addition of pre-commit hooks is essential for maintaining a high level of code quality and consistency within the repository. By enforcing coding standards and best practices at the time of committing changes, we can prevent style violations, catch potential errors early in the development process, and streamline the review process for pull requests. This not only improves the overall readability and maintainability of the codebase but also promotes a collaborative and efficient development workflow.

Implementation Ideas

  • Configure pre-commit hooks to run code formatting tools such as Black to ensure consistent code style across the repository.
  • Integrate linters and static analysis tools (e.g., Flake8, ESLint) into pre-commit hooks to catch potential errors and enforce coding standards.
  • Provide clear documentation and guidelines for contributors on how to set up and use pre-commit hooks locally.
  • Consider incorporating automated testing tasks into pre-commit hooks to ensure that changes do not break existing functionality.

Additional Context

N/A

Future versions of pandas will fail

A set of ill-advised pandas uses should be modified to reduce the chance of failures.

After updating packages to #1

pip install -r requirements_dev.txt
python -Werror::FutureWarning -m pytest
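
The -Werror::FutureWarning option turns every FutureWarning into an exception, so deprecated usage fails the test run instead of only printing a message. The same effect can be reproduced inside Python with the standard warnings module. The sketch below illustrates the mechanism only; deprecated_call is a hypothetical stand-in for any library call (such as a pandas call) that emits a FutureWarning.

```python
import warnings


def deprecated_call() -> int:
    """Hypothetical stand-in for library code that emits a FutureWarning."""
    warnings.warn("this behavior will change", FutureWarning)
    return 1


def fails_under_error_filter() -> bool:
    """Return True if deprecated_call raises once FutureWarning is an error."""
    with warnings.catch_warnings():
        # Similar in effect to running python with -Werror::FutureWarning.
        warnings.simplefilter("error", FutureWarning)
        try:
            deprecated_call()
        except FutureWarning:
            return True
    return False
```

Any code path that still triggers a FutureWarning is therefore reported as a test failure, which makes the affected pandas calls easy to locate.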

Workflows Not Being Run

Describe the bug:

We have a workflow that is supposed to run on pull requests to the main branch. Currently it does not: the workflow only runs when changes to the workflow file itself are included in the pull request.

To Reproduce:

Create one pull request that modifies the pr_to_main workflow file and another that does not.

Expected behavior:

The workflow should be run.

Actual Behavior:

The workflow is not run.

Screenshots:

N/A

Environment:

N/A

Additional context:

N/A

Logs

N/A

Possible Fix

Remove the branches tab from the pull request section in the workflow file.

Improvement on Testing Coverage

Is this feature request related to a problem? Please describe.

This request addresses the issue of insufficient test coverage within the testing suite. Currently, there are gaps in the tests, particularly in covering new methods added to the classifiers classes. Without comprehensive test coverage, there's a risk of undetected bugs or regressions, which could impact the reliability and stability of the package.

Alternatives Considered If Applicable

N/A

Rationale

Increasing test coverage is crucial for maintaining code quality and ensuring the robustness of the software. By adding tests for new methods introduced to the classifier classes, we can verify their functionality, detect potential issues early in the development cycle, and prevent regressions in existing functionality.

Implementation Ideas

  • Identify New Methods: Review recent changes to the classifier classes and identify any new methods that have been added.

  • Write Unit Tests: Develop unit tests to cover the functionality of each new method. Ensure that the tests check for expected behavior, handle edge cases, and verify that the method behaves as intended under various conditions.

Additional Context

N/A

Scripts Inclusion in Pip Package

Is this feature request related to a problem? Please describe.

This feature request aims to include additional scripts in the pip package to enhance the usability of the package for end-users.

Alternatives Considered If Applicable

N/A

Rationale

Adding scripts to the pip package allows users to conveniently access command-line tools or utilities provided by the package without having to manually locate or install them separately. This enhances the user experience and streamlines the workflow for utilizing the package's functionality.

Implementation Ideas

  • Identify the scripts or command-line tools that should be included in the pip package.
  • Ensure that these scripts are properly documented and tested to meet the quality standards of the project.
  • Update the package setup script (pyproject.toml) to include the necessary configuration for including the scripts in the package distribution.
  • Consider organizing the scripts into a dedicated directory within the package structure for better organization and maintainability.
  • Update the package documentation to inform users about the available scripts and how to use them.

Additional Context

Including scripts in the pip package will make it easier for users to access and utilize the package's functionality directly from the command line. This feature aligns with the project's goal of providing a user-friendly and comprehensive solution for its users.

Implement Same ITK Image Spacing Function

Summary

Currently, in the study processing source file, the fix adjacent volumes method contains logic to determine whether two ITK images are in the same space. This logic would be a useful utility function and should be added to the utility function file for potential future use.
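
One way such a utility could look is sketched below. This is not the logic currently in the fix adjacent volumes method, which is not shown here. The ImageGeometry container, the function names, and the tolerance are all hypothetical. With ITK images, the four fields would typically be read from GetLargestPossibleRegion().GetSize(), GetSpacing(), GetOrigin(), and GetDirection(); the sketch uses plain sequences so that it runs without ITK installed.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ImageGeometry:
    """Hypothetical container for the values that define an image's physical space."""

    size: Sequence[int]
    spacing: Sequence[float]
    origin: Sequence[float]
    direction: Sequence[float]  # direction cosine matrix, flattened row by row


def _all_close(a: Sequence[float], b: Sequence[float], tolerance: float) -> bool:
    """Element-wise comparison using an absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=tolerance) for x, y in zip(a, b)
    )


def in_same_space(
    first: ImageGeometry, second: ImageGeometry, tolerance: float = 1e-3
) -> bool:
    """Return True when both geometries agree in size, spacing, origin, direction."""
    return (
        tuple(first.size) == tuple(second.size)
        and _all_close(first.spacing, second.spacing, tolerance)
        and _all_close(first.origin, second.origin, tolerance)
        and _all_close(first.direction, second.direction, tolerance)
    )
```

Comparing with a tolerance rather than exact equality avoids false mismatches caused by floating-point rounding in header values; the right tolerance depends on the data and is an assumption here.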

Additional Notes

N/A

Add Sphinx Documentation

Is this feature request related to a problem? Please describe.

Currently, the classifier has no documentation pages. This can cause problems for new users and developers who would like to use the classifier but have no user-friendly way to learn how.

Alternatives Considered If Applicable

N/A

Rationale

Without documentation explaining how the classifier works, there is little other developers or users can do to understand and use this tool. This is why it is imperative we add documentation for the sake of users and developers alike.

Implementation Ideas

  • Using Sphinx, HTML documentation can be built fairly easily and automatically
  • Linking the documentation from the PyPI page will make it easier for users to find the necessary help

Additional Context

N/A

Classifier Fails On Organizing Volumes By Acquisition Times

Describe the bug:

When the classifier organizes a DICOM series in which one volume has its acquisition time represented as a string, while the default value used when AcquisitionTime is unknown is an integer, a TypeError is thrown and the classifier crashes, because a string cannot be compared to an integer.

To Reproduce:

Steps to reproduce the behavior:

  1. Input a dicom session into the classifier
  2. Ensure at least one volume has no acquisition time
  3. Run the classifier

Expected behavior:

The classifier should not crash and the comparison should be done without throwing a TypeError

Actual Behavior:

The classifier cannot organize the dicom volumes and crashes due to a TypeError

Screenshots:

N/A

Environment:

  • OS: Ubuntu

Additional context:

N/A

Logs

WARNING: Required DICOM fields: ['PixelBandwidth', 'FlipAngle'] in /mnt/studies/0ea53997-3316e4dd-4db2f8e2-1462f4f4-4677d74c/study_zip_0ea53997-3316e4dd-4db2f8e2-1462f4f4-4677d74c/rrfMpqwWaUHkYGr/UXtRECYfHYabaXA/MR WBL Pasted Series/MR000000.dcm are missing or have invalid values.

Traceback (most recent call last):
  File "/home/botimagedev/VERSION_COMPARISON/process_version.py", line 95, in <module>
    run_one_filesystem_study(
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/botimageai/pipeline/filesystem_processing.py", line 65, in run_one_filesystem_study
    found_study_prostatid_dicom_volumes_dict = make_study_inputs_listing(
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/botimageai/util/util_support_lib.py", line 226, in make_study_inputs_listing
    ) = find_study_info_and_best(curren_directory, refr_dictionary_sequences)
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/botimageai/util/util_support_lib.py", line 271, in find_study_info_and_best
    ProstatIDDicomStudyToVolumesMapping(
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/botimageai/dicom_processing/process_one_dicom_study_to_volumes_mapping.py", line 92, in __init__
    super().__init__(
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/dcm_classifier/study_processing.py", line 167, in __init__
    self.__identify_single_volumes(self.study_directory)
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/dcm_classifier/study_processing.py", line 338, in __identify_single_volumes
    volumes_dictionary[sn].add_volume_to_series(subseries_info)
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/dcm_classifier/dicom_series.py", line 255, in add_volume_to_series
    self.organize_volumes()
  File "/home/botimagedev/.venv/pr534/lib/python3.10/site-packages/dcm_classifier/dicom_series.py", line 262, in organize_volumes
    sorted(
TypeError: '<' not supported between instances of 'str' and 'int'

Possible Fix

Modify the sanitization of the AcquisitionTime field to convert the value into a float. Ensure that the default value and the sanitized value have the same type.
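
The proposed fix can be sketched as follows. This is a minimal illustration and not the package's actual sanitization code: the function name, the constant name, and the default of -1.0 are assumptions. The point is that every value reaching the sort is a float, so string and default values can always be compared.

```python
from typing import Any

# Hypothetical default for an unknown acquisition time; it is a float so that
# it has the same type as successfully parsed values.
UNKNOWN_ACQUISITION_TIME: float = -1.0


def sanitize_acquisition_time(raw_value: Any) -> float:
    """Convert a raw AcquisitionTime value to float, or return the float default."""
    try:
        return float(raw_value)
    except (TypeError, ValueError):
        # None, empty strings, and non-numeric text all fall back to the default.
        return UNKNOWN_ACQUISITION_TIME


def sort_by_acquisition_time(raw_values: list) -> list:
    """Sort raw values by their sanitized time without mixing str and int."""
    return sorted(raw_values, key=sanitize_acquisition_time)
```

With this choice of default, volumes without an acquisition time sort before all others; whether that ordering is desirable is a design decision for the package.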

Develop Issue and Pull Request Templates

Description:

The current dcm-classifier repository lacks templates for creating issues and pull requests, which can lead to inconsistent or incomplete information provided by contributors. Implementing standardized templates for issues and pull requests will greatly improve the clarity and organization of communication within the repository.

Objective:

  • Standardization: Provide consistent structures for issue and pull request descriptions to ensure that all relevant information is included.
  • Clarity: Help contributors understand what information is required when submitting issues or pull requests.
  • Efficiency: Streamline the process of creating issues and pull requests by pre-filling common sections, such as description, steps to reproduce, and expected behavior.

Tasks:

  • Create Issue Template: Design a template for creating new issues. Include sections for a clear title, detailed description, steps to reproduce (if applicable), expected behavior, and any relevant screenshots or code snippets.

  • Develop Pull Request Template: Design a template for creating new pull requests. Include sections for a clear title, description of changes, related issues or pull requests, steps to test the changes, and any additional context necessary for review.

  • Testing: Test the templates by creating sample issues and pull requests to ensure that the provided structure is intuitive and effective.

  • Documentation: Update the repository's README or CONTRIBUTING file to provide guidance on how to use the new issue and pull request templates.

Deliverables:

  • Issue Template: ISSUE_TEMPLATE.md
  • Pull Request Template: PULL_REQUEST_TEMPLATE.md
  • Updated documentation reflecting the use of the new templates.

Importlib Metadata Bug

Describe the bug:

An ImportError occurs during local development when trying to import the package version using importlib.metadata.version("package_name"). The error occurs because the package metadata is not found for a package that is not yet installed in the development environment.

To Reproduce:

Steps to reproduce the behavior:

  1. Clone the repository to a local machine.
  2. Navigate to the project directory and set up a virtual environment.
  3. Without installing the package, run a script or test that imports the package version using importlib.metadata.version("package_name").
  4. See the ImportError: No package metadata was found for package_name.

Expected behavior:

The local development environment should handle the package version import gracefully, allowing developers to run and test the code without needing the package to be installed as a distribution.

Actual Behavior:

An ImportError is thrown, stating that no package metadata was found, which interrupts the development and testing process.

Screenshots:

Environment:
Linux Ubuntu 22.04

Additional context:

This issue impacts local development and testing workflows, particularly when the package is in a pre-release state and not yet installed as an editable or distributed package.

Possible Fix

A potential solution is to wrap the importlib.metadata.version call in a try-except block, defaulting to a development version string (e.g., "0.0.0-dev") in case of a PackageNotFoundError. This allows local development to proceed without requiring the package to be installed.
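
A minimal sketch of this fallback is shown below. The function name and the constant name are illustrative assumptions; importlib.metadata.version and PackageNotFoundError are part of the Python standard library, and the "0.0.0-dev" string is the example value from the description above.

```python
from importlib.metadata import PackageNotFoundError, version

# Fallback used when the distribution is not installed (example value).
DEVELOPMENT_VERSION = "0.0.0-dev"


def get_package_version(distribution_name: str) -> str:
    """Return the installed version, or a development placeholder if not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        # Raised when no metadata exists, e.g. when running from a plain clone.
        return DEVELOPMENT_VERSION
```

Because PackageNotFoundError is a subclass of ImportError, catching the narrower exception handles the reported error without hiding unrelated import failures.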

Add Dependency Installation In Package Pip Install

Is this feature request related to a problem? Please describe.

Currently, when installing a pip package, only the package itself is installed without considering its dependencies. This can lead to issues when users try to use the package without having the necessary dependencies installed. This feature request aims to address this problem by implementing the ability to install the minimum requirements for the pip package when it is installed using pip.

Alternatives Considered If Applicable

One alternative solution could be to manually specify the minimum requirements for the package in the package documentation and prompt users to install them separately. However, this approach may lead to confusion and inconvenience for users, especially those who are not familiar with the package dependencies.

Rationale

Implementing the ability to install the minimum requirements for the pip package when installed using pip would greatly improve the user experience and reduce potential issues caused by missing dependencies. By automatically installing the necessary dependencies along with the package, users can start using the package immediately without having to manually install additional dependencies.

Implementation Ideas

One possible implementation idea is to include the minimum requirements for the package in the package metadata (e.g., setup.py or pyproject.toml file). When users install the package using pip, the installation process would automatically detect and install the specified dependencies along with the package.

Another approach could involve modifying the pip installer to recognize and install the minimum requirements specified by the package. This would require changes to the pip codebase to implement the new feature.

Additional Context

Ensuring that the minimum requirements for the pip package are installed along with the package itself would streamline the installation process for users and improve the usability of the package. This feature aligns with the goal of enhancing user experience and reducing potential issues related to missing dependencies.

Develop Issue and Pull Request Templates

Description:

The current dcm-classifier repository lacks templates for creating issues and pull requests, which can lead to inconsistent or incomplete information provided by contributors. Implementing standardized templates for issues and pull requests will greatly improve the clarity and organization of communication within the repository.

Objective:

  • Standardization: Provide consistent structures for issue and pull request descriptions to ensure that all relevant information is included.
  • Clarity: Help contributors understand what information is required when submitting issues or pull requests.
  • Efficiency: Streamline the process of creating issues and pull requests by pre-filling common sections, such as description, steps to reproduce, and expected behavior.

Tasks:

  • Create Issue Template: Design a template for creating new issues. Include sections for a clear title, detailed description, steps to reproduce (if applicable), expected behavior, and any relevant screenshots or code snippets.

  • Develop Pull Request Template: Design a template for creating new pull requests. Include sections for a clear title, description of changes, related issues or pull requests, steps to test the changes, and any additional context necessary for review.

  • Testing: Test the templates by creating sample issues and pull requests to ensure that the provided structure is intuitive and effective.

  • Documentation: Update the repository's README file to provide guidance on how to use the new issue and pull request templates.

Deliverables:

  • Templates for bug reports, chores, documentation improvements, feature requests, user stories, and pull requests.
  • Updated documentation reflecting the use of the new templates.

Include Model File in Package

Is this feature request related to a problem? Please describe.

This feature request aims to address the problem of packaging a machine learning model file within the Python package. Currently, the absence of a model file limits the usability and portability of the package, as users need to separately obtain and load the model file to utilize the package's functionality. By including a pre-trained model file within the package, users can easily install the package and immediately start using the provided model for inference tasks without additional manual steps.

Alternatives Considered If Applicable

N/A

Rationale

The addition of a model file to the pip package offers several benefits:

  • Enhanced Usability: Simplifies the user experience by providing a complete solution within the package, eliminating the need for users to locate and download the model file separately.
  • Improved Portability: Enables users to deploy the package in different environments without worrying about managing external dependencies, thus enhancing the portability and reproducibility of their workflows.
  • Streamlined Installation: Facilitates a seamless installation process for users, reducing setup time and potential errors associated with manual configuration steps.

Implementation Ideas

  • Identify the appropriate machine learning model file to include in the package. This could be a pre-trained model file serialized in a common format such as HDF5, Pickle, or ONNX.
  • Ensure that the model file is compatible with the package's functionality and can be easily loaded and utilized by users.
  • Update the package's setup script to include the model file in the distribution package generated by setuptools or Poetry.
  • Provide clear documentation and usage examples demonstrating how users can access and utilize the included model file within the package.
  • Consider implementing versioning or compatibility checks to ensure that the included model file matches the package version and is compatible with future updates.
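
Once a model file is shipped inside the package, it can be located at runtime through the standard importlib.resources module instead of a hard-coded filesystem path. The sketch below shows that lookup under stated assumptions: the helper name is hypothetical, and the actual package layout and model file location are not specified here.

```python
from importlib.resources import files


def read_packaged_file(package: str, *relative_parts: str) -> bytes:
    """Read a data file bundled inside an importable package."""
    resource = files(package)
    # Walk down one path component at a time from the package root.
    for part in relative_parts:
        resource = resource / part
    if not resource.is_file():
        raise FileNotFoundError(
            f"{'/'.join(relative_parts)} is not bundled with package {package!r}"
        )
    return resource.read_bytes()
```

A call such as read_packaged_file("dcm_classifier", "models", "ova_rf_classifier.onnx") would then return the model bytes, assuming the model were stored at that location inside the installed package; that path is a guess for illustration only.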

Additional Context

Including a model file within the pip package enhances the overall user experience and makes the package more accessible to a wider audience. By bundling the model file with the package, users can seamlessly integrate machine learning capabilities into their Python applications, accelerating development and deployment workflows. Additionally, providing a pre-trained model file within the package aligns with the project's goal of delivering comprehensive and user-friendly solutions to its users.
