
innereye-createdataset's Introduction

This project is now archived

This project is no longer under active maintenance. It is read-only, but you can still clone or fork the repo. Check here for further info. Please contact [email protected] if you run into trouble with the "Archived" state of the repo.

Introduction

InnerEye-CreateDataset contains tools to convert medical datasets in DICOM-RT format to NIFTI. Datasets converted using this tool can be consumed directly by InnerEye-DeepLearning.

Among the core features of this tool are:

  • Resampling of the dataset to a common voxel size.
  • Renaming of ground truth structures
  • Making the structures in a dataset mutually exclusive (this is required by some loss functions in InnerEye-DeepLearning)
  • Creating empty structures if they are missing from the dataset
  • Discarding subjects that do not have all the required structures
  • Augmenting the dataset by combining multiple structures into one, via set operations (intersection, union)
  • Removing parts of structures that are lower/higher than other structures in terms of their z coordinate
  • Computing statistics of a dataset, to identify outliers and possible annotation errors

Installing

Git for Windows

Get the installer from Git for Windows

The installer will prompt you to "Select Components". Make sure that you tick the boxes for:

  • Git LFS (Large File Support).
  • Git Credential Manager for Windows.

After the installation, open a command prompt or the Git Bash:

  • Run git lfs install to set up the hooks in git
  • Run git config --global core.autocrlf true to ensure that line endings are working as expected

Clone the InnerEye-CreateDataset repository on your machine: Run git lfs clone --recursive https://github.com/microsoft/InnerEye-CreateDataset

Visual Studio

Install

You need an installation of Visual Studio 2019. If you have an existing installation, start the Visual Studio Installer and click "More..." -> "Modify".

In the "Workloads" section, the following items need to be selected:

  • .NET Development
  • Desktop development with C++

In the "Individual Components" section, make sure the following are ticked:

  • .NET:
    • .NET 6.0 Runtime
    • .NET Core 3.1 Runtime (Long Term Support)
    • Everything with .NET Framework 4.6.2 (and all higher framework versions for good measure)
    • .NET SDK
  • Compilers, build tools, and runtimes:
    • .NET Compiler Platform SDK
    • C++ 2019 Redistributable Update
    • MSVC v142 - VS 2019 C++ x64/x86 build tools (Latest)
    • C++ CMake tools for Windows
  • Debugging and testing:
    • C++ AddressSanitizer
    • C++ profiling tools
  • Development activities:
    • C++ core features
    • F# language support
  • SDKs, libraries and frameworks:
    • C++ ATL for latest v142 build tools (x86 & x64)
    • Windows 10.0.19041.0

In addition to the components listed above, other components may also be installed as part of the selected workloads.

Set Up CreateDataset solution

Open the Source\projects\CreateDataset.sln solution.

You will see a dialog box suggesting that you upgrade two C++ projects to the latest toolset. Choose NOT to upgrade.

Make sure that the required NuGet package sources are available for the solution:

  • Open Tools->NuGet Package Manager->Package Manager Settings

  • Choose NuGet Package Manager->Package Sources

  • Add the following sources to the list, if they are not there:

  • Select the above sources, and deselect others

Verify that all projects loaded correctly.

  • In the Visual Studio menu, make sure that "Test" / "Test Settings" / "Default Processor Architecture" is set to x64.
  • Build the solution ("Build" -> "Build Solution"). If it fails, build again.

To run tests: After the build, tests should be visible in the Test Explorer.

Convert a DICOM-RT dataset to NIFTI

To use the tool, you will need a DICOM-RT dataset containing the ground truth scans and RTSTRUCT files describing the ground truth segmentations. The folder structure should have the files for each subject in a separate folder. Inside each subject folder, the tool also searches all subdirectories for files.

Now, create a parent folder called, for example, datasets and place your DICOM-RT dataset folder inside. The folder structure should resemble the following:

* datasets
  * DICOM-RT dataset
    * subject 1
      * DICOM files for subject 1
    * subject 2
      * DICOM files for subject 2
    * ...

The simplest form of the command to run is

InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to directory holding all datasets> --niftiDatasetDirectory=<name of the folder to write to> --dicomDatasetDirectory=<name of dataset to be converted>
  • datasetRootDirectory is the path to a folder that holds one or more datasets.
  • dicomDatasetDirectory is the name of the folder, in datasetRootDirectory, with the DICOM-RT dataset.
  • niftiDatasetDirectory is the name of the folder to which the NIFTI dataset should be written. This folder will be created in datasetRootDirectory.
  • A commonly used switch is --geoNorm, which resamples the dataset to a common voxel size, given in millimeters for the x, y, and z dimensions, for example --geoNorm 1;1;2. A full example command is shown below.

A description of the major command-line options that control the dataset creation can be found here.
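For example, assuming the DICOM-RT data lives in a folder C:\datasets\ProstateDicom (all folder and dataset names here are illustrative), the following command converts it to a NIFTI dataset resampled to 1 x 1 x 2 mm voxels:

InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=C:\datasets --dicomDatasetDirectory=ProstateDicom --niftiDatasetDirectory=ProstateNifti --geoNorm 1;1;2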

Run Analysis on a converted dataset

To analyse a dataset, run

InnerEye.CreateDataset.Runner.exe analyze --datasetFolder=<full path to the NIFTI dataset folder to analyse>

This will create a folder called statistics inside the dataset folder with several csv files containing dataset statistics. A detailed explanation of the csv files is available here.
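As a quick way to inspect the output, a short Python snippet using pandas can load every CSV file in the generated statistics folder; the dataset path below is illustrative, and the individual file names are simply whatever the analyze command produced:

from pathlib import Path

import pandas as pd

# Path to the converted NIFTI dataset; adjust to your own location.
stats_dir = Path(r"C:\datasets\ProstateNifti") / "statistics"

# Load each statistics CSV produced by the analyze command and print a summary.
for csv_file in sorted(stats_dir.glob("*.csv")):
    df = pd.read_csv(csv_file)
    print(f"{csv_file.name}: {len(df)} rows, columns = {list(df.columns)}")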

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


innereye-createdataset's Issues

Improve Install Documentation

The following issues need correcting in the installation documentation:

  • git lfs is now deprecated and should be updated to use modern git commands
  • Emphasise more that VS 2017 is the version of VS that needs to be used.
  • Correct package name VS++ to VC++

Lack of documentation for command-line arguments

This tool contains many useful and powerful features, but none of these are documented. Documentation on how to use this tool to its full potential needs to be created to make it easier for our clinical partners to work with their datasets.

dataset.csv generated by this tool does not have the tags column

This tool does not add the "tags" column expected by InnerEye-DeepLearning, which generates this error:

File "InnerEyePrivate/ML/runner.py", line 50, in
main()
File "InnerEyePrivate/ML/runner.py", line 44, in main
runner.run(project_root=fixed_paths.repository_root_directory(),
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/innereye-deeplearning/InnerEye/ML/runner.py", line 457, in run
return runner.run()
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/innereye-deeplearning/InnerEye/ML/runner.py", line 220, in run
self.run_in_situ(azure_run_info)
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/innereye-deeplearning/InnerEye/ML/runner.py", line 411, in run_in_situ
self.ml_runner.setup(azure_run_info)
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/innereye-deeplearning/InnerEye/ML/run_ml.py", line 208, in setup
self.container.setup()
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/innereye-deeplearning/InnerEye/ML/lightning_base.py", line 160, in setup
dataset_splits = self.config.get_dataset_splits()
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/innereye-deeplearning/InnerEye/ML/model_config_base.py", line 214, in get_dataset_splits
splits = self.get_model_train_test_dataset_splits(dataset_df)
File "/mnt/batch/tasks/shared/LS_root/jobs/radiomicsnn/azureml/master_1674467245_aec08e9c/wd/azureml/master_1674467245_aec08e9c/InnerEyePrivate/ML/configs/segmentation/Prostate.py", line 23, in get_model_train_test_dataset_splits
test = list(dataset_df[dataset_df.tags.str.contains("ContinuousLearning")].subject.unique())
File "/azureml-envs/azureml_ee6e8ff99c839137094f58bdb42aca60/lib/python3.8/site-packages/pandas/core/generic.py", line 5130, in getattr
return object.getattribute(self, name)
AttributeError: 'DataFrame' object has no attribute 'tags'
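A minimal workaround, until the tool adds the column itself, is to append an empty "tags" column to the generated dataset.csv before training. This is only a sketch, not part of the tool; the file name dataset.csv comes from the issue title, and the empty-string tag value is an assumption:

import pandas as pd

# Load the dataset.csv produced by InnerEye.CreateDataset.Runner.exe.
dataset = pd.read_csv("dataset.csv")

# Add an empty "tags" column if it is missing, so that code reading
# dataset_df.tags does not fail. Real tag values (e.g. "ContinuousLearning")
# would have to be filled in per subject as needed.
if "tags" not in dataset.columns:
    dataset["tags"] = ""

dataset.to_csv("dataset.csv", index=False)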

Building ImageProcessingClr breaks with a linker error

2>ConnectedComponentsClr.obj : /DEBUG:FASTLINK is not supported when managed code is present; restarting link with /DEBUG:FULL
2>ImageProcessing.lib(GaussianKernel1D.obj) : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/OPT:LBR' specification
2>LINK : fatal error LNK1104: cannot open file 'MSCOREE.lib'

Possible resolution (from https://stackoverflow.com/questions/41030806/visual-studio-c-cli-mysterious-error-with-template): enable the "C++/CLI support" component in the Visual Studio installer, and possibly install the .NET 4.6.1 SDK.

Windows SDK version is incorrect

The README currently recommends selecting the Windows SDK 10.0.17134.0 when installing Visual Studio components. This causes the build process to fail. The correct version is 10.0.19041.0

InnerEye-CreateDataSet for Linux + other suggestions

Hello,
Is there an option to run the create-dataset utility in a Linux environment (through bash)? If so, where can I find documentation for this (e.g. how to build the binary from the source code)?

Suggestion 1: Give the user the option to "restructure" their NIFTI images and segmentations. The result would be a proper folder structure, naming scheme, and modifications (normalization etc.) of the NIFTI images and segmentations so that they adhere to the constraints of InnerEye-DeepLearning. Example (a restructuring sketch follows the folder listings below):

init_images/
├── HUSAH150_ANON0005_ax0.nii.gz (this is a head ct volume)
├── HUSAH150_ANON0005_ax0_seg.nii.gz (this is its corresponding lesion segmentation)
├── HUSAH150_ANON0008_ax0.nii.gz
├── HUSAH150_ANON0008_ax0_seg.nii.gz
├── HUSAH150_ANON0023_ax0.nii.gz
└── HUSAH150_ANON0023_ax0_seg.nii.gz

restruct_images/
├── subjectID1
│   ├── blood.nii.gz
│   └── head.nii.gz
├── subjectID2
│   ├── blood.nii.gz
│   └── head.nii.gz
└── subjectID3
    ├── blood.nii.gz
    └── head.nii.gz

Why: The user's initial image and segmentation sets will not necessarily be in DICOM format.
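A minimal sketch of the kind of restructuring described above, assuming the flat layout from init_images and the per-subject target names head.nii.gz (image) and blood.nii.gz (segmentation); the subject-ID pattern and the name mapping are assumptions and would need adapting to the real convention:

import re
import shutil
from pathlib import Path

# Flat source folder with pairs such as HUSAH150_ANON0005_ax0.nii.gz (image)
# and HUSAH150_ANON0005_ax0_seg.nii.gz (segmentation); names are illustrative.
src = Path("init_images")
dst = Path("restruct_images")

for image in sorted(src.glob("*_ax0.nii.gz")):
    # Assumed subject-ID pattern, e.g. "ANON0005"; adapt to the real naming.
    subject_id = re.search(r"ANON\d+", image.name).group(0)
    seg = image.with_name(image.name.replace(".nii.gz", "_seg.nii.gz"))
    subject_dir = dst / subject_id
    subject_dir.mkdir(parents=True, exist_ok=True)
    # Target file names follow the example above and are assumptions.
    shutil.copy(image, subject_dir / "head.nii.gz")
    if seg.exists():
        shutil.copy(seg, subject_dir / "blood.nii.gz")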

Suggestion 2: For the case where the image and segmentation sets are in DICOM, please consider supporting a proper DICOM Segmentation IOD for the segmentations (more details at http://dicom.nema.org/dicom/2013/output/chtml/part03/sect_A.51.html).
Why: RTSTRUCT is an old format that was created for radiation therapy. It stores planar contours (surfaces) instead of volumes. When the geometry of the segmented structure is complex, converting an RTSTRUCT IOD to a binary labelmap (as in a typical NIFTI segmentation, with value 0 for all voxels of no interest and 1 for all voxels of interest) leads to mistakes and degraded segmentation quality. That is why modern segmentation software that can segment DICOM images saves segmentations as SEG objects rather than RTSTRUCT.
That said, since a lot of segmentation material exists in the form of RTSTRUCT IODs, this utility's RTSTRUCT-to-NIFTI converter will certainly still be beneficial.
