Conversion of medical images to MHA and TIFF.

License: Apache License 2.0

panimg

NOT FOR CLINICAL USE

Conversion of medical images to MHA and TIFF. Requires Python 3.8, 3.9, 3.10 or 3.11. libvips-dev and libopenslide-dev must be installed on your system.

Under the hood we use:

SimpleITK
pydicom
pylibjpeg
Pillow
openslide-python
pyvips
oct-converter
wsidicom

Usage

panimg takes a directory and tries to convert the files it contains to MHA or TIFF. By default, it also converts files in subdirectories; to convert only the files in the top-level directory, set recurse_subdirectories to False. panimg tries several strategies for loading the files, and when an image is found it is written to the output directory. convert returns a structure describing which images were produced, which files were used to form each new image, the image metadata, and any errors raised by the strategies.

NOTE: Alpha software, do not run this on directories you do not have a backup of.

from pathlib import Path
from panimg import convert

result = convert(
    input_directory=Path("/path/to/files/"),
    output_directory=Path("/where/files/will/go/"),
)

Command Line Interface

panimg is also accessible from the command line. After installing the package with pip, you can use:

NOTE: Alpha software, do not run this on directories you do not have a backup of.

panimg convert /path/to/files/ /where/files/will/go/

To access the help text you can use panimg -h.

Supported Formats

Input	Output	Strategy	Notes
`.mha`	`.mha`	`metaio`
`.mhd` with `.raw` or `.zraw`	`.mha`	`metaio`
`.dcm`	`.mha`	`dicom`
`.nii`	`.mha`	`nifti`
`.nii.gz`	`.mha`	`nifti`
`.nrrd`	`.mha`	`nrrd`	¹
`.e2e`	`.mha`	`oct`	²
`.fds`	`.mha`	`oct`	²
`.fda`	`.mha`	`oct`	²
`.png`	`.mha`	`fallback`	³
`.jpeg`	`.mha`	`fallback`	³
`.tiff`	`.tiff`	`tiff`
`.svs` (Aperio)	`.tiff`	`tiff`
`.vms`, `.vmu`, `.ndpi` (Hamamatsu)	`.tiff`	`tiff`
`.scn` (Leica)	`.tiff`	`tiff`
`.mrxs` (MIRAX)	`.tiff`	`tiff`
`.biff` (Ventana)	`.tiff`	`tiff`
`.dcm` (DICOM-WSI)	`.tiff`	`tiff`

1: Detached headers are not supported.

2: Only OCT volume(s), no fundus image(s) will be extracted.

3: 2D only, unitary dimensions

Post Processors

You can also define a set of post processors that will operate on each output file. Post processors do not produce any new image entities; instead they add additional representations of an image, such as DZI or thumbnails. We provide a dzi_to_tiff post processor that is enabled by default, which will produce a DZI file if it is able to. To customise which post processors run, pass them to convert:

result = convert(..., post_processors=[...])

You are able to run the post processors directly with

from panimg import post_process
from panimg.models import PanImgFile

result = post_process(image_files={PanImgFile(...), ...}, post_processors=[...])

Using Strategies Directly

Each strategy returns a generator of images for a set of files. To run a particular strategy directly:

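from pathlib import Path

# Import paths assumed from the package layout
from panimg.exceptions import UnconsumedFilesException
from panimg.image_builders.dicom import image_builder_dicom

files = {f for f in Path("/foo/").glob("*.dcm") if f.is_file()}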

try:
    for result in image_builder_dicom(files=files):
        sitk_image = result.image
        process(sitk_image)  # etc. you can also look at result.name for the name of the file,
                             # and result.consumed_files to see what files were used for this image
except UnconsumedFilesException as e:
    # e.file_errors is keyed with a Path to a file that could not be consumed,
    # with a list of all the errors found with loading it,
    # the user can then choose what to do with that information
    ...

rse-panimg's Issues

install openslide on ubuntu 18.04

Hi.

I want to install your package with:

pip install panimg

This gets me an error while installing openslide-python.

Is there a way to overcome this?

Error :

root@08fb6f42b3c8:/workspace# pip install openslide-python
Collecting openslide-python
  Using cached openslide-python-1.1.2.tar.gz (316 kB)
Requirement already satisfied: Pillow in /opt/conda/lib/python3.8/site-packages (from openslide-python) (8.1.0)
Building wheels for collected packages: openslide-python
  Building wheel for openslide-python (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-_51k9nyq/openslide-python/setup.py'"'"'; __file__='"'"'/tmp/pip-install-_51k9nyq/openslide-python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-9sos8urx
       cwd: /tmp/pip-install-_51k9nyq/openslide-python/
  Complete output (17 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/openslide
  copying openslide/__init__.py -> build/lib.linux-x86_64-3.8/openslide
  copying openslide/_version.py -> build/lib.linux-x86_64-3.8/openslide
  copying openslide/deepzoom.py -> build/lib.linux-x86_64-3.8/openslide
  copying openslide/lowlevel.py -> build/lib.linux-x86_64-3.8/openslide
  running build_ext
  building 'openslide._convert' extension
  creating build/temp.linux-x86_64-3.8
  creating build/temp.linux-x86_64-3.8/openslide
  gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/include/python3.8 -c openslide/_convert.c -o build/temp.linux-x86_64-3.8/openslide/_convert.o
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for openslide-python
  Running setup.py clean for openslide-python
Failed to build openslide-python
Installing collected packages: openslide-python
    Running setup.py install for openslide-python ... error
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-_51k9nyq/openslide-python/setup.py'"'"'; __file__='"'"'/tmp/pip-install-_51k9nyq/openslide-python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-zl_tqq6l/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/openslide-python
         cwd: /tmp/pip-install-_51k9nyq/openslide-python/
    Complete output (17 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/openslide
    copying openslide/__init__.py -> build/lib.linux-x86_64-3.8/openslide
    copying openslide/_version.py -> build/lib.linux-x86_64-3.8/openslide
    copying openslide/deepzoom.py -> build/lib.linux-x86_64-3.8/openslide
    copying openslide/lowlevel.py -> build/lib.linux-x86_64-3.8/openslide
    running build_ext
    building 'openslide._convert' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/openslide
    gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/include/python3.8 -c openslide/_convert.c -o build/temp.linux-x86_64-3.8/openslide/_convert.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-_51k9nyq/openslide-python/setup.py'"'"'; __file__='"'"'/tmp/pip-install-_51k9nyq/openslide-python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-zl_tqq6l/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/openslide-python Check the logs for full command output.

Unknown colour space exception

We do not have any source images for this but on Grand Challenge we often see this error:

File "/opt/poetry/.venv/lib/python3.8/site-packages/panimg/panimg.py", line 40, in convert
_convert_directory(
File "/opt/poetry/.venv/lib/python3.8/site-packages/panimg/panimg.py", line 116, in _convert_directory
builder_result = _build_files(
File "/opt/poetry/.venv/lib/python3.8/site-packages/panimg/panimg.py", line 148, in _build_files
for result in builder(files=files):
File "/opt/poetry/.venv/lib/python3.8/site-packages/panimg/image_builders/metaio_mhd_mha.py", line 110, in image_builder_mhd
yield SimpleITKImage(
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
raise validation_error
pydantic.error_wrappers.ValidationError: 1 validation error for SimpleITKImage
image
Unknown color space for MetaIO image: 112 (type=value_error)

Allow non-pathology tiff file conversion

Enface ophthalmology images are regularly exported from the machines as .TIF files. When these are handled by panimg they will be interpreted as pathology images. It would be nice if we could somehow differentiate between "normal" and "pathology" TIFF files and convert the "normal" TIFF files to MetaIO format so that handling of these files works correctly on grand-challenge and in the workstations.

@miriam-groeneveld do you know if there are some special headers in the pathology TIFF files that we can use to recognize these? I will check the same for the ophthalmology images.

Processing 4D MHA results in mangled TransformMatrix

When uploading a 4D MHA file, the header TransformMatrix is being mangled. This results in incorrect voxel-to-world placement.

Pre-upload header:
TransformMatrix = 0.99984952927629189 0.0019038487920032977 0.017242220441539672 0 4.97709998215079e-11 0.99395913200852193 -0.1097508264063002 0 -0.017347011442247363 0.10973431212088146 0.99380957025849759 0 0 0 0 1

Post-upload header:
TransformMatrix = 0.99984952927629189 4.97709998215079e-11 -0.017347011442247363 0 0.0019038487920032977 0.99395913200852193 0.10973431212088146 0 0.017242220441539672 -0.1097508264063002 0.99380957025849759 0 0 0 0 1

PRE (all headers)

ObjectType = Image
NDims = 4
BinaryData = True
BinaryDataByteOrderMSB = False
CompressedData = True
CompressedDataSize = 62117126
TransformMatrix = 0.99984952927629189 0.0019038487920032977 0.017242220441539672 0 4.97709998215079e-11 0.99395913200852193 -0.1097508264063002 0 -0.017347011442247363 0.10973431212088146 0.99380957025849759 0 0 0 0 1
Offset = -94.202199941380002 -65.439647421952998 -35.101142521466997 0
CenterOfRotation = 0 0 0 0
ElementSpacing = 0.85714286565780995 0.85714286565780995 2.9999999999999916 1
DimSize = 224 224 26 45
AnatomicalOrientation = ????
ElementType = MET_USHORT
ElementDataFile = LOCAL

POST (all headers)

Postprocessed
ObjectType = Image
NDims = 4
BinaryData = True
BinaryDataByteOrderMSB = False
CompressedData = True
CompressedDataSize = 62117126
TransformMatrix = 0.99984952927629189 4.97709998215079e-11 -0.017347011442247363 0 0.0019038487920032977 0.99395913200852193 0.10973431212088146 0 0.017242220441539672 -0.1097508264063002 0.99380957025849759 0 0 0 0 1
Offset = -94.202199941380002 -65.439647421952998 -35.101142521466997 0
CenterOfRotation = 0 0 0 0
ElementSpacing = 0.85714286565780995 0.85714286565780995 2.9999999999999916 1
DimSize = 224 224 26 45
AnatomicalOrientation = ????
ElementType = MET_USHORT
ElementDataFile = LOCAL
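
Comparing the two headers suggests that the matrix is written out transposed: each row of the post-upload TransformMatrix equals the corresponding column of the pre-upload one. A minimal check in plain Python, using only the values quoted above (the helper names `parse_matrix` and `transpose` are illustrative and not part of panimg):

```python
# Values copied from the pre- and post-upload headers in this report.
PRE = (
    "0.99984952927629189 0.0019038487920032977 0.017242220441539672 0 "
    "4.97709998215079e-11 0.99395913200852193 -0.1097508264063002 0 "
    "-0.017347011442247363 0.10973431212088146 0.99380957025849759 0 "
    "0 0 0 1"
)
POST = (
    "0.99984952927629189 4.97709998215079e-11 -0.017347011442247363 0 "
    "0.0019038487920032977 0.99395913200852193 0.10973431212088146 0 "
    "0.017242220441539672 -0.1097508264063002 0.99380957025849759 0 "
    "0 0 0 1"
)


def parse_matrix(text, n=4):
    """Split a MetaIO TransformMatrix value into an n x n list of rows."""
    values = [float(v) for v in text.split()]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, got {len(values)}")
    return [values[i * n:(i + 1) * n] for i in range(n)]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


pre = parse_matrix(PRE)
post = parse_matrix(POST)

# True when the post-upload matrix is exactly the transpose of the original.
is_transposed = transpose(pre) == post
print(is_transposed)
```

This only characterises the symptom; where in the read or write path the transposition happens is not established by the headers alone.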

changing title of mha file

How can I change the title of the MHA file it produces (to the name of the original file)? Additionally, I would like to change the folder that the new MHA file is placed in (or ideally, eliminate the folder completely and put all new MHA files directly into the output directory).

I tried to change the code of the panimg Python file, specifically the builder function, but the package stopped working after that.
Is there a built-in way to achieve this?

Aperio image converted to tiff gets color encoding YCbCr, even when original is RGB

Option to disable header validation

We sometimes get data with invalid headers, often caused by anonymization software. For example, in DICOM files, fields like PatientBirthDate are often replaced by a string like "ANONYMIZED". That is invalid DICOM in principle, but we still need to be able to read those files (mainly outside of Grand Challenge).

It would be useful to be able to turn off header validation, or to have panimg ignore (=remove) invalid headers instead of rejecting the image.
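
The "ignore (= remove) invalid headers" variant could look roughly like the sketch below. It uses a plain dict as a stand-in for a DICOM dataset and checks date fields against the DICOM DA format (YYYYMMDD). The function names and the list of fields are assumptions made for illustration, not panimg behaviour.

```python
from datetime import datetime

# Illustrative selection of date-valued fields; a real list would be longer.
DATE_FIELDS = ("PatientBirthDate", "StudyDate")


def is_valid_da(value):
    """Return True if value is a full DICOM DA date (YYYYMMDD)."""
    if not isinstance(value, str) or len(value) != 8:
        return False
    if not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def strip_invalid_dates(header):
    """Return a copy of header without invalid date fields, plus their names."""
    cleaned = dict(header)
    removed = []
    for name in DATE_FIELDS:
        if name in cleaned and not is_valid_da(cleaned[name]):
            removed.append(name)
            del cleaned[name]
    return cleaned, removed


cleaned, removed = strip_invalid_dates(
    {"PatientBirthDate": "ANONYMIZED", "StudyDate": "20200131", "Modality": "CT"}
)
print(cleaned, removed)
```

Returning the list of removed fields lets the caller report what was dropped instead of silently discarding it. For simplicity this sketch also treats empty values as invalid.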

`nifti` and `nrrd` readers are not doing any header validation

See comic/grand-challenge.org#1376 (comment)

TypeError: WriteImage() got an unexpected keyword argument 'image'

What could cause this error?

Traceback (most recent call last):

Cell In[303], line 1
result = convert(

File C:\MyPrograms\Anaconda\Lib\site-packages\panimg\panimg.py:33 in convert
_convert_directory(

File C:\MyPrograms\Anaconda\Lib\site-packages\panimg\panimg.py:104 in _convert_directory
builder_result = _build_files(

File C:\MyPrograms\Anaconda\Lib\site-packages\panimg\panimg.py:136 in _build_files
n_image, n_image_files = result.save(

File C:\MyPrograms\Anaconda\Lib\site-packages\panimg\models.py:380 in save
WriteImage(

TypeError: WriteImage() got an unexpected keyword argument 'image'

.e2e file support

Placeholder issue for July cycle pitch: https://github.com/DIAGNijmegen/rse-roadmap/issues/63

TIFF importer modifies the source directory

When running this, the source directory should be treated as read-only. The TIFF importer currently produces TIFF files for mrxs files alongside the source images. They should be created in the target directory.

Error when window center and width are arrays in MHA

For example, WindowCenter = [60, 60], and the same for window width. This is perfectly valid, so it should be accepted.

See comic/grand-challenge.org#1376

Improve support for Enhanced DICOM

Reading enhanced DICOM where all slices are stored in a single .dcm file does not work correctly yet. For example, the spacing between slices is not computed correctly.

ZeroDivisionError in dicom.py

An uncaught ZeroDivisionError can occur at line 173 of dicom.py. @nlessmann, could you solve this as part of #71?

  File "panimg/panimg.py", line 90, in _convert_directory
    _convert_directory(
  File "panimg/panimg.py", line 116, in _convert_directory
    builder_result = _build_files(
  File "panimg/panimg.py", line 148, in _build_files
    for result in builder(files=files):
  File "panimg/image_builders/dicom.py", line 434, in image_builder_dicom
    studies = _validate_dicom_files(files=files, file_errors=file_errors)
  File "panimg/image_builders/dicom.py", line 173, in _validate_dicom_files
    elif len(headers) % n_time > 0:
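
The traceback ends in a modulo by n_time, so the error is raised when n_time is 0. One possible guard is sketched below; `frames_divide_evenly` is a hypothetical helper, and how n_time is derived is not shown in the traceback.

```python
def frames_divide_evenly(n_headers, n_time):
    """Check that the slices split evenly over the time points.

    Raises ValueError instead of ZeroDivisionError when n_time is unusable,
    so the caller can record a per-file error.
    """
    if n_time is None or n_time < 1:
        raise ValueError(f"Invalid number of time points: {n_time!r}")
    return n_headers % n_time == 0


print(frames_divide_evenly(10, 5))
```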

DICOM importer does not check for missing slices / inconsistent slice spacing

See comic/grand-challenge.org#1301

Unhandled TypeError

TypeError

int() argument must be a string, a bytes-like object or a number, not 'NoneType'

rse-panimg/panimg/image_builders/dicom.py

Line 414 in 4e2a160

key=lambda x: int(x["data"].InstanceNumber)
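
The error occurs when InstanceNumber is None, because int(None) raises a TypeError. A sketch of a more tolerant sort key follows. It uses SimpleNamespace objects as stand-ins for pydicom datasets, and `instance_number_key` is a hypothetical name. Whether such files should be sorted last or rejected is a design decision for the library.

```python
from types import SimpleNamespace


def instance_number_key(item):
    """Sort key that places entries without a usable InstanceNumber last."""
    value = getattr(item["data"], "InstanceNumber", None)
    try:
        return (0, int(value))
    except (TypeError, ValueError):
        # Missing, None, or non-numeric InstanceNumber
        return (1, 0)


headers = [
    {"file": "c.dcm", "data": SimpleNamespace(InstanceNumber="3")},
    {"file": "x.dcm", "data": SimpleNamespace(InstanceNumber=None)},
    {"file": "a.dcm", "data": SimpleNamespace(InstanceNumber=1)},
]

ordered = [h["file"] for h in sorted(headers, key=instance_number_key)]
print(ordered)
```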

Make vips and openslide optional dependencies

The library currently requires vips and openslide to process pathology images, but gdcm for processing certain DICOM files is optional. It's now not possible to import only the DICOM image builder because the __init__.py in image_builders imports all image builders, which loads the openslide package, and that crashes if openslide is not installed. It would be useful if vips and openslide could become optional dependencies that are only needed when reading certain images, similar to the gdcm dependency.
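
A common pattern for this is to defer the import until the functionality is actually requested. The sketch below uses importlib; `require_optional` is a hypothetical helper and not part of panimg. Catching OSError as well as ImportError is an assumption, since some bindings fail that way when the native library is missing.

```python
import importlib


def require_optional(module_name, purpose):
    """Import module_name on demand, with a clear error if it is unavailable."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError) as error:
        raise ImportError(
            f"{module_name} is required for {purpose} but could not be loaded"
        ) from error


# Demonstration with a standard library module standing in for openslide.
json_module = require_optional("json", "the demonstration")
print(json_module.dumps([1]))
```

An image builder would then call the helper inside its own function body, so importing the DICOM builder alone never touches openslide or pyvips.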

I got "GDCM is unavailable"

From this line of the code I got:

GDCM is unavailable, conversion of DICOM with compressed transfer syntax will fail to convert. To correct this, install GDCM

What should I do?
I tried to do

panimg convert my_dat_and_mrxs_directory output.tiff

How do I install GDCM?

Add a CLI with click

Would be great to have panimg convert <src> <dest> and panimg --help available from the shell. Other options could follow later.

Add a `recurse_subdirectories: bool = True` kwarg to `convert`

I would like panimg to only traverse the top level directory. Add this kwarg to convert and _convert_directory, and check whether the subdirectory should be processed at rse-panimg/panimg/panimg.py, line 86 in faf0c34:

if o.is_dir():

Probably pass the kwarg down to the calls of _convert_directory too (although that is probably unnecessary). Also add a test with an image in a subdirectory and check that only the images from the top level directory are created.
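
The selection logic and the test layout described above can be sketched with pathlib alone. `list_input_files` is a hypothetical helper used for illustration; it is not how panimg walks directories.

```python
import tempfile
from pathlib import Path


def list_input_files(directory, recurse_subdirectories=True):
    """Return the files to consider, optionally ignoring subdirectories."""
    pattern = "**/*" if recurse_subdirectories else "*"
    return sorted(p for p in directory.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # One image at the top level and one in a subdirectory
    (root / "top.mha").touch()
    (root / "sub").mkdir()
    (root / "sub" / "nested.mha").touch()

    top_only = [
        p.name for p in list_input_files(root, recurse_subdirectories=False)
    ]
    everything = [p.name for p in list_input_files(root)]

print(top_only, everything)
```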

Slice spacing is not calculated correctly if slices are not axial

Unlike CT scans, which are always acquired as axial slices, MRI scans are often acquired in different planes, so the individual DICOM files may differ in their x or y coordinate rather than their z coordinate. However, in panimg the z coordinate is always used to determine the slice spacing, see https://github.com/DIAGNijmegen/rse-panimg/blob/main/panimg/image_builders/dicom.py#L216

Typically, the array avg_origin_diff will have only one non-zero value, so it might work to use that value instead of always using the third one. I am not sure about images that are slightly rotated, though, which is also possible with MRI.

If I currently try to read a sagittal MRI scan, the image builder fails because the slice spacing is calculated as 0.0 mm and SimpleITK complains about that.
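
One orientation-independent approach, which also covers rotated acquisitions, is to project each slice origin (ImagePositionPatient) onto the slice normal, obtained as the cross product of the row and column direction cosines in ImageOrientationPatient. The sketch below is plain Python under those assumptions; the function names are hypothetical and this is not panimg's implementation.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def slice_spacing(orientation, positions):
    """Mean distance between neighbouring slices along the slice normal.

    orientation: six direction cosines (ImageOrientationPatient)
    positions:   one (x, y, z) origin per slice (ImagePositionPatient)
    """
    normal = cross(orientation[:3], orientation[3:])
    distances = sorted(dot(normal, p) for p in positions)
    gaps = [b - a for a, b in zip(distances, distances[1:])]
    if not gaps:
        raise ValueError("need at least two slices")
    return sum(gaps) / len(gaps)


# Sagittal example: the slices differ only in x, so the z difference is 0.
sagittal = (0.0, 1.0, 0.0, 0.0, 0.0, -1.0)
positions = [(-10.0 + 3.0 * i, -100.0, 80.0) for i in range(5)]

spacing = slice_spacing(sagittal, positions)
print(spacing)
```

The result is in the same units as the positions, provided the direction cosines are unit length and orthogonal so that the normal is a unit vector.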

UnboundLocalError local variable 'simple_itk_image' referenced before assignment

panimg/image_builders/metaio_mhd_mha.py in image_builder_mhd at line 118

I think this happens when an uploaded MHA file fails metadata validation.

Add support for .e2e and .fds

See comic/grand-challenge.org#1689

Compressed MHA writing is slow

See if SimpleITK have added support for threading yet

comic/grand-challenge.org#1240

Warning when using Pydantic 2

/Users/jamesmeakin/Library/Caches/pypoetry/virtualenvs/grand-challenge-org-DgCYUgrQ-py3.11/lib/python3.11/site-packages/pydantic/_internal/_config.py:257: UserWarning: Valid config keys have changed in V2:

'allow_mutation' has been removed

Support parsing OME-TIFF format

It seems that the OME-TIFF format, which contains an OME-XML (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2005-6-5-r47) metadata model in the ImageDescription, is not parsed by panimg.

This metadata also contains physical dimensions (e.g. pixel size in micrometres). The 'regular' tags for this (i.e. XResolution/YResolution and ResolutionUnit) are either set to 0 or to "NONE".

The latter is a problem because panimg explicitly tries to read these to report on this metadata.
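
As a rough illustration of what parsing could involve, the physical pixel size is stored as attributes on the Pixels element of the OME-XML. The sketch below reads them with the standard library. The sample XML is made up for the example, `read_physical_size` is a hypothetical name, and falling back to µm when no unit is given is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up ImageDescription content for illustration only.
SAMPLE = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint8"
            SizeX="1024" SizeY="768" SizeC="3" SizeZ="1" SizeT="1"
            PhysicalSizeX="0.25" PhysicalSizeXUnit="µm"
            PhysicalSizeY="0.25" PhysicalSizeYUnit="µm"/>
  </Image>
</OME>"""


def read_physical_size(image_description):
    """Return (x, x_unit, y, y_unit) from the first Pixels element, or None."""
    root = ET.fromstring(image_description)
    for element in root.iter():
        # Compare the local tag name so any OME schema version matches.
        if element.tag.rsplit("}", 1)[-1] == "Pixels":
            x = element.get("PhysicalSizeX")
            y = element.get("PhysicalSizeY")
            if x is None or y is None:
                return None
            return (
                float(x),
                element.get("PhysicalSizeXUnit", "µm"),
                float(y),
                element.get("PhysicalSizeYUnit", "µm"),
            )
    return None


size = read_physical_size(SAMPLE)
print(size)
```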

TIFF importer mixes DZI creation from source

The TIFF importer mixes DZI and TIFF imports in the same round. They should be split, generating the DZIs from the generated TIFFs in the target directory. Doing this import should be optional.

Remove dependency on tifffile = "2019.1.4"

This dependency is pinned; with openslide I don't think it is needed any more.

Dicom importer should modify the imported pixel data if DICOM tags specify this

See comic/grand-challenge.org#1140

Naming of cases imported as dicom files

Currently, imported DICOM files are named after the studyUID (more precisely: [studyUID-0]).

When uploading two series of the same study (e.g. CT Chest and CT abdomen), these series get the same name. As the hanging list of the reader study requires unique names, I cannot add both series in my reader study.

For clarity, the hierarchical levels of dicom are: patient, study, series, image. Now, series seem to be named after the study.

A potential solution is to name the DICOM files after the seriesUID instead of the studyUID (I assume we generally review series in GC, not whole studies). Alternatively, the name could be numbered by adding the series number, e.g. [studyUID-seriesnumber], if you would like to group images of the same study by name.

Expand DICOM test suite

See comic/grand-challenge.org#1279

Cannot upload to PyPi

In #94 a direct dependency was introduced as the version of wsidicom on PyPi sets restrictive upper bounds on Python versions, which were resolved in imi-bigpicture/wsidicom#96 but are currently unreleased. This means that we cannot release new versions of panimg to PyPi until a new version of wsidicom is added there. For now, installs from git can be used by Grand Challenge, then we will need to cascade the releases up.