bodenmillergroup / imctools

Tools to handle IMC data

Home Page: https://bodenmillergroup.github.io/imctools

License: MIT License

Makefile 0.94% Python 99.06%

tiff mcd ome-tiff imc-data imc

imctools's Introduction

imctools

Deprecation note: This repository is not actively maintained. For a maintained IMC file parser implementation please refer to readimc.

Background

An IMC file conversion tool that converts IMC raw data files (.mcd, .txt) into an intermediary ome.tiff containing all the relevant metadata. It also contains tools to generate simpler TIFF files that can be used directly as input files for e.g. CellProfiler, Ilastik, Fiji, etc.

Documentation is available at https://bodenmillergroup.github.io/imctools

Requirements

This package requires Python 3.7 or later.

Using virtual environments is strongly recommended.

Installation

Install imctools and its dependencies with:

pip install imctools

Usage

See Quickstart

Authors

Created and maintained by Vito Zanotelli and Anton Rau

Contributing

Changelog

License

MIT

imctools's Issues

Saving acquisitions with zero shape's dimension

The convertfolder2imcfolder script fails when an acquisition numpy array with a zero-sized dimension is passed to tifffile.

Support for pathlib

It would be good if the library would support pathlib Path instances for file paths (and not just strings)
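A minimal sketch of how both input types could be accepted, assuming the parsers only need a filesystem path internally. `normalize_path` is a hypothetical helper, not part of imctools.

```python
import os
from pathlib import Path
from typing import Union

PathLike = Union[str, os.PathLike]


def normalize_path(filename: PathLike) -> Path:
    """Return a Path for either a str or a pathlib.Path (hypothetical helper)."""
    # os.fspath accepts str and os.PathLike objects and raises TypeError
    # for unrelated types such as int.
    return Path(os.fspath(filename))
```

Calling code could then use `normalize_path(filename)` once at the entry point and work with `Path` objects afterwards.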

Better deal with empty names in OME.TIFFs

Currently the name is left empty in the ome.tiff if it was empty in the IMC file.
Perhaps it would be better to use the fluor (=Metal) name as name, if no name was provided.
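The suggested fallback could look roughly like this. `channel_name` is a hypothetical helper, and treating whitespace-only names as empty is an assumption.

```python
def channel_name(label, metal):
    """Return the label, or the metal name if the label is missing or blank."""
    # Assumption: None, "" and whitespace-only labels all count as "no name".
    if label is None or not label.strip():
        return metal
    return label
```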

`omefolder2micatfolder` fails if an antibody name has a backslash

The antibody label needs to be stripped of special characters before it is used in a filename.
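A sketch of one possible sanitizer, assuming that only letters, digits, dots, dashes and underscores should survive. `sanitize_label` and the "unnamed" fallback are hypothetical and not taken from imctools.

```python
import re


def sanitize_label(label: str, replacement: str = "_") -> str:
    """Make an antibody label safe to use as part of a filename."""
    # Replace every run of characters outside the allowed set.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", replacement, label)
    # Avoid leading/trailing replacement characters and empty results.
    return cleaned.strip(replacement) or "unnamed"
```

With this, a label such as `CD8\a` becomes `CD8_a`, so it no longer introduces a path separator on Windows.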

Allow in-memory extraction of slide/panorama/ablation images

Currently, slide/panorama/ablation images stored in the MCD files can only be extracted to disk. It would be desirable to give users the option to write them to custom output, e.g. StringIO, or allow direct binary data extraction (similar to _get_buffer). This would enable in-memory processing without a need for temporary files on disk, which is quite important. Ideally, one would also provide a function that tries to automatically convert the embedded images into numpy arrays (for supported data formats only), e.g. using PIL.

Wrong initialization of panorama coordinates

Consider the following code and output for a panorama instance:

x1, y1 = float(panorama.metadata['SlideX1PosUm']), float(panorama.metadata['SlideY1PosUm'])
x2, y2 = float(panorama.metadata['SlideX2PosUm']), float(panorama.metadata['SlideY2PosUm'])
x3, y3 = float(panorama.metadata['SlideX3PosUm']), float(panorama.metadata['SlideY3PosUm'])
x4, y4 = float(panorama.metadata['SlideX4PosUm']), float(panorama.metadata['SlideY4PosUm'])
x = min(x1, x2, x3, x4)
y = min(y1, y2, y3, y4)
width = max(x1, x2, x3, x4) - x
height = max(y1, y2, y3, y4) - y
print(x, y, width, height)
print(panorama.start_position_x, panorama.start_position_y, panorama.width, panorama.height)

-55.715 112.972 75000.0 25219.775999999998
-55.715 25332.748 75000.0 25219.775999999998

It looks like the current implementation assumes a specific order of the four points in the MCD Panorama metadata. However, to my knowledge, the order is not guaranteed. This might explain the discrepancies observed above.

Runtime error: dictionary keys changed during iteration

Hello,

I installed the imctools using pip and encountered the following error:

python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import imctools.io.mcdparser as mcdparser
>>> import imctools.io.txtparser as txtparser
>>> import imctools.io.ometiffparser as omeparser
>>> import imctools.io.mcdxmlparser as meta

>>> fn_mcd = '20171228_spleen315_500x500.mcd'
>>> mcd = mcdparser.McdParser(fn_mcd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdparser.py", line 31, in __init__
    McdParserBase.__init__(self, filename, filehandle, metafilename)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdparserbase.py", line 49, in __init__
    self.parse_mcd_xml()
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdparserbase.py", line 160, in parse_mcd_xml
    self._meta = McdXmlParser(self.xml)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdxmlparser.py", line 244, in __init__
    meta = libb.dict_key_apply(meta, libb.strip_ns)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/librarybase.py", line 47, in dict_key_apply
    iterable[new_key] = dict_key_apply(iterable[new_key], str_fkt)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/librarybase.py", line 47, in dict_key_apply
    iterable[new_key] = dict_key_apply(iterable[new_key], str_fkt)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/librarybase.py", line 43, in dict_key_apply
    for key in iterable.keys():
RuntimeError: dictionary keys changed during iteration

Can you help to fix this error? Thanks.
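The traceback ends in a loop over `iterable.keys()` inside `dict_key_apply` while keys are being reassigned, which Python 3.8 rejects. One way around this, sketched under the assumption that the helper only needs to rename keys recursively, is to build new containers instead of mutating the dict in place. The function body and the `strip_ns` stand-in below are assumptions, not the imctools source.

```python
def dict_key_apply(obj, key_func):
    """Recursively apply key_func to all dict keys, returning new containers."""
    if isinstance(obj, dict):
        # A new dict is built, so the original is never modified mid-iteration.
        return {key_func(k): dict_key_apply(v, key_func) for k, v in obj.items()}
    if isinstance(obj, list):
        return [dict_key_apply(item, key_func) for item in obj]
    return obj


def strip_ns(key):
    """Hypothetical stand-in: drop a 'prefix:' namespace from a key."""
    return key.split(":")[-1]
```

An alternative with a smaller diff would be to iterate over `list(iterable.keys())`, which takes a snapshot of the keys before the loop starts.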

Parser classes should be context managers

Parser classes should be implemented as context managers.
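A minimal sketch of the context manager protocol on a parser that owns a file handle. `FileParser` is a hypothetical class used for illustration only; the real parser classes may manage their resources differently.

```python
class FileParser:
    """Hypothetical parser that owns a binary file handle."""

    def __init__(self, filename):
        self._fh = open(filename, "rb")

    @property
    def closed(self):
        return self._fh.closed

    def read(self, size):
        return self._fh.read(size)

    def close(self):
        if not self._fh.closed:
            self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the file handle; returning False re-raises errors.
        self.close()
        return False
```

Used as `with FileParser(path) as parser: ...`, the file handle is released even if parsing raises.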

Change the way stacks are created from a provided pannel.csv

Currently when ilastik/fullstacks are created from the provided pannel.csv, the channel order of the stacks created is the same order as the channels are in the provided pannel.csv

However I am wondering if it would be better to order the channels in the created stacks by mass and metals, as this would make it somewhat more reproducible if you e.g. accidentally reorder channels in the pannel.csv. However changing this would mean somewhat breaking compatibility.

In order not to completely break compatibility, I would suggest the following behaviour:

a) if you are defining channels in the pannel.csv, the order would be according to mass, metal (not as it was before)

b) you can also provide a .csv containing only metal names (e.g. one of the ones that is generated alongside any analysis tiff) or a list of metals, in which case the channels will have the exact same order as in this .csv or the list provided. Like this you could still reproduce the old behavior, if you e.g. want to generate more stacks for an existing experiment.

The convert_folder2imcfolder function returns AssertionError: assert(len(mcd_files) == 1)

Hello

Firstly, thanks so much for producing such a well documented and useful tool! We appreciate your work.

I have a question regarding the imcpreprocessing.ipynb tutorial. I am currently trying to convert 3 .mcd files into .tiffs for a project I am working on. I have saved the .mcd files in the following format (trying to stay as close as possible to the format you gave for the 2 Fluidigm example datasets):

..../bm/figs/hyperion_mcd_files/F67Bone.zip/F67Bone.mcd
..../bm/figs/hyperion_mcd_files/F69Bone.zip/F69Bone.mcd
..../bm/figs/hyperion_mcd_files/F70Bone.zip/F70Bone.mcd

When I run the "convert mcd containing folders into imc zip folders" cell in the tutorial I am met with an assertion error: AssertionError: assert(len(mcd_files) == 1). Details in screenshot.

Could this be because I am not able to provide the "associated .txt file generated during the acquisition of this .mcd file"? Or is this due to another issue? I am able to run the tutorial notebook successfully with the test dataset.

Thanks in advance for your help!

Simone

Command line script 'cropsection' not functional

This script is not yet finished and not functional!
It needs to be either fixed or removed.

Document everything

Cellprofiler 3

Hi is imctools compatible with CP3? Not sure if i missed an update.

Also is there a module that performs a median filter in a 3x3 area?

Thank you again
AC

imcfolderwriter should hold a reference to the full path of the created folder during write operation

While foldername is preserved, the full path where the data are written during write_imc_folder is not.
The same holds for zipfolder=True in write_imc_folder: the location of the newly created zip file should be retrievable from the ImcFolderWriter object.

2 properties could be added and 1 could be renamed:
@property
def imc_folder_name(self):
    return self.meta.metaname

@property
def imc_folder_path(self):
    return os.path.join(self.out_folder, self.imc_folder_name)

@property
def imc_folder_zip(self):
    return os.path.join(self.out_folder, self.foldername +'_imc.zip')

imc2tiff not working

While converting imc2tiff, the following happens (imctools.external.temporarydirectory not implemented maybe):

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/bbvolume/server_homes/mleutenegger/Git/imctools/imctools/scripts/imc2tiff.py", line 6, in <module>
    from imctools.external import temporarydirectory
ImportError: cannot import name temporarydirectory

Unable to parse zipped mcd and txt

Hello @votti ,

I am trying to get the pipeline up and running for our IMC acquisitions but am hitting a wall when trying to convert the zipped mcds into an acceptable input format.

I have zipped the mcd and the region-associated .txt files but cannot figure out where things are going wrong; the cell runs fine with the example data.

Thanks in advance,

Add a script to export to HistoCat folderstructure

Currently the only way to export metadata into HistoCat is by naming the folders. It would be good to be able to export the HistoCat folder structure with a customized naming scheme.

As discussed with @ndamond, it would be good to have a script that can export imcacquisitions with custom naming, e.g. specified in a metadata .csv file.

get_image_stack_by_*** should return numpy array/view instead of list

The get_image_stack_by_*** functions should return numpy arrays/views instead of lists, because that's likely what the user wants when using these functions and because it avoids unnecessary memory consumption (in the case of views).

xarray conversion function for AcquisitionData

It would be nice to have a to_xarray function on the AcquisitionData class. To not add further (required) dependencies to this library, xarray could be dynamically imported when available.

Revisit OME-TIFF writing

I think the OME-TIFF generation should be revisited, for these reasons:

The default OME-XML from the bioformats package is flawed (e.g. see the "old format string-s" in the namespace of StructuredAnnotations) and sets some default values that aren't required by the OME-TIFF XML schema
Including a copy of the bioformats package is unnecessary, as the OME-TIFF standard provides an XSD which can be used to validate the custom header independent of the bioformats package
The current implementation of TiffWriter doesn't support 3D images and pixel sizes != 1
Some tifffile.imwrite arguments (i.e., standard TIFF properties) are not (fully) supported, including acquisitiondate and resolution
The file extension isn't enforced, which I think would be good practice to discriminate between "ImageJ" and OME TIFFs
Documentation on function parameters such as compression is wrong/incomplete
Function parameter values are not checked (e.g. assert bigtiff != imagej)

IMO, writing OME-TIFFs is a core functionality that should be fully and comprehensively supported by this library. For reference, I implemented some of the functionality here: https://github.com/BodenmillerGroup/serial-ablation/blob/master/src/serial_ablation/image_reconstruction.py (see the write_tiff function). I would be happy to contribute this (in a modified/more generic version) to a new imctools release.

Add reader for analysis stacks

There should be a reader that generates an IMC acquisition from an 'analysis' stack (=multiplane tiff image) using a metadata.csv with the channel annotations.

Change `bigtiff` default

Originally I thought we would need bigtiff (images can be over 4 GB) in IMC. However, so far I have never encountered a file close to that size, and bigtiffs are generally not widely compatible.

I would thus like to change the default for the tiffwriter to bigtiff = False.

No break in compatibility is anticipated for this change, unless somebody really has a single image that is over 4 GB in size.

Any opinions?

gc3pie sub-package

It seems that gc3pie sub-package https://github.com/BodenmillerGroup/imctools/tree/master/imctools/gc3pie isn't used at the moment. Should it be deleted from master branch?

wrong output of images from MCD

With the new acquisition software 7.0 I have obtained MCD files, both of which contain multiple ROIs, but some ROIs are corrupt. The OME-TIFFs, TIFFs and single TIFFs all fail to show the correct image: the dimensions are way off and the content is distorted.
When the txt files are opened in the MCDviewer software from Fluidigm, the images are displayed correctly.
Anton has the data for investigation.

Tifffile changed image_description attribute

	/usr/local/lib/python2.7/dist-packages/imctools/io/ometiffparser.pyc in read_image(self, filename)
     44         with tifffile.TiffFile(filename) as tif:
     45             self.data = tif.asarray()
---> 46             self.ome = tif.pages[0].tags['image_description'].value
     47 
     48     # @staticmethod

KeyError: 'image_description'

As reported by: https://www.imc-forum.org/viewtopic.php?f=4&t=19

Use xtiff for save_tiff

Maybe use xtiff for saving TIFF files as well, to enable basic consistency checks

Calculate all uncertainties from probabilities

Please add script
Thanks!!!

Factor pannel file parsing out of the ometiff2analysis script

Currently the function to extract the metal list from a pannel is hardcoded within the ometiff2analysis script. It should be moved to io.__init__.py to make it more generally available.

Correct Fluidigm's comma fault

The attributes roi_start_x_pos_um and roi_start_y_pos_um filled from .mcd metadata need to be divided by 1000 to correct for a bug in Fluidigm's software. This bugfix should only be enabled for software versions known to be affected (all releases until now, including the current one), as Fluidigm might release a bugfix in the future (to my knowledge, they know about the problem).

Inconsistent dimension order

The dimension order of AcquisitionData.image_data is undocumented
The function name of AcquisitionData.get_image_stack_cyx suggests a cyx dimension order of AcquisitionData.image_data
xtiff.to_tiff, like most publicly available libraries (e.g. tifffile), expects cyx dimension order, so the function call suggests a cyx dimension order of AcquisitionData.image_data as well
However, AcquisitionData.to_xarray suggests a cxy dimension order

Probably best to double-check the dimension order and fix this inconsistency.

'PROT_READ' error for mcdparser on Windows

Trying the mcdparser on windows returns:

 __init__
    self.retrieve_mcd_xml()
  File "D:\Python27\lib\site-packages\imctools\io\mcdparser.py", line 64, in retrieve_mcd_xml
    mm = mmap.mmap(self._metafh.fileno(), 0, prot=mmap.PROT_READ)
AttributeError: 'module' object has no attribute 'PROT_READ'

(via Email)
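`mmap.PROT_READ` is only defined on Unix, which explains the AttributeError on Windows. A minimal sketch of a portable alternative, assuming a read-only mapping is all that is needed: the `access=mmap.ACCESS_READ` argument is available on both Windows and Unix. `open_readonly_mmap` is a hypothetical helper name.

```python
import mmap


def open_readonly_mmap(fileobj):
    """Memory-map an open binary file read-only on Windows and Unix."""
    # access= is the cross-platform keyword; prot= exists only on Unix.
    return mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)
```

Note that mapping a length of 0 means "the whole file", and that an empty file cannot be mapped this way.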

Define ome.tiff metadata

Define which metadata should be stored in the OME Tiff.
Reasonable defaults for non existing files as well as parsing it from the MCD file where available.

masscolumn and metalcolumn in ometiff2analysis.py

This part is confusing me [lines 20-24]:
if masscolumn is None:
    metalcolumn = metalcolumn
    selmetals = [str(n) for s, n in zip(selected, pannel[metalcolumn]) if s]
else:
    selmass = [str(n) for s, n in zip(selected, pannel[masscolumn]) if s]

3 things:

'metalcolumn = metalcolumn': what for?
You're checking whether masscolumn is None but then using metalcolumn without checking whether it is None or not.
What if both metalcolumn and masscolumn are None?
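A sketch of how the three points could be addressed, assuming that masscolumn takes precedence when given (as in the snippet above) and that `pannel` supports column lookup by name. `select_channels` is a hypothetical function, not the script's actual code.

```python
def select_channels(pannel, selected, metalcolumn=None, masscolumn=None):
    """Return ("mass" | "metal", values) for the rows flagged in `selected`."""
    if masscolumn is not None:
        kind, column = "mass", masscolumn
    elif metalcolumn is not None:
        kind, column = "metal", metalcolumn
    else:
        # Neither column was given: fail early with a clear message.
        raise ValueError("Provide either metalcolumn or masscolumn")
    values = [str(n) for s, n in zip(selected, pannel[column]) if s]
    return kind, values
```

The no-op assignment is gone, each column is checked before use, and the case where both are None raises an explicit error.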

Add `imagecodecs` as dependency

Otherwise the probability2uncertainty function does not work with Ilastik-generated probability maps, due to their LZW compression.

BodenmillerGroup/ImcSegmentationPipeline#26

Add installation instructions to README

Currently the README does not contain recommendations for the installation of imctools.

ome2singletiff error on windows, histocat conversion errors.

Hello. I am working on the IMCpipeline and the imc2tiff conversion successfully completed!

Now I am working through the IMCpipeline, in particular this section:
if do_histocat:
    if not(os.path.exists(folder_histocat)):
        os.makedirs(folder_histocat)
    ome2micat.omefolder2micatfolder(folder_ome, folder_histocat, dtype='uint16')

I do have the imctools with this script successfully installed.
python -m imctools.scripts.ome2micat -h
usage: ome2micat [-h] [--mask_folder MASK_FOLDER] [--mask_suffix MASK_SUFFIX]
                 [--imagetype {None,uint16,int16,float}]
                 ome_folder out_folder

Convertes an ome folder (or file) to a micat folder

positional arguments:
  ome_folder            A folder with ome images or a single ome file
  out_folder            Folder to output the micat folders

optional arguments:
  -h, --help            show this help message and exit
  --mask_folder MASK_FOLDER
                        Folder containing the masks, or single mask file.
  --mask_suffix MASK_SUFFIX
                        suffix of the mask tiffs
  --imagetype {None,uint16,int16,float}
                        The output image type

However when I issue the command to convert the ome.tiffs to HistoCAT compatible files, there is an error.

>>> folder_ome
'C:/Users/Mohan/Desktop/testing/OUTPUT\ometiff'
>>> folder_histocat
'C:/Users/Mohan/Desktop/testing/OUTPUT\histocat'
>>> ome2micat.omefolder2micatfolder(folder_ome, folder_histocat, dtype='uint16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\scripts\ome2micat.py", line 72, in omefolder2micatfolder
    ome2micatfolder(path_ome, outfolder, path_mask=path_mask, dtype=dtype)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\scripts\ome2micat.py", line 39, in ome2micatfolder
    ome2singletiff(path_ome, outfolder,basename='', dtype=dtype)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\scripts\ome2micat.py", line 15, in ome2singletiff
    ome = ometiffparser.OmetiffParser(path_ome)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\io\ometiffparser.py", line 22, in __init__
    self.read_image(original_file)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\io\ometiffparser.py", line 46, in read_image
    self.ome = tif.pages[0].tags['image_description'].value
KeyError: 'image_description'

I have tifffile installed
tifffile 0.15.1 py36_1 conda-forge

python -V
Python 3.6.4 :: Anaconda, Inc.

I have CellProfiler Desktop app installed version 2.2.0.
My question: the error is related to ometiffparser, so how can I debug this? Is this related to my tifffile version? What should the dtype parameter be on Windows?

Thank you very much,

Sincerely,
AC

Deal with temporary files in `omefolder2micatfolder`

Currently the omefolder2micatfolder script fails with temporary files.
BodenmillerGroup/ImcSegmentationPipeline#19

This is a common problem (eg #43).
Maybe it would make sense to write a custom 'listdir' function that ignores temporary files and use it instead of os.listdir.
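A sketch of such a function. The prefixes and suffixes treated as "temporary" here are assumptions and would need to match the files that actually cause the failures.

```python
import os

# Assumed markers of hidden or temporary files; adjust as needed.
TEMP_PREFIXES = (".", "~")
TEMP_SUFFIXES = (".tmp", "~")


def is_temporary(name):
    """Return True for names that look like hidden or temporary files."""
    return name.startswith(TEMP_PREFIXES) or name.endswith(TEMP_SUFFIXES)


def listdir_visible(folder):
    """Like os.listdir, but sorted and without hidden/temporary files."""
    return sorted(n for n in os.listdir(folder) if not is_temporary(n))
```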

Missing dependency: typing_extensions

When using Python <3.8, there is a hidden dependency to typing_extensions that isn't resolved during setup.

Correctly sort channels by mass

Due to a bug, imctools v1 did not sort metals numerically but sorted by the mass number as a string when generating analysis stacks (i.e. metals [Y89, Ir193] would be sorted as [Ir193, Y89]).

imctools/imctools/converters/ome2analysis.py

Line 79 in a42f3ea

return "".join([m for m in x if m.isdigit()]), x

This should definitely be fixed in the new release, such that channels are correctly sorted by mass.
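A sketch of a numeric sort key, assuming metal names consist of an element symbol plus a mass number (e.g. Ir193). `mass_sort_key` is a hypothetical name; the only change relative to the line quoted above is that the digits are converted to an integer before comparison.

```python
import re


def mass_sort_key(metal: str):
    """Sort key: numeric mass first, then the full metal name."""
    digits = re.sub(r"\D", "", metal)
    # Names without digits sort first (assumption for this sketch).
    mass = int(digits) if digits else 0
    return mass, metal


metals = ["Ir193", "Y89", "In115"]
print(sorted(metals, key=mass_sort_key))  # ['Y89', 'In115', 'Ir193']
```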

I think generally it would be good to factor out a separate function that gets the metal list from a panel:

def get_channels_from_panel(
    panel_csv_file: str,
    usedcolumn: str,
    metalcolumn: str = "Metal Tag",
    sort_channels=True,
):
    """
    Get list of metals from a panel and a boolean column.

    Parameters
    ----------
    panel_csv_file
        Name of the CSV file that contains the channels to be written out.
    metalcolumn
        Column name of the metal names.
    usedcolumn
        Column that should contain booleans (0, 1) if the channel should be used, i.e. "ilastik".
    sort_channels
        Whether to sort channels by mass.

    Returns
    -------
    list of metals
    """

I don't think the 'masses' option needs to be supported in V2.

Handle temporary files in raw data .zip folders

When manually zipping files in MacOS, hidden files (.filename.mcd etc) can be present, which currently do not get handled correctly in convertfolder2imcfolder and cause an error.
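One way to skip such entries when listing a zip archive, sketched with the standard zipfile module. `visible_members` is a hypothetical helper, and the `__MACOSX/` folder rule is an assumption about what the macOS archiver adds.

```python
import posixpath
import zipfile


def visible_members(zip_path, suffix):
    """List archive members ending in `suffix`, skipping hidden entries."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            name
            for name in zf.namelist()
            if name.lower().endswith(suffix)
            # Assumed macOS resource-fork folder.
            and not name.startswith("__MACOSX/")
            # Hidden files such as .filename.mcd or ._filename.mcd.
            and not posixpath.basename(name).startswith(".")
        ]
```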

Handle `.schema` files in raw data folders

The .schema file is a temporary file that gets written by CyTOF while the .mcd file is being written. It contains important metadata needed to read the .mcd. Upon successful writing of the .mcd, the IMC software deletes this file, so a .schema file in a folder is usually a sign of a corrupted .mcd file. However, due to changed backup practices in our lab, .schema files can now also be present in standard cases; they should then be ignored, as they can be outdated and contain invalid information.

Dependency on 'pandas' not documented

The file "imctools/scripts/ometiff2analysis.py" has a dependency on pandas that could be easily removed.

Add factory for parsers

I think it would be a good idea to have a (filename-dependent) factory for parsers. Also, parsers should have a shared base class in my opinion.
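A sketch of what a filename-dependent factory with a shared base class might look like. All class names below are placeholders for illustration, not the imctools parser implementations.

```python
class BaseParser:
    """Hypothetical shared base class for all parsers."""

    def __init__(self, filename):
        self.filename = str(filename)


class McdFileParser(BaseParser):
    """Placeholder for an .mcd parser."""


class TxtFileParser(BaseParser):
    """Placeholder for a .txt parser."""


class OmeTiffFileParser(BaseParser):
    """Placeholder for an .ome.tiff parser."""


# Suffix-to-class registry; matching is case-insensitive.
PARSERS_BY_SUFFIX = {
    ".ome.tiff": OmeTiffFileParser,
    ".mcd": McdFileParser,
    ".txt": TxtFileParser,
}


def create_parser(filename):
    """Return a parser instance chosen by the filename suffix."""
    lower = str(filename).lower()
    for suffix, parser_cls in PARSERS_BY_SUFFIX.items():
        if lower.endswith(suffix):
            return parser_cls(filename)
    raise ValueError(f"No parser registered for {filename!r}")
```

A registry dict keeps the factory open for extension: supporting a new format only requires a new entry.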

Remove `add_sum` functionality from `omefile_2_analysisfolder`

The current add_sum functionality is a relic from the time when imctools also contained image processing capabilities.

Given that the sum can also be added in CellProfiler, this option should be removed in V2.

Total/Mean Intensity Quantification

Batch calculation of all channel intensities for all images.
Needed for comparison with IHC across TMA data.

A beer bounty will be provided!

Allow 'open multichannel ome' in Fiji plugin to load multiple files

Loading multiple files at once with 'open multichannel ome' in Fiji would be convenient.

miCAT should now also support 32bit-float

We should be able to read 32bit-float now. Can I have a 32bit-float stack and corresponding mask for testing purposes?

get_image_stack throws TypeError if acquisition is empty

get_image_stack methods (e.g. get_image_stack_by_names) throw TypeErrors if the acquisition is empty (there should probably be a check for Acquisition.is_valid). Also, there is a typo in the log message in _get_acquisition_raw_data.

Zip files >4gb not supported

The zipfile library that is used does not support tiffs >4gb

zipfile.BadZipFile: zipfiles that span multiple disks are not supported

as reported in BodenmillerGroup/ImcSegmentationPipeline#18

Switching to another zip library in IMCtools would likely solve the problem.

As a workaround, unzipped folders can be used.

unclear behaviour of imcfolderwriter.write_imc_folder()

imctools.io.imcfolderwriter.write_imc_folder(self, zipfolder=True, remove_folder=None)

remove_folder is used at the end of the method to clean up out_folder, the folder where all the data have just been created:
if remove_folder:
    os.removedirs(out_folder)

While this makes sense when zipfolder=True (the folder is also removed when zipfolder=True and remove_folder=None), I fail to see the logic when zipfolder=False and remove_folder=True:
in this case, one would just remove everything just created.

Did I read it correctly?
S.
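One possible way to make the behaviour explicit, sketched under the assumption that remove_folder=None is meant to follow zipfolder. `resolve_cleanup` is a hypothetical helper and not part of ImcFolderWriter.

```python
def resolve_cleanup(zipfolder=True, remove_folder=None):
    """Decide whether the output folder should be removed after writing."""
    if remove_folder is None:
        # Assumed default: clean up only if a zip archive was produced.
        remove_folder = zipfolder
    if remove_folder and not zipfolder:
        # Removing the folder without zipping would discard all output.
        raise ValueError("remove_folder=True requires zipfolder=True")
    return remove_folder
```

Rejecting the contradictory combination up front avoids silently deleting freshly written data.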