imctools

Deprecation note: This repository is not actively maintained. For a maintained IMC file parser implementation please refer to readimc.

Background

An IMC file conversion tool that converts IMC raw data files (.mcd, .txt) into an intermediary ome.tiff containing all the relevant metadata. It also contains tools to generate simpler TIFF files that can be used directly as input for e.g. CellProfiler, Ilastik, Fiji, etc.

Documentation is available at https://bodenmillergroup.github.io/imctools

Requirements

This package requires Python 3.7 or later.

Using virtual environments is strongly recommended.

Installation

Install imctools and its dependencies with:

pip install imctools

Usage

See Quickstart

Authors

Created and maintained by Vito Zanotelli and Anton Rau

Contributing

Contributing

Changelog

Changelog

License

MIT

imctools's Issues

Support for pathlib

It would be good if the library supported pathlib.Path instances for file paths (and not just strings).
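
A minimal sketch of how path arguments could be normalized, assuming a hypothetical helper named normalize_path (it is not part of imctools). It relies only on os.fspath and pathlib from the standard library:

```python
import os
from pathlib import Path
from typing import Union

PathLike = Union[str, "os.PathLike[str]"]


def normalize_path(path: PathLike) -> Path:
    """Hypothetical helper: accept a str or any os.PathLike and return a Path."""
    return Path(os.fspath(path))


# Both call styles end up as the same kind of object.
from_str = normalize_path("data/sample.mcd")
from_path = normalize_path(Path("data") / "sample.mcd")
print(from_str == from_path)
```

Calling such a helper at the top of each public function would let the rest of the code work with Path objects only.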

Better deal with empty names in OME.TIFFs

Currently the name is left empty in the ome.tiff if it was empty in the IMC file.
Perhaps it would be better to use the fluor (= metal) name as the name if no name was provided.
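
The proposed fallback could look roughly like this. The function name fill_empty_names and the example channel values are assumptions made for illustration only:

```python
from typing import List


def fill_empty_names(names: List[str], metals: List[str]) -> List[str]:
    """Hypothetical helper: use the metal tag wherever the channel name is empty."""
    return [
        name if name and name.strip() else metal
        for name, metal in zip(names, metals)
    ]


# Illustrative values: the second and third channels have no usable name.
print(fill_empty_names(["CD3", "", "  "], ["Er170", "Ir191", "Ir193"]))
```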

Allow in-memory extraction of slide/panorama/ablation images

Currently, slide/panorama/ablation images stored in the MCD files can only be extracted to disk. It would be desirable to give users the option to write them to custom output, e.g. StringIO, or allow direct binary data extraction (similar to _get_buffer). This would enable in-memory processing without a need for temporary files on disk, which is quite important. Ideally, one would also provide a function that tries to automatically convert the embedded images into numpy arrays (for supported data formats only), e.g. using PIL.

Wrong initialization of panorama coordinates

Consider the following code and output for a panorama instance:

x1, y1 = float(panorama.metadata['SlideX1PosUm']), float(panorama.metadata['SlideY1PosUm'])
x2, y2 = float(panorama.metadata['SlideX2PosUm']), float(panorama.metadata['SlideY2PosUm'])
x3, y3 = float(panorama.metadata['SlideX3PosUm']), float(panorama.metadata['SlideY3PosUm'])
x4, y4 = float(panorama.metadata['SlideX4PosUm']), float(panorama.metadata['SlideY4PosUm'])
x = min(x1, x2, x3, x4)
y = min(y1, y2, y3, y4)
width = max(x1, x2, x3, x4) - x
height = max(y1, y2, y3, y4) - y
print(x, y, width, height)
print(panorama.start_position_x, panorama.start_position_y, panorama.width, panorama.height)
-55.715 112.972 75000.0 25219.775999999998
-55.715 25332.748 75000.0 25219.775999999998

It looks like the current implementation assumes a specific order of the four points in the MCD Panorama metadata. However, to my knowledge, the order is not guaranteed. This might explain the discrepancies observed above.
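
An order-independent computation, wrapping the snippet above in a function, might look like this. The function name panorama_bounding_box is hypothetical; only the metadata keys are taken from the report:

```python
from typing import Mapping, Tuple


def panorama_bounding_box(metadata: Mapping[str, str]) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) without assuming any order of the four corners."""
    xs = [float(metadata[f"SlideX{i}PosUm"]) for i in range(1, 5)]
    ys = [float(metadata[f"SlideY{i}PosUm"]) for i in range(1, 5)]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y


# Illustrative corner values, deliberately not in a "nice" order.
example = {
    "SlideX1PosUm": "30", "SlideY1PosUm": "50",
    "SlideX2PosUm": "10", "SlideY2PosUm": "20",
    "SlideX3PosUm": "10", "SlideY3PosUm": "50",
    "SlideX4PosUm": "30", "SlideY4PosUm": "20",
}
print(panorama_bounding_box(example))
```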

Runtime error: dictionary keys changed during iteration

Hello,

I installed the imctools using pip and encountered the following error:

python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

import imctools.io.mcdparser as mcdparser
import imctools.io.txtparser as txtparser
import imctools.io.ometiffparser as omeparser
import imctools.io.mcdxmlparser as meta

fn_mcd = '20171228_spleen315_500x500.mcd'
mcd = mcdparser.McdParser(fn_mcd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdparser.py", line 31, in __init__
    McdParserBase.__init__(self, filename, filehandle, metafilename)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdparserbase.py", line 49, in __init__
    self.parse_mcd_xml()
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdparserbase.py", line 160, in parse_mcd_xml
    self._meta = McdXmlParser(self.xml)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/io/mcdxmlparser.py", line 244, in __init__
    meta = libb.dict_key_apply(meta, libb.strip_ns)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/librarybase.py", line 47, in dict_key_apply
    iterable[new_key] = dict_key_apply(iterable[new_key], str_fkt)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/librarybase.py", line 47, in dict_key_apply
    iterable[new_key] = dict_key_apply(iterable[new_key], str_fkt)
  File "/home/zhanxw/anaconda2/envs/imctools/lib/python3.8/site-packages/imctools/librarybase.py", line 43, in dict_key_apply
    for key in iterable.keys():
RuntimeError: dictionary keys changed during iteration

Can you help to fix this error? Thanks.
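
The traceback suggests that keys are replaced while the code iterates over iterable.keys(). One way to avoid this is to build a new structure instead of mutating the dictionary in place. The following is only a sketch: the names dict_key_apply and strip_ns come from the traceback, but both bodies are assumptions and not the library's code:

```python
def dict_key_apply(obj, str_fkt):
    """Return a copy of a nested dict/list structure with str_fkt applied to every dict key."""
    if isinstance(obj, dict):
        # A new dict is built, so the dict being iterated is never modified.
        return {str_fkt(key): dict_key_apply(value, str_fkt) for key, value in obj.items()}
    if isinstance(obj, list):
        return [dict_key_apply(value, str_fkt) for value in obj]
    return obj


def strip_ns(key):
    """Illustrative stand-in: drop a leading '{namespace}' prefix from a key."""
    return key.split("}", 1)[-1]


nested = {"{ns}a": {"{ns}b": 1}, "{ns}c": [{"{ns}d": 2}]}
print(dict_key_apply(nested, strip_ns))
```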

Change the way stacks are created from a provided pannel.csv

Currently, when ilastik/full stacks are created from the provided pannel.csv, the channels in the created stacks are in the same order as in the provided pannel.csv.

However, I am wondering if it would be better to order the channels in the created stacks by mass and metal, as this would make the output more reproducible if you e.g. accidentally reorder channels in the pannel.csv. Changing this would, however, partly break compatibility.

In order not to break compatibility completely, I would suggest the following behaviour:

a) if you are defining channels in the pannel.csv, the order would be according to mass, metal (not as it was before)

b) you can also provide a .csv containing only metal names (e.g. one of the ones that is generated alongside any analysis tiff) or a list of metals, in which case the channels will have the exact same order as in this .csv or the list provided. Like this you could still reproduce the old behavior, if you e.g. want to generate more stacks for an existing experiment.

The convert_folder2imcfolder function returns AssertionError: assert(len(mcd_files) == 1)

Hello

Firstly, thanks so much for producing such a well documented and useful tool! We appreciate your work.

I have a question regarding the imcpreprocessing.ipynb tutorial. I am currently trying to convert three .mcd files into .tiffs for a project I am working on. I have saved the .mcd files in the following format (trying to stay as close as possible to the format you gave for the 2 Fluidigm example datasets):

..../bm/figs/hyperion_mcd_files/F67Bone.zip/F67Bone.mcd
..../bm/figs/hyperion_mcd_files/F69Bone.zip/F69Bone.mcd
..../bm/figs/hyperion_mcd_files/F70Bone.zip/F70Bone.mcd

When I run the "convert mcd containing folders into imc zip folders" cell in the tutorial I am met with an assertion error: AssertionError: assert(len(mcd_files) == 1). Details in screenshot.

Could this be because I am not able to provide the "associated .txt file generated during the acquisition of this .mcd file"? Or is this due to another issue? I am able to run the tutorial notebook successfully with the test dataset.

Thanks in advance for your help!

Simone
Screenshot 2019-10-25 at 16 07 07

Cellprofiler 3

Hi, is imctools compatible with CP3? Not sure if I missed an update.

Also is there a module that performs a median filter in a 3x3 area?

Thank you again
AC

imcfolderwriter should hold a reference to the full path of the created folder during the write operation

While foldername is preserved, the full path where the data are written during write_imc_folder is not.
The same holds for zipfolder=True in write_imc_folder: the location of the newly created zip file should be retrievable from the ImcFolderWriter object.

Two properties could be added and one could be renamed:
@property
def imc_folder_name(self):
    return self.meta.metaname

@property
def imc_folder_path(self):
    return os.path.join(self.out_folder, self.imc_folder_name)

@property
def imc_folder_zip(self):
    return os.path.join(self.out_folder, self.foldername + '_imc.zip')

imc2tiff not working

While converting imc2tiff, the following happens (imctools.external.temporarydirectory not implemented maybe):

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/bbvolume/server_homes/mleutenegger/Git/imctools/imctools/scripts/imc2tiff.py", line 6, in <module>
    from imctools.external import temporarydirectory
ImportError: cannot import name temporarydirectory

Unable to parse zipped mcd and txt

Hello @votti ,

I am trying to get the pipeline up and running for our IMC acquisitions but am hitting a wall when trying to convert the zipped mcds into an acceptable input format.

I have zipped the mcd and the associated region .txt files but cannot seem to figure out where things are going wrong. The cell runs fine with the example data.

Thanks in advance,

Add a script to export to HistoCat folderstructure

Currently the only way to export metadata into HistoCat is by naming the folders. It would be good to be able to export the HistoCat folder structure with a customized naming scheme.

As discussed with @ndamond, it would be good to have a script that can export imcacquisitions with custom naming, e.g. specified in a metadata .csv file.

xarray conversion function for AcquisitionData

It would be nice to have a to_xarray function on the AcquisitionData class. To not add further (required) dependencies to this library, xarray could be dynamically imported when available.
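
A sketch of the dynamic-import idea, assuming two hypothetical helpers (optional_import and to_xarray_or_fail) that are not part of imctools. xarray is only imported when the conversion is actually requested:

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def to_xarray_or_fail(data, dims):
    """Hypothetical conversion: wrap data in an xarray.DataArray if xarray is available."""
    xr = optional_import("xarray")
    if xr is None:
        raise ImportError("to_xarray requires the optional dependency 'xarray'")
    return xr.DataArray(data, dims=dims)


# The helper works for any module name; the standard library serves as a demo here.
print(optional_import("json") is not None)
```

With this pattern, users without xarray can still import and use the rest of the package.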

Revisit OME-TIFF writing

I think the OME-TIFF generation should be revisited, for these reasons:

  • The default OME-XML from the bioformats package is flawed (e.g. see the "old format string-s" in the namespace of StructuredAnnotations) and sets some default values that aren't required by the OME-TIFF XML schema
  • Including a copy of the bioformats package is unnecessary, as the OME-TIFF standard provides an XSD which can be used to validate the custom header independent of the bioformats package
  • The current implementation of TiffWriter doesn't support 3D images and pixel sizes != 1
  • Some tifffile.imwrite arguments (i.e., standard TIFF properties) are not (fully) supported, including acquisitiondate and resolution
  • The file extension isn't enforced, which I think would be good practice to discriminate between "ImageJ" and OME TIFFs
  • Documentation on function parameters such as compression is wrong/incomplete
  • Function parameter values are not checked (e.g. assert bigtiff != imagej)

IMO, writing OME-TIFFs is a core functionality that should be fully and comprehensively supported by this library. For reference, I implemented some of the functionality here: https://github.com/BodenmillerGroup/serial-ablation/blob/master/src/serial_ablation/image_reconstruction.py (see the write_tiff function). I would be happy to contribute this (in a modified/more generic version) to a new imctools release.
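
The parameter checks and extension enforcement from the list above could be sketched as follows. The function name check_tiff_options and its parameters are assumptions for illustration, and the sketch only rejects the case where both bigtiff and imagej are enabled:

```python
def check_tiff_options(filename: str, bigtiff: bool, imagej: bool, ome: bool) -> None:
    """Hypothetical validation step to run before any file is written."""
    if bigtiff and imagej:
        raise ValueError("bigtiff and imagej cannot both be enabled")
    if ome and imagej:
        raise ValueError("choose either an OME-TIFF or an ImageJ TIFF, not both")
    lower = filename.lower()
    if ome and not lower.endswith((".ome.tiff", ".ome.tif")):
        raise ValueError("OME-TIFF files should end in .ome.tiff or .ome.tif")
    if not ome and not lower.endswith((".tiff", ".tif")):
        raise ValueError("TIFF files should end in .tiff or .tif")


# Passes silently for a consistent combination.
check_tiff_options("acquisition_1.ome.tiff", bigtiff=False, imagej=False, ome=True)
```

Raising ValueError instead of using assert keeps the checks active even when Python runs with optimizations enabled.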

Add reader for analysis stacks

There should be a reader that generates an IMC acquisition from an 'analysis' stack (=multiplane tiff image) using a metadata.csv with the channel annotations.

Change `bigtiff` default

Originally I thought we would need bigtiff (can be over 4gb/image) in IMC - however until now I never encountered a file close to that and bigtiffs are generally not widely compatible.

I would thus like to change the default for the tiffwriter to bigtiff = False.

No break in compatibility is anticipated for this change, except if somebody really has a single image that is over 4gb in size.

Any opinions?

wrong output of images from MCD

With the new acquisition software 7.0, I have obtained MCD files that contain multiple ROIs, some of which are corrupt. The OME-TIFFs, TIFFs and single TIFFs all fail to show the correct image: the dimensions are way off and the content is distorted.
When opening the txt files in the MCDviewer software from Fluidigm, the images are displayed correctly.
Anton has the data for investigation.

Correct Fluidigm's comma fault

The attributes roi_start_x_pos_um and roi_start_y_pos_um filled from .mcd metadata need to be divided by 1000 to correct for a bug in Fluidigm's software. This bugfix should only be enabled for software versions known to be affected (all releases until now, including the current one), as Fluidigm might release a bugfix in the future (to my knowledge, they know about the problem).
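
A version-gated correction could be sketched like this. The function name, the parameter names and the version strings are hypothetical; the actual set of affected software versions is not given here and would have to be maintained by hand:

```python
from typing import Collection, Tuple


def corrected_roi_start(
    x_um: float,
    y_um: float,
    software_version: str,
    affected_versions: Collection[str],
) -> Tuple[float, float]:
    """Divide the ROI start position by 1000 only for versions known to be affected."""
    if software_version in affected_versions:
        return x_um / 1000.0, y_um / 1000.0
    return x_um, y_um


# Placeholder version strings, used purely for illustration.
print(corrected_roi_start(2000.0, 4000.0, "affected-version", {"affected-version"}))
print(corrected_roi_start(2.0, 4.0, "fixed-version", {"affected-version"}))
```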

Inconsistent dimension order

  • The dimension order of AcquisitionData.image_data is undocumented
  • The function name of AcquisitionData.get_image_stack_cyx suggests a cyx dimension order of AcquisitionData.image_data
  • xtiff.to_tiff, like most publicly available libraries (e.g. tifffile), expects cyx dimension order, so the function call suggests a cyx dimension order of AcquisitionData.image_data as well
  • However, AcquisitionData.to_xarray suggests a cxy dimension order

Probably best to double-check the dimension order and fix this inconsistency.

'PROT_READ' error for mcdparser on Windows

Trying the mcdparser on Windows returns:

 __init__
    self.retrieve_mcd_xml()
  File "D:\Python27\lib\site-packages\imctools\io\mcdparser.py", line 64, in retrieve_mcd_xml
    mm = mmap.mmap(self._metafh.fileno(), 0, prot=mmap.PROT_READ)
AttributeError: 'module' object has no attribute 'PROT_READ'

(via Email)
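
mmap.PROT_READ and the prot argument exist only on Unix, whereas the access argument with mmap.ACCESS_READ is available on both Windows and Unix. A minimal sketch of a portable read-only mapping follows; the helper name open_readonly_mmap and the demo file content are made up for illustration:

```python
import mmap
import os
import tempfile


def open_readonly_mmap(filehandle) -> mmap.mmap:
    """Map the whole file read-only in a way that works on Windows and Unix."""
    return mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as fh:
        fh.write(b"<MCDSchema/>")  # illustrative bytes, not a real .mcd file
    with open(path, "rb") as fh:
        mm = open_readonly_mmap(fh)
        try:
            offset = mm.find(b"MCDSchema")
        finally:
            mm.close()  # close the mapping before the temporary file is removed

print(offset)
```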

Define ome.tiff metadata

Define which metadata should be stored in the OME Tiff.
Reasonable defaults for non existing files as well as parsing it from the MCD file where available.

masscolumn and metalcolumn in ometiff2analysis.py

This part is confusing me [lines 20-24]:
if masscolumn is None:
    metalcolumn = metalcolumn
    selmetals = [str(n) for s, n in zip(selected, pannel[metalcolumn]) if s]
else:
    selmass = [str(n) for s, n in zip(selected, pannel[masscolumn]) if s]

3 things:

  1. 'metalcolumn = metalcolumn': what is this for?
  2. You're checking whether masscolumn is None but then using metalcolumn without checking whether it is None or not.
  3. What if both metalcolumn and masscolumn are None?

S.
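
One way to make the three cases explicit is sketched below. The function name select_channels is hypothetical, and a plain dict of lists stands in for the panel table so that the example is self-contained:

```python
from typing import Dict, List, Optional, Sequence


def select_channels(
    panel: Dict[str, List],
    selected: Sequence[bool],
    metalcolumn: Optional[str] = None,
    masscolumn: Optional[str] = None,
) -> List[str]:
    """Return the selected entries of the mass column if given, else of the metal column."""
    if masscolumn is not None:
        column = masscolumn
    elif metalcolumn is not None:
        column = metalcolumn
    else:
        raise ValueError("either metalcolumn or masscolumn must be provided")
    return [str(n) for s, n in zip(selected, panel[column]) if s]


# Illustrative panel with made-up column names.
panel = {"Metal Tag": ["Y89", "Ir191", "Ir193"], "Mass": [89, 191, 193]}
print(select_channels(panel, [True, False, True], metalcolumn="Metal Tag"))
```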

ome2singletiff error on windows, histocat conversion errors.

Hello. I am working on the IMCpipeline and the imc2tiff conversion completed successfully!

Now, I am working through the IMCpipeline, in particular this section:
if do_histocat:
    if not(os.path.exists(folder_histocat)):
        os.makedirs(folder_histocat)
    ome2micat.omefolder2micatfolder(folder_ome, folder_histocat, dtype='uint16')

I do have the imctools with this script successfully installed.
python -m imctools.scripts.ome2micat -h
usage: ome2micat [-h] [--mask_folder MASK_FOLDER] [--mask_suffix MASK_SUFFIX]
                 [--imagetype {None,uint16,int16,float}]
                 ome_folder out_folder

Convertes an ome folder (or file) to a micat folder

positional arguments:
  ome_folder            A folder with ome images or a single ome file
  out_folder            Folder to output the micat folders

optional arguments:
  -h, --help            show this help message and exit
  --mask_folder MASK_FOLDER
                        Folder containing the masks, or single mask file.
  --mask_suffix MASK_SUFFIX
                        suffix of the mask tiffs
  --imagetype {None,uint16,int16,float}
                        The output image type

However when I issue the command to convert the ome.tiffs to HistoCAT compatible files, there is an error.

folder_ome
'C:/Users/Mohan/Desktop/testing/OUTPUT\ometiff'
folder_histocat
'C:/Users/Mohan/Desktop/testing/OUTPUT\histocat'
ome2micat.omefolder2micatfolder(folder_ome, folder_histocat, dtype='uint16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\scripts\ome2micat.py", line 72, in omefolder2micatfolder
    ome2micatfolder(path_ome, outfolder, path_mask=path_mask, dtype=dtype)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\scripts\ome2micat.py", line 39, in ome2micatfolder
    ome2singletiff(path_ome, outfolder,basename='', dtype=dtype)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\scripts\ome2micat.py", line 15, in ome2singletiff
    ome = ometiffparser.OmetiffParser(path_ome)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\io\ometiffparser.py", line 22, in __init__
    self.read_image(original_file)
  File "C:\Users\Mohan\Anaconda3\lib\site-packages\imctools\io\ometiffparser.py", line 46, in read_image
    self.ome = tif.pages[0].tags['image_description'].value
KeyError: 'image_description'

I have tifffile installed
tifffile 0.15.1 py36_1 conda-forge

python -V
Python 3.6.4 :: Anaconda, Inc.

I have CellProfiler Desktop app installed version 2.2.0.
The error seems to be related to ometiffparser. How can I debug this? Is this related to my tifffile version? What should the dtype parameter be on Windows?

Thank you very much,

Sincerely,
AC

Correctly sort channels by mass

Due to a bug, imctools v1 did not sort metals numerically but sorted the number string when generating analysis stacks (i.e. metals [Y89, Ir193] would be sorted as [Ir193, Y89]).

return "".join([m for m in x if m.isdigit()]), x

This should definitely be fixed in the new release, such that channels are correctly sorted by mass.
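
A numeric sort key could look like the sketch below. The name mass_sort_key is hypothetical; the only change relative to the key shown above is that the extracted digits are converted to an integer before comparison:

```python
from typing import List, Tuple


def mass_sort_key(metal: str) -> Tuple[int, str]:
    """Sort by the numeric mass embedded in the metal name, then by the name itself."""
    digits = "".join(ch for ch in metal if ch.isdigit())
    # Names without digits sort first; this is an arbitrary choice for the sketch.
    return (int(digits) if digits else 0, metal)


def sort_by_mass(metals: List[str]) -> List[str]:
    return sorted(metals, key=mass_sort_key)


print(sort_by_mass(["Ir193", "Y89", "Ir191"]))
```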

I think it would generally be good to factor this out into a separate function that gets the metal list from a panel:

def get_channels_from_panel(
    panel_csv_file: str,
    usedcolumn: str,
    metalcolumn: str = "Metal Tag",
    sort_channels=True,
):
    """
    Get list of metals from a panel and a boolean column.

    Parameters
    ----------
    panel_csv_file
        Name of the CSV file that contains the channels to be written out.
    usedcolumn
        Column that should contain booleans (0, 1) if the channel should be used, i.e. "ilastik".
    metalcolumn
        Column name of the metal names.
    sort_channels
        Whether to sort channels by mass.

    Returns
    -------
    list of metals
    """

I don't think the 'masses' option needs to be supported in V2.

Handle `.schema` files in raw data folders

The .schema file is a temporary file that the CyTOF software writes while the .mcd file is being written. It contains important metadata needed to read the .mcd. Upon successful writing of the .mcd, the IMC software deletes this file. Thus the presence of a .schema file in a folder is usually a sign of a corrupted .mcd file. However, due to changed backup practices in our lab, .schema files can now also be present in standard cases; they should then be ignored, as they can be outdated and contain invalid information.

Add factory for parsers

I think it would be a good idea to have a (filename-dependent) factory for parsers. Also, parsers should have a shared base class in my opinion.
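
A filename-dependent factory with a shared base class could be sketched as follows. All class names and the create_parser function are hypothetical placeholders, not the existing imctools parsers:

```python
from typing import Tuple, Type


class BaseParser:
    """Hypothetical shared base class for all parsers."""

    def __init__(self, filename: str):
        self.filename = str(filename)


class McdFileParser(BaseParser):
    pass


class TxtFileParser(BaseParser):
    pass


class OmeTiffFileParser(BaseParser):
    pass


# Matched with str.endswith so that compound suffixes such as .ome.tiff work.
_PARSERS: Tuple[Tuple[str, Type[BaseParser]], ...] = (
    (".ome.tiff", OmeTiffFileParser),
    (".mcd", McdFileParser),
    (".txt", TxtFileParser),
)


def create_parser(filename: str) -> BaseParser:
    """Pick a parser class based on the file name and instantiate it."""
    lower = str(filename).lower()
    for suffix, parser_cls in _PARSERS:
        if lower.endswith(suffix):
            return parser_cls(filename)
    raise ValueError(f"no parser registered for {filename!r}")


print(type(create_parser("slide1.mcd")).__name__)
```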

Total/Mean Intensity Quantification

Batch calculation of all channel intensities for all images.
Needed for comparison with IHC across TMA data.

A beer bounty will be provided!

get_image_stack throws TypeError if acquisition is empty

get_image_stack methods (e.g. get_image_stack_by_names) throw TypeErrors if the acquisition is empty (there should probably be a check for Acquisition.is_valid). Also, there is a typo in the log message in _get_acquisition_raw_data.
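
The suggested guard could look roughly like this. The Acquisition class below is a simplified, hypothetical stand-in that uses nested lists as image data; it only illustrates checking is_valid before indexing:

```python
from typing import List, Optional, Sequence


class Acquisition:
    """Hypothetical stand-in for an acquisition; not the imctools class."""

    def __init__(self, channel_names: Sequence[str], image_data: Optional[List[list]] = None):
        self.channel_names = list(channel_names)
        self.image_data = image_data

    @property
    def is_valid(self) -> bool:
        return self.image_data is not None and len(self.image_data) > 0

    def get_image_stack_by_names(self, names: Sequence[str]) -> List[list]:
        if not self.is_valid:
            # Fail early with a clear message instead of a TypeError further down.
            raise ValueError("acquisition contains no image data")
        return [self.image_data[self.channel_names.index(name)] for name in names]


acq = Acquisition(["Ir191", "Ir193"], [[[1, 2]], [[3, 4]]])
print(acq.get_image_stack_by_names(["Ir193"]))
```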

unclear behaviour of imcfolderwriter.write_imc_folder()

imctools.io.imcfolderwriter.write_imc_folder(self, zipfolder=True, remove_folder=None)

remove_folder is used at the end of the method to clean up out_folder, the folder where all the data have just been created:
if remove_folder:
    os.removedirs(out_folder)

While this makes sense when zipfolder=True (removal also happens when zipfolder=True and remove_folder=None), I fail to see the logic when zipfolder=False and remove_folder=True:
in this case, one would just remove everything that was just created.

Did I read it correctly?
S.
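
One possible way to make the combination explicit is sketched below, assuming that remove_folder=None is meant to follow zipfolder as described above. The helper name resolve_remove_folder is hypothetical:

```python
from typing import Optional


def resolve_remove_folder(zipfolder: bool, remove_folder: Optional[bool] = None) -> bool:
    """Decide whether the unzipped output folder should be deleted after writing."""
    if remove_folder is None:
        # Default: only clean up when a zip archive keeps the data.
        return zipfolder
    if remove_folder and not zipfolder:
        raise ValueError("remove_folder=True without zipfolder=True would delete all output")
    return remove_folder


print(resolve_remove_folder(zipfolder=True))
print(resolve_remove_folder(zipfolder=False))
```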
