gustaveroussy / sopa

Technology-invariant pipeline for spatial omics analysis (Xenium / MERSCOPE / CosMx / PhenoCycler / MACSima / Hyperion) that scales to millions of cells

Home Page: https://gustaveroussy.github.io/sopa/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
generalization memory-management pipeline spatial-omics spatial-transcriptomics snakemake spatialdata multiplexed-imaging

sopa's Introduction

sopa_logo

Spatial omics pipeline and analysis


Built on top of SpatialData, Sopa enables processing and analysis of image-based spatial omics (spatial transcriptomics and multiplexed imaging data) using a standard data structure and output. We currently support the following technologies: Xenium, MERSCOPE, CosMx, PhenoCycler, MACSima, and Hyperion. Sopa was designed for generalizability and low memory consumption on large images (it scales to 1TB+ images).

The pipeline outputs contain: (i) Xenium Explorer files for interactive visualization, (ii) an HTML report for quick quality controls, and (iii) a SpatialData .zarr directory for further analyses.
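
Once the pipeline has run, the .zarr output can be loaded back for downstream work. Below is a minimal sketch using the spatialdata reader (the file name is hypothetical):

import spatialdata

sdata = spatialdata.read_zarr("my_sample.zarr")  # load the pipeline output
adata = sdata.tables["table"]                    # cell-level AnnData for further analyses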

Documentation

Check Sopa's documentation to get started. It contains installation instructions, CLI/API details, and tutorials.

Overview

The following illustration describes the main steps of sopa:

sopa_overview

Xenium Explorer is a registered trademark of 10x Genomics. The Xenium Explorer is licensed for usage on Xenium data (more details here).

Installation

PyPI installation

Sopa can be installed via PyPI on all operating systems. The preferred Python version is 3.10, but 3.9 to 3.11 are also supported. In a fresh environment, run the following command:

pip install sopa
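
To verify the installation, a quick check from Python:

from importlib.metadata import version

print(version("sopa"))  # prints the installed sopa version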

To install extras (for example, if you want to use snakemake/cellpose/baysor/tangram), please run:

pip install 'sopa[snakemake,cellpose,baysor,tangram]'

Important: even though pip install 'sopa[baysor]' installs some Baysor-related dependencies, you still need to install the baysor command line separately (see the official repository) if you want to use it.
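
Since sopa calls the external baysor binary, a quick sanity check that it is discoverable from Python (a minimal sketch; it assumes Baysor was installed on your PATH):

import shutil

# sopa shells out to the 'baysor' executable, so it must be on the PATH
assert shutil.which("baysor") is not None, "Baysor CLI not found; see its official repository"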

Other installation modes

You can clone the repository and run one of these command lines at the root of sopa:

pip install -e . # dev mode installation
poetry install    # poetry installation

Features

Sopa comes in three different flavours, each corresponding to a different use case:

  • Snakemake pipeline: choose a config and run our pipeline on your spatial data in a couple of minutes
  • CLI: use our command-line interface to quickly prototype your own pipeline
  • API: use the sopa Python package directly for full flexibility and customization

Snakemake pipeline

Clone our repository, choose a config here (or create your own), and execute our pipeline locally or on a high-performance cluster:

git clone https://github.com/gustaveroussy/sopa.git
cd sopa/workflow
snakemake --configfile=/path/to/yaml_config --config data_path=/path/to/data_directory --cores 1 --use-conda

For more details on snakemake configuration and how to properly setup your environments, please refer to the documentation.

CLI

Below are examples of commands that can be run with the sopa CLI:

> sopa --help # show command names and arguments
> sopa read merscope_directory --technology merscope # read some data
> sopa patchify image merscope_directory.zarr # make patches for low-memory segmentation
> sopa segmentation cellpose merscope_directory.zarr --diameter 60 --channels DAPI # segmentation
> sopa resolve cellpose merscope_directory.zarr # resolve segmentation conflicts at boundaries
> sopa aggregate merscope_directory.zarr --average-intensities # transcripts/channels aggregation
> sopa explorer write merscope_directory.zarr # convert for interactive visualization

For a complete description of the CLI, please refer to the documentation.

API

import sopa

# use the 'sopa' python package
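
For illustration, here is a hedged minimal sketch of an API workflow mirroring the CLI steps above; the names follow the usage shown in the issues further down this page (sopa 1.x) and may differ in other versions:

import sopa.io
import sopa.segmentation

sdata = sopa.io.uniform()  # toy dataset; replace with e.g. sopa.io.merscope("/path/to/data")

# make patches for low-memory segmentation
patches = sopa.segmentation.Patches2D(sdata, "image", patch_width=1200, patch_overlap=50)
patches.write()

# run cellpose on each patch, then gather cells and resolve boundary conflicts
method = sopa.segmentation.methods.cellpose_patch(diameter=35, channels=["DAPI"])
segmentation = sopa.segmentation.StainingSegmentation(sdata, method, ["DAPI"])
for patch_index in range(len(sdata["sopa_patches"])):
    segmentation.write_patch_cells("tuto.zarr/.sopa_cache/cellpose", patch_index)

cells = sopa.segmentation.StainingSegmentation.read_patches_cells("tuto.zarr/.sopa_cache/cellpose")
cells = sopa.segmentation.shapes.solve_conflicts(cells)
sopa.segmentation.StainingSegmentation.add_shapes(sdata, cells, "image", "cellpose_boundaries")

# aggregate the average channel intensity per cell
aggregator = sopa.segmentation.Aggregator(sdata, image_key="image", shapes_key="cellpose_boundaries")
aggregator.compute_table(average_intensities=True)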

For a complete API description, please refer to the documentation.

Cite us

Our article is published in Nature Communications. You can cite our paper as follows:

@article{blampey_sopa_2024,
	title = {Sopa: a technology-invariant pipeline for analyses of image-based spatial omics},
	volume = {15},
	url = {https://www.nature.com/articles/s41467-024-48981-z},
	doi = {10.1038/s41467-024-48981-z},
	journal = {Nature Communications},
	author = {Blampey, Quentin and Mulder, Kevin and Gardet, Margaux and Christodoulidis, Stergios and Dutertre, Charles-Antoine and André, Fabrice and Ginhoux, Florent and Cournède, Paul-Henry},
	year = {2024},
	note = {Publisher: Nature Publishing Group},
	pages = {4981},
}

sopa's People

Contributors

pakiessling, quentinblampey, stergioc, tdefa


sopa's Issues

sdata object has no cellpose or baysor boundaries

Hello to the SOPA creators.

I am quite a beginner with Python and the command-line interface. I have attempted to run the ready-made Snakemake pipeline for Xenium data (with the Baysor cell segmentation feature).

However, I get the following warning and error in the resolve_baysor section of the pipeline:

[WARNING] (sopa._sdata) sdata object has no cellpose boundaries and no baysor boundaries. Consider running segmentation first.

and the error:

Finished job 3.
3 of 8 steps (38%) done
MissingInputException in rule resolve_baysor in file C:\Windows\System32\sopa\workflow\Snakefile, line 125:
Missing input files for rule resolve_baysor:
output: C:/xenium_run/TC_070.zarr/.sopa_cache/baysor_boundaries_done, C:/xenium_run/TC_070.zarr/.sopa_cache/table
affected files:

I was thinking that the Snakemake pipeline would run Baysor segmentation automatically.
I also get a similar error when I try a Cellpose-only config file or a Baysor-only config file.
What could be the solution here? I have installed both the cellpose package (as part of the sopa installation) and the Baysor package according to their instructions.

Thank you very much for your help :)

Overlay custom segmentation using cell segmentation resolver

Hi,

Thank you for SOPA, it looks really helpful. I'd love to use the cell segmentation resolver. However, I'm struggling a little with how best to implement it.

I have custom segmentation for a particular cell type in H&E. I've added this as a shape to my spatialdata object in the same coordinate system as cell boundaries (the default Xenium segmentation). I'd love to overlay this custom segmentation onto the default Xenium cell segmentation (cell boundaries) so that any cells/transcripts in the default under the custom segmentation get merged into the custom segmentation but the other cells from the default are kept.

Is there a way to do this in SOPA at all please?

Best wishes,
Emily

Xenium Input File Change

Along with CosMx, I believe the Xenium input files have also changed. It looks like the current sopa version still searches for morphology_mip.ome.tif; however, the new Xenium outputs no longer include this file. Because of this, I get errors when I try to run the snakemake pipeline or when I attempt to load the data using sopa.io.xenium. Thank you for your help!

[Bug] Error in to_spatialdata , Snakemake + Slurm

Hi, I am getting the following when trying out your Snakemake pipeline + Slurm.

I can see that a 125 GB job is successfully submitted.

Afterwards, the run crashes. Might it be caused by the job pending instead of starting immediately?

snakemake --profile slurm --config data_path=/hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0/ --configfile=/work/rwth1209/projects/merfish_segmentation/sopa/ES53-test/baysor_test.yml
Using profile slurm for setting default command line arguments.
SpatialData object path set to default: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr
To change this behavior, provide `--config sdata_path=...` when running the snakemake pipeline
Building DAG of jobs...
Using shell: /usr/local_rwth/bin/bash
Provided cluster nodes: 50
Job stats:
job                count
---------------  -------
aggregate              1
all                    1
annotate               1
explorer               1
image_write            1
patchify_baysor        1
report                 1
resolve_baysor         1
to_spatialdata         1
total                  9

Select jobs to execute...

[Wed Jan 24 16:43:21 2024]
rule to_spatialdata:
    input: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0
    output: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.zgroup
    jobid: 5
    reason: Missing output files: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.zgroup
    resources: mem_mb=128000, mem_mib=122071, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, partition=shortq, gpu=0


        sopa read /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0/ --sdata-path /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr --technology 'merscope'
        
sbatch: [I] No runtime limit given, set to: 15 minutes
Submitted job 5 with external jobid 'Submitted batch job 42489463'.
[Wed Jan 24 16:44:01 2024]
Error in rule to_spatialdata:
    jobid: 5
    input: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0
    output: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.zgroup
    conda-env: sopa
    shell:
        
        sopa read /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0/ --sdata-path /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr --technology 'merscope'
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 42489463

Error executing rule to_spatialdata on cluster (jobid: 5, external: Submitted batch job 42489463, jobscript: /rwthfs/rz/cluster/work/rwth1209/projects/merfish_segmentation/sopa/repo/sopa/workflow/.snakemake/tmp.76uy677n/snakejob.to_spatialdata.5.sh). For error details see the cluster log and the log files of the involved rule(s).
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-01-24T164321.430275.snakemake.log

RuntimeError in rule patchify_cellpose: add_image(), add_labels(), add_points() and add_shapes()

I am getting this error at the patchify stage:

RuntimeError: The functions add_image(), add_labels(), add_points() and
add_shapes() have been removed in favor of dict-like access to the elements.
Please use the following syntax to add an element:

    sdata.images["image_name"] = image
    sdata.labels["labels_name"] = labels
    ...

The new syntax does not automatically updates the disk storage, so you need to
call sdata.write() when the in-memory object is ready to be saved.
To save only a new specific element to an existing Zarr storage please use the
functions write_image(), write_labels(), write_points(), write_shapes() and
write_table(). We are going to make these calls more ergonomic in a follow up
PR.

Any idea?

I am using the snakemake-lsf profile (default setup) and a Xenium ome.tif.

[Feature] Cellpose - artificially define an area of cytoplasm

Feature request / question
In the absence of a general membrane marker, Sopa/cellpose (using only DAPI) will not "expand" the detected nuclei to generate an artificial cell membrane/cytoplasm area. In QuPath, cellpose is available as an extension, and the code has the option to artificially expand the nuclei:
image

Would it be possible to pass this parameter to cellpose within Sopa? Perhaps in the config
image

Output example:
Sopa (Cell = Nuclei)
image

QuPath (Cell > Nuclei)
image

Questions about Spatial Statistics Analysis

Hello!
Thank you for your contributions to spatial omics data analysis. I have some questions regarding "Distances between cell categories". When the number of cells of a certain cell type or niche is small, the mean hop-distance to other cell types or niches tends to increase significantly. When comparing experimental and control groups, can I draw conclusions about distance increases or decreases based solely on the mean hop-distance, or should I also account for the number of objects?
I look forward to your response.

Steps to use APIs

Hi,

Thank you for building this tool! I'm very excited to implement it.

I might have missed this, but may I know where I can find the tutorial on how to use the API?
Do I implement these functions sequentially?
image

I've tried cross-comparing with the steps in the CLI but failed to match the function names.
I would really appreciate your help on this!

Thank you,
Justine

ERROR in Patch_segmentation_baysor: Unable to find compatible target in cached code image. [Bug]

Hi there, thanks so much for developing a great software!

I am running the cellpose_baysor.yaml config file for Xenium data, which completes "cellpose resolve" but crashes at "patch_segmentation_baysor". I've been trying to rerun it starting from "checkpoint patchify_baysor", to no avail. Patchify works, and some of the Baysor jobs complete, but the rest crash with the following error:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=128000, mem_mib=122071, disk_mb=1000, disk_mib=954
Select jobs to execute...

[Tue Mar 26 19:42:35 2024]
rule patch_segmentation_baysor:
    input: /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/patches_file_baysor, /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10
    output: /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10/segmentation_polygons.json, /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10/segmentation_counts.loom
    jobid: 0
    reason: Forced execution
    wildcards: index=10
    resources: mem_mb=128000, mem_mib=122071, disk_mb=1000, disk_mib=954, tmpdir=/scratch/lsftmp/5002506.tmpdir

ERROR: Unable to find compatible target in cached code image.
Target 0 (cascadelake): Rejecting this target due to use of runtime-disabled features

[Tue Mar 26 19:42:35 2024]
Error in rule patch_segmentation_baysor:
    jobid: 0
    input: /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/patches_file_baysor, /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10
    output: /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10/segmentation_polygons.json, /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10/segmentation_counts.loom
    shell:

        if command -v module &> /dev/null; then
            module purge
        fi

        cd /data/chanjlab/Xenium.lung.NE_plasticity.010124/preprocessing/20231013__183137__JC_PILOT_RU151_RU263_RESECTION/output-XETG00065__0005504__RU151__20231013__183151.zarr/.sopa_cache/baysor_boundaries/10
        ~/.julia/bin/baysor run --save-polygons GeoJSON -c config.toml transcripts.csv :cell

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Any guidance whatsoever would be greatly appreciated! Thank you!

Joe

Cell ID correspondence

Hi,

Following spatialdata_xenium_explorer.write#5

It looks like sopa has changed the initial cell IDs when rebuilding the tables (AnnData); see below.

Before sopa processing:
Screenshot from 2024-04-03 15-03-09
After sopa processing:
Screenshot from 2024-04-03 15-03-30

In the resulting analysis.zarr.zip file, I see:
[INFO] (sopa.io.explorer.table) Writing 2 cell categories: region, slide

How can I find which new cell ID corresponds to each original one? I guess the order is the same, but I am not sure.
Does sopa have a temporary file containing this information?

Thanks for your help.
Chuang

CLI patching and cellpose segmentation + GPU

Hi,

Just a conceptual suggestion for the CLI tutorial. It would be more intuitive if sopa segmentation cellpose could automatically run on all the patches generated by sopa patchify image, without having to specify either a --patch-index or, again, --patch-width and --patch-overlap.

For example, simply running:
sopa segmentation cellpose tuto.zarr --channels DAPI --diameter 35 will return an error.

Also, any suggestions on how to parallelize the segmentation process and leverage CUDA? Thank you very much!

EDIT: I'm asking because I'm working with a very large image. I have 560 patches of 5000x5000 pixels, and segmentation is projected to take ~30h.
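
Until then, one workaround is to parallelize the per-patch API loop yourself. A hedged sketch, assuming the segmentation and sdata objects from the API example in the README above, and a fork-based platform (Linux); CUDA usage additionally depends on your cellpose/torch install:

from concurrent.futures import ProcessPoolExecutor

def run_patch(patch_index: int):
    # each worker segments one patch and writes its cells into the cache
    segmentation.write_patch_cells("tuto.zarr/.sopa_cache/cellpose", patch_index)

with ProcessPoolExecutor(max_workers=8) as pool:
    list(pool.map(run_patch, range(len(sdata["sopa_patches"]))))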

Error running snakemake

Hello!

Thank you for generating sopa!

I am trying to run snakemake on a toy dataset. I managed to set up the environment to run snakemake, but it failed for both baysor and cellpose.

Here is a snapshot of an error message for cellpose cell segmentation
image

and here is the error message for baysor
image

Both errors seem to be related to patchify.
I am very new to running snakemake pipelines; could you please give me some clues to understand these outputs?

Thanks,
Justine

Immunofluorescence Image Segmentation

I have a 3-channel immunofluorescence image (DAPI, Cy5_Quad and FITC_Quad). Through the CLI I am getting this error:

IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

The original metadata looks like this:

<?xml version="1.0" encoding="UTF-8"?><OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openmicroscopy.org/Schemas/OME/2016-06 http://www.openmicroscopy.org/Schemas/OME/2016-06/ome.xsd" UUID="urn:uuid:b78fd59f-1875-11ef-a2cd-480fcf6249ec" Creator="tifffile.py 2024.4.24"><Image ID="Image:0" Name="Image0"><Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint8" SizeX="44032" SizeY="76288" SizeC="3" SizeZ="1" SizeT="1" PhysicalSizeX="0.325" PhysicalSizeXUnit="µm" PhysicalSizeY="0.325" PhysicalSizeYUnit="µm"><Channel ID="Channel:0:0" SamplesPerPixel="1" Name="DAPI_Quad"><LightPath/></Channel><Channel ID="Channel:0:1" SamplesPerPixel="1" Name="FITC_Quad"><LightPath/></Channel><Channel ID="Channel:0:2" SamplesPerPixel="1" Name="Cy5_Quad"><LightPath/></Channel><TiffData IFD="0" PlaneCount="3"/></Pixels></Image><Image ID="Image:1" Name="label"><Pixels ID="Pixels:1" DimensionOrder="XYCZT" Type="uint8" SizeX="1721" SizeY="1421" SizeC="3" SizeZ="1" SizeT="1" Interleaved="true"><Channel ID="Channel:1:0" SamplesPerPixel="3"><LightPath/></Channel><TiffData IFD="3" PlaneCount="1"/></Pixels></Image></OME>

Any idea how to analyze IF images in SOPA?

[Bug] sopa-v1.0.14 errors on Xenium v2 + multimodal cell segmentation data but not dapi-only data

Hello. This is my first time running sopa. I found that both sopa v1.0.14 and v1.0.13 error on Xenium v2 pipeline data that includes multimodal cell segmentation stains, but not on v2 pipeline data that uses only DAPI. The error is:

ValueError: cannot reindex or align along dimension 'c' because of conflicting dimension sizes: {16, 4}

Full stack-trace: sopa-1.0.14-read-error-20240607.txt

Using the DAPI-only v2 data succeeds and gives the stdout:

SpatialData object with:
├── Images
│     └── 'morphology_focus': MultiscaleSpatialImage[cyx] (1, 3553, 5791), (1, 1776, 2895), (1, 888, 1447), (1, 444, 723), (1, 222, 361)
└── Points
      └── 'transcripts': DataFrame with shape: (<Delayed>, 11) (3D points)
with coordinate systems:
▸ 'global', with elements:
        morphology_focus (Images), transcripts (Points)

Both datasets are public, small, and available at https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/resources/xenium-example-data.

Command was sopa read Xenium_V1_human_Lung_2fov_outs --technology xenium --sdata-path sopa for the latter.

Given the discussion in #68, I expected the read function to work. Maybe I am missing something? Thanks in advance for your help.

  • OS: macOS Ventura 13.6.3
  • Python 3.9.13

[help wanted] Other segmentation options

Thanks for the software, it's a very comprehensive effort.
We just got some CosMx data and are pretty unhappy with the default segmentations. We were in the process of training Cellpose models with membrane fluorescence and getting good results when we ran into sopa; I see that we could include these models in the sopa pipeline. However, we are also interested in testing more recent approaches like BIDCell or Bering, which I guess are similar to Baysor.
I would like to know how you would suggest implementing these tests with sopa. For example, could we obtain segmentations elsewhere, bring them into sopa, and resolve the segmentation? How would one go about testing different segmentation pipelines?

I think this might be related to issue #55

Any advice or suggestion on how to move forward with this, or whether you believe this defeats the purpose of the package, would be appreciated.

Speeding up Sopa explorer write

Hi, I am working my way through the Sopa Snakemake pipeline with one of our MERFISH datasets.

A step that has been problematic is the sopa explorer write.

This takes a long time and the job was canceled after 12 hours.

I think the problem is that we have 11 stains in total, at 87012 x 64791 pixels.
The writing of tiles takes forever.

Is there a way to parallelize this, or maybe only take a subset of stains?

Thanks!

Memory/RAM issue when writing MERSCOPE data (sopa read)

Hello,

My shell kills 'sopa read', although I do not seem to be exceeding my computer's resources.

I also get this warning:

UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I am trying to run this on an M2 Max Mac with 64 GB of RAM. Has anyone had similar issues?

Thank you for your support.

Two questions about the Baysor pipeline

Hi,

I am running Sopa Baysor segmentation via Snakemake as shown in the tutorial.
It works quite well and is much more convenient than writing my own code for running Baysor on chunks.

I noted that the output .zarr does not contain a cell x gene matrix, but only points, images and shapes.
It is not a big deal, as there is an AnnData in the .explorer, but shouldn't this also be in the zarr by default?

Another question I have is about the segmented shapes. For Baysor, I assume these are in the µm coordinate system, since those are the input coordinates; or are they converted to pixels to match the image in the zarr?

Where does the tuto.zarr from the tutorial come from?

snakemake --config sdata_path=tuto.zarr --configfile=config/toy/uniform_cellpose.yaml --cores 1 --use-conda

I did not find the tuto.zarr in the workflow directory or anywhere else in the snakemake folder.
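
(For reference, the CLI tutorial generates it first; a hedged Python equivalent using the toy reader, assuming sopa.io.uniform is available in your version:)

import sopa.io

sdata = sopa.io.uniform()   # generate the toy dataset
sdata.write("tuto.zarr")    # creates the tuto.zarr directory expected by the command above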

Suggestions for documentation

I noticed some things while going through your (very impressive) project:

  • I would suggest editing this line

    reference_preprocessing: log1p # preprocessing method applied to the reference scRNAseq. Either `normalized` (sc.pp.normalize_total), or `log1p` (sc.pp.normalize_total and sc.pp.log1p), or remove this parameter (raw counts)
    to make it clearer that this tells Sopa about the state the reference scRNA-seq is already in, not about an action to be performed on the reference; that was confusing to me

  • In the Snakemake tutorial https://gustaveroussy.github.io/sopa/tutorials/snakemake/ I would also add the line

# this generates a 'tuto.zarr' directory
sopa read . --sdata-path tuto.zarr --technology uniform

from the CLI tutorial, as the code below otherwise won't work

  • In the HTML report you show the per-cell intensity for every channel, but there is no legend mapping the colors in the diagram to the channels

[Bug] Self-intersection in resolve conflict (TopologyException)

Sometimes, after cellpose, some cells may be self-intersecting, leading to a TopologyException:

sopa/segmentation/shapes.py:52 in solve_conflicts cell1.union(cell2).buffer(0)
GEOSException: TopologyException: Input geom 1 is invalid: Self-intersection
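
A minimal sketch of the usual shapely workaround, validating a geometry before taking the union (make_valid exists in shapely >= 1.8; buffer(0) is the older fallback):

from shapely.geometry import Polygon
from shapely.validation import make_valid

bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # self-intersecting "bowtie" polygon
print(bowtie.is_valid)      # False
fixed = make_valid(bowtie)  # or bowtie.buffer(0)
print(fixed.is_valid)       # True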

[Feature] How to handle updated outputs from different technologies

Is your feature request related to a problem? Please describe.
I think the output format of CosMx files has changed since sopa was developed.

Describe the solution you'd like
Could you suggest which files sopa needs in order to read CosMx data properly?

The Morphology2D folder is the same, but the transcript file naming seems to have changed. Also, I am not sure which file is the required FOV locations file.

Maybe you could point me to the minimum data required for each file, or provide an example file.

Thank you

[Bug] baysor jobs fail with DomainError

Hi,

I am trying to run the Snakemake pipeline with Baysor.
Unfortunately all my Baysor jobs fail like the following:


rule patch_segmentation_baysor:
    input: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/patches_file_baysor, /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77
    output: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_polygons.json, /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_counts.loom
    jobid: 0
    reason: Missing output files: /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_counts.loom, /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77/segmentation_polygons.json
    wildcards: index=77
    resources: mem_mb=64000, mem_mib=61036, disk_mb=1000, disk_mib=954, tmpdir=/w0/tmp/slurm_mz637064.42865265, partition=c18m, gpu=0, time=03:00:00, runtime=180


        if command -v module &> /dev/null; then
            module purge
        fi
        
        cd /hpcwork/rwth1209/data/merfish/processed/53-ES-500-HuHeart-E3279/region_0.zarr/.sopa_cache/baysor_boundaries/77
        /rwthfs/rz/cluster/work/rwth1209/software/baysor/baysor-x86_x64-linux-v0.6.2_build/bin/baysor/bin/baysor run --save-polygons GeoJSON -c config.toml transcripts.csv :cell_id
        
[10:25:26] Info: Run R998e84721
[10:25:26] Info: (2024-01-26) Run Baysor v0.6.2
[10:25:26] Info: Loading data...
[10:25:28] Info: Excluding genes: Blank-1, Blank-10, Blank-11, Blank-12, Blank-13, Blank-14, Blank-15, Blank-16, Blank-17, Blank-18, Blank-19, Blank-2, Blank-20, Blank-21, Blank-22, Blank-23, Blank-24, Blank-25, Blank-26, Blank-27, Blank-28, Blank-29, Blank-3, Blank-30, Blank-31, Blank-32, Blank-33, Blank-34, Blank-35, Blank-36, Blank-37, Blank-38, Blank-39, Blank-4, Blank-40, Blank-41, Blank-42, Blank-43, Blank-44, Blank-45, Blank-46, Blank-47, Blank-48, Blank-49, Blank-5, Blank-50, Blank-6, Blank-7, Blank-8, Blank-9
[10:25:28] Info: Loaded 438614 transcripts
[10:25:33] Info: Estimating noise level
[10:25:46] Info: Done
[10:25:58] Info: Clustering molecules...
[10:26:34] Info: Algorithm stopped after 287 iterations. Error: 0.0057. Converged: true.
[10:26:34] Info: Done
[10:26:34] Info: Initializing algorithm. Scale: -1.0, scale std: -0.25, initial #components: 87722, #molecules: 438614.
[10:26:38] Info: Using the following additional information about molecules: [:confidence, :cluster, :prior_segmentation]
[10:26:38] Info: Using 2D coordinates
ERROR: DomainError with -4.597020976692567:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).

Have you ever seen this before? Is there a specific Baysor version I should use? I am on 0.6.2.

I also didn't change these settings:

        n_clusters: 4
        iters: 500
        n_cells_init: 0
        new_component_weight: 0.2
        new_component_fraction: 0.3

Do these need to be optimized per tissue, or do they depend on the patchify settings?

I am also not entirely clear on:

  patch_width_pixel: 6000
  patch_overlap_pixel: 150
  patch_width_microns: 1000
  patch_overlap_microns: 20

My pixel size is 0.108; do I need to adjust these values?
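
For the unit question, a quick sanity check of the µm-to-pixel conversion (assuming 0.108 is the pixel size in µm per pixel):

pixel_size = 0.108              # µm per pixel (assumed)
patch_width_microns = 1000
print(patch_width_microns / pixel_size)  # ~9259: pixels covered by one 1000 µm patch side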

Add nextflow pipeline

The snakemake pipeline should be translated to nextflow to reach more users. Don't hesitate to comment on this issue to show your interest; otherwise, we will keep it as a low priority.

EDIT: I'm not very familiar with Nextflow, so I'm open to pull requests :)

[Feature] Multiple segmentation rounds of stains or cells + nuclei

Thank you for the tool. It looks really awesome!

What I currently use VPT for is multiple rounds of segmentation of different protein stains that are then "harmonized". In addition, I separately segment nuclei and cell membranes into two sets of polygons that are matched with each other (child <> parent).

I don't think that's currently supported in Sopa?

Filtering by qv score for Xenium datasets

Hello again,

I was just wondering if there is a way to filter by qv score (Phred-based quality value) for Xenium datasets. The typical qv cutoff is 20; however, I noticed that the transcripts in my spatialdata object ranged from 7.6 to 40.0, so it doesn't look like the pipeline filters by qv score. Would it be possible to add this option, or do you not recommend filtering on this value? Thank you!
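
Until such an option exists, a hedged sketch of checking/filtering transcripts by qv yourself on the SpatialData points element (this assumes the Xenium reader keeps a qv column; the dask filter stays lazy until computed):

import spatialdata

sdata = spatialdata.read_zarr("xenium.zarr")         # hypothetical path
transcripts = sdata.points["transcripts"]            # dask dataframe
high_quality = transcripts[transcripts["qv"] >= 20]  # typical Phred cutoff
print(len(high_quality))                             # triggers the computation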

CosMX stitching images

CosMX images (and transcripts) have to be stitched before running sopa

EDIT: the spatialdata team is working on this

CosMx input file format requirements

Hi Sopa development team,

Sopa looks like a perfect fit for what I've been trying to do: integrating IF stains with Baysor re-segmentation. However, I've been having trouble figuring out how to run my data properly, since the newest CosMx output data (and all the column names) are quite different from the example files on their website. I'm not exactly sure which column/root file names sopa is looking for. Do you have any documentation on that?

I am currently trying to run your Snakefile. From your FAQ, I see that you need these three components:
data_path is the directory containing (i) the transcript file (ending with _tx_file.csv or _tx_file.csv.gz), (ii) the FOV locations file, and (iii) a Morphology2D directory containing the images.

I presumed that these files were based on this CosMx README:
https://nanostring-public-share.s3.us-west-2.amazonaws.com/SMI-Compressed/SMI-ReadMe.html

The Morphology2D directory has files in this format:

20240215_023634_S3_C902_P99_N99_F001.TIF
20240215_023634_S3_C902_P99_N99_F002.TIF

for FOV001 and FOV002

The transcripts file looks like this:

,CellComp,CellId,Spot1_count,Spot2_count,Spot3_count,Spot4_count,codeclass,fov,multicolor_spots_per_feature,possible_BC_count,random_call_probability,seed_x,seed_y,spots_per_feature,target,target_call_observations,target_count_per_feature,target_idx,x,y,z
0,Cytoplasm,1,1,1,1,1,Endogenous,1,0,1,0.0025906,40612,14474,4,CD86,4,1,935,4061.07,1447.53,0
1,Nuclear,3,1,1,1,1,SystemControl,1,1,2,0.00517446,41620,14490,4,SystemControl100,4,1,537,4162.15,1449.18,0
2,Nuclear,2,1,2,1,2,Endogenous,1,0,1,0.0025906,42113,14517,4,RPL37,6,1,937,4211.38,1451.83,0

I'm not entirely sure what the FOV locations file is supposed to look like: there are x and y coordinates for each transcript and for each cell, though it seems like sopa is looking for one X and Y coordinate per FOV, if I'm understanding it right.

I get errors like this:

rule to_spatialdata:
    input: sopa/data/fov001
    output: sopa/data/fov001.zarr/.zgroup
    jobid: 4
    reason: Missing output files: sopa/data/fov001.zarr/.zgroup
    resources: tmpdir=/tmp, mem_mb=128000, mem_mib=122071

Activating conda environment: ../../../envs/env_sopa
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│conda_environments/radian/lib/python3.10/site-packages/pandas/core/inde │
│ xes/base.py:3805 in get_loc                                                                      │
│                                                                                                  │
│   3802 │   │   """                                                                               │
│   3803 │   │   casted_key = self._maybe_cast_indexer(key)                                        │
│   3804 │   │   try:                                                                              │
│ ❱ 3805 │   │   │   return self._engine.get_loc(casted_key)                                       │
│   3806 │   │   except KeyError as err:                                                           │
│   3807 │   │   │   if isinstance(casted_key, slice) or (                                         │
│   3808 │   │   │   │   isinstance(casted_key, abc.Iterable)                                      │
│                                                                                                  │
│ ╭───────────────────────────────────────── locals ─────────────────────────────────────────╮     │
│ │ casted_key = 'X_mm'                                                                      │     │
│ │        key = 'X_mm'                                                                      │     │
│ │       self = Index(['Unnamed: 0', 'CellId', 'Spot1_count', 'Spot2_count', 'Spot3_count', │     │
│ │              │      'Spot4_count', 'codeclass', 'fov', 'multicolor_spots_per_feature',   │     │
│ │              │      'possible_BC_count', 'random_call_probability', 'seed_x', 'seed_y',  │     │
│ │              │      'spots_per_feature', 'target', 'target_call_observations',           │     │
│ │              │      'target_count_per_feature', 'target_idx', 'x', 'y', 'z'],            │     │
│ │              │     dtype='object')                                                       │     │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────╯     │
│                                                                                                  │
│ in pandas._libs.index.IndexEngine.get_loc:167                                                    │
│                                                                                                  │
│ in pandas._libs.index.IndexEngine.get_loc:196                                                    │
│                                                                                                  │
│ in pandas._libs.hashtable.PyObjectHashTable.get_item:7081                                        │
│                                                                                                  │
│ in pandas._libs.hashtable.PyObjectHashTable.get_item:7089                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'X_mm'

In this case, X_mm isn't a column header in the sample data or in the newest CosMx output.

Thank you for your help!

Advice on simple IF histocytometry

Similar, I guess, to issue #78: I'm trying to use sopa for a histocytometry-like protocol with a one-round classic IF tissue staining. I have a confocal image with 4 channels; I want to run cellpose on one channel (e.g. CH3) and would like to get the average intensity of the other channels based on that segmentation. Then I want to use the different fluorescence levels to assign cell labels; cells will have combinations of CH1+CH2 or CH4+CH2, or single CH4 or CH1 with no CH2 intensity.

What would your workflow advice on going through this?

I tested the following code, but I guess the way I'm doing it looks for cells with either of the markers and does not take double positives into account.

I'm still troubleshooting the min_area in the segmentation method; do you have a recommendation for this based on the diameter passed to cellpose?

import spatialdata
import sopa.segmentation
import sopa.io
import aicsimageio

sdata = sopa.io.aicsimageio("data_if/Vh169 7dpt_4 spleen1 30x stitch.oir", z_stack=0, image_models_kwargs=None, aics_kwargs=None)

sdata

SpatialData object with:
└── Images
└── 'image': MultiscaleSpatialImage[cyx] (4, 2947, 3921), (4, 1473, 1960), (4, 736, 980), (4, 368, 490), (4, 184, 245)
with coordinate systems:
▸ 'global', with elements:
image (Images)

image_key = "image"


patches = sopa.segmentation.Patches2D(sdata, image_key, patch_width=1200, patch_overlap=50)
patches.write();

[INFO] (sopa.segmentation.patching) 12 patches were saved in sdata['sopa_patches']

from sopa._sdata import get_spatial_image

print(get_spatial_image(sdata, image_key).c.values)

['CH1' 'CH2' 'CH3' 'CH4']

channels = ["CH3"]

method = sopa.segmentation.methods.cellpose_patch(diameter=7, channels=channels, flow_threshold=0.8, cellprob_threshold=-12,  pretrained_model= "model_if/CP_20240502_095143")
segmentation = sopa.segmentation.StainingSegmentation(sdata, method, channels, min_area=2500)

# The cellpose boundaries will be temporarily saved here. You can choose a different path
cellpose_temp_dir = "tuto.zarr/.sopa_cache/cellpose"

# parallelize this for loop yourself (or use the Snakemake pipeline)
for patch_index in range(len(sdata['sopa_patches'])):
    segmentation.write_patch_cells(cellpose_temp_dir, patch_index)

[INFO] (sopa.segmentation.shapes) Percentage of non-geometrized cells: 0.08% (usually due to segmentation artefacts)
[INFO] (sopa.segmentation.shapes) Percentage of non-geometrized cells: 0.00% (usually due to segmentation artefacts)
[INFO] (sopa.segmentation.shapes) Percentage of non-geometrized cells: 0.00% (usually due to segmentation artefacts)
[INFO] (sopa.segmentation.shapes) Percentage of non-geometrized cells: 0.12% (usually due to segmentation artefacts)
[INFO] (sopa.segmentation.shapes) Percentage of non-geometrized cells: 0.07% (usually due to segmentation artefacts)
[INFO] (sopa.segmentation.shapes) Percentage of non-geometrized cells: 0.00% (usually due to segmentation artefacts)

cells = sopa.segmentation.StainingSegmentation.read_patches_cells(cellpose_temp_dir)
cells = sopa.segmentation.shapes.solve_conflicts(cells)

shapes_key = "cellpose_boundaries" # name of the key given to the cells in sdata.shapes

sopa.segmentation.StainingSegmentation.add_shapes(sdata, cells, image_key, shapes_key)

Reading patches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 251.48it/s]
[INFO] (sopa.segmentation.stainings) Found 11295 total cells
Resolving conflicts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2066/2066 [00:00<00:00, 6959.47it/s]
[INFO] (sopa.segmentation.stainings) Added 10459 cell boundaries in sdata['cellpose_boundaries']

aggregator = sopa.segmentation.Aggregator(sdata, image_key=image_key, shapes_key=shapes_key)

aggregator.compute_table(average_intensities=True)

[INFO] (sopa.segmentation.aggregate) Averaging channels intensity over 10459 cells with expansion 0.0
[########################################] | 100% Completed | 1.52 sms
/Users/carlos/mambaforge/envs/sopa/lib/python3.10/site-packages/spatialdata/_core/_elements.py:92: UserWarning: Key cellpose_boundaries already exists. Overwriting it.
self._check_key(key, self.keys(), self._shared_keys)

sdata

SpatialData object with:
├── Images
│ └── 'image': MultiscaleSpatialImage[cyx] (4, 2947, 3921), (4, 1473, 1960), (4, 736, 980), (4, 368, 490), (4, 184, 245)
├── Shapes
│ ├── 'cellpose_boundaries': GeoDataFrame shape: (10459, 1) (2D shapes)
│ └── 'sopa_patches': GeoDataFrame shape: (6, 3) (2D shapes)
└── Tables
└── 'table': AnnData (10459, 4)
with coordinate systems:
▸ 'global', with elements:
image (Images), cellpose_boundaries (Shapes), sopa_patches (Shapes)

from sopa.annotation import higher_z_score

marker_cell_dict = {
    "CH1": "marker-1",
    "CH2": "marker-2",
    "CH3": "segment_marker",
    "CH4": "marker-3"
}

higher_z_score(sdata.tables["table"], marker_cell_dict)

[INFO] (sopa.annotation.fluorescence) Annotation counts: cell_type
marker-1 3794
segment_marker 2617
marker-3 2218
marker-2 1830
Name: count, dtype: int64

sopa.io.write_report("report.html", sdata)

[INFO] (sopa.io.report.generate) Writing general_section
[INFO] (sopa.io.report.generate) Writing cell_section
[INFO] (sopa.io.report.generate) Writing channel_section
[INFO] (sopa.io.report.generate) Writing transcripts_section
[INFO] (sopa.io.report.generate) Writing representation_section
[INFO] (sopa.io.report.generate) Computing UMAP on 10459 cells
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
/Users/carlos/mambaforge/envs/sopa/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
[INFO] (sopa.io.report.generate) Writing report to report.html

sdata.tables["table"]

AnnData object with n_obs × n_vars = 10459 × 4
obs: 'region', 'slide', 'cell_id', 'area', 'cell_type'
uns: 'sopa_attrs', 'spatialdata_attrs', 'pca', 'neighbors', 'umap', 'cell_type_colors'
obsm: 'spatial', 'z_scores', 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances', 'connectivities'
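
One way to handle double positives is to threshold each channel independently instead of keeping only the highest z-score. A hedged sketch on the aggregated table above (the cutoff is hypothetical and must be tuned per experiment):

import numpy as np

adata = sdata.tables["table"]
z = np.asarray(adata.obsm["z_scores"])  # cells x channels, written by higher_z_score

positive = z > 1.0                      # hypothetical per-channel cutoff
channel_names = np.array(adata.var_names)
adata.obs["combo_type"] = [
    "+".join(channel_names[row]) or "negative" for row in positive
]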

pip install

Dear developers, I have issues with installing/importing the package through pip. I created a new conda env with Python 3.10 and ran 'pip install sopa'; it installed, but when I try:
import sopa.segmentation
ImportError: cannot import name '_get_unique_label_values_as_index' from 'spatialdata._core.query.relational_query' (C:\GBW_MyPrograms\Anaconda\envs\VPT\lib\site-packages\spatialdata\_core\query\relational_query.py)
Can you help me, please?

[Feature] Use .CZI data in SOPA

Feature request
I am using multiplexed images (Opal dyes - fluorescence data) acquired with a Zeiss microscope that generates .czi files, but this format is not compatible with SOPA.

Suggested solution
Update PhenoCycler technology to accept .czi data as raw inputs.

Alternative
Create a technology (e.g. ‘Opal System’) that accepts .czi as raw input.

Thank you!

[Feature] Support WSI analysis

High priority

  • Add a WSI reader (read a wsi into Xarray and then transform to a SpatialData object)
  • Add whole tissue segmentation + conversion to a shapely polygon
  • Support multi-polygon ROI in segmentation (or anything that uses ROI.KEY)
  • Tile embedding in order to encode tile-wise phenotypical information (with different pretrained models, e.g., ResNet or in-domain representation-learning models). These could be used in a similar fashion to spatial transcriptomics data, but they will be much sparser.
  • Tile clustering in order to generate cluster maps with similar phenotypical characteristics. Since sopa is slide-specific, the clustering should also be slide-specific. Any cohort-level clustering should be done outside of Sopa
  • Add the new dependencies into a new wsi extra

Low priority

  • Add an optional openslide backend
  • Add sjoin in the API
  • Add some tutorials to the documentation

Will not be added for now

  • hover-net segmentation method (or CellVit?)
  • Save .zarr without the image, but instead keep a pointer to the original slide

Add patch embeddings with overlaps

It should be simple to add this functionality, since the Patches2D class already allows for overlaps. I will take a look at this when I get some time.

sopa.segmentation.baysor.resolve error: Raise ValueError('negative column index found') ValueError: negative column index found

Dear Quentin Blampey,

Thank you for developing this awesome framework.

We tried it on MERFISH data using Baysor, but we have an issue at the resolve step, while the segmentation step itself worked well.

This is the error I get after running the function sopa.segmentation.baysor.resolve():

NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def twobit_1hamming(twobit: int, size: int) -> List[int]:
Reading baysor outputs: 75%|██████████████████████████▎ | 6/8 [00:15<00:04, 2.21s/it]
[WARNING] (sopa.segmentation.shapes) Removing cell of unknown type <class 'shapely.geometry.multilinestring.MultiLineString'>
Reading baysor outputs: 100%|███████████████████████████████████| 8/8 [00:18<00:00, 2.36s/it]
Resolving conflicts: 100%|██████████████████████████████| 2480/2480 [00:00<00:00, 3637.14it/s]
[INFO] (sopa.segmentation.baysor.resolve) Aggregating transcripts on merged cells
[INFO] (sopa.segmentation.aggregate) Aggregating transcripts over 418 cells
[##############                          ] | 37% Completed | 23.95 s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/sopa/segmentation/baysor/resolve.py", line 131, in resolve
    table_conflicts = aggregate.count_transcripts(sdata, gene_column, geo_df=geo_df_new)
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/sopa/segmentation/aggregate.py", line 289, in count_transcripts
    return _count_transcripts_aligned(geo_df, points, gene_column)
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/sopa/segmentation/aggregate.py", line 318, in _count_transcripts_aligned
    ).compute()
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/dask/base.py", line 379, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/dask/base.py", line 665, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/sopa/segmentation/aggregate.py", line 337, in _add_coo
    adata.X += coo_matrix(
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/scipy/sparse/_coo.py", line 204, in __init__
    self._check()
  File "/data/kdi_prod/.kdi/project_workspace_0/1928/acl/01.00/scripts/conda_telocytes_sopa/lib/python3.10/site-packages/scipy/sparse/_coo.py", line 297, in _check
    raise ValueError('negative column index found')
ValueError: negative column index found

Here is the baysor config I used:
config = {
    "data": {
        "force_2d": True,
        "min_molecules_per_cell": 10,
        "x": "x",
        "y": "y",
        "z": "z",
        "gene": "gene",
        "min_molecules_per_gene": 0,
        "min_molecules_per_segment": 3,
        "confidence_nn_id": 6,
    },
    "segmentation": {
        "scale": 10,  # Important parameter: typical cell diameter, in microns (see our configs)
        "scale_std": "25%",
        "prior_segmentation_confidence": 0,
        "estimate_scale_from_centers": False,
        "n_clusters": 4,
        "iters": 500,
        "n_cells_init": 0,
        "nuclei_genes": "",
        "cyto_genes": "",
        "new_component_weight": 0.2,
        "new_component_fraction": 0.3,
    },
}

I ran it using the environment below on a CentOS 7 cluster.
Let me know if you need more information.

Regards,
Mamy

PS: here is my environment:
env.txt

[Bug] Baysor fails when too few transcripts

When a patch is very small, this edge case can happen, which leads to:
ERROR: Not enough prior cells pass the min_mols_per_cell=10 threshold. Please, specify scale manually.

A temporary (but bad) solution is to remove the index of the failing patch from the following file: <data_directory.zarr>/.sopa_cache/patches_file_baysor
When re-running snakemake, add --rerun-triggers params to the command line, otherwise it will rerun all the baysor patches.
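
For example, a hedged sketch of that manual edit (assuming patches_file_baysor is a plain-text list with one patch entry per line, which may differ across sopa versions; the failing index is hypothetical):

from pathlib import Path

patches_file = Path("data_directory.zarr/.sopa_cache/patches_file_baysor")
failing = "42"  # hypothetical index of the patch that crashed

lines = patches_file.read_text().splitlines()
patches_file.write_text("\n".join(l for l in lines if l.strip() != failing) + "\n")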

Full log details
[14:57:28] Info: Run Rb9342f718
[14:57:28] Info: (2024-03-29) Run Baysor v0.6.2
[14:57:28] Info: Loading data...
[14:57:31] Info: Excluding genes: Blank-1, Blank-10, Blank-11, Blank-14, Blank-15, Blank-16, Blank-19, Blank-2, Blank-20, Blank-21, Blank-22, Blank-23, Blank-26, Blank-27, Blank-28, Blank-29, Blank-3, Blank-32, Blank-33, Blank-34, Blank-37, Blank-39, Blank-41, Blank-42, Blank-44, Blank-45, Blank-48, Blank-49, Blank-5, Blank-6, Blank-7, Blank-8
[14:57:32] Info: Loaded 1214 transcripts
ERROR: Not enough prior cells pass the min_mols_per_cell=10 threshold. Please, specify scale manually.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] estimate_scale_from_assignment(pos_data::Matrix{Float64}, assignment::Vector{Int64}; min_mols_per_cell::Int64)
    @ Baysor.DataLoading /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/prior_segmentation.jl:92
  [3] estimate_scale_from_assignment
    @ /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/prior_segmentation.jl:89 [inlined]
  [4] parse_prior_assignment(pos_data::Matrix{Float64}, prior_segmentation::Vector{Int64}; col_name::Symbol, min_molecules_per_segment::Int64, min_mols_per_cell::Int64)
    @ Baysor.DataLoading /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/cli_wrappers.jl:21
  [5] load_prior_segmentation!(path::String, df_spatial::DataFrames.DataFrame, pos_data::Matrix{Float64}; min_molecules_per_segment::Int64, min_mols_per_cell::Int64)
    @ Baysor.DataLoading /home/viktor_petukhov/.julia/dev/Baysor/src/data_loading/cli_wrappers.jl:212
  [6] load_prior_segmentation!(df_spatial::DataFrames.DataFrame, prior_segmentation::String, opts::Baysor.Utils.SegmentationOptions; min_molecules_per_cell::Int64, min_molecules_per_segment::Int64, plot::Bool)
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/dev/Baysor/src/cli/main.jl:197
  [7] run(coordinates::String, prior_segmentation::String; config::Baysor.Utils.RunOptions, x_column::String, y_column::String, z_column::String, gene_column::String, min_molecules_per_cell::Int64, scale::Float64, scale_std::String, n_clusters::Int64, prior_segmentation_confidence::Float64, output::String, plot::Bool, save_polygons::String, no_ncv_estimation::Bool, count_matrix_format::String)
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/dev/Baysor/src/cli/main.jl:108
  [8] command_main(ARGS::Vector{String})
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/packages/Comonicon/HDhA6/src/codegen/julia.jl:343
  [9] command_main
    @ /home/viktor_petukhov/.julia/packages/Comonicon/HDhA6/src/codegen/julia.jl:90 [inlined]
 [10] julia_main()
    @ Baysor.CommandLine /home/viktor_petukhov/.julia/packages/Comonicon/HDhA6/src/frontend/cast.jl:481
 [11] julia_main(; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Baysor /home/viktor_petukhov/.julia/dev/Baysor/src/Baysor.jl:42
 [12] julia_main()
    @ Baysor /home/viktor_petukhov/.julia/dev/Baysor/src/Baysor.jl:42
 [13] top-level scope
    @ none:1

upstream analysis

Hi @quentinblampey, thanks for your contribution to Sopa! After reading your manuscript, I wonder if Sopa supports upstream analysis such as image registration, blob detection, and transcript decoding?

[Advice] Cellpose troubleshooting

I trained a cellpose model and used it with a Morphology2D file outside of sopa, getting pretty good results; however, when running it on the same image within sopa, I don't get as many cells. I think it might have to do with the image settings (e.g. contrast) or maybe the patching?

What do you suggest for troubleshooting this? I guess I can use the CLI and try different parameters there, and then move to snakemake?

Thanks

snakemake pipeline on subset of FOVs

Hello,
I was trying to test sopa with a single FOV from my dataset to check that everything works. To do so, I created separate versions of each of the CosMx output files for a single FOV (FOV99 in this case).
I'm not sure what the right naming or strategy would be to subset a dataset prior to input into sopa, or whether there is a way to select a specific FOV with the snakemake pipeline.
This is what the beginning of each file looks like:

This is the transcripts file

tx_99
fov cell_ID cell x_local_px y_local_px x_global_px y_global_px z
1: 99 0 c_1_99_0 85 43 24039.4 22226.24 8
2: 99 0 c_1_99_0 121 21 24075.4 22248.24 9
3: 99 0 c_1_99_0 259 32 24213.4 22237.24 9
4: 99 0 c_1_99_0 205 67 24159.4 22202.24 7
5: 99 0 c_1_99_0 334 87 24288.4 22182.24 9
---
712596: 99 2270 c_1_99_2270 1438 4239 25392.4 18030.24 8
712597: 99 2270 c_1_99_2270 1454 4239 25408.4 18030.24 7
712598: 99 2270 c_1_99_2270 1441 4240 25395.4 18029.24 9
712599: 99 2270 c_1_99_2270 1451 4241 25405.4 18028.24 8
712600: 99 2270 c_1_99_2270 1461 4243 25415.4 18026.24 9
target CellComp
1: TNFSF13 None
2: Ptk6 None
3: Mfap5 None
4: Clec4d None
5: Tnfsf8 None
---
712596: Krt8 Nuclear
712597: Pdgfra Nuclear
712598: Ldha Nuclear
712599: Cd33 Nuclear
712600: Ppia Nuclear

This is the FOV locations file

fov_99
Slide X_mm Y_mm Z_mm ZOffset_mm ROI FOV Order Run_Tissue_name
1: 1 2.881258 2.678565 0.021493 -2 0 99 33 mw_mus_p1_04

When I try to run sopa with snakemake I get the following errors:

(sopa-snake) carlos@dhcp-32-058 sopa % snakemake --config data_path=/Users/carlos/projects/sopa_test_1/data_fov99 --configfile=/Users/carlos/projects/sopa_test_1/config_files/cellpose_baysor_localmodel.yaml --cores 2 --use-conda
SpatialData object path set to default: /Users/carlos/projects/sopa_test_1/data_fov99.zarr
To change this behavior, provide --config sdata_path=... when running the snakemake pipeline
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job stats:
job                  count
-----------------  -------
aggregate                1
all                      1
explorer                 1
image_write              1
patchify_baysor          1
patchify_cellpose        1
report                   1
resolve_baysor           1
resolve_cellpose         1
to_spatialdata           1
total                   10

Select jobs to execute...

[Tue May 14 15:34:52 2024]
rule to_spatialdata:
input: /Users/carlos/projects/sopa_test_1/data_fov99
output: /Users/carlos/projects/sopa_test_1/data_fov99.zarr/.zgroup
jobid: 4
reason: Missing output files: /Users/carlos/projects/sopa_test_1/data_fov99.zarr/.zgroup
resources: tmpdir=/var/folders/2n/b_m3klds26l3_d3qlqd7qtlm0000gq/T, mem_mb=128000, mem_mib=122071

Activating conda environment: sopa-snake
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/carlos/mambaforge/envs/sopa-snake/lib/python3.10/site-packages/pandas/core/indexes/base.p │
│ y:3805 in get_loc │
│ │
│ 3802 │ │ """ │
│ 3803 │ │ casted_key = self._maybe_cast_indexer(key) │
│ 3804 │ │ try: │
│ ❱ 3805 │ │ │ return self._engine.get_loc(casted_key) │
│ 3806 │ │ except KeyError as err: │
│ 3807 │ │ │ if isinstance(casted_key, slice) or ( │
│ 3808 │ │ │ │ isinstance(casted_key, abc.Iterable) │
│ │
│ ╭───────────────────────────────────── locals ──────────────────────────────────────╮ │
│ │ casted_key = 'X_mm' │ │
│ │ key = 'X_mm' │ │
│ │ self = Index(['Slide', 'Y_mm', 'Z_mm', 'ZOffset_mm', 'ROI', 'FOV', 'Order', │ │
│ │ │ 'Run_Tissue_name'], │ │
│ │ │ dtype='object') │ │
│ ╰───────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ in pandas._libs.index.IndexEngine.get_loc:167 │
│ │
│ in pandas._libs.index.IndexEngine.get_loc:196 │
│ │
│ in pandas._libs.hashtable.PyObjectHashTable.get_item:7081 │
│ │
│ in pandas._libs.hashtable.PyObjectHashTable.get_item:7089 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'X_mm'

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/carlos/projects/sopa/sopa/cli/app.py:98 in read │
│ │
│ 95 │ │ io, technology │
│ 96 │ ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx, │
│ 97 │ │
│ ❱ 98 │ sdata = getattr(io, technology)(data_path, **kwargs) │
│ 99 │ io.write_standardized(sdata, sdata_path, delete_table=True) │
│ 100 │
│ 101 │
│ │
│ ╭──────────────────────────────────────── locals ─────────────────────────────────────────╮ │
│ │ config_path = None │ │
│ │ data_path = '/Users/carlos/projects/sopa_test_1/data_fov99' │ │
│ │ io = <module 'sopa.io' from '/Users/carlos/projects/sopa/sopa/io/__init__.py'> │ │
│ │ kwargs = {} │ │
│ │ Path = <class 'pathlib.Path'> │ │
│ │ sdata_path = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99.zarr') │ │
│ │ technology = 'cosmx' │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/carlos/projects/sopa/sopa/io/reader/cosmx.py:54 in cosmx │
│ │
│ 51 │ image_models_kwargs, imread_kwargs = _default_image_kwargs(image_models_kwargs, imre │
│ 52 │ │
│ 53 │ dataset_id = _infer_dataset_id(path, dataset_id) │
│ ❱ 54 │ fov_locs = _read_cosmx_fov_locs(path, dataset_id) │
│ 55 │ fov_id, fov = _check_fov_id(fov) │
│ 56 │ │
│ 57 │ protein_dir_dict = {} │
│ │
│ ╭───────────────────────────────────── locals ─────────────────────────────────────╮ │
│ │ dataset_id = '/Users/carlos/projects/sopa_test_1/data_fov99/F099' │ │
│ │ fov = None │ │
│ │ image_models_kwargs = {'chunks': (1, 1024, 1024), 'scale_factors': [2, 2, 2, 2]} │ │
│ │ imread_kwargs = {} │ │
│ │ path = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99') │ │
│ │ read_proteins = False │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/carlos/projects/sopa/sopa/io/reader/cosmx.py:163 in _read_cosmx_fov_locs │
│ │
│ 160 │ │
│ 161 │ pixel_size = 0.120280945 # size of a pixel in microns │
│ 162 │ │
│ ❱ 163 │ fov_locs["xmin"] = fov_locs["X_mm"] * 1e3 / pixel_size │
│ 164 │ fov_locs["xmax"] = 0 # will be filled when reading the images │
│ 165 │ │
│ 166 │ fov_locs["ymin"] = 0 # will be filled when reading the images │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ dataset_id = '/Users/carlos/projects/sopa_test_1/data_fov99/F099' │ │
│ │ fov_file = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99/F099_fov_positions_fi… │ │
│ │ fov_locs = │ │ Slide Y_mm Z_mm ZOffset_mm ROI FOV Order │ │
│ │ Run_Tissue_name │ │
│ │ X_mm │ │
│ │ 2.881258 1 2.678565 0.021493 -2 0 99 33 │ │
│ │ mw_mus_p1_04 │ │
│ │ path = PosixPath('/Users/carlos/projects/sopa_test_1/data_fov99') │ │
│ │ pixel_size = 0.120280945 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/carlos/mambaforge/envs/sopa-snake/lib/python3.10/site-packages/pandas/core/frame.py:4102 │
│ in __getitem__ │
│ │
│ 4099 │ │ if is_single_key: │
│ 4100 │ │ │ if self.columns.nlevels > 1: │
│ 4101 │ │ │ │ return self._getitem_multilevel(key) │
│ ❱ 4102 │ │ │ indexer = self.columns.get_loc(key) │
│ 4103 │ │ │ if is_integer(indexer): │
│ 4104 │ │ │ │ indexer = [indexer] │
│ 4105 │ │ else: │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ is_mi = False │ │
│ │ is_single_key = True │ │
│ │ key = 'X_mm' │ │
│ │ self = │ │ Slide Y_mm Z_mm ZOffset_mm ROI FOV Order │ │
│ │ Run_Tissue_name │ │
│ │ X_mm │ │
│ │ 2.881258 1 2.678565 0.021493 -2 0 99 33 │ │
│ │ mw_mus_p1_04 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/carlos/mambaforge/envs/sopa-snake/lib/python3.10/site-packages/pandas/core/indexes/base.p │
│ y:3812 in get_loc │
│ │
│ 3809 │ │ │ │ and any(isinstance(x, slice) for x in casted_key) │
│ 3810 │ │ │ ): │
│ 3811 │ │ │ │ raise InvalidIndexError(key) │
│ ❱ 3812 │ │ │ raise KeyError(key) from err │
│ 3813 │ │ except TypeError: │
│ 3814 │ │ │ # If we have a listlike key, _check_indexing_error will raise │
│ 3815 │ │ │ # InvalidIndexError. Otherwise we fall through and re-raise │
│ │
│ ╭───────────────────────────────────── locals ──────────────────────────────────────╮ │
│ │ casted_key = 'X_mm' │ │
│ │ key = 'X_mm' │ │
│ │ self = Index(['Slide', 'Y_mm', 'Z_mm', 'ZOffset_mm', 'ROI', 'FOV', 'Order', │ │
│ │ │ 'Run_Tissue_name'], │ │
│ │ │ dtype='object') │ │
│ ╰───────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'X_mm'
[Tue May 14 15:34:55 2024]
Error in rule to_spatialdata:
jobid: 4
input: /Users/carlos/projects/sopa_test_1/data_fov99
output: /Users/carlos/projects/sopa_test_1/data_fov99.zarr/.zgroup
conda-env: sopa-snake
shell:

    sopa read /Users/carlos/projects/sopa_test_1/data_fov99 --sdata-path /Users/carlos/projects/sopa_test_1/data_fov99.zarr --technology "cosmx"
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-05-14T153451.958303.snakemake.log


This is the structure of the folder I pass to sopa:

[screenshot of the folder structure]

Brighter images in the explorer

When converting from a dtype larger than uint8, we should scale the image values when the maximum intensity is very low, to make the image brighter. This will improve the user experience in the Xenium Explorer.
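A minimal sketch of the proposed behavior (plain NumPy, not sopa's actual converter):

import numpy as np

def to_uint8_scaled(image: np.ndarray) -> np.ndarray:
    # Rescale by the actual maximum before casting, so that images whose
    # intensities sit far below the dtype maximum (e.g. dim uint16 data)
    # do not render almost black in the Explorer.
    max_intensity = image.max()
    if max_intensity == 0:
        return image.astype(np.uint8)
    scaled = image.astype(np.float32) * (255.0 / max_intensity)
    return scaled.astype(np.uint8)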

run Baysor using cellpose segmentation as prior

Thanks for developing such a useful tool!

I was wondering whether it is possible to run segmentation with Baysor using the output of the cellpose segmentation as a prior (i.e., setting prior_segmentation_confidence to a value higher than 0). Would you recommend this type of analysis?

I was also wondering how sopa handles the information given by the z coordinate (e.g., MERSCOPE produces 7 successive images: z0, z1, ..., z6).

Thank you,
Giulia

Vizgen-merscope

Thank you for your great sopa project.
I hope it will help more people like me, who are biologists rather than bioinformaticians.
I have a problem processing a Vizgen MERSCOPE dataset. My real dataset is 6 TB, but before working with that, I am trying to figure out the workflow on a sample dataset.
I used
https://vzg-web-resources.s3.amazonaws.com/202305010900_U2OS_small_set_VMSC00000.zip
from https://vizgen.com/vpt/

I am not sure whether the files were read properly, or whether I created too many elements:
RuntimeError: Multiple paths found between the two coordinate systems. Please specify an intermediate coordinate system. Available paths are:
.points['202305010900_U2OS_small_set_VMSC00000_region_0_transcripts'] -> 'microns' -> .images['202305010900_U2OS_small_set_VMSC00000_region_0_z3'] -> '_2274858091424_intrinsic'
.points['202305010900_U2OS_small_set_VMSC00000_region_0_transcripts'] -> 'microns' -> .shapes['sopa_patches'] -> '_2274858091424_intrinsic'
.points['202305010900_U2OS_small_set_VMSC00000_region_0_transcripts'] -> 'microns' -> .shapes['cellpose_boundaries'] -> '_2274858091424_intrinsic'

How can I specify the path, or what could I be doing wrong?
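From the message, spatialdata found several transformation paths and needs an intermediate coordinate system to disambiguate. A hedged sketch of how that might be passed (the parameter name and accepted values are assumptions based on my reading of the spatialdata API, untested):

import spatialdata
from spatialdata.transformations import get_transformation_between_coordinate_systems

sdata = spatialdata.read_zarr("202305010900_U2OS_small_set_VMSC00000.zarr")

# Disambiguate by passing the image element as the intermediate step, matching
# the first path listed in the error message (signature assumed):
t = get_transformation_between_coordinate_systems(
    sdata,
    sdata.points["202305010900_U2OS_small_set_VMSC00000_region_0_transcripts"],
    "_2274858091424_intrinsic",
    intermediate_coordinate_systems=[
        sdata.images["202305010900_U2OS_small_set_VMSC00000_region_0_z3"]
    ],
)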

Also, when I tried to generate the report, I got:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 209553-209555: character maps to <undefined>

I used Spyder, and it is set to UTF-8.
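In case the 'charmap' codec points at the Windows default encoding rather than the editor, a possible workaround (an assumption, not a confirmed fix) is forcing Python's UTF-8 mode before launching:

# PEP 540: setting PYTHONUTF8=1 in the environment before Python starts makes
# UTF-8 the default text encoding for open(), regardless of the Windows locale.
#   cmd.exe:     set PYTHONUTF8=1
#   PowerShell:  $env:PYTHONUTF8 = "1"
# Equivalently, code that writes files itself can pass the encoding explicitly
# instead of relying on the platform default:
with open("report.html", "w", encoding="utf-8") as f:
    f.write("<html>…</html>")  # placeholder content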

Thank you
