Cell Segmentation Competition

Background

Dataset

The Stereo-seq dataset [1] captures a whole adult mouse brain slice. The barcoded spots are arranged in a grid with a distance of 0.5 μm between spots. In total, the dataset profiles 26,177 genes in more than 42,000,000 spots, with an average of 3.3 unique molecular identifier (UMI) counts per spot. The brain slice was also imaged with nucleic acid staining, so nuclei can be segmented with image-based methods.

To obtain the complete raw data for this competition, download the transcriptomics data Mouse_brain_Adult_GEM_bin1.tsv.gz and the stain image Mouse_brain_Adult.tif from MOSTA.

In this competition, we use only ten 1200 × 1200 patches from the original dataset for training and testing. These are the ten patches with the highest gene counts. Each patch is identified as 'start row index/start column index/patchsize/patchsize'. The selected patches are: '6000/9600/1200/1200', '7200/8400/1200/1200', '8400/3600/1200/1200', '4800/1200/1200/1200', '3600/1200/1200/1200', '8400/6000/1200/1200', '8400/4800/1200/1200', '4800/10800/1200/1200', '7200/1200/1200/1200', '6000/1200/1200/1200'.
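The patch identifiers above can be decoded with a few lines of Python. This is a minimal sketch, not code from the competition repository: the names `Patch`, `parse_patch_id`, and `to_patch_coords` are hypothetical, and the sketch assumes that a patch covers the half-open ranges [start, start + patchsize) in both rows and columns.

```python
from typing import NamedTuple, Tuple


class Patch(NamedTuple):
    """A patch described as 'start row/start column/patchsize/patchsize'."""
    row_start: int
    col_start: int
    height: int
    width: int

    @property
    def row_stop(self) -> int:
        # Exclusive end row (assumption: half-open range).
        return self.row_start + self.height

    @property
    def col_stop(self) -> int:
        # Exclusive end column (assumption: half-open range).
        return self.col_start + self.width


def parse_patch_id(patch_id: str, sep: str = "/") -> Patch:
    """Split an identifier such as '6000/9600/1200/1200' into four integers."""
    parts = patch_id.split(sep)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields in {patch_id!r}, got {len(parts)}")
    row_start, col_start, height, width = (int(p) for p in parts)
    return Patch(row_start, col_start, height, width)


def to_patch_coords(patch: Patch, row: int, col: int) -> Tuple[int, int]:
    """Translate full-slice coordinates into patch-local ones (assumed convention)."""
    inside = (patch.row_start <= row < patch.row_stop
              and patch.col_start <= col < patch.col_stop)
    if not inside:
        raise ValueError(f"spot ({row}, {col}) lies outside {patch}")
    return row - patch.row_start, col - patch.col_start


patch = parse_patch_id("6000/9600/1200/1200")
local = to_patch_coords(patch, 6426, 10621)
```

Passing `sep="_"` handles the underscore form used in the seg file names, for example `6000_1200_1200_1200`.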

The dataset for this competition can be downloaded from here.

The stain images are stored in the tiff folder. They are 8-bit DNA fluorescent stains. The corresponding gene expression data are stored in the gene folder as tab-delimited files in the following format:

geneID          row     column      counts
0610009B22Rik   426     1021        3

The data row above means that gene 0610009B22Rik was detected at the spot in row 426, column 1021, with a count of 3.
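As an illustration of this format, the sketch below reads such a file with the Python standard library and sums the counts per spot. It assumes the file starts with the header row `geneID`, `row`, `column`, `counts` shown above; the function names are hypothetical, and the second data row in `sample` is made up for the example.

```python
import csv
import io
from collections import defaultdict


def read_gene_counts(handle):
    """Yield (gene_id, row, column, count) tuples from a tab-delimited file.

    Assumes a header row with the columns geneID, row, column, counts.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for rec in reader:
        yield rec["geneID"], int(rec["row"]), int(rec["column"]), int(rec["counts"])


def total_counts_per_spot(records):
    """Sum the counts of all genes detected at each (row, column) spot."""
    totals = defaultdict(int)
    for _gene, row, col, count in records:
        totals[(row, col)] += count
    return dict(totals)


# Tiny in-memory sample; the second data row is invented for illustration.
sample = (
    "geneID\trow\tcolumn\tcounts\n"
    "0610009B22Rik\t426\t1021\t3\n"
    "ExampleGene\t426\t1021\t2\n"
)
records = list(read_gene_counts(io.StringIO(sample)))
totals = total_counts_per_spot(records)
```

For a real patch, replace the `io.StringIO` object with an open file handle, for example `open(path, newline="")`.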

The SCS segmentation results [2] are stored in the seg folder. For each patch, there are several files:

  • mask_[patch_id].png: the segmentation mask of the patch
  • spot2cell_SCS_[patch_id].txt: the mapping from spot coordinates to cell indexes produced by the SCS segmentation. Each line has the format row:column cell_id. Important: this is also the submission format for your results!
  • (For evaluation) spot2cell_cellpose_[patch_id].txt: the mapping from spot coordinates to cell indexes produced by the Cellpose method [3].
  • (For evaluation) spot2nucl_[patch_id].txt: the mapping from spot coordinates to cell indexes produced by nucleus segmentation [4].
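Because the `row:column cell_id` format is also the submission format, a small reader and writer can be handy. The sketch below is one possible implementation, not the competition's own tooling: the function names are hypothetical, and it assumes that the coordinate and the cell id are separated by whitespace, that cell ids are integers, and that the files have no header line.

```python
import io


def read_spot2cell(handle):
    """Parse lines of the form 'row:column cell_id' into {(row, col): cell_id}."""
    mapping = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        coord, cell_id = line.split()
        row, col = coord.split(":")
        mapping[(int(row), int(col))] = int(cell_id)
    return mapping


def write_spot2cell(mapping, handle):
    """Write {(row, col): cell_id} as one 'row:column cell_id' line per spot."""
    # Sorting is not required by the format; it only makes the output reproducible.
    for (row, col), cell_id in sorted(mapping.items()):
        handle.write(f"{row}:{col} {cell_id}\n")


# Round trip through an in-memory buffer with made-up assignments.
buffer = io.StringIO()
write_spot2cell({(426, 1021): 7, (3, 4): 1}, buffer)
text = buffer.getvalue()
restored = read_spot2cell(io.StringIO(text))
```

The same reader should work for the spot2cell_cellpose and spot2nucl files if they follow the same line format, which the descriptions above suggest but do not state explicitly.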

By simply modifying the path, you can reproduce the segmentation by running the source code.

You can also refer to the source code of the paper for details on how to preprocess the data. A Jupyter version of part of the preprocessing code, with additional notes, is available (it is not exactly the same, since SCS downscales the data and some multiprocessing code has been added).

File structure

.
├── dataset
│   ├── gene                                           # Gene expression data
│   │   ├── patch_tsv_6000/1200/1200/1200.tsv          # Gene expression data of patch 6000/1200/1200/1200
│   │   └── ... (Other 9 patches)
│   ├── seg                                            # Segmentation results
│   │   ├── mask_6000_1200_1200_1200.png               # Segmentation mask fig by SCS of patch 6000/1200/1200/1200
│   │   ├── spot2cell_SCS_6000_1200_1200_1200.txt      # Spot to cell mapping by SCS of patch 6000/1200/1200/1200
│   │   ├── spot2cell_cellpose_6000_1200_1200_1200.txt # Spot to cell mapping by Cellpose of patch 6000/1200/1200/1200
│   │   ├── spot2nucl_6000_1200_1200_1200.txt          # Spot to cell mapping by nucleus segmentation of patch 6000/1200/1200/1200
│   │   └── ... (Other 9 patches)
│   └── tiff                                           # Stain images
│       ├── raw_stain_6000/1200/1200/1200.tif          # 8-bit Stain image of patch 6000/1200/1200/1200
│       └── ... (Other 9 patches)
├── document
│   ├── README.md                                      # Competition description
│   ├── evaluation.ipynb                               # Evaluation demo
│   ├── preprocess_demo.ipynb                          # Preprocess demo
│   └── evaluation.py                                  # Evaluation code from SCS
└── ...

Preprocess demo

See the notebook preprocess_demo.ipynb for details. Since the patch data provided in this competition are already preprocessed, this notebook is for demonstration purposes only: it shows how to process other patches from the full mouse brain dataset, and you can use it to get familiar with the data. The ten pairs of stain images and gene expression data can be used directly for your model.

Evaluation

See the notebook evaluation.ipynb for details.

References

[1] Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).

[2] Chen, H., Li, D. & Bar-Joseph, Z. SCS: cell segmentation for high-resolution spatial transcriptomics. Nat. Methods (2023).

[3] Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).

[4] Beucher, S. Use of watersheds in contour detection. In Proc. International Workshop on Image Processing 17–21 (CCETT, 1979).
