BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data

For more details, please refer to our paper: https://doi.org/10.1101/2023.06.13.544733

Recent advances in subcellular imaging transcriptomics platforms have enabled spatial mapping of the expression of hundreds of genes at subcellular resolution and provide topographic context to the data. This has created a new data analytics challenge to correctly identify cells and accurately assign transcripts, ensuring that all available data can be utilised. To this end, we introduce BIDCell, a self-supervised deep learning-based framework that incorporates cell type and morphology information via novel biologically-informed loss functions. We also introduce CellSPA, a comprehensive evaluation framework consisting of metrics in five complementary categories for cell segmentation performance. We demonstrate that BIDCell outperforms other state-of-the-art methods according to many CellSPA metrics across a variety of tissue types and technology platforms, including 10x Genomics Xenium. Taken together, we find that BIDCell can facilitate single-cell spatial expression analyses, including cell-cell interactions, enabling great potential in biological discovery.

Installation

Note: A GPU with 12GB VRAM is strongly recommended for the deep learning component, and 32GB RAM for data processing. We ran BIDCell on a Linux system with a 12GB NVIDIA GTX Titan V GPU, Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz with 16 threads, and 64GB RAM.

  1. Clone repository:

     git clone https://github.com/SydneyBioX/BIDCell.git
    
  2. Create virtual environment:

     conda create --name BIDCell python=3.7
    
  3. Activate virtual environment:

     conda activate BIDCell
    
  4. Install dependencies:

     pip install -r requirements.txt
    
     pip install torch==1.5.0 torchvision==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
    

    Installation of dependencies typically requires a few minutes.
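After installation, it can help to confirm that the environment sees the expected Python and PyTorch versions and a GPU. The following is a minimal sketch, not part of the BIDCell repository; the function name check_environment is hypothetical, and it only uses torch.__version__ and torch.cuda.is_available() when PyTorch is importable.

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version and, if PyTorch is installed, its version and GPU visibility."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "cuda_available": False,
    }
    # Import torch only if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the versions listed above, the report would be expected to show Python 3.7 and a torch version starting with 1.5.0; cuda_available should be True on a machine with a working GPU driver.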

Datasets and Preprocessing

Currently, our repository provides the processed single cell reference for breast cancer, with the positive and negative markers. We also provide the nuclei segmentation and nuclei cell-type classifications for a public dataset. We will be including instructions for performing these preprocessing tasks for other datasets shortly.

Unzip the provided nuclei segmentation archive, data/nuclei.zip, so that the image is located at data/nuclei.tif

To train and run BIDCell, download the dataset (Xenium Output Bundle In Situ Replicate 1) from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast

Process the transcript data

  1. Put transcripts.csv.gz from the Xenium Output Bundle into the /preprocess folder, or note its path.

  2. Convert detected transcripts to image maps of gene expressions:

     cd preprocess
    

    then,

     python generate_expr_maps.py
    

    or,

     python generate_expr_maps.py --fp_transcripts /PATH/TO/transcripts.csv.gz --n_processes NUM_CPUS
    

    If you receive the error pickle.UnpicklingError: pickle data was truncated, try reducing the value of NUM_CPUS passed to --n_processes.

    By default, the maps will be stored in /data/expr_maps

  3. Split expression maps to patches for the deep learning model:

     python split_expr_maps_to_patches.py
    

    By default, --patch_size is set to 48.
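The two preprocessing steps above can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in generate_expr_maps.py or split_expr_maps_to_patches.py: the column names feature_name, x_location and y_location are assumed to match the Xenium transcripts.csv.gz header, one pixel is assumed to be one coordinate unit, patches are assumed not to overlap, and all function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def read_transcripts(path):
    """Yield one dict per transcript from a gzipped CSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle)


def rasterise_transcripts(rows, pixel_size=1.0):
    """Count transcripts per gene and pixel.

    rows: iterable of dicts with 'feature_name', 'x_location', 'y_location'
    (assumed column names). Returns {gene: {(row, col): count}}.
    """
    maps = defaultdict(lambda: defaultdict(int))
    for r in rows:
        col = int(float(r["x_location"]) // pixel_size)
        row = int(float(r["y_location"]) // pixel_size)
        maps[r["feature_name"]][(row, col)] += 1
    return maps


def patch_origins(height, width, patch_size=48):
    """Top-left corners of non-overlapping patches tiling a height x width map.

    Patches on the bottom and right edges may extend past the map; how the
    real script handles those edges is not stated in this README.
    """
    return [
        (top, left)
        for top in range(0, height, patch_size)
        for left in range(0, width, patch_size)
    ]


if __name__ == "__main__":
    demo_rows = [
        {"feature_name": "GENE_A", "x_location": "1.2", "y_location": "0.5"},
        {"feature_name": "GENE_A", "x_location": "1.8", "y_location": "0.1"},
    ]
    print(dict(rasterise_transcripts(demo_rows)["GENE_A"]))
    print(patch_origins(100, 50))
```

A sparse dictionary keeps the sketch dependency-free; a real pipeline would more likely store each gene map as a dense array so that patches can be sliced directly.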

Running BIDCell

Make sure the provided nuclei segmentation data/nuclei.zip has been extracted and you have data/nuclei.tif

cd BIDCell_model

Training the model

python train.py

Hyperparameters are defined in /configs/config.json, such as the weight of each type of loss function.

To specify the config file:

python train.py --config_file configs/config.json
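To try different hyperparameters without editing the original file, one option is to write a modified copy of the config and pass it to --config_file. The sketch below assumes the config is a JSON object and only overrides top-level keys that already exist; the function name write_config_variant and the key used in the demo are hypothetical, and the actual layout of configs/config.json may be nested.

```python
import json
import os
import tempfile


def write_config_variant(src_path, dst_path, overrides):
    """Copy a JSON config, replacing only top-level keys that already exist."""
    with open(src_path) as handle:
        config = json.load(handle)
    # Refuse unknown keys so a typo does not silently add an unused setting.
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"keys not present in {src_path}: {unknown}")
    config.update(overrides)
    with open(dst_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


if __name__ == "__main__":
    # Demo on a throwaway file; "example_loss_weight" is a made-up key.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "config.json")
        dst = os.path.join(tmp, "config_variant.json")
        with open(src, "w") as handle:
            json.dump({"example_loss_weight": 1.0}, handle)
        print(write_config_variant(src, dst, {"example_loss_weight": 0.5}))
```

The resulting file can then be used with python train.py --config_file followed by its path.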

Predicting from the trained model

Specify --test_epoch and --test_step of the saved model to generate predictions.

python predict.py --config_file configs/config.json --test_epoch 1 --test_step 4000

Postprocessing segmentation predictions

python postprocess_predictions.py --dir_id last --epoch 1 --step 4000 --nucleus_fp ../data/nuclei.tif

or, specify the name of a directory under BIDCell_model/experiments/, e.g.:

python postprocess_predictions.py --dir_id 2023_April_18_19_31_46 --epoch 1 --step 4000 --nucleus_fp ../data/nuclei.tif
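Experiment directories are named with a timestamp such as 2023_April_18_19_31_46. As a rough illustration of how the most recent one could be picked, the sketch below parses those names with datetime.strptime. It is an assumption that --dir_id last behaves this way; the function name latest_experiment is hypothetical and not taken from the repository.

```python
from datetime import datetime

STAMP_FORMAT = "%Y_%B_%d_%H_%M_%S"  # e.g. 2023_April_18_19_31_46


def latest_experiment(dir_names):
    """Return the name with the most recent timestamp, ignoring other names."""
    stamped = []
    for name in dir_names:
        try:
            stamped.append((datetime.strptime(name, STAMP_FORMAT), name))
        except ValueError:
            continue  # not a timestamped experiment directory
    if not stamped:
        raise FileNotFoundError("no timestamped experiment directories found")
    return max(stamped)[1]


if __name__ == "__main__":
    print(latest_experiment(["2023_April_18_19_31_46", "2023_March_02_08_00_00"]))
```

Parsing the timestamp matters because the month is spelled out: sorting the names alphabetically would place September after October.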

Extracting cell expressions

To extract the gene expressions of segmented cells:

cd analysis

python extract_cell_expressions.py --fp_seg /PATH/TO/SEGMENTATION.tif --fp_transcripts /PATH/TO/transcripts.csv.gz --output_dir /DIR_NAME --n_processes NUM_CPUS

For example,

python extract_cell_expressions.py --fp_seg ../BIDCell_model/experiments/2023_April_18_19_31_46/test_output/epoch_1_step_4000_connected.tif --fp_transcripts ../preprocess/transcripts.csv.gz --output_dir cell_gene_matrices/2023_April_18_19_31_46

If you receive the error pickle.UnpicklingError: pickle data was truncated, try reducing the value of NUM_CPUS passed to --n_processes.
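Conceptually, extracting cell expressions means looking up the cell ID under each transcript in the segmentation image and counting transcripts per cell and gene. The sketch below shows that idea on in-memory data. It is not the implementation in extract_cell_expressions.py: the function names are hypothetical, transcript coordinates are assumed to already be in pixel units, and the value 0 is assumed to mark background.

```python
import math
from collections import Counter, defaultdict


def count_transcripts_per_cell(segmentation, transcripts, background=0):
    """Count transcripts per cell and gene.

    segmentation: 2D sequence of integer cell IDs indexed [row][col].
    transcripts: iterable of (gene, x, y) tuples in pixel units.
    Returns {cell_id: Counter({gene: count})}.
    """
    height = len(segmentation)
    width = len(segmentation[0]) if height else 0
    counts = defaultdict(Counter)
    for gene, x, y in transcripts:
        row, col = math.floor(y), math.floor(x)
        if not (0 <= row < height and 0 <= col < width):
            continue  # transcript lies outside the image
        cell_id = segmentation[row][col]
        if cell_id != background:
            counts[cell_id][gene] += 1
    return counts


def to_matrix(counts):
    """Turn the nested counts into a header row and one row per cell."""
    genes = sorted({gene for per_cell in counts.values() for gene in per_cell})
    header = ["cell_id"] + genes
    rows = [[cell] + [counts[cell][g] for g in genes] for cell in sorted(counts)]
    return header, rows


if __name__ == "__main__":
    seg = [[0, 1, 1], [2, 2, 0]]
    spots = [("GENE_A", 1.2, 0.3), ("GENE_B", 0.1, 1.5)]
    print(to_matrix(count_transcripts_per_cell(seg, spots)))
```

The header and rows returned by to_matrix map directly onto a .csv file with one row per segmented cell and one column per gene.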

Additional information

Expected outputs:

  • .tif file of segmented cells, where each pixel value is the ID of the cell it belongs to
  • .csv file of gene expressions of segmented cells

Expected runtime (based on our system):

  • Training: ~10 mins for 4,000 steps
  • Inference: ~50 mins
  • Postprocessing: ~30 mins

Citation

If BIDCell has assisted you with your work, please kindly cite our paper:

Fu, X., Lin, Y., Lin, D., Mechtersheimer, D., Wang, C., Ameen, F., Ghazanfar, S., Patrick, E., Kim, J., & Yang, J. Y. H. (2023). Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. bioRxiv, 2023.06.13.544733. https://doi.org/10.1101/2023.06.13.544733
