scDHMap

Understanding the developmental process is a critical step in single-cell analysis. This repo proposes scDHMap, a model-based deep learning approach to visualize the complex hierarchical structures of single-cell sequencing data in a low-dimensional hyperbolic space. scDHMap can be used for various dimensionality reduction tasks, including revealing trajectory branches, correcting batch effects, and denoising counts with high dropout rates.

Network diagram

[Figure: scDHMap network diagram]

Requirements

Python: 3.9.5
PyTorch: 1.9.1 (https://pytorch.org)
Scanpy: 1.7.2 (https://scanpy.readthedocs.io/en/stable)
Numpy: 1.21.2 (https://numpy.org)
sklearn: 0.24.2 (https://scikit-learn.org/stable)
Scipy: 1.6.3 (https://scipy.org)
Pandas: 1.2.5 (https://pandas.pydata.org)
h5py: 3.2.1 (https://pypi.org/project/h5py)
Optional: harmonypy (https://github.com/slowkow/harmonypy)

Usage

For single-cell count data:

python run_scDHMap.py --data_file data.h5

For single-cell count data from multiple batches (requires harmonypy package):

python run_scDHMap_batch.py --data_file data.h5

The real single-cell datasets used in this study are available at: https://figshare.com/s/64694120e3d2b87e21c3

In the data.h5 file, the cell-by-gene count matrix is stored in "X". For datasets with multiple batches, batch IDs are stored in "Y" as a one-hot encoded matrix.
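
As a minimal sketch of this layout, the Python below builds a one-hot batch matrix and writes a toy file with h5py. Only the dataset names "X" and "Y" come from the description above; the helper names, the toy values, and the file name used in the note are illustrative assumptions, and the dtypes expected by the run scripts are not stated here.

```python
def one_hot_batches(batch_labels):
    """Return (one_hot_rows, batch_names) for a list of per-cell batch labels."""
    batch_names = sorted(set(batch_labels))
    column = {name: j for j, name in enumerate(batch_names)}
    rows = []
    for label in batch_labels:
        row = [0] * len(batch_names)
        row[column[label]] = 1  # mark the column of this cell's batch
        rows.append(row)
    return rows, batch_names


def write_data_h5(path, counts, batch_labels=None):
    """Write counts to "X" and, if labels are given, one-hot batch IDs to "Y"."""
    import h5py  # imported lazily so one_hot_batches works without h5py

    with h5py.File(path, "w") as f:
        f.create_dataset("X", data=counts)  # cells in rows, genes in columns
        if batch_labels is not None:
            one_hot, _ = one_hot_batches(batch_labels)
            f.create_dataset("Y", data=one_hot)


# Toy example (hypothetical data): 3 cells x 4 genes from two batches.
toy_counts = [[0, 3, 1, 0], [2, 0, 0, 5], [0, 0, 7, 1]]
toy_batches = ["batch_a", "batch_b", "batch_a"]
toy_one_hot, toy_names = one_hot_batches(toy_batches)
```

Calling write_data_h5("toy_data.h5", toy_counts, toy_batches) would then produce a file that can be passed with --data_file.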

Parameters

--batch_size: batch size, default = 512.
--data_file: data file name.
--select_genes: number of selected genes for embedding analysis, default = 1000. It will use the mean-variance relationship to select informative genes.
--n_PCA: number of principal components for the t-SNE part, default = 50.
--pretrain_iter: number of pretraining iterations, default = 400.
--maxiter: number of max iterations during training stage, default = 5000.
--patience: patience in training stage, default = 150.
--lr: learning rate in the Adam optimizer, default = 0.001.
--alpha: coefficient of the t-SNE regularization, default = 1000. Alpha is chosen to balance the t-SNE term against the ZINB reconstruction loss, which grows with the number of genes.
--beta: coefficient of the wrapped normal KLD loss, default = 10. If points in the embedding are all stacked near the boundary of the Poincare disk, you may choose a larger beta value.
--gamma: coefficient of the Cauchy kernel, default = 1. A larger gamma means a greater repulsive force between non-neighboring points. Please note that larger gamma values push points toward the boundary of the Poincare ball. For better visualization, we recommend choosing a larger beta value when using a larger gamma value. In our experience, a KLD loss value below 10 during the training stage results in a good visualization. See the effect of different gamma values in Supplementary Figure S23 of our manuscript.
--prob: dropout probability in encoder and decoder layers, default = 0.
--perplexity: perplexity of the t-SNE regularization, default = 30.
--final_latent_file: file name to output final latent Poincare representations, default = final_latent.txt.
--final_mean_file: file name to output denoised counts, default = denoised_mean.txt.
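
The flags above can be combined on a single command line. The sketch below assembles such a command in Python; the helper build_scdhmap_command is hypothetical, and the gamma and beta values are illustrative only, chosen to show a larger gamma paired with a larger beta as recommended above.

```python
import sys


def build_scdhmap_command(script, data_file, **options):
    """Return an argv list for one of the scDHMap run scripts."""
    command = [sys.executable, script, "--data_file", data_file]
    for name, value in options.items():
        command += [f"--{name}", str(value)]  # e.g. gamma=2 -> "--gamma", "2"
    return command


cmd = build_scdhmap_command(
    "run_scDHMap.py",
    "data.h5",
    gamma=2,   # hypothetical value, larger than the default of 1
    beta=20,   # hypothetical value, raised together with gamma
    final_latent_file="final_latent.txt",
)
print(" ".join(cmd[1:]))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```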

Outputs

  • final_latent: 2-dimensional embedding in Poincare space of single-cell data, shape (n_cells, 2).
  • final_mean: denoised (decoded) gene counts, shape (n_cells, n_genes).
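
Because final_latent lives in the Poincare disk, distances between cells should be measured with the hyperbolic metric rather than the Euclidean one. The following is a minimal sketch, assuming each row holds (x, y) coordinates strictly inside the unit disk and the standard curvature of -1; the function names, the toy points, and the 0.95 radius threshold are illustrative assumptions. Loading the output file is left out because its delimiter is not stated here.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit Poincare disk."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def fraction_near_boundary(points, radius=0.95):
    """Share of points whose Euclidean norm exceeds `radius`."""
    far = sum(1 for p in points if math.hypot(*p) > radius)
    return far / len(points)


# Toy embedding (hypothetical values): origin, mid-disk, near the boundary.
toy_latent = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.97)]
d = poincare_distance(toy_latent[0], toy_latent[1])
share = fraction_near_boundary(toy_latent)
```

A large share of points near the boundary is the situation in which the parameter notes suggest trying a larger beta value.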

Folders

Paul_Analysis: code for the analysis of Paul data, including isometric transformation and branch assignment.
competing_methods: code for running competing methods.
scATAC_seq_analysis: code for gene activity score in scATAC-seq data.
src: source code of scDHMap model.

Reference

Tian T., Zhong C., Lin X., Wei Z., & Hakonarson H. (2023). Complex hierarchical structures in single-cell genomics data unveiled by deep hyperbolic manifold learning. Genome Research 33(2), 232-246. https://doi.org/10.1101/gr.277068.122

Visualization demo

Visualization demo of the Paul data (Credit: Joshua Ortiga)

https://hosua.github.io/scDHMap-visual/article/2022/11/09/paul-data-visualization.html

Contact

Tian Tian [email protected]
