scDHMap

Understanding developmental processes is a critical step in single-cell analysis. This repo presents scDHMap, a model-based deep learning approach for visualizing the complex hierarchical structures of single-cell sequencing data in a low-dimensional hyperbolic space. scDHMap can be used for various dimensionality reduction tasks, including revealing trajectory branches, correcting batch effects, and denoising counts with high dropout rates.

Network diagram
Requirements
Usage
Parameters
Outputs
Folders
Reference
Visualization demo
Contact

Network diagram

Requirements

Python: 3.9.5
PyTorch: 1.9.1 (https://pytorch.org)
Scanpy: 1.7.2 (https://scanpy.readthedocs.io/en/stable)
Numpy: 1.21.2 (https://numpy.org)
sklearn: 0.24.2 (https://scikit-learn.org/stable)
Scipy: 1.6.3 (https://scipy.org)
Pandas: 1.2.5 (https://pandas.pydata.org)
h5py: 3.2.1 (https://pypi.org/project/h5py)
Optional: harmonypy (https://github.com/slowkow/harmonypy)

Usage

For single-cell count data:

python run_scDHMap.py --data_file data.h5

For single-cell count data from multiple batches (requires harmonypy package):

python run_scDHMap_batch.py --data_file data.h5

The real single-cell datasets used in this study are available at: https://figshare.com/s/64694120e3d2b87e21c3

In the data.h5 file, the cell-by-gene count matrix is stored in "X". For datasets with multiple batches, the batch IDs are stored in "Y" as a one-hot encoded matrix.
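The layout described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the helper names one_hot and write_data_h5, the toy values, and the float32 dtype for "Y" are assumptions; only the dataset names "X" and "Y" come from this README.

```python
import numpy as np


def one_hot(batch_labels):
    """Return an (n_cells, n_batches) 0/1 matrix, one column per distinct batch."""
    labels = np.asarray(batch_labels)
    batches, idx = np.unique(labels, return_inverse=True)
    encoded = np.zeros((labels.shape[0], batches.shape[0]), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), idx] = 1.0
    return encoded


def write_data_h5(path, counts, batch_labels=None):
    """Write counts to "X" and, if given, one-hot batch IDs to "Y" (hypothetical helper)."""
    import h5py  # imported lazily so one_hot() can be used without h5py

    with h5py.File(path, "w") as f:
        f.create_dataset("X", data=np.asarray(counts))
        if batch_labels is not None:
            f.create_dataset("Y", data=one_hot(batch_labels))


# Toy example: 6 cells x 4 genes from two batches (made-up values).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(1.0, size=(6, 4))
toy_batches = ["b1", "b1", "b2", "b2", "b1", "b2"]
toy_Y = one_hot(toy_batches)
```

Calling write_data_h5("data.h5", toy_counts, toy_batches) would then produce a file with both datasets; omit the batch labels for single-batch data.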

Parameters

--batch_size: batch size, default = 512.
--data_file: data file name.
--select_genes: number of genes selected for embedding analysis, default = 1000. Informative genes are selected using the mean-variance relationship.
--n_PCA: number of principal components for the t-SNE part, default = 50.
--pretrain_iter: number of pretraining iterations, default = 400.
--maxiter: number of max iterations during training stage, default = 5000.
--patience: patience in training stage, default = 150.
--lr: learning rate in the Adam optimizer, default = 0.001.
--alpha: coefficient of the t-SNE regularization, default = 1000. Alpha is chosen to balance the t-SNE term against the ZINB reconstruction loss, which depends on the number of genes.
--beta: coefficient of the wrapped normal KLD loss, default = 10. If points in the embedding are all stacked near the boundary of the Poincare disk, you may choose a larger beta value.
--gamma: coefficient of the Cauchy kernel, default = 1. A larger gamma means a greater repulsive force between non-neighboring points. Note that larger gamma values push points toward the boundary of the Poincare ball. For better visualization, we recommend choosing larger beta values when using larger gamma values. In our experience, a KLD loss value < 10 during the training stage results in a good visualization. See the effect of different gamma values in Supplementary Figure S23 of our manuscript.
--prob: dropout probability in encoder and decoder layers, default = 0.
--perplexity: perplexity of the t-SNE regularization, default = 30.
--final_latent_file: file name to output final latent Poincare representations, default = final_latent.txt.
--final_mean_file: file name to output denoised counts, default = denoised_mean.txt.
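As a rough sketch of how these flags combine, the snippet below assembles a command line for a run with a larger gamma and a correspondingly larger beta. The helper build_command and the values beta=20, gamma=2 and latent_gamma2.txt are illustrative assumptions, not recommended settings; only the flag names and script name come from this README.

```python
import shlex
import sys


def build_command(data_file, script="run_scDHMap.py", **params):
    """Assemble an argument list: [python, script, --data_file, file, --flag, value, ...]."""
    cmd = [sys.executable, script, "--data_file", data_file]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Illustrative values only: raise beta together with gamma, as advised above.
cmd = build_command("data.h5", beta=20, gamma=2, final_latent_file="latent_gamma2.txt")
print(shlex.join(cmd))
# To launch the run from the repository root: subprocess.run(cmd, check=True)
```

Building the list in Python makes it easy to loop over several gamma values and keep one output file per setting.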

Outputs

final_latent: 2-dimensional embedding in Poincare space of single-cell data, shape (n_cells, 2).
final_mean: denoised (decoded) gene counts, shape (n_cells, n_genes).
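Because the embedding lives in Poincare space, distances between cells are hyperbolic rather than Euclidean. The sketch below computes the standard Poincare disk distance, assuming a unit disk with curvature -1. The function name poincare_distance and the stand-in coordinates are assumptions, and the comma delimiter in the commented loadtxt call is a guess about the output file format.

```python
import numpy as np


def poincare_distance(u, v):
    """Hyperbolic distance between points u and v inside the unit disk."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)


# Stand-in for a loaded embedding; with real output one might use
# latent = np.loadtxt("final_latent.txt", delimiter=",")  # delimiter is an assumption
latent = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, -0.9]])

# Every embedded point should lie strictly inside the unit disk.
assert np.all(np.linalg.norm(latent, axis=1) < 1.0)

# Distances from the first cell to all cells (including itself).
d_from_first = poincare_distance(latent[0], latent)
```

Distances grow rapidly near the boundary: the point at Euclidean radius 0.9 is far more than twice as distant from the origin as the point at radius 0.5.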

Folders

Paul_Analysis: code for the analysis of Paul data, including isometric transformation and branch assignment.
competing_methods: code for running competing methods.
scATAC_seq_analysis: code for gene activity score in scATAC-seq data.
src: source code of scDHMap model.

Reference

Tian T., Cheng Z., Xiang L., Zhi W., & Hakon H. (2023). Complex hierarchical structures in single-cell genomics data unveiled by deep hyperbolic manifold learning. Genome Research 33 (2), 232-246. https://doi.org/10.1101/gr.277068.122

Visualization demo

Visualization demo of the Paul data (Credit: Joshua Ortiga)

https://hosua.github.io/scDHMap-visual/article/2022/11/09/paul-data-visualization.html

Contact

Tian Tian [email protected]
