JOINT for scRNA-seq

JOINT performs probability-based cell-type identification and differentially expressed gene (DEG) analysis for single-cell RNA sequencing (scRNA-seq) data simultaneously, without the need for imputation. It applies an EM algorithm to a generalized zero-inflated negative binomial mixture model and supports an arbitrary number of negative binomial components, with or without zero inflation. JOINT performs soft clustering: it computes, for each cell, the probability of belonging to each cell type, so a cell can belong to multiple cell types with different probabilities. The EM algorithm is implemented in the TensorFlow deep learning framework and can run on both CPU and GPU.
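
To make the soft-clustering idea concrete, here is a minimal sketch in plain Python of a zero-inflated negative binomial likelihood and the resulting posterior cell-type probabilities for a single cell. It is not JOINT's TensorFlow implementation: it assumes one negative binomial component per gene plus a zero-inflation mass, and the function names (`nb_logpmf`, `zinb_logpmf`, `soft_assign`) and all parameter values are hypothetical.

```python
import math

def nb_logpmf(x, mu, r):
    """Log-probability of count x under a negative binomial with mean mu and size r."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu))
            + x * math.log(mu / (r + mu)))

def zinb_logpmf(x, mu, r, pi0):
    """Log-probability under a zero-inflated NB with extra point mass pi0 at zero."""
    if x == 0:
        return math.log(pi0 + (1.0 - pi0) * math.exp(nb_logpmf(0, mu, r)))
    return math.log(1.0 - pi0) + nb_logpmf(x, mu, r)

def soft_assign(cell_counts, type_params, type_weights):
    """Posterior probability of each cell type for one cell (one E-step)."""
    log_post = []
    for params, w in zip(type_params, type_weights):
        ll = sum(zinb_logpmf(x, *p) for x, p in zip(cell_counts, params))
        log_post.append(math.log(w) + ll)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypothetical cell types over three genes; each gene has (mu, r, pi0).
type_params = [
    [(10.0, 2.0, 0.1), (1.0, 2.0, 0.3), (5.0, 2.0, 0.1)],
    [(1.0, 2.0, 0.3), (10.0, 2.0, 0.1), (5.0, 2.0, 0.1)],
]
probs = soft_assign([12, 0, 4], type_params, [0.5, 0.5])
```

Here `probs` holds one probability per cell type and sums to 1, so the cell is not forced into a single cluster. In a full EM procedure these probabilities would then be used to re-estimate the model parameters.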

There are several editorial errors in the online version of the JOINT paper. The corrected version can be found in JOINT_2021.pdf in this repo.

Installation

You can install JOINT after downloading this repo by running:

python setup.py install

Quick start

joint fits the generalized zero-inflated negative binomial mixture model to a matrix of counts.

joint(
    data,                  # input array with genes as rows and cells as columns
    K,                     # number of cell types
    L=2,                   # number of negative binomial components + 1
    filter_genes=False,    # whether to run EM only on highly variable genes
    n_top_genes=2000,      # number of highly variable genes to keep if filter_genes=True
    impute=True,           # whether to return the imputation
    n_inits=5,             # number of initializations for the EM algorithm
    n_init_iter=10,        # number of runs of KMeans and spectral clustering to initialize the EM algorithm
    n_em_iter=100,         # number of iterations to run the EM algorithm
    n_inner_iter=50,       # number of iterations to run the negative binomial inner loop inside the EM
    tol=1e-5,              # stopping tolerance for the EM algorithm
    zero_inflated=True,    # whether to use zero inflation
    b_overwrites=[],       # given beta values for the negative binomial components
    normalize_data=True,   # whether to normalize data by library size
    skip_spectral=True,    # whether to skip spectral clustering for initialization
)
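
As an illustration of the expected input layout (genes as rows, cells as columns) and of what normalizing by library size means, here is a small plain-Python sketch. The exact scaling JOINT applies is not stated here; this version assumes each cell is rescaled to the median library size, and `normalize_by_library_size` is a hypothetical helper, not part of the package.

```python
import statistics

def normalize_by_library_size(data):
    """Scale each cell (column) so its total count equals the median library size.

    data: list of rows, one row per gene and one column per cell.
    """
    n_cells = len(data[0])
    # Library size of a cell = total count over all genes in that column.
    lib_sizes = [sum(row[j] for row in data) for j in range(n_cells)]
    target = statistics.median(lib_sizes)
    return [
        [row[j] * target / lib_sizes[j] if lib_sizes[j] > 0 else 0.0
         for j in range(n_cells)]
        for row in data
    ]

# Toy matrix: 3 genes (rows) x 3 cells (columns).
counts = [
    [4, 8, 0],
    [1, 2, 3],
    [5, 10, 3],
]
normalized = normalize_by_library_size(counts)
```

After this step every column has the same total, so differences between cells reflect relative expression rather than sequencing depth. With normalize_data=True, joint presumably performs its own normalization internally, in which case raw counts would be passed directly.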

deg_unknown_labels performs DEG analysis between two cell types without assuming that the true cell types are known.

deg_unknown_labels(
    data,                  # input array with genes as rows and cells as columns
    K,                     # number of cell types
    k1,                    # cell type 1 for DEG
    k2,                    # cell type 2 for DEG
    em_res=None,           # EM algorithm result. If None, the EM algorithm is run first.
    filter_genes=False,    # whether to run EM only on highly variable genes
    n_top_genes=2000,      # number of highly variable genes to keep if filter_genes=True
    impute=True,           # whether to return the imputation
    n_inits=5,             # number of initializations for the EM algorithm
    n_init_iter=10,        # number of runs of KMeans and spectral clustering to initialize the EM algorithm
    n_em_iter=100,         # number of iterations to run the EM algorithm
    n_inner_iter=50,       # number of iterations to run the negative binomial inner loop inside the EM
    tol=1e-5,              # stopping tolerance for the EM algorithm
    zero_inflated=True,    # whether to use zero inflation
    b_overwrites=[],       # given beta values for the negative binomial components
    normalize_data=True,   # whether to normalize data by library size
    skip_spectral=True,    # whether to skip spectral clustering for initialization
)

deg_known_labels performs DEG analysis between two cell types assuming that the true cell types are known.

deg_known_labels(
    data,                  # input array with genes as rows and cells as columns
    K,                     # number of cell types
    k1,                    # cell type 1 for DEG
    k2,                    # cell type 2 for DEG
    labels=None,           # known cell type of each cell
    em_res=None,           # EM algorithm result. If None, the EM algorithm is run first.
    filter_genes=False,    # whether to run EM only on highly variable genes
    n_top_genes=2000,      # number of highly variable genes to keep if filter_genes=True
    impute=True,           # whether to return the imputation
    n_inits=5,             # number of initializations for the EM algorithm
    n_init_iter=10,        # number of runs of KMeans and spectral clustering to initialize the EM algorithm
    n_em_iter=100,         # number of iterations to run the EM algorithm
    n_inner_iter=50,       # number of iterations to run the negative binomial inner loop inside the EM
    tol=1e-5,              # stopping tolerance for the EM algorithm
    zero_inflated=True,    # whether to use zero inflation
    b_overwrites=[],       # given beta values for the negative binomial components
    normalize_data=True,   # whether to normalize data by library size
    skip_spectral=True,    # whether to skip spectral clustering for initialization
)
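
The difference between known and unknown labels can be illustrated with a small plain-Python sketch: known labels act like hard 0/1 memberships, while soft clustering gives each cell a fractional weight in every cell type. The per-gene log2 fold change below is only an illustrative summary, not the test statistic JOINT computes. All helper names, the 0-based label convention, and the toy values are assumptions.

```python
import math

def one_hot_memberships(labels, K):
    """Turn known labels (assumed 0..K-1) into hard 0/1 memberships per cell."""
    return [[1.0 if lab == k else 0.0 for k in range(K)] for lab in labels]

def weighted_gene_means(data, memberships, k):
    """Membership-weighted mean of each gene (row) for cell type k."""
    weights = [m[k] for m in memberships]
    total = sum(weights)
    return [sum(x * w for x, w in zip(row, weights)) / total for row in data]

def log2_fold_change(data, memberships, k1, k2, pseudo=1.0):
    """Per-gene log2 ratio of weighted means between cell types k1 and k2."""
    m1 = weighted_gene_means(data, memberships, k1)
    m2 = weighted_gene_means(data, memberships, k2)
    return [math.log2((a + pseudo) / (b + pseudo)) for a, b in zip(m1, m2)]

# Toy matrix: 2 genes (rows) x 4 cells (columns).
data = [
    [7, 7, 1, 1],   # gene 0: higher in cell type 0
    [3, 3, 3, 3],   # gene 1: same in both cell types
]
hard = one_hot_memberships([0, 0, 1, 1], K=2)                  # known labels
soft = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]        # soft memberships
lfc_hard = log2_fold_change(data, hard, 0, 1)
lfc_soft = log2_fold_change(data, soft, 0, 1)
```

With hard labels the fold change for gene 0 reflects the two groups exactly. With soft memberships each cell contributes to both cell types, so the estimated difference shrinks toward zero in proportion to the uncertainty in the assignments.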

EM-based clustering

A notebook showing how to do EM-based clustering can be found at examples/2celltype_clustering.ipynb

JOINT for clustering and DEG

Example code showing how to use JOINT for soft clustering and DEG analysis can be found at examples/2celltype_sim.py

Reference

Cui, T., Wang, T. JOINT for large-scale single-cell RNA-sequencing analysis via soft-clustering and parallel computing. BMC Genomics 22, 47 (2021). https://doi.org/10.1186/s12864-020-07302-6

Support and Contribution

For technical issues specific to this repo, please open an issue on this GitHub repository.

New features and bug fixes are always welcome; please submit them as pull requests.
