scImpute: accurate and robust imputation of scRNA-seq data

Wei Vivian Li, Jingyi Jessica Li 2019-08-20

Latest News

2019/08/20:

  • Since the development of scImpute, new imputation methods have been proposed for scRNA-seq data. These methods make different model assumptions and perform differently across datasets. Discussing and comparing existing imputation methods benefits both method development and bioinformatic applications. However, we have identified several issues in how imputation methods are currently evaluated and compared, and we discuss these issues in our commentary, which is available on arXiv.

2018/08/15:

  • Version 0.0.9 is released!
  • More robust implementation of dimension reduction.
  • Faster calculation of cell similarity.

Introduction

scImpute is developed to accurately and robustly impute the dropout values in scRNA-seq data. scImpute can be applied to the raw read count matrix before users perform downstream analyses such as

  • dimension reduction of scRNA-seq data
  • normalization of scRNA-seq data
  • clustering of cell populations
  • differential gene expression analysis
  • time-series analysis of gene expression dynamics

For a detailed description of the modeling and applications, please refer to our paper "An accurate and robust imputation method scImpute for single-cell RNA-seq data".

Any suggestions on the package are welcome! For technical problems, please report to Issues. For suggestions and comments on the method, please contact Wei ([email protected]) or Dr. Jessica Li ([email protected]).

Installation

The package is not on CRAN yet. To install it, please run the following code in R:

install.packages("devtools")
library(devtools)

install_github("Vivianstats/scImpute")

Quick start

scImpute can be easily incorporated into an existing scRNA-seq analysis pipeline. Its only input is the raw count matrix, with rows representing genes and columns representing cells. It outputs an imputed count matrix with the same dimensions. In the simplest case, the imputation task can be done with a single function, scimpute:

scimpute(# full path to raw count matrix
         count_path = system.file("extdata", "raw_count.csv", package = "scImpute"), 
         infile = "csv",           # format of input file
         outfile = "csv",          # format of output file
         out_dir = "./",           # full path to output directory
         labeled = FALSE,          # cell type labels not available
         drop_thre = 0.5,          # threshold set on dropout probability
         Kcluster = 2,             # 2 cell subpopulations
         ncores = 10)              # number of cores used in parallel computation

This function returns the column indices of outlier cells, and creates a new file scimpute_count.csv in out_dir to store the imputed count matrix. Please note that we recommend applying scImpute to the whole-genome count matrix. A filtering step on genes is acceptable, but most genes should be retained to ensure robust identification of dropouts.
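The input layout described above can be sketched in base R as follows. This is a minimal illustration, not part of the package: the toy counts, the temporary directory, and the variable names (counts, count_path, reloaded, imputed) are all made up for the example, and it assumes the CSV stores gene names in the first column and cell names in the header row. The call to scimpute itself is left out so that the snippet runs without the package installed.

```r
# Toy raw count matrix: rows are genes, columns are cells (values are made up).
set.seed(1)
counts <- matrix(rpois(5 * 4, lambda = 3), nrow = 5, ncol = 4,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# Optional light filtering: drop only genes with zero counts in every cell,
# so that most genes remain available for dropout identification.
counts <- counts[rowSums(counts) > 0, , drop = FALSE]

# Write the matrix as CSV (assumed layout: gene names in the first column,
# cell names in the header row).
in_dir <- tempfile("scimpute_demo_")
dir.create(in_dir)
count_path <- file.path(in_dir, "raw_count.csv")
write.csv(counts, count_path)

# Read the file back to confirm the genes-by-cells layout survived the round trip.
reloaded <- as.matrix(read.csv(count_path, row.names = 1, check.names = FALSE))

# After running scimpute(count_path = count_path, out_dir = out_dir, ...) as in
# the quick-start call, the imputed matrix could be loaded the same way
# (out_dir here is whatever output directory you passed to scimpute):
# imputed <- as.matrix(read.csv(file.path(out_dir, "scimpute_count.csv"), row.names = 1))
# stopifnot(all(dim(imputed) == dim(reloaded)))
```

Checking that the imputed matrix has the same dimensions as the input is a cheap sanity test before moving on to downstream analyses.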

For detailed usage, please refer to the package manual or vignette.
