

scPipe


scPipe is an R package for barcode demultiplexing, transcript mapping and quality control of raw sequencing data generated by multiple 3' end sequencing protocols, including CEL-seq, MARS-seq, Chromium 10x and Drop-seq. scPipe produces a count matrix, which is essential for downstream analysis, along with a user-friendly HTML report that summarises data quality. These results can be used as input for downstream analyses such as normalization, visualization and statistical testing.

The package is under active development. Feel free to ask any questions or submit a pull request.

  • [21/09/2017] scPipe now uses the SingleCellExperiment class.

Installation

From Bioconductor

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("scPipe")

From GitHub (development version)

install.packages("devtools")
devtools::install_github("LuyiTian/scPipe")

Getting started

The general workflow of scPipe is illustrated in the following figure:

Data Preprocessing

  • The sc_trim_barcode function reformats each read, moving the cell barcode and UMI sequence into the FASTQ read name. Here the cell barcode is ACGATCGA and the UMI is TAGAGC:

      @ACGATCGA_TAGAGC#SIMULATE_SEQ::002::000::0000::0
      AAGACGTCTAAGGGCGGTGTACACCCTTTTGAGCAATGATTGCACAACCTGCGATCACCTTATACAGAATTAT
      +
      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

  • After alignment, the sc_exon_mapping function adds the cell barcode, the UMI and gene information to the BAM file as separate tags, for example:

      AAAGTCAA_AACTCA#SIMULATE_SEQ::007::000::0013::10 0 ERCC-00171 142 40 73M * 0 0 GCCTCGGGAATAAGCTGACGGTGACAAGGTTTCCCCCTAATCGAGACGCTGCAATAACACAGGGGCATACAGT AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA HI:i:1 NH:i:1 NM:i:0 GE:Z:ERCC-00171 YC:Z:AAAGTCAA YM:Z:AACTCA YE:i:-364

    In this example the cell barcode AAAGTCAA is stored in the YC tag, the UMI AACTCA in the YM tag, and the gene the read maps to, ERCC-00171, in the GE tag. The YE tag records the read's position relative to the transcription end site (TES): this read lies 364 bp upstream of the TES.

  • The sc_demultiplex function looks up the cell barcode of each read in the BAM file (by default in the YC tag) and compares it against a known cell barcode annotation file. This is a csv file with two columns: the first is the cell name and the second is the cell barcode. For Chromium 10x and Drop-seq data, run sc_detect_bc first to find the barcodes and generate the annotation file, then run sc_demultiplex. An example barcode annotation file is available in the package at system.file("extdata", "barcode_anno.csv", package = "scPipe"). The output of sc_demultiplex is one csv file per cell, each with three columns: the gene ID, the UMI sequence, and the position of the read relative to the TES. These files are the input to sc_gene_counting.
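To make the read-name layout written by sc_trim_barcode concrete, here is a small base R sketch that splits such a name back into its parts. It assumes the "@<cell barcode>_<UMI>#<original read name>" layout seen in the example record; the helper parse_trimmed_read_name is hypothetical and is not part of scPipe.

```r
# Hypothetical helper (not part of scPipe): split a read name written in the
# assumed "@<cell barcode>_<UMI>#<original read name>" layout.
parse_trimmed_read_name <- function(read_name) {
  # Drop the leading "@" of the FASTQ header line, if present
  name <- sub("^@", "", read_name)
  # Everything before the first "#" holds "<barcode>_<UMI>"
  tag_part <- sub("#.*$", "", name)
  # Everything after the first "#" is the original read name
  original <- sub("^[^#]*#", "", name)
  # Split only the tag part, so underscores in the original name are kept
  parts <- strsplit(tag_part, "_", fixed = TRUE)[[1]]
  list(barcode = parts[1], umi = parts[2], original_name = original)
}

x <- parse_trimmed_read_name("@ACGATCGA_TAGAGC#SIMULATE_SEQ::002::000::0000::0")
x$barcode        # "ACGATCGA"
x$umi            # "TAGAGC"
x$original_name  # "SIMULATE_SEQ::002::000::0000::0"
```

Splitting only the part before "#" matters here because the original read name (SIMULATE_SEQ...) itself contains an underscore.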
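The BAM tags added by sc_exon_mapping are ordinary SAM optional fields of the form TAG:TYPE:VALUE. As a rough illustration, the base R sketch below pulls those fields out of one tab-separated SAM record, using the example read from sc_exon_mapping. The helper parse_sam_tags is hypothetical (not an scPipe function), only the integer type "i" is converted, and the quality string is rebuilt with strrep() for brevity.

```r
# Hypothetical helper (not part of scPipe): return the optional TAG:TYPE:VALUE
# fields of one tab-separated SAM record as a named list.
parse_sam_tags <- function(sam_line) {
  fields <- strsplit(sam_line, "\t", fixed = TRUE)[[1]]
  # The first 11 SAM columns are mandatory; optional tags follow them
  opt <- fields[-(1:11)]
  keys <- substr(opt, 1, 2)     # e.g. "YC"
  types <- substr(opt, 4, 4)    # e.g. "Z" (string) or "i" (integer)
  values <- substring(opt, 6)   # everything after "TAG:TYPE:"
  out <- lapply(seq_along(opt), function(i) {
    if (types[i] == "i") as.integer(values[i]) else values[i]
  })
  names(out) <- keys
  out
}

rec <- paste(
  "AAAGTCAA_AACTCA#SIMULATE_SEQ::007::000::0013::10", "0", "ERCC-00171",
  "142", "40", "73M", "*", "0", "0",
  "GCCTCGGGAATAAGCTGACGGTGACAAGGTTTCCCCCTAATCGAGACGCTGCAATAACACAGGGGCATACAGT",
  strrep("A", 73),
  "HI:i:1", "NH:i:1", "NM:i:0", "GE:Z:ERCC-00171",
  "YC:Z:AAAGTCAA", "YM:Z:AACTCA", "YE:i:-364",
  sep = "\t"
)
tags <- parse_sam_tags(rec)
tags$YC  # "AAAGTCAA"   (cell barcode)
tags$YM  # "AACTCA"     (UMI)
tags$GE  # "ERCC-00171" (gene)
tags$YE  # -364         (position relative to the TES)
```

In practice a BAM file would be read with a dedicated library rather than by splitting text; this sketch only shows where each piece of information sits in the record.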
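The demultiplexing and counting steps can be pictured with the simplified base R sketch below. It is not scPipe's implementation: it assumes exact barcode matching only and does no UMI error correction, whereas the package may handle mismatches and UMI errors differently. The annotation follows the two-column layout described above (cell name, barcode), but the column names, cell names and read data are made up for illustration.

```r
# Simplified sketch (not scPipe's implementation): exact barcode matching,
# per-cell tables, then a gene x cell matrix of distinct UMIs.
anno <- data.frame(
  cell_name = c("CELL_001", "CELL_002"),
  barcode   = c("AAAGTCAA", "ACGATCGA"),
  stringsAsFactors = FALSE
)

# Made-up reads with barcode, gene, UMI and position relative to the TES
reads <- data.frame(
  barcode = c("AAAGTCAA", "AAAGTCAA", "AAAGTCAA", "ACGATCGA", "TTTTTTTT"),
  gene_id = c("ERCC-00171", "ERCC-00171", "ERCC-00130", "ERCC-00171", "ERCC-00171"),
  umi     = c("AACTCA", "AACTCA", "GGATCC", "TAGAGC", "CCCCCC"),
  pos     = c(-364L, -360L, -120L, -50L, -10L),
  stringsAsFactors = FALSE
)

# Assign each read to a cell; barcodes missing from the annotation give NA
reads$cell <- anno$cell_name[match(reads$barcode, anno$barcode)]
matched <- reads[!is.na(reads$cell), ]

# One three-column table per cell (gene ID, UMI, position), mirroring
# the per-cell csv files written by sc_demultiplex
per_cell <- split(matched[, c("gene_id", "umi", "pos")], matched$cell)

# Reads sharing cell, gene and UMI count once; tabulate genes x cells
umi_pairs <- unique(matched[, c("cell", "gene_id", "umi")])
count_matrix <- table(umi_pairs$gene_id, umi_pairs$cell)
```

With these inputs the TTTTTTTT read is dropped, the two ERCC-00171 reads with UMI AACTCA in CELL_001 collapse into one count, and count_matrix holds 3 UMIs in total across the two cells.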

For further examples see the vignette.

Acknowledgements

This package is inspired by the scater and scran packages. The idea to put cell barcode and UMI sequences into the BAM file is from Drop-seq tools. We thank Dr Aaron Lun for suggestions on package development.
