
rrnaseq's Introduction

INTRODUCTION

rrnaseq provides a suite of programs for generating basic plots and for QC-filtering of RNA-seq data. The programs are written in R and can be executed from the command line. The suite also includes a script, rqc, that can run all of the programs. All programs can be found in the 'bin' sub-directory.

Currently the starting input is a tab-separated file with RPKM values and raw read counts, as output by rpkmforgenes.py. Most programs also require a file with meta-information about the samples, which can be generated by running 'make_summary_starlog.sh'; see the "HOW TO RUN" section below.

INSTALLATION

  1. The latest stable release can be found here.

  2. Install R dependencies with install.packages or via biocLite. In R:

     pkgs = c('DESeq2', 'genefilter', 'statmod', 'gplots',
     'RColorBrewer', 'impute', 'moduleColor', 'graphics', 'getopt')
     source('http://www.bioconductor.org/biocLite.R')
     biocLite(pkgs)
    
  3. Add the directory with the programs to your shell path, for example in .profile on OS X or .bashrc on Linux:

export PATH="/home/user/prg/rrnaseq/bin:$PATH"
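
After updating the path, it can be worth confirming that the shell actually finds the programs. The sketch below uses only the standard 'command -v' builtin. The program names are taken from this README, while the helper name check_on_path is made up for illustration and is not part of rrnaseq.

```shell
# Hypothetical helper (not part of rrnaseq): report which of the given
# programs are found on PATH. Returns non-zero if any are missing.
check_on_path() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found: $prog"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return $missing
}

# Program names mentioned in this README; adjust to the programs you need.
check_on_path rqc get_expr sample_filter || echo "Check the PATH line in your shell profile"
```

If a program is reported as missing, open a new shell (or source the profile file) so that the updated PATH takes effect, then run the check again.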

HOW TO RUN

Below is an example of how to generate a script of rrnaseq commands. If the directory names under "#IN" are set correctly, it should work as is. The program 'make_summary_starlog.sh' generates a matrix with sample annotation, one row per sample, based on the read alignment metrics output by STAR. The program 'get_expr' expects input in exactly the format generated by 'rpkmforgenes.py' and produces two data matrices with expression values, one with RPKM values and one with raw read counts. All other programs use the sample meta-information matrix and the expression matrices output by those two programs.

#Define input and output dirs and files
#IN
projectdir='/path/to/your/PROJECT'
stardir="${projectdir}/star_hg19"
rpkmforgenes_file="${projectdir}/rpkmforgenes_star_hg19/refseq_rpkms.txt"

#OUT
datadir="${projectdir}/rqc/refseq/data"
sample_meta_file="${datadir}/mapstats.tab"
pdfdir="${projectdir}/rqc/refseq/pdf"
brenneckedir="${projectdir}/rqc/diffexp/brennecke"

#Create and change dir
mkdir -p "$datadir"
cd "$datadir"

#Get mapping statistics from STAR logs
make_summary_starlog.pl "${stardir}" >"$sample_meta_file"

#Dry-run the program 'rqc' to generate a shell script with possible commands to execute
rqc -m "$sample_meta_file" -e "$rpkmforgenes_file" -d "$datadir" -p "$pdfdir" -b "$brenneckedir" -y

#Executable commands in the shell script generated by rqc
cat rqc.sh

Further examples

Above, the program 'rqc' was dry-run to generate a shell script (rqc.sh) with possible commands to execute. Look in rqc.sh and change or add input arguments as you wish.

See also test/rqc.sh for a complete list of available programs and example program calls. Note that the directories in that file are set for the test directory.
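
Since rqc.sh is a plain shell script, ordinary shell tools can be used to review it before anything is executed. The sketch below assumes that rqc.sh sits in the current directory and that its comment lines start with '#'. The helper list_commands is hypothetical and not part of rrnaseq.

```shell
# Hypothetical helper: print the non-comment, non-blank lines of a script,
# numbered, so that individual commands can be reviewed before running them.
list_commands() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | cat -n
}

# Review the generated script if it exists.
if [ -f rqc.sh ]; then
    list_commands rqc.sh
    # Once the commands look right, run the whole script with:
    # sh rqc.sh
fi
```

Commands that are not needed can be commented out or deleted in rqc.sh before it is run.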

TEST AND EXAMPLE OUTPUT

Example output can be found in the 'test/rqc' subdirectory. The file 'run.rqc.sh' in the 'test' subdirectory shows how to run the script 'rqc' with the dry-run flag, which generates a file (rqc.sh) with commands that call all of the available programs in the rrnaseq suite. See 'run.rqc.sh' and the generated 'rqc.sh' file for a test example:

cd test
sh run.rqc.sh 
cat rqc.sh

QC-filter

To filter genes, use the program 'gene_filter'. To filter samples, use the program 'sample_filter'. 'sample_filter' relies on an input file (default: qc.rds) that contains a data matrix with samples as rows and QC-metrics as columns. An element of this qc-matrix is set to 1 if the sample failed QC for that QC-metric. Each QC-metric column is added to the qc-matrix by running the corresponding program. For example, to add a QC-column relating to the number of expressed genes per sample, run the program 'sample2ngenes_expr'. To then apply the filter, run 'sample_filter' with the column name of that QC-metric as an argument. See test/rqc.sh for an example.
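
To make the idea of the qc-matrix concrete, the sketch below mimics it with a small tab-separated text file instead of the actual qc.rds file, whose exact layout is not described here. The sample names, the column names (ngenes, mapped) and the helper passing_samples are all made up for illustration. This is not how 'sample_filter' is implemented; it only shows the "1 means failed" convention.

```shell
# Toy qc-matrix: one row per sample, one column per QC-metric, 1 = failed.
# All names below are hypothetical.
qc_file=$(mktemp)
printf 'sample\tngenes\tmapped\n'  >  "$qc_file"
printf 'cell_A\t0\t0\n'            >> "$qc_file"
printf 'cell_B\t1\t0\n'            >> "$qc_file"
printf 'cell_C\t0\t1\n'            >> "$qc_file"

# Hypothetical helper: print the samples that did not fail the named QC-metric.
# $1 = toy qc-matrix file, $2 = column name of the QC-metric to filter on.
passing_samples() {
    awk -F '\t' -v metric="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == metric) col = i; next }
        col && $col != 1 { print $1 }
    ' "$1"
}

# Keep only samples that passed the "ngenes" metric.
passing_samples "$qc_file" ngenes
```

With the toy data above, filtering on "ngenes" prints cell_A and cell_C, because only cell_B has a 1 in that column. An unknown column name prints nothing.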

GETTING HELP

Each program has several input arguments that should be considered. For a list of all available arguments for a program, use the -h flag, for example:

pca -h

rrnaseq's People

Contributors

edsgard, helena-s, eyay


rrnaseq's Issues

prc.mapped.cutoff is not able to fail samples in qc-file

Regardless of the cutoff set for prc.mapped, no samples are flagged as failed in the qc-file, so samples below the cutoff cannot be filtered out.

I have not been able to locate the error in the R scripts. I hope you are able to help with this issue.

Thanks in advance.

/Michael
