smsk_tuxedo2: snakemake files of the tuxedo v2 pipeline from Pertea et al 2016

1. Description

This is a Snakemake workflow that applies the protocol described in Pertea et al. 2016, with some modifications.

The idea is to produce a differential expression analysis from a set of FASTQ files.

2. First steps

Follow the contents of the .travis.yml file:

  1. Install (ana|mini)conda.

  2. Clone the repository and create the conda environments:

    git clone https://github.com/jlanga/smsk_tuxedo2.git smsk_tuxedo2
    cd smsk_tuxedo2
    snakemake --use-conda --create-envs-only

  3. Execute the test pipeline:

    snakemake --use-conda -j

  4. Modify the following files:

    • features.yml with your reference genome and annotation,
    • samples.tsv with the paths and info of your samples,
    • src/config.yaml

  5. Run the pipeline with your data:

    snakemake --use-conda -j
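
Before launching step 5 it can help to check that every path listed in samples.tsv exists. The README does not document the layout of that file, so the following is only a minimal sketch: the column names `sample`, `forward`, `reverse` and `condition` are assumptions made for illustration and should be adapted to the real header.

```python
import csv
import io
import os


def read_samples(handle):
    """Parse a tab-separated sample sheet into a list of dicts, one per row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def missing_files(rows, path_columns, name_column="sample"):
    """Return (sample name, path) pairs whose file is not present on disk."""
    missing = []
    for row in rows:
        for column in path_columns:
            path = row[column]
            if not os.path.isfile(path):
                missing.append((row[name_column], path))
    return missing


# Hypothetical sheet: the column names and paths are assumptions, not the
# documented format of samples.tsv.
EXAMPLE = (
    "sample\tforward\treverse\tcondition\n"
    "ctrl_1\tdata/ctrl_1_1.fq.gz\tdata/ctrl_1_2.fq.gz\tcontrol\n"
    "treat_1\tdata/treat_1_1.fq.gz\tdata/treat_1_2.fq.gz\ttreatment\n"
)

if __name__ == "__main__":
    rows = read_samples(io.StringIO(EXAMPLE))
    for sample, path in missing_files(rows, ["forward", "reverse"]):
        print(f"{sample}: missing {path}")
```

To check a real sheet, pass `open("samples.tsv", newline="")` to `read_samples` instead of the in-memory example.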

3. File organization

The hierarchy of the folder is the one described in Good enough practices in scientific computing:

smsk
├── bin/: external scripts/binaries
├── data/: test data.
├── doc/: documentation.
├── README.md
├── results:
|   ├── raw: links to your raw data.
|   ├── map: files from HISAT2 mapping: index and CRAM files.
|   ├── quant: files from StringTie assembly and quantification
|   └── de: files from Ballgown: differential expression tables, RData objects for closer inspection.
├── Snakefile: driver script of the project.
├── environment.yml: packages to execute the analysis.
└── src: snakefiles, installers, config.yaml, R scripts.

4. Workflow description

4.1 Mapping with HISAT2 (rule map)

  • The index is built from scratch.

  • Exons and splice sites are computed from the reference GTF file.

  • Paired reads are mapped with HISAT2. Results are compressed to CRAM on the fly.
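
The three steps above roughly correspond to the shell commands assembled below. This is a sketch, not the contents of the repository's snakefiles: the file names, the thread count and the use of `samtools sort` to write CRAM are assumptions, while the tool names and flags are those of the HISAT2 and samtools command-line interfaces.

```python
import shlex


def hisat2_commands(genome_fa, annotation_gtf, index_prefix,
                    fq1, fq2, out_cram, threads=4):
    """Build the shell commands for indexing and mapping one paired sample."""
    q = shlex.quote
    splice_sites = q(index_prefix + ".ss")
    exons = q(index_prefix + ".exon")
    return [
        # Splice sites and exons are extracted from the reference GTF.
        f"hisat2_extract_splice_sites.py {q(annotation_gtf)} > {splice_sites}",
        f"hisat2_extract_exons.py {q(annotation_gtf)} > {exons}",
        # The index is built from scratch, aware of the annotation.
        f"hisat2-build -p {threads} --ss {splice_sites} --exon {exons} "
        f"{q(genome_fa)} {q(index_prefix)}",
        # HISAT2 writes SAM to stdout; samtools sorts it and compresses it
        # to CRAM without an intermediate file (assumed approach).
        f"hisat2 -p {threads} --dta -x {q(index_prefix)} "
        f"-1 {q(fq1)} -2 {q(fq2)} | "
        f"samtools sort -@ {threads} -O cram --reference {q(genome_fa)} "
        f"-o {q(out_cram)} -",
    ]


if __name__ == "__main__":
    # All paths here are hypothetical placeholders.
    for command in hisat2_commands("genome.fa", "genes.gtf", "index/genome",
                                   "s1_1.fq.gz", "s1_2.fq.gz", "s1.cram"):
        print(command)
```

The `--dta` flag asks HISAT2 to report alignments tailored for transcript assemblers such as StringTie.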

4.2 Transcript assembly and quantification with StringTie (rule quant)

  • Using the exact parameters from Pertea et al. 2016

  • CRAM -> SAM conversion on the fly
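
As an illustration of the usual StringTie sequence (per-sample assembly, merge, then re-estimation with Ballgown tables), the sketch below builds the corresponding command lines. The output paths, labels and thread count are hypothetical, and the flags shown are standard StringTie options rather than a verified copy of the parameters used in this repository. How the decoded SAM stream is handed to StringTie is not stated in the README, so the conversion command is built separately.

```python
import shlex


def cram_to_sam_command(cram, genome_fa):
    """Command that decodes a CRAM file to SAM (with header) on stdout."""
    q = shlex.quote
    return f"samtools view -h -T {q(genome_fa)} {q(cram)}"


def stringtie_commands(samples, annotation_gtf, merged_gtf, merge_list,
                       threads=4):
    """Build assembly, merge and quantification commands.

    `samples` maps a sample label to its alignment file.
    """
    q = shlex.quote
    commands = []
    # 1. Assemble transcripts per sample, guided by the reference annotation.
    for label, alignments in samples.items():
        commands.append(
            f"stringtie {q(alignments)} -p {threads} -G {q(annotation_gtf)} "
            f"-l {q(label)} -o {q(label + '.gtf')}"
        )
    # 2. Merge the per-sample assemblies listed in merge_list.
    commands.append(
        f"stringtie --merge -p {threads} -G {q(annotation_gtf)} "
        f"-o {q(merged_gtf)} {q(merge_list)}"
    )
    # 3. Re-estimate abundances against the merged transcripts (-e) and
    #    write Ballgown input tables (-B). The output layout is an assumption.
    for label, alignments in samples.items():
        out_gtf = f"ballgown/{label}/{label}.gtf"
        commands.append(
            f"stringtie {q(alignments)} -e -B -p {threads} "
            f"-G {q(merged_gtf)} -o {q(out_gtf)}"
        )
    return commands


if __name__ == "__main__":
    print(cram_to_sam_command("s1.cram", "genome.fa"))
    for command in stringtie_commands({"s1": "s1.sam"}, "genes.gtf",
                                      "merged.gtf", "mergelist.txt"):
        print(command)
```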

4.3 Differential expression analysis with Ballgown (rule de)

  • Performing DE with the R script provided in src/de_ballgown.R

  • Visualization should be done interactively.

[Figure: rule graph of the workflow]

Bibliography

  • Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Pertea et al 2016

  • The Sequence Alignment/Map format and SAMtools. Li et al.

  • HISAT: a fast spliced aligner with low memory requirements. Kim et al.

  • StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Pertea et al.

  • Flexible isoform-level differential expression analysis with Ballgown. Frazee et al.

  • RSkittleBrewer. Frazee et al. https://github.com/alyssafrazee/RSkittleBrewer

  • Snakemake - a scalable bioinformatics workflow engine. Köster et al.

  • smsk - a snakemake skeleton to jumpstart your projects. Langa. http://github.com/jlanga/smsk

smsk_tuxedo2's Issues

Installation: conda_env.sh is missing

Hello @jlanga, I am trying to install your project as a means of learning Snakemake for RNAseq, but I'm having a problem setting up the environment. Step 1 on installation, bash src/install/conda_env.sh. This file is missing from this directory.
