POWER8-GEM

This workflow contains bash scripts that perform the following tasks on an IBM POWER8 architecture:

  • Download RNA sequencing data in FASTQ format from EMBL-EBI
  • Trim low-quality reads and Illumina adapter sequences from the raw FASTQ files using Trimmomatic
  • Map cleaned reads to a reference genome using HISAT2
  • Quantify RNA transcript abundances using StringTie
  • Parse FPKM values from StringTie output into a Gene Expression Matrix (GEM)

This workflow uses genome annotation files in GFF3 format to quantify transcript abundances, as described in the following Nature Protocols paper:

Pertea, M., et al. (2016). "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown." Nat. Protocols 11(9): 1650-1667.

Note that POWER8-GEM does not perform transcript assembly; it only quantifies abundances of annotated reference transcripts.

Pre-Workflow User Input

Install Software and Create Conda Environment

To install software needed for the POWER8-GEM workflow, run:

$ chmod +x 00-Install-tools.sh
$ sudo ./00-Install-tools.sh

When prompted for an installation location for Miniconda, enter:

 /home/<user>/bin/miniconda2

The rest of the installation should proceed normally.

After all of the software has finished installing, the POWER8-GEM workflow should be ready to run.

Download and Index Reference Genome

The reference genome must be indexed using HISAT2. Download a reference genome in FASTA (.fa) format and place the file in the Reference directory of the workflow. To index the reference genome, execute the Index-Genome.sh script with a reference prefix as its argument:

    $ ./Index-Genome.sh $REF_PREFIX

For example:

    $ ./Index-Genome.sh chr21-GRCh38

Please note that only one .fa genome file can be present in the Reference directory. Please remove the example file, "chr21-GRCh38.fa", before using your own data.

Download GFF3 Genome Annotation

A GFF3 file that corresponds to the reference genome must be placed in the Reference directory. Please check that only one GFF3 file is present.
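
The one-FASTA, one-GFF3 requirement can be checked before starting a run. The following is a minimal sketch, not part of the workflow itself: the function names are hypothetical, and it assumes the annotation file uses a ".gff3" extension.

```shell
# Sketch: verify that a reference directory holds exactly one .fa file
# and exactly one .gff3 file. Function names are hypothetical, and the
# ".gff3" extension is an assumption.

count_files() {
    # Print how many regular files directly inside directory $1 match glob $2.
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l | tr -d ' '
}

check_reference_dir() {
    local dir="$1"
    local n_fa n_gff
    n_fa=$(count_files "$dir" '*.fa')
    n_gff=$(count_files "$dir" '*.gff3')
    if [ "$n_fa" -ne 1 ]; then
        echo "Expected exactly one .fa file in $dir, found $n_fa" >&2
        return 1
    fi
    if [ "$n_gff" -ne 1 ]; then
        echo "Expected exactly one .gff3 file in $dir, found $n_gff" >&2
        return 1
    fi
    echo "Reference directory looks OK"
}
```

After defining these functions in a shell session, running "check_reference_dir Reference" from the workflow directory reports which of the two conditions, if any, is not met.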

Identify SRA sample IDs and modify the SRAList.txt file

SRA sample IDs must be specified in the "SRAList.txt" file. Edit this file to list the samples that you want to process, with each SRA ID on its own line.
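
A quick format check on the list can catch stray whitespace or several IDs on one line. This sketch assumes the IDs are run accessions of the form SRR, ERR or DRR followed by digits; that pattern and the function name are assumptions, so adjust the pattern if your IDs look different.

```shell
# Sketch: confirm that every line of an ID list is a single run accession.
# The accession pattern (SRR/ERR/DRR + digits) and the function name are
# assumptions, not taken from the workflow scripts.

validate_sra_list() {
    local list="$1"
    local bad
    # Collect every line (with its line number) that is NOT exactly one
    # accession. Blank lines are flagged as well.
    bad=$(grep -n -v -E '^[SED]RR[0-9]+$' "$list")
    if [ -n "$bad" ]; then
        echo "Lines that are not a single run accession:" >&2
        echo "$bad" >&2
        return 1
    fi
    echo "$(grep -c -E '^[SED]RR[0-9]+$' "$list") accession(s) listed"
}
```

Running "validate_sra_list SRAList.txt" prints the offending lines with their line numbers, or the number of accessions if every line passes.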

Execute the Workflow

The workflow contains a small reference genome for testing. To run the entire workflow, run the power8-gem.sh script with the reference prefix as its argument:

$ ./power8-gem.sh $REF_PREFIX

For example:

$ ./power8-gem.sh chr21-GRCh38

If needed, you can execute each step of the pipeline as follows:

Download Input Data

$ ./01-Prepare-inputs.sh  

Trim Reads

$ ./02-Trim-reads.sh

Map Reads to Reference Genome

$ ./03-Map-reads.sh chr21-GRCh38

When using your own data, please replace "chr21-GRCh38" with the appropriate reference prefix (same as the $REF_PREFIX that you chose when indexing the reference genome).

Quantify Transcript Abundances

$ ./04-Count-transcripts.sh

Build Gene Expression Matrix (GEM)

$ ./05-GEM-parse.sh
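
To illustrate what building a GEM involves, the sketch below merges per-sample FPKM values into one gene-by-sample table. It is not the code in 05-GEM-parse.sh. It assumes each sample has a tab-delimited gene abundance table of the kind StringTie writes with its -A option, with one header line, the gene ID in column 1 and FPKM in column 8. The function name and the use of "NA" for missing genes are illustrative choices.

```shell
# Sketch: merge per-sample gene abundance tables into a gene expression matrix.
# Assumes tab-delimited input with one header line, gene ID in column 1 and
# FPKM in column 8. The function name and the "NA" filler are hypothetical.

build_gem() {
    # Arguments: one abundance table per sample. Column names in the output
    # are the input file names without directory and final extension.
    awk -F'\t' '
        FNR == 1 {                      # header line: register a new sample
            sample = FILENAME
            sub(/.*\//, "", sample)     # drop directory part
            sub(/\.[^.]*$/, "", sample) # drop final extension
            samples[++n] = sample
            next
        }
        { genes[$1] = 1; fpkm[$1, n] = $8 }   # last value wins on duplicate IDs
        END {
            header = "gene"
            for (i = 1; i <= n; i++) header = header "\t" samples[i]
            print header
            for (g in genes) {
                row = g
                for (i = 1; i <= n; i++)
                    row = row "\t" (((g, i) in fpkm) ? fpkm[g, i] : "NA")
                print row
            }
        }
    ' "$@" | {
        # Keep the header first, then sort the gene rows for stable output.
        IFS= read -r header
        printf '%s\n' "$header"
        sort
    }
}
```

Calling build_gem with one table per sample writes the matrix to standard output, so it can be redirected to a file of your choice.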

Comments/Notes

With full datasets, each step of this workflow can take several hours. Please be sure that all PBS jobs have finished before moving on to the next step. A "Logs" directory is created when the workflow starts. Please inspect all log files for errors.
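
Inspecting the logs can be partly automated with a keyword search. This is a minimal sketch: the function name and the keyword list are assumptions, since the exact messages depend on the tools involved, and a clean result does not replace reading the logs.

```shell
# Sketch: list log files that mention common failure keywords.
# The keyword list and the function name are assumptions.

scan_logs() {
    local dir="${1:-Logs}"
    if [ ! -d "$dir" ]; then
        echo "No such directory: $dir" >&2
        return 2
    fi
    # -r: recurse into the directory, -i: ignore case,
    # -l: print only the names of files that contain a match
    if grep -r -i -l -E 'error|exception|failed' "$dir"; then
        return 1
    fi
    echo "No error keywords found in $dir"
}
```

Running "scan_logs" from the workflow directory prints the names of any log files worth opening first.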

NOTE: This workflow is an adaptation of PBS-GEM, which is based on OSG-GEM. William Poehlman and Dr. Alex Feltus deserve credit for developing those workflows.

Contributors

cbmckni
