intro-to-rnaseq-hpc-o2's Introduction

THIS REPO IS ARCHIVED, PLEASE GO TO https://hbctraining.github.io/main FOR CURRENT LESSONS.

Introduction to RNA-seq using high-performance computing (HPC)

Audience | Computational skills required | Duration
Biologists | None | 2- or 3-day workshop (~13 - 19.5 hours of trainer-led time)

Description

This repository has teaching materials for a 2-day Introduction to RNA-sequencing data analysis workshop. This workshop focuses on teaching basic computational skills to enable the effective use of a high-performance computing environment to implement an RNA-seq data analysis workflow. It includes an introduction to shell (bash) and shell scripting. In addition to running the RNA-seq workflow from FASTQ files to count data, the workshop covers best-practice guidelines for RNA-seq experimental design and data organization/management.

These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.

Learning Objectives

  1. Understand the necessity for, and use of, the command line interface (bash) and HPC for analyzing high-throughput sequencing data.
  2. Understand best practices for designing an RNA-seq experiment and analyzing the resulting data.

Lessons

Below are links to the lessons and suggested schedules:

Installation Requirements

All:

Mac users:

Windows users:

Dataset


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

intro-to-rnaseq-hpc-o2's People

Contributors

kayleighrutherford, marypiper, mistrm82, molecules, rkhetani

intro-to-rnaseq-hpc-o2's Issues

SEQ description

Hi, thanks for the concise overview of a SAM file, but I am struggling with some of the descriptors. Take the one for SEQ, which the spec at www.htslib.org/doc/sam.html describes as

10 | SEQ | query SEQuence on the same strand as the reference

which could be ambiguous when it is not known which strand of the reference (upper, lower, or both) was used for aligning.

In the intro on this site, SEQ is described as "the raw sequence" as found in the FASTQ file:

Finally, you have the data from the original FASTQ file stored for each read. That is the raw sequence (SEQ) ...

But does SEQ not give the sequence as present in the FASTA file used as the reference for aligning, which is normally the sense strand, and not the raw sequence? This difference is important when the raw read has mapped to the reverse (antisense) strand, which is indicated in the flag by 16.

Thus, for reads mapped antisense to the reference, would one not expect the reverse complement sequence as SEQ (and thus not the raw FASTQ sequence)?
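
To make the question concrete, here is a minimal Python sketch of the relationship being asked about. It relies on one fact from the SAM specification: FLAG bit 0x10 (decimal 16) indicates that SEQ has been reverse complemented and QUAL reversed. The function names and the toy read are hypothetical, chosen only for illustration; they are not part of the lesson materials or of any library.

```python
from typing import Tuple

# FLAG bit from the SAM specification: SEQ is reverse complemented, QUAL reversed.
REVERSE_STRAND_FLAG = 0x10  # decimal 16

# Translation table for complementing bases (N stays N). Hypothetical helper.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def original_read(flag: int, seq: str, qual: str) -> Tuple[str, str]:
    """Recover the read as it appeared in the FASTQ file from SAM fields.

    If the 0x10 bit is set, SEQ is stored in reference orientation, so the
    FASTQ sequence is its reverse complement and QUAL must be reversed back.
    Otherwise SEQ and QUAL already match the FASTQ record.
    """
    if flag & REVERSE_STRAND_FLAG:
        return reverse_complement(seq), qual[::-1]
    return seq, qual


# Toy example (made-up read): a record with FLAG 16 and SEQ "AACG"
# corresponds to a FASTQ read "CGTT" with the quality string reversed.
print(original_read(16, "AACG", "IIH#"))  # ('CGTT', '#HII')
print(original_read(0, "AACG", "IIH#"))   # ('AACG', 'IIH#')
```

Under this reading, SEQ equals the raw FASTQ sequence only for reads mapped to the forward strand; for reads with the 16 bit set, SEQ is the reverse complement of the raw read.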

workflow automation

Move the automation to after salmon.

Make the Salmon run, rather than STAR, the focus of the automation, and remove featureCounts. Add Qualimap and MultiQC.

reduce/remove samtools

Keep a line or two describing and introducing it, and link the old materials here as additional material.

make IGV a demo instead

Link to the IGV tutorial for now. Once we create the module materials for IGV and publication-quality figures, we can link to those instead.
