hbctraining / intro-to-rnaseq-hpc-o2

This repository has teaching materials for a 2- and 3-day Introduction to RNA-sequencing data analysis workshop using the O2 Cluster.

Home Page: https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/

Shell 26.68% HTML 70.64% SCSS 2.68%

intro-to-rnaseq-hpc-o2's Introduction

THIS REPO IS ARCHIVED, PLEASE GO TO https://hbctraining.github.io/main FOR CURRENT LESSONS.

Introduction to RNA-seq using high-performance computing (HPC)

Audience	Computational skills required	Duration
Biologists	None	2- or 3-day workshop (~13 - 19.5 hours of trainer-led time)

Description

This repository has teaching materials for a 2-day Introduction to RNA-sequencing data analysis workshop. This workshop focuses on teaching basic computational skills to enable the effective use of a high-performance computing environment to implement an RNA-seq data analysis workflow. It includes an introduction to shell (bash) and shell scripting. In addition to running the RNA-seq workflow from FASTQ files to count data, the workshop covers best practice guidelines for RNA-seq experimental design and data organization/management.

These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.

Learning Objectives

Understand the necessity for, and use of, the command line interface (bash) and HPC for analyzing high-throughput sequencing data.
Understand best practices for designing an RNA-seq experiment and analyzing the resulting data.

Lessons

Below are links to the lessons and suggested schedules:

2 day schedule
3 day schedule

Installation Requirements

All:

FileZilla (make sure you get 'FileZilla Client')
Integrative Genomics Viewer (IGV) (scroll down on the page for Download options). If you have trouble opening IGV after installing it, you may need to install Java.

Mac users:

Plain text editor like Sublime text or similar

Windows users:

GitBash
Plain text editor like Notepad++ or similar

Dataset

Day 1 - Introduction to Shell: Dataset
Days 2 and 3 - RNA-seq analysis (coming soon)

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Some materials used in these lessons were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).

intro-to-rnaseq-hpc-o2's Issues

Change transcriptome for salmon to hg38

Modify directory name in unix section

Change unix_workshop to unix_lesson in all lessons.
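A rename like this could be sketched as a small shell function. This is only an illustration, not the method used by the maintainers: the function name `rename_unix_dir` and the directory passed to it are hypothetical, and it assumes the lesson files are plain text that can be safely edited in place.

```shell
# Hypothetical helper: replace every mention of unix_workshop with unix_lesson
# in all files under the given directory (assumed to hold the lesson files).
rename_unix_dir() {
    local dir="$1"
    # grep -rl lists only the files that still contain the old name.
    grep -rl 'unix_workshop' "$dir" | while IFS= read -r file; do
        # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
        sed -i.bak 's/unix_workshop/unix_lesson/g' "$file" && rm -f "$file.bak"
    done
}
```

Running `grep -rn 'unix_workshop' <directory>` first gives a preview of the lines that would change.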

split experimental design lesson

Consider splitting it, or changing the way in which we teach it.

Update repo to be master as 3dayworkshop and branch as 2dayworkshop

SEQ description

Hi, thanks for the concise overview of a SAM file, but I am struggling with some of the descriptors. One example is SEQ, which the spec at www.htslib.org/doc/sam.html describes as

10 | SEQ | query SEQuence on the same strand as the reference

which could be ambiguous when it is not known which strand of the reference (upper, lower, or both) was used for the alignment.

In the intro on this site, SEQ is described as "the raw sequence" as found in the FASTQ file:

Finally, you have the data from the original FASTQ file stored for each read. That is the raw sequence (SEQ) ...

But does SEQ not give the sequence as present in the FASTA file used as the reference for aligning, which is normally the sense strand, and not the raw sequence? This difference is important when the raw read has mapped to the reverse (antisense) strand, which is indicated in the FLAG by 16.

Thus, for reads mapped antisense to the reference, would one not expect the reverse complement sequence as SEQ (and thus not the raw FASTQ sequence)?
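
The distinction raised here can be illustrated with a short shell sketch. In the SAM specification, FLAG bit 0x10 (decimal 16) marks SEQ as reverse complemented, so for such records SEQ differs from the sequence in the FASTQ file. The function name `original_read_seq` is hypothetical and not part of the lesson materials; it takes a FLAG and a SEQ value and prints the sequence in its original read orientation, assuming plain A/C/G/T/N bases.

```shell
# Hypothetical helper: recover the read as it appeared in the FASTQ file
# from the FLAG and SEQ fields of a SAM record.
original_read_seq() {
    local flag="$1" seq="$2"
    if (( flag & 16 )); then
        # Bit 0x10 set: SEQ is stored reverse complemented, so undo that
        # by reversing the string and complementing each base.
        printf '%s\n' "$seq" | rev | tr 'ACGTNacgtn' 'TGCANtgcan'
    else
        # Bit 0x10 not set: SEQ is already in the original read orientation.
        printf '%s\n' "$seq"
    fi
}

original_read_seq 0 "AACGT"     # prints AACGT
original_read_seq 16 "AACGT"    # prints ACGTT
```

In other words, SEQ always holds read bases rather than reference bases, but for reverse-strand alignments those bases are shown in the orientation of the reference as given in the FASTA file.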