intro-to-rnaseq-hpc-salmon-flipped's Introduction

Introduction to RNA-seq using high-performance computing (HPC)

Audience: Biologists
Computational skills required: None
Duration: 3-session online workshop (~7.5 hours of trainer-led time)

Description

This repository has teaching materials for a 2-day Introduction to RNA-sequencing data analysis workshop. This workshop focuses on teaching basic computational skills to enable the effective use of a high-performance computing environment to implement an RNA-seq data analysis workflow. It includes an introduction to shell (bash) and shell scripting. In addition to running the RNA-seq workflow from FASTQ files to count data using Salmon, the workshop covers best-practice guidelines for RNA-seq experimental design and data organization/management.

Note for Trainers: The schedule linked below assumes that learners will spend 3-4 hours between classes reading through selected lessons and completing the exercises in them. The online component of the workshop focuses on further exercises and discussion/Q&A.

These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.

Learning Objectives

  1. Understand the necessity for, and use of, the command line interface (bash) and HPC for analyzing high-throughput sequencing data.
  2. Understand best practices for designing an RNA-seq experiment and analyzing the resulting data.

Lessons

Installation Requirements

All:

Mac users:

Windows users:


Citation

To cite material from this course in your publications, please use:

Mary E. Piper, Meeta Mistry, Jihe Liu, William J. Gammerdinger, & Radhika S. Khetani. (2022, January 10). hbctraining/Intro-to-rnaseq-hpc-salmon-flipped: Introduction to RNA-seq using Salmon Lessons from HCBC (first release). Zenodo. https://doi.org/10.5281/zenodo.5833880

A lot of time and effort went into the preparation of these materials. Citations help us understand the needs of the community, gain recognition for our work, and attract further funding to support our teaching activities. Thank you for citing this material if it helped you in your data analysis.


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

intro-to-rnaseq-hpc-salmon-flipped's People

Contributors

amelie-tghn, eberdan, gammerdinger, jihe-liu, marypiper, mistrm82, rkhetani

intro-to-rnaseq-hpc-salmon-flipped's Issues

FileZilla with 2F-authentication

I was having a lot of trouble connecting to O2 through FileZilla. Then I thought it was strange that it was bypassing my 2-factor authentication, so I decided to check Duo. I had approvals waiting there, and once I approved, the connection worked. It seems like we should edit the course materials to discuss the use of 2-factor authentication when connecting with FileZilla.

Experimental design considerations reminder

Spend less time going over experimental design considerations that learners have not asked about, because there are already a lot of questions. Before going through the lesson, ask whether there are any questions about the lesson as a whole.

Update reservation

Provide information about updating the reservation (--reservation=HBC) in the FastQC lesson.

Add GC bias information

Look up Mike Love’s rationale for including the GC bias information. Contact Rob or Mike about nuances in biases. Maybe look up alpine too.

missing bias value for Mov10_oe_3

In the MultiQC report there is a missing value for 5'-3' bias in the table.

Troubleshoot this, especially since the value is computed by Qualimap

add an example paired-end script for automation

A question that comes up often is how to modify the script to work with PE data. Instructors who have handled this in office hours: please put together a skeleton script that we can link out to.
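
A possible starting point for such a skeleton is sketched below. It is not taken from the course materials: the file naming pattern (<sample>_R1.fastq.gz and <sample>_R2.fastq.gz), the SALMON_INDEX and THREADS variables, and the helper names sample_name and mate_path are all assumptions made for illustration. The main change from a single-end loop is that each sample passes two files to Salmon through -1 and -2.

```shell
#!/bin/bash
# Sketch of a paired-end Salmon loop (assumptions, not course code):
#   - reads are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
#   - SALMON_INDEX points at an existing Salmon index directory
#   - THREADS is optional and defaults to 4

# Derive the sample name from an R1 file path (hypothetical helper).
sample_name() {
    basename "$1" _R1.fastq.gz
}

# Derive the matching R2 path from an R1 path (hypothetical helper).
mate_path() {
    echo "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# Loop over the R1 files given as arguments; with no arguments nothing runs.
for r1 in "$@"; do
    r2=$(mate_path "$r1")
    sample=$(sample_name "$r1")

    # Skip samples whose mate file is missing rather than failing midway.
    if [ ! -f "$r2" ]; then
        echo "Skipping $sample: missing $r2" >&2
        continue
    fi

    # -1/-2 supply the two mates; -l A lets Salmon infer the library type.
    salmon quant -i "$SALMON_INDEX" -l A \
        -1 "$r1" -2 "$r2" \
        -p "${THREADS:-4}" \
        -o "${sample}.salmon"
done
```

It could be called as, for example, bash salmon_pe.sh raw_fastq/*_R1.fastq.gz (the script name is hypothetical). Any Slurm directives and additional Salmon options used in the single-end lesson script would need to be carried over separately.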

Data to follow along

Hello all,

I am trying to follow along with the analysis in this workshop. However, when I click to download the data, I am not sure I get the correct files, because the file names are different from those in the tutorial. Could you confirm which files you used for this analysis? Are they the files in the folder raw_fastq? Thank you so much!

data link for self-learners broken

On this page: links-to-lessons.md

The link for Non-Harvard folks is broken (it also points to unix_lesson, which I don't think is what they need)

HPC lesson updates and clarifications from Kathleen

  • Data storage and memory on O2 are in base-2 units, not base-10 units (e.g. tebibytes, not terabytes: TiB = 1024 GiB, TB = 1000 GB). HMS IT had been using the units that people use colloquially (terabyte, gigabyte, etc.), which was technically incorrect. The amount of storage that folks have been using or have access to has not changed with our change in terminology. The distinction is important for billing, as we charge for compute usage (with a RAM charge for GiB/hour, among other factors) and for storage usage (TiB/year). More details on billing rates here
  • for the sentence “There are several compute nodes on O2 available for performing your analysis/work”, do you mean several types of compute nodes? That is true, or you could also say “There are several hundred compute nodes…” which is also true. The sentence as is sounds like it is missing a word.
  • The memory request for --mem 1G is in gibibytes, not gigabytes
  • This won’t be relevant for the workshop itself, but if folks are submitting jobs and are in multiple Slurm accounts (e.g. labs/groups), they’ll need to specify an account for an srun or sbatch job to count under with the -A parameter. You can check if you’re in multiple Slurm accounts by running sshare -Uu $USER. More details on -A and Slurm accounts/unix accounts here
  • The wiki link for -t is broken, missing a dash, use this: https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#Time-limits
  • Same thing for -c: https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#How-many-cores?
  • And --mem: https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#Memory-requirements
  • And O2 wiki sbatch reference link: https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#sbatch-options-quick-reference
  • sbatch job submission is using 400MiB
  • module load can modify environment variables other than $PATH; the specifics are probably not relevant to this workshop, though
  • We’re starting to move away from gcc/6.2.0 and are building new tools with gcc/9.2.0, but the majority of modules have been built with gcc/6.2.0
  • For the filesystems part, it’d be helpful to link to here, as it has links for requesting group directories (under the Active Compute section). Also, a caveat that off quad folks will have to pay for their group directories. Home and scratch directories are free for everyone. Also, /n/cluster/bin/scratch3_create.sh needs to be run from a login node. The script will give you an error message to this effect if you run it from a compute node, but sometimes folks don’t read :bloblul:
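
To make the base-2 versus base-10 distinction in the first bullet concrete, the following small shell sketch converts each unit to bytes. The function names are hypothetical helpers written for this illustration; they are not part of O2 or Slurm.

```shell
#!/bin/sh
# Illustrative helpers only (hypothetical names, not O2 or Slurm tools).

# Bytes in N tebibytes (base 2: 1 TiB = 1024^4 bytes).
tib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 * 1024 ))
}

# Bytes in N terabytes (base 10: 1 TB = 1000^4 bytes).
tb_to_bytes() {
    echo $(( $1 * 1000 * 1000 * 1000 * 1000 ))
}

# Bytes in N gibibytes (base 2: 1 GiB = 1024^3 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

echo "1 TiB = $(tib_to_bytes 1) bytes"
echo "1 TB  = $(tb_to_bytes 1) bytes"
# Per the note above, --mem 1G is interpreted in gibibytes.
echo "--mem 1G requests $(gib_to_bytes 1) bytes"
```

One tebibyte is roughly 10% larger than one terabyte, which is why the distinction matters for storage billed per TiB/year.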
