
scrna-seq's Introduction

THIS REPO IS ARCHIVED, PLEASE GO TO https://hbctraining.github.io/main FOR CURRENT LESSONS.

Single-cell RNA-seq analysis workshop

| Audience | Computational skills required | Duration |
|:---|:---|:---|
| Biologists | Introduction to R | 2-day workshop (~10 hours of trainer-led time) |

Description

This repository has teaching materials for a 2-day, hands-on Introduction to single-cell RNA-seq analysis workshop. The workshop instructs participants on how to design a single-cell RNA-seq experiment, and how to efficiently manage and analyze the data starting from count matrices. It focuses on using the Seurat package in R/RStudio. Working knowledge of R, or completion of the Introduction to R workshop, is required.

Learning Objectives

  • Understand the considerations when designing a single-cell RNA-seq experiment
  • Discuss the steps involved in taking raw single-cell RNA-sequencing data and generating a count (gene expression) matrix
  • Compute and assess QC metrics at every step in the workflow
  • Cluster cells based on expression data and derive the identity of the different cell types present
  • Perform integration of different sample conditions

These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

Lessons

Click here for links to lessons and proposed schedule

Installation Requirements

Applications

Download the most recent versions of R and RStudio for your laptop:

Packages for R

Note 1: Install the packages in the order listed below.

Note 2: When installing the following packages, if you are asked to select (a/s/n) or (y/n), please select "a" or "y" as applicable.

Note 3: All the package names listed below are case sensitive!

(1) Install the 8 packages listed below from CRAN using the install.packages() function.

  1. tidyverse
  2. Matrix
  3. RCurl
  4. scales
  5. cowplot
  6. devtools
  7. BiocManager
  8. Seurat**

Please install them one-by-one as follows:

install.packages("tidyverse")
install.packages("Matrix")
install.packages("RCurl")
& so on ...
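
The one-by-one installation above could also be scripted. The following is a minimal sketch and not part of the original instructions: `missing_packages()` is a hypothetical helper, and the package vector simply repeats the CRAN list above. The `install.packages()` call is left commented out so that the snippet runs without network access.

```r
# CRAN packages from the list above, in the order given
cran_pkgs <- c("tidyverse", "Matrix", "RCurl", "scales",
               "cowplot", "devtools", "BiocManager", "Seurat")

# Hypothetical helper: return the packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

to_install <- missing_packages(cran_pkgs)

# Walk through the missing packages one at a time, preserving the listed order
for (pkg in to_install) {
  message("Would install: ", pkg)
  # install.packages(pkg)   # uncomment to actually install
}
```

Package names are case sensitive, so the strings in `cran_pkgs` must match the list exactly.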

** If you have trouble installing Seurat, please install multtest using the following lines of code, then try installing Seurat again:

install.packages("BiocManager")

BiocManager::install("multtest")

(2) Install the 3 packages listed below from Bioconductor using the BiocManager::install() function.

  1. SingleCellExperiment
  2. AnnotationHub
  3. ensembldb

Please install them one-by-one as follows:

BiocManager::install("SingleCellExperiment")
BiocManager::install("AnnotationHub")
& so on ...

(3) Finally, please check that all the packages were installed successfully by loading them one at a time using the library() function.

library(Seurat)
library(tidyverse)
library(Matrix)
library(RCurl)
library(scales)
library(cowplot)
library(SingleCellExperiment)
library(AnnotationHub)
library(ensembldb)

(4) Once all packages have been loaded, run sessionInfo().

sessionInfo()
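
The load check in step (3) can also be done in one pass. This sketch is an assumption about how one might automate it, not part of the original instructions. `check_packages()` is a hypothetical helper built on base R's `requireNamespace()`, which tries to load a package's namespace without attaching it and returns `TRUE` or `FALSE`.

```r
# Packages that the workshop asks you to load
pkgs <- c("Seurat", "tidyverse", "Matrix", "RCurl", "scales", "cowplot",
          "SingleCellExperiment", "AnnotationHub", "ensembldb")

# Hypothetical helper: named logical vector, TRUE where the namespace loads
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_packages(pkgs)
failed <- names(status)[!status]

if (length(failed) > 0) {
  message("Could not load: ", paste(failed, collapse = ", "))
} else {
  message("All packages loaded successfully.")
}
```

Any package reported as failing can then be reinstalled individually before running `sessionInfo()`.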

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

scrna-seq's People

Contributors

hackdna, jihe-liu, marypiper, mistrm82, rkhetani

scrna-seq's Issues

8 pooled samples versus 1 sample?

Download BAM from SRA and check the paper.

We are not sure if there are only 2 BAMS, because the data was pooled (as described in the study) or if individual samples were also supplied to SRA.

Update the setup markdown to reflect this

base decision tree

Maybe one for experimental design (expanding on what Sarah has)

and one for the analysis: QC, integration, marker id, differential expression, traj analysis

split the clustering markdown

this is the first lesson (not exploration). May not need to split if using sctransform (reassess as we develop!)

Split at "Determining PCs" sections

Error in CellsByIdentities(object = object, cells = cells)

Hi,

I got an error while running the following code

# Filter out low quality reads using selected thresholds - these will change with experiment
filtered_seurat <- subset(x = merged_seurat, 
                         subset= (nUMI >= 500) & 
                           (nGene >= 250) & 
                           (log10GenesPerUMI > 0.80) & 
                           (mitoRatio < 0.20))

The error:
Error in CellsByIdentities(object = object, cells = cells) :
Cannot find cells provided

Could it be that the meta data with renamed variables are causing the error?
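
One way to investigate the question above is to check, outside of `subset()`, that the metadata columns named in the filter exist and how many cells pass the combined thresholds. The sketch below is an assumption about how to diagnose this, not a confirmed cause of the error. It uses a small invented data frame standing in for `merged_seurat@meta.data`; `cells_passing()` is a hypothetical helper, and the thresholds are the ones from the code above.

```r
# Toy metadata standing in for merged_seurat@meta.data (all values are invented)
metadata <- data.frame(
  nUMI = c(1200, 300, 5000, 800),
  nGene = c(600, 100, 2000, 240),
  log10GenesPerUMI = c(0.90, 0.81, 0.89, 0.82),
  mitoRatio = c(0.05, 0.10, 0.30, 0.02),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# Hypothetical helper: names of the cells that pass all four thresholds
cells_passing <- function(md) {
  required <- c("nUMI", "nGene", "log10GenesPerUMI", "mitoRatio")
  absent <- setdiff(required, colnames(md))
  if (length(absent) > 0) {
    stop("Missing metadata columns: ", paste(absent, collapse = ", "))
  }
  keep <- md$nUMI >= 500 & md$nGene >= 250 &
    md$log10GenesPerUMI > 0.80 & md$mitoRatio < 0.20
  # An NA in any metric can make the comparison NA; treat those cells as failing
  keep[is.na(keep)] <- FALSE
  rownames(md)[keep]
}

passing <- cells_passing(metadata)
length(passing)
```

If the helper stops on a missing column, or returns no cells, that would point to the renamed metadata or to the thresholds rather than to `subset()` itself.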

Integration pipeline led to background expression on TSNE

Hello,

I have been following some of the tutorial provided by hbc training specifically on integrating different datasets: https://hbctraining.github.io/scRNA-seq/lessons/06_SC_SCT_and_integration.html

I believe I have encountered a slight issue. I followed much of the code given on the page: I had all of my samples in one Seurat object, split them, and then performed SCTransform on each sample separately (note that I didn't regress out cell cycle):

split_srt <- SplitObject(sample.merge, split.by = "Sample.Name")

for (i in seq_along(split_srt)) {
  split_srt[[i]] <- NormalizeData(split_srt[[i]], verbose = TRUE)
  split_srt[[i]] <- SCTransform(split_srt[[i]], vars.to.regress = c("percent.MT"))
}

I then performed the suggested integration steps:

integ_features <- SelectIntegrationFeatures(object.list = split_srt,
                                            nfeatures = 3000)

split_srt <- PrepSCTIntegration(object.list = split_srt,
                                anchor.features = integ_features)

integ_anchors <- FindIntegrationAnchors(object.list = split_srt,
                                        normalization.method = "SCT",
                                        anchor.features = integ_features)

seurat_integrated <- IntegrateData(anchorset = integ_anchors,
                                   normalization.method = "SCT")

Running a PCA and TSNE yielded a dimensionality reduction that looked quite integrated:

image

But the issue is that when I try to find marker genes, the expression of most genes appears as background, i.e. there are no white dots on a FeaturePlot:

seurat_integrated <- FindNeighbors(seurat_integrated,dims = 1:30)
seurat_integrated <- FindClusters(seurat_integrated, resolution = 0.5)

Merged.markers <- FindAllMarkers(seurat_integrated, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

plotting some of these markers

image

image

I am slightly unsure what I have done wrong/if I missed some steps. I would greatly appreciate any help I get.

need a new FindMarkers lesson

this new lesson will incorporate content from single sample marker identification + integration marker identification.

Since we will be using integrated data, we will use FindConservedMarkers and run it on all clusters. For clusters that have few cells per group this will fail, giving us a chance to introduce and run FindAllMarkers as well.

Having two 2-day workshops

Add cellranger and the process of getting counts as a separate 2-day workshop (kind of like how we have Intro to RNA-seq + DGE). Include Intro plus experimental design.

The second workshop would be R-based, starting with counts. Cellranger is not a prerequisite for this second workshop. Include some points from Intro here too.

Accessing raw data after normalization

We should add a markdown, or some text, about how we can specify whether to use raw or normalized counts. The following code will allow us to extract the raw counts after normalization:

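# Copy the raw counts from the RNA assay into a separate assay
RNA_raw_assay <- seurat_integrated@assays$RNA@counts

seurat_integrated[['RNA_raw']] <- CreateAssayObject(counts = RNA_raw_assay)

# Copy the normalized values into a separate assay; these are passed as `data`
# (pre-normalized values) rather than as `counts`
RNA_norm_assay <- seurat_integrated@assays$RNA@data

seurat_integrated[['RNA_norm']] <- CreateAssayObject(data = RNA_norm_assay)

# Use the normalized assay by default
DefaultAssay(seurat_integrated) <- "RNA_norm"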

QCMetrics error in DGE lesson

Zhu got the error: 'calculateQCMetrics' is defunct. Use 'perCellQCMetrics' instead. See help("Defunct")

She tried perCellQCMetrics but it is not the same as calculateQCMetrics - need to look into this error (especially using R 4.0)

start the analysis with integration

After integration, it feels a bit redundant. Maybe only go through with integration using sctransform.

Since we have two samples, start with integration of the two samples. Have a section that describes a single sample scenario (and the differences)

Error in calculating proportion of reads mapping to mitochondrial transcripts

https://github.com/hbctraining/scRNA-seq/blob/master/lessons/mitoRatio.md
I ran into a problem when I did the last step.
metadata$mtUMI <- Matrix::colSums(counts[which(rownames(counts) %in% mt),], na.rm = T)
error in evaluating the argument 'x' in selecting a method for function 'colSums': object of type 'closure' is not subsettable
I am not familiar with the R language, and I have tried many ways to solve it. Hope someone can help me. Thank you.
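
In R, the message `object of type 'closure' is not subsettable` means that a function is being indexed with `[`. Here that suggests `counts` is not the count matrix at the point the line runs, for example because the object was never created in the session and R found a function with that name instead. The sketch below, on an invented toy matrix, checks the input before computing the mitochondrial UMI totals. It uses base `colSums()` on a dense matrix to stay self-contained where the lesson uses `Matrix::colSums()`, and the gene names are placeholders.

```r
# Toy count matrix (genes x cells); all values are invented
counts <- matrix(
  c(5, 0, 3,
    2, 1, 0,
    0, 4, 1,
    7, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                  c("cell1", "cell2", "cell3"))
)

# Placeholder vector of mitochondrial gene names
mt <- c("MT-CO1", "MT-ND1")

# Guard: the reported error arises when `counts` is a function, not a matrix
if (is.function(counts)) {
  stop("`counts` is a function, not a matrix; create the count matrix first")
}

# drop = FALSE keeps a matrix even if only one gene matches
mt_rows <- rownames(counts) %in% mt
mtUMI <- colSums(counts[mt_rows, , drop = FALSE], na.rm = TRUE)

# Fraction of each cell's UMIs that come from mitochondrial genes
mitoRatio <- mtUMI / colSums(counts)
```

Running `class(counts)` just before the failing line is a quick way to see whether the name refers to a matrix or a function.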

QC lesson: change the seurat object name?

In the QC lesson, the merged, unprocessed Seurat object was saved as raw_seurat.RData, while the filtered Seurat object was saved as seurat_raw.RData. The names are confusing for students.

Warning messages in RunPCA()

#1. Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.

#2. In PrepDR(object = object, features = features, verbose = verbose) :
The following 15 features requested have not been scaled (running reduction without them): RAD51, CDC45, E2F8, DTL, EXO1, UHRF1, ANLN, GTSE1, NEK2, HJURP, DLGAP5, PIMREG, KIF2C, CDC25C, CKAP2L

Include as note, with how to run with svd instead
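
As a rough illustration of what a "standard svd" means in the first warning, the sketch below computes a PCA from a full, non-truncated SVD in base R on an invented random matrix and cross-checks it against `prcomp()`. It does not use Seurat. To my understanding, `RunPCA()` has an `approx` argument that can be set to `FALSE` to request the exact decomposition, but that is an assumption worth checking against the documentation for the installed Seurat version.

```r
set.seed(42)

# Toy matrix: 20 cells (rows) x 10 features (columns); values are random
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

# Centre each feature, then take the full (non-truncated) SVD
x_centered <- scale(x, center = TRUE, scale = FALSE)
decomp <- svd(x_centered)

# Cell embeddings: left singular vectors scaled by the singular values
embeddings <- decomp$u %*% diag(decomp$d)

# Cross-check against base R's prcomp(), which also uses an exact SVD.
# Components are only defined up to sign, so compare absolute values.
pca <- prcomp(x, center = TRUE, scale. = FALSE)
max_diff <- max(abs(abs(embeddings) - abs(unname(pca$x))))
```

A truncated solver such as irlba is intended for extracting a small number of components from a large matrix, which is why it warns when the requested number of components is a large share of the total.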

split Intro markdown

Split Introduction to scRNA-seq markdown (including replicates from DGE too) AND present before Sarah.

Have the raw counts to matrix content to be presented after Sarah

take out reticulate from install instructions

if mention of it in the lessons - remove it.

our new UMAP install instructions do not require this library.

To our README add a link to the UMAP installation markdown - students will need this for the workshop.

Update packages needed for installation

Some packages are no longer needed for the current analysis workflow: Matrix.utils, devtools, AnnotationHub, ensembldb. They could be removed from pre-work installation instruction. Note that AnnotationHub and ensembldb are still needed if people want to generate annotation file themselves.

How to split Seurat object according to sample or condition

Thank you so much for sharing; it has benefited me a lot.
I have 10 samples in 2 conditions, with 5 samples per condition. When integrating the data for analysis, how should I split the object?
Should I split by condition or by the 10 samples in Seurat?
data.list <- SplitObject(data, split.by = "sample") or by condition.
Looking forward to your reply.
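
`SplitObject()` produces one object per unique value of the `split.by` column, so the choice determines how many datasets enter integration. The sketch below illustrates only that bookkeeping, using base R's `split()` on an invented metadata table for 10 samples in 2 conditions. The column names `sample` and `condition` are assumptions, and the sketch does not settle which grouping is appropriate for a given experiment.

```r
# Invented metadata: 10 samples, 5 per condition, 3 cells per sample
metadata <- data.frame(
  cell = paste0("cell", 1:30),
  sample = rep(paste0("S", 1:10), each = 3),
  condition = rep(c("ctrl", "treated"), each = 15)
)

# Splitting by sample gives one group per sample
by_sample <- split(metadata, metadata$sample)

# Splitting by condition pools the samples within each condition
by_condition <- split(metadata, metadata$condition)

length(by_sample)
length(by_condition)

# Sanity check: number of distinct samples within each condition
samples_per_condition <- tapply(metadata$sample, metadata$condition,
                                function(s) length(unique(s)))
```

Splitting by sample treats each sample as its own dataset during integration, whereas splitting by condition only aligns the two conditions and leaves any sample-to-sample differences within a condition uncorrected.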
