hbctraining / scrna-seq

Home Page: https://hbctraining.github.io/scRNA-seq/

scrna-seq's Introduction

THIS REPO IS ARCHIVED, PLEASE GO TO https://hbctraining.github.io/main FOR CURRENT LESSONS.

Single-cell RNA-seq analysis workshop

Audience	Computational skills required	Duration
Biologists	Introduction to R	2-day workshop (~10 hours of trainer-led time)

Description

This repository has teaching materials for a 2-day, hands-on Introduction to single-cell RNA-seq analysis workshop. The workshop instructs participants on how to design a single-cell RNA-seq experiment and how to efficiently manage and analyze the data starting from count matrices, with a focus on using the Seurat package in R/RStudio. Working knowledge of R, or completion of the Introduction to R workshop, is required.

Learning Objectives

Understand the considerations when designing a single-cell RNA-seq experiment
Discuss the steps involved in taking raw single-cell RNA-sequencing data and generating a count (gene expression) matrix
Compute and assess QC metrics at every step in the workflow
Cluster cells based on expression data and derive the identity of the different cell types present
Perform integration of different sample conditions

These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

Lessons

Click here for links to lessons and proposed schedule

Installation Requirements

Applications

Download the most recent versions of R and RStudio for your laptop:

R (version 3.6.0 or above)
RStudio

Packages for R

Note 1: Install the packages in the order listed below.

Note 2: When installing the following packages, if you are asked to select (a/s/n) or (y/n), please select “a” or "y" as applicable.

Note 3: All the package names listed below are case sensitive!

(1) Install the 8 packages listed below from CRAN using the install.packages() function.

tidyverse
Matrix
RCurl
scales
cowplot
devtools
BiocManager
Seurat**

Please install them one-by-one as follows:

install.packages("tidyverse")
install.packages("Matrix")
install.packages("RCurl")
& so on ...

** If you have trouble installing Seurat, please install multtest using the following lines of code, then try installing Seurat again:

install.packages("BiocManager")

BiocManager::install("multtest")

(2) Install the 3 packages listed below from Bioconductor using the BiocManager::install() function.

SingleCellExperiment
AnnotationHub
ensembldb

Please install them one-by-one as follows:

BiocManager::install("SingleCellExperiment")
BiocManager::install("AnnotationHub")
& so on ...

(3) Finally, please check that all the packages were installed successfully by loading them one at a time using the library() function.

library(Seurat)
library(tidyverse)
library(Matrix)
library(RCurl)
library(scales)
library(cowplot)
library(SingleCellExperiment)
library(AnnotationHub)
library(ensembldb)
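
A compact way to run the same check is sketched below in base R. The helper `check_installed()` is a hypothetical name, not part of the workshop materials; it relies only on `requireNamespace()` and reports which of the listed packages cannot be found, without attaching them.

```r
# Packages listed in the installation steps above
required_pkgs <- c("Seurat", "tidyverse", "Matrix", "RCurl", "scales",
                   "cowplot", "SingleCellExperiment", "AnnotationHub",
                   "ensembldb")

# Hypothetical helper: TRUE for each package that can be loaded, FALSE otherwise
check_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_installed(required_pkgs)
missing_pkgs <- names(status)[!status]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed with install.packages() or BiocManager::install(), as described in steps (1) and (2).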

(4) Once all packages have been loaded, run sessionInfo().

sessionInfo()

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

scrna-seq's Issues

add a link to QC to show bad data

Maybe from the NGS single cell lessons - link to a markdown which shows what bad data would look like

update integration lesson with notes from Satija lecture

AND an overview of when to use integration

8 pooled samples versus 1 sample?

Download BAM from SRA and check the paper.

We are not sure if there are only 2 BAMS, because the data was pooled (as described in the study) or if individual samples were also supplied to SRA.

Update the setup markdown to reflect this

base decision tree

Maybe one for experimental design (expanding on what Sarah has)

and one for the analysis: QC, integration, marker id, differential expression, traj analysis

split the clustering markdown

this is the first lesson (not exploration). May not need to split if using sctransform (reassess as we develop!)

Split at "Determining PCs" sections

troubleshooting slide deck

slide deck to wrap-up like we do with RNA-seq/ChIP-seq

compress R objects for faster download

use readMM() to convert standard matrix into a sparse matrix

To create the count data object, you use the readMM() function from the Matrix package to turn a standard matrix into a sparse matrix. However, the data I downloaded is not in the standard format. Where can I download the standard data, with zeros included?
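
For context, `readMM()` reads a file in Matrix Market format, which stores only the non-zero entries, so a download without explicit zeros may already be in the form the function expects. The sketch below shows the round trip on a made-up 3 x 4 count matrix written to a temporary file; the values, gene and cell names, and file path are assumptions for illustration only.

```r
library(Matrix)

# Toy dense count matrix (made-up values): 3 genes x 4 cells, mostly zeros
dense_counts <- matrix(c(0, 2, 0,
                         1, 0, 0,
                         0, 0, 5,
                         0, 3, 0),
                       nrow = 3,
                       dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Convert the dense matrix to a sparse representation
sparse_counts <- Matrix(dense_counts, sparse = TRUE)

# Write it in Matrix Market format; only non-zero entries are stored
mtx_file <- tempfile(fileext = ".mtx")
writeMM(sparse_counts, file = mtx_file)

# readMM() returns a sparse matrix; row and column names are not kept in the
# file, so they are re-attached here
counts_from_file <- as(readMM(mtx_file), "CsparseMatrix")
dimnames(counts_from_file) <- dimnames(dense_counts)

# The zeros are implicit, but they reappear when the matrix is expanded
all(as.matrix(counts_from_file) == dense_counts)
```

In a real dataset the gene and cell names typically come from separate files that accompany the matrix file.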

Add “label=T” to the plots for FeaturePlot()

so we have clusters labeled when presenting

Error in CellsByIdentities(object = object, cells = cells)

Hi,

I got an error while running the following code

# Filter out low quality reads using selected thresholds - these will change with experiment
filtered_seurat <- subset(x = merged_seurat, 
                         subset= (nUMI >= 500) & 
                           (nGene >= 250) & 
                           (log10GenesPerUMI > 0.80) & 
                           (mitoRatio < 0.20))

The error:
Error in CellsByIdentities(object = object, cells = cells) :
Cannot find cells provided

Could it be that the meta data with renamed variables are causing the error?
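
One way to narrow this down, sketched in base R on a made-up metadata table, is to confirm that every column used in `subset()` exists under the expected name and to count how many cells pass each threshold. The cell names and values are invented; `required_cols`, `pass`, and `keep` are hypothetical names; and `log10GenesPerUMI` is computed here as `log10(nGene) / log10(nUMI)`, which is an assumption about how that column was derived.

```r
# Made-up per-cell metadata using the column names from the filter above
metadata <- data.frame(
  nUMI = c(1200, 300, 5000, 800),
  nGene = c(600, 150, 2000, 240),
  mitoRatio = c(0.05, 0.10, 0.30, 0.02),
  row.names = paste0("cell", 1:4)
)
# Assumed definition of the complexity score
metadata$log10GenesPerUMI <- log10(metadata$nGene) / log10(metadata$nUMI)

# 1. Are all columns used by the filter present under these exact names?
required_cols <- c("nUMI", "nGene", "log10GenesPerUMI", "mitoRatio")
setdiff(required_cols, colnames(metadata))  # character(0) if none are missing

# 2. How many cells pass each threshold, and which cells pass all of them?
pass <- with(metadata, cbind(
  nUMI = nUMI >= 500,
  nGene = nGene >= 250,
  complexity = log10GenesPerUMI > 0.80,
  mito = mitoRatio < 0.20
))
colSums(pass)
keep <- rownames(metadata)[rowSums(pass) == ncol(pass)]
keep
```

With a Seurat object, the same checks would be run on `merged_seurat@meta.data`; it may also be worth confirming that its row names match `colnames(merged_seurat)`.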

Check option to exit survey for 'very satisfied'

Integration pipeline led to background expression on TSNE

Hello,

I have been following some of the tutorial provided by hbc training specifically on integrating different datasets: https://hbctraining.github.io/scRNA-seq/lessons/06_SC_SCT_and_integration.html

I believe I have encountered a slight issue. I followed much of the code that was given on the page; I had all of my samples in one Seurat object, split them, and then performed SCTransform on EACH SEPARATELY (NOTE: I didn't regress out cell cycle):

split_srt <- SplitObject(sample.merge, split.by = "Sample.Name")

for (i in 1:length(split_srt)) {
  split_srt[[i]] <- NormalizeData(split_srt[[i]], verbose = TRUE)
  split_srt[[i]] <- SCTransform(split_srt[[i]], vars.to.regress = c("percent.MT"))
}

I then performed the suggested integration steps:

integ_features <- SelectIntegrationFeatures(object.list = split_srt,
                                            nfeatures = 3000)

split_srt <- PrepSCTIntegration(object.list = split_srt,
                                anchor.features = integ_features)

integ_anchors <- FindIntegrationAnchors(object.list = split_srt,
                                        normalization.method = "SCT",
                                        anchor.features = integ_features)

seurat_integrated <- IntegrateData(anchorset = integ_anchors,
                                   normalization.method = "SCT")

Running PCA and t-SNE yielded a dimensionality reduction that looked quite integrated:

But the issue is that when I try to find marker genes, it appears that expression of most genes is seen as background; i.e., there are no white dots on a feature plot:

seurat_integrated <- FindNeighbors(seurat_integrated,dims = 1:30)
seurat_integrated <- FindClusters(seurat_integrated, resolution = 0.5)

Merged.markers <- FindAllMarkers(seurat_integrated, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

plotting some of these markers

I am slightly unsure what I have done wrong/if I missed some steps. I would greatly appreciate any help I get.

update the answer key for QC

need a new FindMarkers lesson

this new lesson will incorporate content from single sample marker identification + integration marker identification.

Since we will be using integrated data, we will use FindConservedMarkers and run it on all clusters. For clusters that have few cells per group this will fail, giving us a chance to introduce and run FindAllMarkers as well.

Having two 2-day workshops

Add cellranger and the process of getting counts as a separate 2-day workshop (kind of like how we have Intro to RNA-seq + DGE). Include Intro plus experimental design.

The second workshop would be R-based, starting with counts. Cellranger is not a prerequisite for this second workshop. Include some points from Intro here too.

Accessing raw data after normalization

we should add a markdown? or some text about how we can specify whether to use raw counts or normalized. The following code will allow us to extract the raw counts after normalization:

RNA_raw_assay <- seurat_integrated@assays$RNA@counts

seurat_integrated[['RNA_raw']] <- CreateAssayObject(counts = RNA_raw_assay)

RNA_norm_assay <- seurat_integrated@assays$RNA@data

seurat_integrated[['RNA_norm']] <- CreateAssayObject(data = RNA_norm_assay)

DefaultAssay(seurat_integrated) <- "RNA_norm"

how many cells do we expect?

QC set-up: add unknown # of cells expected - look into paper and see if we can find this number

QCMetrics error in DGE lesson

Zhu got the error: 'calculateQCMetrics' is defunct. Use 'perCellQCMetrics' instead. See help("Defunct")

She tried perCellQCMetrics but it is not the same as calculateQCMetrics - need to look into this error (especially using R 4.0)

updating teaching laptop for UMAP

@mistrm82 did this - but double check to make sure it is using Anaconda and NOT requiring reticulate

start the analysis with integration

After integration, it feels a bit redundant. Maybe only go through with integration using sctransform.

Since we have two samples, start with integration of the two samples. Have a section that describes a single sample scenario (and the differences)

Error in calculating proportion of reads mapping to mitochondrial transcripts

https://github.com/hbctraining/scRNA-seq/blob/master/lessons/mitoRatio.md
I ran into a problem when I did the last step.
metadata$mtUMI <- Matrix::colSums(counts[which(rownames(counts) %in% mt),], na.rm = T)
Error in evaluating the argument 'x' in selecting a method for function 'colSums': object of type 'closure' is not subsettable

I am not familiar with the R language, and I have tried many ways to solve it. Hope someone can help me. Thank you.
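
For reference, the message `object of type 'closure' is not subsettable` generally means that the name being indexed (here `counts`) currently refers to a function rather than a matrix, for example when no matrix called `counts` has been created in the session and a loaded package provides a function with that name. The sketch below runs the same calculation on a toy sparse matrix; the gene names, values, and the `mt` vector are made up for illustration and are not taken from the lesson.

```r
library(Matrix)

# Toy counts: 4 genes x 3 cells (made-up values)
counts <- Matrix(
  matrix(c(2, 1, 10, 7,
           0, 0, 5, 5,
           4, 1, 3, 2),
         nrow = 4,
         dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                         paste0("cell", 1:3))),
  sparse = TRUE
)

# Confirm that `counts` is a matrix-like object and not a function
stopifnot(!is.function(counts))

# Hypothetical vector of mitochondrial gene names
mt <- c("MT-CO1", "MT-ND1")

metadata <- data.frame(row.names = colnames(counts))
metadata$nUMI <- Matrix::colSums(counts)

# drop = FALSE keeps a matrix even if only one mitochondrial gene matches
mt_rows <- which(rownames(counts) %in% mt)
metadata$mtUMI <- Matrix::colSums(counts[mt_rows, , drop = FALSE], na.rm = TRUE)
metadata$mitoRatio <- metadata$mtUMI / metadata$nUMI
metadata
```

Running `class(counts)` before the `colSums()` step is a quick way to see whether the object is the expected matrix.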

finalized decision tree

QC lesson: change the seurat object name?

In QC lesson, merged unprocessed seurat object was saved as raw_seurat.RData. Filtered seurat object was saved as seurat_raw.RData. The names are confusing for students.

Warning messages in RunPCA()

#1. Warning in irlba(A = t(x = object), nv = npcs, ...) :
You're computing too large a percentage of total singular values, use a standard svd instead.

#2. In PrepDR(object = object, features = features, verbose = verbose) :
The following 15 features requested have not been scaled (running reduction without them): RAD51, CDC45, E2F8, DTL, EXO1, UHRF1, ANLN, GTSE1, NEK2, HJURP, DLGAP5, PIMREG, KIF2C, CDC25C, CKAP2L

Include as note, with how to run with svd instead

split Intro markdown

Split Introduction to scRNA-seq markdown (including replicates from DGE too) AND present before Sarah.

Have the raw counts to matrix content to be presented after Sarah

update images in marker identification

https://github.com/hbctraining/scRNA-seq/blob/master/lessons/09_merged_SC_marker_identification.md

Note on sc nuclear RNA-seq

Talk to Rory and Sarah

Remove all TSNE plots

For clustering and marker id

update the elbow plot

the elbow plot markdown (quantitative approach) needs a figure update

take out reticulate from install instructions

if mention of it in the lessons - remove it.

our new UMAP install instructions do not require this library.

To our README add a link to the UMAP installation markdown - students will need this for the workshop.

Update packages needed for installation

Some packages are no longer needed for the current analysis workflow: Matrix.utils, devtools, AnnotationHub, ensembldb. They could be removed from pre-work installation instruction. Note that AnnotationHub and ensembldb are still needed if people want to generate annotation file themselves.

How to split Seurat object according to sample or condition

Thank you so much for sharing; it has benefited me a lot.
I have 10 samples in 2 conditions; each condition has 5 samples. When integrating the data for analysis, how should I split the object?
Should I split by condition or by the 10 samples in Seurat?
data.list <- SplitObject(data, split.by = "sample") or by condition.
Looking forward to your reply.

If pooled, separate the cells into the 8 samples

Find out from the paper if they have a list of barcodes to identify which cell came from which sample (of the 8)