hbctraining / dge_workshop_salmon

Home Page: https://hbctraining.github.io/DGE_workshop_salmon/

HTML 79.88% R 19.00% SCSS 1.12%

dge_workshop_salmon's Issues

reg exp for extracting salmon files

For the code "samples <- list.files(path = "./data", full.names = T, pattern="\\.salmon$")", it also works when using pattern="salmon$". What is the purpose of the '\\.' before it?
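A minimal sketch of the difference, using made-up file names rather than the workshop data: in a regular expression an escaped dot matches only a literal ".", so the escaped pattern requires the name to end in the extension ".salmon", whereas the shorter pattern matches anything that merely ends in the letters "salmon". In an R string the backslash itself has to be doubled.

```r
# Hypothetical file names, for illustration only
file_names <- c("ctrl1.salmon", "ctrl2_salmon", "notes_about_salmon", "ctrl1.salmon.log")

# Unescaped: matches any name ending in the letters "salmon"
loose_match <- grepl("salmon$", file_names)

# Escaped dot: matches only names ending in the literal extension ".salmon"
strict_match <- grepl("\\.salmon$", file_names)

data.frame(file_names, loose_match, strict_match)
```

With the workshop directory both patterns presumably return the same files, which is why the shorter one appears to work; the escaped version just guards against stray names.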

changing to AnnotationHub

The problem we encountered when trying to change to AnnotationHub is the one-to-many mapping of Ensembl IDs to Entrez IDs, which is stored as a list column. Here is some code that will work if we choose to make the change. If we do, it would be worth exploring the differences between these annotations and Ensv86 using AnnotationDbi.

library(AnnotationHub)
library(ensembldb)
library(dplyr)   # %>%, select(), filter()
library(purrr)   # map()

# Connect to AnnotationHub
ah <- AnnotationHub()

# Query AnnotationHub
human_ens <- query(ah, c("Homo sapiens", "EnsDb"))

# Extract annotations of interest
human_ens <- human_ens[["AH64923"]]

# Extract gene-level information
genes(human_ens, return.type = "data.frame") %>% View()

# Create a gene-level dataframe (FOR LESSON)
annotations_ahb <- genes(human_ens, return.type = "data.frame")  %>%
  dplyr::select(gene_id, symbol, entrezid, gene_biotype) %>% 
  dplyr::filter(gene_id %in% res_tableOE_tb$gene)

# Wait a second, we don't have one-to-one mappings!
class(annotations_ahb$entrezid)
which(map(annotations_ahb$entrezid, length) > 1)

# So which one is right? And why do we have this problem?

# Okay let's just keep the first entrezID in the case that there are two mappings
annotations_ahb$entrezid <- map(annotations_ahb$entrezid, 1) %>% unlist()

# Determine the indices for the non-duplicated genes
non_duplicates_idx <- which(duplicated(annotations_ahb$symbol) == FALSE)

# Return only the non-duplicated genes using indices
annotations_ahb <- annotations_ahb[non_duplicates_idx, ]
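To make the one-to-many problem concrete, here is a small base R sketch on a toy data frame. The gene IDs, symbols and Entrez IDs are invented for illustration and do not come from AnnotationHub; the sketch only mirrors the steps above (find multi-mapped rows, keep the first Entrez ID, drop duplicated symbols).

```r
# Toy stand-in for the genes() output; every ID below is made up
toy_annotations <- data.frame(
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  symbol  = c("GENE1", "GENE2", "GENE2"),
  stringsAsFactors = FALSE
)

# entrezid is a list column: one Ensembl ID may map to several Entrez IDs
toy_annotations$entrezid <- list(c(101L, 102L), 201L, NA_integer_)

# Which rows carry more than one Entrez ID?
multi_idx <- which(lengths(toy_annotations$entrezid) > 1)

# Keep only the first Entrez ID per gene, as a plain integer vector
toy_annotations$entrezid <- vapply(
  toy_annotations$entrezid,
  function(ids) ids[1],
  integer(1)
)

# Drop rows whose symbol has already been seen
toy_annotations <- toy_annotations[!duplicated(toy_annotations$symbol), ]
toy_annotations
```

Keeping the first ID is an arbitrary choice, as the comments above admit; it silently discards the other mappings.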

modify README

use this as a template and update the link to lessons - https://github.com/hbctraining/Intro-to-ChIPseq/blob/master/README.md

It is okay if there is not an actual schedule, as long as each lesson has a time associated with it.

Dispersion - consolidate some of the descriptions (MP)

shorten the overview lesson (01)

taking some guidance from what we have in training modules: https://github.com/hbctraining/Training-modules/blob/master/planning_successful_rnaseq/lessons/count_modeling.md

update FA lesson for SPIA

biocLite --> BiocManager

Add figure for independent filtering

Discussion of possibly shortening Annotation session in the workshop

mean-variance plot

For the mean/variance count plot, would it be clearer to set the x and y axes to the same range, so that the red line sits at 45 degrees?
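One way this could look, sketched in base R graphics on simulated counts (the lesson itself uses its own data and plotting code, so the matrix, gene count and dispersion below are all assumptions): give both axes the same limits and a square plotting region, so the mean = variance line runs corner to corner.

```r
set.seed(42)  # reproducible simulated counts

# Hypothetical count matrix: 500 genes x 6 replicates, negative binomial
gene_means <- 2^runif(500, min = 1, max = 12)
counts <- matrix(
  rnbinom(500 * 6, mu = rep(gene_means, times = 6), size = 5),
  nrow = 500
)

mean_counts <- rowMeans(counts)
variance_counts <- apply(counts, 1, var)

# Log axes cannot show zeros, so drop genes with zero mean or variance
keep <- mean_counts > 0 & variance_counts > 0

# Shared limits for both axes
axis_limits <- range(c(mean_counts[keep], variance_counts[keep]))

par(pty = "s")  # square plotting region
plot(
  mean_counts[keep], variance_counts[keep],
  log = "xy",
  xlim = axis_limits, ylim = axis_limits,
  xlab = "Mean counts per gene", ylab = "Variance per gene"
)
abline(a = 0, b = 1, col = "red")  # mean = variance, drawn in log coordinates
```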

WGCNA package link not working

WGCNA package link in functional analysis lesson.

Extract annotations of interest: more explanation

Problems arise when using the genes() function

Reduce GO and add table of databases

See table in slide 4 here - https://hbctraining.github.io/Training-modules/planning_successful_rnaseq/slides/functional_analysis_mp.pdf

Consider reducing GO content and making the intro to FA more about databases.

Change log2FC shrinkage to apeglm following advice from Mike

make this a 1.5 day workshop?

AnnotationHub() note

Explain what will happen if answering yes/no to "AnnotationHub does not exist, create directory?"

add an instance of using save() and load()

Might be nice to have them save an object - e.g. save the ego object.
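A minimal sketch of what such an instance could look like. The object here is a toy data frame standing in for the real ego object from the functional analysis lesson, and the file is written to a temporary directory; both are assumptions for illustration.

```r
# Hypothetical stand-in for the enrichment result; the real ego object
# would come from the functional analysis lesson
ego_toy <- data.frame(
  ID = c("TERM_A", "TERM_B"),
  p.adjust = c(0.01, 0.04)
)

# Save the object to an .RData file (temporary location for this sketch)
rdata_path <- file.path(tempdir(), "ego_toy.RData")
save(ego_toy, file = rdata_path)

# Remove it from the environment to show that load() brings it back
rm(ego_toy)

# load() restores the object under its original name and returns that name
loaded_names <- load(rdata_path)
loaded_names
ego_toy
```

Worth pointing out in the lesson: load() restores objects under the names they were saved with, which can overwrite existing objects of the same name.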

FA - assess what impact changing AnnotationDbi to AnnotationHub

Shorten GO part of functional analysis and some of the code (MP)

Reorder Wald test lesson

Discuss statistical model ->

Discuss the output from the model being the log2 fold changes with standard error estimates ->

Wald test ->

P values / multiple test correction ->

Log2 shrinkage
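For the middle of that sequence, a small worked example may help show how the pieces connect. The numbers are invented, and this is only the textbook arithmetic (estimate divided by standard error, two-sided normal p-value, Benjamini-Hochberg adjustment), not a reproduction of everything DESeq2 does internally.

```r
# Made-up log2 fold changes and standard errors for five hypothetical genes
log2_fold_change <- c(2.1, -0.3, 1.2, -1.8, 0.05)
lfc_se <- c(0.4, 0.5, 0.6, 0.3, 0.2)

# Wald statistic: the estimate divided by its standard error
wald_stat <- log2_fold_change / lfc_se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(wald_stat))

# Multiple test correction with the Benjamini-Hochberg method
padj <- p.adjust(p_value, method = "BH")

data.frame(log2_fold_change, lfc_se, wald_stat, p_value, padj)
```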

adjust timings for Wald test and R refresher

update a markdown report link

In workflow summarization there is a broken link. We need to update with an actual RMarkdown report

sleuth pca function plots pca on non-log transformed counts

code below to use log transformed values

`# Extract data from object
norm_counts <- sleuth_to_matrix(de, "obs_norm", "est_counts")
log_norm_counts <- de$transform_fun_counts(norm_counts)

# Compute PCs
pc <- prcomp(t(log_norm_counts))
plot_pca <- data.frame(pc$x, summarydata)

# Plot with sample names used as data points
ggplot(plot_pca, aes(PC1, PC2)) +
  theme_bw() +
  geom_point(aes(color = genotype)) +
  xlab('PC1') +
  ylab('PC2') +
  scale_x_continuous(expand = c(0.3, 0.3)) +
  # geom_text_repel(aes(x = PC1, y = PC2), label = name) +
  theme(plot.title = element_text(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        axis.text = element_text(size = rel(1.25)))`

An explanation for why the AnnotationDbi has so many more genes

The OrgDb reduces the gene count to almost half (~25K), whereas the EnsDb (albeit an older release) retains ~50K. Add an explanation and look into the differences.

Run through functional analysis

Code was modified to use the AnnotationHub dataframe from the annotations lesson. Need someone to run through it and make sure it works.

I have done so once and put figures here: Dropbox (Harvard University)/HBC Team Folder (1)/Teaching/Courses/DGE_salmon/

Shorten the annotations lesson by not including AnnotationDbi

change the wording of refresher boxplot question

"Plot a boxplot of the mean expression of Myc for the KO and WT samples using theme_minimal() and give the plot new axes names and a centered title."

It's confusing because we don't need them to compute a mean

Arguments and functions each on own line and comment all code chunks

change back to vst from rld?

change linked file in FA lesson

Currently an .RData object is linked for the annotation df. It contains many more objects than we need, so we should replace it with a CSV file of the annotations.

set.seed()

For any computation that uses permutations or random sampling, we should call set.seed() (or at least demo it) so that we get the same results each time.
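A short demo along these lines could be dropped into a lesson. The seed value and the sampling call are arbitrary choices for illustration.

```r
# Same seed, same call: the draws are identical
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)

# Without resetting the seed, the next draw continues the random stream
# and will generally differ from the first
unseeded_draw <- sample(1:100, 5)
```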

Apply to map?

Change instances of apply() to purrr::map() for consistency in the 01 lesson.

Include a note about detach() throwing an error if dplyr is not attached
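A possible shape for that note, sketched in base R: detach() fails when the named package is not on the search path, so checking search() first avoids the error. The made-up package name below is only there to trigger the error reliably.

```r
# Detaching something that is not attached raises an error;
# the package name here is deliberately made up
detach_result <- tryCatch(
  {
    detach("package:notARealPackage123", character.only = TRUE)
    "detached"
  },
  error = function(e) "error"
)
detach_result

# Guarded version: only detach dplyr if it is currently attached
pkg <- "package:dplyr"
if (pkg %in% search()) {
  detach(pkg, character.only = TRUE)
} else {
  message("dplyr is not attached, nothing to detach")
}
```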

update likelihood ratio definition

Good resource found by Jihe:

https://slideplayer.com/slide/12369265/73/images/25/Likelihood+Ratio+Test+vs.+Wald+Test.jpg

Add a link in QC lesson to single cell materials

change sleuth to mike's new package ?

https://f1000research.com/articles/7-952/v1

Need to look into this!

change the scales of mean versus variance plot

so red line goes through the middle - thanks @jihe-liu

Wald test lesson (05_ - Meeta)

Will modify the logFC section: remove the equation, remove the notes about older DESeq2, and remove the NOTE about LRT. We are teaching it in the lessons.

Remove design <- ~ age + treat_sex part of the design formula (MP)

apply -> map

Convert all uses of apply() to map().
I have only noticed it here so far: https://github.com/hbctraining/DGE_workshop_salmon/blob/master/lessons/01_DGE_setup_and_overview.md#how-do-i-know-if-my-data-should-be-modeled-using-the-poisson-distribution-or-negative-binomial-distribution

reduce viz lesson

Remove the top20 plot and clean up code(?).

Update workflow image with log2FC shrinkage after DE testing

do we set lfcThreshold in results or lfcShrink or both?

question from the students, should we add this as a note?

https://support.bioconductor.org/p/110307/

coef vs contrast?

https://support.bioconductor.org/p/98833/#98837

See this post about using one or the other.

Also see the vignette for more info.

PCA link not working

In "Details regarding PCA are given below (based on materials from StatQuest, and if you would like a more thorough description, we encourage you to explore StatQuest's video and our longer lesson)", the last link, for "our longer lesson", is not working.

# Write the counts to an object
data <- txi$counts %>% 
  ....

instead of

# Write the counts to file
data <- txi$counts %>% 
  ....

clusterProfiler arguments

Change to ont = "ALL" instead of just "BP" when running clusterProfiler and creating the ego object.
