dge_workshop_salmon's People

Contributors

hackdna, jihe-liu, kant, marypiper, mistrm82, rkhetani

dge_workshop_salmon's Issues

reg exp for extracting salmon files

For the code "samples <- list.files(path = "./data", full.names = TRUE, pattern = "\\.salmon$")", the call also works when using pattern = "salmon$". What is the purpose of the '\\.' before it?
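A minimal sketch of the difference, using made-up file names rather than the workshop data. The pattern argument of list.files() is a regular expression, and in an R string the backslash has to be doubled, so the escaped dot is written "\\.". It matches only a literal ".", so the first pattern keeps names that end in the extension ".salmon", while "salmon$" keeps any name that merely ends in the letters "salmon".

```r
# Hypothetical file names, for illustration only
files <- c("sampleA.salmon", "wt_salmon", "notes_about_salmon", "sampleA.salmon.log")

# "\\." is a literal dot; "$" anchors the match at the end of the name
with_dot <- grepl("\\.salmon$", files)

# Without the escaped dot, anything ending in "salmon" matches
without_dot <- grepl("salmon$", files)

files[with_dot]     # only the name with the ".salmon" extension
files[without_dot]  # every name that ends in "salmon"
```

If every entry in ./data that ends in "salmon" also has the ".salmon" extension, both patterns return the same result, which would explain why the shorter one appears to work.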

changing to AnnotationHub

The problem we encountered when trying to change to AnnotationHub is that Ensembl IDs map one-to-many to Entrez IDs, and the Entrez IDs are stored as a list. Here is some code that will work if we choose to change. If we do, it would be worth exploring the differences between these annotations and Ensv86 using AnnotationDbi.

library(dplyr)  # provides %>%, select() and filter()
library(purrr)  # provides map()
library(AnnotationHub)
library(ensembldb)

# Connect to AnnotationHub
ah <- AnnotationHub()

# Query AnnotationHub
human_ens <- query(ah, c("Homo sapiens", "EnsDb"))

# Extract annotations of interest
human_ens <- human_ens[["AH64923"]]

# Extract gene-level information
genes(human_ens, return.type = "data.frame") %>% View()

# Create a gene-level dataframe (FOR LESSON)
annotations_ahb <- genes(human_ens, return.type = "data.frame")  %>%
  dplyr::select(gene_id, symbol, entrezid, gene_biotype) %>% 
  dplyr::filter(gene_id %in% res_tableOE_tb$gene)

# Wait a second, we don't have one-to-one mappings!
class(annotations_ahb$entrezid)
which(map_int(annotations_ahb$entrezid, length) > 1)

# So which one is right? And why do we have this problem?

# Okay, let's just keep the first Entrez ID in the case that there are two mappings
annotations_ahb$entrezid <- map(annotations_ahb$entrezid, 1) %>% unlist()

# Determine the indices for the non-duplicated genes
non_duplicates_idx <- which(!duplicated(annotations_ahb$symbol))

# Return only the non-duplicated genes using indices
annotations_ahb <- annotations_ahb[non_duplicates_idx, ]

mean-variance plot

For the mean/variance count plot, would it be clearer to set the x and y axes to the same range, so that the red line sits at 45 degrees?
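One way the suggestion could look, sketched with simulated counts and base R graphics rather than the lesson's own plotting code. The count matrix, the object names and the seed are all assumptions made for illustration. The idea is to compute one range from both the means and the variances, pass it to both axes, and use a square plotting region so that the identity line runs along the diagonal.

```r
set.seed(42)  # simulated data, so fix the seed for reproducibility

# Simulated count matrix: 200 hypothetical genes x 3 replicates
mu <- rep(10^seq(0.5, 4, length.out = 200), times = 3)
counts <- matrix(rnbinom(600, mu = mu, size = 5), nrow = 200)

mean_counts <- rowMeans(counts)
variance_counts <- apply(counts, 1, var)

# Drop genes that cannot be drawn on a log scale
keep <- mean_counts > 0 & variance_counts > 0

# One shared range for both axes
lims <- range(mean_counts[keep], variance_counts[keep])

op <- par(pty = "s")  # square plotting region
plot(mean_counts[keep], variance_counts[keep], log = "xy",
     xlim = lims, ylim = lims,
     xlab = "Mean counts per gene", ylab = "Variance per gene")
abline(a = 0, b = 1, col = "red")  # identity line: variance equal to mean
par(op)
```

With identical limits and a square region the red line is at 45 degrees, so points above it are immediately readable as genes whose variance exceeds their mean.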

AnnotationHub() note

Explain what happens when the user answers yes or no to the prompt "AnnotationHub does not exist, create directory?"

Reorder Wald test lesson

Discuss statistical model ->

Discuss the output from the model being the log2 fold changes with standard error estimates ->

Wald test ->

P values / multiple test correction ->

Log2 shrinkage
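The middle steps of this ordering can be illustrated with a few lines of base R. This is only a sketch with made-up log2 fold changes and standard errors, not DESeq2 code, and it leaves out the model fit and the shrinkage step. It shows how the estimate and its standard error give the Wald statistic, how that gives a p-value, and how the p-values are then adjusted.

```r
# Hypothetical log2 fold changes and standard errors for five genes
lfc <- c(2.1, -0.3, 1.2, -1.8, 0.05)
lfc_se <- c(0.4, 0.5, 0.3, 0.6, 0.2)

# Wald statistic: the estimate divided by its standard error
wald_stat <- lfc / lfc_se

# Two-sided p-value from the standard normal distribution
pvalue <- 2 * pnorm(abs(wald_stat), lower.tail = FALSE)

# Benjamini-Hochberg correction for multiple testing
padj <- p.adjust(pvalue, method = "BH")

data.frame(lfc, lfc_se, wald_stat, pvalue, padj)
```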

sleuth pca function plots pca on non-log transformed counts

The code below uses log-transformed values instead.

# Extract data from object
norm_counts <- sleuth_to_matrix(de, "obs_norm", "est_counts")
log_norm_counts <- de$transform_fun_counts(norm_counts)

# Compute PCs
pc <- prcomp(t(log_norm_counts))
plot_pca <- data.frame(pc$x, summarydata)

# Plot with sample names used as data points
ggplot(plot_pca, aes(PC1, PC2)) +
  theme_bw() +
  geom_point(aes(color = genotype)) +
  xlab('PC1') +
  ylab('PC2') +
  scale_x_continuous(expand = c(0.3, 0.3)) +
  # geom_text_repel(aes(x=PC1, y=PC2), label=name) +
  theme(plot.title = element_text(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        axis.text = element_text(size = rel(1.25)))

Run through functional analysis

Code was modified to use the AnnotationHub dataframe from the annotations lesson. Need someone to run through it and make sure it works.

I have done so once and put figures here: Dropbox (Harvard University)/HBC Team Folder (1)/Teaching/Courses/DGE_salmon/

change the wording of refresher boxplot question

"Plot a boxplot of the mean expression of Myc for the KO and WT samples using theme_minimal() and give the plot new axes names and a centered title."

The wording is confusing because participants don't need to compute a mean.

change linked file in FA lesson

Currently an .RData object is linked for the annotation df. It contains many more objects than we need, so we should replace it with a CSV file of the annotations.

set.seed()

For any computation that uses permutations or random sampling, we should call set.seed() (or demonstrate it) so that we get the same results each time.
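A small demonstration that could accompany this, using base R only. The seed value 123 and the sampling call are arbitrary choices for illustration.

```r
# Setting the same seed before each call reproduces the same random draw
set.seed(123)
first <- sample(1:100, 5)

set.seed(123)
second <- sample(1:100, 5)

identical(first, second)  # the two draws match

# Without resetting the seed, the generator simply continues its sequence
third <- sample(1:100, 5)
```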

Apply to map?

Change instances of apply() to purrr::map() for consistency in the 01 lesson.

Wald test lesson (05_ - Meeta)

Will modify the logFC section, remove the equation, remove the notes about older DESeq2, and remove the NOTE about LRT. We are teaching it in the lessons.

PCA link not working

In "Details regarding PCA are given below (based on materials from StatQuest, and if you would like a more thorough description, we encourage you to explore StatQuest's video and our longer lesson", the last link for "our longer lesson" is not working.
