carpentries-incubator / bioc-rnaseq
Analysis and Interpretation of Bulk RNA-Seq Data using Bioconductor
Home Page: https://carpentries-incubator.github.io/bioc-rnaseq
License: Other
I think emphasis could also be put on how to use visualization tools for RNA-Seq data and analysis output. I am thinking in particular of iSEE or GeneTonic.
For consistency with the suggestions made in episode 2, the assembled SE object in episode 3 should go in output/ rather than in data/.
I added these extras to episode 4 at the BioC2023 workshop, and they should be added to the lesson:
limma::plotDensities()
Glimma::glimmaMDS()
At the BioC2023 workshop, I jumped from episode 5 DE analysis to episode 8 to explore design matrices. My rationale was that episode 5 was just showing the mechanics of how DESeq2 works and how to pull out contrasts and to that end we had to throw in a design matrix of some sort without actually really considering what the coef/contrasts were measuring. [edited out something I had wrong before!]
I think we could leave episode 5 mostly as it is, but gloss over what each coef/contrast is measuring beyond general time or sex differences. Then use ExploreModelMatrix to work out what each one actually measures, stepping from simple to more complex designs. Then write a new episode on how to analyze the dataset depending on which information/contrasts we want and how to pull them out. Show examples of using DESeq2's contrast = c("time","Day8","Day0"), but possibly also with a single factor with 6 levels, which makes it easy to pull out specific pairwise comparisons. And maybe even how to do an interaction test. [NOTE: when I live-coded this in class, I resorted to limma's makeContrasts() in order to get a numeric contrast vector to pass to the contrast argument.]
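A minimal base R sketch of the single-factor idea, assuming a hypothetical balanced sample table with two sexes and three time points (the Day4 level is an assumption here, not taken from the note above). If sex and time are combined into one 6-level factor and the design has no intercept, each group gets its own coefficient. Any pairwise comparison then becomes a numeric vector of +1/-1 entries, and an interaction-style question is the difference of two such vectors. The sketch builds those vectors by hand rather than with limma::makeContrasts(). The helper make_pairwise() is made up for this example.

```r
# Hypothetical sample table: 2 sexes x 3 time points, 2 replicates per group
samples <- data.frame(
  sex  = rep(c("Female", "Male"), each = 6),
  time = rep(rep(c("Day0", "Day4", "Day8"), each = 2), times = 2)
)

# Collapse the two factors into a single 6-level factor
samples$group <- factor(paste(samples$sex, samples$time, sep = "_"))

# Cell-means design: no intercept, one column per group
design <- model.matrix(~ 0 + group, data = samples)
colnames(design) <- levels(samples$group)

# Hypothetical helper: numeric contrast "level1 minus level2"
make_pairwise <- function(design, level1, level2) {
  contr <- setNames(numeric(ncol(design)), colnames(design))
  contr[level1] <- 1
  contr[level2] <- -1
  contr
}

# Day8 vs Day0 within females
contr_f <- make_pairwise(design, "Female_Day8", "Female_Day0")

# Interaction-style contrast: does the Day8 vs Day0 change differ between sexes?
contr_int <- make_pairwise(design, "Male_Day8", "Male_Day0") -
  make_pairwise(design, "Female_Day8", "Female_Day0")

print(contr_f)
print(contr_int)

# With DESeq2 the same vectors could be used roughly like this (not run here):
# dds <- DESeqDataSetFromMatrix(count_matrix, samples, design = ~ 0 + group)
# dds <- DESeq(dds)
# res <- results(dds, contrast = contr_f)
```

The order of the vector entries follows the alphabetical order of the factor levels. A numeric contrast has to line up with the coefficient order of the fitted model.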
In episode 5, I also added an example of how to write your results out to a .csv or .tsv file (examples in Day2_08-01-2023.R at https://uofi.box.com/s/o71rrvlfqb84seewl4n31wqkif13uy12).
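Below is a minimal sketch of writing a results table to disk. It assumes the DESeq2 results have already been converted with as.data.frame(). A small made-up data frame stands in for them so the snippet runs on its own, and the output/ folder and file names are hypothetical. The row names are moved into an explicit gene column so that gene IDs survive when the file is read back into R or opened in a spreadsheet.

```r
# Stand-in for as.data.frame(res); the numbers are illustrative only
res_df <- data.frame(
  baseMean       = c(250.3, 88.1, 12.7),
  log2FoldChange = c(2.1, -0.8, 0.3),
  padj           = c(0.0004, 0.03, 0.72),
  row.names      = c("geneA", "geneB", "geneC")
)

# Keep gene IDs as a real column rather than as row names
res_out <- data.frame(gene = rownames(res_df), res_df, row.names = NULL)

dir.create("output", showWarnings = FALSE)

# Comma-separated file
write.csv(res_out, file = file.path("output", "de_results.csv"), row.names = FALSE)

# Tab-separated file
write.table(res_out, file = file.path("output", "de_results.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```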
FINAL THOUGHT: Also could go from episode 1, experimental design, directly into ExploreModelMatrix. The rationale being you need to understand how you will analyze your experiment before you actually conduct your experiment to make sure you don't confound anything or can't answer the question you are trying to answer. ExploreModelMatrix doesn't need data, just hypothetical samples belonging to various experimental factors. However, this could possibly break the attendees' brains by having it so early!
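The point that a design can be checked before any data exist can be illustrated with base R alone, because model.matrix() only needs a sample table. The sketch below uses two hypothetical planned sample sheets and a helper, is_estimable(), that was made up for this example. The helper flags a design whose columns are linearly dependent, which happens when two factors are fully confounded. ExploreModelMatrix would show the same sample table visually. This check only tests the rank of the design matrix.

```r
# Hypothetical helper: TRUE if every coefficient in the design is estimable
is_estimable <- function(sample_table, formula) {
  design <- model.matrix(formula, data = sample_table)
  qr(design)$rank == ncol(design)
}

# Plan A: all females at Day0 and all males at Day8, so sex and time are confounded
plan_a <- data.frame(
  sex  = rep(c("Female", "Male"), each = 3),
  time = rep(c("Day0", "Day8"), each = 3)
)

# Plan B: both sexes sampled at both time points, 3 replicates each
plan_b <- expand.grid(
  sex  = c("Female", "Male"),
  time = c("Day0", "Day8"),
  rep  = 1:3
)

is_estimable(plan_a, ~ sex + time)  # FALSE: sexMale and timeDay8 columns are identical
is_estimable(plan_b, ~ sex + time)  # TRUE
```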
Add explanatory text to the differential expression episode. Some suggestions are provided in 'Contribute' boxes.
Add links to specific bioc-project sections in the relevant intro sections.
Same for bioc-intro
@lgatto @csoneson asking you this before kicking it upstairs. I tried to reorganize/collapse the episodes, which involved renaming most of the episode .Rmd files (done with git mv). All seemed to go well, but when I made the pull request, the check at https://github.com/carpentries-incubator/bioc-rnaseq/actions/runs/5003856029/jobs/8965634091?pr=42 reports the error: The /home/runner/work/bioc-rnaseq/bioc-rnaseq/episodes directory must have (R)markdown files. I get the same error when I try to build locally:
> sandpaper::serve()
Error: The C:/GitHubRepos/bioc-rnaseq/episodes directory must have (R)markdown files
The files all have .Rmd so what could be the problem?
Add exercises to the differential expression episode.
Either add a short introduction to GRanges in episode 3, or alternatively just put the information in the rowData. Later episodes use the chromosome information.
List of things still needing to be done:
It would be a bit of an overhaul, but we should consider switching everything over to tidySummarizedExperiment. This would help simplify the ggplot function coding. Would we also want to switch all transformations/subsetting to tidyverse instead of base R?
Update Episode 2: "RStudio Project and Experimental Data":
Perhaps as an exercise
iSEE wasn't that useful with only 2 PCs in the sce object. We could also add Glimma as a quick alternative for exploring just the clustering (it produces a stand-alone HTML file, which can be more practical for some people than a Shiny app).
Also consider removing the challenge to convert a list representation of gene sets to a data frame representation, since we demonstrate exactly how this is done just before the challenge.
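For reference, one base R way to move between the two representations uses stack() to turn a list into a data frame and split() to go back. This may not be the approach the episode itself demonstrates. The gene set and gene names below are placeholders.

```r
# Placeholder gene sets as a named list: one character vector per set
gene_sets <- list(
  setA = c("gene1", "gene2", "gene3"),
  setB = c("gene2", "gene4")
)

# List -> long data frame with one row per (gene, set) pair
gs_df <- stack(gene_sets)
names(gs_df) <- c("gene", "gene_set")
gs_df$gene_set <- as.character(gs_df$gene_set)
print(gs_df)

# Data frame -> list again, one element per gene set
gs_list <- split(gs_df$gene, gs_df$gene_set)
```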
Add a workflow diagram clarifying the overall analysis process and which type of data (counts, normalized/transformed values) is used for each step.
All episodes except episode 1 have subheadings.
In the "Finding the reference sequences" section of Episode 1, remove UCSC and replace it with NCBI, which is much more heavily used in the US.
The experimental design episode contains several examples of experimental designs, but needs some explanatory text.
Is pre-alpha still the most appropriate description of the maturity of this lesson? Looking through the lesson site, the content looks quite mature, and if I recall correctly it has been taught a few times already? If so, it would be better marked as alpha or even beta, if you think it is ready for other Instructors to use.
Write a short background/overview explaining how the data is generated, QC steps and gene expression quantification.
Add exercises to the exploratory data analysis episode.
Add explanatory text to the data import episode.
Hello, everyone.
The Introduction to RNA-seq is currently empty, so I would like to contribute to it.
As far as I understand it, this episode should contain instructions on how to go from raw FASTQ files to a matrix of transcript abundances, including pre-processing steps (e.g., sequence QC, trimming adapters and low-quality sequences, etc.).
However, as there are several software tools to choose from at each step of the pipeline, I think we should first agree on a workflow to use. I think we can build on the Bioc workflow package rnaseqGene. My suggested workflow would be:
I'd love to hear what you all think.
Best,
Fabricio
Add diagnostic exercises to the experimental design episode.
https://github.com/swcarpentry/FIXME
https://swcarpentry.github.io/FIXME.
Change the color palette in the barplot of library sizes in episode 4, so that male and female samples have different shades of the same color.
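Here is a minimal sketch of the idea in base R, with made-up sample names and library sizes. The light/dark blue pair is an arbitrary choice. If the episode's barplot is drawn with ggplot2, passing a similar named vector to scale_fill_manual(values = ...) should have the same effect.

```r
# Hypothetical library sizes (in millions of reads) and sample sex
lib_sizes <- c(S1 = 31.2, S2 = 28.4, S3 = 33.9, S4 = 29.7)
sex <- c("Female", "Female", "Male", "Male")

# Two shades of the same hue, one per sex
sex_colors <- c(Female = "lightskyblue", Male = "steelblue4")
bar_colors <- sex_colors[sex]

barplot(lib_sizes, col = bar_colors, ylab = "Library size (millions)")
legend("topright", legend = names(sex_colors), fill = sex_colors)
```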
Hello, I noticed that https://github.com/carpentries-incubator/bioc-rnaseq/actions/runs/5116496804 is failing due to an error in the JS of a dependency.
This will prevent updates to the package cache and the workflows from coming in. Please follow the instructions at https://carpentries.github.io/sandpaper-docs/update.html#via-r to update the workflows using the latest version of {sandpaper}.
Note: this is also an issue for the other bioc repositories.
Episode 1 does not cover what is listed in the questions, particularly "What are the different choices to consider when planning an RNA-seq experiment?". This episode needs to be greatly expanded to include information on experimental design considerations and batch effect avoidance, in addition to sequencing and quantification options. See Harvard's excellent https://hbctraining.github.io/Intro-to-rnaseq-hpc-salmon-flipped/lessons/02_experimental_planning_considerations.html and credit them if re-using anything. I also have many slides/graphics that could be added.
To illustrate a typical (and a not-so-good) RNA-seq FastQC report, see if we can find links to public ones.
rowData(dds) and explore again with iSEE in episode 5 (can use iSEEde once it is released).
We currently have the github.io landing page titled "Summary and Setup", then episodes "Introduction and setup", "Introduction to RNA-seq" and "Setup". These should be condensed to be less redundant. Possible suggestion:
Landing page "Summary and Setup" has
Episode 1: "Introduction to RNA-Seq" has:
Episode 2: "RStudio Project and Experimental Data" has:
Episode 3 "Experimental design"
Episode 4: "Exploratory analysis and quality control" (previous episode 6)
results output that the default significance threshold with DESeq2 is adj.P = 0.1.
I tried to have sandpaper automatically create a "Download Lesson Handout" link, following @lgatto's attempt in bioc-intro, by adding options(sandpaper.handout = TRUE) to sandpaper-main.yaml. However, this didn't seem to do anything. Googling led me here, which says you also need to add purl = TRUE to all code chunks you want to include. I tried doing this for episode 3, but it also didn't seem to do anything when I built locally using sandpaper::serve(). However, after I pushed all commits to this GitHub repo, merged the pull request and ran the GitHub action, the handout did appear in the episode sidebar of the github.io website.
purl = TRUE to all (or almost all) R code chunks
Need to add authors here: https://github.com/carpentries-incubator/bioc-rnaseq/blob/main/AUTHORS
Hi @zkamvar,
similarly to carpentries-incubator/bioc-intro#76 and carpentries-incubator/bioc-project#48, I would like to ask for your help with transitioning also this repository to the new workbench format.
Thanks a lot in advance, and let me know if I can do anything to help!
All the best,
Charlotte