jaleesr / trendcatcher Goto Github PK

TrendCatcher is an open source R-package that allows users to systematically analyze and visualize time course data. Please cite "Temporal transcriptomic analysis using TrendCatcher identifies early and persistent neutrophil activation in severe COVID-19" by Xinge Wang et al published in JCI Insight (2022) - https://insight.jci.org/articles/view/157255

R 100.00%

rna-seq transcriptomics transcriptome temporal-data visualization r single-cell

trendcatcher's Introduction

TrendCatcher

Introduction

TrendCatcher is a versatile R package for identifying dynamic differentially expressed genes (DDEGs) in RNA-seq longitudinal studies. A time course experiment is a widely used design in the study of cellular processes such as cell differentiation or response to external stimuli. Temporal changes to the gene expression, such as mRNA, is the key to characterizing such biological processes. Here, we present a versatile R package named TrendCatcher to identify the dynamic differentially expressed genes along the biological process time course.

Installation

Install latest development version from GitHub (requires devtools package):

if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("jaleesr/TrendCatcher", dependencies = TRUE, build_vignettes = FALSE)

Documentation and Demo Scripts

Instructions, documentation, and tutorials can be found at:

https://jaleesr.github.io/TrendCatcher

A PDF manual TrendCatcher_1.0.0.pdf can be found in the repository.

TrendCatcher Framework Overview

TrendCatcher requires 2 main inputs: the raw count table C of a temporal study with a dimension of m × n, where m denotes the number of genes and n denotes the number of samples, and a user-defined baseline time variable T, such as “0 hour”. Since samples may have different sequencing depths and batch effect, TrendCatcher integrates with limma and provides preprocessing steps, such as batch correction and normalization. For scRNA-Seq data sets, TrendCatcher extracts cells for each cell type annotated in the meta data slot of Seurat object and converts it into a cell type–specific “pseudobulk” time course RNA library. Based on a user-specified threshold, genes of relatively low abundance are removed from the count table, reads are normalized, and batch effects are removed. TrendCatcher’s core algorithm is composed of 5 main steps: (a) baseline fluctuation confidence interval estimation, (b) model dynamic longitudinal count, (c) time point dynamic P value calculation, (d) gene-wise dynamic P value calculation, and (e) break point screening and gene-wise dynamic pattern assignment. Mathematical details will be expanded in the following sections.

For the output of TrendCatcher, there are mainly 2 components: a master table and a set of functions for versatile visualization purposes. The master table contains all the dynamic details of each single gene, including its dynamic P value, its break point location time, and its dynamic trajectory pattern. In addition to the master table, TrendCatcher produces 5 main types of visualizations: (a) a figure showing the observed counts and fitted splines of each gene, (b) genes trajectories from each trajectory pattern group, (c) a hierarchical pie chart that represents trajectory pattern composition, (d) a TimeHeatmap to infer trajectory dynamics of top dynamic biological pathways, and (e) a 2-sided bar plot to show the top most positively and negatively changed (averaged accumulative log2FC) biological pathways.

Some highlights of using TrendCatcher.

Some quick examples to show how to use TrendCatcher.

1. Identify dynamic differentially expressed genes (DDEGs) and generate master.list object

library("TrendCatcher")
example.file.path<-system.file("extdata", "Brain_DemoCountTable.csv", package = "TrendCatcher")
master.list<-run_TrendCatcher(count.table.path = example.file.path, 
baseline.t = 0,
time.unit = "h",
min.low.count = 1,
para.core.n = NA,
dyn.p.thres = 0.05)

2. Draw individual gene trajectory with observed data and fitted data

gene.symbol.arr<-unique(master.list$master.table$Symbol)[1:6]
p<-draw_GeneTraj(master.list = master.list, gene.symbol.arr = gene.symbol.arr, ncol = 3, nrow = 2)
p

3. Group genes based on their trajectory pattern type

draw_TrajClusterGrid(
  master.list,
  min.traj.n = 10,
  save.as.PDF = NA,
  pdf.width = 10,
  pdf.height = 10
)

4. Visualize biological pathway dynamic progamming using TimeHeatmap

Generate a TimeHeatmap to visualize the most dynamic top N biological pathways enrichment change over time, we designed a window-sliding strategy to capture all the up-regulated or down-regulated genes for each time interval.

time_heatmap<-draw_TimeHeatmap_GO(master.list = master.list, logFC.thres = 0, top.n = 10, dyn.gene.p.thres = 0.05, keyType = "SYMBOL", OrgDb = "org.Mm.eg.db", ont = "BP", term.width = 80, GO.enrich.p = 0.05, figure.title = "TimeHeatmap")  
print(time_hetmap$time.heatmap)

5. Compare dynamic curves of a certain biological pathway between two experimental groups

Same biological pathway may show up in the 2 TimeHeatmap objects from two different experimental groups. To compare its temporal behavior, we calcualted the area difference between two curves and test the seperation significance using permutation approach.

perm_output<-draw_CurveComp_Perm(master.list.1 = master.list.severe, 
                                 master.list.2 = master.list.moderate, 
                                 ht.1 = ht.severe, 
                                 pathway = "neutrophil activation", 
                                 group.1.name = "severe", 
                                 group.2.name = "moderate", 
                                 n.perm = 100, 
                                 parall = FALSE, 
                                 pvalue.threshold = 0.05)
                                 
perm_output$plot

Documentation and Further details

Instructions, documentation, and tutorials can be found at:

https://jaleesr.github.io/TrendCatcher

To cite TrendCatcher

Wang X, Sanborn MA, Dai Y, Rehman J. Temporal transcriptomic analysis using TrendCatcher identifies early and persistent neutrophil activation in severe COVID-19. JCI Insight. 2022 Apr 8;7(7):e157255. doi: 10.1172/jci.insight.157255. PMID: 35175937; PMCID: PMC9057597.

trendcatcher's People

Contributors

Stargazers

Watchers

Forkers

musculusmus ivetyorda

trendcatcher's Issues

guide for converting Seurat object to trendcatcher data?

I can't find any info to do this? Could you share?

Raw or normalized microarray data for the analysis

Hi,

your tool is really outstanding. thanks for your handworks.

Can we use raw or normalized microarray data instead of bulk RNAseq for the analysis? I really appreciate your response.

best regards,

Bugie

"missing value in parameter"

Hello!
Thanks so much for creating this package! I'm looking forward to seeing my results after running it successfully. However, I am getting an error after trying to run the main function with my data. The file passes all of the first steps, but fails in the middle of running (60% done) and shoots this message: "Error in { : task 1975 failed - "missing value in parameter". I've looked at the csv and everything looks correct, so I'm wondering what parameter it means? If you could give me some insight into what it wants, I would really appreciate it.
Thanks!

task 2080 failed - "missing value in parameter

Thank you for your great jobs.
I am now writing our paper in which we would like to use TrendCatcher for our longitudinal Bulk RNA-seq data.

The following error output come out once I tried "run_TrendCatcher".
##############################################################
master.list<-run_TrendCatcher(count.table.path = file.path, baseline.t = 0, time.unit = "h", min.low.count = 1, para.core.n = NA, dyn.p.thres = 0.05)
##############################################################
Read count table.
Passed Format Check. Count table is 12 Samples, 60325 Genes
Found 39780 low count genes.
Count table format correct, finished loading.
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Run TrendCatcher.
|===============================================================================| 100%Error in { : task 2080 failed - "missing value in parameter"
Timing stopped at: 21.74 14.07 516.2
##############################################################

I have achieved "run_TrendCatcher" when I used your "example.file.path" (system.file("extdata", "Brain_DemoCountTable.csv", package = "TrendCatcher")).

I would be happy if you could help me solve the issue.

Sincerely,

"Error reading from connection"

Hi,

I am working on the analysis of a time-course RNA-seq in wheat and I would love to apply your package to identify expression patterns over time.

While trying to run TrendCatcher, the job is aborted and I got this error message:

"Read count table.
Passed Format Check. Count table is 24 Samples, 63351 Genes
Found 1352 low count genes.
Count table format correct, finished loading.
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Run TrendCatcher.
|======== | 5%Error in unserialize(socklist[[n]]) : error reading from connection
Timing stopped at: 2.41 0.33 82.64"

Note that preprocess_TrendCatcher works without any problem. Could it be due to resource limitations (CPU, RAM)?

When I ran the demo it also worked without any issue.

Any thoughts on that?

The additonal factor along with time course

To feed the run_TrendCather function, the column name of the CSV file contains all the time information ("ProjectName_Time_Rep1").

Besides the time factor, My data include genotype information, do you think is it possible to find the genotype effect in the context of time?

Is only count table fitting the format of input?

Hi! guys,
Thank you for your great work.
However, I am a little confused if TPM or FPKM, the kind of transformed expressions, could be the input of the function, ‘run_TrendCatcher’. If not, how could I use TPM or FPKM as input?

Thank u in advance.

gss error Issue with three time point samples

Hi there,

Thank you for a great package for time series analysis.
I was able to process my samples that had four time points but I was unable to process my samples with three time points. when I run run_TrendCatcher I am getting the following error.

Read count table.
Passed Format Check. Count table is 12 Samples, 8780 Genes
Found 101 low count genes.
Count table format correct, finished loading.
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Run TrendCatcher.
`| | 0%Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 13 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 12 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 11 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 10 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 9 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 8 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 7 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 6 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 5 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 4 (<-localhost:11350)
Warning in .Internal(gc(verbose, reset, full)) :
closing unused connection 3 (<-localhost:11350)
|=================================================================| 100%Timing stopped at: 5.052 1.569 29.84
Show Traceback

Error in { : task 3 failed - "gss error in gssanova: unpenalized terms are linearly dependent"`

I would appreciate if you can point me to where the error is coming from ? Is it something with gss package ?

Download error

Hi, I'm getting this error when trying to download your package:

> devtools::install_github("jaleesr/TrendCatcher", dependencies = TRUE, build_vignettes = FALSE)
Downloading GitHub repo jaleesr/TrendCatcher@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet,  : 
  download from 'https://api.github.com/repos/jaleesr/TrendCatcher/tarball/HEAD' failed

Any help?

draw_TimeHeatmap_GO

Hi,
My data went through the run_TrendCatcher step.

However, I came across the following error when I executed the draw_TimeHeatmap_GO. It seems a mistake due to the gene symbol and ENSEMBL. I tried to add different columns using the same gene symbol value, such as Symbol or SYMBOL.

Can you help me look at my data(master.list)? What's wrong with the gene name?

DG_master.list.rds.zip

Processing up-regulated genes for time window 2month-6month
Processing up-regulated genes for time window 6month-8month
Processing up-regulated pathways for time window 2month-6month
--> No gene can be mapped....
--> Expected input gene ID: Paqr8,Prl8a1,Tssk5,Morn2,Mapk15,Ascl2
--> return NULL...
Processing up-regulated pathways for time window 6month-8month
--> No gene can be mapped....
--> Expected input gene ID: Piwil2,Prm1,Clgn,Cfap73,T,Mrm2
--> return NULL...
Error in act.go.list[[i]] : subscript out of bounds

Shuai

draw_TimeHeatmap_GO function

I performed the function of "draw_TimeHeatmap_GO function" using my own "master.list" data.
The following error comes back; "Error in is.na(t.arr) || is.na(t.unit) : 'length = 4' in coercion to 'logical(1)'"

I also tried it using your demo.master.list as the follows.
Then, I got the similar error message.
I would be happy if you could let me know how to solve it.

demo.master.list.path<-system.file("extdata", "BrainMasterList_Symbol.rda", package = "TrendCatcher")
load(demo.master.list.path)
time_heatmap=draw_TimeHeatmap_GO(
  master.list,
  logFC.thres = 0,
  top.n = 10,
  dyn.gene.p.thres = 0.05,
  keyType = "SYMBOL",
  OrgDb = "org.Mm.eg.db",
  ont = "BP",
  term.width = 80,
  GO.enrich.p = 0.05,
  figure.title = "",
  save.tiff.path = NA,
  tiff.res = 100,
  tiff.width = 1500,
  tiff.height = 1500
)

Error in is.na(t.arr) || is.na(t.unit) :   'length = 6' in coercion to 'logical(1)'