Comments (13)
Hi,
Can you paste in exactly which commands you ran? If you ran it from the command line there should be a horrible looking error message printed there, could you send that over as well please?
The gene_name_df is used in the make_cluster_plot function, which I've been tinkering with heavily for the past few weeks so it's probably something stupid in there.
from leafcutter.
To run the shiny app I did:
PC002047:leafviz resaf$ Rscript run_leafviz.R /Users/resaf/workspace/leafcutter/leafcutter_results.Rdata
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘DT’
The following objects are masked from ‘package:shiny’:
dataTableOutput, renderDataTable
Warning messages:
1: replacing previous import ‘dplyr::combine’ by ‘gridExtra::combine’ when loading ‘leafcutter’
2: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘leafcutter’
3: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘leafcutter’
Attaching package: ‘gridExtra’
The following object is masked from ‘package:dplyr’:
combine
[1] "Loading results from /Users/resaf/workspace/leafcutter/leafcutter_results.Rdata"
Listening on http://127.0.0.1:7826
Attaching package: ‘shinyjs’
The following object is masked from ‘package:intervals’:
show
The following object is masked from ‘package:Rcpp’:
show
The following object is masked from ‘package:shiny’:
runExample
The following objects are masked from ‘package:methods’:
removeClass, show
Warning: Error in make_cluster_plot: object 'gene_name_df' not found
Stack trace (innermost first):
103: make_cluster_plot
102: print
101: withCallingHandlers
100: suppressWarnings
99: renderPlot [/Users/resaf/workspace/leafcutter/leafviz/server.R#191]
89: reactive:plotObj
78: plotObj
77: origRenderFunc
76: output$select_cluster_plot
1: shiny::runApp
Warning: Error in : nrow(myclusters) > 0 is not TRUE
Stack trace (innermost first):
104: stopifnot
103: make_gene_plot
102: print
101: withCallingHandlers
100: suppressWarnings
99: renderPlot [/Users/resaf/workspace/leafcutter/leafviz/server.R#205]
89: reactive:plotObj
78: plotObj
77: origRenderFunc
76: output$select_gene_plot
1: shiny::runApp
from leafcutter.
To make the Rdata object I did:
Rscript prepare_results.R --iFolder /udd/resaf/data/leafcutter/21_08_17_2/
--outFolder /udd/resaf/data/leafcutter/21_08_17_2/
--support /udd/resaf/data/leafcutter/21_08_17_2/ds_file.txt
--annotation_code /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19
--code leafcutter
--FDR 0.1
from leafcutter.
And did the prepare_results.R step ran without any warnings or errors? Thank you for your patience, this is really useful.
from leafcutter.
One more thing, are your certain that the /Users/resaf/workspace/leafcutter/leafcutter_results.Rdata you used with run_leafvis.R is the same Rdata file that you generated with prepare_results.R? Did you move the file after creating it?
from leafcutter.
I'm certain that the Rdata file is the same, I moved it after creating it because I wanted to test if the shiny app creation would work on my laptop after it didn't work on our cluster.
Here is the output of prepare_results.R. There were no errors.
derby10:resaf 66% Rscript prepare_results.R --iFolder /udd/resaf/data/leafcutter/21_08_17_2/ \
? --outFolder /udd/resaf/data/leafcutter/21_08_17_2/ \
? --support /udd/resaf/data/leafcutter/21_08_17_2/ds_file.txt\
? --annotation_code /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19 \
? --code leafcutter \
? --FDR 0.1
Loading required package: leafcutter
Loading required package: Rcpp
Warning messages:
1: replacing previous import ‘dplyr::combine’ by ‘gridExtra::combine’ when loading ‘leafcutter’
2: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘leafcutter’
3: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘leafcutter’
>
> opt <- parse_args(
+ OptionParser(
+ option_list=list(
+ make_option( c("-i", "--iFolder"), help = "The folder where the LeafCutter differential splicing was run"),
+ make_option( c("-o","--outFolder"), help = "The folder where the results object will be saved" ),
+ make_option( "--support", default=NULL, help="The support file used in the differential splicing analysis. Columns should be file name and condition"),
+ make_option( "--annotation_code", default=NULL, help = "a path to the annotation files of exons, introns and splice sites"),
+ make_option( c("-f","--FDR"), default=0.05, help = "the adjusted p value threshold to use"),
+ make_option( "--code", default=NULL, help = "the same dataset-specific code used throughout the pipeline"))
+ )
+ )
>
> outFolder <- opt$outFolder
> iFolder <- opt$iFolder
> species <- opt$species
> groups_file <- opt$support
> # counts_file <- opt$counts_file
> annotation_code <- opt$annotation_code
> code <- opt$code
> FDR_limit <- opt$FDR
>
>
> cat("Preparing for visualisation\n")
Preparing for visualisation
>
> # results
> effect.sizes.file <- paste0(iFolder,"/",code,"_ds_effect_sizes.txt")
> results.file <- paste0(iFolder, "/",code,"_ds_cluster_significance.txt")
> counts_file <- paste0(iFolder, "/",code, "_perind_numers.counts.gz")
> # annotation
> exon_file <- paste0(annotation_code, "_all_exons.txt.gz")
> all_introns <- paste0(annotation_code,"_all_introns.bed.gz" )
> threeprime_file <- paste0( annotation_code,"_threeprime.bed.gz")
> fiveprime_file <- paste0( annotation_code,"_fiveprime.bed.gz")
>
>
> # CHECK DEPENDENCY FILES
> pass <- TRUE
> errorMessage <- c()
>
> # Dependent on
> # effects sizes and results file, not counts or groups
>
> for( file in
+ c(effect.sizes.file, results.file, counts_file, groups_file,
+ all_introns, threeprime_file, fiveprime_file, exon_file
+ )){
+ if( !file.exists(file)){
+ pass <- FALSE
+ errorMessage <- c(errorMessage, paste0(file, " does not exist\n") )
+ }
+ }
> if(!pass){
+ #print(errorMessage)
+ stop(errorMessage)
+ }
> cat(paste("results to be saved in:",outFolder, "\n") )
results to be saved in: /udd/resaf/data/leafcutter/21_08_17_2/
> cat(paste("using annotation found in:", annotation_code,"\n") )
using annotation found in: /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19
>
>
> library(data.table)
> #library(leafcutter)
> library(stringr)
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:data.table’:
between, first, last
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
>
>
> # CHECK BEDTOOLS IS INSTALLED
> if( !file.exists(system("which bedtools", intern=TRUE) )){
+ stop("bedtools is not in your $PATH - have you installed it?")
+ }
>
> if(file.exists(counts_file)){
+ cat("Loading counts from",counts_file,"\n")
+ counts <- read.table(counts_file)
+ }
Loading counts from /udd/resaf/data/leafcutter/21_08_17_2//leafcutter_perind_numers.counts.gz
>
> if(file.exists(groups_file)){
+ cat("Loading metadata from",groups_file,"\n")
+ meta <- read.table(groups_file, header=F, stringsAsFactors = F)
+ colnames(meta)=c("sample","group")
+
+ sample_table <- data.frame( group = names(table(meta$group) ), count = as.vector(table(meta$group)) )
+ }
Loading metadata from /udd/resaf/data/leafcutter/21_08_17_2/ds_file.txt
> # exon table no longer used for anything - just saved with the Rdata object at the end
>
> exons_table=if (!is.null( exon_file )) {
+ cat("Loading exons from",exon_file,"\n")
+ #read_table(exon_file)
+ as.data.frame(fread(paste("zless",exon_file)) )
+ } else {
+ cat("No exon_file provided.\n")
+ NULL
+ }
Loading exons from /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_exons.txt.gz
>
> resultsFolder <- outFolder
> if( !dir.exists(resultsFolder)){
+ dir.create(resultsFolder)
+ }
>
>
> effectSizes <- fread(effect.sizes.file, stringsAsFactors = FALSE, header=TRUE )
> effectSizesSplit <- as.data.frame(str_split_fixed(effectSizes$intron, ":", 4), stringsAsFactors = FALSE )
> names(effectSizesSplit) <- c("chr","start","end","clusterID")
>
> print(head(effectSizes))
intron logef 0 1or2 deltapsi
1: 10:226068:243851:clu_20403 0.18969444 0.02006254 0.02648960 0.0064270604
2: 10:226068:255829:clu_20403 -0.09491357 0.02004795 0.01991388 -0.0001340759
3: 10:243945:255829:clu_20403 -0.09478087 0.95988950 0.95359652 -0.0062929844
4: 10:255988:267135:clu_20404 -0.03752149 0.09182582 0.08157108 -0.0102547384
5: 10:255988:282778:clu_20404 -0.06829662 0.07728741 0.06657554 -0.0107118705
6: 10:267296:282778:clu_20404 0.10581811 0.83088677 0.85185338 0.0209666089
> print(head(effectSizesSplit))
chr start end clusterID
1 10 226068 243851 clu_20403
2 10 226068 255829 clu_20403
3 10 243945 255829 clu_20403
4 10 255988 267135 clu_20404
5 10 255988 282778 clu_20404
6 10 267296 282778 clu_20404
>
> effectSizes <- cbind( effectSizes, effectSizesSplit)
> effectSizes$cluster <- paste("chr", effectSizesSplit$chr, ":", effectSizesSplit$clusterID, sep = "")
> effectSizes$chr<-paste("chr", effectSizes$chr, sep = "")
>
> results <- fread(results.file, stringsAsFactors = F)
>
> results$FDR <- p.adjust( results$p, method = "fdr")
>
> all.introns <- merge(x = results, y = effectSizes, by = "cluster")
> all.introns <- all.introns[ order(all.introns$FDR),]
>
> all.introns <- subset( all.introns, FDR <= FDR_limit )
> all.introns$start <- as.numeric(all.introns$start)
> all.introns$end <- as.numeric(all.introns$end)
>
>
> # for each splice site write out a bed file
> all.junctions <- dplyr::select(all.introns, chr, start, end, clusterID)
>
> all.fiveprime <- data.frame( chr = all.introns$chr,
+ start = all.introns$start,
+ end = as.numeric( as.character(all.introns$start) ) + 1,
+ clusterID = all.introns$clusterID)
> all.threeprime <- data.frame( chr = all.introns$chr,
+ start = all.introns$end,
+ end = as.numeric( as.character(all.introns$end) ) + 1,
+ clusterID = all.introns$clusterID)
> all.file <- paste0(resultsFolder, "/all_junctions.bed")
> all.fiveprime.file <- paste0(resultsFolder, "/all_fiveprime.bed")
> all.threeprime.file <- paste0(resultsFolder, "/all_threeprime.bed")
>
> write.table( all.junctions, all.file, col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t" )
> write.table( all.threeprime, all.threeprime.file, col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t" )
> write.table( all.fiveprime, all.fiveprime.file, col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t" )
>
> print( "BedTools intersect junctions with list of known splice sites")
[1] "BedTools intersect junctions with list of known splice sites"
> # first match junctions
> all.introns.cmd <- paste0("/app/[email protected]/bin/bedtools intersect -a ", all.file, " -b ", all_introns, " -wa -wb -loj -f 1" )
>
> all.introns_intersect <- fread(all.introns.cmd)
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_introns.bed.gz has inconsistent naming convention for record:
GL000220.1 154617 154624 "CH507-513H4.5" "ENSG00000281383.1_2" - "ENST00000629969.1_1" 1 "lincRNA" "OTTHUMG00000189717.1_2"
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_introns.bed.gz has inconsistent naming convention for record:
GL000220.1 154617 154624 "CH507-513H4.5" "ENSG00000281383.1_2" - "ENST00000629969.1_1" 1 "lincRNA" "OTTHUMG00000189717.1_2"
>
> # intersect with bedtools to find the annotations of each splice site
> threeprime.cmd <- paste0( "/app/[email protected]/bin/bedtools intersect -a ", all.threeprime.file, " -b ",threeprime_file, " -wa -wb -loj -f 1" )
>
> threeprime_intersect <- fread(threeprime.cmd)
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_threeprime.bed.gz has inconsistent naming convention for record:
GL000220.1 154624 154625 "CH507-513H4.5" "ENSG00000281383.1_2" - "ENST00000629969.1_1" 1 "lincRNA" "OTTHUMG00000189717.1_2"
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_threeprime.bed.gz has inconsistent naming convention for record:
GL000220.1 154624 154625 "CH507-513H4.5" "ENSG00000281383.1_2" - "ENST00000629969.1_1" 1 "lincRNA" "OTTHUMG00000189717.1_2"
>
> fiveprime.cmd <- paste0( "/app/[email protected]/bin/bedtools intersect -a ", all.fiveprime.file, " -b ", fiveprime_file, " -wa -wb -loj -f 1" )
>
> fiveprime_intersect <- fread(fiveprime.cmd)
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_fiveprime.bed.gz has inconsistent naming convention for record:
GL000220.1 154617 154618 "CH507-513H4.5" "ENSG00000281383.1_2" - "ENST00000629969.1_1" 1 "lincRNA" "OTTHUMG00000189717.1_2"
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_fiveprime.bed.gz has inconsistent naming convention for record:
GL000220.1 154617 154618 "CH507-513H4.5" "ENSG00000281383.1_2" - "ENST00000629969.1_1" 1 "lincRNA" "OTTHUMG00000189717.1_2"
>
> # remove temporary files
> rm.cmd <- paste("rm ", all.file, all.fiveprime.file, all.threeprime.file)
> system(rm.cmd)
>
> # now I have two lists of splice site annotation
> # for testing
> #cluster <- all.introns[ all.introns$clusterID == "clu_4879" , ]
>
> print("Annotating junctions")
[1] "Annotating junctions"
>
> verdict.list <- list()
> coord.list <- list()
> gene.list <- list()
> ensemblID.list <- list()
> transcripts.list <- list()
> constitutive.list <- list()
> classification.list <- list()
>
> # for testing!
> clu <- "clu_59455"
>
> clusters <- unique( all.introns$clusterID )
> for( clu in clusters ){
+ # for each intron in the cluster, check for coverage of both
+ # output a vector of string descriptions
+ cluster <- all.introns[ all.introns$clusterID == clu , ]
+
+ # for each intron in the cluster:
+ # create vector of overlapping splice sites, indexed by the row of the intersect
+ # five prime splice sites
+ fprime <- apply( cluster, MAR = 1, FUN = function(x) {
+ chr <- which( names(cluster) == "chr" )
+ start <- which( names(cluster) == "start" )
+ fiveprime_intersect[
+ fiveprime_intersect$V1 == x[chr] &
+ fiveprime_intersect$V2 == as.numeric( x[start] ),]
+ } )
+ # three prime splice sites
+ tprime <- apply( cluster, MAR = 1, FUN = function(x) {
+ chr <- which( names(cluster) == "chr" )
+ end <- which( names(cluster) == "end" )
+ threeprime_intersect[
+ threeprime_intersect$V1 == x[chr] &
+ threeprime_intersect$V2 == as.numeric( x[end] ),]
+ } )
+
+ # both splice sites
+ bothSS <- apply( cluster, MAR = 1, FUN = function(x) {
+ chr <- which( names(cluster) == "chr" )
+ start <- which(names(cluster) == "start")
+ end <- which( names(cluster) == "end" )
+ all.introns_intersect[
+ all.introns_intersect$V1 == x[chr] &
+ all.introns_intersect$V2 == x[start] &
+ all.introns_intersect$V3 == as.numeric( x[end] ) &
+ all.introns_intersect$V4 == clu,]
+ } )
+
+ # find gene and ensemblID by the most represented gene among all the splice sites
+ cluster_genes <- names(sort(table(do.call( what = rbind, tprime )$V8), decreasing = TRUE ))
+
+ cluster_gene <- cluster_genes[ cluster_genes != "." ][1]
+ # if no cluster gene found then leave as "."
+ if( length(cluster_gene) == 0){
+ cluster_gene == "."
+ }
+ # do the same for EnsemblID
+ cluster_ensemblIDs <- names(sort(table(do.call( what = rbind, tprime )$V9), decreasing = TRUE ))
+ cluster_ensemblID <- cluster_ensemblIDs[ cluster_ensemblIDs != "." ][1]
+ if( length( cluster_ensemblID ) == 0 ){
+ cluster_ensemblID == "."
+ }
+
+ verdict <- c()
+ coord <- c()
+ gene <- c()
+ ensemblID <- c()
+ transcripts <- list()
+
+ for( intron in 1:nrow(cluster) ){
+ coord[intron] <- paste(cluster[intron]$chr,cluster[intron]$start, cluster[intron]$end )
+
+ gene[intron] <- cluster_gene
+ ensemblID[intron] <- cluster_ensemblID
+
+ # for each intron create vector of all transcripts that contain both splice sites
+ transcripts[[intron]] <- unique( intersect( tprime[[intron]]$V10, fprime[[intron]]$V10 ) )
+
+ verdict[intron] <- "error"
+ if( # if neither are annotated
+ all( tprime[[intron]]$V5 == ".") & all( fprime[[intron]]$V5 == "." )
+ ){ verdict[intron] <- "cryptic_unanchored"
+ }
+ if( # if only one is annotated
+ all( tprime[[intron]]$V5 == ".") & all( fprime[[intron]]$V5 != "." )
+ ){ verdict[intron] <- "cryptic_threeprime"
+ }
+ if(
+ all( tprime[[intron]]$V5 != ".") & all( fprime[[intron]]$V5 == "." )
+ ){ verdict[intron] <- "cryptic_fiveprime"
+ }
+ if( # if both splice sites are annotated
+ all( tprime[[intron]]$V5 != "." ) & all( fprime[[intron]]$V5 != "." )
+ ){
+ # test if the splice sites are paired in a known intron
+ if( all(bothSS[[intron]]$V5 != ".") ){
+ verdict[intron] <- "annotated"
+ }else{ # both are annotated but never in the same junction
+ verdict[intron] <- "novel annotated pair"
+ }
+ }
+ verdict.list[[clu]] <- verdict
+ coord.list[[clu]] <- coord
+ gene.list[[clu]] <- gene
+ ensemblID.list[[clu]] <- ensemblID
+ #transcripts.list[[clu]] <- transcripts
+
+ # once all the transcripts for all the introns are found, go back and work out how many constitutive each junction is. Does the junction appear in every transcript?
+
+ if( intron == nrow(cluster)){ # only on final intron
+ all_transcripts <- unique( unlist( transcripts ) )
+ # remove "." - non-existent transcripts
+ all_transcripts <- all_transcripts[ all_transcripts != "." ]
+
+ constitutive <- lapply( transcripts, FUN = function(x) {
+ # for each intron how many transcripts is it seen in?
+ x <- x[ x != "." ]
+ length(x) / length( all_transcripts)
+
+ })
+
+ constitutive.list[[clu]] <- constitutive
+
+ # collapse all.introns transcripts for each intron into a single string
+ transcripts.list[[clu]] <- lapply(transcripts, FUN = function(x) paste( x, collapse = "+" ) )
+
+ }
+
+ }
+
+ # predicting the event type from the shape of the junctions
+ #print(clu)
+
+ if( nrow(cluster) != 3){
+ classification.list[[clu]] <- "."
+ next
+ }else{
+ classification.list[[clu]] <- "."
+
+ tab <- select(cluster, start, end)
+
+ # the junctions are sorted by start and end coordinates
+
+ # check for the presence of a junction that spans the entire length of the cluster
+ if( !any( which( tab$start == min(tab$start) ) %in% which( tab$end == max(tab$end) ) ) ){
+ classification.list[[clu]] <- "."
+ next
+ }
+
+ # therefore for a cassette exon arrangement the longest junction always comes second
+ if( which( tab$start == min(tab$start) & tab$end == max(tab$end ) ) != 2 ){
+ classification.list[[clu]] <- "."
+ next
+ }
+
+ # now we know that junction 2 is the parent, junction 1 is the left most child and junction 3 is the right most
+ # check that the end of junction 1 comes before the start of junction 3
+
+ if( tab[1,"end"] > tab[3,"start"] ){
+ classification.list[[clu]] <- "."
+ next
+ }
+
+ # double check the starts and ends
+ if( tab[1, "start"] != tab[2,"start"] | tab[3,"end"] != tab[2,"end"] ){
+ classification.list[[clu]] <- "."
+ next
+ }
+
+ # work out direction of change
+ if( cluster[1, "deltapsi"] > 0 & cluster[3, "deltapsi"] > 0 & cluster[2,"deltapsi"] < 0){
+ classification.list[[clu]] <- "cassette exon - increased"
+ }
+ if( cluster[1, "deltapsi"] < 0 & cluster[3, "deltapsi"] < 0 & cluster[2,"deltapsi"] > 0){
+ classification.list[[clu]] <- "cassette exon - decreased"
+ }
+
+ # work out annotation status
+ if( all( verdict.list[[clu]] == "annotated") ){
+ classification.list[[clu]] <- paste0( classification.list[[clu]], " - annotated")
+ }
+
+ if( verdict.list[[clu]][2] == "annotated" & verdict.list[[clu]][1] != "annotated" & verdict.list[[clu]][3] != "annotated" ){
+ classification.list[[clu]] <- paste0( classification.list[[clu]], " - cryptic")
+ }
+
+ if( verdict.list[[clu]][2] == "novel annotated pair" & verdict.list[[clu]][1] == "annotated" & verdict.list[[clu]][3] == "annotated" ){
+ classification.list[[clu]] <- paste0( classification.list[[clu]], " - skiptic")
+ }
+
+ }
+
+ }
>
> print("Preparing results")
[1] "Preparing results"
>
> # match the lists together
> all.introns$verdict <- unlist(verdict.list)[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
>
> all.introns$gene <- unlist(gene.list)[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
>
> all.introns$ensemblID <- unlist(ensemblID.list)[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
>
> all.introns$transcripts <- unlist( transcripts.list )[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
>
> all.introns$constitutive.score <- unlist( constitutive.list )[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
>
> #all.introns$prediction <- unlist( classification.list )[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
>
> # replace NA values with "."
> all.introns$gene[ is.na( all.introns$gene) ] <- "."
> all.introns$ensemblID[ is.na( all.introns$ensemblID) ] <- "."
> # replace missing transcripts with "."
> all.introns[ all.introns$transcripts == "", ]$transcripts <- "."
> all.introns$constitutive.score <- signif(all.introns$constitutive.score, digits = 2)
>
> # prepare results
> results$clusterID <- str_split_fixed(results$cluster, ":", 2)[,2]
> results$N <- results$df + 1
> sig <- subset(results, FDR < FDR_limit)
> sig$clusterID <- str_split_fixed(sig$cluster, ":", 2)[,2]
>
> all.clusters <- lapply(sig$clusterID, FUN = function(clu){
+ cluster <- all.introns[ all.introns$clusterID == clu, ]
+ chr <- unique( cluster$chr )[1] # this should always be one number
+ start <- min( cluster$start )
+ end <- max( cluster$end )
+ # get most common gene name that is not "."
+ gene <- names( sort( table( unique(cluster$gene) ), decreasing = TRUE ) )[1]
+ ensemblID <- names( sort( table( unique(cluster$ensemblID) ), decreasing = TRUE ) )[1]
+ annotation <- "annotated"
+ if( any(grepl( "cryptic", cluster$verdict)) | any( grepl("novel annotated pair", cluster$verdict)) ){
+ annotation <- "cryptic"
+ }
+ return(
+ data.frame(
+ clusterID = clu,
+ chr = chr,
+ start = start,
+ end = end,
+ gene = gene,
+ ensemblID = ensemblID,
+ annotation = annotation ) )
+ })
> all.clusters <- do.call( what = rbind, args = all.clusters)
>
> all.clusters$FDR <- results$FDR[ match( all.clusters$clusterID, results$clusterID)]
> all.clusters$FDR <- signif( all.clusters$FDR, digits = 3)
> all.clusters$N <- results$N[ match( all.clusters$clusterID, results$clusterID)]
>
> # add classification
> all.clusters$verdict <- unlist(classification.list)[ match(all.clusters$clusterID, names(classification.list))]
>
> # write out
> cluster_results <- paste0(resultsFolder,"/per_cluster_results.tab")
> intron_results <- paste0(resultsFolder, "/per_intron_results.tab")
>
> write.table( all.clusters, cluster_results, quote = FALSE, row.names = FALSE, sep = "\t" )
> write.table( all.introns, intron_results, quote = FALSE, row.names = FALSE, sep = "\t" )
>
> # prepare for PCA
> counts <- counts[,meta$sample]
> meta$group <- as.factor(meta$group)
>
>
> make_pca <- function(counts,meta){
+ dev <- apply( counts, MAR = 1, FUN = sd )
+ # remove rows with 0 variance
+ counts <- counts[ dev != 0, ]
+ pca <- prcomp( t(counts), scale = TRUE )
+ importance <- signif( summary(pca)$importance[2,], digits = 2) * 100
+ pca <- as.data.frame(pca$x)
+ #names(pca) <- paste0( names(pca), " (", signif( importance, digits = 2) * 100, "%)" )
+ pca$groups <- meta$group[ match( rownames(pca), meta$sample )]
+ return(list( pca, importance) )
+ }
>
> # sort out clusters table
>
> fix_clusters <- function(clusters){
+ clusters$FDR <- signif( clusters$FDR, digits = 3)
+ clusters$coord <- paste0( clusters$chr, ":", clusters$start, "-", clusters$end)
+ clusters <- clusters[ order(clusters$FDR, decreasing = FALSE),]
+ # removed ensemblID - this could be an option?
+ #clusters <- select( clusters, clusterID, N, coord, gene, annotation, FDR, verdict)
+ clusters <- select(
+ clusters,
+ clusterID,
+ N,
+ coord,
+ gene,
+ annotation,
+ FDR)
+ clusters$gene <- paste0("<i>",clusters$gene,"</i>")
+ return(clusters)
+ }
>
> # use on all.introns
> fix_introns <- function(introns){
+ introns <- select(introns,
+ clusterID,
+ gene,
+ ensemblID,
+ chr,
+ start,
+ end,
+ verdict,
+ deltapsi,
+ #constitutive.score,
+ transcripts)
+ #introns$constitutive.score <- signif(introns$constitutive.score, digits = 3)
+ introns$deltapsi<- round(introns$deltapsi, digits = 3)
+ return(introns)
+ }
>
>
> cluster_summary <- function(clusters){
+ summary <- data.frame(
+ Results = c(
+ paste0("Number of differentially spliced clusters at FDR = ", FDR_limit) ,
+ "Fully annotated",
+ "Contain unannotated junctions"),
+ n = c( nrow(clusters),
+ nrow( clusters[ clusters$annotation == "annotated", ]),
+ nrow( clusters[ clusters$annotation == "cryptic", ])
+ )
+ )
+ return(summary)
+ }
>
> intron_summary <- function(all.introns){
+ summary <- data.frame(
+ Results = c( "Number of fully annotated junctions",
+ "Number of junctions with cryptic 5' splice site",
+ "Number of junctions with cryptic 3' splice site",
+ "Number of junctions with two cryptic splice sites",
+ "Number of novel junctions that connect two annotated splice sites"),
+
+ n = c( nrow(all.introns[ all.introns$verdict == "annotated",]),
+ nrow(all.introns[ all.introns$verdict == "cryptic_fiveprime",]),
+ nrow(all.introns[ all.introns$verdict == "cryptic_threeprime",]),
+ nrow(all.introns[ all.introns$verdict == "cryptic_unanchored",]),
+ nrow(all.introns[ all.introns$verdict == "novel annotated pair",])
+ )
+ )
+ return( summary )
+ }
>
> # create all the objects for visualisation
>
> clusters <- fix_clusters(all.clusters)
> introns <- fix_introns(all.introns)
> intron_summary <- intron_summary(all.introns)
> cluster_summary <- cluster_summary(all.clusters)
> introns_to_plot <- get_intron_meta(rownames(counts))
> cluster_ids <- introns_to_plot$clu
> pca <- make_pca(counts,meta)
>
> # save all the objects needed by Leafcutter viz into single Rdata file
> # include the mode variable
>
> save( introns,
+ clusters,
+ counts,
+ meta,
+ exons_table,
+ pca,
+ intron_summary,
+ cluster_summary,
+ introns_to_plot,
+ cluster_ids,
+ sample_table,
+ annotation_code,
+ code,
+ file = paste0( resultsFolder, "/",code,"_results.Rdata")
+ )
>
>
> quit()
derby10:resaf 67%
from leafcutter.
I do get the same issue when I run run_leafviz.R on the cluster.
nantucket:resaf 48% Rscript run_leafviz.R /udd/resaf/data/leafcutter/21_08_17_2/leafcutter_results.Rdata
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘DT’
The following objects are masked from ‘package:shiny’:
dataTableOutput, renderDataTable
Warning messages:
1: replacing previous import ‘dplyr::combine’ by ‘gridExtra::combine’ when loading ‘leafcutter’
2: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘leafcutter’
3: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘leafcutter’
Attaching package: ‘gridExtra’
The following object is masked from ‘package:dplyr’:
combine
[1] "Loading results from /udd/resaf/data/leafcutter/21_08_17_2/leafcutter_results.Rdata"
Listening on http://127.0.0.1:4152
Attaching package: ‘shinyjs’
The following object is masked from ‘package:intervals’:
show
The following object is masked from ‘package:Rcpp’:
show
The following object is masked from ‘package:shiny’:
runExample
The following objects are masked from ‘package:methods’:
removeClass, show
[1] "new row selected!"
[1] "VALUE: 1"
clusterID gene ensemblID chr start end verdict
1: clu_28703 SULT1A1 ENSG00000196502.11_3 chr16 28619709 28619799 annotated
2: clu_28703 SULT1A1 ENSG00000196502.11_3 chr16 28619709 28620029 annotated
3: clu_28703 SULT1A1 ENSG00000196502.11_3 chr16 28619924 28620029 annotated
deltapsi transcripts
1: 0.005 -
2: -0.016 -
3: 0.010 -
chr start end clu middle verdict dPSI rank
1 16 28619709 28619799 clu_28703 28619754 annotated 0.005 c
2 16 28619709 28620029 clu_28703 28619869 annotated -0.016 a
3 16 28619924 28620029 clu_28703 28619976 annotated 0.010 b
[1] chr start end strand gene_name
<0 rows> (or 0-length row.names)
Warning: Error in make_cluster_plot: object 'gene_name_df' not found
Stack trace (innermost first):
103: make_cluster_plot
102: print
101: withCallingHandlers
100: suppressWarnings
99: renderPlot [/udd/resaf/data/leafcutter/leafcutter/leafviz/server.R#206]
89: <reactive:plotObj>
78: plotObj
77: origRenderFunc
76: output$select_cluster_plot
1: shiny::runApp
[1] "gene is: SULT1A1"
Warning: Error in : nrow(clusters) > 0 is not TRUE
Stack trace (innermost first):
105: stopifnot
104: make_gene_plot
103: selectGenePlotInput [/udd/resaf/data/leafcutter/leafcutter/leafviz/server.R#228]
102: print
101: withCallingHandlers
100: suppressWarnings
99: renderPlot [/udd/resaf/data/leafcutter/leafcutter/leafviz/server.R#251]
89: <reactive:plotObj>
78: plotObj
77: origRenderFunc
76: output$select_gene_plot
1: shiny::runApp
from leafcutter.
Looking through the error message there it looks like you have differing chromosome names in two crucial tables. One table has "chr16" and another has "16".
Would you mind running this R script for me and sending over what gets printed out please? Do you also mind telling me when you ran the leafcutter differential splicing analysis?
library(leafcutter)
load("/Users/resaf/workspace/leafcutter/leafcutter_results.Rdata")
cluster_to_plot <- "clu_28703"
y <- t(counts[ cluster_ids==cluster_to_plot, ])
intron_meta <- get_intron_meta(colnames(y))
head(intron_meta)
head(exons_table)
from leafcutter.
ok, it's definitely a naming issue. I removed the "chr" from my chromosomes from the Leafcutter counts and I can replicate both your error messages exactly. Bug fix incoming.
from leafcutter.
Ok thanks!
from leafcutter.
try re-installing from GitHub and try again. Please let me know if it works now.
from leafcutter.
This is still not working. I'm not sure if this is causing the issue, but when I look at the effect sizes output file it looks like this (no 'chr' before intron name):
derby10:resaf 126% head leafcutter_ds_effect_sizes.txt
intron logef 0 1or2 deltapsi
10:226068:243851:clu_20403 0.189694441262024 0.0200625431623083 0.0264896035170921 0.00642706035478383
10:226068:255829:clu_20403 -0.0949135675988039 0.0200479538705905 0.0199138779275679 -0.000134075943022623
10:243945:255829:clu_20403 -0.0947808736632202 0.959889502967101 0.95359651855534 -0.00629298441176118
10:255988:267135:clu_20404 -0.0375214947634933 0.0918258221045608 0.0815710837012038 -0.010254738403357
10:255988:282778:clu_20404 -0.0682966159348038 0.0772874111756758 0.0665755406502442 -0.0107118705254316
10:267296:282778:clu_20404 0.105818110698297 0.830886766719763 0.851853375648552 0.0209666089287885
10:285465:285996:clu_20406 -0.13039841528462 0.969573724746609 0.960862609083137 -0.00871111566347194
10:285465:285999:clu_20406 0.13039841528462 0.0304262752533915 0.0391373909168633 0.00871111566347181
10:889008:890917:clu_20409 0.029054722634462 0.0673955698186409 0.07353126345163 0.00613569363298906
While the cluster significance file looks like this:
cluster status loglr df p
chr10:clu_20402 <=1 sample with coverage>min_coverage NA NA NA
chr10:clu_20403 Success -1.22606892216902 2 1
chr10:clu_20404 Success 4.03529305625125 2 0.017680497973431
chr10:clu_20405 <=1 sample with coverage>min_coverage NA NA NA
chr10:clu_20406 Success 0.706298349754775 1 0.234625804248477
chr10:clu_20407 <=1 sample with coverage>min_coverage NA NA NA
chr10:clu_20408 Not enough valid samples NA NA NA
chr10:clu_20409 Success -0.990728937957101 2 1
chr10:clu_20410 Success 1.63190522102923 3 0.352711401620416
This causes a problem when I run prepare_results.R as the two files cannot be merged. As a result I modified your script slightly to add 'chr' to the cluster names that come out of the effect sizes. Basically I added these 2 lines after line 119 in prepare_results.R. This allows 'all.introns' to be formed correctly.
effectSizes$cluster <- paste("chr", effectSizesSplit$chr, ":", effectSizesSplit$clusterID, sep = "")
effectSizes$chr<-paste("chr", effectSizes$chr, sep = "")
from leafcutter.
Actually never mind, this does work now! I hadn't reinstalled the R package. The plots look great! Thank you
from leafcutter.
Related Issues (20)
- Does Leafviz work on leafcutterMD result? HOT 3
- error in leafcutter_cluster_regtools.py
- Not reproducible results
- Redundant Splice Event Clustering HOT 1
- Confusion about Proper Input File Extension and Content
- Question about applying PCA on intron usage ratios
- example_data missing
- Bug found in hg38 annotation codes file for LeafViz HOT 1
- how to identify the alternative splicing events for differential cluster, such as ES/MXE/IR/Alt5’/Alt3’
- shinyapp leafcutter HOT 1
- enhancement HOT 2
- Behavior of -p --mincluratio: unexpectedly removing intron clusters HOT 1
- Potentially incorrect use of minclureads and minreads HOT 4
- confusion of normalization method for sQTL input date prepare HOT 1
- Alternative splicing signal on tandem duplicated genes HOT 1
- Creating co-splicing networks with leafcutter output?
- this is broken
- differential splicing with continuous independent variable
- Positive splicing events are not be detected HOT 2
- Start and end sites of junctions
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from leafcutter.