Git Product home page Git Product logo

Comments (13)

jackhump avatar jackhump commented on July 18, 2024

Hi,

Can you paste in exactly which commands you ran? If you ran it from the command line there should be a horrible looking error message printed there, could you send that over as well please?

The gene_name_df is used in the make_cluster_plot function, which I've been tinkering with heavily for the past few weeks so it's probably something stupid in there.

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

To run the shiny app I did:
PC002047:leafviz resaf$ Rscript run_leafviz.R /Users/resaf/workspace/leafcutter/leafcutter_results.Rdata

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Attaching package: ‘DT’

The following objects are masked from ‘package:shiny’:

dataTableOutput, renderDataTable

Warning messages:
1: replacing previous import ‘dplyr::combine’ by ‘gridExtra::combine’ when loading ‘leafcutter’
2: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘leafcutter’
3: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘leafcutter’

Attaching package: ‘gridExtra’

The following object is masked from ‘package:dplyr’:

combine

[1] "Loading results from /Users/resaf/workspace/leafcutter/leafcutter_results.Rdata"

Listening on http://127.0.0.1:7826

Attaching package: ‘shinyjs’

The following object is masked from ‘package:intervals’:

show

The following object is masked from ‘package:Rcpp’:

show

The following object is masked from ‘package:shiny’:

runExample

The following objects are masked from ‘package:methods’:

removeClass, show

Warning: Error in make_cluster_plot: object 'gene_name_df' not found
Stack trace (innermost first):
103: make_cluster_plot
102: print
101: withCallingHandlers
100: suppressWarnings
99: renderPlot [/Users/resaf/workspace/leafcutter/leafviz/server.R#191]
89: reactive:plotObj
78: plotObj
77: origRenderFunc
76: output$select_cluster_plot
1: shiny::runApp
Warning: Error in : nrow(myclusters) > 0 is not TRUE
Stack trace (innermost first):
104: stopifnot
103: make_gene_plot
102: print
101: withCallingHandlers
100: suppressWarnings
99: renderPlot [/Users/resaf/workspace/leafcutter/leafviz/server.R#205]
89: reactive:plotObj
78: plotObj
77: origRenderFunc
76: output$select_gene_plot
1: shiny::runApp

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

To make the Rdata object I did:
Rscript prepare_results.R --iFolder /udd/resaf/data/leafcutter/21_08_17_2/
--outFolder /udd/resaf/data/leafcutter/21_08_17_2/
--support /udd/resaf/data/leafcutter/21_08_17_2/ds_file.txt
--annotation_code /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19
--code leafcutter
--FDR 0.1

from leafcutter.

jackhump avatar jackhump commented on July 18, 2024

And did the prepare_results.R step ran without any warnings or errors? Thank you for your patience, this is really useful.

from leafcutter.

jackhump avatar jackhump commented on July 18, 2024

One more thing, are your certain that the /Users/resaf/workspace/leafcutter/leafcutter_results.Rdata you used with run_leafvis.R is the same Rdata file that you generated with prepare_results.R? Did you move the file after creating it?

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

I'm certain that the Rdata file is the same, I moved it after creating it because I wanted to test if the shiny app creation would work on my laptop after it didn't work on our cluster.

Here is the output of prepare_results.R. There were no errors.

derby10:resaf 66% Rscript prepare_results.R --iFolder /udd/resaf/data/leafcutter/21_08_17_2/ \
?                          --outFolder /udd/resaf/data/leafcutter/21_08_17_2/ \
?                          --support /udd/resaf/data/leafcutter/21_08_17_2/ds_file.txt\
?                          --annotation_code  /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19 \
?                          --code leafcutter \
?                          --FDR 0.1    
Loading required package: leafcutter
Loading required package: Rcpp
Warning messages:
1: replacing previous import ‘dplyr::combine’ by ‘gridExtra::combine’ when loading ‘leafcutter’ 
2: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘leafcutter’ 
3: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘leafcutter’ 
> 
> opt <- parse_args(
+   OptionParser(
+     option_list=list(
+       make_option( c("-i", "--iFolder"), help = "The folder where the LeafCutter differential splicing was run"),
+       make_option( c("-o","--outFolder"), help = "The folder where the results object will be saved" ),
+       make_option( "--support", default=NULL, help="The support file used in the differential splicing analysis. Columns should be file name and condition"),
+       make_option( "--annotation_code", default=NULL, help = "a path to the annotation files of exons, introns and splice sites"),
+       make_option( c("-f","--FDR"), default=0.05, help = "the adjusted p value threshold to use"),
+       make_option( "--code", default=NULL, help = "the same dataset-specific code used throughout the pipeline"))
+ 	 )
+   )
> 
> outFolder <- opt$outFolder
> iFolder <- opt$iFolder
> species <- opt$species
> groups_file <- opt$support
> # counts_file <- opt$counts_file
> annotation_code <- opt$annotation_code
> code <- opt$code
> FDR_limit <- opt$FDR
> 
> 
> cat("Preparing for visualisation\n")
Preparing for visualisation
> 
> # results
> effect.sizes.file <- paste0(iFolder,"/",code,"_ds_effect_sizes.txt") 
> results.file <- paste0(iFolder, "/",code,"_ds_cluster_significance.txt")
> counts_file <- paste0(iFolder, "/",code, "_perind_numers.counts.gz")
> # annotation
> exon_file <- paste0(annotation_code, "_all_exons.txt.gz")
> all_introns <- paste0(annotation_code,"_all_introns.bed.gz" )
> threeprime_file <- paste0( annotation_code,"_threeprime.bed.gz")
> fiveprime_file <- paste0( annotation_code,"_fiveprime.bed.gz")
> 
> 
> # CHECK DEPENDENCY FILES
> pass <- TRUE
> errorMessage <- c()
> 
> # Dependent on
> # effects sizes and results file, not counts or groups
> 
> for( file in 
+      c(effect.sizes.file, results.file, counts_file, groups_file, 
+        all_introns, threeprime_file, fiveprime_file, exon_file
+      )){
+   if( !file.exists(file)){
+     pass <- FALSE
+     errorMessage <- c(errorMessage,  paste0(file, " does not exist\n") )
+   }
+ }
> if(!pass){
+   #print(errorMessage)
+   stop(errorMessage)
+ }
> cat(paste("results to be saved in:",outFolder, "\n") )
results to be saved in: /udd/resaf/data/leafcutter/21_08_17_2/ 
> cat(paste("using annotation found in:", annotation_code,"\n") )
using annotation found in: /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19 
> 
> 
> library(data.table)
> #library(leafcutter)
> library(stringr)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> 
> 
> # CHECK BEDTOOLS IS INSTALLED
> if( !file.exists(system("which bedtools", intern=TRUE) )){
+   stop("bedtools is not in your $PATH - have you installed it?")
+ }
> 
> if(file.exists(counts_file)){
+   cat("Loading counts from",counts_file,"\n")
+   counts <- read.table(counts_file)
+ }
Loading counts from /udd/resaf/data/leafcutter/21_08_17_2//leafcutter_perind_numers.counts.gz 
> 
> if(file.exists(groups_file)){
+   cat("Loading metadata from",groups_file,"\n")
+   meta <- read.table(groups_file, header=F, stringsAsFactors = F)
+   colnames(meta)=c("sample","group")
+   
+   sample_table <- data.frame( group = names(table(meta$group) ), count = as.vector(table(meta$group)) )
+ }
Loading metadata from /udd/resaf/data/leafcutter/21_08_17_2/ds_file.txt 
> # exon table no longer used for anything - just saved with the Rdata object at the end
> 
> exons_table=if (!is.null( exon_file )) {
+   cat("Loading exons from",exon_file,"\n")
+   #read_table(exon_file)
+   as.data.frame(fread(paste("zless",exon_file)) )
+ } else {
+   cat("No exon_file provided.\n")
+   NULL
+ }
Loading exons from /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_exons.txt.gz 
> 
> resultsFolder <- outFolder
> if( !dir.exists(resultsFolder)){
+   dir.create(resultsFolder)
+ }
> 
> 
> effectSizes <- fread(effect.sizes.file, stringsAsFactors = FALSE, header=TRUE   )
> effectSizesSplit <-  as.data.frame(str_split_fixed(effectSizes$intron, ":", 4), stringsAsFactors = FALSE )
> names(effectSizesSplit) <- c("chr","start","end","clusterID")
> 
> print(head(effectSizes))
                       intron       logef          0       1or2      deltapsi
1: 10:226068:243851:clu_20403  0.18969444 0.02006254 0.02648960  0.0064270604
2: 10:226068:255829:clu_20403 -0.09491357 0.02004795 0.01991388 -0.0001340759
3: 10:243945:255829:clu_20403 -0.09478087 0.95988950 0.95359652 -0.0062929844
4: 10:255988:267135:clu_20404 -0.03752149 0.09182582 0.08157108 -0.0102547384
5: 10:255988:282778:clu_20404 -0.06829662 0.07728741 0.06657554 -0.0107118705
6: 10:267296:282778:clu_20404  0.10581811 0.83088677 0.85185338  0.0209666089
> print(head(effectSizesSplit))
  chr  start    end clusterID
1  10 226068 243851 clu_20403
2  10 226068 255829 clu_20403
3  10 243945 255829 clu_20403
4  10 255988 267135 clu_20404
5  10 255988 282778 clu_20404
6  10 267296 282778 clu_20404
> 
> effectSizes <- cbind( effectSizes, effectSizesSplit)
> effectSizes$cluster <- paste("chr", effectSizesSplit$chr, ":", effectSizesSplit$clusterID, sep = "")
> effectSizes$chr<-paste("chr", effectSizes$chr, sep = "")
> 
> results <- fread(results.file, stringsAsFactors = F)
> 
> results$FDR <- p.adjust( results$p, method = "fdr")
> 
> all.introns <- merge(x = results, y = effectSizes, by = "cluster")
> all.introns <- all.introns[ order(all.introns$FDR),]
> 
> all.introns <- subset( all.introns, FDR <= FDR_limit )
> all.introns$start <- as.numeric(all.introns$start)
> all.introns$end <- as.numeric(all.introns$end)
> 
> 
> # for each splice site write out a bed file  
> all.junctions <- dplyr::select(all.introns, chr, start, end, clusterID)
> 
> all.fiveprime <- data.frame( chr = all.introns$chr,
+                              start = all.introns$start,
+                              end = as.numeric( as.character(all.introns$start) ) + 1,
+                              clusterID = all.introns$clusterID)
> all.threeprime <- data.frame( chr = all.introns$chr,
+                              start = all.introns$end,
+                              end = as.numeric( as.character(all.introns$end) ) + 1,
+                              clusterID = all.introns$clusterID)
> all.file <- paste0(resultsFolder, "/all_junctions.bed")
> all.fiveprime.file <- paste0(resultsFolder, "/all_fiveprime.bed")
> all.threeprime.file <- paste0(resultsFolder, "/all_threeprime.bed")
> 
> write.table( all.junctions, all.file, col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t" )
> write.table( all.threeprime, all.threeprime.file, col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t" )
> write.table( all.fiveprime, all.fiveprime.file, col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t" )
> 
> print( "BedTools intersect junctions with list of known splice sites")
[1] "BedTools intersect junctions with list of known splice sites"
> # first match junctions 
> all.introns.cmd <- paste0("/app/[email protected]/bin/bedtools intersect -a ", all.file, " -b ", all_introns, " -wa -wb -loj -f 1" )
>   
> all.introns_intersect <- fread(all.introns.cmd)
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_introns.bed.gz has inconsistent naming convention for record:
GL000220.1	154617	154624	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_introns.bed.gz has inconsistent naming convention for record:
GL000220.1	154617	154624	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

> 
> # intersect with bedtools to find the annotations of each splice site
> threeprime.cmd <- paste0( "/app/[email protected]/bin/bedtools intersect -a ", all.threeprime.file, " -b ",threeprime_file, " -wa -wb -loj -f 1" )
> 
> threeprime_intersect <- fread(threeprime.cmd)
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_threeprime.bed.gz has inconsistent naming convention for record:
GL000220.1	154624	154625	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_threeprime.bed.gz has inconsistent naming convention for record:
GL000220.1	154624	154625	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

> 
> fiveprime.cmd <- paste0( "/app/[email protected]/bin/bedtools intersect -a ", all.fiveprime.file, " -b ", fiveprime_file, " -wa -wb -loj -f 1" )
> 
> fiveprime_intersect <- fread(fiveprime.cmd)
***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_fiveprime.bed.gz has inconsistent naming convention for record:
GL000220.1	154617	154618	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

***** WARNING: File /udd/resaf/data/leafcutter/leafcutter/leafviz/annotation_codes/gencode_hg19/gencode_hg19_fiveprime.bed.gz has inconsistent naming convention for record:
GL000220.1	154617	154618	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

> 
> # remove temporary files
> rm.cmd <- paste("rm ", all.file, all.fiveprime.file, all.threeprime.file) 
> system(rm.cmd)
> 
> # now I have two lists of splice site annotation
> # for testing
> #cluster <- all.introns[ all.introns$clusterID == "clu_4879" , ]
> 
> print("Annotating junctions")
[1] "Annotating junctions"
> 
> verdict.list <- list()
> coord.list <- list()
> gene.list <- list()
> ensemblID.list <- list()
> transcripts.list <- list()
> constitutive.list <- list()
> classification.list <- list()
> 
> # for testing!
> clu <- "clu_59455"
> 
> clusters <- unique( all.introns$clusterID ) 
> for( clu in clusters ){
+   # for each intron in the cluster, check for coverage of both
+   # output a vector of string descriptions 
+   cluster <- all.introns[ all.introns$clusterID == clu , ]
+   
+   # for each intron in the cluster:
+   #   create vector of overlapping splice sites, indexed by the row of the intersect
+   # five prime splice sites
+   fprime <- apply( cluster, MAR = 1, FUN = function(x) {
+     chr <- which( names(cluster) == "chr" )
+     start <- which( names(cluster) == "start" )
+     fiveprime_intersect[   
+       fiveprime_intersect$V1 == x[chr] & 
+       fiveprime_intersect$V2 == as.numeric( x[start] ),]
+   } )
+   # three prime splice sites
+   tprime <- apply( cluster, MAR = 1, FUN = function(x) {
+     chr <- which( names(cluster) == "chr" )
+     end <- which( names(cluster) == "end" )
+     threeprime_intersect[   
+       threeprime_intersect$V1 == x[chr] & 
+       threeprime_intersect$V2 == as.numeric( x[end] ),]
+   } )
+ 
+   # both splice sites
+   bothSS <-  apply( cluster, MAR = 1, FUN = function(x) {
+       chr <- which( names(cluster) == "chr" )
+       start <- which(names(cluster) == "start")
+       end <- which( names(cluster) == "end" )
+       all.introns_intersect[   
+         all.introns_intersect$V1 == x[chr] &
+           all.introns_intersect$V2 == x[start] &
+           all.introns_intersect$V3 == as.numeric( x[end] ) &
+           all.introns_intersect$V4 == clu,]
+     } )
+ 
+   # find gene and ensemblID by the most represented gene among all the splice sites
+   cluster_genes <- names(sort(table(do.call( what = rbind, tprime )$V8), decreasing = TRUE ))
+ 
+   cluster_gene <- cluster_genes[ cluster_genes != "." ][1]
+   # if no cluster gene found then leave as "."
+   if( length(cluster_gene) == 0){
+     cluster_gene == "."
+   }
+   # do the same for EnsemblID
+   cluster_ensemblIDs <- names(sort(table(do.call( what = rbind, tprime )$V9), decreasing = TRUE ))
+   cluster_ensemblID <- cluster_ensemblIDs[ cluster_ensemblIDs != "." ][1]
+   if( length( cluster_ensemblID ) == 0 ){
+     cluster_ensemblID == "."
+   }
+ 
+   verdict <- c()
+   coord <- c()
+   gene <- c()
+   ensemblID <- c()
+   transcripts <- list() 
+   
+   for( intron in 1:nrow(cluster) ){
+     coord[intron] <- paste(cluster[intron]$chr,cluster[intron]$start, cluster[intron]$end )
+ 
+     gene[intron] <- cluster_gene
+     ensemblID[intron] <- cluster_ensemblID
+ 
+     # for each intron create vector of all transcripts that contain both splice sites
+     transcripts[[intron]] <- unique( intersect( tprime[[intron]]$V10, fprime[[intron]]$V10 ) ) 
+ 
+     verdict[intron] <- "error"
+     if( # if neither are annotated
+     all( tprime[[intron]]$V5 == ".") & all( fprime[[intron]]$V5 == "." )
+     ){ verdict[intron] <- "cryptic_unanchored"
+     }
+     if( # if only one is annotated
+     all( tprime[[intron]]$V5 == ".") & all( fprime[[intron]]$V5 != "." )
+     ){ verdict[intron] <- "cryptic_threeprime"
+     }
+     if(
+     all( tprime[[intron]]$V5 != ".") & all( fprime[[intron]]$V5 == "." )
+     ){ verdict[intron] <- "cryptic_fiveprime"
+     }
+     if( # if both splice sites are annotated
+       all( tprime[[intron]]$V5 != "." ) & all( fprime[[intron]]$V5 != "." )
+     ){ 
+       # test if the splice sites are paired in a known intron
+       if( all(bothSS[[intron]]$V5 != ".") ){
+         verdict[intron] <- "annotated"
+       }else{ # both are annotated but never in the same junction
+         verdict[intron] <- "novel annotated pair"
+       }
+     }
+     verdict.list[[clu]] <- verdict
+     coord.list[[clu]] <- coord
+     gene.list[[clu]] <- gene
+     ensemblID.list[[clu]] <- ensemblID
+     #transcripts.list[[clu]] <- transcripts
+ 
+     # once all the transcripts for all the introns are found, go back and work out how many constitutive each junction is. Does the junction appear in every transcript? 
+ 
+     if( intron == nrow(cluster)){ # only on final intron
+       all_transcripts <- unique( unlist( transcripts ) )
+       # remove "." - non-existent transcripts
+       all_transcripts <- all_transcripts[ all_transcripts != "." ]
+  
+       constitutive <- lapply( transcripts, FUN = function(x) {
+         # for each intron how many transcripts is it seen in?
+         x <- x[ x != "." ]
+         length(x) / length( all_transcripts)
+ 
+         })
+ 
+       constitutive.list[[clu]] <- constitutive
+ 
+       # collapse all.introns transcripts for each intron into a single string
+       transcripts.list[[clu]] <- lapply(transcripts, FUN = function(x) paste( x, collapse = "+" ) )
+ 
+     }
+ 
+   }
+ 
+   # predicting the event type from the shape of the junctions
+   #print(clu)
+ 
+   if( nrow(cluster) != 3){ 
+     classification.list[[clu]] <- "." 
+     next
+   }else{
+     classification.list[[clu]] <- "."
+ 
+     tab <- select(cluster, start, end)
+     
+     # the junctions are sorted by start and end coordinates
+ 
+     # check for the presence of a junction that spans the entire length of the cluster
+     if( !any(  which( tab$start == min(tab$start) ) %in% which( tab$end == max(tab$end) )  ) ){
+       classification.list[[clu]] <- "."
+       next
+     }
+ 
+     # therefore for a cassette exon arrangement the longest junction always comes second 
+     if( which( tab$start ==  min(tab$start) & tab$end == max(tab$end ) ) != 2 ){
+      classification.list[[clu]] <- "." 
+      next 
+     }
+ 
+     # now we know that junction 2 is the parent, junction 1 is the left most child and junction 3 is the right most
+     # check that the end of junction 1 comes before the start of junction 3
+ 
+     if( tab[1,"end"] > tab[3,"start"] ){
+       classification.list[[clu]] <- "."
+       next
+     }
+ 
+     # double check the starts and ends
+     if( tab[1, "start"] != tab[2,"start"] | tab[3,"end"] != tab[2,"end"] ){
+       classification.list[[clu]] <- "."
+       next
+     }
+ 
+     # work out direction of change
+     if( cluster[1, "deltapsi"] > 0 & cluster[3, "deltapsi"] > 0 & cluster[2,"deltapsi"] < 0){
+       classification.list[[clu]] <- "cassette exon - increased"
+     }
+     if( cluster[1, "deltapsi"] < 0 & cluster[3, "deltapsi"] < 0 & cluster[2,"deltapsi"] > 0){
+       classification.list[[clu]] <- "cassette exon - decreased"
+     }
+ 
+     # work out annotation status
+     if( all( verdict.list[[clu]] == "annotated") ){
+       classification.list[[clu]] <- paste0( classification.list[[clu]], " - annotated")
+     }
+ 
+     if( verdict.list[[clu]][2] == "annotated" & verdict.list[[clu]][1] != "annotated" & verdict.list[[clu]][3] != "annotated"  ){
+       classification.list[[clu]] <- paste0( classification.list[[clu]], " - cryptic")
+     }
+ 
+     if( verdict.list[[clu]][2] == "novel annotated pair" & verdict.list[[clu]][1] == "annotated" & verdict.list[[clu]][3] == "annotated"  ){
+       classification.list[[clu]] <- paste0( classification.list[[clu]], " - skiptic")
+     }
+ 
+   }
+   
+ }
> 
> print("Preparing results")
[1] "Preparing results"
> 
> # match the lists together
> all.introns$verdict <- unlist(verdict.list)[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
> 
> all.introns$gene <- unlist(gene.list)[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
> 
> all.introns$ensemblID <- unlist(ensemblID.list)[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
> 
> all.introns$transcripts <- unlist( transcripts.list )[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
> 
> all.introns$constitutive.score <-  unlist( constitutive.list )[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
> 
> #all.introns$prediction <-  unlist( classification.list )[ match( paste( all.introns$chr, all.introns$start, all.introns$end ), unlist(coord.list)) ]
> 
> # replace NA values with "."
> all.introns$gene[ is.na( all.introns$gene) ] <- "."
> all.introns$ensemblID[ is.na( all.introns$ensemblID) ] <- "."
> # replace missing transcripts with "."
> all.introns[ all.introns$transcripts == "", ]$transcripts <- "."
> all.introns$constitutive.score <- signif(all.introns$constitutive.score, digits = 2)
> 
> # prepare results
> results$clusterID <- str_split_fixed(results$cluster, ":", 2)[,2]
> results$N <- results$df + 1
> sig <- subset(results, FDR < FDR_limit)
> sig$clusterID <- str_split_fixed(sig$cluster, ":", 2)[,2]
> 
> all.clusters <- lapply(sig$clusterID, FUN = function(clu){
+   cluster <- all.introns[ all.introns$clusterID == clu, ]
+   chr <- unique( cluster$chr )[1] # this should always be one number
+   start <- min( cluster$start )
+   end <- max( cluster$end )
+   # get most common gene name that is not "."
+   gene <- names( sort( table( unique(cluster$gene) ), decreasing = TRUE ) )[1]
+   ensemblID <- names( sort( table( unique(cluster$ensemblID) ), decreasing = TRUE ) )[1]
+   annotation <- "annotated"
+   if( any(grepl( "cryptic", cluster$verdict)) | any( grepl("novel annotated pair", cluster$verdict)) ){
+     annotation <- "cryptic"
+   }  
+   return( 
+     data.frame( 
+       clusterID = clu, 
+       chr = chr, 
+       start = start, 
+       end = end, 
+       gene = gene, 
+       ensemblID = ensemblID, 
+       annotation = annotation ) )
+   })
> all.clusters <- do.call( what = rbind, args = all.clusters)
> 
> all.clusters$FDR  <- results$FDR[ match( all.clusters$clusterID, results$clusterID)]
> all.clusters$FDR <- signif( all.clusters$FDR, digits = 3)
> all.clusters$N  <- results$N[ match( all.clusters$clusterID, results$clusterID)]
> 
> # add classification 
> all.clusters$verdict <- unlist(classification.list)[ match(all.clusters$clusterID, names(classification.list))]
> 
> # write out 
> cluster_results <- paste0(resultsFolder,"/per_cluster_results.tab")
> intron_results <- paste0(resultsFolder, "/per_intron_results.tab")
> 
> write.table( all.clusters, cluster_results, quote = FALSE, row.names = FALSE, sep = "\t" )
> write.table( all.introns, intron_results, quote = FALSE, row.names = FALSE, sep = "\t" )
> 
> # prepare for PCA
> counts <- counts[,meta$sample]
> meta$group <- as.factor(meta$group)
> 
> 
> make_pca <- function(counts,meta){
+   dev <- apply( counts, MAR = 1, FUN = sd )
+   # remove rows with 0 variance
+   counts <- counts[ dev != 0, ]
+   pca <- prcomp( t(counts), scale = TRUE )
+   importance <- signif( summary(pca)$importance[2,], digits = 2) * 100
+   pca <- as.data.frame(pca$x)
+   #names(pca) <- paste0( names(pca), " (", signif( importance, digits =  2) * 100, "%)" )
+   pca$groups <- meta$group[ match( rownames(pca), meta$sample )]
+   return(list( pca, importance) )
+ }
> 
> # sort out clusters table
> 
> fix_clusters <- function(clusters){
+   clusters$FDR <- signif( clusters$FDR, digits = 3)
+   clusters$coord <- paste0( clusters$chr, ":", clusters$start, "-", clusters$end)
+   clusters <- clusters[ order(clusters$FDR, decreasing = FALSE),]
+   # removed ensemblID - this could be an option?
+   #clusters <- select( clusters, clusterID, N, coord, gene, annotation, FDR, verdict)
+   clusters <- select( 
+     clusters, 
+     clusterID, 
+     N, 
+     coord, 
+     gene, 
+     annotation, 
+     FDR)
+   clusters$gene <- paste0("<i>",clusters$gene,"</i>")
+ return(clusters)
+ }
> 
> # use on all.introns
> fix_introns <- function(introns){
+   introns <- select(introns, 
+     clusterID, 
+     gene, 
+     ensemblID, 
+     chr, 
+     start, 
+     end, 
+     verdict, 
+     deltapsi, 
+     #constitutive.score, 
+     transcripts)
+   #introns$constitutive.score <- signif(introns$constitutive.score, digits = 3)
+   introns$deltapsi<- round(introns$deltapsi, digits = 3)
+ return(introns)
+ }
> 
> 
> cluster_summary <- function(clusters){
+   summary <- data.frame( 
+               Results = c(
+                 paste0("Number of differentially spliced clusters at FDR = ", FDR_limit) , 
+                         "Fully annotated",
+                         "Contain unannotated junctions"),
+                 n = c( nrow(clusters),
+                        nrow( clusters[ clusters$annotation == "annotated", ]),
+                        nrow( clusters[ clusters$annotation == "cryptic", ]) 
+                        ) 
+                 )
+   return(summary)
+ }
> 
> intron_summary <- function(all.introns){
+     summary <- data.frame( 
+                 Results = c( "Number of fully annotated junctions",
+                              "Number of junctions with cryptic 5' splice site", 
+                               "Number of junctions with cryptic 3' splice site",  
+                               "Number of junctions with two cryptic splice sites",
+                               "Number of novel junctions that connect two annotated splice sites"),
+ 
+                   n = c( nrow(all.introns[ all.introns$verdict == "annotated",]),
+                          nrow(all.introns[ all.introns$verdict == "cryptic_fiveprime",]),
+                          nrow(all.introns[ all.introns$verdict == "cryptic_threeprime",]),
+                          nrow(all.introns[ all.introns$verdict == "cryptic_unanchored",]),
+                          nrow(all.introns[ all.introns$verdict == "novel annotated pair",])  
+                   )
+                 )
+     return( summary )
+ }
> 
> # create all the objects for visualisation
> 
> clusters <- fix_clusters(all.clusters)
> introns <- fix_introns(all.introns)
> intron_summary <- intron_summary(all.introns)
> cluster_summary <- cluster_summary(all.clusters) 
> introns_to_plot <- get_intron_meta(rownames(counts))
> cluster_ids <- introns_to_plot$clu 
> pca <- make_pca(counts,meta)
> 
> # save all the objects needed by Leafcutter viz into single Rdata file
> # include the mode variable 
> 
> save( introns, 
+       clusters, 
+       counts, 
+       meta, 
+       exons_table, 
+       pca, 
+       intron_summary, 
+       cluster_summary, 
+       introns_to_plot,
+       cluster_ids,
+       sample_table,
+       annotation_code,
+       code,
+       file = paste0( resultsFolder, "/",code,"_results.Rdata")
+ )
> 
> 
> quit()
derby10:resaf 67% 

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

I do get the same issue when I run run_leafviz.R on the cluster.

nantucket:resaf 48% Rscript run_leafviz.R /udd/resaf/data/leafcutter/21_08_17_2/leafcutter_results.Rdata  

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Attaching package: ‘DT’

The following objects are masked from ‘package:shiny’:

    dataTableOutput, renderDataTable

Warning messages:
1: replacing previous import ‘dplyr::combine’ by ‘gridExtra::combine’ when loading ‘leafcutter’ 
2: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘leafcutter’ 
3: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘leafcutter’ 

Attaching package: ‘gridExtra’

The following object is masked from ‘package:dplyr’:

    combine

[1] "Loading results from /udd/resaf/data/leafcutter/21_08_17_2/leafcutter_results.Rdata"

Listening on http://127.0.0.1:4152

Attaching package: ‘shinyjs’

The following object is masked from ‘package:intervals’:

    show

The following object is masked from ‘package:Rcpp’:

    show

The following object is masked from ‘package:shiny’:

    runExample

The following objects are masked from ‘package:methods’:

    removeClass, show

[1] "new row selected!"
[1] "VALUE: 1"
   clusterID    gene            ensemblID   chr    start      end   verdict
1: clu_28703 SULT1A1 ENSG00000196502.11_3 chr16 28619709 28619799 annotated
2: clu_28703 SULT1A1 ENSG00000196502.11_3 chr16 28619709 28620029 annotated
3: clu_28703 SULT1A1 ENSG00000196502.11_3 chr16 28619924 28620029 annotated
   deltapsi transcripts
1:    0.005           -
2:   -0.016           -
3:    0.010           -
  chr    start      end       clu   middle   verdict   dPSI rank
1  16 28619709 28619799 clu_28703 28619754 annotated  0.005    c
2  16 28619709 28620029 clu_28703 28619869 annotated -0.016    a
3  16 28619924 28620029 clu_28703 28619976 annotated  0.010    b
[1] chr       start     end       strand    gene_name
<0 rows> (or 0-length row.names)
Warning: Error in make_cluster_plot: object 'gene_name_df' not found
Stack trace (innermost first):
    103: make_cluster_plot
    102: print
    101: withCallingHandlers
    100: suppressWarnings
     99: renderPlot [/udd/resaf/data/leafcutter/leafcutter/leafviz/server.R#206]
     89: <reactive:plotObj>
     78: plotObj
     77: origRenderFunc
     76: output$select_cluster_plot
      1: shiny::runApp
[1] "gene is: SULT1A1"
Warning: Error in : nrow(clusters) > 0 is not TRUE
Stack trace (innermost first):
    105: stopifnot
    104: make_gene_plot
    103: selectGenePlotInput [/udd/resaf/data/leafcutter/leafcutter/leafviz/server.R#228]
    102: print
    101: withCallingHandlers
    100: suppressWarnings
     99: renderPlot [/udd/resaf/data/leafcutter/leafcutter/leafviz/server.R#251]
     89: <reactive:plotObj>
     78: plotObj
     77: origRenderFunc
     76: output$select_gene_plot
      1: shiny::runApp

from leafcutter.

jackhump avatar jackhump commented on July 18, 2024

Looking through the error message there it looks like you have differing chromosome names in two crucial tables. One table has "chr16" and another has "16".

Would you mind running this R script for me and sending over what gets printed out please? Do you also mind telling me when you ran the leafcutter differential splicing analysis?

library(leafcutter)
load("/Users/resaf/workspace/leafcutter/leafcutter_results.Rdata")

cluster_to_plot <- "clu_28703"

y <- t(counts[ cluster_ids==cluster_to_plot, ])
intron_meta <- get_intron_meta(colnames(y))

head(intron_meta)

head(exons_table)

from leafcutter.

jackhump avatar jackhump commented on July 18, 2024

ok, it's definitely a naming issue. I removed the "chr" from my chromosomes from the Leafcutter counts and I can replicate both your error messages exactly. Bug fix incoming.

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

Ok thanks!

from leafcutter.

jackhump avatar jackhump commented on July 18, 2024

try re-installing from GitHub and try again. Please let me know if it works now.

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

This is still not working. I'm not sure if this is causing the issue, but when I look at the effect sizes output file it looks like this (no 'chr' before intron name):

derby10:resaf 126% head leafcutter_ds_effect_sizes.txt
intron logef 0 1or2 deltapsi
10:226068:243851:clu_20403 0.189694441262024 0.0200625431623083 0.0264896035170921 0.00642706035478383
10:226068:255829:clu_20403 -0.0949135675988039 0.0200479538705905 0.0199138779275679 -0.000134075943022623
10:243945:255829:clu_20403 -0.0947808736632202 0.959889502967101 0.95359651855534 -0.00629298441176118
10:255988:267135:clu_20404 -0.0375214947634933 0.0918258221045608 0.0815710837012038 -0.010254738403357
10:255988:282778:clu_20404 -0.0682966159348038 0.0772874111756758 0.0665755406502442 -0.0107118705254316
10:267296:282778:clu_20404 0.105818110698297 0.830886766719763 0.851853375648552 0.0209666089287885
10:285465:285996:clu_20406 -0.13039841528462 0.969573724746609 0.960862609083137 -0.00871111566347194
10:285465:285999:clu_20406 0.13039841528462 0.0304262752533915 0.0391373909168633 0.00871111566347181
10:889008:890917:clu_20409 0.029054722634462 0.0673955698186409 0.07353126345163 0.00613569363298906

While the cluster significance file looks like this:
cluster status loglr df p
chr10:clu_20402 <=1 sample with coverage>min_coverage NA NA NA
chr10:clu_20403 Success -1.22606892216902 2 1
chr10:clu_20404 Success 4.03529305625125 2 0.017680497973431
chr10:clu_20405 <=1 sample with coverage>min_coverage NA NA NA
chr10:clu_20406 Success 0.706298349754775 1 0.234625804248477
chr10:clu_20407 <=1 sample with coverage>min_coverage NA NA NA
chr10:clu_20408 Not enough valid samples NA NA NA
chr10:clu_20409 Success -0.990728937957101 2 1
chr10:clu_20410 Success 1.63190522102923 3 0.352711401620416

This causes a problem when I run prepare_results.R as the two files cannot be merged. As a result I modified your script slightly to add 'chr' to the cluster names that come out of the effect sizes. Basically I added these 2 lines after line 119 in prepare_results.R. This allows 'all.introns' to be formed correctly.

effectSizes$cluster <- paste("chr", effectSizesSplit$chr, ":", effectSizesSplit$clusterID, sep = "")
effectSizes$chr<-paste("chr", effectSizes$chr, sep = "")

from leafcutter.

asaferali avatar asaferali commented on July 18, 2024

Actually never mind, this does work now! I hadn't reinstalled the R package. The plots look great! Thank you

from leafcutter.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.