c-mertes / fraser Goto Github PK
View Code? Open in Web Editor NEWFRASER - Find RAre Splicing Events in RNA-seq
License: MIT License
FRASER - Find RAre Splicing Events in RNA-seq
License: MIT License
Hello everyone,
I have been familiarizing myself with FRASER for a week now. Read the article and the vignette, great tool in general. Congrats!
The design I have been trying to use for your tool is to have a number of "control" samples with no apparent events detected and a "study" sample to be contrasted. For this initial test, this sample has a confirmed aberrant splicing event that truncates the transcript. I don't know if this design fits in the FRASER use cases. Does it?
The FraserDataSet
looks like this:
sampleID group bamFile pairedEnd
<fct> <dbl> <chr> <lgl>
1 control-1 1 control-1.Aligned.out.sorted.q11_q21.bam TRUE
2 control-2 2 control-2.Aligned.out.sorted.q11_q21.bam TRUE
3 control-3 3 control-3.Aligned.out.sorted.q11_q21.bam TRUE
4 control-4 4 control-4.Aligned.out.sorted.q11_q21.bam TRUE
5 control-5 5 control-5.Aligned.out.sorted.q11_q21.bam TRUE
6 control-6 6 control-6.Aligned.out.sorted.q11_q21.bam TRUE
7 control-7 7 control-7.Aligned.out.sorted.q11_q21.bam TRUE
8 control-8 8 control-8.Aligned.out.sorted.q11_q21.bam TRUE
9 control-9 9 control-9.Aligned.out.sorted.q11_q21.bam TRUE
10 control-10 10 control-10.Aligned.out.sorted.q11_q21.bam TRUE
11 study 11 study.Aligned.out.sorted.q11_q21.bam TRUE
----------------------- Settings -----------------------
Analysis name: Data Analysis
Analysis is strand specific: reverse
Working directory: './temp/study'
-------------------- BAM parameters --------------------
class: ScanBamParam
bamFlag (NA unless specified):
bamSimpleCigar: FALSE
bamReverseComplement: FALSE
bamTag:
bamTagFilter:
bamWhich: 0 ranges
bamWhat:
bamMapqFilter: 20
I would like to know how to tell Fraser how to group samples? I have seen in the vignette sample grouping is performed based on the group
column and the condition
column in the input sample table. However, even though I have been playing around with these two categories I haven't seen differences in the obtained results. Using both column names and using different grouping designs, for example all control samples as group 1 and study sample as group 2, etc.
I have been able to extract the aberrant event from the study sample by using the zScore > abs(2.5)
and psiValue > 0.6
columns. However, filtering by padjust
returns an empty table. That is actually my major concern. I haven't been able to obtain a value for the padjust
different from 1. This actually happens to all events regardless of the sample.
Is it related to the grouping? I need more samples in the analysis? Is there a parameter that could decrease the statistic stringency? What would you suggest?
Thanks in advance!
David
for fraser junction count analysis, in default setting: ScanBamParam(mapqFilter=0), but sometime, we can see several duplicated reads with mapQ=0 and reads with multiple locations with mapQ=0, it can also cause the false positive abrrent splicing events. what's your opinion for mapqfilter setting? if we set mapqFilter >0, whether it results other false negative events?
@c-mertes @ischeller It will be great if you guys can explain more about the metrics sigh3 and sigh5 and theta 5 and theta3 used for the calculations of FRASER. It will be really helpful.
Dear C. Mertes,
I would like to run FRASER on a dataset of 47 BAM files. However, when running the countRNAData function I keep getting out of memory errors. Is it possible to run fds <- countRNAData(createTestFraserSettings())
on each sample individually, and then merge the resulting fds objects? If not, do you know how I can best get around this issue? Thanks in advance!
Hello,
I installed the package on an Ubuntu 22 server.
When running
fds <- countRNAData(settings)
with my own data (ONT bam files), the first steps work ok, but then I get
Mon Aug 29 17:58:49 2022: In total 239257 splice sites (acceptor/donor) will be counted ...
Error in reducer$value.cache[[as.character(idx)]] <- values :
wrong args for environment subassignment
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result
What could be the issue?
Thanks
Ensure that all the badges work after moving it to Gagneurlab.
I have saved to a file a FRASER object that has been generated with version 1.2.1.
Then, I have tried to load it with the latest version of FRASER (1.99.2) and I got the following error:
"Error in validObject(object) :
invalid class "FraserDataSet" object: The nonSplicedReads do not have corresponding splitReads. This is probably the result of merging"
Indeed, the counts are from an external tool and I have formatted them in order to be compatible with FRASER requirements. But in version 1.2.1 I can get the outlier results without any problem.
I would like to know if the format of the counting matrices has changed in the latest release. I have checked the last manual and I have not found any new formatting specification.
The parallelization should be turned on by the following code:
if(.Platform$OS.type == "unix") { register(MulticoreParam(workers=min(10, multicoreWorkers())))
} else { register(SnowParam(workers=min(10, multicoreWorkers()))) }
It will be really helpful if you can provide the bam files for all the example data which includes sample4,sample5....sample9.
Hello,
I have two sets of data that come from different studies. I want to run the countRNAData on the first set of data first and save the output (maybe a rds?). When I get my second set of data, is it possible to run the countRNAData on my second set, load the output of the first set, then use the mergecount function to merge everything together?
Thank you very much!
Hi, thanks for making this great library.
I'm seeing an issue with the loadFraserDataSet
function.
When I run
setwd("/tmp/fraser/")
saveFraserDataSet(fds)
I can later do
fds <- loadFraserDataSet(file="/tmp/fraser/savedObjects/Data_Analysis/fds-object.RDS")
to restore the dataset.
But if I rename /tmp/fraser
to /tmp/fraser2
and then run
fds <- loadFraserDataSet(file="/tmp/fraser2/savedObjects/Data_Analysis/fds-object.RDS")
I get this error:
> fds <- loadFraserDataSet(file="/tmp/fraser2/savedObjects/Data_Analysis/fds-object.RDS")
Error in value[[3L]](cond) :
'assay(<FraserDataSet>, i="character", ...)' invalid subscript 'i'
failed to open file '/tmp/fraser/savedObjects/Data_Analysis/rawCountsJ.h5'
which tells me it's not updating the absolute path.
The same thing happens if I use the dir=
arg instead of file=
.
Hi.
I follow vignette with my bam file.
However, When I entered annotateRangesWithTxDb (fds, txdb=txdb, orgDb=orgDb), i got warning and error message :
403 genes were dropped because they have exons located on both strands of the same reference sequence or on more than one reference sequence, so cannot be represented by a single genomic range.
Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object
Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what=c("seqeunce", : sequence chrM has incompatible seqlengths:
- in 'x' : 16569
- in 'y' : 16571
I don't know what I can do with the fds object and how I can solve the problem.
Thank you for your help.
When using comprehensive gene annotations, unlocalized and unplaced sequences result in an error when attempting to annotate potential impact of splice junctions with annotatePotentialImpact
.
txdb
are located on unlocalized chr*_random
and unplaced chrUn_*
sequences.min(distance(test_junction, refseq.genes), na.rm = TRUE)
to evaluate to Inf
.distance(test_junction, refseq.genes)
in turn evaluates to a vector of exclusively NA's.min(c(NA, NA, ...), na.rm=TRUE)
is equivalent to min(numeric(0))
, which evaluates to Inf
.Inf
is greater than 0, the conditional dist > 0
evaluates to TRUE
, whereas the code block under the if-statement is erroneously executed.This error can be resolved by modifying the condition of the if-statement from dist > 0
to is.finite(dist) && dist > 0
.
https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818
1. annotatePotentialImpact(result = res_junc_dt, txdb = txdb, fds = fds)
6. addPotentialImpactLabels(annoResult, fds, txdb)
8. sapply(noLaps, function(i) {
. overlap <- to(findOverlaps(junctions_gr[i], exons))
. if (length(overlap) == 0) {
. return(checkIntergenic(junctions_gr, i, refseq.genes))
. }
. for (j in overlap) {
. exon_start = start(exons[j])
. exon_end = end(exons[j])
. if (exon_start <= start(junctions_gr[i]) & exon_end >=
. end(junctions_gr[i])) {
. if ((end(junctions_gr[i]) - start(junctions_gr[i]) +
. 1)%%3 != 0) {
. frs = "likely"
. }
. else {
. frs = "unlikely"
. }
. return(c("exonTruncation", frs))
. }
. }
. return(c("complex", "inconclusive"))
. })
9. lapply(X = X, FUN = FUN, ...)
10. FUN(X[[i]], ...)
11. checkIntergenic(junctions_gr, i, refseq.genes)
12. start(refseq.genes[nearest(junctions_gr[i], refseq.genes)])
13. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
14. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
15. subset_along_ROWS(x, i, drop = drop)
16. extractROWS(x, i)
17. extractROWS(x, i)
18. normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
19. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append,
. allow.NAs = allow.NAs)
20. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append,
. allow.NAs = allow.NAs)
21. .subscript_error("subscript contains NAs")
22. stop(wmsg(...), call. = FALSE)
23. .handleSimpleError(function (cond)
. .Internal(C_tryCatchHelper(addr, 1L, cond)), "subscript contains NAs",
. base::quote(NULL))
24. h(simpleError(msg, call))
Hi, Mr Mertes and your team!
I've read your manual and relative literature and found your package really worth a try of best effort!
However, when running your example data as well as my own data, I met the same error from the very first step as follows:
#######Example data in your manual#######
settings<-FraserDataSet(colData=sampleTable,workingDir=tempdir())
settings
-------------------- Sample data table -----------------
sampleID bamFile group gene pairedEnd
1 sample1 extdata/bam/sample1.bam 1 TIMMDC1 TRUE
2 sample2 extdata/bam/sample2.bam 3 CLPP TRUE
3 sample3 extdata/bam/sample3.bam 2 MCOLN1 TRUE
----------------------- Settings -----------------------
Analysis name: Data Analysis
Analysis is strand specific: no
Working directory: 'C:\Users\ADMINI~1\AppData\Local\Temp\RtmpQ9cxvH'
-------------------- BAM parameters --------------------
Fri Dec 11 09:26:15 2020: The given working directory 'C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER' does not exists. We will create it.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'scanBamParam': invalid class “FraserDataSet” object: Make sure we can write to the given working directory ' C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER '.
In addition: Warning message:
In dir.create(object@workingDir, recursive = TRUE) :
cannot create dir 'C:\Users\biocbuild', reason 'Permission denied'
When working on my own data, I set my own working directory at the first step, and other script is almost the same. Nonetheless, the same error still exists:
#######My own data#######
----------------------- Settings -----------------------
Analysis name: Data Analysis
Analysis is strand specific: no
Working directory: 'Y:/HCM-newly-sequenced/200820-WGX'
-------------------- BAM parameters --------------------
Fri Dec 11 09:51:35 2020: The given working directory 'C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER' does not exists. We will create it.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'scanBamParam': invalid class “FraserDataSet” object: Make sure we can write to the given working directory ' C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER '.
In addition: Warning message:
In dir.create(object@workingDir, recursive = TRUE) :
cannot create dir 'C:\Users\biocbuild', reason 'Permission denied'
Is the working directory completely fixed in ' C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER ' ? How can I alter it into my own path? Otherwise, must I run R as an administrator? Yet this might lead to unpredictable risk?
Looking forward to your reply!
Thanks a lot!
Steps to reproduce:
Expected:
fds dataset is successfully loaded
Actual:
The following error is thrown
Error: no slot of name "seed" for this object of class "HDF5ArraySeed"
Code issue:
https://github.com/c-mertes/FRASER/blob/faa0a207156e9d575dcbb61d774d16e6d24f9371/R/saveHDF5Objects.R#L114
From what I understand, the logic (just preceding the line above) checks if the HDF datasets are available (as per the fds object), otherwise they are loaded from the specified 'dir' parameter. This bug is surfaced in the R major version > 3 portion.
The slot( .... "seed") directive doesn't need another @seed accessor. That's what causing this scenario to fail
My crude fix:
else if (afile != path(assay(fds, aname, withDimnames = FALSE))) {
if (R.Version()$major == "3") {
path(assay(fds, aname, withDimnames = FALSE)) <- afile
}
else {
if ("seed" %in% slotNames(assay(fds, aname, withDimnames = FALSE))) {
if ("filepath" %in% slotNames(slot(assay(fds, aname, withDimnames = FALSE), "seed"))) {
slot(assay(fds, aname, withDimnames = FALSE), "seed")@filepath <- afile
}
}
if ("seed" %in% slotNames(assay(fds, aname, withDimnames = FALSE))) {
if ("seed" %in% slotNames(slot(assay(fds, aname, withDimnames = FALSE), "seed"))) {
if ("filepath" %in% slotNames(slot(slot(assay(fds, aname, withDimnames = FALSE), "seed"), "seed"))) {
slot(assay(fds, aname, withDimnames = FALSE), "seed")@seed@filepath <- afile
}
}
}
}
}
Sessioninfo:
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggpubr_0.4.0 ggplot2_3.3.2 FRASER_1.0.2 SummarizedExperiment_1.18.2
[5] DelayedArray_0.14.1 matrixStats_0.57.0 Biobase_2.48.0 Rsamtools_2.4.0
[9] Biostrings_2.56.0 XVector_0.28.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
[13] IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0 data.table_1.13.0
[17] BiocParallel_1.22.0
loaded via a namespace (and not attached):
[1] VGAM_1.1-3 colorspace_1.4-1 ggsignif_0.6.0 ellipsis_0.3.1 rio_0.5.16
[6] rstudioapi_0.11 ggrepel_0.8.2 bit64_4.0.5 AnnotationDbi_1.50.3 fansi_0.4.1
[11] splines_4.0.2 R.methodsS3_1.8.1 PRROC_1.3.1 knitr_1.30 jsonlite_1.7.1
[16] broom_0.7.1 dbplyr_1.4.4 R.oo_1.24.0 pheatmap_1.0.12 HDF5Array_1.16.1
[21] compiler_4.0.2 httr_1.4.2 backports_1.1.10 assertthat_0.2.1 Matrix_1.2-18
[26] lazyeval_0.2.2 cli_2.0.2 htmltools_0.5.0 prettyunits_1.1.1 tools_4.0.2
[31] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.3 dplyr_1.0.2 rappdirs_0.3.1
[36] Rcpp_1.0.5 carData_3.0-4 cellranger_1.1.0 vctrs_0.3.4 rtracklayer_1.48.0
[41] DelayedMatrixStats_1.10.1 xfun_0.18 stringr_1.4.0 openxlsx_4.2.2 lifecycle_0.2.0
[46] rstatix_0.6.0 XML_3.99-0.5 zlibbioc_1.34.0 scales_1.1.1 BSgenome_1.56.0
[51] pcaMethods_1.80.0 hms_0.5.3 rhdf5_2.32.4 RColorBrewer_1.1-2 BBmisc_1.11
[56] yaml_2.2.1 curl_4.3 memoise_1.1.0 biomaRt_2.44.1 stringi_1.5.3
[61] RSQLite_2.2.1 checkmate_2.0.0 GenomicFeatures_1.40.1 zip_2.1.1 Rsubread_2.2.6
[66] rlang_0.4.8 pkgconfig_2.0.3 bitops_1.0-6 evaluate_0.14 lattice_0.20-41
[71] purrr_0.3.4 Rhdf5lib_1.10.1 GenomicAlignments_1.24.0 htmlwidgets_1.5.2 cowplot_1.1.0
[76] bit_4.0.4 tidyselect_1.1.0 magrittr_1.5 R6_2.4.1 generics_0.0.2
[81] DBI_1.1.0 pillar_1.4.6 haven_2.3.1 foreign_0.8-80 withr_2.3.0
[86] abind_1.4-5 RCurl_1.98-1.2 tibble_3.0.3 crayon_1.3.4 car_3.0-10
[91] utf8_1.1.4 BiocFileCache_1.12.1 plotly_4.9.2.1 rmarkdown_2.4 progress_1.2.2
[96] grid_4.0.2 readxl_1.3.1 blob_1.2.1 forcats_0.5.0 digest_0.6.25
[101] tidyr_1.1.2 extraDistr_1.9.1 R.utils_2.10.1 openssl_1.4.3 munsell_0.5.0
[106] viridisLite_0.3.0 askpass_1.1
Hello, I am having some problems running your package due to (lack of) internet connectivity.
Our group works in a network which blocks internet connection for security reasons. We can download any necessary resources locally, but when it comes to execution we have no connection available.
The issue comes with the countSplitReads function, which ends up calling get_chrom_info_for_registered_UCSC_genome from genomeInfoDb. This function attempts to download chromInfo.txt.gz, which is not possible in our setting.
An issue was submitted to GenomeInfoDb describing a related problem (Bioconductor/GenomeInfoDb#26), and a solution was implemented to the package (quote reply). Would it be possible for countSplitReads to have a parameter passed down through function calls to use this "offline mode"? If so, we could first download chromosome information locally and access that cache in offline execution.
Thank you,
Álvaro.
Hello. I am new to bioinformatics and curious how to use FRASER to look for aberrant splicing events in a single sample (If this is possible)? I know the recommended sample size is 30. Lets say if I run 1 patient sample and 30 normal samples from GTEx together, will this sufficient enough to look for aberrant splicing events in my sample?
Any help will be much appreciated.
Hi everyone,
I am having an issue when trying to create my fds table. Below is my Rscript:
library(FRASER)
sampleTable <- fread(system.file(
"scripts_objects", "fraser_settings_object.tsv", package="FRASER", mustWork=TRUE))
bamFiles <- sampleTable[ ,bamFile]
sampleTable[,bamFile:=bamFiles]
settings <- FraserDataSet(colData=sampleTable,
workingDir=tempdir())
if(.Platform$OS.type == "unix") {
register(MulticoreParam(workers=multicoreWorkers()))
} else {
register(SnowParam(workers=multicoreWorkers()))
}
fds <- countRNAData(settings)
write.table(fds, "fds.txt")
I get the following error:
ERROR: Paired-end reads were detected in single-end read library : /HOME/BAM_files/STAR_1010Aligned.sortedByCoord.out.bam
No counts were generated.
Error: BiocParallel errors
element index: 1
first error:
In addition: Warning message:
stop worker failed:
attempt to select less than one element in OneIndex
Execution halted
I have run the code a couple times with different iterations of MulticoreParam() and everytime a different BAM file is flagged as being "paired-end reads detected in single-end read library". I checked my BAMs and they are definitely paired-end, so I am not sure why the BAMs are being recognized as single-ended. Is there additional syntax I have to add to specify the BAMs are paired-end? I couldn't find anything in the vignette.
That seems to be my main issue-- however if there are any suggestions on how to solve the "stop worker failed" issue, that would be appreciated as well! I believe it has to do with my memory not being comparable for all the BAM files I have, but I am not sure.
thank you very much,
Alison
Dear Team,
I am using Fraser on count table generated by featurecounts(split reads + non split reads) for splice sites. I have given unique ID to each splice site. Both tables shares ID( junction count endID == splicesiteCount, spliceSiteID). For junction count table "startID" is the just number of rows(1,2,3,4 and so on). Each table is sorted by chromosome and start coordinate. This is how, I created table for FRASER.
Fraserdata set creation was successful. But I am facing problem in PSI value calculation:
fds <- calculatePSIValues(fds)
After calculating psi3 and 5values, It is giving following error:
Calculating the PSI site values:
0% Error: subscript contains invalid names
Execution halted.
Can you please help me with this? Am I using wrong format of table?
Thank you.
As we do not currently use the group
or condition
column in colData
we need to remove it from the package to not confuse users as in #26.
When doing the count split read by using our own bam file.
I use the command as follow:
settings <- FraserDataSet(colData=anno, workingDir="../")
fds <- countRNAData(settings)
Tue Dec 1 06:50:58 2020: Start counting the split reads ...
Tue Dec 1 06:50:58 2020: Count split reads for sample: RNA-JXY-001A
Tue Dec 1 06:57:31 2020: Count split reads for sample: RNA-YCS-004A
Tue Dec 1 07:04:45 2020: Count split reads for sample: RNA-CSW-006A
Tue Dec 1 07:13:19 2020: Count split reads for sample: RNA-WCJ-007A
Tue Dec 1 07:18:48 2020: Count split reads for sample: RNA-SYL-009A
Tue Dec 1 07:25:06 2020 : count ranges need to be merged ...
Tue Dec 1 07:25:06 2020: Create splice site indices ...
Tue Dec 1 07:25:06 2020: Writing split counts to folder: ..//savedObjects/Data_Analysis/splitCounts
Tue Dec 1 07:25:06 2020: Identifying introns with read count <= 20 in all samples...
Tue Dec 1 07:25:06 2020: Start counting the non spliced reads ...
Tue Dec 1 07:25:06 2020: In total 0 splice junctions are found.
Error in [[<-
(*tmp*
, name, value = "Donor") :
1 elements in value to replace 0 elements
It shows error like this.
I've tried repeatedly to build the vignette for this package with little success. These are the steps I took along with the output. Any idea why this might be happening?
Alternatively, would it be possible to also include an already built PDF copy of the vignette. The bits I am particularly interested in are the syntax nuances in using FRASER to get split and non-split counts
I'm doing this on Mac OS Sierra
Steps I took:
> devtools::build(pkg='.', vignettes=TRUE)
✔ checking for file ‘/Users/himanshujoshi/PycharmProjects/FRASER/DESCRIPTION’ ...
─ preparing ‘FRASER’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ installing the package to build vignettes
E creating vignettes (2m 19.9s)
--- re-building ‘FraseR.Rnw’ using knitr
Fri Jan 17 15:27:01 2020: Calculate the PSI 5 and 3 values ...
Fri Jan 17 15:27:03 2020: Calculate the PSI site values ...
Fri Jan 17 15:27:04 2020: Calculate the delta for psi5 values ...
Fri Jan 17 15:27:04 2020: Calculate the delta for psi3 values ...
Fri Jan 17 15:27:04 2020: Calculate the delta for psiSite values ...
Fri Jan 17 15:27:05 2020: Writing final FraseR object ('/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Example_Dataset/fds-object.RDS').
Fri Jan 17 15:27:05 2020: Fit step for: 'psi5'.
Fri Jan 17 15:27:06 2020: Running fit with correction method: PCA
Fri Jan 17 15:27:06 2020: Computing PCA ...
Fri Jan 17 15:27:06 2020: Fitting rho ...
Fri Jan 17 15:27:07 2020: Compute p values for: 'psi5'.
Fri Jan 17 15:27:08 2020: Adjust p values for: 'psi5'.
Fri Jan 17 15:27:08 2020: Compute Z scores for: 'psi5'.
Fri Jan 17 15:27:08 2020: Fit step for: 'psi3'.
Fri Jan 17 15:27:09 2020: Running fit with correction method: PCA
Fri Jan 17 15:27:09 2020: Computing PCA ...
Fri Jan 17 15:27:09 2020: Fitting rho ...
Fri Jan 17 15:27:10 2020: Compute p values for: 'psi3'.
Fri Jan 17 15:27:11 2020: Adjust p values for: 'psi3'.
Fri Jan 17 15:27:11 2020: Compute Z scores for: 'psi3'.
Fri Jan 17 15:27:12 2020: Fit step for: 'psiSite'.
Fri Jan 17 15:27:12 2020: Running fit with correction method: PCA
Fri Jan 17 15:27:12 2020: Computing PCA ...
Fri Jan 17 15:27:12 2020: Fitting rho ...
Fri Jan 17 15:27:13 2020: Compute p values for: 'psiSite'.
Fri Jan 17 15:27:14 2020: Adjust p values for: 'psiSite'.
Fri Jan 17 15:27:15 2020: Compute Z scores for: 'psiSite'.
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: org.Hs.eg.db
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
Fri Jan 17 15:27:18 2020: Collecting results for: psi5
Fri Jan 17 15:27:18 2020: Process chunk: 1 for: psi5
Fri Jan 17 15:27:18 2020: Collecting results for: psi3
Fri Jan 17 15:27:18 2020: Process chunk: 1 for: psi3
Fri Jan 17 15:27:18 2020: Collecting results for: psiSite
Fri Jan 17 15:27:18 2020: Process chunk: 1 for: psiSite
Warning in has_utility("convert", "ImageMagick") :
ImageMagick not installed or not in PATH
Fri Jan 17 15:27:22 2020: Start counting the split reads ...
Fri Jan 17 15:27:22 2020: Count split reads for sample: sample1
Warning in dir.create(cachedir, recursive = TRUE) :
'/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/splitCounts' already exists
Fri Jan 17 15:27:22 2020: Count split reads for sample: sample2
Fri Jan 17 15:27:22 2020: Count split reads for sample: sample3
Warning in dir.create(cachedir, recursive = TRUE) :
'/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/splitCounts' already exists
Fri Jan 17 15:27:33 2020 : count ranges need to be merged ...
Fri Jan 17 15:27:35 2020: Create splice site indices ...
Fri Jan 17 15:27:35 2020: Writing counts to file: /var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Data_Analysis/splitCounts.tsv.gz
Fri Jan 17 15:27:35 2020: Create splice site indices ...
Fri Jan 17 15:27:35 2020: Start counting the non spliced reads ...
Fri Jan 17 15:27:35 2020: In total 60 splice junctions are found.
Fri Jan 17 15:27:35 2020: In total 77 splice sites (acceptor/donor) will be counted ...
Fri Jan 17 15:27:35 2020: Count non spliced reads for sample: sample1
Warning in dir.create(cachedir, recursive = TRUE) :
'/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/nonSplicedCounts/Data_Analysis' already exists
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.0.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o sample1.bam ||
|| ||
|| Annotation : R data.frame ||
|| Dir for temp files : /var/folders/gw/rccgk_r53w316ntmdmtv1hrw0001 ... ||
|| Threads : 3 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 10 ||
|| Long read mode : yes ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid92802 ... ||
|| Features : 77 ||
|| Meta-features : 77 ||
|| Chromosomes/contigs : 2 ||
|| ||
|| Process BAM file sample1.bam... ||
|| WARNING: Paired-end reads were found. ||
|| Total alignments : 802 ||
|| Successfully assigned alignments : 188 (23.4%) ||
|| Running time : 0.01 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
\\============================================================================//
Saving splice site cache ...
Fri Jan 17 15:27:35 2020: Count non spliced reads for sample: sample3
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.0.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o sample3.bam ||
|| ||
|| Annotation : R data.frame ||
|| Dir for temp files : /var/folders/gw/rccgk_r53w316ntmdmtv1hrw0001 ... ||
|| Threads : 3 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 10 ||
|| Long read mode : yes ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid92804 ... ||
|| Features : 77 ||
|| Meta-features : 77 ||
|| Chromosomes/contigs : 2 ||
|| ||
|| Process BAM file sample3.bam... ||
|| WARNING: Paired-end reads were found. ||
|| Total alignments : 3498 ||
|| Successfully assigned alignments : 866 (24.8%) ||
|| Running time : 0.04 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
\\============================================================================//
Saving splice site cache ...
Fri Jan 17 15:27:35 2020: Count non spliced reads for sample: sample2
Warning in dir.create(cachedir, recursive = TRUE) :
'/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/nonSplicedCounts/Data_Analysis' already exists
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.0.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o sample2.bam ||
|| ||
|| Annotation : R data.frame ||
|| Dir for temp files : /var/folders/gw/rccgk_r53w316ntmdmtv1hrw0001 ... ||
|| Threads : 3 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 10 ||
|| Long read mode : yes ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid92803 ... ||
|| Features : 77 ||
|| Meta-features : 77 ||
|| Chromosomes/contigs : 2 ||
|| ||
|| Process BAM file sample2.bam... ||
|| WARNING: Paired-end reads were found. ||
|| Total alignments : 4397 ||
|| Successfully assigned alignments : 1011 (23.0%) ||
|| Running time : 0.05 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
\\============================================================================//
Saving splice site cache ...
Fri Jan 17 15:27:38 2020 : Fast merging of counts ...
Fri Jan 17 15:27:39 2020: Writing counts to file: /var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Data_Analysis/nonSplitCounts.tsv.gz
Fri Jan 17 15:27:41 2020: Calculate the PSI 5 and 3 values ...
Fri Jan 17 15:27:42 2020: Calculate the PSI site values ...
Fri Jan 17 15:27:42 2020: Calculate the delta for psi5 values ...
Fri Jan 17 15:27:43 2020: Calculate the delta for psi3 values ...
Fri Jan 17 15:27:43 2020: Calculate the delta for psiSite values ...
Fri Jan 17 15:27:44 2020: Writing final FraseR object ('/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/sav Fri Jan 17 15:27:44 2020: Writing final FraseR object ('/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Data_Analysis/fds-object.RDS').
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 165 rows containing missing values (geom_bar).
Fri Jan 17 15:27:46 2020: Fit step for: 'psi5'.
Fri Jan 17 15:27:46 2020: Running fit with correction method: PCA
Fri Jan 17 15:27:46 2020: Computing PCA ...
Fri Jan 17 15:27:46 2020: Fitting rho ...
Fri Jan 17 15:27:47 2020: Compute p values for: 'psi5'.
Fri Jan 17 15:27:48 2020: Adjust p values for: 'psi5'.
Fri Jan 17 15:27:48 2020: Compute Z scores for: 'psi5'.
Fri Jan 17 15:27:48 2020: Fit step for: 'psi3'.
Fri Jan 17 15:27:49 2020: Running fit with correction method: PCA
Fri Jan 17 15:27:49 2020: Computing PCA ...
Fri Jan 17 15:27:49 2020: Fitting rho ...
Fri Jan 17 15:27:50 2020: Compute p values for: 'psi3'.
Fri Jan 17 15:27:51 2020: Adjust p values for: 'psi3'.
Fri Jan 17 15:27:51 2020: Compute Z scores for: 'psi3'.
Fri Jan 17 15:27:52 2020: Fit step for: 'psiSite'.
Fri Jan 17 15:27:52 2020: Running fit with correction method: PCA
Fri Jan 17 15:27:52 2020: Computing PCA ...
Fri Jan 17 15:27:52 2020: Fitting rho ...
Fri Jan 17 15:27:53 2020: Compute p values for: 'psiSite'.
Fri Jan 17 15:27:54 2020: Adjust p values for: 'psiSite'.
Fri Jan 17 15:27:55 2020: Compute Z scores for: 'psiSite'.
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
Fri Jan 17 15:27:58 2020: Collecting results for: psi5
Fri Jan 17 15:27:58 2020: Process chunk: 1 for: psi5
Fri Jan 17 15:27:58 2020: Collecting results for: psiSite
Fri Jan 17 15:27:58 2020: Process chunk: 1 for: psiSite
Fri Jan 17 15:27:58 2020: Collecting results for: psi3
Fri Jan 17 15:27:58 2020: Process chunk: 1 for: psi3
Fri Jan 17 15:27:59 2020: Collecting results for: psi3
Fri Jan 17 15:28:00 2020: Process chunk: 1 for: psi3
Fri Jan 17 15:27:59 2020: Collecting results for: psiSite
Fri Jan 17 15:28:00 2020: Process chunk: 1 for: psiSite
Fri Jan 17 15:27:59 2020: Collecting results for: psi5
Fri Jan 17 15:28:00 2020: Process chunk: 1 for: psi5
Fri Jan 17 15:28:03 2020: Running fit with correction method: PCA-BB-Decoder
125
67
dPsi filter:FALSE: 121 TRUE: 2
Exclusion matrix: TRUE: 123
Fri Jan 17 15:28:12 2020: Injecting outliers: 0 / 0 (primary/secondary)
Warning in optimHyperParams(fds, type = "psi5") :
No outliers could be injected so the hyperparameter optimization could not run. Possible reason: too few junctions in the data.
Warning in plotEncDimSearch(fds, type = "psi5") :
no hyperparameters were estimated for psi5
Please use `optimHyperParams` to compute them.
Warning in switch(x, psi5 = c(bquote(psi[5])), psi3 = c(bquote(psi[3])), :
EXPR is a "factor", treated as integer.
Consider using 'switch(as.character( * ), ...)' instead.
Warning in switch(x, psi5 = c(bquote(psi[5])), psi3 = c(bquote(psi[3])), :
EXPR is a "factor", treated as integer.
Consider using 'switch(as.character( * ), ...)' instead.
Warning in switch(x, psi5 = c(bquote(psi[5])), psi3 = c(bquote(psi[3])), :
EXPR is a "factor", treated as integer.
Consider using 'switch(as.character( * ), ...)' instead.
Error: processing vignette 'FraseR.Rnw' failed with diagnostics:
Running 'texi2dvi' on 'FraseR.tex' failed.
--- failed re-building ‘FraseR.Rnw’
SUMMARY: processing the following file failed:
‘FraseR.Rnw’
Error: Vignette re-building failed.
Execution halted
Error in (function (command = NULL, args = character(), error_on_status = TRUE, :
System command error
Hello,
I would like to see the raw data that was used to generate p-values and z-scores in the results() table. For example, fds_filtered@assays@data[["psi5"]], and see the values by sample.
However, I don't seem to be able to get back to it using the concatenation of seqnames : ranges. Are these coordinates changed somehow when running res() ?
Or is it again a problem of species?
Thanks in advance!
Dear C. Mertes,
I'm running FRASER on 61 PBMC samples for the detection of aberrant splicing events in these samples. Counting of the spliced and none spliced reads and parameter optimisation worked perfectly. However, during the model fitting step I run into the following error:
Wed Feb 1 12:08:32 2023: Fit step for: 'psi5'.
Wed Feb 1 12:08:34 2023: Running fit with correction method: PCA
Wed Feb 1 12:08:36 2023: Computing PCA ...
Wed Feb 1 12:10:39 2023: Fitting rho ...
Thu Feb 2 05:31:11 2023: rho fit:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000001 0.0000001 0.0047353 0.0000001 0.5999270
Thu Feb 2 05:36:43 2023: Compute p values for: 'psi5'.
Thu Feb 2 22:18:56 2023: Adjust p values for: 'psi5'.
Thu Feb 2 22:19:42 2023: Compute Z scores for: 'psi5'.
Error in check_returned_array(ans, expected_dim, "extract_array", class(x)) :
The "extract_array" method for HDF5ArraySeed objects returned an array
with incorrect dimensions. Please contact the author of the
HDF5ArraySeed class (defined in the HDF5Array package) about this and
point him/her to the man page for extract_array() in the DelayedArray
package (?extract_array).
Calls: FRASER ... extract_array -> extract_array -> check_returned_array
Execution halted
As the error occurs in the middle of the fitting step, I believe that it is not directly related to my input files and thus I was wondering whether you had an idea what might be causing the error?
This is the script I use for the model fit:
############ NVQ_Run093-NVQ_Run642 Untreated Samples FRASER Analysis: Fit Model and Call Outliers ############
#### Load Required Libraries ####
library(FRASER)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
#### Define Global Parameters ####
WORK_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/work/'
OUT_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/output/'
register(MulticoreParam(workers = 10))
#### Read in FRASER Object ####
fds <- readRDS(paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Filtered_Optim_psi5_psi3_and_theta.rds', sep = ''))
#### Detection of Aberrant Splicing Events ####
# Fit the Splicing Model
fds <- FRASER(fds, q=c(psi5=5, psi3=5, theta=5))
# Annotate Introns
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
orgDb <- org.Hs.eg.db
fds <- annotateRangesWithTxDb(fds, txdb=txdb, orgDb=orgDb)
# Call Splicing Outliers
res <- results(fds, zScoreCutoff=NA, padjCutoff = 0.999999, deltaPsiCutoff=0.0000001)
#### Save Objects ####
# Processed FRASER Object
FDS_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Processed.rds', sep = '')
saveRDS(fds, FDS_File)
# Results Object
RES_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results.rds', sep = '')
saveRDS(res, RES_File)
#### Generate Results File for Each Sample ####
Samples <- unique(res$sampleID)
for (s in Samples){
Results_Samples <- res[res$sampleID == s]
write.xlsx(Results_Samples, paste(OUT_DIR, 'Untreated/','NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results_deltaPsi_0_padj_1_', s, '.xlsx', sep = ''))
}
Kind regards,
Laurenz De Cock
I understand using different values of q can significantly affect results, so I wanted to ask whether I'm choosing q and the implementation
arg correctly.
My dataset is 54 affected male fibroblast samples.
optimHyperParams
(psi5_q = 5, psi3_q = 2, psiSite_q = 2) are much smaller than the q found by OUTRIDER findEncodingDim
(q=12) for the same set of samples, and also smaller than the q used in the colab notebook example with 50 samples (q=10)?I'm using FRASER to calculate counts for splice junctions, and STARv2 to get gene counts for OUTRIDER.
When I run
for(i in c("psi5", "psi3", "psiSite")) {
fds <- optimHyperParams(fds, i, plot=FALSE, implementation="PCA", BPPARAM=bpparam)
plotEncDimSearch(fds, type=i, plotType="auc")
plotEncDimSearch(fds, type=i, plotType="loss")
}}
I get plots that look like these (the plots for psi5, psi3, and psiSite look nearly identical):
and this is the plot from OUTRIDER plotEncDimSearch(ods)
for the same set of samples:
From the pre-print, I understood that auto-encoder correction performs much better than PCA, so I run FRASER with implementation="AE". Is it fine to run optimHyperParams(..) with the default implementation="PCA", and use those q's for FRASER with implementation="AE"?
I tried running optimHyperParams(..) with implementation="AE" (and other values taken from the source code) but got validation errors saying these are not recognized implementations:
> fds = optimHyperParams(fds, 'psi5', implementation="AE", plot=FALSE)
Error in needsHyperOpt(implementation) : Method not found: 'AE'!
> fds = optimHyperParams(fds, 'psi5', implementation="FRASER", plot=FALSE)
dPsi filter:FALSE: 1307001 TRUE: 57861
Exclusion matrix: FALSE: 1335180 TRUE: 29682
Thu Jul 9 02:02:00 2020: Injecting outliers: 9001 / 800 (primary/secondary)
Thu Jul 9 02:02:05 2020: Run hyper optimization with 13 options.
1 ; 2 ; 0
Thu Jul 9 02:02:06 2020 ; q: 2 ; noise: 0
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: 'arg' should be one of “PCA”, “PCA-BB-Decoder”, “AE”, “AE-weighted”, “PCA-BB-full”, “fullAE”, “PCA-regression”, “PCA-reg-full”, “PCA-BB-Decoder-no-weights”, “BB”
optimHyperParams
for psi5, psi3, psiSite are sometimes not the same, what is the recommended way to choose which q to pass to FRASER? (currently I'm using q=max(bestQ(fds, type="psi5"), bestQ(fds, type="psi3"), bestQ(fds, type="psiSite"))
)After glancing at the source code, I also tried
q = list(psi5=bestQ(fds, type="psi5"), psi3=bestQ(fds, type="psi3"), psiSite=bestQ(fds, type="psiSite"))
fds <- FRASER(fds, q=q, implementation ="AE", BPPARAM=MulticoreParam(4))
but I got this error:
Wed Jul 8 18:01:50 2020: Fit step for: 'psi5'.
Wed Jul 8 18:01:50 2020: Running fit with correction method: AE
FALSE TRUE
97078 30265
Error in 1:nPcs : NA/NaN argument
I realize now I can separately run fit
and the other methods inside FRASER, and pass in a different q to each. Is this recommended?
error in installing devtools
ft_cache.h:9:10: fatal error: ft2build.h: No such file or directory
Dear C. Mertes,
I would like to analyze alternative splicing events in non-model organisms such as different mouse subspecies (e.g., Mus musculus castaneus).
Is it possible to do this? I am trying to follow the example with my data, but when I annotate the ranges, I cannot find the way to use the correct genome.
Thanks.
A.
Hi,
I tried to run fds <- calculatePSIValues(fds, types="jaccard", BPPARAM=bpparam())
but keep getting the error:
Error in FUN(X[[i]], ...) : Could not find read type: jaccard Calls: calculatePSIValues -> unique -> vapply -> FUN
fds
is created from:
library(FRASER)
annotation_dat <- data.table(sampleID="{sample_id}", bamFile="{sample_bam_path}", pairedEnd=TRUE)
settings <- FraserDataSet(colData=annotation_dat, workingDir=".")
fds <- countRNAData(settings)
I wonder if you could help look into this. Thanks!
Dear C. Mertes,
At the moment I'm running FRASER on 61 untreated PBMC samples and 60 treated PBMC samples. However we are running into some issues with HPC infrastructure. The HPC system has limited the compute time for all analysis to 70h, based on the current output of the FRASER model fit step, I'm afraid these 70h will be insufficient time to finish:
Wed Feb 1 12:08:32 2023: Fit step for: 'psi5'.
Wed Feb 1 12:08:34 2023: Running fit with correction method: PCA
Wed Feb 1 12:08:36 2023: Computing PCA ...
Wed Feb 1 12:10:44 2023: Fitting rho ...
Thu Feb 2 06:22:23 2023: rho fit:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000001 0.0000001 0.0039224 0.0000001 0.6091761
Thu Feb 2 06:27:56 2023: Compute p values for: 'psi5'.
Fri Feb 3 00:16:14 2023: Adjust p values for: 'psi5'.
Fri Feb 3 00:17:03 2023: Compute Z scores for: 'psi5'.
Fri Feb 3 00:21:16 2023: Fit step for: 'psi3'.
Fri Feb 3 00:21:28 2023: Running fit with correction method: PCA
Fri Feb 3 00:21:30 2023: Computing PCA ...
Fri Feb 3 00:23:18 2023: Fitting rho ...
Furthermore I previously provided FRASER with 90GB RAM for the analysis, which was unfortunately insufficient and I restarted the analysis with 200GB of RAM. Is this normal behaviour for FRASER with these sample amounts? Or can we speed up the model fitting process? Or can I fit each parameter separately in different jobsubmissions?
The script I currently use is the following:
############ NVQ_Run093-NVQ_Run642 Untreated Samples FRASER Analysis: Fit Model and Call Outliers ############
#### Load Required Libraries ####
library(FRASER)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
#### Define Global Parameters ####
WORK_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/work/'
OUT_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/output/'
register(MulticoreParam(workers = 10))
#### Read in FRASER Object ####
fds <- readRDS(paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Filtered_Optim_psi5_psi3_and_theta.rds', sep = ''))
#### Detection of Aberrant Splicing Events ####
# Fit the Splicing Model
fds <- FRASER(fds, q=c(psi5=5, psi3=5, theta=5))
# Annotate Introns
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
orgDb <- org.Hs.eg.db
fds <- annotateRangesWithTxDb(fds, txdb=txdb, orgDb=orgDb)
# Call Splicing Outliers
res <- results(fds, zScoreCutoff=NA, padjCutoff = 0.999999, deltaPsiCutoff=0.0000001)
#### Save Objects ####
# Processed FRASER Object
FDS_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Processed.rds', sep = '')
saveRDS(fds, FDS_File)
# Results Object
RES_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results.rds', sep = '')
saveRDS(res, RES_File)
#### Generate Results File for Each Sample ####
Samples <- unique(res$sampleID)
for (s in Samples){
Results_Samples <- res[res$sampleID == s]
write.xlsx(Results_Samples, paste(OUT_DIR, 'Untreated/','NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results_deltaPsi_0_padj_1_', s, '.xlsx', sep = ''))
}
Kind regards,
Laurenz De Cock
Hi,
I got an error (below) while running FRASER on a large dataset.
Can you please help me to resolve this error?
Computing PCA ...
Fitting rho ...
Error in predictYCpp(as.matrix(H), D, b) :
Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
Calls: FRASER ... fit.FraserDataSet -> fitPCA -> updateRho -> predictY -> predictYCpp
Execution halted
Thank you!
Hello!
We are attempting to use FRASER on a cohort of 400 samples. We've been experiencing issues completing FRASER when sending the job to our sge queue, even when providing 1.5TB of memory. It seems the parallelization of the PSI calculation (fds <- calculatePSIValues(fds,BPPARAM=BPPARAM)) is causing the job to go over our h_vmem allocation. We've attempted to force FRASER to run in serial (BPPARAM=SerialParam()), but encounter the same issue maxing out of the memory.
Is it possible that FRASER is ignoring the serial setting?
Is there a quick fix for this? Or does a solution similar to the link below have to be implemented:
gagneurlab/OUTRIDER#11
Thank you for the tool, we've really enjoyed running it on some previous cohorts.
@ischeller @c-mertes
I ran FRASER pipeline with the cutoffs for results as follows. I wanted to see all the results and will apply manual filtering later.
res <- results(fds, zScoreCutoff=NA, padjCutoff=NA, deltaPsiCutoff=NA)
I got 8 lakh 37k entries
However, the p value range I found in the total results was from 0.73 to 1
p adj = 1 always for all rows
delta psi = 0 always for all rows
zscore = ranging from 1.5 to -1.5
I am not sure how to report the significantly present aberrant events. Can you provide some guideline?
Hi,
I am trying to run the demo data, as described in the tutorial, but when I run
fds <- FRASER(fds, q=c(psi5=2, psi3=3, theta=3))
I get the following error
Mon Dec 11 18:39:27 2023: Fit step for: 'jaccard'.
Mon Dec 11 18:39:27 2023: Running fit with correction method: PCA
Mon Dec 11 18:39:28 2023: Computing PCA ...
Error in if (nPcs > ncol(Matrix)) { : the condition has length > 1
I am running "R version 4.3.0"
Thanks
Hello,
First of all, thank you for devoloping this tool. Secondly, after trying to run the countRNAData function, the following error appears: "Error: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.". After looking up the reason behind this, seems that the version 1.2 of the matrixStats package causes this error. If I could have some help on how should I proceed in order to be able to use countRNAData function with this happening, I would really appreciate it. (I have tried to downgrade the matrixStats version, but haven't been lucky with it either). Thank you in advance.
Best regards,
Miriam
Hi,
I am trying to create the FRASER object using the function FraserDataSet
and here is the code that I ran
settings <- FraserDataSet(colData=annotated_toftable, workingDir="/hpf/largeprojects/ccmbio/yliang/smital_ASE/")
Both workingDir(settings)
and settings@workingDir
showed the workingDir that I specified. However when I want to check the FraserObject by just running settings
, this is the error message that I got
Fri Sep 16 13:17:51 2022: The given working directory '/localhd/125692/Rtmpu2AQgw/FRASER' does not exists. We will create it.
Warning message in dir.create(object@workingDir, recursive = TRUE):
“cannot create dir '/localhd/125692', reason 'Permission denied'”
ERROR while rich displaying an object: Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'scanBamParam': invalid class “FraserDataSet” object: Make sure we can write to the given working directory ' /localhd/125692/Rtmpu2AQgw/FRASER '.
Why the working directory showing here is different from the one that I specified? I am running FRASER on a cluster so I can't change the permission. I am using R 4.0.3 and the FRASER version is FRASER 1.2.1
Update:
Tried to install the package from github/c-mertes/FRASER, which updated the version to 1.9.1. The error disappeared. I don't know why the version I got from bioconductor is 1.2.1
Thanks for the help!
Laur
Dear sir,
I tried to use fds<- countRNAData(fds) to count my bam file.
There are something wrong when running Subread:
ERROR: Paired-end reads were detected in single-end read library : /NGS_Storage/Debbie/Tumor/A3060/arriba/Aligned.sortedByCoord.out.bam
No counts were generated.
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Hello!
I was trying to use FRASER with some test samples, and got error at countRNAdata step, my commond and output:
> fds <- countRNAData(settings)
Fri Jun 10 10:43:07 2022: Start counting the split reads ...
Fri Jun 10 10:43:07 2022: Count split reads for sample: Test1
Fri Jun 10 10:43:08 2022: Count split reads for sample: Test2
Fri Jun 10 10:49:52 2022 : count ranges need to be merged ...
Fri Jun 10 10:49:55 2022: Create splice site indices ...
Fri Jun 10 10:49:57 2022: Writing split counts to folder: /home/fraser/workingdir/savedObjects/Data_Analysis/splitCounts
Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.
I tried with both R3.6.1/R4.0.3 and single core/multicores.
Do you have any suggestions on this error?
Thanks a lot!
Zhe
I'm trying to run the latest version of FRASER. Everything works until I get to the results(..) function. The following code
library(FRASER)
library(annotables)
library(data.table)
library(ggplot2)
library(ggpubr)
library(dplyr)
library(purrr)
library(ggrepel)
library(plotly)
library(stringr)
library(RColorBrewer)
library(ggsci)
library(ggplot2)
library(gtable)
fds = loadFraserDataSet(".")
res = results(fds, padjCutoff={padj_threshold}, zScoreCutoff=NA, deltaPsiCutoff={delta_psi_threshold})
terminates with a "Please provide hgnc symbols to compute gene p values!" error.
I'm having a hard time tracing how the results(..) function even reaches that error which AFAIK is only raised by the aberrant(..) function when aggregate is True.
Hello, I'm trying to install FRASER 2.0, but when I install it with bioconductor, i get version 1.8.1, which is the first FRASER. When trying to use devtools i get the following error:
Using GitHub PAT from the git credential store.
Error: Failed to install 'unknown package' from GitHub:
HTTP error 401.
Bad credentials
Rate limit remaining: 53/60
Rate limit reset at: 2024-04-26 14:52:19 UTC
I have already checked my github path with Sys.getenv("GITHUB_PAT") and it seemes fine (I get [1] "" as answer). If I may have some help in order to install FRASER 2.0.
Thank you in advance.
countRNAData has safety checks but not their submethods for getSplitReadCountsForAllSamples & getNonSplitReadCountsForAllSamples. Would be nice if they could be added
Hello,
First of all, is this code still maintained? I received absolutely no news despite writing directly to the author.
My question is, is it possible to adapt the code for mouse data?
I tried to use
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
but I get errors about chromosome length, I suspect it's hardcoded to use human, is it the case?
It makes most of the plotting functions not usable, as they seem to depend on this annotation.
Correct way to calculate z-scores would be to derive it from the CDF:
cdf = pbetabinom(a, b , q=counts, log.p=TRUE)
zscore = qnorm(cdf, log.p=TRUE)
When I use countNonSplicedReads, it showed subread logo. And after waited for 1 day, it still keep running, but the cpu usage is very low. Because of no any error showed from it, I can't know what happened. Could you please add more information for better diagnosis。
Hello,
I am trying to update to a FRASER package which corresponds to FRASER 2.0.
According to GitHub anything above 1.9 would do and an R version of 4.0.0 or higher should enable installation with BiocManager.
Unfortunately, I am unable to install any version above 1.14.1.
available.packages(repos=BiocManager::repositories())["FRASER", "Version"]
[1] "1.14.1"
R.version
_
platform x86_64-conda-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 3.3
year 2024
month 02
day 29
svn rev 86002
language R
version.string R version 4.3.3 (2024-02-29)
nickname Angel Food Cake
packageDescription("FRASER")$Version
[1] "1.14.1"
BiocManager::install("FRASER")
Bioconductor version 3.18 (BiocManager 1.30.23), R 4.3.3 (2024-02-29)
Old packages: 'BiocFileCache', 'broom', 'bslib', 'cachem', 'curl',
'data.table', 'DBI', 'dbplyr', 'evaluate', 'farver', 'fastmap', 'fs',
'future', 'future.apply', 'GenomeInfoDb', 'ggplot2', 'gtable', 'hardhat',
'highr', 'htmltools', 'httpuv', 'KernSmooth', 'knitr', 'lattice',
'matrixStats', 'munsell', 'nlme', 'openssl', 'promises', 'ragg',
'RcppArmadillo', 'repr', 'rlang', 'rmarkdown', 'RSQLite', 'rstudioapi',
'seriation', 'shiny', 'stringi', 'survival', 'systemfonts', 'textshaping',
'tinytex', 'vegan', 'VGAM', 'xfun', 'xts'
Update all/some/none? [a/s/n]: n
Warning message:
package(s) not installed when version(s) same as or greater than current; use
force = TRUE
to re-install: 'FRASER'
I tried the force installation or updating other packages but ended up with the same FRASER version.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.