c-mertes / fraser Goto Github PK

FRASER - Find RAre Splicing Events in RNA-seq

License: MIT License

R 93.72% C++ 5.66% TeX 0.63%

rna-seq outlier-detection aberrant-splicing splicing rare-disease diagnostics r

fraser's Introduction

FRASER - Find RAre Splicing Events in RNA-seq

FRASER is a tool to detect aberrant splicing events in RNA-seq data. The method is described and published in Nature Communications and available through Bioconductor. It is also part of the Detection of RNA Outlier Pipeline (DROP). DROP is described and published in Nature Protocols.

The FRASER framework and workflow aims to assist the diagnostics in the field of rare diseases where RNA-seq is performed to identify aberrant splicing defects. For a short tutorial on how to use FRASER on a dataset please use the vignette or our Colab tutorial at: http://tinyurl.com/RNA-ASHG-colab. The Colab is based on a workshop that we presented at ASHG 2019/2020.

Please cite our method paper FRASER if you use it in a publication:

Mertes, C., Scheller, I.F., Yépez, V.A. et al. Detection of aberrant splicing events in RNA-seq data using FRASER. Nat Commun 12, 529 (2021). https://doi.org/10.1038/s41467-020-20573-7

or if you use FRASER2:

Scheller, I.F., Lutz, K., Mertes, C et al. Improved detection of aberrant splicing with FRASER 2.0 and the intron Jaccard index. Am Jrnl Hum Genet 110, 12 (2023). https://doi.org/10.1016/j.ajhg.2023.10.014

What's new

FRASER 2.0, an improved version of FRASER, is now available and used by default (version 1.99.0 and above). FRASER 2.0 uses the Intron Jaccard Index as its splice metric instead of FRASER's previous three metrics along with some other parameter optimizations of pseudocounts, filtering settings and default delta cutoff.

To change the splice metric, set fitMetrics(fds) to one or more of the metrics specified in FRASER::psiTypes. For FRASER 2.0 and the Intron Jaccard Index, the new default delta cutoff is 0.1 instead of the previous value of 0.3. When using the 3 previous metrics, the delta cutoff should be set manually to 0.3 during results extraction, e.g. results(fds, deltaPsiCutoff=0.3,...).

The preprint describing these changes in more detail is available in medRxiv.

Installation

FRASER is an R/Bioconductor software package requiring a running R 3.6 version or higher.

The recommended way of installing FRASER is through Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages('BiocManager')
BiocManager::install('FRASER')

If you use an R version below 4.0.0, you have to install it from the source. To this end, we will use devtools to install it. For this, you need a working development environment to compile the C++ code (see for details: Windows or MacOS X).

if (!requireNamespace("devtools", quietly=TRUE))
    install.packages('devtools')

# latest development version
devtools::install_github('gagneurlab/FRASER', dependencies=TRUE)

# or a specific version of FRASER (search for tags on github)
devtools::install_github('gagneurlab/FRASER', ref='1.1.3', dependencies=TRUE)

If you have dependency issues while installing any package, please have a look at the Troubleshooting section or submit an issue on GitHub.

Toubleshooting

Error in UseMethod("filter_")

When using FRASER with R3.6 one might observe the following error:

fds <- annotateRanges(fds)
# Error in UseMethod("filter_") :
#   no applicable method for 'filter_' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

To overcome this error one needs to upgrade BiocFileCache.

BiocManager::install("Bioconductor/BiocFileCache", ask=FALSE, update=FALSE)

Missing libraries while compiling R packages

On some Linux distributions we need the developer libraries for compiling the R packages.

To install those packages, please run as administrator:

For Ubuntu or Debian based systems:

sudo apt-get install build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev zlib1g-dev libmysqld-dev

For centOS or RHEL based systems:

sudo yum install R-devel zlib-devel openssl-devel libcurl-devel libxml2-devel mariadb-devel

fraser's People

Contributors

Stargazers

Watchers

Forkers

ischeller amrr101 mumichae gagneurlab lu96 kant bw2 standardgalactic nickhsmith drsivamani pleiadian53 volcanicsalmon vyepez88 liangdp1984 atajadidahari yujie422

fraser's Issues

In countNonSplicedReads stucked for no error report

When I use countNonSplicedReads, it showed subread logo. And after waited for 1 day, it still keep running, but the cpu usage is very low. Because of no any error showed from it, I can't know what happened. Could you please add more information for better diagnosis。

FRASER Error in check_returned_array

Dear C. Mertes,

I'm running FRASER on 61 PBMC samples for the detection of aberrant splicing events in these samples. Counting of the spliced and none spliced reads and parameter optimisation worked perfectly. However, during the model fitting step I run into the following error:

Wed Feb  1 12:08:32 2023: Fit step for: 'psi5'.
Wed Feb  1 12:08:34 2023: Running fit with correction method: PCA
Wed Feb  1 12:08:36 2023: Computing PCA ...
Wed Feb  1 12:10:39 2023: Fitting rho ...
Thu Feb  2 05:31:11 2023: rho fit:
             Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
        0.0000000 0.0000001 0.0000001 0.0047353 0.0000001 0.5999270
Thu Feb  2 05:36:43 2023: Compute p values for: 'psi5'.
Thu Feb  2 22:18:56 2023: Adjust p values for: 'psi5'.
Thu Feb  2 22:19:42 2023: Compute Z scores for: 'psi5'.
Error in check_returned_array(ans, expected_dim, "extract_array", class(x)) :
  The "extract_array" method for HDF5ArraySeed objects returned an array
  with incorrect dimensions. Please contact the author of the
  HDF5ArraySeed class (defined in the HDF5Array package) about this and
  point him/her to the man page for extract_array() in the DelayedArray
  package (?extract_array).
Calls: FRASER ... extract_array -> extract_array -> check_returned_array
Execution halted

As the error occurs in the middle of the fitting step, I believe that it is not directly related to my input files and thus I was wondering whether you had an idea what might be causing the error?

This is the script I use for the model fit:

############ NVQ_Run093-NVQ_Run642 Untreated Samples FRASER Analysis: Fit Model and Call Outliers ############

#### Load Required Libraries ####
library(FRASER)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

#### Define Global Parameters ####
WORK_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/work/'
OUT_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/output/'
register(MulticoreParam(workers = 10))

#### Read in FRASER Object ####
fds <- readRDS(paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Filtered_Optim_psi5_psi3_and_theta.rds', sep = ''))

#### Detection of Aberrant Splicing Events ####
# Fit the Splicing Model 
fds <- FRASER(fds, q=c(psi5=5, psi3=5, theta=5))

# Annotate Introns
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
orgDb <- org.Hs.eg.db

fds <- annotateRangesWithTxDb(fds, txdb=txdb, orgDb=orgDb)

# Call Splicing Outliers
res <- results(fds, zScoreCutoff=NA, padjCutoff = 0.999999, deltaPsiCutoff=0.0000001)

#### Save Objects ####
# Processed FRASER Object
FDS_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Processed.rds', sep = '')

saveRDS(fds, FDS_File)

# Results Object
RES_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results.rds', sep = '')

saveRDS(res, RES_File)

#### Generate Results File for Each Sample ####
Samples <- unique(res$sampleID)

for (s in Samples){
  Results_Samples <- res[res$sampleID == s]
  
  write.xlsx(Results_Samples, paste(OUT_DIR, 'Untreated/','NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results_deltaPsi_0_padj_1_', s, '.xlsx', sep = ''))
}

Kind regards,
Laurenz De Cock

install devtools

error in installing devtools

ft_cache.h:9:10: fatal error: ft2build.h: No such file or directory

loadFraserDataSet directory ignored

Hi, thanks for making this great library.
I'm seeing an issue with the loadFraserDataSet function.
When I run

setwd("/tmp/fraser/")
saveFraserDataSet(fds)

I can later do

fds <- loadFraserDataSet(file="/tmp/fraser/savedObjects/Data_Analysis/fds-object.RDS")

to restore the dataset.
But if I rename /tmp/fraser to /tmp/fraser2 and then run

fds <- loadFraserDataSet(file="/tmp/fraser2/savedObjects/Data_Analysis/fds-object.RDS")

I get this error:

> fds <- loadFraserDataSet(file="/tmp/fraser2/savedObjects/Data_Analysis/fds-object.RDS")
Error in value[[3L]](cond) : 
  'assay(<FraserDataSet>, i="character", ...)' invalid subscript 'i'
failed to open file '/tmp/fraser/savedObjects/Data_Analysis/rawCountsJ.h5'

which tells me it's not updating the absolute path.
The same thing happens if I use the dir= arg instead of file=.

Questions about optimal q and implementation="AE" vs. "PCA"

I understand using different values of q can significantly affect results, so I wanted to ask whether I'm choosing q and the implementation arg correctly.

My dataset is 54 affected male fibroblast samples.

One thing I'm curious about is why the optimal q's found by FRASER optimHyperParams (psi5_q = 5, psi3_q = 2, psiSite_q = 2) are much smaller than the q found by OUTRIDER findEncodingDim (q=12) for the same set of samples, and also smaller than the q used in the colab notebook example with 50 samples (q=10)?

I'm using FRASER to calculate counts for splice junctions, and STARv2 to get gene counts for OUTRIDER.

When I run

for(i in c("psi5", "psi3", "psiSite")) {
    fds <- optimHyperParams(fds, i, plot=FALSE, implementation="PCA", BPPARAM=bpparam)
    plotEncDimSearch(fds, type=i, plotType="auc")
    plotEncDimSearch(fds, type=i, plotType="loss")
}}

I get plots that look like these (the plots for psi5, psi3, and psiSite look nearly identical):

and this is the plot from OUTRIDER plotEncDimSearch(ods) for the same set of samples:

From the pre-print, I understood that auto-encoder correction performs much better than PCA, so I run FRASER with implementation="AE". Is it fine to run optimHyperParams(..) with the default implementation="PCA", and use those q's for FRASER with implementation="AE"?
I tried running optimHyperParams(..) with implementation="AE" (and other values taken from the source code) but got validation errors saying these are not recognized implementations:

> fds = optimHyperParams(fds, 'psi5', implementation="AE", plot=FALSE)
Error in needsHyperOpt(implementation) : Method not found: 'AE'!

> fds = optimHyperParams(fds, 'psi5', implementation="FRASER", plot=FALSE)
dPsi filter:FALSE: 1307001	TRUE: 57861
Exclusion matrix: FALSE: 1335180	TRUE: 29682
Thu Jul  9 02:02:00 2020: Injecting outliers: 9001 / 800 (primary/secondary)
Thu Jul  9 02:02:05 2020: Run hyper optimization with 13 options.
1 ;	 2 ;	 0
Thu Jul  9 02:02:06 2020 ; q: 2 ; noise:  0
Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: 'arg' should be one of “PCA”, “PCA-BB-Decoder”, “AE”, “AE-weighted”, “PCA-BB-full”, “fullAE”, “PCA-regression”, “PCA-reg-full”, “PCA-BB-Decoder-no-weights”, “BB”

Given that optimal q computed by optimHyperParams for psi5, psi3, psiSite are sometimes not the same, what is the recommended way to choose which q to pass to FRASER? (currently I'm using q=max(bestQ(fds, type="psi5"), bestQ(fds, type="psi3"), bestQ(fds, type="psiSite")))

After glancing at the source code, I also tried

q = list(psi5=bestQ(fds, type="psi5"), psi3=bestQ(fds, type="psi3"), psiSite=bestQ(fds, type="psiSite"))
fds <- FRASER(fds, q=q, implementation ="AE",  BPPARAM=MulticoreParam(4))

but I got this error:

Wed Jul  8 18:01:50 2020: Fit step for: 'psi5'.
Wed Jul  8 18:01:50 2020: Running fit with correction method: AE

FALSE  TRUE 
97078 30265 
Error in 1:nPcs : NA/NaN argument

I realize now I can separately run fit and the other methods inside FRASER, and pass in a different q to each. Is this recommended?

Error in working directory

Hi,

I am trying to create the FRASER object using the function FraserDataSet and here is the code that I ran

settings <- FraserDataSet(colData=annotated_toftable, workingDir="/hpf/largeprojects/ccmbio/yliang/smital_ASE/")

Both workingDir(settings) and settings@workingDir showed the workingDir that I specified. However when I want to check the FraserObject by just running settings, this is the error message that I got

Fri Sep 16 13:17:51 2022: The given working directory '/localhd/125692/Rtmpu2AQgw/FRASER' does not exists. We will create it.

Warning message in dir.create(object@workingDir, recursive = TRUE):
“cannot create dir '/localhd/125692', reason 'Permission denied'”
ERROR while rich displaying an object: Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'scanBamParam': invalid class “FraserDataSet” object: Make sure we can write to the given working directory ' /localhd/125692/Rtmpu2AQgw/FRASER '.

Why the working directory showing here is different from the one that I specified? I am running FRASER on a cluster so I can't change the permission. I am using R 4.0.3 and the FRASER version is FRASER 1.2.1

Update:
Tried to install the package from github/c-mertes/FRASER, which updated the version to 1.9.1. The error disappeared. I don't know why the version I got from bioconductor is 1.2.1

Thanks for the help!
Laur

Ppadjust is always 1

Hello everyone,
I have been familiarizing myself with FRASER for a week now. Read the article and the vignette, great tool in general. Congrats!
The design I have been trying to use for your tool is to have a number of "control" samples with no apparent events detected and a "study" sample to be contrasted. For this initial test, this sample has a confirmed aberrant splicing event that truncates the transcript. I don't know if this design fits in the FRASER use cases. Does it?
The FraserDataSet looks like this:

   sampleID   group bamFile                                       pairedEnd
   <fct>      <dbl> <chr>                                         <lgl>    
 1 control-1     1 control-1.Aligned.out.sorted.q11_q21.bam TRUE     
 2 control-2     2 control-2.Aligned.out.sorted.q11_q21.bam TRUE     
 3 control-3     3 control-3.Aligned.out.sorted.q11_q21.bam TRUE     
 4 control-4     4 control-4.Aligned.out.sorted.q11_q21.bam TRUE     
 5 control-5     5 control-5.Aligned.out.sorted.q11_q21.bam TRUE     
 6 control-6     6 control-6.Aligned.out.sorted.q11_q21.bam TRUE     
 7 control-7     7 control-7.Aligned.out.sorted.q11_q21.bam TRUE     
 8 control-8     8 control-8.Aligned.out.sorted.q11_q21.bam TRUE     
 9 control-9     9 control-9.Aligned.out.sorted.q11_q21.bam TRUE     
10 control-10    10 control-10.Aligned.out.sorted.q11_q21.bam TRUE     
11 study      11 study.Aligned.out.sorted.q11_q21.bam TRUE     

----------------------- Settings -----------------------
Analysis name:               Data Analysis 
Analysis is strand specific: reverse 
Working directory:           './temp/study' 

-------------------- BAM parameters --------------------
class: ScanBamParam
bamFlag (NA unless specified):
bamSimpleCigar: FALSE
bamReverseComplement: FALSE
bamTag:  
bamTagFilter:
bamWhich: 0 ranges
bamWhat:
bamMapqFilter: 20

I would like to know how to tell Fraser how to group samples? I have seen in the vignette sample grouping is performed based on the group column and the condition column in the input sample table. However, even though I have been playing around with these two categories I haven't seen differences in the obtained results. Using both column names and using different grouping designs, for example all control samples as group 1 and study sample as group 2, etc.
I have been able to extract the aberrant event from the study sample by using the zScore > abs(2.5) and psiValue > 0.6 columns. However, filtering by padjust returns an empty table. That is actually my major concern. I haven't been able to obtain a value for the padjust different from 1. This actually happens to all events regardless of the sample.
Is it related to the grouping? I need more samples in the analysis? Is there a parameter that could decrease the statistic stringency? What would you suggest?
Thanks in advance!
David

Error in predictYCpp(as.matrix(H), D, b) : Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

Hi,
I got an error (below) while running FRASER on a large dataset.
Can you please help me to resolve this error?

Computing PCA ...
Fitting rho ...
Error in predictYCpp(as.matrix(H), D, b) :
Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
Calls: FRASER ... fit.FraserDataSet -> fitPCA -> updateRho -> predictY -> predictYCpp
Execution halted

Thank you!

filter out reads with MAPQ=0 & duplicated reads

for fraser junction count analysis, in default setting: ScanBamParam(mapqFilter=0), but sometime, we can see several duplicated reads with mapQ=0 and reads with multiple locations with mapQ=0, it can also cause the false positive abrrent splicing events. what's your opinion for mapqfilter setting? if we set mapqFilter >0, whether it results other false negative events?

Getting error when counting splice sites

Hello,

I installed the package on an Ubuntu 22 server.

When running
fds <- countRNAData(settings)
with my own data (ONT bam files), the first steps work ok, but then I get

Mon Aug 29 17:58:49 2022: In total 239257 splice sites (acceptor/donor) will be counted ...
Error in reducer$value.cache[[as.character(idx)]] <- values :
wrong args for environment subassignment
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result

What could be the issue?

Thanks

annotateRangesWithTxDb function error

Hi.

I follow vignette with my bam file.

However, When I entered annotateRangesWithTxDb (fds, txdb=txdb, orgDb=orgDb), i got warning and error message :
403 genes were dropped because they have exons located on both strands of the same reference sequence or on more than one reference sequence, so cannot be represented by a single genomic range.
Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object
Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what=c("seqeunce", : sequence chrM has incompatible seqlengths:
- in 'x' : 16569
- in 'y' : 16571

I don't know what I can do with the fds object and how I can solve the problem.
Thank you for your help.

Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.

Hello!

I was trying to use FRASER with some test samples, and got error at countRNAdata step, my commond and output:

> fds <- countRNAData(settings)
Fri Jun 10 10:43:07 2022: Start counting the split reads ...
Fri Jun 10 10:43:07 2022: Count split reads for sample: Test1
Fri Jun 10 10:43:08 2022: Count split reads for sample: Test2
Fri Jun 10 10:49:52 2022 : count ranges need to be merged ...
Fri Jun 10 10:49:55 2022: Create splice site indices ...
Fri Jun 10 10:49:57 2022: Writing split counts to folder: /home/fraser/workingdir/savedObjects/Data_Analysis/splitCounts
Error in H5Fcreate(file) : HDF5. File accessibilty. Unable to open file.

I tried with both R3.6.1/R4.0.3 and single core/multicores.

Do you have any suggestions on this error?

Thanks a lot!
Zhe

FRASER 2.0 versions not available

Hello,

I am trying to update to a FRASER package which corresponds to FRASER 2.0.
According to GitHub anything above 1.9 would do and an R version of 4.0.0 or higher should enable installation with BiocManager.

Unfortunately, I am unable to install any version above 1.14.1.

available.packages(repos=BiocManager::repositories())["FRASER", "Version"]
[1] "1.14.1"
R.version
_
platform x86_64-conda-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 3.3
year 2024
month 02
day 29
svn rev 86002
language R
version.string R version 4.3.3 (2024-02-29)
nickname Angel Food Cake
packageDescription("FRASER")$Version
[1] "1.14.1"
BiocManager::install("FRASER")
Bioconductor version 3.18 (BiocManager 1.30.23), R 4.3.3 (2024-02-29)
Old packages: 'BiocFileCache', 'broom', 'bslib', 'cachem', 'curl',
'data.table', 'DBI', 'dbplyr', 'evaluate', 'farver', 'fastmap', 'fs',
'future', 'future.apply', 'GenomeInfoDb', 'ggplot2', 'gtable', 'hardhat',
'highr', 'htmltools', 'httpuv', 'KernSmooth', 'knitr', 'lattice',
'matrixStats', 'munsell', 'nlme', 'openssl', 'promises', 'ragg',
'RcppArmadillo', 'repr', 'rlang', 'rmarkdown', 'RSQLite', 'rstudioapi',
'seriation', 'shiny', 'stringi', 'survival', 'systemfonts', 'textshaping',
'tinytex', 'vegan', 'VGAM', 'xfun', 'xts'
Update all/some/none? [a/s/n]: n
Warning message:
package(s) not installed when version(s) same as or greater than current; use
force = TRUE to re-install: 'FRASER'

I tried the force installation or updating other packages but ended up with the same FRASER version.

Adapt the script for mouse?

Hello,

First of all, is this code still maintained? I received absolutely no news despite writing directly to the author.

My question is, is it possible to adapt the code for mouse data?

I tried to use
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

but I get errors about chromosome length, I suspect it's hardcoded to use human, is it the case?

It makes most of the plotting functions not usable, as they seem to depend on this annotation.

Understanding the concept behind the metrics used for FRASER calculations

@c-mertes @ischeller It will be great if you guys can explain more about the metrics sigh3 and sigh5 and theta 5 and theta3 used for the calculations of FRASER. It will be really helpful.

"Please provide hgnc symbols to compute gene p values!" error

I'm trying to run the latest version of FRASER. Everything works until I get to the results(..) function. The following code

library(FRASER)
library(annotables)
library(data.table)
library(ggplot2)
library(ggpubr)
library(dplyr)
library(purrr)
library(ggrepel)
library(plotly)
library(stringr)
library(RColorBrewer)
library(ggsci)
library(ggplot2)
library(gtable)

fds = loadFraserDataSet(".")

res = results(fds, padjCutoff={padj_threshold}, zScoreCutoff=NA, deltaPsiCutoff={delta_psi_threshold})

terminates with a "Please provide hgnc symbols to compute gene p values!" error.

I'm having a hard time tracing how the results(..) function even reaches that error which AFAIK is only raised by the aberrant(..) function when aggregate is True.

Safety checks for sub function of countRNADaata

countRNAData has safety checks but not their submethods for getSplitReadCountsForAllSamples & getNonSplitReadCountsForAllSamples. Would be nice if they could be added

Incompatibility of fds objects between versions

I have saved to a file a FRASER object that has been generated with version 1.2.1.

Then, I have tried to load it with the latest version of FRASER (1.99.2) and I got the following error:

"Error in validObject(object) :
invalid class "FraserDataSet" object: The nonSplicedReads do not have corresponding splitReads. This is probably the result of merging"

Indeed, the counts are from an external tool and I have formatted them in order to be compatible with FRASER requirements. But in version 1.2.1 I can get the outlier results without any problem.

I would like to know if the format of the counting matrices has changed in the latest release. I have checked the last manual and I have not found any new formatting specification.

A question about resultsByGenes

Hi，
I found the following problems when viewing the FRASER results of genes (The red lines are removed in the gene result):

These data on the graph show that the removed and retained have the same pvalue and padjust. In this case, how does FRASER judge which junction site should be kept?

Running FRASER for 4 bam files

@ischeller @c-mertes
I ran FRASER pipeline with the cutoffs for results as follows. I wanted to see all the results and will apply manual filtering later.

res <- results(fds, zScoreCutoff=NA, padjCutoff=NA, deltaPsiCutoff=NA)

I got 8 lakh 37k entries
However, the p value range I found in the total results was from 0.73 to 1
p adj = 1 always for all rows
delta psi = 0 always for all rows
zscore = ranging from 1.5 to -1.5

I am not sure how to report the significantly present aberrant events. Can you provide some guideline?

loadFraserDataset failing if RNASeq data counts generated and loaded on different workstations

Steps to reproduce:

countRNA run on workstation A and saved via saveFraserDataset
Dataset transferred to workstation B (dataset saved in a different path)
Data is loaded on workstation B using loadFraserDataset

Expected:
fds dataset is successfully loaded

Actual:
The following error is thrown
Error: no slot of name "seed" for this object of class "HDF5ArraySeed"

Code issue:
https://github.com/c-mertes/FRASER/blob/faa0a207156e9d575dcbb61d774d16e6d24f9371/R/saveHDF5Objects.R#L114

From what I understand, the logic (just preceding the line above) checks if the HDF datasets are available (as per the fds object), otherwise they are loaded from the specified 'dir' parameter. This bug is surfaced in the R major version > 3 portion.

The slot( .... "seed") directive doesn't need another @seed accessor. That's what causing this scenario to fail

My crude fix:

    else if (afile != path(assay(fds, aname, withDimnames = FALSE))) {
      if (R.Version()$major == "3") {
        path(assay(fds, aname, withDimnames = FALSE)) <- afile
      }
      else {
        if ("seed" %in% slotNames(assay(fds, aname, withDimnames = FALSE))) {
          if ("filepath" %in% slotNames(slot(assay(fds, aname, withDimnames = FALSE), "seed"))) {
            slot(assay(fds, aname, withDimnames = FALSE), "seed")@filepath <- afile
          }
        }
        
        if ("seed" %in% slotNames(assay(fds, aname, withDimnames = FALSE))) {
          if ("seed" %in% slotNames(slot(assay(fds, aname, withDimnames = FALSE), "seed"))) {
            if ("filepath" %in% slotNames(slot(slot(assay(fds, aname, withDimnames = FALSE), "seed"), "seed"))) {
              slot(assay(fds, aname, withDimnames = FALSE), "seed")@seed@filepath <- afile
            }
          }
        }
      }
    }

Sessioninfo:

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggpubr_0.4.0                ggplot2_3.3.2               FRASER_1.0.2                SummarizedExperiment_1.18.2
 [5] DelayedArray_0.14.1         matrixStats_0.57.0          Biobase_2.48.0              Rsamtools_2.4.0            
 [9] Biostrings_2.56.0           XVector_0.28.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
[13] IRanges_2.22.2              S4Vectors_0.26.1            BiocGenerics_0.34.0         data.table_1.13.0          
[17] BiocParallel_1.22.0        

loaded via a namespace (and not attached):
  [1] VGAM_1.1-3                colorspace_1.4-1          ggsignif_0.6.0            ellipsis_0.3.1            rio_0.5.16               
  [6] rstudioapi_0.11           ggrepel_0.8.2             bit64_4.0.5               AnnotationDbi_1.50.3      fansi_0.4.1              
 [11] splines_4.0.2             R.methodsS3_1.8.1         PRROC_1.3.1               knitr_1.30                jsonlite_1.7.1           
 [16] broom_0.7.1               dbplyr_1.4.4              R.oo_1.24.0               pheatmap_1.0.12           HDF5Array_1.16.1         
 [21] compiler_4.0.2            httr_1.4.2                backports_1.1.10          assertthat_0.2.1          Matrix_1.2-18            
 [26] lazyeval_0.2.2            cli_2.0.2                 htmltools_0.5.0           prettyunits_1.1.1         tools_4.0.2              
 [31] gtable_0.3.0              glue_1.4.2                GenomeInfoDbData_1.2.3    dplyr_1.0.2               rappdirs_0.3.1           
 [36] Rcpp_1.0.5                carData_3.0-4             cellranger_1.1.0          vctrs_0.3.4               rtracklayer_1.48.0       
 [41] DelayedMatrixStats_1.10.1 xfun_0.18                 stringr_1.4.0             openxlsx_4.2.2            lifecycle_0.2.0          
 [46] rstatix_0.6.0             XML_3.99-0.5              zlibbioc_1.34.0           scales_1.1.1              BSgenome_1.56.0          
 [51] pcaMethods_1.80.0         hms_0.5.3                 rhdf5_2.32.4              RColorBrewer_1.1-2        BBmisc_1.11              
 [56] yaml_2.2.1                curl_4.3                  memoise_1.1.0             biomaRt_2.44.1            stringi_1.5.3            
 [61] RSQLite_2.2.1             checkmate_2.0.0           GenomicFeatures_1.40.1    zip_2.1.1                 Rsubread_2.2.6           
 [66] rlang_0.4.8               pkgconfig_2.0.3           bitops_1.0-6              evaluate_0.14             lattice_0.20-41          
 [71] purrr_0.3.4               Rhdf5lib_1.10.1           GenomicAlignments_1.24.0  htmlwidgets_1.5.2         cowplot_1.1.0            
 [76] bit_4.0.4                 tidyselect_1.1.0          magrittr_1.5              R6_2.4.1                  generics_0.0.2           
 [81] DBI_1.1.0                 pillar_1.4.6              haven_2.3.1               foreign_0.8-80            withr_2.3.0              
 [86] abind_1.4-5               RCurl_1.98-1.2            tibble_3.0.3              crayon_1.3.4              car_3.0-10               
 [91] utf8_1.1.4                BiocFileCache_1.12.1      plotly_4.9.2.1            rmarkdown_2.4             progress_1.2.2           
 [96] grid_4.0.2                readxl_1.3.1              blob_1.2.1                forcats_0.5.0             digest_0.6.25            
[101] tidyr_1.1.2               extraDistr_1.9.1          R.utils_2.10.1            openssl_1.4.3             munsell_0.5.0            
[106] viridisLite_0.3.0         askpass_1.1

Error in `[[<-`(`tmp`, name, value = "Donor")

When doing the count split read by using our own bam file.
I use the command as follow:

settings <- FraserDataSet(colData=anno, workingDir="../")
fds <- countRNAData(settings)

Tue Dec 1 06:50:58 2020: Start counting the split reads ...
Tue Dec 1 06:50:58 2020: Count split reads for sample: RNA-JXY-001A
Tue Dec 1 06:57:31 2020: Count split reads for sample: RNA-YCS-004A
Tue Dec 1 07:04:45 2020: Count split reads for sample: RNA-CSW-006A
Tue Dec 1 07:13:19 2020: Count split reads for sample: RNA-WCJ-007A
Tue Dec 1 07:18:48 2020: Count split reads for sample: RNA-SYL-009A
Tue Dec 1 07:25:06 2020 : count ranges need to be merged ...
Tue Dec 1 07:25:06 2020: Create splice site indices ...
Tue Dec 1 07:25:06 2020: Writing split counts to folder: ..//savedObjects/Data_Analysis/splitCounts
Tue Dec 1 07:25:06 2020: Identifying introns with read count <= 20 in all samples...
Tue Dec 1 07:25:06 2020: Start counting the non spliced reads ...
Tue Dec 1 07:25:06 2020: In total 0 splice junctions are found.

Error in [[<-(*tmp*, name, value = "Donor") :
1 elements in value to replace 0 elements

It shows error like this.

Trying to install FRASER 2.0

Hello, I'm trying to install FRASER 2.0, but when I install it with bioconductor, i get version 1.8.1, which is the first FRASER. When trying to use devtools i get the following error:

Using GitHub PAT from the git credential store.
Error: Failed to install 'unknown package' from GitHub:
HTTP error 401.
Bad credentials

Rate limit remaining: 53/60
Rate limit reset at: 2024-04-26 14:52:19 UTC

I have already checked my github path with Sys.getenv("GITHUB_PAT") and it seemes fine (I get [1] "" as answer). If I may have some help in order to install FRASER 2.0.

Thank you in advance.

FRASER bam files for samples

It will be really helpful if you can provide the bam files for all the example data which includes sample4,sample5....sample9.

psiValue calculation

Dear Team,

I am using Fraser on count table generated by featurecounts(split reads + non split reads) for splice sites. I have given unique ID to each splice site. Both tables shares ID( junction count endID == splicesiteCount, spliceSiteID). For junction count table "startID" is the just number of rows(1,2,3,4 and so on). Each table is sorted by chromosome and start coordinate. This is how, I created table for FRASER.

Fraserdata set creation was successful. But I am facing problem in PSI value calculation:
fds <- calculatePSIValues(fds)

After calculating psi3 and 5values, It is giving following error:

Calculating the PSI site values:
0% Error: subscript contains invalid names
Execution halted.

Can you please help me with this? Am I using wrong format of table?
Thank you.

FRASER with non model organisms

Dear C. Mertes,

I would like to analyze alternative splicing events in non-model organisms such as different mouse subspecies (e.g., Mus musculus castaneus).

Is it possible to do this? I am trying to follow the example with my data, but when I annotate the ranges, I cannot find the way to use the correct genome.

Thanks.

Error: useNames = NA is defunct

Dear sir,
I ran the test dataset, and got the error below

Question regarding the sample size of FRASER

Hello. I am new to bioinformatics and curious how to use FRASER to look for aberrant splicing events in a single sample (If this is possible)? I know the recommended sample size is 30. Lets say if I run 1 patient sample and 30 normal samples from GTEx together, will this sufficient enough to look for aberrant splicing events in my sample?
Any help will be much appreciated.

Error in if (nPcs > ncol(Matrix))

Hi,

I am trying to run the demo data, as described in the tutorial, but when I run
fds <- FRASER(fds, q=c(psi5=2, psi3=3, theta=3))

I get the following error
Mon Dec 11 18:39:27 2023: Fit step for: 'jaccard'.
Mon Dec 11 18:39:27 2023: Running fit with correction method: PCA
Mon Dec 11 18:39:28 2023: Computing PCA ...
Error in if (nPcs > ncol(Matrix)) { : the condition has length > 1

I am running "R version 4.3.0"

Thanks

Merging Multiple Samples

Dear C. Mertes,

I would like to run FRASER on a dataset of 47 BAM files. However, when running the countRNAData function I keep getting out of memory errors. Is it possible to run fds <- countRNAData(createTestFraserSettings()) on each sample individually, and then merge the resulting fds objects? If not, do you know how I can best get around this issue? Thanks in advance!

All adjusted p-values at 1

Dear FRASER2 team,

I am currently trying to use FRASER2 to analyse RNAseq data (mRNA and totalRNA), from muscle biopsy samples. I have been trying to reproduce the vignette on my data and I was surprised to see that all my adjusted p-values were all at 1. It seems that the vignette is also showing adjusted p-values at 1. I was wondering why this is, and how can this be resolved? The data has been tested already and we know that there are outliers in certain samples.

I am running the script on R 4.4.0 and the installed FRASER version is 2.0.0.

Thank you very much in advance!
Camille

ERROR: Paired-end reads were detected in single-end read library

Dear sir,
I tried to use fds<- countRNAData(fds) to count my bam file.
There are something wrong when running Subread:
ERROR: Paired-end reads were detected in single-end read library : /NGS_Storage/Debbie/Tumor/A3060/arriba/Aligned.sortedByCoord.out.bam
No counts were generated.
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:

Should I add any option? or I should direct use Subread?

Can't run countRNAData

Hello,
First of all, thank you for devoloping this tool. Secondly, after trying to run the countRNAData function, the following error appears: "Error: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.". After looking up the reason behind this, seems that the version 1.2 of the matrixStats package causes this error. If I could have some help on how should I proceed in order to be able to use countRNAData function with this happening, I would really appreciate it. (I have tried to downgrade the matrixStats version, but haven't been lucky with it either). Thank you in advance.

Best regards,

Miriam

Error in creating dir

Hi, Mr Mertes and your team!
I've read your manual and relative literature and found your package really worth a try of best effort!
However, when running your example data as well as my own data, I met the same error from the very first step as follows:
#######Example data in your manual#######

settings<-FraserDataSet(colData=sampleTable,workingDir=tempdir())
settings
-------------------- Sample data table -----------------

A tibble: 3 x 5

sampleID bamFile group gene pairedEnd

1 sample1 extdata/bam/sample1.bam 1 TIMMDC1 TRUE
2 sample2 extdata/bam/sample2.bam 3 CLPP TRUE
3 sample3 extdata/bam/sample3.bam 2 MCOLN1 TRUE

----------------------- Settings -----------------------
Analysis name: Data Analysis
Analysis is strand specific: no
Working directory: 'C:\Users\ADMINI~1\AppData\Local\Temp\RtmpQ9cxvH'

-------------------- BAM parameters --------------------
Fri Dec 11 09:26:15 2020: The given working directory 'C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER' does not exists. We will create it.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'scanBamParam': invalid class “FraserDataSet” object: Make sure we can write to the given working directory ' C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER '.
In addition: Warning message:
In dir.create(object@workingDir, recursive = TRUE) :
cannot create dir 'C:\Users\biocbuild', reason 'Permission denied'

When working on my own data, I set my own working directory at the first step, and other script is almost the same. Nonetheless, the same error still exists:
#######My own data#######
----------------------- Settings -----------------------
Analysis name: Data Analysis
Analysis is strand specific: no
Working directory: 'Y:/HCM-newly-sequenced/200820-WGX'
-------------------- BAM parameters --------------------
Fri Dec 11 09:51:35 2020: The given working directory 'C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER' does not exists. We will create it.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'scanBamParam': invalid class “FraserDataSet” object: Make sure we can write to the given working directory ' C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER '.
In addition: Warning message:
In dir.create(object@workingDir, recursive = TRUE) :
cannot create dir 'C:\Users\biocbuild', reason 'Permission denied'

Is the working directory completely fixed in ' C:\Users\biocbuild\bbs-3.12-bioc\tmpdir\Rtmpw1fLb2/FRASER ' ? How can I alter it into my own path? Otherwise, must I run R as an administrator? Yet this might lead to unpredictable risk?
Looking forward to your reply!
Thanks a lot!

Offline use of countSplitReads

Hello, I am having some problems running your package due to (lack of) internet connectivity.

Our group works in a network which blocks internet connection for security reasons. We can download any necessary resources locally, but when it comes to execution we have no connection available.

The issue comes with the countSplitReads function, which ends up calling get_chrom_info_for_registered_UCSC_genome from genomeInfoDb. This function attempts to download chromInfo.txt.gz, which is not possible in our setting.

An issue was submitted to GenomeInfoDb describing a related problem (Bioconductor/GenomeInfoDb#26), and a solution was implemented to the package (quote reply). Would it be possible for countSplitReads to have a parameter passed down through function calls to use this "offline mode"? If so, we could first download chromosome information locally and access that cache in offline execution.

Thank you,
Álvaro.

Question about using the mergeCounts function

Hello,
I have two sets of data that come from different studies. I want to run the countRNAData on the first set of data first and save the output (maybe a rds?). When I get my second set of data, is it possible to run the countRNAData on my second set, load the output of the first set, then use the mergecount function to merge everything together?

Thank you very much!

improve zscores

Correct way to calculate z-scores would be to derive it from the CDF:

cdf = pbetabinom(a, b , q=counts, log.p=TRUE)
zscore = qnorm(cdf, log.p=TRUE)

Paired-end BAM files recognized as single end

Hi everyone,

I am having an issue when trying to create my fds table. Below is my Rscript:

library(FRASER)


sampleTable <- fread(system.file(
"scripts_objects", "fraser_settings_object.tsv", package="FRASER", mustWork=TRUE))

bamFiles <- sampleTable[ ,bamFile]
sampleTable[,bamFile:=bamFiles]

settings <- FraserDataSet(colData=sampleTable,
workingDir=tempdir())

if(.Platform$OS.type == "unix") {
register(MulticoreParam(workers=multicoreWorkers()))
} else {
register(SnowParam(workers=multicoreWorkers()))
}

fds <- countRNAData(settings)

write.table(fds, "fds.txt")

I get the following error:

ERROR: Paired-end reads were detected in single-end read library : /HOME/BAM_files/STAR_1010Aligned.sortedByCoord.out.bam
No counts were generated.
Error: BiocParallel errors
  element index: 1
  first error: 
In addition: Warning message:
stop worker failed:
  attempt to select less than one element in OneIndex 
Execution halted

I have run the code a couple times with different iterations of MulticoreParam() and everytime a different BAM file is flagged as being "paired-end reads detected in single-end read library". I checked my BAMs and they are definitely paired-end, so I am not sure why the BAMs are being recognized as single-ended. Is there additional syntax I have to add to specify the BAMs are paired-end? I couldn't find anything in the vignette.

That seems to be my main issue-- however if there are any suggestions on how to solve the "stop worker failed" issue, that would be appreciated as well! I believe it has to do with my memory not being comparable for all the BAM files I have, but I am not sure.

thank you very much,

Alison

Fix README after moving to gagneurlab

Ensure that all the badges work after moving it to Gagneurlab.

crash if run on large number of samples

We need to run around 400 samples (each is RNA-seq data from one of the patients) but the FRASER crashes. (It runs ok with lower number of samples - <25).
machine specifications:
CPUs: 8
memory: 32849380
storage: /home total: 11T used: 2,7T free: 8,3T
Programm is killed when running function countRNAData().
with the following error: Error in sendMaster(try(eval(expr, env), silent = TRUE), FALSE) :
ignoring SIGPIPE signal
Calls: countRNAData ... bploop.lapply -> .send_to -> .send_to -> -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE), FALSE) :
ignoring SIGPIPE signal
Calls: countRNAData ... bploop.lapply -> .send_to -> .send_to -> -> sendMaster

The parallelization should be turned on by the following code:
if(.Platform$OS.type == "unix") { register(MulticoreParam(workers=min(10, multicoreWorkers())))
} else { register(SnowParam(workers=min(10, multicoreWorkers()))) }

FRASER Resources and Run Time

Dear C. Mertes,

At the moment I'm running FRASER on 61 untreated PBMC samples and 60 treated PBMC samples. However we are running into some issues with HPC infrastructure. The HPC system has limited the compute time for all analysis to 70h, based on the current output of the FRASER model fit step, I'm afraid these 70h will be insufficient time to finish:

Wed Feb  1 12:08:32 2023: Fit step for: 'psi5'.
Wed Feb  1 12:08:34 2023: Running fit with correction method: PCA
Wed Feb  1 12:08:36 2023: Computing PCA ...
Wed Feb  1 12:10:44 2023: Fitting rho ...
Thu Feb  2 06:22:23 2023: rho fit:
             Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
        0.0000000 0.0000001 0.0000001 0.0039224 0.0000001 0.6091761
Thu Feb  2 06:27:56 2023: Compute p values for: 'psi5'.
Fri Feb  3 00:16:14 2023: Adjust p values for: 'psi5'.
Fri Feb  3 00:17:03 2023: Compute Z scores for: 'psi5'.

Fri Feb  3 00:21:16 2023: Fit step for: 'psi3'.
Fri Feb  3 00:21:28 2023: Running fit with correction method: PCA
Fri Feb  3 00:21:30 2023: Computing PCA ...
Fri Feb  3 00:23:18 2023: Fitting rho ...

Furthermore I previously provided FRASER with 90GB RAM for the analysis, which was unfortunately insufficient and I restarted the analysis with 200GB of RAM. Is this normal behaviour for FRASER with these sample amounts? Or can we speed up the model fitting process? Or can I fit each parameter separately in different jobsubmissions?

The script I currently use is the following:

############ NVQ_Run093-NVQ_Run642 Untreated Samples FRASER Analysis: Fit Model and Call Outliers ############

#### Load Required Libraries ####
library(FRASER)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

#### Define Global Parameters ####
WORK_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/work/'
OUT_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/output/'
register(MulticoreParam(workers = 10))

#### Read in FRASER Object ####
fds <- readRDS(paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Filtered_Optim_psi5_psi3_and_theta.rds', sep = ''))

#### Detection of Aberrant Splicing Events ####
# Fit the Splicing Model
fds <- FRASER(fds, q=c(psi5=5, psi3=5, theta=5))

# Annotate Introns
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
orgDb <- org.Hs.eg.db

fds <- annotateRangesWithTxDb(fds, txdb=txdb, orgDb=orgDb)

# Call Splicing Outliers
res <- results(fds, zScoreCutoff=NA, padjCutoff = 0.999999, deltaPsiCutoff=0.0000001)

#### Save Objects ####
# Processed FRASER Object
FDS_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Processed.rds', sep = '')

saveRDS(fds, FDS_File)

# Results Object
RES_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results.rds', sep = '')

saveRDS(res, RES_File)

#### Generate Results File for Each Sample ####
Samples <- unique(res$sampleID)

for (s in Samples){
  Results_Samples <- res[res$sampleID == s]
  
  write.xlsx(Results_Samples, paste(OUT_DIR, 'Untreated/','NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results_deltaPsi_0_padj_1_', s, '.xlsx', sep = ''))
}

Kind regards,
Laurenz De Cock

Remove group and condition from the code/vignette as it is not used within FRASER

As we do not currently use the group or condition column in colData we need to remove it from the package to not confuse users as in #26.

Could not find read type: jaccard

Hi,

I tried to run fds <- calculatePSIValues(fds, types="jaccard", BPPARAM=bpparam()) but keep getting the error:
Error in FUN(X[[i]], ...) : Could not find read type: jaccard Calls: calculatePSIValues -> unique -> vapply -> FUN

fds is created from:

    library(FRASER)
    
    annotation_dat <- data.table(sampleID="{sample_id}", bamFile="{sample_bam_path}", pairedEnd=TRUE)
    settings <- FraserDataSet(colData=annotation_dat, workingDir=".")
    
    fds <- countRNAData(settings)

I wonder if you could help look into this. Thanks!

Go back to raw data after using FRASER::results()

Hello,

I would like to see the raw data that was used to generate p-values and z-scores in the results() table. For example, fds_filtered@assays@data[["psi5"]], and see the values by sample.

However, I don't seem to be able to get back to it using the concatenation of seqnames : ranges. Are these coordinates changed somehow when running res() ?

Or is it again a problem of species?

Thanks in advance!

Unable to build vignette

I've tried repeatedly to build the vignette for this package with little success. These are the steps I took along with the output. Any idea why this might be happening?

Alternatively, would it be possible to also include an already built PDF copy of the vignette. The bits I am particularly interested in are the syntax nuances in using FRASER to get split and non-split counts

I'm doing this on Mac OS Sierra

R version 3.6.2
texinfo: stable 6.7 [keg-only]

Steps I took:

git cloned gagneurlab/FRASER repo
Installed devtools
Navigated to FRASER folder
R session involved & devtools::build(pkg='.', vignettes=TRUE)

> devtools::build(pkg='.', vignettes=TRUE)
✔  checking for file ‘/Users/himanshujoshi/PycharmProjects/FRASER/DESCRIPTION’ ...
─  preparing ‘FRASER’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  installing the package to build vignettes
E  creating vignettes (2m 19.9s)
   --- re-building ‘FraseR.Rnw’ using knitr
   Fri Jan 17 15:27:01 2020: Calculate the PSI 5 and 3 values ...
   Fri Jan 17 15:27:03 2020: Calculate the PSI site values ...
   Fri Jan 17 15:27:04 2020: Calculate the delta for psi5 values ...
   Fri Jan 17 15:27:04 2020: Calculate the delta for psi3 values ...
   Fri Jan 17 15:27:04 2020: Calculate the delta for psiSite values ...
   Fri Jan 17 15:27:05 2020: Writing final FraseR object ('/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Example_Dataset/fds-object.RDS').

   Fri Jan 17 15:27:05 2020: Fit step for: 'psi5'.
   Fri Jan 17 15:27:06 2020: Running fit with correction method: PCA
   Fri Jan 17 15:27:06 2020: Computing PCA ...
   Fri Jan 17 15:27:06 2020: Fitting rho ...
   Fri Jan 17 15:27:07 2020: Compute p values for: 'psi5'.
   Fri Jan 17 15:27:08 2020: Adjust p values for: 'psi5'.
   Fri Jan 17 15:27:08 2020: Compute Z scores for: 'psi5'.

   Fri Jan 17 15:27:08 2020: Fit step for: 'psi3'.
   Fri Jan 17 15:27:09 2020: Running fit with correction method: PCA
   Fri Jan 17 15:27:09 2020: Computing PCA ...
   Fri Jan 17 15:27:09 2020: Fitting rho ...
   Fri Jan 17 15:27:10 2020: Compute p values for: 'psi3'.
   Fri Jan 17 15:27:11 2020: Adjust p values for: 'psi3'.
   Fri Jan 17 15:27:11 2020: Compute Z scores for: 'psi3'.

   Fri Jan 17 15:27:12 2020: Fit step for: 'psiSite'.
   Fri Jan 17 15:27:12 2020: Running fit with correction method: PCA
   Fri Jan 17 15:27:12 2020: Computing PCA ...
   Fri Jan 17 15:27:12 2020: Fitting rho ...
   Fri Jan 17 15:27:13 2020: Compute p values for: 'psiSite'.
   Fri Jan 17 15:27:14 2020: Adjust p values for: 'psiSite'.
   Fri Jan 17 15:27:15 2020: Compute Z scores for: 'psiSite'.
   Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
   Loading required package: GenomicFeatures
   Loading required package: AnnotationDbi
   Loading required package: org.Hs.eg.db

   'select()' returned 1:1 mapping between keys and columns
   'select()' returned 1:1 mapping between keys and columns
   Fri Jan 17 15:27:18 2020: Collecting results for: psi5
   Fri Jan 17 15:27:18 2020: Process chunk: 1 for: psi5
   Fri Jan 17 15:27:18 2020: Collecting results for: psi3
   Fri Jan 17 15:27:18 2020: Process chunk: 1 for: psi3
   Fri Jan 17 15:27:18 2020: Collecting results for: psiSite
   Fri Jan 17 15:27:18 2020: Process chunk: 1 for: psiSite
   Warning in has_utility("convert", "ImageMagick") :
     ImageMagick not installed or not in PATH
   Fri Jan 17 15:27:22 2020: Start counting the split reads ...
   Fri Jan 17 15:27:22 2020: Count split reads for sample: sample1
   Warning in dir.create(cachedir, recursive = TRUE) :
     '/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/splitCounts' already exists
   Fri Jan 17 15:27:22 2020: Count split reads for sample: sample2
   Fri Jan 17 15:27:22 2020: Count split reads for sample: sample3
   Warning in dir.create(cachedir, recursive = TRUE) :
     '/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/splitCounts' already exists
   Fri Jan 17 15:27:33 2020 : count ranges need to be merged ...
   Fri Jan 17 15:27:35 2020: Create splice site indices ...
   Fri Jan 17 15:27:35 2020: Writing counts to file: /var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Data_Analysis/splitCounts.tsv.gz
   Fri Jan 17 15:27:35 2020: Create splice site indices ...
   Fri Jan 17 15:27:35 2020: Start counting the non spliced reads ...
   Fri Jan 17 15:27:35 2020: In total 60 splice junctions are found.
   Fri Jan 17 15:27:35 2020: In total 77 splice sites (acceptor/donor) will be counted ...
   Fri Jan 17 15:27:35 2020: Count non spliced reads for sample: sample1
   Warning in dir.create(cachedir, recursive = TRUE) :
     '/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/nonSplicedCounts/Data_Analysis' already exists

           ==========     _____ _    _ ____  _____  ______          _____
           =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
             =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
               ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
                 ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
           ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          Rsubread 2.0.0

   //========================== featureCounts setting ===========================\\
   ||                                                                            ||
   ||             Input files : 1 BAM file                                       ||
   ||                           o sample1.bam                                    ||
   ||                                                                            ||
   ||              Annotation : R data.frame                                     ||
   ||      Dir for temp files : /var/folders/gw/rccgk_r53w316ntmdmtv1hrw0001 ... ||
   ||                 Threads : 3                                                ||
   ||                   Level : meta-feature level                               ||
   ||              Paired-end : no                                               ||
   ||      Multimapping reads : counted                                          ||
   || Multi-overlapping reads : counted                                          ||
   ||   Min overlapping bases : 10                                               ||
   ||          Long read mode : yes                                              ||
   ||                                                                            ||
   \\============================================================================//

   //================================= Running ==================================\\
   ||                                                                            ||
   || Load annotation file .Rsubread_UserProvidedAnnotation_pid92802 ...         ||
   ||    Features : 77                                                           ||
   ||    Meta-features : 77                                                      ||
   ||    Chromosomes/contigs : 2                                                 ||
   ||                                                                            ||
   || Process BAM file sample1.bam...                                            ||
   ||    WARNING: Paired-end reads were found.                                   ||
   ||    Total alignments : 802                                                  ||
   ||    Successfully assigned alignments : 188 (23.4%)                          ||
   ||    Running time : 0.01 minutes                                             ||
   ||                                                                            ||
   || Write the final count table.                                               ||
   || Write the read assignment summary.                                         ||
   ||                                                                            ||
   \\============================================================================//

   Saving splice site cache ...
   Fri Jan 17 15:27:35 2020: Count non spliced reads for sample: sample3

           ==========     _____ _    _ ____  _____  ______          _____
           =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
             =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
               ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
                 ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
           ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          Rsubread 2.0.0

   //========================== featureCounts setting ===========================\\
   ||                                                                            ||
   ||             Input files : 1 BAM file                                       ||
   ||                           o sample3.bam                                    ||
   ||                                                                            ||
   ||              Annotation : R data.frame                                     ||
   ||      Dir for temp files : /var/folders/gw/rccgk_r53w316ntmdmtv1hrw0001 ... ||
   ||                 Threads : 3                                                ||
   ||                   Level : meta-feature level                               ||
   ||              Paired-end : no                                               ||
   ||      Multimapping reads : counted                                          ||
   || Multi-overlapping reads : counted                                          ||
   ||   Min overlapping bases : 10                                               ||
   ||          Long read mode : yes                                              ||
   ||                                                                            ||
   \\============================================================================//

   //================================= Running ==================================\\
   ||                                                                            ||
   || Load annotation file .Rsubread_UserProvidedAnnotation_pid92804 ...         ||
   ||    Features : 77                                                           ||
   ||    Meta-features : 77                                                      ||
   ||    Chromosomes/contigs : 2                                                 ||
   ||                                                                            ||
   || Process BAM file sample3.bam...                                            ||
   ||    WARNING: Paired-end reads were found.                                   ||
   ||    Total alignments : 3498                                                 ||
   ||    Successfully assigned alignments : 866 (24.8%)                          ||
   ||    Running time : 0.04 minutes                                             ||
   ||                                                                            ||
   || Write the final count table.                                               ||
   || Write the read assignment summary.                                         ||
   ||                                                                            ||
   \\============================================================================//

   Saving splice site cache ...
   Fri Jan 17 15:27:35 2020: Count non spliced reads for sample: sample2
   Warning in dir.create(cachedir, recursive = TRUE) :
     '/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/cache/nonSplicedCounts/Data_Analysis' already exists

           ==========     _____ _    _ ____  _____  ______          _____
           =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
             =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
               ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
                 ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
           ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          Rsubread 2.0.0

   //========================== featureCounts setting ===========================\\
   ||                                                                            ||
   ||             Input files : 1 BAM file                                       ||
   ||                           o sample2.bam                                    ||
   ||                                                                            ||
   ||              Annotation : R data.frame                                     ||
   ||      Dir for temp files : /var/folders/gw/rccgk_r53w316ntmdmtv1hrw0001 ... ||
   ||                 Threads : 3                                                ||
   ||                   Level : meta-feature level                               ||
   ||              Paired-end : no                                               ||
   ||      Multimapping reads : counted                                          ||
   || Multi-overlapping reads : counted                                          ||
   ||   Min overlapping bases : 10                                               ||
   ||          Long read mode : yes                                              ||
   ||                                                                            ||
   \\============================================================================//

   //================================= Running ==================================\\
   ||                                                                            ||
   || Load annotation file .Rsubread_UserProvidedAnnotation_pid92803 ...         ||
   ||    Features : 77                                                           ||
   ||    Meta-features : 77                                                      ||
   ||    Chromosomes/contigs : 2                                                 ||
   ||                                                                            ||
   || Process BAM file sample2.bam...                                            ||
   ||    WARNING: Paired-end reads were found.                                   ||
   ||    Total alignments : 4397                                                 ||
   ||    Successfully assigned alignments : 1011 (23.0%)                         ||
   ||    Running time : 0.05 minutes                                             ||
   ||                                                                            ||
   || Write the final count table.                                               ||
   || Write the read assignment summary.                                         ||
   ||                                                                            ||
   \\============================================================================//

   Saving splice site cache ...
   Fri Jan 17 15:27:38 2020 : Fast merging of counts ...
   Fri Jan 17 15:27:39 2020: Writing counts to file: /var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Data_Analysis/nonSplitCounts.tsv.gz
   Fri Jan 17 15:27:41 2020: Calculate the PSI 5 and 3 values ...
   Fri Jan 17 15:27:42 2020: Calculate the PSI site values ...
   Fri Jan 17 15:27:42 2020: Calculate the delta for psi5 values ...
   Fri Jan 17 15:27:43 2020: Calculate the delta for psi3 values ...
   Fri Jan 17 15:27:43 2020: Calculate the delta for psiSite values ...
Fri Jan 17 15:27:44 2020: Writing final FraseR object ('/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/sav   Fri Jan 17 15:27:44 2020: Writing final FraseR object ('/var/folders/gw/rccgk_r53w316ntmdmtv1hrw000157/T//RtmpaEz1GE/savedObjects/Data_Analysis/fds-object.RDS').
   Warning: Transformation introduced infinite values in continuous y-axis
   Warning: Removed 165 rows containing missing values (geom_bar).

   Fri Jan 17 15:27:46 2020: Fit step for: 'psi5'.
   Fri Jan 17 15:27:46 2020: Running fit with correction method: PCA
   Fri Jan 17 15:27:46 2020: Computing PCA ...
   Fri Jan 17 15:27:46 2020: Fitting rho ...
   Fri Jan 17 15:27:47 2020: Compute p values for: 'psi5'.
   Fri Jan 17 15:27:48 2020: Adjust p values for: 'psi5'.
   Fri Jan 17 15:27:48 2020: Compute Z scores for: 'psi5'.

   Fri Jan 17 15:27:48 2020: Fit step for: 'psi3'.
   Fri Jan 17 15:27:49 2020: Running fit with correction method: PCA
   Fri Jan 17 15:27:49 2020: Computing PCA ...
   Fri Jan 17 15:27:49 2020: Fitting rho ...
   Fri Jan 17 15:27:50 2020: Compute p values for: 'psi3'.
   Fri Jan 17 15:27:51 2020: Adjust p values for: 'psi3'.
   Fri Jan 17 15:27:51 2020: Compute Z scores for: 'psi3'.

   Fri Jan 17 15:27:52 2020: Fit step for: 'psiSite'.
   Fri Jan 17 15:27:52 2020: Running fit with correction method: PCA
   Fri Jan 17 15:27:52 2020: Computing PCA ...
   Fri Jan 17 15:27:52 2020: Fitting rho ...
   Fri Jan 17 15:27:53 2020: Compute p values for: 'psiSite'.
   Fri Jan 17 15:27:54 2020: Adjust p values for: 'psiSite'.
   Fri Jan 17 15:27:55 2020: Compute Z scores for: 'psiSite'.
   'select()' returned 1:1 mapping between keys and columns
   'select()' returned 1:1 mapping between keys and columns
   Fri Jan 17 15:27:58 2020: Collecting results for: psi5
   Fri Jan 17 15:27:58 2020: Process chunk: 1 for: psi5
   Fri Jan 17 15:27:58 2020: Collecting results for: psiSite
   Fri Jan 17 15:27:58 2020: Process chunk: 1 for: psiSite
   Fri Jan 17 15:27:58 2020: Collecting results for: psi3
   Fri Jan 17 15:27:58 2020: Process chunk: 1 for: psi3
   Fri Jan 17 15:27:59 2020: Collecting results for: psi3
   Fri Jan 17 15:28:00 2020: Process chunk: 1 for: psi3
   Fri Jan 17 15:27:59 2020: Collecting results for: psiSite
   Fri Jan 17 15:28:00 2020: Process chunk: 1 for: psiSite
   Fri Jan 17 15:27:59 2020: Collecting results for: psi5
   Fri Jan 17 15:28:00 2020: Process chunk: 1 for: psi5
   Fri Jan 17 15:28:03 2020: Running fit with correction method: PCA-BB-Decoder
   125
   67
   dPsi filter:FALSE: 121	TRUE: 2
   Exclusion matrix: TRUE: 123
   Fri Jan 17 15:28:12 2020: Injecting outliers: 0 / 0 (primary/secondary)
   Warning in optimHyperParams(fds, type = "psi5") :
     No outliers could be injected so the hyperparameter optimization could not run. Possible reason: too few junctions in the data.
   Warning in plotEncDimSearch(fds, type = "psi5") :
     no hyperparameters were estimated for psi5
   Please use `optimHyperParams` to compute them.
   Warning in switch(x, psi5 = c(bquote(psi[5])), psi3 = c(bquote(psi[3])),  :
     EXPR is a "factor", treated as integer.
    Consider using 'switch(as.character( * ), ...)' instead.
   Warning in switch(x, psi5 = c(bquote(psi[5])), psi3 = c(bquote(psi[3])),  :
     EXPR is a "factor", treated as integer.
    Consider using 'switch(as.character( * ), ...)' instead.
   Warning in switch(x, psi5 = c(bquote(psi[5])), psi3 = c(bquote(psi[3])),  :
     EXPR is a "factor", treated as integer.
    Consider using 'switch(as.character( * ), ...)' instead.
   Error: processing vignette 'FraseR.Rnw' failed with diagnostics:
   Running 'texi2dvi' on 'FraseR.tex' failed.
   --- failed re-building ‘FraseR.Rnw’

   SUMMARY: processing the following file failed:
     ‘FraseR.Rnw’

   Error: Vignette re-building failed.
   Execution halted
Error in (function (command = NULL, args = character(), error_on_status = TRUE,  :
  System command error

FRASER maxing out memory on sge cluster

Hello!

We are attempting to use FRASER on a cohort of 400 samples. We've been experiencing issues completing FRASER when sending the job to our sge queue, even when providing 1.5TB of memory. It seems the parallelization of the PSI calculation (fds <- calculatePSIValues(fds,BPPARAM=BPPARAM)) is causing the job to go over our h_vmem allocation. We've attempted to force FRASER to run in serial (BPPARAM=SerialParam()), but encounter the same issue maxing out of the memory.

Is it possible that FRASER is ignoring the serial setting?

Is there a quick fix for this? Or does a solution similar to the link below have to be implemented:
gagneurlab/OUTRIDER#11

Thank you for the tool, we've really enjoyed running it on some previous cohorts.

Error: checkIntergenic on unlocalized and unplaced sequences

When using comprehensive gene annotations, unlocalized and unplaced sequences result in an error when attempting to annotate potential impact of splice junctions with annotatePotentialImpact.

Debugging

The error arises since no genes in the txdb are located on unlocalized chr*_random and unplaced chrUn_* sequences.
This causes min(distance(test_junction, refseq.genes), na.rm = TRUE) to evaluate to Inf.
https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L817
distance(test_junction, refseq.genes) in turn evaluates to a vector of exclusively NA's.
min(c(NA, NA, ...), na.rm=TRUE) is equivalent to min(numeric(0)), which evaluates to Inf.
As Inf is greater than 0, the conditional dist > 0 evaluates to TRUE, whereas the code block under the if-statement is erroneously executed.
https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818-L830

Fix

This error can be resolved by modifying the condition of the if-statement from dist > 0 to is.finite(dist) && dist > 0.
https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818

Error

1. annotatePotentialImpact(result = res_junc_dt, txdb = txdb, fds = fds)
6. addPotentialImpactLabels(annoResult, fds, txdb)
8. sapply(noLaps, function(i) {
 .     overlap <- to(findOverlaps(junctions_gr[i], exons))
 .     if (length(overlap) == 0) {
 .         return(checkIntergenic(junctions_gr, i, refseq.genes))
 .     }
 .     for (j in overlap) {
 .         exon_start = start(exons[j])
 .         exon_end = end(exons[j])
 .         if (exon_start <= start(junctions_gr[i]) & exon_end >= 
 .             end(junctions_gr[i])) {
 .             if ((end(junctions_gr[i]) - start(junctions_gr[i]) + 
 .                 1)%%3 != 0) {
 .                 frs = "likely"
 .             }
 .             else {
 .                 frs = "unlikely"
 .             }
 .             return(c("exonTruncation", frs))
 .         }
 .     }
 .     return(c("complex", "inconclusive"))
 . })
9. lapply(X = X, FUN = FUN, ...)
10. FUN(X[[i]], ...)
11. checkIntergenic(junctions_gr, i, refseq.genes)
12. start(refseq.genes[nearest(junctions_gr[i], refseq.genes)])
13. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
14. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
15. subset_along_ROWS(x, i, drop = drop)
16. extractROWS(x, i)
17. extractROWS(x, i)
18. normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
19. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
20. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
21. .subscript_error("subscript contains NAs")
22. stop(wmsg(...), call. = FALSE)
23. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "subscript contains NAs", 
  .     base::quote(NULL))
24. h(simpleError(msg, call))

c-mertes / fraser Goto Github PK

fraser's Introduction

FRASER - Find RAre Splicing Events in RNA-seq

What's new

Installation

Toubleshooting

Error in UseMethod("filter_")

Missing libraries while compiling R packages

fraser's People

Contributors

Stargazers

Watchers

Forkers

fraser's Issues

A tibble: 3 x 5

Debugging

Fix

Error

Recommend Projects

Recommend Topics

Recommend Org