shortread's People

Contributors

dtenenba, henrikbengtsson, hpages, jmacdon, jwokaty, kasperdanielhansen, kayla-morrell, mtmorgan, nturaga, rohit-satyam, sonali-bioc, vobencha

shortread's Issues

Error in example: invalid 'nrow' value (too large or NA)

This is Bioconductor ShortRead 1.56.1:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)

. . .

> ### Name: Utilites
> ### Title: Utilities for common, simple operations
> ### Aliases: polyn
> ### Keywords: manip
> 
> ### ** Examples
> 
> polyn(c("A", "N"), 35)
                                    A                                     N 
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" 
> 
> 
> 
> cleanEx()
> nameEx("qa")
> ### * qa
> 
> flush(stderr()); flush(stdout())
> 
> ### Name: qa
> ### Title: Perform quality assessment on short reads
> ### Aliases: qa qa,character-method qa,list-method
> ### Keywords: manip
> 
> ### ** Examples
> 
> dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
> ## sample 1M reads / file
> qa <- qa(dirPath, "fastq.gz", BPPARAM=SerialParam())
> if (interactive())
+     browseURL(report(qa))
> 
> showMethods("qa", where=getNamespace("ShortRead"))
Function: qa (package ShortRead)
dirPath="ShortReadQ"
dirPath="SolexaPath"
dirPath="character"
dirPath="list"

> 
> 
> 
> cleanEx()
> nameEx("qa2")
> ### * qa2
> 
> flush(stderr()); flush(stdout())
> 
> ### Name: qa2
> ### Title: (Updated) quality assessment reports on short reads
> ### Aliases: QAFastqSource QACollate QA QAFlagged QAFiltered
> ###   QAAdapterContamination QAData QAFrequentSequence QANucleotideByCycle
> ###   QANucleotideUse QAQualityByCycle QAQualityUse QAReadQuality
> ###   QASequenceUse QACollate,QAFastqSource-method QACollate,missing-method
> ###   qa2 qa2,FastqSampler-method qa2,QAAdapterContamination-method
> ###   qa2,QACollate-method qa2,QAFastqSource-method
> ###   qa2,QAFrequentSequence-method qa2,QANucleotideByCycle-method
> ###   qa2,QANucleotideUse-method qa2,QAQualityByCycle-method
> ###   qa2,QAQualityUse-method qa2,QAReadQuality-method
> ###   qa2,QASequenceUse-method flag flag,.QA2-method
> ###   flag,QAFrequentSequence-method flag,QAReadQuality-method
> ###   flag,QASource-method report,QA-method
> ###   report,QAAdapterContamination-method report,QAFiltered-method
> ###   report,QAFlagged-method report,QAFrequentSequence-method
> ###   report,QANucleotideByCycle-method report,QANucleotideUse-method
> ###   report,QAQualityByCycle-method report,QAQualityUse-method
> ###   report,QAReadQuality-method report,QASequenceUse-method
> ###   report,QASource-method rbind,QASummary-method
> ###   show,QAAdapterContamination-method show,QACollate-method
> ###   show,QAFastqSource-method show,QAFrequentSequence-method
> ###   show,QAReadQuality-method show,QASummary-method
> ### Keywords: manip
> 
> ### ** Examples
> 
> dirPath <- system.file(package="ShortRead", "extdata", "E-MTAB-1147")
> fls <- dir(dirPath, "fastq.gz", full=TRUE)
> 
> coll <- QACollate(QAFastqSource(fls), QAReadQuality(),
+     QAAdapterContamination(), QANucleotideUse(),
+     QAQualityUse(), QASequenceUse(),
+     QAFrequentSequence(n=10), QANucleotideByCycle(),
+     QAQualityByCycle())
> x <- qa2(coll, BPPARAM=SerialParam(), verbose=TRUE)
qa2,QACollate-method
qa2,QACollate1-method
qa2,QAFastqSource-method
qa2,FastqSampler-method
qa2,QAReadQuality-method
qa2,QAAdapterContamination-method
qa2,QANucleotideUse-method
qa2,QAQualityUse-method
qa2,QASequenceUse-method
qa2,QAFrequentSequence-method
qa2,QANucleotideByCycle-method
qa2,QAQualityByCycle-method
qa2,QACollate1-method
qa2,QAFastqSource-method
qa2,FastqSampler-method
qa2,QAReadQuality-method
qa2,QAAdapterContamination-method
qa2,QANucleotideUse-method
qa2,QAQualityUse-method
qa2,QASequenceUse-method
qa2,QAFrequentSequence-method
qa2,QANucleotideByCycle-method
qa2,QAQualityByCycle-method
flag,QASource-method
flag,QAReadQuality-method
flag,ANY-method
flag,ANY-method
flag,ANY-method
flag,ANY-method
flag,ANY-method
flag,ANY-method
> 
> res <- report(x)
Error in matrix(0, rows.per.page, cols.per.page) : 
  invalid 'nrow' value (too large or NA)
Calls: report ... .html_img -> print -> print.trellis -> printFunction -> matrix
Execution halted

Tests pass fine, only one example fails.
ShortRead_Ex_out.txt

FastqSampler/FastqStreamer: Seeking clarification for OpenMP threads issues

In ?FastqStreamer, there's:

\note{
\code{FastqSampler} and \code{FastqStreamer} use OpenMP threads (when
available) during creation of the return value. This may sometimes
create problems when a process is already running on multiple threads,
e.g., with an error message like \preformatted{
libgomp: Thread creation failed: Resource temporarily unavailable
} A solution is to precede problematic code with the following code
snippet, to disable threading \preformatted{
nthreads <- .Call(ShortRead:::.set_omp_threads, 1L)
on.exit(.Call(ShortRead:::.set_omp_threads, nthreads))
}
}

I'm trying to figure out when the error happens. What exactly does "running on multiple threads" mean in "... create problems when a process is already running on multiple threads ..."? Is the term "thread" used here in the classical sense, where a process can have multiple threads? Could it be that this happens in a forked parallel worker, e.g. with parallel::mclapply() or BiocParallel::MulticoreParam()? That is, is this OpenMP implementation, which is itself multi-threaded, not fork safe, while everything works fine if one parallelizes in separate processes, e.g. a PSOCK cluster?

Was a reproducible example identified? If so, that could be a great example of illustrating issues with OpenMP and R parallelization. I'm looking for such examples.
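
The snippet quoted from the help page relies on on.exit(), which only takes effect inside a function. A minimal sketch of a wrapper that applies the same idea follows. The names with_one_thread and set_threads are hypothetical, and the commented usage assumes that .set_omp_threads returns the previous thread count, as the quoted snippet implies.

```r
# Run `expr` with threading temporarily limited to one thread.
# `set_threads` is a hypothetical argument: a function that sets the
# thread count and returns the previous value.
with_one_thread <- function(expr, set_threads) {
  old <- set_threads(1L)
  on.exit(set_threads(old), add = TRUE)  # restore even if `expr` fails
  expr  # lazily evaluated here, i.e. after threading was limited
}

# With ShortRead installed, the setter from the note could be passed as:
# omp_setter <- function(n) .Call(ShortRead:::.set_omp_threads, n)
# f <- with_one_thread(FastqStreamer(fl, 50), omp_setter)
```

Because R evaluates arguments lazily, the call passed as expr runs only after the thread count has been lowered, and on.exit() restores the previous value when the wrapper returns.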

Contribution of Sweave2Rmd

Hi,
I'm volunteering with Sweave2Rmd,
and we're trying to replace all Bioconductor Sweave vignettes with R Markdown
vignettes for the second contribution in Outreachy.
Thanks for your time.

Can't read FASTA file with a long line

Hello,

while trying to import sequences from a .fasta file, I encounter the following error:
cannot read line 14348, line is too long (and indeed it is long: 25157 characters)

I encounter the same error when using the Biostrings package.
Do you have any suggestions on how I could resolve this issue?

The error occurs in the getSequences function from dada2 that is using ShortRead to read .fasta for taxonomic assignment.

Here is my sessioninfo:
image

Let me know if you need anything more to address this issue.

Thanks for your work and have a nice day!
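
One possible workaround, sketched here under the assumption that the long line is a sequence line rather than a header, is to rewrap the file to a fixed line width with base R before handing it to the reader. FASTA allows a sequence to span several lines. The function name wrap_fasta and the width of 80 are illustrative choices, not part of ShortRead or Biostrings.

```r
# Rewrap over-long sequence lines of a FASTA file to at most `width`
# characters. Header lines (starting with ">") are copied unchanged.
wrap_fasta <- function(infile, outfile, width = 80L) {
  lines <- readLines(infile)
  wrapped <- unlist(lapply(lines, function(ln) {
    if (startsWith(ln, ">") || nchar(ln) <= width) return(ln)
    starts <- seq(1L, nchar(ln), by = width)
    substring(ln, starts, pmin(starts + width - 1L, nchar(ln)))
  }), use.names = FALSE)
  writeLines(wrapped, outfile)
  invisible(outfile)
}

# Usage (file names are placeholders):
# wrap_fasta("input.fasta", "input_wrapped.fasta")
```

The rewrapped file contains the same records. Whether the downstream reader then accepts it depends on the line-length limit really being the cause, as the error message suggests.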

ShortRead can segfault

Issue

ShortRead may segfault. See below for a minimal reproducible example. This may for instance happen when someone tries to parallelize using some of the ShortRead helper objects, which then often results in a "silent" crash with little information to go by.

Wish

I understand why the following is not meant to work, but I'd argue an R package should never be able to core dump R. Instead, it should detect the problem and produce an informative run-time error.

Reproducible example

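library(ShortRead)
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")
f <- FastqStreamer(fl, 50)
print(f)
## Assumed missing step: "f.rds" must be written before it is read back below
saveRDS(f, "f.rds")
close(f)
## The second library() call suggests the rest runs in a fresh R session
library(ShortRead)
f2 <- readRDS("f.rds")
print(f2)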

results in a segfault:

class: FastqStreamer 
file: closed 

 *** caught segfault ***
address 0x8, cause 'memory not mapped'

Traceback:
 1: .Call(.streamer_status, sampler)
 2: .self$status()
 3: object$show()
 4: (new("standardGeneric", .Data = function (object) standardGeneric("show"), generic = "show", package = "methods",     group = list(), valueClass = character(0), signature = "object",     default = new("derivedDefaultMethod", .Data = function (object)     showDefault(object), target = new("signature", .Data = "ANY",         names = "object", package = "methods"), defined = new("signature",         .Data = "ANY", names = "object", package = "methods"),         generic = "show"), skeleton = (new("derivedDefaultMethod",         .Data = function (object)         showDefault(object), target = new("signature", .Data = "ANY",             names = "object", package = "methods"), defined = new("signature",             .Data = "ANY", names = "object", package = "methods"),         generic = "show"))(object)))(new("FastqStreamer", .xData = <environment>))
 5: (new("standardGeneric", .Data = function (object) standardGeneric("show"), generic = "show", package = "methods",     group = list(), valueClass = character(0), signature = "object",     default = new("derivedDefaultMethod", .Data = function (object)     showDefault(object), target = new("signature", .Data = "ANY",         names = "object", package = "methods"), defined = new("signature",         .Data = "ANY", names = "object", package = "methods"),         generic = "show"), skeleton = (new("derivedDefaultMethod",         .Data = function (object)         showDefault(object), target = new("signature", .Data = "ANY",             names = "object", package = "methods"), defined = new("signature",             .Data = "ANY", names = "object", package = "methods"),         generic = "show"))(object)))(new("FastqStreamer", .xData = <environment>))
 6: print.default(f2)
 7: print(f2)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

Session info

> sessionInfo()
R version 4.1.0 RC (2021-05-10 r80282)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-4-1-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-4-1-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] ShortRead_1.49.2            GenomicAlignments_1.27.2   
 [3] SummarizedExperiment_1.21.3 Biobase_2.51.0             
 [5] MatrixGenerics_1.3.1        matrixStats_0.58.0         
 [7] Rsamtools_2.7.2             GenomicRanges_1.43.4       
 [9] Biostrings_2.59.4           GenomeInfoDb_1.27.13       
[11] XVector_0.31.1              IRanges_2.25.11            
[13] S4Vectors_0.29.19           BiocParallel_1.25.5        
[15] BiocGenerics_0.37.6        

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13        zlibbioc_1.37.0        lattice_0.20-44       
 [4] jpeg_0.1-8.1           hwriter_1.3.2          tools_4.1.0           
 [7] grid_4.1.0             png_0.1-7              latticeExtra_0.6-29   
[10] crayon_1.4.1           Matrix_1.3-3           GenomeInfoDbData_1.2.6
[13] RColorBrewer_1.1-2     bitops_1.0-7           RCurl_1.98-1.3        
[16] DelayedArray_0.17.13   compiler_4.1.0 

ShortRead/libs/ShortRead.so: undefined symbol: bioc_gzread

When installing ShortRead with BiocManager::install() under R 4.2.1, the package fails to install. Details are as follows:
`installing to /home/xinzhepang/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-ShortRead/00new/ShortRead/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘ShortRead’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/xxx/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-ShortRead/00new/ShortRead/libs/ShortRead.so':
/home/xxx/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-ShortRead/00new/ShortRead/libs/ShortRead.so: undefined symbol: bioc_gzread
Error: loading failed
Execution halted
ERROR: loading failed

  • removing ‘/home/xxx/R/x86_64-pc-linux-gnu-library/4.2/ShortRead’

The downloaded source packages are in
‘/tmp/RtmpteFAiR/downloaded_packages’
Installation paths not writeable, unable to update packages
path: /usr/lib/R/library
packages:
foreign, MASS, mgcv, rpart, spatial
Warning message:
In install.packages(...) :
installation of package ‘ShortRead’ had non-zero exit status`

ShortRead fails test test_readPrb.R for non-Intel architectures

[the below was sent to Bioconductor Package Maintainer [email protected] by @tillea but I didn't see any response, so I thought I'd try here]

Hi,

the Debian-packaged ShortRead is run through CI tests on different hardware
architectures. The debci page [1] shows the matrix of successes and
failures: amd64 and i386 pass the complete test suite, while other
architectures fail. In the failing cases,
the test log says something like

Timing stopped at: 1.012 0.02 1.031
Error in checkIdentical(exp, as(readPrb(sp, ".*prb.txt", as = "FastqEncoding"),  : 
  FALSE 
 
In addition: Warning messages:
1: In .class1(object) :
  closing unused connection 6 (/usr/lib/R/site-library/ShortRead/extdata/Data/C1-36Firecrest/Bustard/GERALD/s_1_sequence.txt)
2: In .class1(object) :
  closing unused connection 5 (/usr/lib/R/site-library/ShortRead/extdata/Data/C1-36Firecrest/Bustard/GERALD/s_1_sequence.txt)
Timing stopped at: 0.028 0.001 0.031
Error in checkIdentical(36L, unique(width(obj))) : FALSE 
 


RUNIT TEST PROTOCOL -- Sun Jan 24 21:54:15 2021 
*********************************************** 
Number of test functions: 104 
Number of errors: 0 
Number of failures: 2 

 
1 Test Suite : 
ShortRead RUnit Tests - 104 test functions, 0 errors, 2 failures
FAILURE in test_readPrb_consistent: Error in checkIdentical(exp, as(readPrb(sp, ".*prb.txt", as = "FastqEncoding"),  : 
  FALSE 
 
FAILURE in test_readPrb_input: Error in checkIdentical(36L, unique(width(obj))) : FALSE 
 

Test files with failing tests

   test_readPrb.R 
     test_readPrb_consistent 
     test_readPrb_input 


Error in BiocGenerics:::testPackage("ShortRead") : 
  unit tests failed for package ShortRead
In addition: Warning message:
In is.factor(x) :
  closing unused connection 3 (/tmp/RtmpEinleR/file1d7b55bbcf1b)
Execution halted

When I remove the file test_readPrb.R from the installation, the tests
pass, but before I do such nasty things I'd like to clarify the
situation with you.

Kind regards

  Andreas.

[1] https://ci.debian.net/packages/r/r-bioc-shortread/

countFastq can overflow

countFastq can overflow and return a negative number of bases.

It looks like the count_records C function returns results as 32-bit integers. It may be better to return them as doubles, which can represent integers exactly up to 2^53.

Example:

n <- 3000000 #Reads
m <- 1000    #Read length
dna <- paste(rep("A",m),collapse="")
qual <- paste(rep("J",m),collapse="")
sink("example.fastq")
for(i in 1:n) 
    cat(paste0("@read",i,"\n",dna,"\n+\n",qual,"\n"))
sink()

library(ShortRead)
countFastq("example.fastq")

#Result:
#              records nucleotides      scores
# example.fastq   3e+06 -1294967296 -1294967296
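
The negative value is what signed 32-bit wraparound predicts for 3e6 reads of 1000 bases each. A small base-R sketch of the arithmetic follows. The helper wrap_int32 is hypothetical and only mimics two's-complement wraparound; it is not ShortRead code.

```r
# Mimic signed 32-bit wraparound using doubles, which hold these
# values exactly.
wrap_int32 <- function(x) {
  y <- x %% 2^32                   # reduce modulo 2^32
  ifelse(y >= 2^31, y - 2^32, y)   # map the upper half to negative values
}

total <- 3e6 * 1000                # true number of nucleotides: 3e9
total > .Machine$integer.max       # TRUE: does not fit a 32-bit integer
wrap_int32(total)                  # -1294967296, the value reported above
```

A double stores 3e9 without loss, which is why returning the counts as doubles would avoid the overflow for files of this size.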

P.S. This is a very handy function, thank you! Much better than needing a command line tool to get fastq statistics.

Converting from a ShortRead object to a QualityScaledDNAStringSet discards names

If we have a look at this example:

library(ShortRead)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")
f <- FastqSampler(fl, 50)
out <- yield(f)
out
## class: ShortReadQ
## length: 50 reads; width: 36 cycles

This works:

as(out, "QualityScaledDNAStringSet")
##   A QualityScaledDNAStringSet instance containing:
## 
## DNAStringSet object of length 50:
##      width seq
##  [1]    36 GGTAAAGGACTTCTTGACGGTACGTTGCATGCTTGG
##  [2]    36 GCAAGCTGCTTATGCTAATTTGCATACTGACCAAGA
##  [3]    36 GACATTATGGGTCTGCAAGCTGCTTATGCTACTTTG
##  [4]    36 GTTACCATGATGTTATTTCTTCATTTGGAGGTAAAA
##  [5]    36 GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT
##  ...   ... ...
## [46]    36 GTTAATGCTGGTAATGGTGGTTTTCTTCTTTTCCTT
## [47]    36 GGGGGAGCACATTGTAGCATTGTGCCAATTCATCCA
## [48]    36 GCTTATCACCTTCTTGAAGGCTTCCCATTCATTCAG
## [49]    36 GGGTGATAAGCAGGAGAAACATACGAAGGCGCATAA
## [50]    36 GTTTTCATGCCTCCCAATCTTGGAGGCTTTTTTATG
## 
## SolexaQuality object of length 50:
##      width seq
##  [1]    36 ]]]]]]]]]]T]]]]]RC]Y]R]]]]WPCTVCQCMA
##  [2]    36 ]]]]]]]]]]]]]]]]]V]]]]]O]]]]MQZXEAOK
##  [3]    36 ]]]]]]]]]]]]]]]]YR]]]]]]]T]VWZSEVSSJ
##  [4]    36 ]]]]]]]]]]]]]]]]]]]]]]T]]]]RJRZTQLOA
##  [5]    36 ]]]]]]]]]]]]]]]]]T]]]]]]]]]]MJUJVLSS
##  ...   ... ...
## [46]    36 ]]]]]]]]]]]Y]]]]]]]]J]]]OR]MCUZEKAOO
## [47]    36 ]]]]]Y]]Y]Y]Y]HTRVVT]MRY]VCEVVZJQKHF
## [48]    36 ]]]]]]]]]]]]]]]]]VYR]]]]]]O][ZSXVSHJ
## [49]    36 ]]]]]]]]]]]Y]]V]YVTYHYJ]VJTVWPZOHJHF
## [50]    36 ]]]]]]]]]]]]]]HT]]]]]]]C]]V][ZZXVASS

But it doesn't carry over the names of each sequence from id(out), which makes it difficult to write the result back to FASTQ.

It would also be nice if as(out, "DNAStringSet") worked; it currently gives:

## Error in h(simpleError(msg, call)) :
##   error in evaluating the argument 'x' in selecting a method for function 'XStringSet': no method for coercing this S4 class to a vector
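
Until the coercions behave as requested, a possible workaround is to copy the ids over by hand. The sketch below uses the ShortRead accessors id() and sread(). The helper names as_named_qsdss and as_named_dna are hypothetical, and reads is assumed to be a ShortReadQ object such as out above.

```r
# Hypothetical helpers (names are illustrative, not part of ShortRead).
# `reads` is assumed to be a ShortReadQ object with ShortRead loaded.
as_named_qsdss <- function(reads) {
  qs <- methods::as(reads, "QualityScaledDNAStringSet")
  names(qs) <- as.character(ShortRead::id(reads))   # read ids become names
  qs
}

as_named_dna <- function(reads) {
  dna <- ShortRead::sread(reads)                    # DNAStringSet of the reads
  names(dna) <- as.character(ShortRead::id(reads))
  dna
}

# Usage, continuing the example above:
# names(as_named_dna(out))[1:3]
```

Both helpers only attach names after the fact; they do not change how as() itself dispatches.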
