biocsingular's People

Contributors

jwokaty, ltla, nturaga

Forkers

mimi3421

biocsingular's Issues

Feature request: Generalize `runExactSVD`, `runIrlbaSVD` ... and other SVD functions, make methods depend on the matrix type

I'm implementing the BPCells backend (Bioconductor/DelayedArray#110). BPCells provides a method to run SVD in C++, so it would be awesome if runSVD could dispatch methods depending on the underlying matrix class.

#' @export
#' @import methods
methods::setClass("SpectraParam",
    contains = "BiocSingularParam",
    slots = c(deferred = "logical", fold = "numeric")
)

#' @export
SpectraParam <- function() {
    methods::new("SpectraParam", deferred = FALSE, fold = Inf)
}

#' @export
#' @importMethodsFrom BiocSingular runSVD
methods::setMethod(
    "runSVD", "SpectraParam",
    function(x, k, nu = k, nv = k, center = FALSE, scale = FALSE, ncv = NULL,
             tol = 1e-10, maxitr = 1000, threads = 0L, ..., BSPARAM) {
        svds(
            x = x, k = k, nu = nu, nv = nv,
            center = center, scale = scale, ncv = ncv,
            tol = tol, maxitr = maxitr,
            threads = threads
        )
    }
)

methods::setGeneric("svds", function(x, ...) standardGeneric("svds"))
methods::setMethod(
    "svds", "ANY",
    function(x, k, nu, nv, center, scale, ncv, tol, maxitr, threads) {
        ncv <- ncv %||% min(min(nrow(x), ncol(x)), max(2 * k + 1, 20))
        out <- RSpectra::svds(
            A = x, k = k, nu = nu, nv = nv,
            opts = list(
                ncv = ncv, tol = tol, maxitr = maxitr,
                center = center, scale = scale
            )
        )
        out[c("d", "u", "v")]
    }
)

methods::setMethod(
    "svds", "IterableMatrix",
    function(x, k, nu, nv, center, scale, ncv, tol, maxitr, threads) {
        ncv <- ncv %||% min(min(nrow(x), ncol(x)), max(2 * k + 1, 20))
        out <- BPCells::svds(
            A = x, k = k, nu = nu, nv = nv,
            opts = list(ncv = ncv, tol = tol, maxitr = maxitr),
            threads = threads
        )
        out[c("d", "u", "v")]
    }
)

runSVD with RandomParam() returns inverted values

I was trying to use runSVD() with RandomParam() on a very large dataset stored in an HDF5Array. Before that, I ran some tests to see how the values differ from those of base::svd(), but it turns out that every time I use RandomParam(), the first columns of $u and $v come back with their signs flipped. I don't know if this is intended.

> library(BiocSingular)
> set.seed(123)
> m <- matrix(sample.int(10, 25, T), 10, 10)
> m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    3    5    9    5    3    3    5    9    5     3
 [2,]    3    3    3    4    8    3    3    3    4     8
 [3,]   10    9    4    6   10   10    9    4    6    10
 [4,]    2    9    1    9    7    2    9    1    9     7
 [5,]    6    9    7   10   10    6    9    7   10    10
 [6,]    5    3    3    5    9    5    3    3    5     9
 [7,]    4    8    3    3    3    4    8    3    3     3
 [8,]    6   10   10    9    4    6   10   10    9     4
 [9,]    9    7    2    9    1    9    7    2    9     1
[10,]   10   10    6    9    7   10   10    6    9     7
> gSetIdx <- 1:2

> x1 <- svd(m[gSetIdx, ])
> x2 <- runSVD(m[gSetIdx, ], k=2)
> x3 <- runSVD(m[gSetIdx, ], k=2, BSPARAM=RandomParam())

These are the results I get:

> x1
$d
[1] 21.227029  7.836661

$u
           [,1]       [,2]
[1,] -0.7796929 -0.6261621
[2,] -0.6261621  0.7796929

$v
            [,1]         [,2]
 [1,] -0.1986884  0.058774061
 [2,] -0.2721507 -0.101029225
 [3,] -0.4190753 -0.420635796
 [4,] -0.3016490 -0.001536228
 [5,] -0.3461801  0.556239043
 [6,] -0.1986884  0.058774061
 [7,] -0.2721507 -0.101029225
 [8,] -0.4190753 -0.420635796
 [9,] -0.3016490 -0.001536228
[10,] -0.3461801  0.556239043

> x2
$d
[1] 21.227029  7.836661

$u
           [,1]       [,2]
[1,] -0.7796929 -0.6261621
[2,] -0.6261621  0.7796929

$v
            [,1]         [,2]
 [1,] -0.1986884  0.058774061
 [2,] -0.2721507 -0.101029225
 [3,] -0.4190753 -0.420635796
 [4,] -0.3016490 -0.001536228
 [5,] -0.3461801  0.556239043
 [6,] -0.1986884  0.058774061
 [7,] -0.2721507 -0.101029225
 [8,] -0.4190753 -0.420635796
 [9,] -0.3016490 -0.001536228
[10,] -0.3461801  0.556239043

> x3
$d
[1] 21.227029  7.836661

$u
          [,1]       [,2]
[1,] 0.7796929 -0.6261621
[2,] 0.6261621  0.7796929

$v
           [,1]         [,2]
 [1,] 0.1986884  0.058774061
 [2,] 0.2721507 -0.101029225
 [3,] 0.4190753 -0.420635796
 [4,] 0.3016490 -0.001536228
 [5,] 0.3461801  0.556239043
 [6,] 0.1986884  0.058774061
 [7,] 0.2721507 -0.101029225
 [8,] 0.4190753 -0.420635796
 [9,] 0.3016490 -0.001536228
[10,] 0.3461801  0.556239043

The values of x3$u[,1] and x3$v[,1] have their signs flipped relative to x1 and x2.
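
For context, a singular value decomposition is only unique up to the sign of each pair of singular vectors: negating column i of both u and v leaves u %*% diag(d) %*% t(v) unchanged. The base-R sketch below uses a smaller matrix than the one above, and `rebuild` and `align_signs` are hypothetical helper names, not BiocSingular functions. It checks the sign property and shows one possible way to align signs before comparing two decompositions.

```r
# Base R only. Flip the first left and right singular vectors together.
set.seed(123)
m <- matrix(sample.int(10, 20, replace = TRUE), nrow = 2)
s <- svd(m)

flipped <- s
flipped$u[, 1] <- -s$u[, 1]
flipped$v[, 1] <- -s$v[, 1]

# Hypothetical helper: reconstruct the matrix from a list with d, u and v.
rebuild <- function(x) x$u %*% diag(x$d) %*% t(x$v)

# Both decompositions describe the same matrix.
all.equal(rebuild(s), rebuild(flipped))

# Hypothetical helper: flip columns of `other` so they point the same way
# as the matching columns of `ref`.
align_signs <- function(ref, other) {
    sgn <- sign(colSums(ref$u * other$u))
    sgn[sgn == 0] <- 1
    other$u <- sweep(other$u, 2, sgn, `*`)
    other$v <- sweep(other$v, 2, sgn, `*`)
    other
}

aligned <- align_signs(s, flipped)
all.equal(aligned$u, s$u)
```

Comparing absolute values, or aligning signs as above, is usually enough when checking one SVD implementation against another.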

> sessionInfo()
R Under development (unstable) (2020-10-29 r79387)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /home/bort/R-devel/lib/libRblas.so
LAPACK: /home/bort/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocSingular_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           rsvd_1.0.3           lattice_0.20-41     
 [4] matrixStats_0.57.0   IRanges_2.25.6       grid_4.1.0          
 [7] stats4_4.1.0         irlba_2.3.3          S4Vectors_0.29.6    
[10] Matrix_1.3-0         BiocParallel_1.25.2  beachmat_2.7.5      
[13] DelayedArray_0.17.7  MatrixGenerics_1.3.0 parallel_4.1.0      
[16] compiler_4.1.0       BiocGenerics_0.37.0

replacing the interface of irlbaSVD with CppIrlba `request`

Thanks for developing the infrastructure (BiocSingular and CppIrlba); both are important for omics and other data analysis. However, runIrlbaSVD uses irlba, which is slow for big data. Since CppIrlba implements the same algorithm with Eigen, could it be used in BiocSingular?

Installation error

Hello there BiocSingular Team!

I'm trying to install batchelor. However, when I run devtools::install_github("LTLA/batchelor"), I get the following error output:


Downloading GitHub repo LTLA/batchelor@master
Skipping 2 packages not available: BiocNeighbors, BiocSingular
Skipping 11 packages ahead of CRAN: beachmat, BiocGenerics, BiocParallel, DelayedArray, gtable, HDF5Array, IRanges, rhdf5, Rhdf5lib, rlang, S4Vectors
Installing 17 packages: Biobase, BiocSingular, DelayedMatrixStats, edgeR, GenomeInfoDb, GenomeInfoDbData, GenomicRanges, limma, locfit, rjson, scater, shinydashboard, SingleCellExperiment, SummarizedExperiment, tximport, XVector, zlibbioc
Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Error: (converted from warning) package ‘BiocSingular’ is not available (for R version 3.5.3)

I then tried to install BiocSingular directly. However, I get the following error:

In file included from /usr/local/lib/R/site-library/beachmat/include/beachmat/all_readers.h:4:0,
                 from /usr/local/lib/R/site-library/beachmat/include/beachmat/LIN_matrix.h:4,
                 from /usr/local/lib/R/site-library/beachmat/include/beachmat/numeric_matrix.h:4,
                 from compute_scale.cpp:2:
/usr/local/lib/R/site-library/beachmat/include/beachmat/beachmat.h:15:19: fatal error: H5Cpp.h: No such file or directory
 #include "H5Cpp.h"
                   ^
compilation terminated.
/usr/local/lib/R/etc/Makeconf:172: recipe for target 'compute_scale.o' failed
make: *** [compute_scale.o] Error 1
ERROR: compilation failed for package ‘BiocSingular’
* removing ‘/usr/local/lib/R/site-library/BiocSingular’
Error in i.p(...) : 
  (converted from warning) installation of package ‘/tmp/RtmpGJQkfa/file1977a4d39ec/BiocSingular_0.99.14.tar.gz’ had non-zero exit status

Does anyone know a way to solve this? Any help will be really appreciated.

Thanks in advance!

Davi

Installation

When I tried to install this package, I got the following error:

> BiocManager::install("LTLA/BiocSingular")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.1 (2018-07-02)
Installing github package(s) 'LTLA/BiocSingular'
Downloading GitHub repo LTLA/BiocSingular@master
Installing 1 packages: rsvd
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rsvd_1.0.0.tgz'
Content type 'application/x-gzip' length 6130139 bytes (5.8 MB)
==================================================
downloaded 5.8 MB


The downloaded binary packages are in
	/var/folders/11/zcb2x4g97c92gr68fsd15k6c0000gn/T//RtmpHcxZv1/downloaded_packages
checking for file /private/var/folders/11/zcb2x4g97c92gr68fsd15k6c0000gn/T/RtmpHcxZv1/remotesa673acb6ec9/LTLA-BiocSingular-cfaba17/DESCRIPTION ...
preparing BiocSingular:
checking DESCRIPTION meta-information ...
cleaning src
checking for LF line-endings in source and make files and shell scripts
checking for empty or unneeded directories
building BiocSingular_0.99.12.tar.gz
* installing *source* package BiocSingular ...
** libs
clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/beachmat/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rhdf5lib/include" -I/usr/local/include   -fPIC  -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp.h:27:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/RcppCommon.h:29:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/r/headers.h:59:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/platform/compiler.h:153:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/unordered_map:369:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:16:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/memory:653:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/typeinfo:61:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/exception:82:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/cstdlib:86:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/stdlib.h:94:
In file included from /usr/include/stdlib.h:65:
In file included from /usr/include/sys/wait.h:110:
In file included from /usr/include/sys/resource.h:72:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/stdint.h:119:
In file included from /usr/local/include/stdint.h:82:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/inttypes.h:247:
In file included from /Library/Developer/CommandLineTools/usr/lib/clang/10.0.0/include/inttypes.h:30:
/usr/include/inttypes.h:235:8: error: unknown type name 'intmax_t'
extern intmax_t
       ^
/usr/include/inttypes.h:236:9: error: unknown type name 'intmax_t'
imaxabs(intmax_t j);
        ^
/usr/include/inttypes.h:240:2: error: unknown type name 'intmax_t'
        intmax_t quot;
        ^
/usr/include/inttypes.h:241:2: error: unknown type name 'intmax_t'
        intmax_t rem;
        ^
/usr/include/inttypes.h:246:9: error: unknown type name 'intmax_t'
imaxdiv(intmax_t __numer, intmax_t __denom);
        ^
/usr/include/inttypes.h:246:27: error: unknown type name 'intmax_t'
imaxdiv(intmax_t __numer, intmax_t __denom);
                          ^
/usr/include/inttypes.h:250:8: error: unknown type name 'intmax_t'
extern intmax_t
       ^
/usr/include/inttypes.h:256:8: error: unknown type name 'uintmax_t'; did you mean 'uintptr_t'?
extern uintmax_t
       ^
/usr/include/sys/_types/_uintptr_t.h:30:24: note: 'uintptr_t' declared here
typedef unsigned long           uintptr_t;
                                ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp.h:27:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/RcppCommon.h:29:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/r/headers.h:59:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/platform/compiler.h:153:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/unordered_map:369:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:16:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/memory:653:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/typeinfo:61:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/exception:82:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/cstdlib:86:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/stdlib.h:94:
In file included from /usr/include/stdlib.h:65:
In file included from /usr/include/sys/wait.h:110:
In file included from /usr/include/sys/resource.h:72:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/stdint.h:119:
In file included from /usr/local/include/stdint.h:82:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/inttypes.h:247:
In file included from /Library/Developer/CommandLineTools/usr/lib/clang/10.0.0/include/inttypes.h:30:
/usr/include/inttypes.h:263:8: error: unknown type name 'intmax_t'
extern intmax_t
       ^
/usr/include/inttypes.h:269:8: error: unknown type name 'uintmax_t'; did you mean 'uintptr_t'?
extern uintmax_t
       ^
/usr/include/sys/_types/_uintptr_t.h:30:24: note: 'uintptr_t' declared here
typedef unsigned long           uintptr_t;
                                ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp.h:27:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/RcppCommon.h:29:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/r/headers.h:59:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/platform/compiler.h:153:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/unordered_map:369:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:16:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/memory:653:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/typeinfo:61:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/exception:82:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/cstdlib:86:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/stdlib.h:94:
In file included from /usr/include/stdlib.h:65:
In file included from /usr/include/sys/wait.h:110:
/usr/include/sys/resource.h:197:2: error: unknown type name 'uint64_t'
        uint64_t ri_user_time;
        ^
/usr/include/sys/resource.h:198:2: error: unknown type name 'uint64_t'
        uint64_t ri_system_time;
        ^
/usr/include/sys/resource.h:199:2: error: unknown type name 'uint64_t'
        uint64_t ri_pkg_idle_wkups;
        ^
/usr/include/sys/resource.h:200:2: error: unknown type name 'uint64_t'
        uint64_t ri_interrupt_wkups;
        ^
/usr/include/sys/resource.h:201:2: error: unknown type name 'uint64_t'
        uint64_t ri_pageins;
        ^
/usr/include/sys/resource.h:202:2: error: unknown type name 'uint64_t'
        uint64_t ri_wired_size;
        ^
/usr/include/sys/resource.h:203:2: error: unknown type name 'uint64_t'
        uint64_t ri_resident_size;
        ^
/usr/include/sys/resource.h:204:2: error: unknown type name 'uint64_t'
        uint64_t ri_phys_footprint;
        ^
/usr/include/sys/resource.h:205:2: error: unknown type name 'uint64_t'
        uint64_t ri_proc_start_abstime;
        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package BiocSingular
* removing /Library/Frameworks/R.framework/Versions/3.5/Resources/library/BiocSingular
Error in i.p(...) : 
  (converted from warning) installation of package /var/folders/11/zcb2x4g97c92gr68fsd15k6c0000gn/T//RtmpHcxZv1/filea674473df68/BiocSingular_0.99.12.tar.gz had non-zero exit status
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0         rstudioapi_0.9.0   knitr_1.21         magrittr_1.5       R6_2.4.0           tools_3.5.1        pkgbuild_1.0.2     xfun_0.5          
 [9] cli_1.0.1          withr_2.1.2        htmltools_0.3.6    remotes_2.0.2      yaml_2.2.0         digest_0.6.18      assertthat_0.2.0   rprojroot_1.3-2   
[17] crayon_1.3.4       processx_3.2.1     BiocManager_1.30.4 callr_3.1.1        ps_1.3.0           curl_3.3           evaluate_0.13      rmarkdown_1.11    
[25] compiler_3.5.1     backports_1.1.3    prettyunits_1.0.2  BiocStyle_2.10.0  

`runIrlbaSVD()` on DelayedMatrix

I'm trying to run

svd <- runIrlbaSVD(x)

where x is a DelayedMatrix, but I get the following error:

Error in W[, j_w] <- avj : 
  number of items to replace is not a multiple of replacement length

This is the same error that I get if I run irlba::irlba() directly on x. Looking at the runIrlbaSVD() code, it looks like it's indeed passing a DelayedMatrix (the result of standardize_matrix()) to irlba::irlba().

Obviously, this works:

svd <- runIrlbaSVD(as.matrix(x))

But is there a way to use irlba on DelayedMatrices or should I stick to random or exact SVD?

show method for LowRankMatrix gives `'S4' is not subsettable` on some machines

I can't reproduce the problem myself, but in a seminar of 8 students I was teaching, about half of them hit this error. Here is an example from one of them, with an options(error=recover) trace. Below I include some others with cleaner sessionInfo() output (I asked everyone to use reprex on this code, but not everyone did). If more information from one of the students seeing this error would be useful, let me know.

> library(BiocSingular)
> a <- matrix(rnorm(100000), ncol=20)
> out <- runPCA(a, rank=10)
> lr <- LowRankMatrix(out$rotation, out$x)
> lr

Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'type': object of type 'S4' is not subsettable

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8 LC_MONETARY=Italian_Italy.utf8
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.utf8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] batchelor_1.16.0 BiocSingular_1.16.0 scran_1.28.1
[4] scater_1.28.0 ggplot2_3.4.2 scuttle_1.10.1
[7] TENxPBMCData_1.18.0 HDF5Array_1.28.1 rhdf5_2.44.0
[10] DelayedArray_0.26.3 S4Arrays_1.0.4 Matrix_1.5-4.1
[13] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2 Biobase_2.60.0
[16] GenomicRanges_1.52.0 GenomeInfoDb_1.36.1 IRanges_2.34.1
[19] S4Vectors_0.38.1 BiocGenerics_0.46.0 MatrixGenerics_1.12.2
[22] matrixStats_1.0.0

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.14 jsonlite_1.8.7
[4] magrittr_2.0.3 ggbeeswarm_0.7.2 farver_2.1.1
[7] rmarkdown_2.23 zlibbioc_1.46.0 vctrs_0.6.3
[10] memoise_2.0.1 DelayedMatrixStats_1.22.1 RCurl_1.98-1.12
[13] htmltools_0.5.5 AnnotationHub_3.8.0 curl_5.0.1
[16] BiocNeighbors_1.18.0 Rhdf5lib_1.22.0 sass_0.4.6
[19] bslib_0.5.0 cachem_1.0.8 ResidualMatrix_1.10.0
[22] igraph_1.5.0 mime_0.12 lifecycle_1.0.3
[25] pkgconfig_2.0.3 rsvd_1.0.5 R6_2.5.1
[28] fastmap_1.1.1 GenomeInfoDbData_1.2.10 shiny_1.7.4
[31] digest_0.6.32 colorspace_2.1-0 AnnotationDbi_1.62.2
[34] dqrng_0.3.0 irlba_2.3.5.1 ExperimentHub_2.8.0
[37] RSQLite_2.3.1 beachmat_2.16.0 labeling_0.4.2
[40] filelock_1.0.2 fansi_1.0.4 httr_1.4.6
[43] compiler_4.3.1 withr_2.5.0 bit64_4.0.5
[46] BiocParallel_1.34.2 viridis_0.6.3 DBI_1.1.3
[49] rappdirs_0.3.3 bluster_1.10.0 tools_4.3.1
[52] vipor_0.4.5 beeswarm_0.4.0 interactiveDisplayBase_1.38.0
[55] httpuv_1.6.11 glue_1.6.2 rhdf5filters_1.12.1
[58] promises_1.2.0.1 rsconnect_0.8.29 grid_4.3.1
[61] Rtsne_0.16 cluster_2.1.4 generics_0.1.3
[64] gtable_0.3.3 ScaledMatrix_1.8.1 metapod_1.8.0
[67] utf8_1.2.3 XVector_0.40.0 RcppAnnoy_0.0.21
[70] ggrepel_0.9.3 BiocVersion_3.17.1 pillar_1.9.0
[73] limma_3.56.2 later_1.3.1 dplyr_1.1.2
[76] BiocFileCache_2.8.0 lattice_0.21-8 FNN_1.1.3.2
[79] bit_4.0.5 tidyselect_1.2.0 locfit_1.5-9.8
[82] Biostrings_2.68.1 knitr_1.43 gridExtra_2.3
[85] edgeR_3.42.4 xfun_0.39 statmod_1.5.0
[88] pheatmap_1.0.12 yaml_2.3.7 evaluate_0.21
[91] codetools_0.2-19 tibble_3.2.1 BiocManager_1.30.21
[94] cli_3.6.1 uwot_0.1.16 xtable_1.8-4
[97] munsell_0.5.0 jquerylib_0.1.4 Rcpp_1.0.10
[100] dbplyr_2.3.2 png_0.1-8 parallel_4.3.1
[103] ellipsis_0.3.2 blob_1.2.4 sparseMatrixStats_1.12.1
[106] bitops_1.0-7 viridisLite_0.4.2 scales_1.2.1
[109] purrr_1.0.1 crayon_1.5.2 rlang_1.1.1
[112] cowplot_1.1.1 KEGGREST_1.40.0

> options(error=recover)
> lr
Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'type': object of type 'S4' is not subsettable

Enter a frame number, or 0 to exit

1: (new("standardGeneric", .Data = function (object)
standardGeneric("show"), generic = "show", package = "methods"
2: (new("standardGeneric", .Data = function (object)
standardGeneric("show"), generic = "show", package = "methods"
3: S4Arrays:::show_compact_array(object)
4: cat(array_as_one_line_summary(object))
5: array_as_one_line_summary(object)
6: sprintf("<%s>%s %s object of type \"%s\"", paste0(x_dim, collapse = " x "), if (is_sparse(x)) " sparse" else "",
7: type(x)
8: type(x)
9: type(.extract_empty_array(x))
10: .extract_empty_array(x)
11: extract_array(x, index)
12: extract_array(x, index)
13: callNextMethod()
14: .nextMethod(x = x, index = index)
15: extract_array(x@seed, index)
16: extract_array(x@seed, index)
17: subset_by_Nindex(x, index)
18: do.call(`[`, c(list(x), subscripts, list(drop = drop)))
19: do.call(`[`, c(list(x), subscripts, list(drop = drop)))
20: (function (cond)
.Internal(C_tryCatchHelper(addr, 1, cond)))(list(message = "object of type 'S4' is not subsetta
library(BiocSingular)
a <- matrix(rnorm(100000), ncol=20)
out <- runPCA(a, rank=10)
lr <- LowRankMatrix(out$rotation, out$x)
lr

sessionInfo()
Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'type': object of type 'S4' is not subsettable

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22000)

Matrix products: default


locale:
[1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.utf8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocSingular_1.16.0 TENxPBMCData_1.18.0
[3] HDF5Array_1.28.1 rhdf5_2.44.0
[5] DelayedArray_0.26.3 S4Arrays_1.0.4
[7] Matrix_1.5-4.1 edgeR_3.42.4
[9] limma_3.56.2 bluster_1.10.0
[11] batchelor_1.16.0 scran_1.28.1
[13] scater_1.28.0 ggplot2_3.4.2
[15] scuttle_1.10.1 MouseGastrulationData_1.14.0
[17] SpatialExperiment_1.10.0 BiocManager_1.30.21
[19] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2
[21] Biobase_2.60.0 GenomicRanges_1.52.0
[23] GenomeInfoDb_1.36.1 IRanges_2.34.1
[25] S4Vectors_0.38.1 BiocGenerics_0.46.0
[27] MatrixGenerics_1.12.2 matrixStats_1.0.0

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 jsonlite_1.8.5
[3] magrittr_2.0.3 ggbeeswarm_0.7.2
[5] magick_2.7.4 rmarkdown_2.23
[7] farver_2.1.1 BiocIO_1.10.0
[9] zlibbioc_1.46.0 vctrs_0.6.3
[11] Rsamtools_2.16.0 memoise_2.0.1
[13] DelayedMatrixStats_1.22.1 RCurl_1.98-1.12
[15] htmltools_0.5.5 AnnotationHub_3.8.0
[17] curl_5.0.1 BiocNeighbors_1.18.0
[19] Rhdf5lib_1.22.0 sass_0.4.6
[21] bslib_0.5.0 cachem_1.0.8
[23] ResidualMatrix_1.10.0 GenomicAlignments_1.36.0
[25] igraph_1.5.0 mime_0.12
[27] lifecycle_1.0.3 pkgconfig_2.0.3
[29] rsvd_1.0.5 R6_2.5.1
[31] fastmap_1.1.1 GenomeInfoDbData_1.2.10
[33] shiny_1.7.4 digest_0.6.32
[35] colorspace_2.1-0 AnnotationDbi_1.62.2
[37] dqrng_0.3.0 irlba_2.3.5.1
[39] ExperimentHub_2.8.0 RSQLite_2.3.1
[41] beachmat_2.16.0 filelock_1.0.2
[43] labeling_0.4.2 fansi_1.0.4
[45] httr_1.4.6 compiler_4.3.1
[47] bit64_4.0.5 withr_2.5.0
[49] BiocParallel_1.34.2 viridis_0.6.3
[51] DBI_1.1.3 R.utils_2.12.2
[53] rappdirs_0.3.3 rjson_0.2.21
[55] tools_4.3.1 vipor_0.4.5
[57] beeswarm_0.4.0 interactiveDisplayBase_1.38.0
[59] httpuv_1.6.11 R.oo_1.25.0
[61] glue_1.6.2 restfulr_0.0.15
[63] rhdf5filters_1.12.1 promises_1.2.0.1
[65] grid_4.3.1 Rtsne_0.16
[67] cluster_2.1.4 generics_0.1.3
[69] gtable_0.3.3 R.methodsS3_1.8.2
[71] ScaledMatrix_1.8.1 metapod_1.8.0
[73] utf8_1.2.3 XVector_0.40.0
[75] RcppAnnoy_0.0.20 ggrepel_0.9.3
[77] BiocVersion_3.17.1 pillar_1.9.0
[79] BumpyMatrix_1.8.0 later_1.3.1
[81] splines_4.3.1 dplyr_1.1.2
[83] BiocFileCache_2.8.0 lattice_0.21-8
[85] rtracklayer_1.60.0 FNN_1.1.3.2
[87] bit_4.0.5 tidyselect_1.2.0
[89] locfit_1.5-9.8 Biostrings_2.68.1
[91] knitr_1.43 gridExtra_2.3
[93] xfun_0.39 statmod_1.5.0
[95] DropletUtils_1.20.0 pheatmap_1.0.12
[97] yaml_2.3.7 evaluate_0.21
[99] codetools_0.2-19 tibble_3.2.1
[101] cli_3.6.1 uwot_0.1.15
[103] xtable_1.8-4 jquerylib_0.1.4
[105] munsell_0.5.0 Rcpp_1.0.10
[107] dbplyr_2.3.2 png_0.1-8
[109] XML_3.99-0.14 parallel_4.3.1
[111] ellipsis_0.3.2 blob_1.2.4
[113] sparseMatrixStats_1.12.1 bitops_1.0-7
[115] viridisLite_0.4.2 scales_1.2.1
[117] purrr_1.0.1 crayon_1.5.2
[119] rlang_1.1.1 KEGGREST_1.40.0
> library(BiocSingular)
> a <- matrix(rnorm(100000), ncol=20)
> out <- runPCA(a, rank=10)
> lr <- LowRankMatrix(out$rotation, out$x)
> lr
Error in (function (cond)  :
  error in evaluating the argument 'x' in selecting a method for function 'type': object of type 'S4' is not subsettable
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] BiocSingular_1.16.0

loaded via a namespace (and not attached):
 [1] codetools_0.2-19      Matrix_1.5-4.1        lattice_0.21-8        MatrixGenerics_1.12.2 rsvd_1.0.5            matrixStats_1.0.0     S4Arrays_1.0.4      
 [8] parallel_4.3.1        BiocGenerics_0.46.0   stats4_4.3.1          IRanges_2.34.1        grid_4.3.1            DelayedArray_0.26.3   compiler_4.3.1      
[15] beachmat_2.16.0       tools_4.3.1           irlba_2.3.5.1         ScaledMatrix_1.8.1    Rcpp_1.0.10           BiocManager_1.30.21   crayon_1.5.2

DelayedArray and SVD questions

Hi,

I am considering using the BiocSingular, DelayedArray, and HDF5Array packages for initial processing of large single-cell data sets where I use on-disk storage of the expression matrices.

I wonder if you might be willing to answer some of my questions, and if so, where you prefer that I post them.

As some background, I have run timing tests in which I begin processing from an sci-RNA-seq counts matrix, estimate size factors, normalize counts, calculate column means and variances, and then run singular value decomposition. It became clear that the SVD is the bottleneck so I saved an object with the elements required for the SVD and limited timing tests to the SVD.

I ran the SVD using a dgCMatrix sparse matrix passed to irlba::irlba in order to get a reference time. I followed this with tests in which I wrapped the dgCMatrix in a DelayedArray and passed it to irlba::irlba, to BiocSingular::runIrlbaSVD, and to BiocSingular::runRandomSVD. I also tried using an HDF5Array::TENxMatrix as the DelayedArray seed. The run times for matrices wrapped in a DelayedArray are substantially longer than the runs using sparse matrices. My biggest concern is that I may be running these tests incorrectly.

As a warning, I have a poor understanding of DelayedArrays so at least some of my questions may be basic.

Thank you.
Brent

Make deferred=NA the default for IrlbaParam and RandomParam

Some kind of deferring is already the effective default for single-core IRLBA, given how the algorithm works. If it's NA, we just let it continue doing its thing; if FALSE, we scale, and if TRUE, we use a ScaledMatrix.

For RandomParam, if it's NA, we make the choice depending on whether the backend is_sparse() or not. This should preserve back-compatibility for dense inputs while taking advantage of the new sparse file-backed methods.
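
The proposed behaviour can be read as a small decision table. The base-R sketch below is only an illustration of that reading: `resolve_deferred` and its return labels are hypothetical names, not BiocSingular code, and mapping sparse backends to the deferred (ScaledMatrix) path under NA is an assumption based on the back-compatibility remark above.

```r
# Hypothetical helper illustrating the proposed deferred = NA semantics.
# `sparse` stands in for the result of is_sparse() on the input matrix.
resolve_deferred <- function(deferred, algorithm = c("irlba", "random"),
                             sparse = FALSE) {
    algorithm <- match.arg(algorithm)
    if (!is.na(deferred)) {
        # An explicit TRUE/FALSE is respected for both algorithms.
        return(if (deferred) "use ScaledMatrix" else "scale upfront")
    }
    if (algorithm == "irlba") {
        # NA: let IRLBA handle centering and scaling itself.
        "leave to irlba"
    } else if (sparse) {
        # Assumption: sparse backends take the deferred path.
        "use ScaledMatrix"
    } else {
        # Dense inputs keep the previous non-deferred behaviour.
        "scale upfront"
    }
}

resolve_deferred(NA, "random", sparse = TRUE)
resolve_deferred(NA, "random", sparse = FALSE)
resolve_deferred(NA, "irlba")
```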

Segmentation Fault for Ultra-High-Dimensional Data

Let's say there is a CyTOF data set with 45 million cells and 60 markers.

library(BiocSingular)
test <- matrix(rnorm(45000000*60), ncol = 60)
test <- runSVD(test, k = 5) 

 *** caught segfault ***
address 0x7f0ae16da618, cause 'memory not mapped'
Error in La.svd(x, nu, nv) : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537

Can it fail fast instead (or perhaps handle long vectors)?
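
One way to fail fast would be to compare the number of elements against `.Machine$integer.max` (2^31 - 1) before reaching the LAPACK-based routine. In this example 45000000 * 60 = 2.7e9 elements, which is above that limit. The helper names below are hypothetical, and where such a check would sit inside BiocSingular is an assumption.

```r
# Hypothetical helper: TRUE if a matrix with these dimensions would be a
# long vector. prod() returns a double, so the product cannot overflow.
exceeds_long_vector_limit <- function(dims) {
    prod(dims) > .Machine$integer.max
}

# Hypothetical guard that stops with a clear message instead of crashing.
assert_not_long <- function(x) {
    if (exceeds_long_vector_limit(dim(x))) {
        stop("matrix has more than 2^31 - 1 elements; ",
             "the exact SVD does not support long vectors", call. = FALSE)
    }
    invisible(x)
}

exceeds_long_vector_limit(c(45000000, 60))  # TRUE
assert_not_long(matrix(0, 10, 10))          # passes silently
```

Checking the dimensions rather than the data means the guard itself costs nothing, even for a matrix of this size.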
