Git Product home page Git Product logo

snparrays.jl's Introduction

SnpArrays.jl

Documentation Build Status Code Coverage Citation
build Actions Status Coverage Status codecov DOI

Routines for reading and manipulating compressed storage of biallelic SNP (Single-Nucleotide Polymorphism) data.

Data from genome-wide association studies (GWAS) are often saved as a PLINK binary biallelic genotype table or .bed file. To be useful, such files should be accompanied by a .fam file, containing metadata on the rows of the table, and a .bim file, containing metadata on the columns. The .fam and .bim files are in tab-separated format.

Linear algebra operations on the PLINK formatted data now support multi-threading and GPU (CUDA) computing.

Installation

This package requires Julia v1.5 or later, which can be obtained from https://julialang.org/downloads/ or by building Julia from the sources in the https://github.com/JuliaLang/julia repository.

This package is registered in the default Julia package registry, and can be installed through standard package installation procedure.

using Pkg
pkg"add SnpArrays"

Use the backspace key to return to the Julia REPL.

Citation

If you use OpenMendel analysis packages in your research, please cite the following reference in the resulting publications:

Zhou H, Sinsheimer JS, Bates DM, Chu BB, German CA, Ji SS, Keys KL, Kim J, Ko S, Mosher GD, Papp JC, Sobel EM, Zhai J, Zhou JJ, Lange K. OPENMENDEL: a cooperative programming project for statistical genetics. Hum Genet. 2020 Jan;139(1):61-71. doi: 10.1007/s00439-019-02001-z. Epub 2019 Mar 26. PMID: 30915546; PMCID: PMC6763373.

Acknowledgments

Current implementation incorporates ideas in the package BEDFiles.jl by Doug Bates (@dmbates).

Chris Elrod (@chriselrod) helped us accelerate CPU linear algebra through his great support of LoopVectorization.jl package.

This project has been supported by the National Institutes of Health under awards R01GM053275, R01HG006139, R25GM103774, and 1R25HG011845.

snparrays.jl's People

Contributors

alanderos91 avatar andreasnoack avatar biona001 avatar brendonchau avatar chris-german avatar github-actions[bot] avatar hua-zhou avatar jcpapp avatar joshday avatar kose-y avatar msuchard avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

snparrays.jl's Issues

Mean imputation for linear algebra?

Currently, linear algebra routines impute any missing value with zero. I know it's going to be a performance killer, but isn't imputing it with the mean more natural?

SnpBitMatrix-based GPU acceleration

CPU version of SnpBitMatrix is 2x faster than the direct SnpLinAlg. Wouldn't SnpBitMatrix-based GPU acceleration be faster than SnpArray-based one? Now timing for SnpLinAlg and SnpBitMatrix are similar. But still, let's check it.

Convert SnpArray into matrix{Int} doesn't obey conversion convention

According to documentation, 0x01 encodes missing (NaN), 0x02 encode 1, 0x03 encodes 2. But if we convert a SnpArray into matrix of subtype Int, 0x01 will become 1, 0x02 = 2, and 0x03 = 3.

using SnpArrays
const mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
julia> convert(Matrix{Float64}, mouse)[1:10]
10-element Array{Float64,1}:
 1.0
 1.0
 2.0
 1.0
 2.0
 1.0
 1.0
 1.0
 1.0
 2.0
julia> convert(Matrix{Int64}, mouse)[1:10]
10-element Array{Int64,1}:
 2
 2
 3
 2
 3
 2
 2
 2
 2
 3

Is this by design or a bug?

Problem adding SnpArrays with Julia 1.2

Hello, I typed the following command and was unable to add the current version of SnpArrays.jl

(v1.2) pkg> add https://github.com/OpenMendel/SnpArrays.jl
Updating registry at ~/.julia/registries/General
Updating git-repo https://github.com/JuliaRegistries/General.git
Updating git-repo https://github.com/OpenMendel/SnpArrays.jl
Updating git-repo https://github.com/OpenMendel/SnpArrays.jl
Resolving package versions...

signal (11): Segmentation fault: 11
in expression starting at none:0
unknown function (ip: 0xffffffffffffffff)
Allocations: 6739978 (Pool: 6738748; Big: 1230); GC: 15

LD Pruning of SNPs

@Hua-Zhou at #58

We may open another issue regarding LD pruning of SNPs so that remaining
SNPs are not very highly related. A literature review of how Plink does it
is helpful.

Convert SnpArray to NullableArray

It'll be useful to be able to convert SnpArray to NullableArray. Currently if we convert SnpArray to Array{Int}, the missing genotypes cause errors.

Further subsetting/splitting/merging support

@Hua-Zhou in #25:

When working with Biobank data, the sample size is up to 1 million. Analysts are splitting the Plink file into regions even much smaller than a chromosome, because for large chromosomes like 1 and 21 the Plink files can still be too large for current analysis software to handle. We may need to think about a unifying interface for subsetting, splitting, and merging SnpData either along SNPs or individuals.

default convert to Float64 returns backwards genotypes under additive model?

Shouldn't a SnpArray element (true, true) map to floating point 2.0 and (false, false) to floating point 0.0? Currently the default convert to a Matrix{Float64} flips them. There appears to be no way for the user to convert to Matrix{Float64} with a specification for minor_allele, unlike convert for the Boolean tuples (currently line 150 of SnpArrays.jl).

Minimal working example:

srand(2016);
z = SnpArray(rand(0:2, 5, 3));
Z = convert(Matrix{Float64}, z);
display(z)
display(Z)

Output for z:

5x3 SnpArrays.SnpArray{2}:
(false,true)   (false,true)  (true,true) 
(false,false)  (true,true)   (true,true) 
(true,true)    (true,true)   (true,true) 
(true,true)    (false,true)  (false,true)
(false,true)   (false,true)  (false,true)

Output for Z:

5x3 Array{Float64,2}:
 1.0  1.0  0.0
 2.0  0.0  0.0
 0.0  0.0  0.0
 0.0  1.0  1.0
 1.0  1.0  1.0

Prune a kinship matrix to eliminate high relatedness

It's a common task in the genetic analysis workflow to prune kinship matrix to eliminate highly related individuals according to a threshold of kinship values. GCTA recommends a default threshold of 0.025 (2nd degree cousin, translated to 0.0125 in our notation).

The function interface can be simple as

keepidx = prune::AbstractMatrix; kinship_threshold::Number = 0.0125)

The output is a BitVector such that Φ[keepidx, keepidx] has no off-diagonal entries above kinship_threshold.

A simple heuristic algorithm can be:

  1. Initialize keepidx to be all trues.
  2. Calculate the number of (off-diagonal) entries above kinship_thresold per column.
  3. Find the column with maximum number of entries above kinship_threshold and set the corresponding keepidx entry to false.
  4. Update the counts from Step 1 for remaining columns.
    Repeat Steps 3-4 until no columns have counts > 0.

Any reference to how current GCTA and PLINK do it will be helpful.

σinv calculated wrong in SnpBitMatrix

The following example shows that σinv is not computed correctly in SnpBitMatrix.

#stupid function to simulate a random non-0 SnpArray
function simulate_random_snparray(
    n :: Int64,
    p :: Int64
)
    x_tmp = rand(0:2, n, p)
    x = SnpArray(undef, n, p)
    for i in 1:(n*p)
        if x_tmp[i] == 0
            x[i] = 0x00
        elseif x_tmp[i] == 1
            x[i] = 0x02
        else
            x[i] = 0x03
        end
    end
    return x
end

Compute the inverse standard deviation of a random SnpArray

using Random, SnpArrays
Random.seed!(1111)

x = simulate_random_snparray(100, 1)
xbm = SnpBitMatrix{Float64}(x, model=ADDITIVE_MODEL, center=true, scale=true);
xbm_vector = convert(Matrix{Float64}, x)

julia> xbm.σinv
1-element Array{Float64,1}:
 1.4159846508095772

julia> 1 / std(xbm_vector)
1.166543930706444

The following function computes the inverse standard deviation correctly, but I am not sure what's the best way to do this as well.

function std_reciprocal(
    x        :: SnpBitMatrix, 
    mean_vec :: Vector{T}
) where {T <: Float64}
    m, n = size(x)
    @assert n == length(mean_vec) "number of columns of snpmatrix doesn't agree with length of mean vector"
    std_vector = zeros(T, n)

    @inbounds for j in 1:n
        @simd for i in 1:m
            a1 = x.B1[i, j]
            a2 = x.B2[i, j]
            std_vector[j] += (convert(T, a1 + a2) - mean_vec[j])^2
        end
        std_vector[j] = 1.0 / sqrt(std_vector[j] / (m - 1))
    end
    return std_vector
end

julia> std_reciprocal(xbm, xbm.μ)
1-element Array{Float64,1}:
 1.166543930706444

sort by survival time (or reordering subjects by a set of indices)

This feature would be useful for computing partial log-likelihood in Cox regression, particularly for the terms involving summation over "subjects with longer survival time than subject i". If the SnpArray is sorted by decreasing survival time, cumsum can be employed to compute these terms with O(1) memory requirement (unless there are ties).

no A_mul_B!, Ac_mul_B! routines

How does one perform matrix-vector multiplication with a SnpArray? Currently a SnpArray lacks A_mul_B! and Ac_mul_B! typed specifically for SnpMatrix, which precludes operations like x * y for SnpArray object x and Vector{AbstractFloat} object y. The routines Acsc_mul_B! and Acs_mul_B! exist, but they call A_mul_B! and Ac_mul_B!.

Best way to perform linear algebra on a subset of SnpArray v0.7?

For my application I often need to perform linear algebra on subsets of a SnpArray. What is the best way to do this?

My main problem is a SnpBitMatrix cannot be create on x_subset1 or x_subset2:

using LinearAlgebra, SnpArrays
x = SnpArray(undef, 10000, 10000)
x_subset1 = x[1:1000, 1:1000]
x_subset2 = @view x[1:1000, 1:1000]
julia> x_subsetbm1 = SnpBitMatrix{Float64}(x_subset1, center=true, scale=true);

ERROR: MethodError: no method matching SnpBitMatrix{Float64}(::Array{UInt8,2}; center=true, scale=true)
Closest candidates are:
  SnpBitMatrix{Float64}(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) where T at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:2 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::Any) where T<:AbstractArray at abstractarray.jl:22 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::SnpArray; model, center, scale) where T<:AbstractFloat at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:19
Stacktrace:
 [1] top-level scope at none:0
julia> x_subsetbm2 = SnpBitMatrix{Float64}(x_subset2, center=true, scale=true);

ERROR: MethodError: no method matching SnpBitMatrix{Float64}(::SubArray{UInt8,2,SnpArray,Tuple{UnitRange{Int64},UnitRange{Int64}},false}; center=true, scale=true)
Closest candidates are:
  SnpBitMatrix{Float64}(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) where T at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:2 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::Any) where T<:AbstractArray at abstractarray.jl:22 got unsupported keyword arguments "center", "scale"
  SnpBitMatrix{Float64}(::SnpArray; model, center, scale) where T<:AbstractFloat at /Users/biona001/.julia/packages/SnpArrays/pfwqg/src/linalg.jl:19
Stacktrace:
 [1] top-level scope at none:0

I could instantiate a smaller SnpArray and copy desired elements into my new SnpArray, instantiate a SnpBitMatrix from this new SnpArray and use that to calculate things, but I feel like this is not a good solution:

x_subset3 = SnpArray(undef, 1000, 1000)
copyto!(x_subset3, @view x[1:1000, 1:1000])
xbm_subset = SnpBitMatrix{Float64}(x_subset3, model=ADDITIVE_MODEL, center=true, scale=true);

Cannot use `view()` on `A_mul_B!`

Since A_mul_B accepts SnpLike{2} for the matrix argument, I thought I could use view() to do multiplication on just a subset of my snparray. However the following shows this cannot be done.

S = SnpArray(rand(0:2, 10, 10))
b = rand(5)
answer = zeros(5)

Do multiplication on the top left corner:

julia> A_mul_B!(answer, view(S, 1:5, 1:5), b, similar(answer))
ERROR: type SubArray has no field A1
Stacktrace:
 [1] A_mul_B!(::Array{Float64,1}, ::SubArray{Tuple{Bool,Bool},2,SnpArrays.SnpArray{2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::Array{Float64,1}, ::Array{Float64,1}) at /Users/biona001/.julia/v0.6/SnpArrays/src/SnpArrays.jl:728

edit:
Adding example to show that A_mul_B! works on subarrays without loading packages:

x = rand(10, 10)
b = rand(5)
answer = zeros(5)

julia> A_mul_B!(answer, view(x, 1:5, 1:5), b)
5-element Array{Float64,1}:
 1.59663
 1.19226
 1.17276
 0.675278
 1.56565

Cannot update SnpArrays

I get an invalid version range: ">= 2020"

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.5) pkg> update SnpArrays
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Updating git-repo `https://github.com/OpenMendel/SnpArrays.jl.git`
ERROR: ArgumentError: invalid version range: ">= 2020"
Stacktrace:
 [1] Pkg.Types.VersionRange(::String) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/versions.jl:136
 [2] VersionSpec at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/versions.jl:219 [inlined]
 [3] load_package_data(::Type{Pkg.Types.VersionSpec}, ::String, ::Array{VersionNumber,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:185
 [4] deps_graph(::Pkg.Types.Context, ::Dict{Base.UUID,String}, ::Dict{Base.UUID,Pkg.Types.VersionSpec}, ::Dict{Base.UUID,Pkg.Resolve.Fixed}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:459
 [5] resolve_versions!(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:367
 [6] up(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}, ::Pkg.Types.UpgradeLevel) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:1215
 [7] up(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; level::Pkg.Types.UpgradeLevel, mode::Pkg.Types.PackageMode, update_registry::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:246
 [8] up at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:222 [inlined]
 [9] #up#37 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:67 [inlined]
 [10] up(::Array{Pkg.Types.PackageSpec,1}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:67
 [11] do_cmd!(::Pkg.REPLMode.Command, ::REPL.LineEditREPL) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/REPLMode/REPLMode.jl:404
 [12] do_cmd(::REPL.LineEditREPL, ::String; do_rethrow::Bool) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/REPLMode/REPLMode.jl:382
 [13] do_cmd at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
 [14] (::Pkg.REPLMode.var"#24#27"{REPL.LineEditREPL,REPL.LineEdit.Prompt})(::REPL.LineEdit.MIState, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Bool) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/REPLMode/REPLMode.jl:546
 [15] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [16] invokelatest at ./essentials.jl:709 [inlined]
 [17] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/REPL/src/LineEdit.jl:2355
 [18] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:1143
 [19] (::REPL.var"#38#42"{REPL.LineEditREPL,REPL.REPLBackendRef})() at ./task.jl:356

Parallelize linear algebra routines for SnpArray

ParallelSparseMatMul seemed to parallelized At_mul_B! for SparseMatrixCSC: https://github.com/madeleineudell/ParallelSparseMatMul.jl/blob/master/src/parallel_matmul.jl

I was experimenting with some of its ideas. It seems odd the below code sped up anything, but it does. Should I continue?

using SnpArrays, BenchmarkTools

#turns a SnpArray into a SharedArray{Tuple{Bool,Bool},2}
function share{T}(a::AbstractArray{T};kwargs...)
    sh = SharedArray{T}(size(a);kwargs...)
    for i=1:length(a)
        sh.s[i] = a[i]
    end
    return sh
end
share(A::SnpArray,pids::AbstractVector{Int}) = SharedSnpArray(share(A.A1,pids=pids),share(A.A2,pids=pids),pids)
share(A::SnpArray) = share(A::SnpArray,pids=workers())

#Constructor for SharedSnpArray
function SharedSnpArray(plinkFile::AbstractString) 
    share(SnpArray(plinkFile))
end

Check if SharedSnpArray can still do (SnpArray)-(vector) multiplication correctly (Yes)

x = SnpArray("hapmap3")
s = SharedSnpArray("hapmap3")
v = rand(13928)

julia> all(x*v .≈ s*v)
true

Is (SharedSnpArray)-(vector) faster than (SnpArray)-(vector)? (Yes)

julia> @benchmark x*v

BenchmarkTools.Trial:
  memory estimate:  5.38 KiB
  allocs estimate:  2
  --------------
  minimum time:     30.724 ms (0.00% GC)
  median time:      34.174 ms (0.00% GC)
  mean time:        33.786 ms (0.00% GC)
  maximum time:     39.272 ms (0.00% GC)
  --------------
  samples:          148
  evals/sample:     1

@benchmark s*v
BenchmarkTools.Trial:
  memory estimate:  2.69 KiB
  allocs estimate:  1
  --------------
  minimum time:     7.599 ms (0.00% GC)
  median time:      8.442 ms (0.00% GC)
  mean time:        8.905 ms (0.00% GC)
  maximum time:     12.584 ms (0.00% GC)
  --------------
  samples:          561
  evals/sample:     1

But a SharedSnpArray uses more memory than SnpArray: (why is it 16 and 80 bytes??)

julia> sizeof(x)
16

julia> sizeof(s)
80

unsafe creation of SnpData from Plink file

@kose-y When creating SnpData from Plink files,

snp_info = categorical!(

the CSV.read function opens the bim function without closing it. A safer way is

snp_info = open(plink_bim_fike) do io
    categorial!(CSV.read(io,  delim='\t', header=SNP_INFO_KEYS, 
    types=[String, String, Float64, Int, String, String]), [:allele1, :allele2])
end

Need to check the readdlm function as well

person_info = convert(DataFrame, readdlm(plink_fam_file, AbstractString))

May be safer to use

person_info = open(plink_fam_file) do io
    DataFrame(readdlm(io, AbstractString))
end

Repeatedly filtering Plink files based on integer indices

When using filter() with integer indices multiple times, the result is overwritten on the existing filtered SnpArray.

versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)
const mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
s0 = SnpArrays.filter(SnpArrays.datadir("mouse"), 1:2, 1:5)
s0
2×5 SnpArray:
 0x02  0x02  0x02  0x02  0x03
 0x02  0x02  0x03  0x02  0x02
s1 = SnpArrays.filter(SnpArrays.datadir("mouse"), 1:2, 1:3)
s2 = SnpArrays.filter(SnpArrays.datadir("mouse"), 1:2, 4:5)
#s3 = SnpArrays.filter(SnpArrays.datadir("mouse"), 3:4, 1:3);
s1
2×3 SnpArray:
 0x02  0x03  0x00
 0x02  0x02  0x00
s2
2×2 SnpArray:
 0x02  0x03
 0x02  0x02
s0
2×5 SnpArray:
 0x02  0x03  0x00  0x00  0x00
 0x02  0x02  0x00  0x00  0x00

Edge case for function summarize()

A trivial edge case issue: In either summarize() function if m = 0 (i.e., the SnpArray is empty), then each calculation of maf includes a divide by zero. I suggest simply making those statements conditional on m > 0.

filtering for the SnpData type

As suggested by Alfonso in the OpenMendel programming workshop, we should implement a filtering function for the SnpData type directly. Current filter function only works with SnpArray.

Feature request: calculate GRM within pedigrees

suggested by @KennethLange

The interface can be something like

grm_within_pedigree(A::SnpData; method::Symbol = :GRM, memory_limit::Real = 2.0^30, maf_threshold::Real = 0.01)

SnpData is required here to read in pedigrees. maf should be calculated based on all individuals though.

memory leak when copying/converting SnpArray to matrix

Reported by @ericsobel and @biona001.

Following code

using SnpArrays
hapmap = SnpArray(Pkg.dir("SnpArrays") * "/docs/hapmap3")
n, snps = size(hapmap)
outv = zeros(n)
@time for s in 1:snps
    copy!(outv, hapmap[:, s], model=:additive, impute=false, center=false, scale=false)
end

shows suspiciously high memory allocation

  1.987208 seconds (9.32 M allocations: 147.676 MiB, 0.62% gc time)

Using @views doesn't help:

@time for s in 1:snps
    copy!(outv, hapmap[:, s], model=:additive, impute=false, center=false, scale=false)
end

yields

  2.007899 seconds (9.72 M allocations: 151.644 MiB, 0.61% gc time)

Converting to a matrix shows similar memory allocation:

outm = zeros(n, snps)
@time copy!(outm, hapmap)

yields

  0.161410 seconds (9.19 M allocations: 140.673 MiB, 6.21% gc time)

Machine information:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

SnpArrays.jl version is v0.0.1 (d878a31).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.