Git Product home page Git Product logo

knockoffs.jl's Issues

Possibly reduce artifacts size

Knockoffs.jl will soon become a dependency for GhostKnockoffGWAS, which will be compiled into a relocatable app.

Knockoffs's dependency LowRankApprox.jl depends on FFTW.jl which in turn depends on MKL_jll.jl which is a staggering 596.668 MiB. Can we remove this dependency?

Fixed-X knockoffs X.ko'X.ko

Hi Benjamin,

I hope you are doing well!

I wanted to check about the following. The following seems not to match (for any of the fixed-X knockoff constructions):

me = fixed_knockoffs(X, :maxent)
maximum(abs.(me.Xko'me.Xko .- me.Sigma)) #large 

Do you know why that is? (Happy to provide a MWE if it helps, though it happens for all designs I have tried it on.)

Thank you,
Nikos

Performance regression

As reported by Jiaqi, there seems to be some significant performance regression for group knockoff solver.

MWE: (Knockoffs.jl v1.1.7 and on Julia 1.10)

using Knockoffs
using LinearAlgebra
using Random
using CSV
using DataFrames

# gnomad test data
datadir = "/Users/biona001/Benjamin_Folder/research/4th_project_groupKO/group_knockoff_test_data"
mapfile = CSV.read(joinpath(datadir, "Groups_2_127374341_128034347.txt"), DataFrame)
groups = mapfile[!, :group]
covfile = CSV.read(joinpath(datadir, "CorG_2_127374341_128034347.txt"), DataFrame)
Σ = covfile |> Matrix{Float64}
Σ = 0.99Σ + 0.01I

@time solve_s_group(
    Symmetric(Σ), groups, :maxent, 
    m = 1,          # number of knockoffs per variable to generate
    tol = 0.001,    # convergence tolerance
    outer_iter = 10,    # max number of coordinate descent iterations
    robust = false, # whether to use robust cholesky updates
    verbose = true    # whether to print informative intermediate results
)

┌ Warning: Maximum group size is 192, optimization may be slow. Consider running `modelX_gaussian_rep_group_knockoffs` to speed up convergence.
└ @ Knockoffs ~/.julia/dev/Knockoffs/src/group.jl:360
Maxent initial obj = -44822.11340914695
Iter 1 (PCA): obj = -26794.756871827496, δ = 2.6915336716594664, t1 = 24.96, t2 = 32.32
Iter 2 (CCD): obj = -23681.635922280064, δ = 0.5920258362537, t1 = 534.9, t2 = 881.71, t3 = 0.22
Iter 3 (PCA): obj = -23476.599883936684, δ = 1.0851838794654751, t1 = 559.7, t2 = 913.86
Iter 4 (CCD): obj = -23441.194586522815, δ = 0.01917853738286209, t1 = 1059.27, t2 = 1761.83, t3 = 0.44
Iter 5 (PCA): obj = -23428.13516213957, δ = 0.20912865614936338, t1 = 1085.4, t2 = 1794.69
... # terminated

On Knockoffs.jl v0.3.0 (Julia 1.6.7), see Fast Cholesky update notebook

solve_group_max_entropy_ccd: Optimizing 58997 variables
┌ Warning: Maximum group size is 192, optimization may be slow. Consider running `modelX_gaussian_rep_group_knockoffs` to speed up convergence.
└ @ Knockoffs /Users/biona001/.julia/dev/Knockoffs/src/group.jl:230
initial obj = -13811.93738418022
Iter 1: obj = -8863.340455999352, δ = 0.9033274319764998, t1 = 8.85, t2 = 18.23, t3 = 0.04
Iter 2: obj = -7509.4699925543855, δ = 0.7627302489344706, t1 = 19.04, t2 = 36.57, t3 = 0.08
Iter 3: obj = -7495.512979153219, δ = 0.153471691368552, t1 = 29.25, t2 = 54.77, t3 = 0.12
Iter 4: obj = -7491.027981075002, δ = 0.0467433749392695, t1 = 39.2, t2 = 72.95, t3 = 0.16
Iter 5: obj = -7489.0243504490645, δ = 0.007200479138871564, t1 = 49.08, t2 = 90.98, t3 = 0.2
Iter 6: obj = -7487.899557778442, δ = 0.004873425110742543, t1 = 58.98, t2 = 109.34, t3 = 0.24
Iter 7: obj = -7487.198762711129, δ = 0.0031886466238885947, t1 = 68.75, t2 = 127.98, t3 = 0.28
Iter 8: obj = -7486.7318922165505, δ = 0.002106998635746351, t1 = 78.17, t2 = 146.58, t3 = 0.32
Iter 9: obj = -7486.405698134362, δ = 0.0011212737859179697, t1 = 87.39, t2 = 165.21, t3 = 0.37
Iter 10: obj = -7486.1699950030525, δ = 0.0006708441935208511, t1 = 96.52, t2 = 183.44, t3 = 0.41
283.395020 seconds (3.23 M allocations: 329.596 MiB, 0.05% gc time, 0.61% compilation time)

Initial objective value being different is due to different initialization of S. However the timing seems much slower, even if ran on different computers.

HMM normalizing constant overflows

The following creates MWE data (test.bed/bim/fam)

using SnpArrays
using MendelIHT
using Random
Random.seed!(2022) # set seed for reproducibility
n = 1000 # sample size
p = 10000 # number of covariates
x = simulate_correlated_snparray("test.bed", n, p, block_length=20, prob=0.75)
make_bim_fam_files(x, rand(1000), "test")

and a call to fastPHASE HMM knockoffs will throw NaN error that is caused by normalizing constants overflowing

@time= hmm_knockoff(data_path, plink_outfile="test.fastphase.knockoffs")

Possible numerical error when initializing with equi correlation solution

For group knockoffs, when we use SDP/ME/MVR solvers, we initiailzing the optimzation with equicorrelated S matrix. However, that S matrix is the largest S such that (m+1)/m*Sigma-S > 0. This means that when we compute cholesky((m+1)/m*Sigma-S), we could potentially introduce numerical stabilities.

For this reason, the next minor patch release might initialize S with, for example, equi.S ./ 2 to naively circumvent this issue.

Normalizing to unit vector introduces numerical instabilities

MWE

Only centering and scaling each column to mean 0 variance 1 (but not to unit norm)

using Knockoffs, Random, StatsBase, Statistics

Random.seed!(2021)
n = 3000
p = 1000
X = randn(n, p)
zscore!(X, mean(X, dims=1), std(X, dims=1))
knockoff = knockoff_equi(X);

X̃ = knockoff.X̃
Σ = knockoff.Σ

julia> [vec(X̃' * X̃) vec(Σ)]
1000000×2 Matrix{Float64}:
 2999.0      2999.0
   68.3776     68.3776
  -22.3342    -22.3342
  -43.2302    -43.2302
  -93.228     -93.2281
   32.9963     32.9963
  -43.5092    -43.5092
   71.2002     71.2002
  -91.1404    -91.1404
  -28.554     -28.554
  -31.7598    -31.7598
  -13.1698    -13.1698
  -34.6885    -34.6885
   22.8903     22.8903
   15.2742     15.2742
   36.5119     36.5119
  -23.4127    -23.4127
    
  -95.6452    -95.6452
  -79.5508    -79.5508
 -150.037    -150.037
  -24.5637    -24.5637
  -86.4215    -86.4215
  -26.914     -26.914
    2.37256     2.37256
  -68.1161    -68.1161
  -38.4121    -38.4121
  -17.9981    -17.9981
   16.1139     16.1139
   31.8385     31.8385
  -19.9923    -19.9923
   31.6304     31.6305
  151.434     151.434
  -32.3494    -32.3494
 2999.0      2999.0

Normaling each column so that ||x_j||_2^2 = 1:

using Knockoffs, Random

Random.seed!(2021)
n = 3000
p = 1000
X = randn(n, p)
normalize_col!(X)
knockoff = knockoff_equi(X);

X̃ = knockoff.X̃
Σ = knockoff.Σ

julia> [vec(X̃' * X̃) vec(Σ)]
1000000×2 Matrix{Float64}:
  1.0006        0.999942
  0.0113254     0.0227959
 -0.0117438    -0.007447
 -0.00865152   -0.0144145
 -0.0298051    -0.0310659
  0.00509079    0.0110014
 -0.0118063    -0.0145065
  0.0198903     0.0237353
 -0.0204552    -0.0303879
 -0.00816176   -0.00951761
 -0.00766452   -0.0105895
 -0.00383353   -0.00439077
 -0.00735197   -0.0115651
  0.00505043    0.00762776
 -0.000776242   0.00509295
  0.0112497     0.0121713
 -0.00566405   -0.00780654
  
 -0.0245307    -0.0318917
 -0.0281486    -0.0265195
 -0.0363757    -0.0500211
 -0.00516446   -0.00819044
 -0.0156618    -0.02881
 -0.0116239    -0.00897332
 -0.00495919    0.00079108
 -0.0227494    -0.0227123
 -0.00883139   -0.0128069
 -0.0114954    -0.00600044
  0.00750958    0.00537207
  0.0062047     0.010616
 -0.00481099   -0.00666484
  0.00555084    0.010545
  0.0406162     0.050493
 -0.0076684    -0.0107781
  0.994321      0.999956

SDP solver a lot slower than Matlab/R equivalent

Currently the SDP solver in Knockoffs.jl is ~2 times slower than knockoff package in R and ~4 times slower than the knockoff package in Matlab. Speed difference is entirely due to underlying solver (Hypatia SDP solver+ Convex.jl in Julia vs R's Rdsdp vs Matlab's SDPT3 solver)

How do we could get more performance?

MWE

using Revise
using Knockoffs
using Test
using LinearAlgebra
using Random
using StatsBase
using Statistics
using Distributions
using ToeplitzMatrices
using RCall
using PositiveFactorizations
using UnicodePlots
using MATLAB
using SCS
using JuMP
using Convex

# simulate data
Random.seed!(2022)
n = 100
p = 400
ρ = 0.4
Sigma = Matrix(SymmetricToeplitz.^(0:(p-1))))
L = cholesky(Sigma).L
X = randn(n, p) * L # var(X) = L var(N(0, 1)) L' = var(Σ)
true_mu = zeros(p)

Knockoffs.jl

@time knockoff = modelX_gaussian_knockoffs(X, :sdp, true_mu, Sigma) # precompile
s = knockoff.s
  2.686531 seconds (507.05 k allocations: 399.686 MiB, 2.05% gc time)

Compare with Matteo's knockoffs in R

@rput Sigma X true_mu
@time begin
    R"""
    library(knockoff)
    r_s = create.solve_sdp(Sigma)
    X_ko = create.gaussian(X, true_mu, Sigma, method = "sdp", diag_s=r_s)
    """
end
@rget r_s X_ko
  1.208416 seconds (54 allocations: 1.328 KiB)

Compare with Matteo's knockoffs in Matlab

# compare with Matteo's knockoffs in MATLAB
@mput Sigma X true_mu
@time begin
    mat"""
    addpath("/Users/biona001/Documents/MATLAB/knockoffs_matlab")
    matlab_s = knockoffs.create.solveSDP(Sigma)
    """
end
@mget matlab_s
  0.715965 seconds (7 allocations: 192 bytes)

At least the answer agrees:

[s r_s matlab_s]
200×3 Matrix{Float64}:
 1.0       1.0       1.0
 0.971429  0.971427  0.971428
 0.811429  0.811431  0.811429
 0.875429  0.875427  0.875428
 0.849829  0.849829  0.849829
 0.860069  0.860068  0.860068
 0.855973  0.855973  0.855973
 0.857611  0.857611  0.857611
 0.856956  0.856956  0.856956
 0.857218  0.857218  0.857218
 0.857113  0.857113  0.857113
 0.857155  0.857155  0.857155
 0.857138  0.857138  0.857138
                    
 0.857155  0.857155  0.857155
 0.857113  0.857113  0.857113
 0.857218  0.857218  0.857218
 0.856956  0.856956  0.856956
 0.857611  0.857611  0.857611
 0.855973  0.855973  0.855973
 0.860069  0.860068  0.860068
 0.849829  0.849829  0.849829
 0.875429  0.875427  0.875428
 0.811429  0.811431  0.811429
 0.971429  0.971427  0.971428
 1.0       1.0       1.0

Efficient sampling of multiple knockoffs

There is a trick to sample m-knockoffs without forming a p(m+1) * p(m+1) covariance in memory. The trick was used in sample_mvn_efficient but not yet in condition. Need to do so asap.

Improve documentation

  • Group knockoffs: Missing documentation for representative group knockoffs
  • Calling from R: objective function wrong
  • Need docstrings for solve_s_graphical_group and related methods
  • GhostKnockoffs: update reference
  • Example in model-X knockoffs: increase sample size or effect size so power is better
  • The :SDP option in group knockoffs should be :sdp

Also update readme:

  • Maybe add message for pr/bugs
  • update installation instructions once package is registered

`choose_group_reps` returns stochastic/incorrect results

Related to #71

MWE:

using Knockoffs, LinearAlgebra
Sigma = Symmetric([1.0 0.14138129031570185 -0.07530156753912527 0.1425092255960416 -0.03975762881225776 -0.03975762881225776; 0.14138129031570185 1.0 -0.06269073019504869 0.18796864772097638 -0.042632500750932216 -0.042632500750932216; -0.07530156753912527 -0.06269073019504869 1.0 0.10393405423314547 0.8269051049537056 0.8269051049537056; 0.1425092255960416 0.18796864772097638 0.10393405423314547 1.0 0.1598138853324871 0.1598138853324871; -0.03975762881225776 -0.042632500750932216 0.8269051049537056 0.1598138853324871 1.0 0.9999999999999998; -0.03975762881225776 -0.042632500750932216 0.8269051049537056 0.1598138853324871 0.9999999999999998 1.0])
groups = [1, 2, 3, 4, 3, 3]

julia> group_reps = choose_group_reps(Sigma, groups, threshold=0.9)
6-element Vector{Int64}:
     1
     2
     3
     4
     5
 72037

julia> group_reps = choose_group_reps(Sigma, groups, threshold=0.9)
6-element Vector{Int64}:
               1
               2
               3
               4
               5
 139762467531552

julia> group_reps = choose_group_reps(Sigma, groups, threshold=0.9)
6-element Vector{Int64}:
     1
     2
     3
     4
     5
 66968

`choose_group_reps`: reducing over an empty collection is not allowed

MWE

using Knockoffs, LinearAlgebra
Sigma = Symmetric([1.0 0.14138129031570185 -0.07530156753912527 0.1425092255960416 -0.03975762881225776 -0.03975762881225776; 0.14138129031570185 1.0 -0.06269073019504869 0.18796864772097638 -0.042632500750932216 -0.042632500750932216; -0.07530156753912527 -0.06269073019504869 1.0 0.10393405423314547 0.8269051049537056 0.8269051049537056; 0.1425092255960416 0.18796864772097638 0.10393405423314547 1.0 0.1598138853324871 0.1598138853324871; -0.03975762881225776 -0.042632500750932216 0.8269051049537056 0.1598138853324871 1.0 0.9999999999999998; -0.03975762881225776 -0.042632500750932216 0.8269051049537056 0.1598138853324871 0.9999999999999998 1.0])
groups = [1, 2, 3, 4, 3, 3]
choose_group_reps(Sigma, groups, threshold=0.5)

ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
Stacktrace:
  [1] reduce_empty(op::Base.MappingRF{Base.var"#316#317"{…}, Base.BottomRF{…}}, ::Type{Pair{…}})
    @ Base ./reduce.jl:361
  [2] reduce_empty_iter
    @ ./reduce.jl:384 [inlined]
  [3] reduce_empty_iter
    @ ./reduce.jl:383 [inlined]
  [4] foldl_impl
    @ ./reduce.jl:49 [inlined]
  [5] mapfoldl_impl
    @ ./reduce.jl:44 [inlined]
  [6] mapfoldl
    @ ./reduce.jl:175 [inlined]
  [7] _findmax
    @ ./reduce.jl:908 [inlined]
  [8] findmax
    @ ./reducedim.jl:1226 [inlined]
  [9] _findmax
    @ ./reduce.jl:934 [inlined]
 [10] findmax
    @ ./reducedim.jl:1203 [inlined]
 [11] select_one(C::Matrix{Float64}, vlist::Vector{Int64}, RSS0::Int64, tol::Float64)
    @ Knockoffs /home/groups/sabatti/.julia/packages/Knockoffs/fRlxF/src/group.jl:2083
 [12] select_one(C::AbstractMatrix, vlist::Any, RSS0::Any, tol::Any)
    @ Knockoffs /home/groups/sabatti/.julia/packages/Knockoffs/fRlxF/src/group.jl:2081 [inlined]
 [13] select_best_rss_subset(C::SubArray{Float64, 2, Symmetric{…}, Tuple{…}, false}, k::Int64)
    @ Knockoffs /home/groups/sabatti/.julia/packages/Knockoffs/fRlxF/src/group.jl:2099
 [14] choose_group_reps::Symmetric{…}, groups::Vector{…}; threshold::Float64, prioritize_idx::Nothing, Σinv::Symmetric{…})
    @ Knockoffs /home/groups/sabatti/.julia/packages/Knockoffs/fRlxF/src/group.jl:1998
 [15] top-level scope
    @ REPL[4]:1
Some type information was truncated. Use `show(err)` to see complete types

This is on Julia v1.10 and Knockoffs.jl version v1.1.9

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.