aj-grant / navmix

R 81.73% HTML 18.27%

navmix's Issues

Replicating Navmix

Good afternoon,

I hope this email finds you well.

My name is Ange, and I am currently trying to reproduce the BMI results from the Navmix paper. I have some questions about it.

I have performed all the steps, but I keep getting four clusters plus one noise cluster. I have the same number of variants, so I suspect the difference comes from different beta or SE values. Here are my questions:
Did you use the raw or the RINT GWAS from the Neale lab? This is not clear in the paper. I used raw, but I recognize that you may have used a different dataset.
Did you harmonize the files? I noticed that the Neale lab effect alleles are not the same as the Pulit BMI effect alleles.
As PhenoScanner is not working, I am planning to use the CARDIOGRAM 2022 summary statistics instead; I believe the results should be similar.
Do you have any other ideas or suggestions to help me troubleshoot?

Best

Ange
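
On the harmonization question above, a minimal sketch of aligning effect alleles between two summary-statistics tables could look like the following. The column names (`snp`, `ea`, `oa`, `beta`), the toy values, and the helper `harmonise_to_reference` are hypothetical and are not taken from the Navmix code or from either dataset. Strand flips and palindromic SNPs are deliberately not handled here.

```r
# Hypothetical helper: align the effect alleles of `target` to those of `ref`.
# Assumes both data frames have columns snp, ea (effect allele),
# oa (other allele) and beta. Strand flips and palindromic SNPs are ignored.
harmonise_to_reference <- function(ref, target) {
  m <- merge(ref, target, by = "snp", suffixes = c("_ref", "_tgt"))
  same    <- m$ea_tgt == m$ea_ref & m$oa_tgt == m$oa_ref
  swapped <- m$ea_tgt == m$oa_ref & m$oa_tgt == m$ea_ref
  # Where effect and other allele are swapped, flip the sign of beta
  m$beta_tgt[swapped] <- -m$beta_tgt[swapped]
  m$ea_tgt[swapped]   <- m$ea_ref[swapped]
  m$oa_tgt[swapped]   <- m$oa_ref[swapped]
  # Drop variants whose alleles cannot be matched either way
  m[same | swapped, ]
}

# Toy data (made up): rs2 has swapped alleles, rs3 cannot be matched
ref <- data.frame(snp = c("rs1", "rs2", "rs3"), ea = c("A", "C", "G"),
                  oa = c("G", "T", "A"), beta = c(0.10, 0.20, 0.30),
                  stringsAsFactors = FALSE)
tgt <- data.frame(snp = c("rs1", "rs2", "rs3"), ea = c("A", "T", "C"),
                  oa = c("G", "C", "T"), beta = c(0.05, -0.04, 0.02),
                  stringsAsFactors = FALSE)
harmonised <- harmonise_to_reference(ref, tgt)
```

After this step, `beta_ref` and `beta_tgt` in `harmonised` refer to the same effect allele for every retained variant, which is what a comparison of beta values across the two sources would need.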

utilizing the correlation matrix

Hi Andrew,

In the code provided for the simulation data, I can see that the correlation-adjusted matrix (b_cor) is used directly as navmix() input. Given that the input matrix is suggested to undergo both standardization and normalization (as was done for b_prop), I am wondering whether the b_cor matrix should undergo something similar.

For reference, I did the following:

# R.version                      
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          1.0                         
year           2021                        
month          05                          
day            18                          
svn rev        80317                       
language       R                           
version.string R version 4.1.0 (2021-05-18)
nickname       Camp Pontanezen   

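### Since my traits are highly correlated and measured in the same sample, I need to use the correlation matrix
library(expm)  # assumed source of sqrtm(), the matrix square root
corX <- cor(traits, method = "spearman")  # traits have undergone rank-based inverse normal transformation
b_cor <- matrix(nrow = n_all, ncol = m)  # n_all = number of variants, m = number of traits
for (j in seq_len(n_all)) {
  S1 <- diag(se[j, ])       # diagonal matrix of standard errors for variant j
  S <- S1 %*% corX %*% S1   # covariance of the estimates for variant j
  b_cor[j, ] <- solve(sqrtm(S), bhat[j, ])  # multiply bhat by the inverse square root of S
}
nav_out_cor <- navmix(b_cor, ...)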

### I also created a b_prop matrix since I interpreted the paper as to say that the input should be standardized and normalized
# For each SNP x trait combination, I divided the beta by its SE. The results were stored in a matrix called b_std.
# To normalize, I used the code provided in the simulation example:
b_prop = navmix::row_norm(data.matrix(b_std))
# I noticed that b_prop is never used as input for navmix()... is that indeed correct? Otherwise, I would do:
nav_out_prop <- navmix(b_prop, ...)

# Only b_std and b_cor look to be used as navmix() input in the simulation code.

Please let me know if you require any clarification. To try and summarize my questions:

Does b_cor need to be standardized+normalized prior to using with navmix()?
Is b_prop supposed to be run with navmix()?

Thanks!
Jacqueline
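
To make the two steps discussed in this issue concrete, here is a base-R sketch of standardization (beta divided by SE) followed by row normalization. It assumes that row normalization means scaling each row to unit Euclidean length. `row_unit_norm` is a hypothetical stand-in written for illustration and is not claimed to match `navmix::row_norm` exactly. The toy values are made up.

```r
# Toy inputs: 3 variants x 2 traits (values are made up for illustration)
bhat <- matrix(c(0.3, -0.4, 0.1, 0.4, 0.3, 0.2), nrow = 3)
se   <- matrix(0.1, nrow = 3, ncol = 2)

# Standardise: divide each estimate by its standard error
b_std <- bhat / se

# Hypothetical stand-in for a row-normalisation step (an assumption about
# what is intended): scale each row to unit Euclidean length
row_unit_norm <- function(x) x / sqrt(rowSums(x^2))
b_prop <- row_unit_norm(b_std)

# Each row of b_prop now has squared length 1
rowSums(b_prop^2)
```

Under this assumption, normalization keeps only the direction of each variant's association profile across traits and discards its overall magnitude.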

[Question] Effect of sample size

The authors of a recent preprint using ClustImpute to cluster diabetes-associated variants based on associations with cardiometabolic traits recommend using a sample-size-adjusted Z-score for clustering ($Z = \beta / (\sqrt{N} \cdot \mathrm{SE})$) to "provide more uniform weighting": https://www.medrxiv.org/content/10.1101/2023.03.31.23287839v1. While the NAvMix paper suggests the impact of sample size should be relatively modest, empirically this additional transformation does seem to influence the number and composition of the clusters identified when effective sample sizes vary substantially across traits. I'm curious whether this additional transformation would be reasonable as an alternative primary analysis or as a sensitivity analysis.
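
For concreteness, the transformation described above can be sketched in base R as follows. The matrices and the per-trait effective sample sizes `n_eff` are made-up illustrative values, and the layout (variants in rows, traits in columns, one sample size per trait) is an assumption.

```r
# Toy inputs: 2 variants x 2 traits (values are made up for illustration)
beta  <- matrix(c(0.02, 0.05, -0.01, 0.03), nrow = 2)
se    <- matrix(c(0.005, 0.01, 0.004, 0.01), nrow = 2)
n_eff <- c(400000, 50000)  # assumed effective sample size for each trait (column)

# Ordinary Z-scores
z <- beta / se

# Sample-size-adjusted Z-scores: divide each trait's column by sqrt(N)
z_adj <- sweep(z, 2, sqrt(n_eff), "/")
```

Running the clustering on both `z` and `z_adj` and comparing cluster membership is one way to set up the sensitivity analysis the question asks about.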

correlation matrix for traits

Hi there,

This algorithm fits very neatly into a few of our ongoing projects. For some of them, however, we wanted to make use of a correlation matrix, as the traits are correlated and measured in overlapping samples. While using the row_standardise function, we encountered complex numbers as a result, which is not what the downstream algorithm expects. Squaring the matrix made all entries positive and avoided the problem, but it loses a lot of relevant information. Any advice would be much appreciated.

Best,
Maik
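
One possible cause of complex output (an assumption, not something confirmed in this thread) is a trait correlation matrix that is not positive definite, since a matrix square root then involves the square root of a negative eigenvalue. The base-R sketch below checks the eigenvalues of a made-up matrix and computes a symmetric square root only when they are all positive. `sym_sqrt` is a hypothetical helper and is not part of navmix.

```r
# Made-up "correlation" matrix: symmetric with unit diagonal, but not
# positive definite (its eigenvalues are 1.9, 1.9 and -0.8)
corX <- matrix(c( 1.0, 0.9, -0.9,
                  0.9, 1.0,  0.9,
                 -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)
ev <- eigen(corX, symmetric = TRUE)$values
is_pd <- all(ev > 0)  # FALSE here, so a real square root does not exist

# Hypothetical helper: symmetric square root via the eigendecomposition,
# refusing to proceed if the matrix is not positive definite
sym_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  stopifnot(all(e$values > 0))
  e$vectors %*% diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# A positive definite example for which the square root is real
S_ok <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
root <- sym_sqrt(S_ok)
```

Inspecting `ev` for the actual trait correlation matrix would show whether this explanation applies before any workaround is chosen.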
