Comments (5)
I'm not sure I follow exactly what you want to do.
Could you maybe provide some (inefficient) R code that works on a very small example?
from bigstatsr.
I have a dataset with repeated outcome measurements (7 time points). I've transformed the data to long format, and I want to run a GW-LMM with IID as the random effect (currently neither plink nor regenie can deal with repeated IIDs). I'm trying to figure out the fastest way to do this, and was hoping to use bigstatsr/bigsnpr to load the genotype data and perform the regression using big_parallelize (or something similar). Below is a quickly simulated dataset to show the structure of the data I'm using and a summary of the model I'm trying to fit. I'm wondering if I would gain speed by splitting it into two steps: in step 1, residualising all SNPs for the PCs and saving the residuals as an FBM (using snp_save?); then in step 2, running the mixed model on the residuals with big_apply. Anyway, here's an example input and the output I'm looking for, which would then be combined across blocks of SNPs. Many thanks for your help with this!
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
library(GGally)
# simulating PCs from genotype data
pc <-
  rnorm_multi(
    n = 10000,
    mu = 0,
    sd = 1,
    r = 0,
    varnames = paste0("PC", 1:10),
    empirical = FALSE
  ) %>%
  mutate(IID = row_number())
# simulating longitudinal outcome data with age covariate
df <-
  rnorm_multi(
    n = 10000,
    mu = 50,
    sd = 10,
    r = 0.4,
    varnames = paste0("Y_Q0", 1:7),
    empirical = FALSE
  ) %>%
  mutate(
    IID = row_number(),
    Age_Q01 = rnorm(10000, mean = 35, sd = 5)
  ) %>%
  mutate(
    Age_Q02 = Age_Q01 + 0.5,
    Age_Q03 = Age_Q01 + 1,
    Age_Q04 = Age_Q01 + 3,
    Age_Q05 = Age_Q01 + 5,
    Age_Q06 = Age_Q01 + 7,
    Age_Q07 = Age_Q01 + 8
  )
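# adjusting phenotype for genotype PCs (this 2 step process is done in regenie)
yRes <-
  df %>%
  full_join(pc, by = "IID") %>%
  mutate(
    # no data argument: the outcome and the PCs are both resolved from the
    # joined data inside mutate() (the original `df` has no PC columns)
    across(starts_with("Y_"),
           ~ rstandard(lm(.x ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10))
    )) %>%
  select(IID, matches("Y_|Age")) %>%
  # Y_Q01..Y_Q07 and Age_Q01..Age_Q07 become columns Y and Age, one row per wave
  pivot_longer(
    cols = !IID,
    names_to = c(".value", "Wave"),
    names_sep = "_"
  )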
# adjusting genotypes for PCs (again... this 2 step process is done in regenie)
gRes <-
  rnorm_multi(
    n = 10000,
    mu = 0,
    sd = 1,
    r = 0,
    varnames = paste0("PC", 1:10),
    empirical = FALSE
  ) %>%
  mutate(
    IID = row_number(),
    SNP = rbinom(10000, 2, 0.5)
  ) %>%
  mutate(
    SNPRes = rstandard(lm(as.formula(paste0("SNP ~ ", paste0("PC", 1:10, collapse = "+"))), data = .))
  ) %>%
  select(IID, SNPRes)
df <- yRes %>% full_join(gRes, by = "IID")
# run mixed model... for each SNP I want the main effect of SNP and its interaction with age
# (SNPRes * Age already expands to SNPRes + Age + SNPRes:Age)
output <-
  summary(
    lme4::lmer(Y ~ SNPRes * Age + (1 | IID), data = df)
  )$coefficients
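For illustration, the "combine across SNPs" step mentioned above could look roughly like the sketch below. It assumes a plain in-memory matrix of already residualized SNPs with one row per individual; the names `snp_mat`, `long` and `fit_one` and the toy sizes are hypothetical, and fitting one lmer() model per SNP like this is the slow, brute-force version rather than an optimized one.

```r
library(lme4)

set.seed(42)
n_id <- 100; n_wave <- 7; n_snp <- 5   # toy sizes (hypothetical)

# Long-format phenotype: one row per individual per wave
long <- data.frame(
  IID = rep(seq_len(n_id), each = n_wave),
  Age = rep(rnorm(n_id, 35, 5), each = n_wave) +
    rep(c(0, 0.5, 1, 3, 5, 7, 8), times = n_id)
)
# Outcome = individual random intercept + noise (no true SNP effect)
long$Y <- rep(rnorm(n_id), each = n_wave) + rnorm(n_id * n_wave)

# Stand-in for residualized SNPs: one row per individual, one column per SNP
snp_mat <- matrix(rnorm(n_id * n_snp), n_id, n_snp,
                  dimnames = list(NULL, paste0("SNP", seq_len(n_snp))))

# Fit the mixed model for SNP j and keep only the SNP and SNP-by-age rows
fit_one <- function(j) {
  long$snp <- snp_mat[long$IID, j]     # expand per-individual values to long format
  co <- summary(lmer(Y ~ snp * Age + (1 | IID), data = long))$coefficients
  keep <- c("snp", "snp:Age")
  data.frame(
    SNP       = colnames(snp_mat)[j],
    term      = keep,
    estimate  = unname(co[keep, "Estimate"]),
    std_error = unname(co[keep, "Std. Error"]),
    t_value   = unname(co[keep, "t value"]),
    row.names = NULL
  )
}

# One data frame with two rows (main effect, interaction) per SNP
results <- do.call(rbind, lapply(seq_len(n_snp), fit_one))
```

The same `fit_one()` could in principle be applied block by block to columns read from an FBM, with the per-block data frames bound together at the end.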
from bigstatsr.
If you want to get a residualized version of the genotype matrix, that shouldn't be too hard. You can implement this using Rcpp or big_apply(), with the linear algebra trick I'm using in big_univLinReg().
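As a rough sketch of the big_apply() route (not the actual big_univLinReg() internals), one way to residualize every column on an intercept plus the PCs is to compute a QR decomposition of the covariates once and then project each block of columns with qr.resid(). The toy sizes and the names `G`, `PCs`, `G_res` and `qr_cov` are assumptions for illustration, and this gives raw residuals rather than the standardized ones from rstandard().

```r
library(bigstatsr)

set.seed(1)
n <- 200; m <- 50                      # toy sizes (hypothetical)

# Stand-ins for the genotype FBM and the principal components
G   <- FBM(n, m, init = rbinom(n * m, 2, 0.3))
PCs <- matrix(rnorm(n * 10), n, 10)

# QR decomposition of the covariates (intercept + PCs), computed only once
qr_cov <- qr(cbind(1, PCs))

# File-backed matrix that will receive the residualized genotypes
G_res <- FBM(n, m, type = "double")

# Process columns block by block: each block is replaced by its residuals
# after regressing on the covariates; results are written into `res`
big_apply(G, a.FUN = function(X, ind, qr_cov, res) {
  res[, ind] <- qr.resid(qr_cov, X[, ind])
  NULL
}, a.combine = "c", block.size = 10, qr_cov = qr_cov, res = G_res)
```

Each column of `G_res` should then match `resid(lm(G[, j] ~ PCs))` up to numerical error, while only one block of columns is held in memory at a time.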
If you want to implement your own mixed model, I have no experience with this, so I won't be of much help.
from bigstatsr.
Ok thanks!
from bigstatsr.
If you need help with this, feel free to comment and reopen.
from bigstatsr.