genetics-statistics / gemma Goto Github PK
View Code? Open in Web Editor NEWGenome-wide Efficient Mixed Model Association
Home Page: https://github.com/genetics-statistics/GEMMA
License: GNU General Public License v3.0
Genome-wide Efficient Mixed Model Association
Home Page: https://github.com/genetics-statistics/GEMMA
License: GNU General Public License v3.0
Hi @pjotrp, after your changes, I get the following error when running make -f Makefile.linux
with g++ 4.8.5 and GSL 2.2.1 on RedHat Linux:
In file included from src/param.h:22:0,
from src/gemma.h:22,
from src/main.cpp:19:
src/debug.h:16:55: error: expected initializer before '__THROW'
const char *__function) __THROW
^
make: *** [src/main.o] Error 1
Hi,
I'm wondering if there's a way to obtain the proportion of variance explained after adjustment on each snp in the GEMMA output after univariate LMM for marker association tests.
Indeed, I can find the PVE of the NULL model in the log.txt (I suppose that the fixed covariates independant of the betaX vector are included in the null model, but I'm not sure ),
and I can find the Lamba after adjustment for each snp tested, but this Lambda gives me the proportion of variance explained by the genetic relatedness matrix compared to the residual one, if I understood correctly. I haven't access to the "new" total variance with the marker as covariate, neither to the variance explained by the tested marker. Is there a way I haven't seen to compute it based on GEMMA outputs ?
Thank you !
./src/io.o: In function GetabIndex(unsigned long, unsigned long, unsigned long)': io.cpp:(.text+0x4ae0): multiple definition of
GetabIndex(unsigned long, unsigned long, unsigned long)'
./src/param.o:param.cpp:(.text+0x2500): first defined here
./src/lmm.o: In function GetabIndex(unsigned long, unsigned long, unsigned long)': lmm.cpp:(.text+0x1760): multiple definition of
GetabIndex(unsigned long, unsigned long, unsigned long)'
./src/param.o:param.cpp:(.text+0x2500): first defined here
./src/vc.o: In function GetabIndex(unsigned long, unsigned long, unsigned long)': vc.cpp:(.text+0x1600): multiple definition of
GetabIndex(unsigned long, unsigned long, unsigned long)'
./src/param.o:param.cpp:(.text+0x2500): first defined here
./src/vc.o: In function ReadHeader(std::string const&, HEADER&)': vc.cpp:(.text+0x81d0): multiple definition of
ReadHeader(std::string const&, HEADER&)'
./src/io.o:io.cpp:(.text+0x6cd0): first defined here
collect2: error: ld returned 1 exit status
make: *** [bin/gemma] Error 1
Needed for packaging in Brew and Conda.
This is a minor issue: it's useful to ask git to ignore some files. For instance, when doing "git status -s", one usually doesn't want to see all object files generated by gcc/g++. This can easily be done by adding a file named ".gitignore" at the root of the repo (note the dot ".", it's a hidden file). Inside this file, one can add lines such as:
*.o
*~
More explanations here.
Hi
I have been trying to run a univariate linear mixed model. However I was hoping to use a grm I have made else where- it is the second in grm format. The first few lines are as as follows
1 1 8.923316e-01
1 2 -5.544370e-02
1 3 1.903249e-02
1 4 1.880089e-02
1 5 9.801378e-02
1 6 -3.117359e-02
1 7 2.540684e-01
1 8 -1.174230e-01
1 9 1.168132e-01
1 10 1.833550e-01
Then I am trying to run a lmm using the follow command
./gemma.linux -bfile mouse_hs1940 -km 2 -k mouse_plinkgctasorted.txt -lmm 2 -o mouse_plinkgctasorted
However I keep getting the following error. Any advice on how to fix this?
Reading Files ...
## number of total individuals = 1940
## number of analyzed individuals = 1410
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 12226
## number of analyzed SNPs = 10768
Start Eigen-Decomposition...
gsl: newton.c:74: ERROR: derivative is zero
Default GSL error handler invoked.
Aborted
The file will run the analysis when I supply a gemma grm, however the file I intend on supplying will be much larger and would prefer to make the grm elsewhere.
Can the manual (latex + pdf) as well as the demo files be added to the repo?
It isn't usual to include a built binary in the source repository. Could bin/gemma be removed from git?
Hi, I'm trying to install gemma onto a linux cluster. However my version of gsl is not in the standard '/usr/bin' location and cannot be installed there because of admin privileges.
Regardless, I cannot get the GEMMA Makefile to find where gsl is installed on the cluster. If anyone could give advice on how to edit the Makefile to correctly find gsl, that would be greatly appreciated.
Thanks!
Hi Xiang,
I am running v0.94. When I load in R the file kinship.cXX.txt
or kinship.sXX.txt
, I can see that there is an extra column.
In the code section starting at line 627 of param.h, I think, you shouldn't write the tab \t
after each number. Instead, for each line, you should write the first number, and then enter into a loop which write the tab followed by the number corresponding to this iteration.
Tim
ps: sorry, I'm waiting for my sysadmin to install LAPACK and other libraries to be able to compile GEMMA by myself. Otherwise, I'd have done a pull request...
Hi Xiang,
is this "not" a typo? (page 14 in the manual)
By default, SNPs with minor allele frequency above 1% will not be included in the analysis.
if I plan to include all SNPs in the analysis, I just need to use "-maf 0"?
Another two related question:
-notsnp
the same as -miss 0 -maf 0 -r2 1 -hwe 0
?-snps
, will some of the SNPs still be filtered out due to the missingness and/or minor allele frequencies?Thanks!
Hi Xiang,
Any time I try to run GEMMA with the "-gxe" option it gives me the following error:
Start Eigen-Decomposition...
pve estimate =0.130578
se(pve) =0.195554
gsl: brent.c:202: ERROR: function value is not finite 0.00%
Default GSL error handler invoked.
Segmentation fault (core dumped)
The only way it seems to run is if all the values in the "-gxe" file are 0.
Is this issue specific to me?
Thanks
Hello there,
I'm trying to compile GEMMA on my system. I changed my lapack libraries directory. So it should not be a problem with this. Here's what I get :
$ make
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/main.cpp -o src/main.o
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/param.cpp -o src/param.o
In file included from src/Eigen/Cholesky:24:0,
from src/Eigen/Dense:3,
from src/mathfunc.h:24,
from src/param.cpp:35:
src/Eigen/src/Cholesky/LDLT.h: In member function ‘void Eigen::internal::solve_retval<Eigen::LDLT<MatrixType, _UpLo>, Rhs>::evalTo(Dest&) const’:
src/Eigen/src/Cholesky/LDLT.h:505:39: attention : typedef ‘Scalar’ locally defined but not used [-Wunused-local-typedefs]
typedef typename LDLTType::Scalar Scalar;
^
src/param.cpp: In member function ‘void PARAM::ReadFiles()’:
src/param.cpp:281:137: attention : ‘W’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ReadFile_bed (file_str, setSnps, W, indicator_idv, indicator_snp, snpInfo, maf_level, miss_level, hwe_level, r2_level, ns_test_tmp)==false) {error=true;}
^
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/gemma.cpp -o src/gemma.o
src/gemma.cpp:54:17: erreur fatale: ldr.h : Aucun fichier ou dossier de ce type
#include "ldr.h"
^
compilation terminée.
make: *** [src/gemma.o] Erreur 1
Sorry, the system is in french, but you see there that the error is coming from the fact that it's not finding the class file ldr.h.
Thanks for your help.
JC
I've been going through the demo and have experienced segmentation faults for one of the example processes. The rest worked fine.
On command line:
> ../bin/gemma -g mouse_hs1940.geno.txt.gz -p mouse_hs1940.pheno.txt -n 1 6 -a mouse_hs1940.anno.txt -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8MCH_lmm
Reading Files ...
## number of total individuals = 1940
## number of analyzed individuals = 1197
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 12226
## number of analyzed SNPs = 10758
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.3940
-0.2267 2.0817
se(Vg):
0.1567
0.1363 0.2359
REMLE estimate for Ve in the null model:
0.3489
0.0491 0.4144
se(Ve):
0.0206
0.0166 0.0267
REMLE likelihood = -2855.1664
MLE estimate for Vg in the null model:
1.3959
-0.2267 2.0854
se(Vg):
0.1568
0.1365 0.2361
MLE estimate for Ve in the null model:
0.3483
0.0490 0.4136
se(Ve):
0.0206
0.0166 0.0266
MLE likelihood = -2856.0280
Reading SNPs ==================================================100.00%
Segmentation fault (core dumped)
Running on gdb:
(gdb) file ../bin/gemma
Reading symbols from ../bin/gemma...done.
(gdb) run -g mouse_hs1940.geno.txt.gz -p mouse_hs1940.pheno.txt -n 1 6 -a mouse_hs1940.anno.txt -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8MCH_lmm
Starting program: /home/ubuntu/Nic/Programs/bugwas/GEMMA/bin/gemma -g mouse_hs1940.geno.txt.gz -p mouse_hs1940.pheno.txt -n 1 6 -a mouse_hs1940.anno.txt -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8MCH_lmm
Reading Files ...
## number of total individuals = 1940
## number of analyzed individuals = 1197
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 12226
## number of analyzed SNPs = 10758
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.3940
-0.2267 2.0817
se(Vg):
0.1567
0.1363 0.2359
REMLE estimate for Ve in the null model:
0.3489
0.0491 0.4144
se(Ve):
0.0206
0.0166 0.0267
REMLE likelihood = -2855.1664
MLE estimate for Vg in the null model:
1.3959
-0.2267 2.0854
se(Vg):
0.1568
0.1365 0.2361
MLE estimate for Ve in the null model:
0.3483
0.0490 0.4136
se(Ve):
0.0206
0.0166 0.0266
MLE likelihood = -2856.0280
Reading SNPs ==================================================100.00%
Program received signal SIGSEGV, Segmentation fault.
0x000000000046d369 in MVLMM::WriteFiles() ()
I got a similar error after using the down-loadable executable and compiling it myself.
If anyone can help me understand what this means and what I can do that would be great.
Thanks!
Normally, a make clean
invocation would remove the binary that was previously built, so that make will be sure to rebuild it. Currently, the Makefile target for clean only removes _.o and *float. targets (as well as *~ files which are not actually built).
I've been trying to use the Mac OS X version of GEMMA, but it's not working. It won't extract the files.
gunzip gemma.macosx.gz
> gunzip: gemma.macosx.gz: not in gzip format
Also, I've tried to compile GEMMA on macOS Sierra, but it's not working.
g++ -Wall -Weffc++ -O3 -std=gnu++11 -m64 ./src/main.o ./src/param.o ./src/gemma.o ./src/io.o ./src/lm.o ./src/lmm.o ./src/vc.o ./src/mvlmm.o ./src/bslmm.o ./src/prdt.o ./src/mathfunc.o ./src/gzstream.o ./src/eigenlib.o ./src/ldr.o ./src/bslmmdap.o ./src/logistic.o ./src/varcov.o /usr/local/opt/gsl@1/lib/libgsl.a /usr/local/opt/gsl@1/lib/libgslcblas.a -lz -o ./bin/gemma
g++: error: /usr/local/opt/gsl@1/lib/libgsl.a: No such file or directory
g++: error: /usr/local/opt/gsl@1/lib/libgslcblas.a: No such file or directory
make: *** [bin/gemma] Error 1
The latest version of Eigen C++ library is 3.3.3. Either: (1) update the source code or, (2) remove the source code and have the user download the Eigen source code separately.
Note: No changes were made to the original Eigen library source code.
When building gemma on my linux machine when using dynamic linking (i.e. when running make FORCE_DYNAMIC=1
), I get the following error:
g++ -Wall -O3 -DWITH_LAPACK -m64 ./src/main.o ./src/param.o ./src/gemma.o ./src/io.o ./src/lm.o ./src/lmm.o ./src/vc.o ./src/mvlmm.o ./src/bslmm.o ./src/prdt.o ./src/mathfunc.o ./src/gzstream.o ./src/lapack.o -lgsl -lgslcblas -pthread -lz -llapack -o ./bin/gemma
/usr/bin/ld: ./src/lapack.o: undefined reference to symbol 'dgemm_'
/usr/bin/ld: note: 'dgemm_' is defined in DSO /usr/lib/libblas.so.3gf so try adding it to the linker command line
/usr/lib/libblas.so.3gf: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
make: *** [bin/gemma] Error 1
This can be fixed for me by modifying the Makefile line setting the LIBS variable when FORCE_DYNAMIC is set to include -lblas
It would be really nice to run the tests listed in the example directory. This would give some confidence that changes to the code base still deliver the same results.
I am trying to figure out how to make relatedness matrix and running into an error:
$gemma -bfile $BASE -epm ${BASE}.out.param.txt \
-emu ${BASE}.out.log.txt -ebv ${BASE}.out.bv.txt \
-k ${BASE}.cXX.txt \
-predict 1 -o ${BASE}.predicted
Reading Files ...
## number of total individuals = 94
## number of analyzed individuals = 94
## number of covariates = 1
## number of phenotypes = 1
gsl: init_source.c:29: ERROR: vector length n must be positive integer
Default GSL error handler invoked.
Segmentation fault
I am using pre-compiled 0.95alpha and 0.94.1 binaries on CentOS6.7.
Hi Xiang,
we used real genotypes and simulated phenotypes:
We fit the uLMM and get the following lines in the log file:
pve estimate in the null model = 0.129203
se(pve) in the null model = 0.102527
vg estimate in the null model = 33.8083
ve estimate in the null model = 71.0727
For the null model, I would expect pve ~= vg / (vg + ve), but it's not the case, where am I wrong?
Hi Xiang,
When I use gemma to perform gwas, I always getting 'pve estimate =0.99xxx se(pve) =-nan', and all results are nan(below). Here is my command line,
gemma -bfile mydata -k output/mydata_kinship.sXX.txt -miss 1 -maf 0.01 -r2 1 -lmm
Is there any wrong with the command line or my input datas (attachment)?
mydata_kinship.sXX.txt
1 chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
2 1 483 483 58 T C 0.121 nan -nan 1.000000e+05 nan
3 1 2740 2740 54 A T 0.386 nan -nan 1.000000e+05 nan
4 1 4213 4213 38 G C 0.140 nan -nan 1.000000e+05 nan
5 1 4222 4222 38 C A 0.402 nan -nan 1.000000e+05 nan
6 1 4227 4227 38 T A 0.189 nan -nan 1.000000e+05 nan
7 1 4235 4235 38 T G 0.378 nan -nan 1.000000e+05 nan
8 1 4291 4291 39 A C 0.185 nan -nan 1.000000e+05 nan
Best wishes!
The license accompanying the Eigen C++ library is missing, and should be added.
The source distribution (i.e. gemma-0.95.tar.gz) expands directly into the current working directory when expanded. This can sometimes be considered rude and as a result most software distributions provide tars that are set to expand into a subdirectory (usually named the same as the basename of the tar file).
Readme says:
INSTALLATION: An executable binary (x86 64bit Linux) is available in the bin directory.
However there is no bin
directory in this repository.
Using commit id bbbabdb
.
First test: calculate centered relatedness matrix.
x <- as.matrix(read.table("gemma/example/output/mouse_hs1940.cXX.txt"))
y <- as.matrix(read.table("gemma_new/example/output/mouse_hs1940.cXX.txt"))
print(max(abs(x - y)))
# R> 1e-12
Second test: association tests with a univariate LMM.
cols <- c("af","beta","se","l_remle","p_wald")
x <- read.table("~/git/gemma/example/output/mouse_hs1940_CD8_lmm.assoc.txt",
header = TRUE,stringsAsFactors = FALSE)
y <- read.table("~/git/gemma_new/example/output/mouse_hs1940_CD8_lmm.assoc.txt",
header = TRUE,stringsAsFactors = FALSE)
print(max(abs(x[cols] - y[cols])))
# R> 0
Third test: association tests with a multivariate LMM.
cols <- 7:13
x <- read.table("~/git/gemma/example/output/mouse_hs1940_CD8MCH_lmm.assoc.txt",
header = TRUE,stringsAsFactors = FALSE)
y <- read.table("~/git/gemma_new/example/output/mouse_hs1940_CD8MCH_lmm.assoc.txt",
header = TRUE,stringsAsFactors = FALSE)
print(max(abs(x[cols] - y[cols])))
# R> 0
Hi Xiang,
a student of mine is using GEMMA (only univariate LMM for the moment). In the log file, we see the following lines:
beta estimate in the null model = 57.4035
se(beta) = 0.504719
What does this beta correspond to? Is it the intercept of the model "y = W alpha + Z u + e"? Thus, is it the first element of the alpha vector? Could you specify this in the manual?
Thanks,
Tim
I am confused about the -n argument when creating the relatedness matrix. This is supplied in line 117 in example/demo.txt to generate a relatedness matrix based on the training data. As the resulting matrix has values for all individuals, not just those with missing values, I was wondering what difference the argument makes to the calculations.
@xiangzhou I'm wondering if there is much benefit to including the FORCE_FLOAT
option in GEMMA, which makes your code somewhat more complicated. You claim that this option is useful for reducing memory requirements, but have you tested this? Most computers (my guess is >99%) now have 64-bit architectures, which means that addressing is limited to 64 bits, which is the size of a double-precision floating-point number. In other words, a single-precision floating point number will also occupy 64 bits.
I can test this, but I'm quite certain that there will be no benefit to FORCE_FLOAT
on a 64-bit architecture. If this is the case, perhaps we could consider removing this option from your code?
Dear Pjotr and Xiang,
We have been running GEMMA on bacterial data, attempting to find epistatic interactions by encoding the alleles at one of the loci as a phenotype and testing its interaction with the other locus. The answers that GEMMA gives are correlated with the truth, but appear to have widely inflated significance values. Another noticeable feature is that for many loci p_score is much less inflated than p_wald or p_lrt. There is also a hint that values are most inflated for low-frequency variants. For example, one locus with 5 1s and 265 0s is associated with another locus with 3 1s and 268 0s, where the 3 1s at the second locus all match with 1s at the first. GEMMA as we run it gives a p value of 10^-31 for the Wald and LRT, with 10^-21 for the score test when a sensible value would be around 10^-6.
We are guessing there is some systematic error in how the data is being interpreted, which probably reflects a misunderstanding of input format or suchlike. The command line we are running currently looks like this:
gemma -g genfile_gemma.txt -p phenotype_general.txt -a snpfile_gemma.txt -gk 1 -o rel_matrix -maf 0
gemma -g genfile_gemma.txt -p $locus_pheno.txt -a snpfile_gemma.txt -k rel_matrix.cXX.txt -lmm 4 -o $locus -maf 0
We have marked in the snp file that all positions are at chromosome 24 (y-chrom), which we understood is the suggestion for haploid organisms.
Obviously we are happy to send the data files on request. Any idea what could be going wrong? All suggestions much appreciated. Looking forward to figuring this out!
Best,
Kaisa
You can add osx in the matrix, e.g., here. In fact, you can automatically generate a latest-release by pushing to
dropbox (or something) as Artem did.
Note: Currently the compiler options in the Makefile do not appear to be compatible with the XCode version of gcc (e.g., Apple LLVM version 8.1.0) so this will have to be addressed as well.
@xiangzhou According to the C++ standard, size_t
is an unsigned integer type, but you are assigning (constant) negative values to variables of this type, which seems to be generating warnings in some cases; e.g., in io.cpp
:
SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, (long int) -9};
So you should either replace these with unsigned values, or change the definition of SNPINFO
.
brew tap homebrew/science
brew tap tseemann/bioinformatics-linux
brew install gemma
# or brew install gemma --HEAD for current github master
Should work on MacOS too.
I will eventually get it into homebrew-science
main.
Hi, I would like to conduct a 5 phenotypes GWAS analysis using this great software.
I understand that there is a similar issue #45. However, we have done everything you suggested or discussed (quantile transform each phenotype, "second level" of standardization of relatedness matrix) there but still cannot fix the problem.
When the number of phenotypes included increased to >3, the calculation failed:
./bin/gemma -g /home/takeshi/GWAS/Okazaki/Okazaki-analysis/mvSNPTEST/GEMMAchr22txt.gz -p /home/takeshi/GWAS/Okazaki/Okazaki-analysis/GEMMApheno.txt -n 1 2 3 4 -k ./output/GEMMA_standardrelatedmat005.sXX.txt -lmm -o GEMMAtest1234
Reading Files ...
*## number of total individuals = 807
*## number of analyzed individuals = 795
*## number of covariates = 1
*## number of phenotypes = 4
*## number of total SNPs = 163763
*## number of analyzed SNPs = 114095
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
se(Vg):
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
REMLE estimate for Ve in the null model:
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
se(Ve):
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
REMLE likelihood = -nan
MLE estimate for Vg in the null model:
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
se(Vg):
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
MLE estimate for Ve in the null model:
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
se(Ve):
-nan
-nan -nan
-nan -nan -nan
-nan -nan -nan -nan
MLE likelihood = -nan
It seems to be irrelevant with which 4 phenotypes were included in the calculation.
By the way, any 3 out of interested phenotypes (n = 5) could finished calculation normally:
##
## GEMMA Version = 0.97
##
## Command Line Input = ./bin/gemma -g /home/takeshi/GWAS/Okazaki/Okazaki-analysis/mvSNPTEST/GEMMAchr22txt.gz -p /home/takeshi/GWAS/Okazaki/Okazaki-analysis/GEMMApheno.txt -n 1 2 3 -k ./output/GEMMA_standardrelatedmat005.sXX.txt -lmm -o GEMMAtest123
##
## Date = Thu Jul 20 10:38:35 2017
##
## Summary Statistics:
## number of total individuals = 807
## number of analyzed individuals = 799
## number of covariates = 1
## number of phenotypes = 3
## number of total SNPs = 163763
## number of analyzed SNPs = 114078
## REMLE log-likelihood in the null model = -3215.58
## MLE log-likelihood in the null model = -3218.11
## REMLE estimate for Vg in the null model:
9.92726e-06
5.33329e-13 9.73246e-06
-3.00269e-12 5.30004e-11 9.55377e-06
## se(Vg):
0.384034
0.268873 0.370849
0.252157 0.2977 0.329824
## REMLE estimate for Ve in the null model:
0.992726
0.00435316 0.973246
-0.00967254 0.538782 0.955377
## se(Ve):
0.378022
0.264058 0.363834
0.249701 0.29338 0.328996
## MLE estimate for Vg in the null model:
9.92726e-06 8.68966e-13 -4.42535e-12
8.68966e-13 9.73246e-06 5.26974e-11
-4.42535e-12 5.26974e-11 9.55377e-06
## se(Vg):
0.397542
0.27852 0.383646
0.260398 0.308059 0.339524
## MLE estimate for Ve in the null model:
0.991484 0.00434771 -0.00966044
0.00434771 0.972028 0.538108
-0.00966044 0.538108 0.954182
## se(Ve):
0.391563
0.273689 0.376616
0.257985 0.303754 0.338825
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file):
-0.00648804
-0.00101604
0.00465392
## se(B):
0.0352486
0.034901
0.0345791
##
## Computation Time:
## total computation time = 2.63051 min
## computation time break down:
## time on eigen-decomposition = 0.0262652 min
## time on calculating UtX = 0.170258 min
## time on optimization = 1.97904 min
##
Could you please give some suggestions about this ?
Many thanks
Hi Xiang,
It seems that you need to replace the vecLib framework with the Accelerate framework (type "man Accelerate" in Terminal on a Mac computer) to make gemma compile successfully on a Mac, at least with more recent versions of the OS X.
Peter
@pcarbo can we default to Unix format files? I have fixed a few compile time bugs on Linux, but you are using some other format which messes the diffs up. It should be a simple setting in your editor. I'll add a pull request. Note, I am working from the master branch and I am assuming it should be working. If it is broken in some way we should use a different branch. Currently it is broken on Linux with a recent gcc.
I really appreciate the recent work that you've put into GEMMA.
I received errors when trying to compile version 0.96 with the Makefile.macosx file. It seems that my gsl is in a directory other than the one listed in Makefile.macosx. I realize that there's a note about needing to change the directories listed in the Makefile, but I was unsure which lines typically need to be changed. The comment in Makefile.macosx about using caution when changing code below a certain line made me question whether I was proceeding as I should.
I installed gsl with homebrew, i.e.,
brew install gsl
I then changed all instances of gsl@1 to gsl in Makefile.macosx. Interestingly, I have on my computer both gsl and gsl@2 subdirectories, but no gsl@1 subdirectory.
I believe that GEMMA now compiles and works as expected.
My question is: why did you use the directory paths with gsl@1? Is it worth noting the possible need to make such a change in the documentation? It may not be needed, as, perhaps, most users are keener on this than am I. Thanks again.
We have a GNU Guix package of latest GEMMA releases. It shoud be straightforward to create a bioconda package. Who wants to do this?
I clone the whole package on my desktop,When I try to build the package it failed
Below is the information:
bioguo@bioguo:/Volumes/Work/GEMMA$ make
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/main.cpp -o src/main.o
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/param.cpp -o src/param.o
In file included from src/param.cpp:35:
In file included from src/mathfunc.h:24:
In file included from src/Eigen/Dense:3:
In file included from src/Eigen/Cholesky:24:
src/Eigen/src/Cholesky/LDLT.h:505:39: warning: unused typedef 'Scalar' [-Wunused-local-typedef]
typedef typename LDLTType::Scalar Scalar;
^
1 warning generated.
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/gemma.cpp -o src/gemma.o
In file included from src/gemma.cpp:59:
In file included from src/mathfunc.h:24:
In file included from src/Eigen/Dense:3:
In file included from src/Eigen/Cholesky:24:
src/Eigen/src/Cholesky/LDLT.h:505:39: warning: unused typedef 'Scalar' [-Wunused-local-typedef]
typedef typename LDLTType::Scalar Scalar;
^
1 warning generated.
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/io.cpp -o src/io.o
src/io.cpp:183:2: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:833:5: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:1129:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:1364:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:1452:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:2165:52: error: constant expression evaluates to -9 which cannot be narrowed to type 'size_t' (aka 'unsigned long') [-Wc++11-narrowing]
SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, (long int) -9};
^~
src/io.cpp:2165:52: note: insert an explicit cast to silence this issue
SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, (long int) -9};
^~
static_cast<size_t>( )
src/io.cpp:2638:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:2761:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3211:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3284:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3480:8: warning: explicitly assigning value of variable of type 'double' to itself [-Wself-assign]
z=z;
~^~
src/io.cpp:3415:3: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3654:5: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3675:5: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3698:5: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3707:5: warning: expression result unused [-Wunused-value]
!safeGetline(infile, line).eof();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15 warnings and 1 error generated.
## make: **\* [src/io.o] Error 1
Below is the version of clang
bioguo@bioguo:/Volumes/Work/GEMMA$ clang -v
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Hello ,
I intend to use the mvLMM to find eventual positive pleiotropic effects in plants traits. GEMMA provides pvalues only for the so-called "full test" (ie, it will test if the marker effect for the phenotype is different from zero). However, I also would like to get the marginal pvalues (in my notation, marginal pvalues means a hypothesis test for each phenotype, as if we are fitting a univariate model). My guess is, if a SNP is significant for the mvLMM analysis and, at the same time, significant for two (or more) other traits I am looking at an eventual pleiotropic effect.
To get these marginal pvalues, should I use the univariate model (LMM) for each trait separately? If I understood correctly, this was the strategy used in the original paper by Zhou and Stephens, 2014 (Supplementary Table1), correct?
Thank you,
Felipe
I have installed the program and can run your example data set without problem. Similarly, I can run the program on small subsets of the SNPs with my phenotypic data. However, when I calculate relatedness based on the whole genome, the estimates of the relatedness matrix calculated in GEMMA appear to cause problems. I have diagnosed the relatedness matrix as the problem by using the full relatedness matrices in the sample example analyses of my data (two traits, three snps) that run with a relatedness matrix calculated from just the three snps.
The error is consistent, in that the program estimates a 0 Ve matrix, then crashes because of a singular matrix error:
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.6815
0.1562 1.6745
se(Vg):
0.3784
-nan -nan
REMLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.1228
-nan -nan
REMLE likelihood = -478.9822
MLE estimate for Vg in the null model:
1.6815
0.1563 1.6745
se(Vg):
0.1758
0.1246 0.1751
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
-nan
0.0000 0.0000
MLE likelihood = -140737488355560.3750
gsl: lu.c:262: ERROR: matrix is singular========================100.00%
It may be that the relatedness in my data set is the problem. It is certainly not what you find in human data. I am studying a set of 184 largely inbred lines, the Drosophila Genome Reference Panel. The majority of sites are fixed, but the proportion of heterozygotes is maybe 5% on average, but that varies among lines. In addition, about 5% of the line pairs are more related than second cousins, and a few seem to be full sibs. Bottom line is that genotypes are always very far from Hardy-Weinberg. I have not filtered the genome for high LD SNP pairs for the calculation of relatedness, although I am aware that this will pose problems for the actual genome-wide association analysis with more than a few SNPs.
I would be happy to furnish example data sets that create the problem if that would be helpful.
We should add leave one chromosome out support to GEMMA. From what I can tell from the source code this is not particularly hard because GEMMA already tracks chromosome information. We need to recompute the kinship matrix for n chromosomes leaving one out and then rerun the LMM and run GEMMA n+1 times and present a result file with n+1 scores. I think this should be an internal feature of GEMMA and one would like to save the precomputed kinship matrices for rerunning with covariates etc.
When try to test a SNP's association with phenotype by putting a bunch of other SNPs as covariates, returning the errors "gsl: brent.c:202: ERROR: function value is not finite".
Assuming there are 4 SNPs in total, SNP0, SNP1, SNP2 and SNP3.
The model likes this: Y ~ alpha1 * SNP1 + alpha2 * SNP2 + alpha3 * SNP3 + beta * SNP0 + u + e
where SNP0 is the SNP being tested, and SNP1, SNP2 and SNP3 are covariates.
SNP0, SNP1, SNP2 and SNP3 could have high correlation with each other, say r2 > 0.6.
The relatedness matrix is calculated by using all the SNPs available, say SNP0, SNP1, SNP2 and SNP3.
Has anyone tried to do something like this? How to figure out the infinite problem?
Hello,
I have attempted multitrait GWAS with two traits and a kinship matrix (no other covariates) and I think something must be wrong with my input data. Here's the command line output:
lindroth@lindroth-5810:/usr/share/gemma_program$ ./gemma -bfile WisAsp_BCFfiltered_VCFfiltered_vcf-merge_VCFfiltered-take2_maf05_plink_LinkImpute_LDprune -k output/kinship.cXX.txt -lmm 4 -n 13 15 -o wisasp_multitrait
Reading Files ...
## number of total individuals = 446
## number of analyzed individuals = 445
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 139570
## number of analyzed SNPs = 139570
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
2.0034
-0.2032 2.0683
se(Vg):
1.5507
1.0244 0.9791
REMLE estimate for Ve in the null model:
0.4170
-0.4272 0.4377
se(Ve):
0.4100
0.2699 0.2571
REMLE likelihood = -1177.0001
MLE estimate for Vg in the null model:
2.2770
-0.4818 2.3521
se(Vg):
0.2408
0.1944 0.2496
MLE estimate for Ve in the null model:
0.3447
-0.3532 0.3618
se(Ve):
-nan
-nan 0.0000
MLE likelihood = -684011.4970
Reading SNPs 0.00%
The se(Ve) has NaNs and the "Reading SNPs" does not advance from 0.00%. The command does not execute fully. Do you have any suggestions for what could be wrong with the input data? I can attach the data as a zip if needed.
Thank you very much!
Hilary
It would be preferrable if the R variables could be passed directly to GEMMA without having to save files.
Perhaps it could even be turned into a small R package (e.g., rgemma
).
GEMMA should give hints if matrix properties are invalid, such as discussed in #45 (comment). E.g.
@xiangzhou @pcarbo anything else we can think of?
Failures and warnings should be reported in the log file. These checks can be disabled with the --no-checks switch (i.e., dangerous mode but faster).
When running the new test testCenteredRelatednessMatrixK I get different resulting files on two build setups. One is the correct size, but the second has 1940 rows instead of 1410. The extra rows are the same as the tail of the previous - so, somehow, the output buffer gets overwritten. I'll scrutinize this more closely.
Mr zhou,
GEMMA only estimates the additive effect. for heterozygous population, it's very important to estimate dominant effect. Maybe I can record the genotype (AA,Aa,aa) as (1,1,0), (0,1,1) and (0,1,0) respectively to estimate dominant effect. But it is reasonable to estimate dominant effect base on additive effect as NULL model . So I hope dominant estimation can be added into GEMMA.
thanks.
WangMin
I get following build error with gcc (GCC) 5.2.1 20150902
:
src/io.cpp:2056:61: error: narrowing conversion of ‘-9’ from ‘int’ to ‘std::size_t {aka long unsigned int}’ inside { } [-Wnarrowing]
SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, -9};
Large numbers of compiler warnings are produced during compilation. The makefile has -Wall which prints these - all well and good. If you are interested in producing warnings then any that are produced should be fixed so that the code compiles clean.
Compiler warnings flagged up by -Wall can sometimes give rise to subtle bugs at run-time so it's prudent to make sure that any code compiles clean.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.