genetics-statistics / gemma Goto Github PK

Genome-wide Efficient Mixed Model Association

Home Page: https://github.com/genetics-statistics/GEMMA

License: GNU General Public License v3.0

C++ 90.52% Makefile 0.63% Shell 3.64% Perl 0.04% Roff 1.78% HTML 2.17% CSS 0.04% XSLT 0.33% Batchfile 0.04% C 0.37% Scheme 0.18% Ruby 0.27%

gemma's People

Contributors

Stargazers

Watchers

Forkers

wtsi-hgi yjingj flora0420 princetonuniversity xuzhiyuan0909 jjberg2 wjastle jessiewu kunalbhutani maojie6509 genenetwork fenghelian hlbarker wintermind alenzhao daskelly ypark millak biostatpzeng yangjl tensorstrings jiangchb prasunanand daylly huangyizhong owen-haha jianglinghan jiaqiangzhu jxzb1988 cmonjeau tankmermaid fboehm dannyarends forevertl daissi fengyq akhileshkaushal khjia msnorhalida vivekstha juzheng87 yupinglin0301 fkcoolman fabbondanza wingolab-org zengqin70 changyu01 geneticresources lvhuiyong vbay16 wangdi2014 coolevilgenius scau-drr quanrd mjfeldmann strao1986 wayne009007 burdettadam feihu01 guilhermedejong dilansarange ntduc11 hongqin ettasnow tw7649116 xueenci dgarrimar ccclab-bigdata superzhen625 inanyunus jgarte xiahui625649 diogro guochengying-7824 nvrivera dawei1203 wangpanqiao xma82 khewahah sunny1205124 huangliang0828 moorezhu sandyyy123 gcostaneto wook2014 philip-leftwich widetyperice pele0412 rbpisupati mattbareno adluchessi orz-cq 3410junwei tiffttu ziquanzhao xwmp3 agazw ariadnavillam archibaldchain zhchenyang

gemma's Issues

Compile error

Hi @pjotrp, after your changes, I get the following error when running make -f Makefile.linux with g++ 4.8.5 and GSL 2.2.1 on RedHat Linux:

In file included from src/param.h:22:0,
                 from src/gemma.h:22,
                 from src/main.cpp:19:
src/debug.h:16:55: error: expected initializer before '__THROW'
                               const char *__function) __THROW
                                                       ^
make: *** [src/main.o] Error 1

variance contribution of tested snps

Hi,
I'm wondering if there's a way to obtain the proportion of variance explained after adjustment on each snp in the GEMMA output after univariate LMM for marker association tests.
Indeed, I can find the PVE of the NULL model in the log.txt (I suppose that the fixed covariates independant of the betaX vector are included in the null model, but I'm not sure ),
and I can find the Lamba after adjustment for each snp tested, but this Lambda gives me the proportion of variance explained by the genetic relatedness matrix compared to the residual one, if I understood correctly. I haven't access to the "new" total variance with the marker as covariate, neither to the variance explained by the tested marker. Is there a way I haven't seen to compute it based on GEMMA outputs ?

Thank you !

GEMMA does not compile

./src/io.o: In function GetabIndex(unsigned long, unsigned long, unsigned long)': io.cpp:(.text+0x4ae0): multiple definition of GetabIndex(unsigned long, unsigned long, unsigned long)'
./src/param.o:param.cpp:(.text+0x2500): first defined here
./src/lmm.o: In function GetabIndex(unsigned long, unsigned long, unsigned long)': lmm.cpp:(.text+0x1760): multiple definition of GetabIndex(unsigned long, unsigned long, unsigned long)'
./src/param.o:param.cpp:(.text+0x2500): first defined here
./src/vc.o: In function GetabIndex(unsigned long, unsigned long, unsigned long)': vc.cpp:(.text+0x1600): multiple definition of GetabIndex(unsigned long, unsigned long, unsigned long)'
./src/param.o:param.cpp:(.text+0x2500): first defined here
./src/vc.o: In function ReadHeader(std::string const&, HEADER&)': vc.cpp:(.text+0x81d0): multiple definition of ReadHeader(std::string const&, HEADER&)'
./src/io.o:io.cpp:(.text+0x6cd0): first defined here
collect2: error: ld returned 1 exit status
make: *** [bin/gemma] Error 1

Please make a release

Needed for packaging in Brew and Conda.

add .gitignore to repo

This is a minor issue: it's useful to ask git to ignore some files. For instance, when doing "git status -s", one usually doesn't want to see all object files generated by gcc/g++. This can easily be done by adding a file named ".gitignore" at the root of the repo (note the dot ".", it's a hidden file). Inside this file, one can add lines such as:

*.o
*~

More explanations here.

Association Tests with Univariate Linear Mixed Models using a non gemma index

I have been trying to run a univariate linear mixed model. However I was hoping to use a grm I have made else where- it is the second in grm format. The first few lines are as as follows

1       1       8.923316e-01
1       2       -5.544370e-02
1       3       1.903249e-02
1       4       1.880089e-02
1       5       9.801378e-02
1       6       -3.117359e-02
1       7       2.540684e-01
1       8       -1.174230e-01
1       9       1.168132e-01
1       10      1.833550e-01

Then I am trying to run a lmm using the follow command
./gemma.linux -bfile mouse_hs1940 -km 2 -k mouse_plinkgctasorted.txt -lmm 2 -o mouse_plinkgctasorted

However I keep getting the following error. Any advice on how to fix this?

Reading Files ...
## number of total individuals = 1940
## number of analyzed individuals = 1410
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 12226
## number of analyzed SNPs = 10768
Start Eigen-Decomposition...
gsl: newton.c:74: ERROR: derivative is zero
Default GSL error handler invoked.
Aborted

The file will run the analysis when I supply a gemma grm, however the file I intend on supplying will be much larger and would prefer to make the grm elsewhere.

add doc and demo to repo

Can the manual (latex + pdf) as well as the demo files be added to the repo?

binary bin/gemma is included in source repository

It isn't usual to include a built binary in the source repository. Could bin/gemma be removed from git?

Makefile

Hi, I'm trying to install gemma onto a linux cluster. However my version of gsl is not in the standard '/usr/bin' location and cannot be installed there because of admin privileges.

Regardless, I cannot get the GEMMA Makefile to find where gsl is installed on the cluster. If anyone could give advice on how to edit the Makefile to correctly find gsl, that would be greatly appreciated.

Thanks!

output file with kinship has extra column

Hi Xiang,
I am running v0.94. When I load in R the file kinship.cXX.txt or kinship.sXX.txt, I can see that there is an extra column.
In the code section starting at line 627 of param.h, I think, you shouldn't write the tab \t after each number. Instead, for each line, you should write the first number, and then enter into a loop which write the tab followed by the number corresponding to this iteration.

Tim

ps: sorry, I'm waiting for my sysadmin to install LAPACK and other libraries to be able to compile GEMMA by myself. Otherwise, I'd have done a pull request...

SNP filters in GEMMA

Hi Xiang,

is this "not" a typo? (page 14 in the manual)

By default, SNPs with minor allele frequency above 1% will not be included in the analysis.

if I plan to include all SNPs in the analysis, I just need to use "-maf 0"?

Another two related question:

is -notsnp the same as -miss 0 -maf 0 -r2 1 -hwe 0?
if I specify a list of SNPs to be included in the analysis by using the option -snps, will some of the SNPs still be filtered out due to the missingness and/or minor allele frequencies?

Thanks!

"-gxe" not working

Hi Xiang,

Any time I try to run GEMMA with the "-gxe" option it gives me the following error:

Start Eigen-Decomposition...
pve estimate =0.130578
se(pve) =0.195554
gsl: brent.c:202: ERROR: function value is not finite 0.00%
Default GSL error handler invoked.
Segmentation fault (core dumped)

The only way it seems to run is if all the values in the "-gxe" file are 0.
Is this issue specific to me?

Thanks

Compilation problem "ldr.h : No such file or directory"

Hello there,

I'm trying to compile GEMMA on my system. I changed my lapack libraries directory. So it should not be a problem with this. Here's what I get :

$ make
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/main.cpp -o src/main.o
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/param.cpp -o src/param.o
In file included from src/Eigen/Cholesky:24:0,
from src/Eigen/Dense:3,
from src/mathfunc.h:24,
from src/param.cpp:35:
src/Eigen/src/Cholesky/LDLT.h: In member function ‘void Eigen::internal::solve_retval<Eigen::LDLT<MatrixType, _UpLo>, Rhs>::evalTo(Dest&) const’:
src/Eigen/src/Cholesky/LDLT.h:505:39: attention : typedef ‘Scalar’ locally defined but not used [-Wunused-local-typedefs]
typedef typename LDLTType::Scalar Scalar;
^
src/param.cpp: In member function ‘void PARAM::ReadFiles()’:
src/param.cpp:281:137: attention : ‘W’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ReadFile_bed (file_str, setSnps, W, indicator_idv, indicator_snp, snpInfo, maf_level, miss_level, hwe_level, r2_level, ns_test_tmp)==false) {error=true;}
^
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static -c src/gemma.cpp -o src/gemma.o
src/gemma.cpp:54:17: erreur fatale: ldr.h : Aucun fichier ou dossier de ce type
#include "ldr.h"
^
compilation terminée.
make: *** [src/gemma.o] Erreur 1

Sorry, the system is in french, but you see there that the error is coming from the fact that it's not finding the class file ldr.h.

Thanks for your help.

Segmentation faults when running example in demo.txt

I've been going through the demo and have experienced segmentation faults for one of the example processes. The rest worked fine.

On command line:

> ../bin/gemma -g mouse_hs1940.geno.txt.gz -p mouse_hs1940.pheno.txt -n 1 6 -a mouse_hs1940.anno.txt -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8MCH_lmm

Reading Files ... 
## number of total individuals = 1940
## number of analyzed individuals = 1197
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 12226
## number of analyzed SNPs = 10758
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model: 
1.3940  
-0.2267 2.0817  
se(Vg): 
0.1567  
0.1363  0.2359  
REMLE estimate for Ve in the null model: 
0.3489  
0.0491  0.4144  
se(Ve): 
0.0206  
0.0166  0.0267  
REMLE likelihood = -2855.1664
MLE estimate for Vg in the null model: 
1.3959  
-0.2267 2.0854  
se(Vg): 
0.1568  
0.1365  0.2361  
MLE estimate for Ve in the null model: 
0.3483  
0.0490  0.4136  
se(Ve): 
0.0206  
0.0166  0.0266  
MLE likelihood = -2856.0280
Reading SNPs  ==================================================100.00%
Segmentation fault (core dumped)

Running on gdb:

(gdb) file ../bin/gemma
Reading symbols from ../bin/gemma...done.
(gdb) run -g mouse_hs1940.geno.txt.gz -p mouse_hs1940.pheno.txt -n 1 6 -a mouse_hs1940.anno.txt -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8MCH_lmm
Starting program: /home/ubuntu/Nic/Programs/bugwas/GEMMA/bin/gemma -g mouse_hs1940.geno.txt.gz -p mouse_hs1940.pheno.txt -n 1 6 -a mouse_hs1940.anno.txt -k ./output/mouse_hs1940.cXX.txt -lmm -o mouse_hs1940_CD8MCH_lmm
Reading Files ... 
## number of total individuals = 1940
## number of analyzed individuals = 1197
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 12226
## number of analyzed SNPs = 10758
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model: 
1.3940  
-0.2267 2.0817  
se(Vg): 
0.1567  
0.1363  0.2359  
REMLE estimate for Ve in the null model: 
0.3489  
0.0491  0.4144  
se(Ve): 
0.0206  
0.0166  0.0267  
REMLE likelihood = -2855.1664
MLE estimate for Vg in the null model: 
1.3959  
-0.2267 2.0854  
se(Vg): 
0.1568  
0.1365  0.2361  
MLE estimate for Ve in the null model: 
0.3483  
0.0490  0.4136  
se(Ve): 
0.0206  
0.0166  0.0266  
MLE likelihood = -2856.0280
Reading SNPs  ==================================================100.00%

Program received signal SIGSEGV, Segmentation fault.
0x000000000046d369 in MVLMM::WriteFiles() ()

I got a similar error after using the down-loadable executable and compiling it myself.

If anyone can help me understand what this means and what I can do that would be great.
Thanks!

make clean doesn't remove built binary

Normally, a make clean invocation would remove the binary that was previously built, so that make will be sure to rebuild it. Currently, the Makefile target for clean only removes _.o and *float. targets (as well as *~ files which are not actually built).

Extracting the compiled version of GEMMA is not working

I've been trying to use the Mac OS X version of GEMMA, but it's not working. It won't extract the files.

gunzip gemma.macosx.gz
> gunzip: gemma.macosx.gz: not in gzip format

Also, I've tried to compile GEMMA on macOS Sierra, but it's not working.

g++ -Wall -Weffc++ -O3 -std=gnu++11 -m64 ./src/main.o ./src/param.o ./src/gemma.o ./src/io.o ./src/lm.o ./src/lmm.o ./src/vc.o ./src/mvlmm.o ./src/bslmm.o ./src/prdt.o ./src/mathfunc.o ./src/gzstream.o ./src/eigenlib.o ./src/ldr.o ./src/bslmmdap.o ./src/logistic.o ./src/varcov.o /usr/local/opt/gsl@1/lib/libgsl.a /usr/local/opt/gsl@1/lib/libgslcblas.a -lz -o ./bin/gemma
g++: error: /usr/local/opt/gsl@1/lib/libgsl.a: No such file or directory
g++: error: /usr/local/opt/gsl@1/lib/libgslcblas.a: No such file or directory
make: *** [bin/gemma] Error 1

Update Eigen source files

The latest version of Eigen C++ library is 3.3.3. Either: (1) update the source code or, (2) remove the source code and have the user download the Eigen source code separately.

Note: No changes were made to the original Eigen library source code.

dynamic linking doesn't work without Makefile modifications

When building gemma on my linux machine when using dynamic linking (i.e. when running make FORCE_DYNAMIC=1), I get the following error:

g++ -Wall -O3 -DWITH_LAPACK -m64 ./src/main.o ./src/param.o ./src/gemma.o ./src/io.o ./src/lm.o ./src/lmm.o ./src/vc.o ./src/mvlmm.o ./src/bslmm.o ./src/prdt.o ./src/mathfunc.o ./src/gzstream.o ./src/lapack.o -lgsl -lgslcblas -pthread -lz -llapack -o ./bin/gemma
/usr/bin/ld: ./src/lapack.o: undefined reference to symbol 'dgemm_'
/usr/bin/ld: note: 'dgemm_' is defined in DSO /usr/lib/libblas.so.3gf so try adding it to the linker command line
/usr/lib/libblas.so.3gf: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
make: *** [bin/gemma] Error 1

This can be fixed for me by modifying the Makefile line setting the LIBS variable when FORCE_DYNAMIC is set to include -lblas

Add test framework

It would be really nice to run the tests listed in the example directory. This would give some confidence that changes to the code base still deliver the same results.

make relatedness matrix: gsl ERROR: vector length n must be positive integer

I am trying to figure out how to make relatedness matrix and running into an error:

$gemma -bfile $BASE -epm ${BASE}.out.param.txt \
        -emu ${BASE}.out.log.txt -ebv ${BASE}.out.bv.txt \
        -k ${BASE}.cXX.txt \
        -predict 1 -o ${BASE}.predicted

Reading Files ...
## number of total individuals = 94
## number of analyzed individuals = 94
## number of covariates = 1
## number of phenotypes = 1
gsl: init_source.c:29: ERROR: vector length n must be positive integer
Default GSL error handler invoked.
Segmentation fault

I am using pre-compiled 0.95alpha and 0.94.1 binaries on CentOS6.7.

In log file, question about pve, vg and ve

Hi Xiang,
we used real genotypes and simulated phenotypes:

model: y = W alpha + X beta + Z u + e where X is N=~~300 individuals x P~~=6000 SNPs
u ~ N(0, sigma_u^2 K) with K estimated from equation 2.2 from Astle and Balding
e ~ N(0, sigma_e^2 I)
only 100 SNPs have an effect: beta = c(rnorm(n=100, mean=0, sd=1), rep(0, ncol(X)-100))
we also fixed h2 = 0.5 and sigma_e^2 = 1, thus sigma_u^2 = (h2 * sigma_e^2) / (1 - h2)

We fit the uLMM and get the following lines in the log file:

pve estimate in the null model = 0.129203
se(pve) in the null model = 0.102527
vg estimate in the null model = 33.8083
ve estimate in the null model = 71.0727

For the null model, I would expect pve ~= vg / (vg + ve), but it's not the case, where am I wrong?

Always getting 'pve estimate =0.99xxx se(pve) =-nan'

Hi Xiang,
When I use gemma to perform gwas, I always getting 'pve estimate =0.99xxx se(pve) =-nan', and all results are nan(below). Here is my command line,
gemma -bfile mydata -k output/mydata_kinship.sXX.txt -miss 1 -maf 0.01 -r2 1 -lmm
Is there any wrong with the command line or my input datas (attachment)?
mydata_kinship.sXX.txt

GEMMA_data.zip

1 chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
2 1 483 483 58 T C 0.121 nan -nan 1.000000e+05 nan
3 1 2740 2740 54 A T 0.386 nan -nan 1.000000e+05 nan
4 1 4213 4213 38 G C 0.140 nan -nan 1.000000e+05 nan
5 1 4222 4222 38 C A 0.402 nan -nan 1.000000e+05 nan
6 1 4227 4227 38 T A 0.189 nan -nan 1.000000e+05 nan
7 1 4235 4235 38 T G 0.378 nan -nan 1.000000e+05 nan
8 1 4291 4291 39 A C 0.185 nan -nan 1.000000e+05 nan

Best wishes!

Add MPL for Eigen

The license accompanying the Eigen C++ library is missing, and should be added.

source distribution created by 'tar' expands into .

The source distribution (i.e. gemma-0.95.tar.gz) expands directly into the current working directory when expanded. This can sometimes be considered rude and as a result most software distributions provide tars that are set to expand into a subdirectory (usually named the same as the basename of the tar file).

No `bin` folder

Readme says:
INSTALLATION: An executable binary (x86 64bit Linux) is available in the bin directory.
However there is no bin directory in this repository.

Tests using Eigen C++ library v3.3.3

Using commit id bbbabdb.

First test: calculate centered relatedness matrix.

x <- as.matrix(read.table("gemma/example/output/mouse_hs1940.cXX.txt"))
y <- as.matrix(read.table("gemma_new/example/output/mouse_hs1940.cXX.txt"))
print(max(abs(x - y)))
# R> 1e-12

Second test: association tests with a univariate LMM.

cols <- c("af","beta","se","l_remle","p_wald")
x <- read.table("~/git/gemma/example/output/mouse_hs1940_CD8_lmm.assoc.txt",
                header = TRUE,stringsAsFactors = FALSE)
y <- read.table("~/git/gemma_new/example/output/mouse_hs1940_CD8_lmm.assoc.txt",
                header = TRUE,stringsAsFactors = FALSE)
print(max(abs(x[cols] - y[cols])))
# R> 0

Third test: association tests with a multivariate LMM.

cols <- 7:13
x <- read.table("~/git/gemma/example/output/mouse_hs1940_CD8MCH_lmm.assoc.txt",
	        header = TRUE,stringsAsFactors = FALSE)
y <- read.table("~/git/gemma_new/example/output/mouse_hs1940_CD8MCH_lmm.assoc.txt",
	        header = TRUE,stringsAsFactors = FALSE)
print(max(abs(x[cols] - y[cols])))
# R> 0

In log file, what does "beta" correspond to?

Hi Xiang,
a student of mine is using GEMMA (only univariate LMM for the moment). In the log file, we see the following lines:

beta estimate in the null model =   57.4035
se(beta) =   0.504719

What does this beta correspond to? Is it the intercept of the model "y = W alpha + Z u + e"? Thus, is it the first element of the alpha vector? Could you specify this in the manual?

Thanks,
Tim

Relatedness matrix -n argument

I am confused about the -n argument when creating the relatedness matrix. This is supplied in line 117 in example/demo.txt to generate a relatedness matrix based on the training data. As the resulting matrix has values for all individuals, not just those with missing values, I was wondering what difference the argument makes to the calculations.

Do we need FORCE_FLOAT?

@xiangzhou I'm wondering if there is much benefit to including the FORCE_FLOAT option in GEMMA, which makes your code somewhat more complicated. You claim that this option is useful for reducing memory requirements, but have you tested this? Most computers (my guess is >99%) now have 64-bit architectures, which means that addressing is limited to 64 bits, which is the size of a double-precision floating-point number. In other words, a single-precision floating point number will also occupy 64 bits.

I can test this, but I'm quite certain that there will be no benefit to FORCE_FLOAT on a 64-bit architecture. If this is the case, perhaps we could consider removing this option from your code?

Inflated p-values using GEMMA on haploid organisms?

Dear Pjotr and Xiang,

We have been running GEMMA on bacterial data, attempting to find epistatic interactions by encoding the alleles at one of the loci as a phenotype and testing its interaction with the other locus. The answers that GEMMA gives are correlated with the truth, but appear to have widely inflated significance values. Another noticeable feature is that for many loci p_score is much less inflated than p_wald or p_lrt. There is also a hint that values are most inflated for low-frequency variants. For example, one locus with 5 1s and 265 0s is associated with another locus with 3 1s and 268 0s, where the 3 1s at the second locus all match with 1s at the first. GEMMA as we run it gives a p value of 10^-31 for the Wald and LRT, with 10^-21 for the score test when a sensible value would be around 10^-6.

We are guessing there is some systematic error in how the data is being interpreted, which probably reflects a misunderstanding of input format or suchlike. The command line we are running currently looks like this:

gemma -g genfile_gemma.txt -p phenotype_general.txt -a snpfile_gemma.txt -gk 1 -o rel_matrix -maf 0

gemma -g genfile_gemma.txt -p $locus_pheno.txt -a snpfile_gemma.txt -k rel_matrix.cXX.txt -lmm 4 -o $locus -maf 0

We have marked in the snp file that all positions are at chromosome 24 (y-chrom), which we understood is the suggestion for haploid organisms.

Obviously we are happy to send the data files on request. Any idea what could be going wrong? All suggestions much appreciated. Looking forward to figuring this out!

Best,

Kaisa

Add MacOSX build environment to Travis CI

You can add osx in the matrix, e.g., here. In fact, you can automatically generate a latest-release by pushing to
dropbox (or something) as Artem did.

Note: Currently the compiler options in the Makefile do not appear to be compatible with the XCode version of gcc (e.g., Apple LLVM version 8.1.0) so this will have to be addressed as well.

size_t set to signed values

@xiangzhou According to the C++ standard, size_t is an unsigned integer type, but you are assigning (constant) negative values to variables of this type, which seems to be generating warnings in some cases; e.g., in io.cpp:

SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, (long int) -9};

So you should either replace these with unsigned values, or change the definition of SNPINFO.

FYI - updated Brew formula to 0.96

brew tap homebrew/science
brew tap tseemann/bioinformatics-linux
brew install gemma
# or brew install gemma --HEAD for current github master

Should work on MacOS too.

I will eventually get it into homebrew-science main.

Multivariable GWAS (-lmm) has NaN values when including more than 3 phenotypes

Hi, I would like to conduct a 5 phenotypes GWAS analysis using this great software.

I understand that there is a similar issue #45. However, we have done everything you suggested or discussed (quantile transform each phenotype, "second level" of standardization of relatedness matrix) there but still cannot fix the problem.

When the number of phenotypes included increased to >3, the calculation failed:

./bin/gemma -g /home/takeshi/GWAS/Okazaki/Okazaki-analysis/mvSNPTEST/GEMMAchr22txt.gz -p /home/takeshi/GWAS/Okazaki/Okazaki-analysis/GEMMApheno.txt -n 1 2 3 4 -k ./output/GEMMA_standardrelatedmat005.sXX.txt -lmm -o GEMMAtest1234
Reading Files ... 
*## number of total individuals = 807
*## number of analyzed individuals = 795
*## number of covariates = 1
*## number of phenotypes = 4
*## number of total SNPs = 163763
*## number of analyzed SNPs = 114095
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model: 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
se(Vg): 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
REMLE estimate for Ve in the null model: 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
se(Ve): 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
REMLE likelihood = -nan
MLE estimate for Vg in the null model: 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
se(Vg): 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
MLE estimate for Ve in the null model: 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
se(Ve): 
-nan	
-nan	-nan	
-nan	-nan	-nan	
-nan	-nan	-nan	-nan	
MLE likelihood = -nan

It seems to be irrelevant with which 4 phenotypes were included in the calculation.
By the way, any 3 out of interested phenotypes (n = 5) could finished calculation normally:

##
## GEMMA Version = 0.97
##
## Command Line Input = ./bin/gemma -g /home/takeshi/GWAS/Okazaki/Okazaki-analysis/mvSNPTEST/GEMMAchr22txt.gz -p /home/takeshi/GWAS/Okazaki/Okazaki-analysis/GEMMApheno.txt -n 1 2 3 -k ./output/GEMMA_standardrelatedmat005.sXX.txt -lmm -o GEMMAtest123 
##
## Date = Thu Jul 20 10:38:35 2017
##
## Summary Statistics:
## number of total individuals = 807
## number of analyzed individuals = 799
## number of covariates = 1
## number of phenotypes = 3
## number of total SNPs = 163763
## number of analyzed SNPs = 114078
## REMLE log-likelihood in the null model = -3215.58
## MLE log-likelihood in the null model = -3218.11
## REMLE estimate for Vg in the null model: 
9.92726e-06	
5.33329e-13	9.73246e-06	
-3.00269e-12	5.30004e-11	9.55377e-06	
## se(Vg): 
0.384034	
0.268873	0.370849	
0.252157	0.2977	0.329824	
## REMLE estimate for Ve in the null model: 
0.992726	
0.00435316	0.973246	
-0.00967254	0.538782	0.955377	
## se(Ve): 
0.378022	
0.264058	0.363834	
0.249701	0.29338	0.328996	
## MLE estimate for Vg in the null model: 
9.92726e-06	8.68966e-13	-4.42535e-12	
8.68966e-13	9.73246e-06	5.26974e-11	
-4.42535e-12	5.26974e-11	9.55377e-06	
## se(Vg): 
0.397542	
0.27852	0.383646	
0.260398	0.308059	0.339524	
## MLE estimate for Ve in the null model: 
0.991484	0.00434771	-0.00966044	
0.00434771	0.972028	0.538108	
-0.00966044	0.538108	0.954182	
## se(Ve): 
0.391563	
0.273689	0.376616	
0.257985	0.303754	0.338825	
## estimate for B (d by c) in the null model (columns correspond to the covariates provided in the file): 
-0.00648804	
-0.00101604	
0.00465392	
## se(B): 
0.0352486	
0.034901	
0.0345791	
##
## Computation Time:
## total computation time = 2.63051 min 
## computation time break down: 
##      time on eigen-decomposition = 0.0262652 min 
##      time on calculating UtX = 0.170258 min 
##      time on optimization = 1.97904 min 
##

Could you please give some suggestions about this ?

Many thanks

Compiling GEMMA on Mac OS X

Hi Xiang,

It seems that you need to replace the vecLib framework with the Accelerate framework (type "man Accelerate" in Terminal on a Mac computer) to make gemma compile successfully on a Mac, at least with more recent versions of the OS X.

Peter

Use Unix mode for source files

@pcarbo can we default to Unix format files? I have fixed a few compile time bugs on Linux, but you are using some other format which messes the diffs up. It should be a simple setting in your editor. I'll add a pull request. Note, I am working from the master branch and I am assuming it should be working. If it is broken in some way we should use a different branch. Currently it is broken on Linux with a recent gcc.

Makefile.macosx and gsl directory

I really appreciate the recent work that you've put into GEMMA.

I received errors when trying to compile version 0.96 with the Makefile.macosx file. It seems that my gsl is in a directory other than the one listed in Makefile.macosx. I realize that there's a note about needing to change the directories listed in the Makefile, but I was unsure which lines typically need to be changed. The comment in Makefile.macosx about using caution when changing code below a certain line made me question whether I was proceeding as I should.

I installed gsl with homebrew, i.e.,

brew install gsl

I then changed all instances of gsl@1 to gsl in Makefile.macosx. Interestingly, I have on my computer both gsl and gsl@2 subdirectories, but no gsl@1 subdirectory.

I believe that GEMMA now compiles and works as expected.

My question is: why did you use the directory paths with gsl@1? Is it worth noting the possible need to make such a change in the documentation? It may not be needed, as, perhaps, most users are keener on this than am I. Thanks again.

Provide GEMMA package in bioconda

We have a GNU Guix package of latest GEMMA releases. It shoud be straightforward to create a bioconda package. Who wants to do this?

could build success on Mac OS X

I clone the whole package on my desktop,When I try to build the package it failed
Below is the information:

bioguo@bioguo:/Volumes/Work/GEMMA$ make
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static  -c src/main.cpp -o src/main.o
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static  -c src/param.cpp -o src/param.o
In file included from src/param.cpp:35:
In file included from src/mathfunc.h:24:
In file included from src/Eigen/Dense:3:
In file included from src/Eigen/Cholesky:24:
src/Eigen/src/Cholesky/LDLT.h:505:39: warning: unused typedef 'Scalar' [-Wunused-local-typedef]
    typedef typename LDLTType::Scalar Scalar;
                                      ^
1 warning generated.
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static  -c src/gemma.cpp -o src/gemma.o
In file included from src/gemma.cpp:59:
In file included from src/mathfunc.h:24:
In file included from src/Eigen/Dense:3:
In file included from src/Eigen/Cholesky:24:
src/Eigen/src/Cholesky/LDLT.h:505:39: warning: unused typedef 'Scalar' [-Wunused-local-typedef]
    typedef typename LDLTType::Scalar Scalar;
                                      ^
1 warning generated.
g++ -Wall -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static  -c src/io.cpp -o src/io.o
src/io.cpp:183:2: warning: expression result unused [-Wunused-value]
        !safeGetline(infile, line).eof();
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:833:5: warning: expression result unused [-Wunused-value]
    !safeGetline(infile, line).eof();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:1129:3: warning: expression result unused [-Wunused-value]
                !safeGetline(infile, line).eof();
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:1364:3: warning: expression result unused [-Wunused-value]
                !safeGetline(infile, line).eof();
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:1452:3: warning: expression result unused [-Wunused-value]
                !safeGetline(infile, line).eof();
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:2165:52: error: constant expression evaluates to -9 which cannot be narrowed to type 'size_t' (aka 'unsigned long') [-Wc++11-narrowing]
                  SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, (long int) -9};
                                                                 ^~
src/io.cpp:2165:52: note: insert an explicit cast to silence this issue
                  SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, (long int) -9};
                                                                 ^~
                                                                 static_cast<size_t>( )
src/io.cpp:2638:3: warning: expression result unused [-Wunused-value]
  !safeGetline(infile, line).eof();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:2761:3: warning: expression result unused [-Wunused-value]
                !safeGetline(infile, line).eof();
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3211:3: warning: expression result unused [-Wunused-value]
  !safeGetline(infile, line).eof();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3284:3: warning: expression result unused [-Wunused-value]
  !safeGetline(infile, line).eof();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3480:8: warning: explicitly assigning value of variable of type 'double' to itself [-Wself-assign]
      z=z;
      ~^~
src/io.cpp:3415:3: warning: expression result unused [-Wunused-value]
  !safeGetline(infile, line).eof();
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3654:5: warning: expression result unused [-Wunused-value]
    !safeGetline(infile, line).eof();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3675:5: warning: expression result unused [-Wunused-value]
    !safeGetline(infile, line).eof();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3698:5: warning: expression result unused [-Wunused-value]
    !safeGetline(infile, line).eof();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/io.cpp:3707:5: warning: expression result unused [-Wunused-value]
    !safeGetline(infile, line).eof();
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15 warnings and 1 error generated.
## make: **\* [src/io.o] Error 1

Below is the version of clang
bioguo@bioguo:/Volumes/Work/GEMMA$ clang -v
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

mvLMM and marginal pvalues

Hello ,
I intend to use the mvLMM to find eventual positive pleiotropic effects in plants traits. GEMMA provides pvalues only for the so-called "full test" (ie, it will test if the marker effect for the phenotype is different from zero). However, I also would like to get the marginal pvalues (in my notation, marginal pvalues means a hypothesis test for each phenotype, as if we are fitting a univariate model). My guess is, if a SNP is significant for the mvLMM analysis and, at the same time, significant for two (or more) other traits I am looking at an eventual pleiotropic effect.

To get these marginal pvalues, should I use the univariate model (LMM) for each trait separately? If I understood correctly, this was the strategy used in the original paper by Zhou and Stephens, 2014 (Supplementary Table1), correct?

Thank you,
Felipe

Singular Ve matrix

I have installed the program and can run your example data set without problem. Similarly, I can run the program on small subsets of the SNPs with my phenotypic data. However, when I calculate relatedness based on the whole genome, the estimates of the relatedness matrix calculated in GEMMA appear to cause problems. I have diagnosed the relatedness matrix as the problem by using the full relatedness matrices in the sample example analyses of my data (two traits, three snps) that run with a relatedness matrix calculated from just the three snps.

The error is consistent, in that the program estimates a 0 Ve matrix, then crashes because of a singular matrix error:
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model:
1.6815
0.1562 1.6745
se(Vg):
0.3784
-nan -nan
REMLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
0.1228
-nan -nan
REMLE likelihood = -478.9822
MLE estimate for Vg in the null model:
1.6815
0.1563 1.6745
se(Vg):
0.1758
0.1246 0.1751
MLE estimate for Ve in the null model:
0.0000
0.0000 0.0000
se(Ve):
-nan
0.0000 0.0000
MLE likelihood = -140737488355560.3750
gsl: lu.c:262: ERROR: matrix is singular========================100.00%

It may be that the relatedness in my data set is the problem. It is certainly not what you find in human data. I am studying a set of 184 largely inbred lines, the Drosophila Genome Reference Panel. The majority of sites are fixed, but the proportion of heterozygotes is maybe 5% on average, but that varies among lines. In addition, about 5% of the line pairs are more related than second cousins, and a few seem to be full sibs. Bottom line is that genotypes are always very far from Hardy-Weinberg. I have not filtered the genome for high LD SNP pairs for the calculation of relatedness, although I am aware that this will pose problems for the actual genome-wide association analysis with more than a few SNPs.

I would be happy to furnish example data sets that create the problem if that would be helpful.

Add LOCO support for Plink format (Bimbam is in 0.97)

We should add leave one chromosome out support to GEMMA. From what I can tell from the source code this is not particularly hard because GEMMA already tracks chromosome information. We need to recompute the kinship matrix for n chromosomes leaving one out and then rerun the LMM and run GEMMA n+1 times and present a result file with n+1 scores. I think this should be an internal feature of GEMMA and one would like to save the precomputed kinship matrices for rerunning with covariates etc.

Testing with covariates got errors "gsl: brent.c:202: ERROR: function value is not finite"

When try to test a SNP's association with phenotype by putting a bunch of other SNPs as covariates, returning the errors "gsl: brent.c:202: ERROR: function value is not finite".
Assuming there are 4 SNPs in total, SNP0, SNP1, SNP2 and SNP3.
The model likes this: Y ~ alpha1 * SNP1 + alpha2 * SNP2 + alpha3 * SNP3 + beta * SNP0 + u + e
where SNP0 is the SNP being tested, and SNP1, SNP2 and SNP3 are covariates.
SNP0, SNP1, SNP2 and SNP3 could have high correlation with each other, say r2 > 0.6.
The relatedness matrix is calculated by using all the SNPs available, say SNP0, SNP1, SNP2 and SNP3.
Has anyone tried to do something like this? How to figure out the infinite problem?

Multitrait GWAS (-lmm) has NaN values for Se(Ve)

Hello,

I have attempted multitrait GWAS with two traits and a kinship matrix (no other covariates) and I think something must be wrong with my input data. Here's the command line output:

lindroth@lindroth-5810:/usr/share/gemma_program$ ./gemma -bfile WisAsp_BCFfiltered_VCFfiltered_vcf-merge_VCFfiltered-take2_maf05_plink_LinkImpute_LDprune -k output/kinship.cXX.txt -lmm 4 -n 13 15  -o wisasp_multitrait

Reading Files ... 
## number of total individuals = 446
## number of analyzed individuals = 445
## number of covariates = 1
## number of phenotypes = 2
## number of total SNPs = 139570
## number of analyzed SNPs = 139570
Start Eigen-Decomposition...
REMLE estimate for Vg in the null model: 
2.0034	
-0.2032	2.0683	
se(Vg): 
1.5507	
1.0244	0.9791	
REMLE estimate for Ve in the null model: 
0.4170	
-0.4272	0.4377	
se(Ve): 
0.4100	
0.2699	0.2571	
REMLE likelihood = -1177.0001
MLE estimate for Vg in the null model: 
2.2770	
-0.4818	2.3521	
se(Vg): 
0.2408	
0.1944	0.2496	
MLE estimate for Ve in the null model: 
0.3447	
-0.3532	0.3618	
se(Ve): 
-nan	
-nan	0.0000	
MLE likelihood = -684011.4970
Reading SNPs                                                    0.00%

The se(Ve) has NaNs and the "Reading SNPs" does not advance from 0.00%. The command does not execute fully. Do you have any suggestions for what could be wrong with the input data? I can attach the data as a zip if needed.

Thank you very much!
Hilary

Create R wrapper/interface for GEMMA

It would be preferrable if the R variables could be passed directly to GEMMA without having to save files.

Perhaps it could even be turned into a small R package (e.g., rgemma).

Check matrix properties (5 out of 7 implemented in 0.97)

GEMMA should give hints if matrix properties are invalid, such as discussed in #45 (comment). E.g.

Fail if K has negative eigen values
Fail if K is not symmetric
Fail if K is not positive definite
Warn in eigen values are very small
Warn if K is ill conditioned
Warn on related pairs
Warn on MAF problems

@xiangzhou @pcarbo anything else we can think of?

Failures and warnings should be reported in the log file. These checks can be disabled with the --no-checks switch (i.e., dangerous mode but faster).

Odd sized kinship file

When running the new test testCenteredRelatednessMatrixK I get different resulting files on two build setups. One is the correct size, but the second has 1940 rows instead of 1410. The extra rows are the same as the tail of the previous - so, somehow, the output buffer gets overwritten. I'll scrutinize this more closely.

Include estimates of dominance effects

Mr zhou,

GEMMA only estimates the additive effect. for heterozygous population, it's very important to estimate dominant effect. Maybe I can record the genotype (AA,Aa,aa) as (1,1,0), (0,1,1) and (0,1,0) respectively to estimate dominant effect. But it is reasonable to estimate dominant effect base on additive effect as NULL model . So I hope dominant estimation can be added into GEMMA.
thanks.

WangMin

build error narrowing conversion of ‘-9’ from ‘int’ to ‘std::size_t'

I get following build error with gcc (GCC) 5.2.1 20150902:

src/io.cpp:2056:61: error: narrowing conversion of ‘-9’ from ‘int’ to ‘std::size_t {aka long unsigned int}’ inside { } [-Wnarrowing]
    SNPINFO sInfo={"-9", rs, -9, -9, minor, major, -9, -9, -9};

compiler warnings

Large numbers of compiler warnings are produced during compilation. The makefile has -Wall which prints these - all well and good. If you are interested in producing warnings then any that are produced should be fixed so that the code compiles clean.
Compiler warnings flagged up by -Wall can sometimes give rise to subtle bugs at run-time so it's prudent to make sure that any code compiles clean.

genetics-statistics / gemma Goto Github PK

gemma's People

Contributors

Stargazers

Watchers

Forkers

gemma's Issues

Recommend Projects

Recommend Topics

Recommend Org