weizhouumich / saige

License: GNU Lesser General Public License v3.0

Shell 11.53% R 44.10% C++ 30.08% HTML 0.29% CSS 0.14% Roff 13.74% CMake 0.12%

saige's Issues

Error in if (traitType == "binary") { : argument is of length zero

Hi!

I have been running SAIGE version 0.20 without problems, but after upgrading to version 0.26 I keep getting an error when trying to run step 2.
The cmd.sh example script works, so I guess the installation is ok.

Step 1 seems to run without problems on this particular phenotype, and the log file shows the following for the trait type:

$traitType
[1] "binary"
-and-
Migrene is a binary trait

When running step 2, I keep getting this error message:

Error in try(if (length(which(opt == "")) > 0) stop("Missing arguments")) :
Missing arguments
17756 samples have been used to fit the glmm null model
variance Ratio is 0.906
58173 sample IDs are found in sample file
[1] 58173 4
[1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y"
17756 samples were used in fitting the NULL glmm model and are found in sample file
minMAC: 1
minMAF: 1e-04
Minimum MAF of markers to be testd is 1e-04
Analysis started at 1.52e+09 Seconds
setgenoTest here

file extenstion is dose
set GenoTestObj from dosage file!
NSampleTest 58173
Mtest 2928

Testing markers: 2928 , samples: 58173
isVariant: TRUE
Error in if (traitType == "binary") { : argument is of length zero
Calls: SPAGMMATtest
Execution halted

This is the code from step 1:

logfile_step1=./Migraine_women_step1.log
{
Rscript /mnt/work/Programs/SAIGE_0.26/SAIGE/extdata/step1_fitNULLGLMM.R \
  --plinkfile=/path/Dataset_pruned \
  --phenoFile=/path/Headache_women_genotyped.txt \
  --phenoCol=Migrene \
  --covarColList=PC1,PC2,PC3,PC4,Age \
  --sampleIDColinphenoFile=IID \
  --traitType=binary \
  --skipModelFitting=FALSE \
  --outputPrefix=Migraine_women
} 2>&1 | tee "$logfile_step1"

And this is the code from step 2

logfile_step2=./Migrene_women_step2.log
{
Rscript /mnt/work/Programs/SAIGE_0.26/SAIGE/extdata/step2_SPAtests.R \
  --dosageFile=/path/Dataset_imputed.dose \
  --dosageFileNrowSkip=0 \
  --dosageFileNcolSkip=5 \
  --dosageFilecolnamesSkip=CHR,SNP,POS,A1,A2 \
  --minMAF=0.0001 \
  --sampleFile=/path/samplelist_IID.txt \
  --GMMATmodelFile=./Migraine_women.rda \
  --varianceRatioFile=./Migraine_women.varianceRatio.txt \
  --SAIGEOutputFile=./Migraine_women_results.txt \
  --numLinesOutput=2 \
  --IsOutputAFinCaseCtrl=TRUE
} 2>&1 | tee "$logfile_step2"

I have tried several phenotypes and different dosage files, but I keep getting the same error message.

Could you please help me figure out what I am doing wrong here?

Thanks,
Sigrid

LOCO for non-human data

I am running SAIGE with non-human data (29 autosomes). However, when I use the LOCO argument, it automatically assumes 22 autosomes when setting chromosomeStartIndexVec (see below). Is there a way to override this so that it considers all 29 autosomes in my dataset?

....
Leave-one-chromosome-out is activated
chromosomeStartIndexVec: 0 3922 5051 7056 9077 10853 12642 13835 14336 17521 20104 22092 24181 26491 28662 30754 32575 33590 35109 36783 38618 39973
11115 samples have genotypes
....
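
One way to see how many chromosomes the genotype file really contains is to read the PLINK .bim file directly, since its first column is the chromosome code. The sketch below prints each chromosome with the 0-based index of its first marker, which is what chromosomeStartIndexVec in the log above appears to list (an inference from the log, not from SAIGE's source). The function name `chrom_start_indices` and the file name `genotypes.bim` are hypothetical, and the file is assumed to be grouped by chromosome.

```shell
# Print "<chromosome> <0-based index of its first marker>" for a PLINK .bim
# file read from stdin. Assumes markers are grouped by chromosome.
chrom_start_indices() {
  awk 'NR == 1 || $1 != prev { print $1, NR - 1 } { prev = $1 }'
}

# Hypothetical usage: count the chromosomes present in genotypes.bim
# chrom_start_indices < genotypes.bim | wc -l
```

If this reports 29 chromosomes while the log lists only 22 start indices, the extra chromosomes are being dropped somewhere between the file and the LOCO setup.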

New 3.0.0 version of SPAtest R package does not work with SAIGE

My colleague gets errors about missing values.

Step 2 unexpectedly terminates

Hi,
I have an issue with step 2, which terminates unexpectedly with this message:

"Analysis started at 1537078520 Seconds
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Abort (core dumped)
"
Is anyone aware of what kind of problem this is?
Thanks!

Step two runs on all cores with some versions of openblas

This is the fix we are currently using:

require(inline)

openblas.set.num.threads <- cfunction(signature(ipt="integer"),
                                      body = 'openblas_set_num_threads(*ipt);',
                                      otherdefs = c ('extern void openblas_set_num_threads(int);'),
                                      libargs = c ('-L/opt/openblas/lib -lopenblas'),
                                      language = "C",
                                      convention = ".C"
                                      )

openblas.set.num.threads(1)

library(SAIGE)
SPAGMMATtest(dosageFile = dosageFile,
...)
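
An alternative that avoids compiling a helper is to cap the thread count through environment variables before R starts. This is only a sketch, assuming the linked OpenBLAS build honours `OPENBLAS_NUM_THREADS` (and `OMP_NUM_THREADS` for OpenMP builds); `run_single_threaded` is a hypothetical wrapper name.

```shell
# Run any external command with BLAS threading capped at one thread.
# The variables are set only for the wrapped command.
run_single_threaded() {
  OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 "$@"
}

# Hypothetical usage (script path and arguments are placeholders):
# run_single_threaded Rscript step2_SPAtests.R --dosageFile=...
```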

Can you include an option to read PLINK bfiles in step 2?

Even though we can convert PLINK files to VCF, that might be impractical for large files.

Can you please add this option?

Thanks

Difference of p.value and p.value.NA in output files

Hi,

I am currently running a GWAS with SAIGE. Everything went well, but we are a bit confused about how to interpret the output files. There are two P-value columns, "p.value" and "p.value.NA". Could you please explain the difference? Thanks in advance!

Best regards,
Tenghao

Only header printed in output without error when using VCF - step 2

Dear Wei,

Sorry to disturb you. I just downloaded the latest version of SAIGE to run in R version 3.5.1 (SAIGE_0.29.4.2). I have installed the package into R successfully and all the dependencies.

Stage 1 ran without problems and stage 2 also appears to run well (although extremely quickly) according to the log file (attached). Just without actually carrying out any SPA tests.

The output file contains only a header. Is this behaviour you have found before? I tried the example you provide in the package and that runs without problems and does print the summary statistics out, so I've been trying to compare the files in my analysis to the example. The potential problems I can think of are:

Our SNP IDs contain colons - is this OK?
Our VCF file contains multiple types of information, whereas the VCF files in the example contain either only DS or only GT. Is this a problem?

I have tried to attach a picture showing the first few lines of our VCF file.

Any advice or suggestions you have to get round this problem would be much appreciated.

Best wishes,

Jamie

slurm-2462623.txt

covariates

To whom it may concern,

When running a GWAS with SAIGE, do we still need to fit PCs as covariates, given that SAIGE calculates the GRM itself?

Best wishes,
Yeda

No more traitType option in SPAGMMATtest?

As of 0.25, there seems to be no traitType option in the SAIGE SPAGMMATtest*. How do I analyse a quantitative trait with the new SAIGE? :)

Thanks!

At least, there is no mention of it in the help-section.

SAIGE is not compatible with R 3.5.0 (and compile fails)

Hi,
the latest SAIGE binary doesn't work with the latest R version (3.5.0), this error is displayed:

> library("SAIGE")
Error: package or namespace load failed for ‘SAIGE’:
 package ‘SAIGE’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

I tried compiling versions 0.26 and 0.29 from the sources but they both fail, at least on latest Ubuntu (Cosmic) or Debian testing, you can see the compile warnings and the error here: https://pastebin.com/61wR1sMq

Thanks!

ERROR: SNP has B = 16 (not 8)

Hi there,

I am trying to run SAIGE on a binary trait from the UKBB and I wanted to test a small subset of SNPs before running all chromosomes. I am running SAIGE_0.29.3.

I fitted the model using the 93,511 SNPs from the kinship calculations, as you did in the SAIGE manuscript for 424772 samples. My model was fitted using Sex, Age, PC1, PC2, PC3, PC4 as non-genetic covariates. I am trying to run SAIGE on 10 markers on chromosome 22. The bgen file contains genotype calls for all samples and only the 10 markers to be tested.

I get the following error when trying to run step2 of SAIGE:

Minimum MAF of markers to be testd is  1e-04 
Analysis started at  1.54e+09 Seconds
no query list is provided
424772 samples are found in the bgen file
10 markers are found in the bgen file 
isVariant:  TRUE 
It is a binary trait
Analyzing  8381  cases and  416391  controls 
ERROR: rs9628178 has B = 16 (not 8)

Could you please help me out to find the cause of this specific error?

SAIGE step 1 error

Hi,

I got this error message when fitting the null model (SAIGE step1) on a UK Biobank binary trait:

Loading required package: optparse
Error in SPAtest:::Saddle_Prob_fast(q = qtilde, g = g, mu = mu, gNA = g[NAset], :
  argument "output" is missing, with no default
Calls: fitNULLGLMM ... scoreTest_SPAGMMAT_forVarianceRatio_binaryTrait -> <Anonymous>
Execution halted

My step 1 command was:
/services/tools/R-3.2.1/bin/Rscript --vanilla /home/projects/ssi_gen1/data/UKbiobank/genotype/scripts/step1_fitNULLGLMM.R \
  --plinkFile=/home/projects/ssi_gen1/data/UKbiobank/genotype/SAIGE/step1_SAIGE.ch1-22_clean \
  --phenoFile=/home/projects/ssi_gen1/data/UKbiobank/genotype/SAIGE/hernias_pheno_plus_covariates.txt \
  --phenoCol=SPECIFIC5 \
  --covarColList=sex,yob,pca1,pca2,pca3,pca4 \
  --sampleIDColinphenoFile=IID \
  --traitType=binary \
  --outputPrefix=/home/projects/ssi_gen1/data/UKbiobank/genotype/SAIGE/SPECIFIC5 \
  --nThreads=16 \
  --LOCO=TRUE

Do you know what might be the problem? Thanks

Running SAIGE genome wide?

Hi there,

The "--chrom=" argument is mandatory at the moment for step2_SPAtests.R. For some of my tasks this means I need to break my input vcf down by chroms, which is good for large vcfs but not so much for small vcfs. I was wondering if you would have a version that simply scans through an input vcf?

Many thanks,
Zhihao
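
Until a whole-file mode exists, one possible workaround is to leave the VCF intact and loop over the chromosomes it contains, passing each one to `--chrom=`. The sketch below only prints the commands (a dry run) so they can be inspected first. `list_chroms` and `print_step2_commands` are hypothetical helper names, every option other than `--chrom=` is passed through unchanged rather than assumed, and it is an assumption that step 2 can select the named chromosome from a multi-chromosome VCF.

```shell
# List the distinct chromosome names in an uncompressed VCF read from stdin,
# in order of first appearance. Header lines start with "#" and are skipped.
list_chroms() {
  awk '!/^#/ && !seen[$1]++ { print $1 }'
}

# Print one step 2 command per chromosome. $1 is the uncompressed VCF;
# the remaining arguments are copied onto every command line as-is.
print_step2_commands() {
  vcf=$1
  shift
  list_chroms < "$vcf" | while read -r chrom; do
    echo "Rscript step2_SPAtests.R --chrom=$chrom $*"
  done
}

# For a bgzipped VCF, decompress on the fly:
# gzip -dc input.vcf.gz | list_chroms
```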

Step2 error: "Error in t(B^2) %*% mu21 : non-conformable arguments"

Hi!

I have discovered a new problem using SAIGE version 0.26.6 in some of the phenotypes, running step 2.

This is the last part of the log-file:

numPassMarker: 48
user system elapsed
6.748 0.236 6.978
[1] 76
numPassMarker: 48
Error in t(B^2) %*% mu21 : non-conformable arguments
Calls: SPAGMMATtest ... rbind -> scoreTest_SAIGE_binaryTrait -> Score_Test_Sparse
Execution halted

This is the code:

Rscript /mnt/work/R_packages/x86_64-pc-linux-gnu-library/3.4/SAIGE/extdata/step2_SPAtests.R \
  --dosageFile=/path/to/genofile.dose \
  --dosageFileNrowSkip=0 \
  --dosageFileNcolSkip=5 \
  --dosageFilecolnamesSkip=CHR,SNP,POS,A1,A2 \
  --minMAF=0.0001 \
  --sampleFile=/path/to/samplefile.txt \
  --GMMATmodelFile=phenotype.rda \
  --varianceRatioFile=phenotype.varianceRatio.txt \
  --SAIGEOutputFile=phenotype_results.txt \
  --numLinesOutput=2 \
  --IsOutputAFinCaseCtrl=TRUE

Running step 1 did not show any errors.
I have run the phenotypes with an older SAIGE version previously with no errors.
I am also running the script in parallel, with most of the phenotypes working, so the code itself seems to be ok.

Googling has given me no answers to the problem.
Do you know what can cause this?

Thanks a lot for any help!

Sigrid

Problem with reading vcf

Hi there,

Thanks for developing this software. I liked your paper and was trying to test it on some of my VCFs.

I realised that the second step (SPAtests) loops through all variants in the input (in my case a genotype VCF), conditional on the "isVariant" variable being true:

SAIGE/R/SAIGE_SPATest.R, line 360 in f50adcf:

while(isVariant){

which is from here:

SAIGE/src/SAIGE_readDosage_vcf.cpp, line 77 in c65f602:

bool getGenoOfnthVar_vcfDosage_pre(){

Some variants in my VCF are skipped because the function does not classify them as variants. It could be that my VCF is not properly formatted. Could you please point me to the exact criteria that determine whether an entry is a variant?

Many thanks,
Zhihao

Unable to run SAIGE with BGEN files

Hi Wei,

I've been unable to get SAIGE to run properly with BGEN files (version 1.2 with 8 bit compression).
I receive the following error:

Error in getDosage_bgen_noquery() : BGenError
Calls: SPAGMMATtest -> getDosage_bgen_noquery -> .Call
Execution halted

Running SAIGE with BGEN does produce a SAIGE.txt output file, but the p-values of positive controls in the SAIGE.txt file do not align at all with results from a logistic model. Do you know what might be causing this?

Thanks for any help,
Josh

Load SAIGE package error

I am getting an error when attempting to load the SAIGE library in R. I am using the latest package version SAIGE_0.29.4.2_R_x86_64-pc-linux-gnu.tar.gz in R version 3.5.1.

The package installs successfully, however when I attempt to load the package I get the following error:

> library(SAIGE)
Error: package or namespace load failed for ‘SAIGE’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/site-library/SAIGE/libs/SAIGE.so':
  libhts.so.2: cannot open shared object file: No such file or directory

I have installed all required dependencies successfully as well.
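
A quick way to see which shared libraries the dynamic loader cannot resolve is to run `ldd` on the installed SAIGE.so. This is a minimal diagnostic sketch for Linux; `parse_missing` and `missing_libs` are hypothetical helper names, and the path in the usage line is the one from the error message above.

```shell
# Extract the library names that ldd output marks as "not found".
parse_missing() {
  awk '/not found/ { print $1 }'
}

# Print unresolved shared libraries for the given binary or shared object.
missing_libs() {
  ldd "$1" 2>/dev/null | parse_missing
}

# Usage with the path from the error message:
# missing_libs /usr/local/lib/R/site-library/SAIGE/libs/SAIGE.so
```

If libhts.so.2 is listed, the library is either not installed or not on the loader's search path (for example LD_LIBRARY_PATH).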

missing data in step 2

I would like to use SAIGE with whole-exome data. The data are in plink format and contain missing values. I performed step 1 with no issues. For step 2, I understand I cannot use plink files. I reformatted my files into bgen format, but SAIGE complains due to the missing values. I am now reformatting my data into vcf, as I believe it's the only option for hard call genotypes for step 2? Would vcf files with missing data work for step 2 please? If not, do you have any suggestions? Many thanks!

fitNULLGLMM.R error

Hello,

I am very interested in running SAIGE, and I have managed to install it successfully, but when I try to run it with my data, I get the following error:

$ Rscript ~/Software/SAIGE-master/extdata/step1_fitNULLGLMM.R --plinkFile=./snps.unique_seg.maf_0.05 --phenoFile=./snps.phe --phenoCol=y --sampleIDColinphenoFile=IID --traitType=binary --outputPrefix=./saige_output_test --nThreads=8
Warning message:
package ‘SAIGE’ was built under R version 3.4.3 
Loading required package: optparse
$plinkFile
[1] "./snps.unique_seg.maf_0.05"

$phenoFile
[1] "./snps.phe"

$phenoCol
[1] "y"

$covarColList
[1] ""

$sampleIDColinphenoFile
[1] "IID"

$centerVariables
[1] ""

$skipModelFitting
[1] FALSE

$traitType
[1] "binary"

$outputPrefix
[1] "./saige_output_test"

$numMarkers
[1] 30

$nThreads
[1] 8

$invNormalize
[1] FALSE

$help
[1] FALSE

8  threads are set to be used  
155  samples have genotypes
formula is  y~ 
Error in parse(text = x, keep.source = FALSE) : 
  <text>:2:0: unexpected end of input
1: y~
   ^
Calls: fitNULLGLMM ... formula -> formula.character -> formula -> eval -> parse
Execution halted

As far as I can tell, my data is formatted properly. Unfortunately, I cannot figure out what the problem might be. I have 155 samples with phenotypes denoted "0" or "1", with no covariates.

However, when I tried running it with an artificial covariate, I instead got the following error message (perhaps because the covariate is just random numbers? But I don't have any covariates to include in the true analysis):

Error in if (max(abs(tau - tau0)/(abs(tau) + abs(tau0) + tol)) < tol) break : 
  missing value where TRUE/FALSE needed
Calls: fitNULLGLMM -> system.time -> glmmkin.ai_PCG_Rcpp_Binary
In addition: Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: algorithm did not converge 
Timing stopped at: 11.22 3.952 2.234
Execution halted

It may be difficult to troubleshoot without me giving more info - I can provide more information as needed, but am not sure what to provide up front. Note that I am using R 3.4.1, but I doubt this would be the issue.

Any help would be greatly appreciated!

Thank you,
Conrad Izydorczyk

Where did centerVariables in fitNULLGLMM go? (SAIGE 0.25)

I used to center age: how do I do that in the new SAIGE (0.25)?

The option seems to be gone.

running SAIGE per chromosome

Hi, I am wondering if it would be possible to run SAIGE per chromosome when LOCO is set to FALSE?
I am asking because my genotype and imputed dosage files are already broken down per chromosome and to have to merge them seems a little cumbersome and the files are huge.
Can you please clarify?
Thanks!
Ruth

Could you please make a repo for code?

I'd like to contribute PRs :)

Categorical covariates not working yet, but including sex is okay?

I do not understand this section from the fitNULLGLMM R docs:

covarColList: vector of characters. Covariates to be used in the glm
          model e.g c("Sex", "Age")

qCovarCol: vector of characters. Categorical covariates to be used in
          the glm model (NOT work yet)

Isn't sex a categorical variable?

Source code for SAIGE

Hi,
I'm having a glibc problem with installing SAIGE.

Error: package or namespace load failed for ‘SAIGE’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/usr/local/apps/R/gcc_6.2.0/library/SAIGE/libs/SAIGE.so':
/lib64/libc.so.6: version `GLIBC_2.17' not found (required by /usr/local/apps/R/gcc_6.2.0/library/SAIGE/libs/SAIGE.so)
Error: loading failed
Execution halted
ERROR: loading fail

Is it possible to make the source available so that we can compile it on our systems?

Sylvia

Unable to load shared object /home/endrebak/anaconda3/lib/R/library/SAIGE/libs/SAIGE.so

endrebak@hunt-genes-home ~/local> R CMD INSTALL SAIGE_0.18.tar.gz
...
...
...
installing to /home/endrebak/anaconda3/lib/R/library/SAIGE/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/home/endrebak/anaconda3/lib/R/library/SAIGE/libs/SAIGE.so':
  /home/endrebak/anaconda3/lib/R/library/SAIGE/libs/SAIGE.so: undefined symbol: _ZN7genfile4bgen4View6createERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Error: loading failed
Execution halted
ERROR: loading failed
* removing '/home/endrebak/anaconda3/lib/R/library/SAIGE'

Suggestions for how to fix this?

What might the causes of this error message be? (Error in setgeno(genofile, subSampleInGeno, memoryChunk) : vector::_M_range_check)

SAIGE 0.26

I'd like to debug this, but I have been unable to. Where could I start? If you could add a verbose option to SAIGE that would be great :)

During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning message:
package 'SAIGE' was built under R version 3.4.3
48  threads are set to be used
69716  samples have genotypes
formula is  vte2~Sex+birthyear+batch+PC1+PC2+PC3+PC4
56150  samples have non-missing phenotypes
13566  samples in geno file do not have phenotypes
56150  samples will be used for analysis
colnames(data.new) is  Y minus1 Sex birthyear batch PC1 PC2 PC3 PC4
out.transform$Param.transform$qrr:  8 8
vte2  is a binary trait

Call:  glm(formula = formula.new, family = binomial, data = data.new)

Coefficients:
   minus1        Sex  birthyear      batch        PC1        PC2        PC3
  4.56768    0.02913    0.83645    0.04198    0.01863   -0.07889   -0.04670
      PC4
  0.01134

Degrees of Freedom: 56150 Total (i.e. Null);  56142 Residual
Null Deviance:      77840
Residual Deviance: 7942         AIC: 7958
[1] "Start reading genotype plink file here"
nbyte: 17429
nbyte: 14038
reserve: 3506476032

M: 249749, N: 69716
size of genoVecofPointers: 2
here
Error in setgeno(genofile, subSampleInGeno, memoryChunk) :
  vector::_M_range_check
Calls: fitNULLGLMM ... glmmkin.ai_PCG_Rcpp_Binary -> system.time -> setgeno -> .Call
Timing stopped at: 275.632 4.676 280.411
Timing stopped at: 275.768 4.676 280.547
Execution halted

If you have no idea I understand. Then I'll just try again when SAIGE 0.27 is out :)

Floating point exception(core dumped)

I receive this error when trying to run Step 1 with LOCO=TRUE.

Leave-one-chromosome-out is activated
chromosomeStartIndexVec: 0 3922 5051 7056 9077 10853 12642 13835 14336 17521 20104 22092 24181 26491 28662 30754 32575 33590 35109 36783 38618 39973
11115 samples have genotypes
formula is phen~ 1
11115 samples have non-missing phenotypes
11115 samples will be used for analysis
colnames(data.new) is Y minus1
out.transform$Param.transform$qrr: 1 1
phen is a binary trait

Call: glm(formula = formula.new, family = binomial, data = data.new)

Coefficients:
minus1
1.461

Degrees of Freedom: 11115 Total (i.e. Null); 11114 Residual
Null Deviance: 15410
Residual Deviance: 10750 AIC: 10760
[1] "Start reading genotype plink file here"
nbyte: 2779
nbyte: 2779
reserve: 115550544
/var/tmp/slurmd/job10762729/slurm_script: line 23: 13914 Floating point exception(core dumped) Rscript Step1_new.R --plinkFile=plinkfile --phenoFile=file.phen --phenoCol=phen --sampleIDColinphenoFile=id --numMarkers=100 --traitType=binary --outputPrefix=output --nThreads=1

Input is as follows:

Rscript Step1_new.R \
	--plinkFile=plinkfile \
	--phenoFile=file.phen \
	--phenoCol=phen \
	--sampleIDColinphenoFile=id \
	--numMarkers=100 \
	--traitType=binary \
	--outputPrefix=output \
	--nThreads=1

When the phenotype file contains an index column, SAIGE cannot read the file

When the phenofile looks like:

1 Whatever
2 Blabla
3 Spoon
4 Swordfish
5 .......

instead of

Whatever
Blabla
Spoon
Swordfish
.......

SAIGE exits with the error:

During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning message:
package 'SAIGE' was built under R version 3.4.3
48  threads are set to be used
69716  samples have genotypes
Error in fitNULLGLMM(plinkFile = plinkFile, phenoFile = phenoFile, phenoCol = phenoCol,  :
  ERROR! column for pheno.all does not exsit in the phenoFile
In addition: Warning message:
In data.table:::fread(phenoFile, header = T, stringsAsFactors = FALSE) :
  Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: FID	IID	PATID	MATID	Sex	Age	BirthYear	batch	PC1	PC2PC3	PC4	pheno.all	pheno.cases	pheno.controls
Execution halted

Would you consider making SAIGE accept both? It is easy to fix this in my pipeline, but I guess many others will have the same problem :)
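
The fread warning above is what appears when the data rows carry one more field than the header, as happens when row names are written without a matching header entry. A minimal pre-flight check, assuming a whitespace-delimited phenotype file (`check_pheno_header` is a hypothetical name), compares the field counts of the first two lines:

```shell
# Compare the number of fields in the header with the first data row.
# Prints a diagnostic and returns non-zero when they differ.
check_pheno_header() {
  awk 'NR == 1 { header = NF }
       NR == 2 { data = NF; exit }
       END {
         if (header != data) {
           printf "header has %d fields, first row has %d\n", header, data
           exit 1
         }
       }' "$1"
}
```

When the file is produced in R, writing it with `write.table(..., row.names = FALSE)` avoids the extra column in the first place.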

Running step2 on subset of dosage file

Hi there,

I want to run SAIGE on some binary traits from the UKBB and wanted to know whether it's possible to perform step 2 (association testing) on only a subset of samples and variants from an existing bgen file. Is this already possible, or do we have to subset our dosage file to the samples and variants of interest for every binary trait we want to test?

Thank you for the help,
Best,

Florian

library error

I have installed the SAIGE package in R successfully. However, when I load the package with library(), I get this error:

Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/home/yeda.wu/R/x86_64-pc-linux-gnu-library/3.4/SAIGE/libs/SAIGE.so':
liblapack.so.3: cannot open shared object file: No such file or directory
In addition: Warning message:
package 'SAIGE' was built under R version 3.4.3
Error: package or namespace load failed for ‘SAIGE’

I am not sure what’s going on here. Could you please give me some advice?

Thanks in advance.

calculating odds ratio

Hi,

How can we convert the beta from step 2 to odds ratio?

The exp(beta) gave me values from 1 to 30 for the top 10 hits.

Thanks,
Amy
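
For a logistic model the odds ratio is conventionally exp(beta), so an exp(beta) of 30 corresponds to a beta of about 3.4. The sketch below appends an OR column to a whitespace-delimited results file read from stdin. The column name `BETA` is an assumption about the step 2 output header, and `add_odds_ratio` is a hypothetical helper.

```shell
# Append an OR = exp(BETA) column to a whitespace-delimited table with a
# header row. Reads stdin, writes stdout; fails if no BETA column exists.
add_odds_ratio() {
  awk 'NR == 1 {
         for (i = 1; i <= NF; i++) if ($i == "BETA") col = i
         if (!col) { print "no BETA column found" > "/dev/stderr"; exit 1 }
         print $0, "OR"
         next
       }
       { printf "%s %.4f\n", $0, exp($col) }'
}

# Hypothetical usage:
# add_odds_ratio < results.txt > results_with_or.txt
```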

double free or corruption error

Hi there,

I am getting an error message of "R: double free or corruption (!prev)" - full log pasted at the bottom. I was wondering if you have encountered this error before?

Thanks,
Zhihao

486801 samples have genotypes
formula is age_at_menopause~age+chip+pc1+pc2+pc3+pc4+pc5
34080 samples have non-missing phenotypes
452737 samples in geno file do not have phenotypes
34064 samples will be used for analysis
colnames(data.new) is Y minus1 age chip pc1 pc2 pc3 pc4 pc5
out.transform$Param.transform$qrr: 8 8
age_at_menopause is a quantitative trait
[1] "Start reading genotype plink file here"
nbyte: 121701
nbyte: 8516
reserve: 85180000

M: 10000, N: 486801
size of genoVecofPointers: 1
here
time: 22784.1
[1] "Genotype reading is done"
Fixed-effect coefficients:
minus1 age chip pc1 pc2
-3.899318e+00 1.565513e-02 -4.877275e-03 -6.925518e-04 2.302982e-04
pc3 pc4 pc5
8.042034e-06 3.747540e-03 -1.672029e-03
inital tau is 0.005859873 0.005859873
[1] "ok1"
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 22
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 25
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 28
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 28
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 27
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 26
iter from getPCG1ofSigmaAndVector 33
A1(0,0) 1.39036e+09
iter from getPCG1ofSigmaAndVector 29
AI 1.3904e+09 3.1515e+07
3.1515e+07 7.1666e+07

Trace 4.6066e+06
1.2052e+06

YPAPY 604621
cov 1.7203e-07 1.2404e-14 8.0786e-14 -1.1885e-14 -3.0021e-14 1.6740e-14 4.4670e-14 -2.6604e-14
-1.3932e-14 2.1842e-07 -4.2133e-08 -1.3894e-09 -6.2503e-11 1.2665e-09 7.5174e-09 -7.4913e-09
-7.6886e-15 -4.2132e-08 1.5268e-06 2.0266e-11 1.3647e-09 1.5490e-09 -1.1190e-10 5.7976e-09
7.3620e-15 -1.3894e-09 1.9793e-11 2.3279e-07 1.7434e-09 -9.7932e-09 -3.2563e-08 2.3300e-08
-2.2561e-15 -6.2528e-11 1.3653e-09 1.7434e-09 2.2588e-07 -2.7287e-09 -1.3776e-08 1.2486e-08
1.5910e-14 1.2665e-09 1.5491e-09 -9.7932e-09 -2.7287e-09 2.3176e-07 2.7183e-08 -1.9142e-08
1.9238e-14 7.5174e-09 -1.1164e-10 -3.2563e-08 -1.3776e-08 2.7183e-08 3.3390e-07 -9.0709e-08
-2.7797e-15 -7.4913e-09 5.7974e-09 2.3300e-08 1.2486e-08 -1.9142e-08 -9.0709e-08 3.1626e-07

tauv3 0.009632391 0.005254446

Iteration 1 :
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 16
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 18
iter from getPCG1ofSigmaAndVector 18
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 18
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 18
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 18
iter from getPCG1ofSigmaAndVector 20
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 19
iter from getPCG1ofSigmaAndVector 22
A1(0,0) 3.19269e+08
iter from getPCG1ofSigmaAndVector 21
AI 3.1927e+08 2.0894e+07
2.0894e+07 5.4824e+07

Trace 2.9273e+06
1.1153e+06

YPAPY 489333
cov 2.8277e-07 -3.5684e-15 7.2558e-14 5.8791e-14 -1.5673e-14 -1.1703e-14 2.4636e-14 -1.9592e-14
-3.1248e-14 3.4329e-07 -5.3851e-08 -1.8603e-09 -1.0478e-10 1.7017e-09 9.8410e-09 -9.7593e-09
1.7753e-13 -5.3851e-08 2.0257e-06 2.8053e-10 1.7084e-09 1.9096e-09 1.6526e-10 8.1851e-09
-4.9256e-15 -1.8603e-09 2.8038e-10 3.6227e-07 2.2353e-09 -1.2903e-08 -4.2687e-08 3.0489e-08
-1.7443e-14 -1.0474e-10 1.7068e-09 2.2354e-09 3.5313e-07 -3.5739e-09 -1.8037e-08 1.6416e-08
-1.1821e-15 1.7017e-09 1.9100e-09 -1.2903e-08 -3.5739e-09 3.6069e-07 3.5759e-08 -2.5054e-08
2.3638e-14 9.8410e-09 1.6807e-10 -4.2687e-08 -1.8037e-08 3.5759e-08 4.9483e-07 -1.1870e-07
-7.8719e-15 -9.7593e-09 8.1840e-09 3.0489e-08 1.6416e-08 -2.5054e-08 -1.1870e-07 4.7153e-07

0,03.19269e+08
0,12.08942e+07
1,02.08942e+07
1,15.48242e+07
tau2 0 0
Variance component estimates:
[1] 0 0
Fixed-effect coefficients:
[1] -3.898816e+00 1.555553e-02 -4.100984e-03 -7.062043e-04 2.671797e-04
[6] 3.221108e-05 3.851775e-03 -1.980942e-03

Iteration 2 :
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 0
A1(0,0) -nan
iter from getPCG1ofSigmaAndVector 0
AI nan nan
nan nan

Trace nan
nan

YPAPY -nan
cov nan nan nan nan nan nan nan 0
0 nan nan nan nan nan nan 0
2.7816e-42 nan nan nan nan nan nan 0
0 nan nan nan nan nan nan 0
3.3637e-34 nan nan nan nan nan nan 0
0 nan nan nan nan nan nan 0
2.0000e+00 nan nan nan nan nan nan 0
0 nan nan nan nan nan nan nan
*** Error in `/path/to/executable/R': double free or corruption (!prev): 0x0000000007e091d0 ***

Can't open archive

I saw your poster at ASHG and I'd like to try out SAIGE, but can't seem to open the archive. When I try running gunzip, it returns an error that the file is not in gzip format.

Thanks,
-Jonathan

Error in if (MAF >= testMinMAF & markerInfo >= minInfo)

Hi,

I am running SAIGE on ~700 GWAS samples. I got the error message from step2. Any help would be appreciated.

Many thanks,
Wei Wei

.
.
.
[1] 6728
numPassMarker: 973
user system elapsed
5.998 0.836 10.529
[1] 6730
numPassMarker: 973
Error in if (MAF >= testMinMAF & markerInfo >= minInfo) { :
missing value where TRUE/FALSE needed
Calls: SPAGMMATtest
Execution halted

Install fails with "Error in getOctD(x, offset, len) : invalid octal digit"

Hi,

Trying to install this package. It fails with the exact same error on CentOS 6.9 (R 3.5.0, also tried R 3.4.0, R 3.3.1) and Ubuntu 14.04 (tried R 3.5.0).

The output:

buildhost<14:03:05> R CMD INSTALL SAIGE_0.29.1_R_x86_64-pc-linux-gnu.tar.gz
Error in getOctD(x, offset, len) : invalid octal digit
buildhost<14:03:09>

Thanks!

terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc

During startup - Warning message:
Setting LC_CTYPE failed, using "C"
Warning message:
package 'SAIGE' was built under R version 3.4.3
24  threads are set to be used
69716  samples have genotypes
formula is  LDL~Sex+batch+BirthYear+PC1+PC2+PC3+PC4
54888  samples have non-missing phenotypes
14828  samples in geno file do not have phenotypes
54888  samples will be used for analysis
colnames(data.new) is  Y minus1 Sex batch BirthYear PC1 PC2 PC3 PC4
out.transform$Param.transform$qrr:  8 8
LDL  is a quantitative trait
[1] "Start reading genotype plink file here"
nbyte: 17429
nbyte: 13722
reserve: 3427555328

M: 249749, N: 69716
size of genoVecofPointers: 2
here
2terminate called after throwing an instance of 'std::bad_alloc'
 what():  std::bad_alloc

Do you have an idea what might cause this? We're using 0.26. The error above occurred with memoryChunk=2; we have since restarted the run with memoryChunk=0.5.
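For anyone sizing memory for step 1: the `nbyte` values in the log above match what a PLINK-style 2-bit genotype encoding would need (4 samples per byte), and markers times bytes per marker lands within about 0.02% of the logged `reserve` value. The sketch below reproduces that arithmetic. The 2-bit packing is an assumption read off the numbers, and the function names are made up for illustration; they are not part of SAIGE.

```shell
# Assumption: genotypes are packed 2 bits each, i.e. 4 samples per byte.

# Bytes needed to hold one marker for N samples: ceil(N / 4).
packed_bytes_per_marker() {
  echo $(( ($1 + 3) / 4 ))
}

# Approximate bytes for M markers ($1) over N samples ($2).
packed_bytes_total() {
  echo $(( $1 * ( ($2 + 3) / 4 ) ))
}

packed_bytes_per_marker 69716      # all genotyped samples
packed_bytes_per_marker 54888      # samples used in the analysis
packed_bytes_total 249749 54888    # M markers x analysed samples
```

This prints 17429, 13722 and 3427055778, so the packed genotypes alone would take roughly 3.4 GB before any working memory is counted.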

Installing SAIGE fails

Using R 3.4.2 with all dependencies installed, a linker step fails with "cannot find -lboost_iostreams". That library should be provided by the install package (-L../bgen/3rd_party/boost_1_55_0/boost/iostreams) or built from the sources in the install package. Could the build of the library have failed?

Permit separate .bim/.bed/.fam files

It would be ideal to allow the end-user to specify separate .bim/.bed/.fam files, rather than requiring that they all be at the same path with the same prefix. There are plenty of instances where that assumption falls down. (For example, with UK Biobank datasets shared at an institution level, the .bim/.bed files will be separate from the .fam files.)

Could you please put all gz files in archive?

When you move the new versions to archive/ it breaks my build scripts. The best practice is to have a fixed link to each version that is never changed.

Thanks.

Update cmd.sh examples to reflect minMAC limitations

In the README you note, "Since the SPA test always provides close to 0 p-values for variants with MAC < 3, please use at least minMAC = 3 to filter out the results"

In the cmd.sh examples, however, every example for which minMAC is specified has it set to 1. Might be good to update cmd.sh to reflect the README. I'd be happy to submit a pull request. (Or, you could consider programmatically blocking any minMAC < 3 if the results are never going to be sensible.)
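To see how a MAC cutoff compares with a MAF cutoff, recall that for N diploid samples MAC = 2 × N × MAF. The sketch below turns a MAC threshold into the equivalent MAF for a given sample size. `maf_for_mac` is a hypothetical helper, not something shipped with SAIGE, and the sample size in the usage line is purely illustrative.

```shell
# MAF at which a variant reaches a given minor allele count in N diploid samples:
# MAF = MAC / (2 * N).  $1 = MAC threshold, $2 = number of samples.
maf_for_mac() {
  awk -v mac="$1" -v n="$2" 'BEGIN { printf "%.6g\n", mac / (2 * n) }'
}

maf_for_mac 3 1500   # illustrative cohort of 1500 samples
```

For 1500 samples this prints 0.001, so in a cohort of that size a minMAF of 1e-04 would still let through variants with MAC < 3 unless minMAC is raised as well.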

Possible to speed up using speedglm instead of glm?

For logistic regression on UK Biobank sized data sets with many covariates (e.g., all 40 PCs), speedglm is much faster to compute than glm. (In my experience, the speedup is at least a few orders of magnitude.) Because of the symlink issue in #41, I don't think I can try to rebuild this, but I wonder if you'd consider testing a build using speedglm to see if it suits your needs.

UK biobank analysis

Hi,

I have been trying to analyse a binary trait in the UK Biobank first release, but I keep getting this error:
[1] "Genotype reading is done"
inital tau is 1 0.5
iGet_Coef: 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1
iter from getPCG1ofSigmaAndVector 1

error: inv_sympd(): matrix is singular or not positive definite
Error in getCoefficients(Y, X, W, tau, maxiter = maxiterPCG, tol = tolPCG) :
inv_sympd(): matrix is singular or not positive definite
Timing stopped at: 1.502 0.1 1.955

Apparently the kinship matrix is non positive definite or is singular.
I have tried to use only 10k random SNPs, change the number of people, removing relatives etc.
I have already used these samples with GCTA so I am relatively confident this should not happen.
I was wondering if there is a way to feed it a previously estimated kinship matrix which I already know is invertible.

Thanks a lot

Nicola

Singular matrix error

Hi there,

I'm getting singular matrix problems from (I think) this line:

SAIGE/src/SAIGE_fitGLMM_fast.cpp

Line 889 in 9820dc2

arma::fmat cov = inv_sympd(Xmatt * Sigma_iX);

I am running it on a UK Biobank phenotype. The detailed error message is shown at the bottom.
Any help would be appreciated.

Thanks,
Zhihao

486801 samples have genotypes
formula is hypertension_medicated~body_mass_index+pc3+chip+pc2+pc4+pc1+age+pc5+sex+blood_pressure_medication
91215 samples have non-missing phenotypes
395626 samples in geno file do not have phenotypes
91175 samples will be used for analysis
colnames(data.new) is Y 1 body_mass_index pc3 chip pc2 pc4 pc1 age pc5 sex blood_pressure_medication
out.transform$Param.transform$qrr: 11 11
hypertension_medicated is a binary trait

 Call:  glm(formula = formula.new, family = binomial, data = data.new)

 Coefficients:
	   (Intercept)            body_mass_index
	     -24.40748                   -2.13629
		   pc3                       chip
	       0.14096                   -0.04922
		   pc2                        pc4
	      -0.03856                   -0.09343
		   pc1                        age
	       0.09213                    2.14034
		   pc5                        sex
	      -0.09750                    1.65620
 blood_pressure_medication
	      10.33080

 Degrees of Freedom: 91174 Total (i.e. Null);  91164 Residual
 Null Deviance:      62940
 Residual Deviance: 3940         AIC: 3962
 [1] "Start reading genotype plink file here"

 M: 10000, N: 486801
 0.0356183 0.00524815 0.0573787 0.0422704 0.0161503 0.056419 0.0380532 0.0499534 0.0246614 0.173924 0.0590019 0.0328434 0.0568632 0.0194681 0.0357774 0.000433233 0.0169783 0.196808 0.0223252 0.0615794 0.151456 0.0164409 0.0504086 0.0905895 0.0220126 0.0583932 0.00920208 0.391917 0.0197148 0.0419468 0.0351412 0.0826981 0.101892 0.2047 5.48396e-06 0.0134302 0.0203565 5.48396e-06 0.236501 0.0108253 0.0596874 0.00540718 5.48396e-06 0.0576638 0.221843 0.314116 0.0199452 0.0222704 0.0188539 0.0898821 0.151429 0.0376309 0.0303482 0.202764 0.280011 0.0270304 0.0276885 0.0379874 0.143855 0.0341431 0.0151522 0.388188 0.0336551 0.12164 0.050244 0.0571703 0.0311544 0.195585 0.0843323 0.0181135 0.0305292 0.0310995 0.126882 0.0520976 0.0198684 0.00971758 0.00109679 0.0340609 8.22594e-05 5.48396e-06 0.045994 0.0153825 5.48396e-05 0.0845462 0.159644 0.147584 0.229092 0.0432904 0.338207 0.496052 0.172218 7.12915e-05 0.0821223 0.3498 0.0433507 0.314176 0.0531615 0.0511818 0.0248643 5.48396e-06
 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 M = 10000
 N = 486801
 time: 28979.6
 [1] "Genotype reading is done"
 iGet_Coef:  1
 iter from getPCG1ofSigmaAndVector 5
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4
 iter from getPCG1ofSigmaAndVector 4

 error: inv_sympd(): matrix is singular or not positive definite
 Timing stopped at: 254.244 14.74 50.065

 Stderr:
 Warning message:
 package ‘SAIGE’ was built under R version 3.4.3
 Loading required package: optparse
 Error in getCoefficients(Y, X, W, tau, maxiter = maxiterPCG, tol = tolPCG) :
   inv_sympd(): matrix is singular or not positive definite
 Calls: fitNULLGLMM ... glmmkin.ai_PCG_Rcpp_Binary -> Get_Coef -> getCoefficients -> .Call
 In addition: Warning messages:
 1: glm.fit: fitted probabilities numerically 0 or 1 occurred
 2: glm.fit: fitted probabilities numerically 0 or 1 occurred
 Execution halted
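One possible reading of the log above (not a confirmed diagnosis): the `glm.fit: fitted probabilities numerically 0 or 1 occurred` warnings, the large coefficient on `blood_pressure_medication` and the drop in deviance from 62940 to 3940 suggest that a covariate almost determines the phenotype. A quick way to check is to cross-tabulate the phenotype against the suspect covariate and look for empty or near-empty cells. The sketch below assumes a whitespace-delimited phenotype file with a header row; `crosstab` and the file name in the usage comment are hypothetical.

```shell
# Count rows for each (column A value, column B value) pair.
# $1 = file with a header row, $2 = name of column A, $3 = name of column B.
crosstab() {
  awk -v a="$2" -v b="$3" '
    NR == 1 {
      # Locate the two requested columns by name in the header.
      for (i = 1; i <= NF; i++) {
        if ($i == a) ca = i
        if ($i == b) cb = i
      }
      if (!ca || !cb) { print "column not found in header" > "/dev/stderr"; bad = 1; exit 1 }
      next
    }
    { n[$ca " " $cb]++ }
    END {
      if (bad) exit 1
      for (k in n) print k, n[k]
    }
  ' "$1" | LC_ALL=C sort
}

# Usage (pheno.txt is a placeholder for your own phenotype file):
# crosstab pheno.txt hypertension_medicated blood_pressure_medication
```

Each output line is "A-value B-value count". If one combination is missing or has only a handful of rows, dropping or recoding that covariate before step 1 may be worth trying.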

Error running step 2 with VCF

My script is below. How do I solve the error shown after it? I do not specify a dosage file, but the default settings seem to look for one.

Rscript step2_SPAtests.R \
        --vcfFile=VCF.file.vcf \
        --vcfField=GT \
        --chrom=$number \
        --minMAF=0.0001 \
        --minMAC=1 \
        --GMMATmodelFile=file.chr$number.rda \
        --varianceRatioFile=file.chr$number.varianceRatio.txt \
        --SAIGEOutputFile=out.file.chr$number \
        --numLinesOutput=2 \
        --IsOutputAFinCaseCtrl=TRUE

I get this error:

Warning message:
package ‘SAIGE’ was built under R version 3.4.3
$dosageFile
[1] ""

$dosageFileNrowSkip
[1] 0

$dosageFileNcolSkip
[1] 5

$dosageFilecolnamesSkip
[1] ""

$vcfFile
[1] "file.vcf"

$vcfFileIndex
[1] ""

$vcfField
[1] "GT"

$bgenFile
[1] ""

$bgenFileIndex
[1] ""

$savFile
[1] ""

$savFileIndex
[1] ""

$chrom
[1] "1"

$start
[1] 1

$end
[1] 2.5e+08

$minMAF
[1] 1e-04

$minMAC
[1] 1

$sampleFile
[1] ""

$GMMATmodelFile
[1] "/mnt/users/masincla/aquagen_data/SAIGE/output/file.rda"

$varianceRatioFile
[1] "/mnt/users/masincla/aquagen_data/SAIGE/output/file.varianceRatio.txt"

$SAIGEOutputFile
[1] "output/file"

$numLinesOutput
[1] 2

$IsOutputAFinCaseCtrl
[1] TRUE

$help
[1] FALSE

Error in try(if (length(which(opt == "")) > 0) stop("Missing arguments")) :
Missing arguments
11115 samples have been used to fit the glmm null model
variance Ratio is 0.222
Error in SPAGMMATtest(dosageFile = opt$dosageFile, dosageFileNrowSkip = opt$dosageFileNrowSkip, :
ERROR! sampleFile does not exsit
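The log above ends with `ERROR! sampleFile does not exsit` while `$sampleFile` is printed as an empty string, so at least one input the script wanted was never supplied. A small pre-flight check in the wrapper script can surface that before R starts. This is only a sketch: `check_inputs` is a hypothetical helper, and which files step 2 actually requires for VCF input is an assumption to confirm against the SAIGE documentation.

```shell
# Report every argument that is empty or is not an existing regular file.
# Returns 0 when all inputs are present, 1 otherwise.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ -z "$f" ] || [ ! -f "$f" ]; then
      echo "missing input: '$f'" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage in a wrapper script (variable names are placeholders):
# check_inputs "$vcf_file" "$sample_file" "$model_file" "$variance_ratio_file" || exit 1
```

Running the check before `Rscript step2_SPAtests.R` makes an unset or mistyped path fail immediately with the offending name, instead of partway through the R script.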

Step 1 error

Hi Wei,

I have tried to run SAIGE on a large data set with 340,000 individuals and it showed the following error message:

Warning message:
package ‘SAIGE’ was built under R version 3.4.4
Loading required package: optparse
$plinkFile
[1] "/hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/data/ukb_v3_imp_unrelated_hwe10_ALL_HQ_genotyped_merged_cleaned_pruned"

$phenoFile
[1] "/hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/data/ukbb_n340409_qtnormBMI.phe.cov"

$phenoCol
[1] "binaryBMI"

$traitType
[1] "binary"

$invNormalize
[1] FALSE

$covarColList
[1] "PC1,PC2,PC3,PC4,PC5,PC6,PC5,PC6,PC7,PC8,PC9,PC10"

$sampleIDColinphenoFile
[1] "IID"

$numMarkers
[1] 30

$nThreads
[1] 24

$skipModelFitting
[1] FALSE

$traceCVcutoff
[1] 1

$ratioCVcutoff
[1] 1

$LOCO
[1] TRUE

$outputPrefix
[1] "/hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/results/UKBB_SAIGE_step1_test"

$help
[1] FALSE

24 threads are set to be used
Leave-one-chromosome-out is activated
chromosomeStartIndexVec: 0 28088 57702 83499 108995 131872 157626 178886 198156 214035 232879 251734 269521 283027 295004 305462 316430 326497 337098 345371 353896 359145
487409 samples have genotypes
formula is binaryBMI~PC1+PC2+PC3+PC4+PC5+PC6+PC5+PC6+PC7+PC8+PC9+PC10
340409 samples have non-missing phenotypes
147006 samples in geno file do not have phenotypes
340403 samples will be used for analysis
colnames(data.new) is Y minus1 PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
out.transform$Param.transform$qrr: 11 11
binaryBMI is a binary trait

Call: glm(formula = formula.new, family = binomial, data = data.new)

Coefficients:
minus1 PC1 PC2 PC3 PC4 PC5 PC6
3.068310 -0.016061 -0.012399 0.001987 0.059132 -0.043205 -0.016568
PC7 PC8 PC9 PC10
0.010491 -0.001013 0.021151 0.007057

Degrees of Freedom: 340403 Total (i.e. Null); 340392 Residual
Null Deviance: 471900
Residual Deviance: 123900 AIC: 124000
[1] "Start reading genotype plink file here"
nbyte: 121853
nbyte: 85101
reserve: 31015448576

M: 364446, N: 487409
size of genoVecofPointers: 23
here
setgeno mark1
setgeno mark2
setgeno mark5
setgeno mark6
time: 2.10495e+06
[1] "Genotype reading is done"
inital tau is 1 0.5
iGet_Coef: 1
iter from getPCG1ofSigmaAndVector 15
iter from getPCG1ofSigmaAndVector 12
iter from getPCG1ofSigmaAndVector 14
iter from getPCG1ofSigmaAndVector 13
iter from getPCG1ofSigmaAndVector 14
iter from getPCG1ofSigmaAndVector 15
iter from getPCG1ofSigmaAndVector 15
iter from getPCG1ofSigmaAndVector 13
iter from getPCG1ofSigmaAndVector 14
iter from getPCG1ofSigmaAndVector 13
iter from getPCG1ofSigmaAndVector 14
iter from getPCG1ofSigmaAndVector 13
Tau:[1] 1.0 0.5
Fixed-effect coefficients:
[,1]
[1,] 3.0682561398
[2,] -0.0152217094
[3,] -0.0111878449
[4,] 0.0004227166
[5,] 0.0554068275
[6,] -0.0415724479
[7,] -0.0142366933
[8,] 0.0095465006
[9,] -0.0020530727
[10,] 0.0213206727
[11,] 0.0076414300
...
*** caught segfault ***
address 0xd6e38, cause 'memory not mapped'

Traceback:
1: .Call("_SAIGE_getCoefficients", PACKAGE = "SAIGE", Yvec, Xmat, wVec, tauVec, maxiterPCG, tolPCG)
2: getCoefficients(Y, X, W, tau, maxiter = maxiterPCG, tol = tolPCG)
3: Get_Coef(y, X, tau, family, alpha0, eta0, offset, verbose = verbose, maxiterPCG = maxiterPCG, tolPCG = tolPCG, maxiter = maxiter)
4: glmmkin.ai_PCG_Rcpp_Binary(plinkFile, fit0, tau = c(0, 0), fixtau = c(0, 0), maxiter = maxiter, tol = tol, verbose = TRUE, nrun = 30, tolPCG = tolPCG, maxiterPCG = maxiterPCG, subPheno = dataMerge_sort, obj.noK = obj.noK, out.transform = out.transform, tauInit = tauInit, memoryChunk = memoryChunk, LOCO = LOCO, chromosomeStartIndexVec = chromosomeStartIndexVec, chromosomeEndIndexVec = chromosomeEndIndexVec, traceCVcutoff = traceCVcutoff)
5: system.time(modglmm <- glmmkin.ai_PCG_Rcpp_Binary(plinkFile, fit0, tau = c(0, 0), fixtau = c(0, 0), maxiter = maxiter, tol = tol, verbose = TRUE, nrun = 30, tolPCG = tolPCG, maxiterPCG = maxiterPCG, subPheno = dataMerge_sort, obj.noK = obj.noK, out.transform = out.transform, tauInit = tauInit, memoryChunk = memoryChunk, LOCO = LOCO, chromosomeStartIndexVec = chromosomeStartIndexVec, chromosomeEndIndexVec = chromosomeEndIndexVec, traceCVcutoff = traceCVcutoff))
6: fitNULLGLMM(plinkFile = opt$plinkFile, phenoFile = opt$phenoFile, phenoCol = opt$phenoCol, traitType = opt$traitType, invNormalize = opt$invNormalize, covarColList = covars, qCovarCol = NULL, sampleIDColinphenoFile = opt$sampleIDColinphenoFile, nThreads = opt$nThreads, numMarkers = opt$numMarkers, skipModelFitting = opt$skipModelFitting, traceCVcutoff = opt$traceCVcutoff, ratioCVcutoff = opt$ratioCVcutoff, LOCO = opt$LOCO, outputPrefix = opt$outputPrefix)
An irrecoverable exception occurred. R is aborting now ...

*** caught segfault ***
address 0x1042ac, cause 'memory not mapped'

*** caught segfault ***
address 0xd2e08, cause 'memory not mapped'

*** caught segfault ***
address 0x1083b4, cause 'memory not mapped'

*** caught segfault ***
address 0x15628, cause 'memory not mapped'

*** caught segfault ***
address 0xacd3c, cause 'memory not mapped'

*** caught segfault ***
address 0xd7294, cause 'memory not mapped'

*** caught segfault ***
address 0xd3238, cause 'memory not mapped'

*** caught segfault ***
address 0xdd23c, cause 'memory not mapped'

*** caught segfault ***
address 0xd81d4, cause 'memory not mapped'

*** caught segfault ***
address 0x40630, cause 'memory not mapped'

*** caught segfault ***
address 0xf2dcc, cause 'memory not mapped'

*** caught segfault ***
address 0xd6f74, cause 'memory not mapped'

*** caught segfault ***
address 0xdc184, cause 'memory not mapped'
...
Traceback:
1: .Call("_SAIGE_getCoefficients", PACKAGE = "SAIGE", Yvec, Xmat, wVec, tauVec, maxiterPCG, tolPCG)
2: getCoefficients(Y, X, W, tau, maxiter = maxiterPCG, tol = tolPCG)
3: Get_Coef(y, X, tau, family, alpha0, eta0, offset, verbose = verbose, maxiterPCG = maxiterPCG, tolPCG = tolPCG, maxiter = maxiter)
4: glmmkin.ai_PCG_Rcpp_Binary(plinkFile, fit0, tau = c(0, 0), fixtau = c(0, 0), maxiter = maxiter, tol = tol, verbose = TRUE, nrun = 30, tolPCG = tolPCG, maxiterPCG = maxiterPCG, subPheno = dataMerge_sort, obj.noK = obj.noK, out.transform = out.transform, tauInit = tauInit, memoryChunk = memoryChunk, LOCO = LOCO, chromosomeStartIndexVec = chromosomeStartIndexVec, chromosomeEndIndexVec = chromosomeEndIndexVec, traceCVcutoff = traceCVcutoff)
5: system.time(modglmm <- glmmkin.ai_PCG_Rcpp_Binary(plinkFile, fit0, tau = c(0, 0), fixtau = c(0, 0), maxiter = maxiter, tol = tol, verbose = TRUE, nrun = 30, tolPCG = tolPCG, maxiterPCG = maxiterPCG, subPheno = dataMerge_sort, obj.noK = obj.noK, out.transform = out.transform, tauInit = tauInit, memoryChunk = memoryChunk, LOCO = LOCO, chromosomeStartIndexVec = chromosomeStartIndexVec, chromosomeEndIndexVec = chromosomeEndIndexVec, traceCVcutoff = traceCVcutoff))
6: fitNULLGLMM(plinkFile = opt$plinkFile, phenoFile = opt$phenoFile, phenoCol = opt$phenoCol, traitType = opt$traitType, invNormalize = opt$invNormalize, covarColList = covars, qCovarCol = NULL, sampleIDColinphenoFile = opt$sampleIDColinphenoFile, nThreads = opt$nThreads, numMarkers = opt$numMarkers, skipModelFitting = opt$skipModelFitting, traceCVcutoff = opt$traceCVcutoff, ratioCVcutoff = opt$ratioCVcutoff, LOCO = opt$LOCO, outputPrefix = opt$outputPrefix)
An irrecoverable exception occurred. R is aborting now ...
/home/yez07/.lsbatch/1536322785.995250: line 8: 395109 Segmentation fault (core dumped) Rscript --vanilla /hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/rpgm/step1_fitNULLGLMM.R --plinkFile=/hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/data/ukb_v3_imp_unrelated_hwe10_ALL_HQ_genotyped_merged_cleaned_pruned --phenoFile=/hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/data/ukbb_qtnormBMI.phe.cov --phenoCol=binaryBMI --covarColList=PC1,PC2,PC3,PC4,PC5,PC6,PC5,PC6,PC7,PC8,PC9,PC10 --sampleIDColinphenoFile=IID --traitType=binary --outputPrefix=/hpc/grid/hgcb/workspace/users/yez/UKBB/SAIGE_BINARY/results/UKBB_SAIGE_step1_test --nThreads=24 --LOCO=TRUE

...

Started at Sun Sep 9 14:33:28 2018.
Terminated at Wed Sep 19 14:34:08 2018.
Results reported at Wed Sep 19 14:34:08 2018.

I have checked the phenotype and genotype files, and they should be in the format described in the SAIGE manual. I have tried twice, which took 20 days from submission to termination on the cluster. It would be really helpful if you could give me some pointers on how to track down the issue. The message above is not the complete error message; if you want, I can upload the original log file.

Thanks a lot in advance!

Harold
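A side note on the command above: `--covarColList` lists PC5 and PC6 twice. The `colnames(data.new)` line in the log shows each covariate only once, so the repeats look harmless and are probably unrelated to the segfault, but it is easy to clean the list before passing it on. A minimal sketch, with `dedupe_covars` as a hypothetical helper name:

```shell
# Remove repeated names from a comma-separated list, keeping first-seen order.
dedupe_covars() {
  printf '%s\n' "$1" | tr ',' '\n' | awk '!seen[$0]++' | paste -sd, -
}

dedupe_covars "PC1,PC2,PC3,PC4,PC5,PC6,PC5,PC6,PC7,PC8,PC9,PC10"
```

This prints `PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10`, which can then be passed to `--covarColList`.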

Error in base::colSums(x, na.rm = na.rm, dims = dims, ...)

I am running step 2 with a dosage file and I am unable to figure out how to troubleshoot this error. Any ideas for how this can be fixed?

Error:
.....
numPassMarker: 50
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
Calls: SPAGMMATtest ... Score_Test_Sparse -> colSums -> colSums ->
Execution halted

Input:

Rscript step2_SPAtests.R \
--dosageFile=file.traw \
--dosageFileNrowSkip=1 \
--dosageFileNcolSkip=6 \
--dosageFilecolnamesSkip="CHR","SNP","(C)M","POS","COUNTED","ALT" \
--chrom=$number \
--minMAF=0.0001 \
--sampleFile=sample.file.txt \
--GMMATmodelFile=file.rda \
--varianceRatioFile=file.varianceRatio.txt \
--SAIGEOutputFile=output.file \
--numLinesOutput=2 \
--IsOutputAFinCaseCtrl=TRUE

Build is not reproducible due to symlinks

I am trying to make a reproducible build, in order to fork this package to permit separate .bim/.bed/.fam files (see #40). However, I cannot easily reproduce your build, because this .git repo is full of symlinks. E.g.:

~/code/SAIGE/thirdParty/cget/bin $ ls -l
sav -> /net/hunt/zhowei/project/imbalancedCaseCtrlMixedModel/Rpackage_SPAGMMAT/SAIGE/thirdParty/cget/cget/pkg/statgen__savvy/install/bin/sav

As you can see, this file (and many others) symlinks to something that expects to be at the path /net/hunt/zhowei. Is it possible to remove the symlinks and to instead include the dependencies?

Error with Fixed-effect coefficients (All NaN)

Hello all,

I am running SAIGE v0.26.6 and all of the fixed-effect coefficients are showing up as NaN. How can I fix this?

Here is the entire error:

Fixed-effect coefficients:
[,1]
[1,] NaN
[2,] NaN
[3,] NaN
[4,] NaN
[5,] NaN
[6,] NaN
[7,] NaN
[8,] NaN
[9,] NaN
[10,] NaN
[11,] NaN
[12,] NaN
[13,] NaN
[14,] NaN
[15,] NaN
[16,] NaN
[17,] NaN
[18,] NaN
[19,] NaN
[20,] NaN
[21,] NaN
[22,] NaN
[23,] NaN
Error in if (max(abs(alpha - alpha0)/(abs(alpha) + abs(alpha0) + tol.coef)) < :
missing value where TRUE/FALSE needed
Calls: fitNULLGLMM ... system.time -> glmmkin.ai_PCG_Rcpp_Binary -> Get_Coef
Timing stopped at: 298 3.256 159.1
Execution halted
Running step 2
Error in parse_args(parser, positional_arguments = 0) :
Error in getopt(spec = spec, opt = args) :
" " is not a valid option, or does not support an argument
Execution halted

ERROR: step2_SPAtests.R ... has B = 16 (not 8)

I got the error message below in step2_SPAtests.R, where "1:214154719_G_T" is the name of the first SNP in my BGEN file. My BGEN file includes only 38 SNPs, stored in GP (genotype probabilities) format.
Your help will be appreciated very much.
Jianan

......
450106 samples have been used to fit the glmm null model
variance Ratio is 0.849
487409 sample IDs are found in sample file
[1] 487409 4
[1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y"
450106 samples were used in fitting the NULL glmm model and are found in sample file
minMAC: 1
minMAF: 1e-04
Minimum MAF of markers to be testd is 1e-04
Analysis started at 1.52e+09 Seconds
no query list is provided
487409 samples are found in the bgen file
38 markers are found in the bgen file
isVariant: TRUE
It is a binary trait
Analyzing 25529 cases and 424577 controls
ERROR: 1:214154719_G_T has B = 16 (not 8)
