syntheke / bayesr

Bayesian hierarchical model for complex trait analysis

Fortran 100.00%

bayesr's Issues

Format/Readme file issue

Hi,
first of all, thanks for your bayesR model.
I'm starting heavy testing on simulated (WGS) data... and I've encountered a couple of issues:

1) Line 60: write(21,903) 'SNP output ', snpout
Format 903 for snpout is i8, whereas snpout is logical. When running the program I get:

At line 60 of file bayesR.f90 (unit = 21, file = 'simout.log')
Fortran runtime error: Expected INTEGER for item 3 in formatted transfer, got LOGICAL
(a,t30,'= ',i8)
            ^

I've patched it locally by adding a new format and editing line 60 as below.

In line 60: write(21,909) 'SNP output ', snpout
(Inserted) line 295: 909 format(a,t30,'= ',l)

2) The readme file should be edited

The command line provided:
bayesR -bfile simdata -out simout -numit 10000 -burnin 5000 -seed 333
does not work when running the example provided.
It should be changed to:
bayesR -bfile example/simle -out simout -numit 10000 -burnin 5000 -seed 333

simout.log file now looks like this:

Program BayesR
      Run started at 2015-01-30 09:34:47
Prefix for input files       : example/simle
Prefix for output files      : simout
Phenotype column             =        1
No. of loci                  =    20000
No. of individuals           =     1000
No. of training individuals  =     1000
Prior Vara                   =   0.010000 -2.000000
Prior Vare                   =   0.010000 -2.000000
Model size                   =        0
No. of cycles                =      100
Burnin                       =       50
Thinning rate                =       10
No. of mixtures              =        4
Variance of dist             =    0.00000   0.00010   0.00100   0.01000
Dirichlet prior              =    1.00000   1.00000   1.00000   1.00000
Seed                         =      333
SNP output                   = F

Again, thank you for your work. We will be testing it intensively in the next few weeks.
Best regards,
E. Nicolazzi

Problem with user specification of negative values for the degrees of freedom Va (-dfvara)

I tried to manually specify a negative value for the degrees of freedom of Va (-dfvara), but I get an error. Positive values work.

Example code

./bayesR -bfile d -out res -dfvara -2.0 (default value)
./bayesR -bfile d -out res -dfvara -3.0

Error message

Missing argument for :-dfvara 5
STOP ERROR: Problem parsing the command line arguments

Thanks in advance for your help.
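
The message "Missing argument for :-dfvara" is what one would expect if the parser treats every token that starts with "-" as a new option, so that "-3.0" is never accepted as a value. That is only one plausible reading; the bayesR argument parser itself is not shown in this thread. A minimal Python sketch of the distinction, with hypothetical helper names (is_flag, parse):

```python
import re

# Matches plain or signed numbers such as -3.0, 2, .5, 1e-3 (also Fortran-style 1d0).
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eEdD][+-]?\d+)?$")


def is_flag(token: str) -> bool:
    """A token is an option name if it starts with '-' and is not a number."""
    return token.startswith("-") and _NUMBER.match(token) is None


def parse(argv):
    """Collect {option: value or True} from a flat argument list."""
    opts = {}
    i = 0
    while i < len(argv):
        tok = argv[i]
        if not is_flag(tok):
            raise ValueError(f"unexpected value: {tok}")
        # The next token is this option's value unless it is itself an option.
        if i + 1 < len(argv) and not is_flag(argv[i + 1]):
            opts[tok] = argv[i + 1]
            i += 2
        else:
            opts[tok] = True
            i += 1
    return opts


if __name__ == "__main__":
    print(parse(["-bfile", "d", "-out", "res", "-dfvara", "-3.0"]))
```

A parser that only checks for a leading "-" would reject "-3.0" here, which matches the symptom reported above.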

Printing additional information for all SNPs with -snpout argument

Hi, I think this is not an issue per se but a specific case: I would like to print the effect size of all SNPs and not only the selected ones.
I need to sample the SNP effects from the posterior distribution. To run bayesR I'm using this command line:

bayesR -bfile trainPopData -out simout -numit 100000 -burnin 1000 -thin 10 -snpout

The -snpout argument provides additional information for the SNPs selected within the model, in the format mixture class:SNP#:effect size. However, I would like the effect size for all SNPs and not only for the selected ones. Do you know how I can obtain this full information?

Thank you very much

How to adjust the proportion of SNPs with no additive effect?

After running BayesR with default parameters, the proportions of SNPs in the 4 distributions (variances of 0, 0.0001, 0.001 and 0.01) are approximately 0.84, 0.15, 0.005 and 0.0006, respectively.
However, I want to compare prediction using BayesR and BayesB, so when running BayesR the proportion of SNPs with 0 variance should be set to 0.95, the same as π in BayesB.
“-delta 2” sets the prior to 2 for all mixture components, but what exactly does this parameter mean? If the number is larger, how does it affect the genomic prediction results?

Could you please tell me how to set the proportion? Any suggestions or tips would be greatly appreciated.
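
The log in the first issue lists "Dirichlet prior = 1.00000 1.00000 1.00000 1.00000", so -delta appears to set the Dirichlet hyperparameters on the mixture proportions. Under the standard Dirichlet-multinomial update, each hyperparameter acts as a pseudo-count added to the number of SNPs currently assigned to that component. The sketch below assumes bayesR uses this standard update (not verified against the Fortran source); the function name and the SNP counts are hypothetical.

```python
def posterior_mean_proportions(counts, delta):
    """Posterior mean of mixture proportions under a Dirichlet(delta) prior.

    counts: number of SNPs currently assigned to each mixture component.
    delta:  Dirichlet hyperparameters (pseudo-counts), one per component.
    """
    if len(counts) != len(delta):
        raise ValueError("counts and delta must have the same length")
    total = sum(counts) + sum(delta)
    return [(n + d) / total for n, d in zip(counts, delta)]


if __name__ == "__main__":
    counts = [9500, 400, 80, 20]  # hypothetical assignment of 10000 SNPs
    for value in (1.0, 2.0, 1000.0):
        props = posterior_mean_proportions(counts, [value] * 4)
        print(value, [round(p, 4) for p in props])
```

With tens of thousands of SNPs, moving delta from 1 to 2 barely changes the proportions. Only a large or strongly unequal delta pulls them noticeably away from what the data suggest.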

Shape parameter must be positive

Hello-

I am running BayesR with the default parameters. As it runs, it prints the message "Shape parameter must be positive", and I am wondering what this refers to with regard to the input files.

Thanks

Is there a site restriction

Hello, when I try to run a file with 50,000 markers, the program stalls without any error: the process sits in a sleep state and never finishes. Because there is no error message, I don't know what happened. Is it because the number of markers exceeds a limit?

Way to do a weighted phenotype analysis with BayesR?

Hello,

I was wondering if there is a way to weight the phenotypes based on the number of records that went into creating each phenotype, similar to the weighted analysis in GenSel?

Thanks

Could I input covariate file into BayesR model?

Hi,

I want to use the BayesR model to run a GWAS for my traits, and some covariates need to be taken into account. The manual does not describe how to add covariates, so is there any way to input a covariate file into the BayesR software?

Thanks,
Best!

Issue to compile bayesR

I installed bayesR with the following command :

git clone https://github.com/syntheke/bayesR.git

I installed gfortran with:

brew install gcc

and when I tried to compile in the src folder, I got this error:

~/bayesR/src$ gfortran –o ../binary/bayesR –O2 -cpp RandomDistributions.f90 baymods.f90 bayesR.f90
ld: file not found: –o
collect2: error: ld returned 1 exit status

Could you help me on that?

Thank you
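
In the command above, "–o" and "–O2" are written with an en dash (U+2013) rather than an ASCII hyphen, while "-cpp" uses a hyphen. The linker message "file not found: –o" shows that the token was passed through as a file name instead of being read as an option. Such characters usually come from copying a command out of a formatted document. A small Python sketch (the helper names are hypothetical) to detect and normalize them before pasting:

```python
# Typographic characters that are often pasted in place of an ASCII hyphen.
DASHES = {
    "\u2013": "-",  # en dash
    "\u2212": "-",  # minus sign
}


def find_typographic_dashes(command: str):
    """Return (index, character) pairs for every non-ASCII dash found."""
    return [(i, ch) for i, ch in enumerate(command) if ch in DASHES]


def normalize_dashes(command: str) -> str:
    """Replace typographic dashes with ASCII hyphens."""
    return "".join(DASHES.get(ch, ch) for ch in command)


if __name__ == "__main__":
    pasted = ("gfortran \u2013o ../binary/bayesR \u2013O2 -cpp "
              "RandomDistributions.f90 baymods.f90 bayesR.f90")
    print(find_typographic_dashes(pasted))
    print(normalize_dashes(pasted))
```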

bayesR performance problem

Dear Authors,
I built bayesR with ifort from Intel Parallel Studio 2020 Update 4:
ifort -o bayesR_ifort -O3 -axCORE-AVX512 -fpp RandomDistributions.f90 baymods.f90 bayesR.f90
Then I ran example 1 with this command:
./bayesRv2_ifort -bfile simdata -out simout -numit 10000 -burnin 5000 -seed 333 -nthreads 28
The whole run took more than 16 minutes.
According to the simout.log in your example, the same run took only a little over 1 minute.
I then ran "strace ./bayesRv2_ifort -bfile simdata -out simout -numit 10000 -burnin 5000 -seed 333 -nthreads 28"
and found a large number of sched_yield() calls at the end, with the process waiting the whole time.
So my questions are:
First, is there an error in how I built or ran the program?
Second, what was your test environment for example 1 (OS version, gcc/ifort version, build command, etc.)?

Thanks a lot!

BayesR: Some questions about computation time of bayesR.

Thank you very much for developing this software. I have a question about the computation time of BayesR. My data set has 202840 SNPs and 8438 subjects, and I include three covariates. To reduce the computational burden, I followed the BayesR manual published on GitHub (https://github.com/syntheke/bayesR) and used the “500 SNPs” strategy by setting the two options -msize and -mrep. Based on the time and memory figures in the manual, I expected my data to take less than six hours. In practice, the first step (training) had not completed after about 120 hours. I cannot find the reason, so any guidance or suggestions would be appreciated. Thank you very much!
These are the commands I used:
## train a model
./bayesRv2 -bfile traindata1 -out traindata1 \
  -numit 50000 -burnin 20000 -seed 333 \
  -blocksize 4 -nthreads 4 \
  -msize 500 -mrep 5000 \
  -covar cov_train1.txt

## prediction
./bayesRv2 -bfile testdata1 -out testdata1 \
  -predict -model traindata1.model -freq traindata1.frq \
  -param traindata1.param -covar cov_test1.txt \
  -alpha traindata1.alpha

Kind Regards,

Jiefang

Missing phenotypic values in .fam file

The 'BayesRmanual-0.75.pdf' file says that missing phenotypes should be coded as ‘NA’, so I believe missing values are allowed in the .fam file.

However, when I included missing values in the .fam file (denoted as NA), BayesRv2 generated the following error message:

forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
BayesRv2 000000010EA2E15A Unknown Unknown Unknown
BayesRv2 000000010EA56B97 Unknown Unknown Unknown
BayesRv2 000000010EA55582 Unknown Unknown Unknown
BayesRv2 000000010E9FE812 Unknown Unknown Unknown
BayesRv2 000000010EA0D16D Unknown Unknown Unknown

BayesRv2 000000010E9F74AE Unknown Unknown Unknown

When I generated a new file keeping only the individuals with valid phenotypic values, BayesRv2 worked fine. To bypass the issue, I simply generated new plink binary files for each trait. It would be great if someone could check whether this is a general issue with BayesR.

FYI, my system is OSX.

Thanks

allele frequency

Hi,
I tried BayesRv2 using the flat type input files and noticed that the frequency of the "2" allele reported in the ".frq" file is not correct (see below):
0.326044
0.324882
0.325784
0.326069
0.321921
0.325860
0.321845
0.325358
0.321918
0.324215
0.328116
0.324279
0.326674
0.326469
...

while the correct frequency should be:
1
0
0
0.506483
0.780649
0.492503
0.870877
0.357338
0.142133
0.554992
0.532854
0.467058
0.800759
0.997883
...

Do you have any thoughts on this issue?
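
For reference, the expected frequency of the "2" allele at one SNP can be recomputed directly from the genotypes. The Python sketch below assumes each genotype is coded as the number of copies of the "2" allele (0, 1 or 2), with missing calls passed as None; the exact coding of the BayesRv2 flat input file is not stated in this thread, and the function name is hypothetical.

```python
def allele2_frequency(genotypes):
    """Frequency of the '2' allele at one SNP.

    genotypes: iterable of 0, 1 or 2 (copies of allele 2), or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    # Each individual carries two alleles, hence the factor of 2.
    return sum(called) / (2 * len(called))


if __name__ == "__main__":
    print(allele2_frequency([2, 2, 2, 2]))     # SNP fixed for allele 2
    print(allele2_frequency([0, 1, 2, None]))  # one missing call ignored
```

A SNP fixed for one allele should give exactly 0 or 1, as in the first three values of the expected list above. Reported frequencies that are nearly identical across SNPs suggest the file is being read with a different layout than intended.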

How do I run BayesR with large memory ?

I want to run BayesR with 428895 SNPs and 7621 individuals. How can I give BayesR 120G of memory to reduce the run time? Increasing the number of threads did not help. My PBS script follows:

#!/bin/bash
#PBS -S /bin/bash
#PBS -N bayesR_1_1
#PBS -l walltime=40:00:00
#PBS -l procs=32
#PBS -l mem=200000mb

workDir=/home/fz4/gp/bayesR/HD/mutipleBreed
bayesRv2=/home/fz4/bin/exeSofts/bayesRv2

cd $workDir
$bayesRv2 -bfile adjADG_HD_train_breed1_cv1 -out adjADG_HD_train_breed1_cv1 -nthreads 32

Any tips and suggestions will be greatly appreciated.

End-of-file during read error, discrepancy between PLINK .bim file and simulated .bim file?

Hi,

I am eager to use BayesR in my genomic prediction as I have heard good things; however, I seem to be having an issue. When I try to run the program with my data I get this error:

bayesR -bfile plink -out test_bayesR -numit 10000 -burnin 2000

forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
bayesR 00000000004A9A2E Unknown Unknown Unknown
bayesR 00000000004A84C6 Unknown Unknown Unknown
bayesR 0000000000461022 Unknown Unknown Unknown
bayesR 000000000041D5CB Unknown Unknown Unknown
bayesR 000000000041CB32 Unknown Unknown Unknown
bayesR 00000000004370D3 Unknown Unknown Unknown
bayesR 00000000004358FC Unknown Unknown Unknown
bayesR 000000000040BC57 Unknown Unknown Unknown
bayesR 00000000004146B1 Unknown Unknown Unknown
bayesR 000000000040322C Unknown Unknown Unknown
libc.so.6 00007F879EB1B445 Unknown Unknown Unknown
bayesR 0000000000403129 Unknown Unknown Unknown

I don't have any problem running the program with the example data, so I checked the differences between the simulated data and my own and found that columns 5 and 6 of the simulated data are either 0, 1 or 2, whereas columns 5 and 6 of default PLINK output are the reference and alternate alleles themselves, i.e. A, T, G, C. For example:

Simulated data:
1 rs1 0 88671 1 2
1 rs2 0 114576 1 2
1 rs3 0 115699 2 1
1 rs4 0 155552 2 1
1 rs5 0 175528 1 2

My data:
2323-211_BBMergePreAssembled_Trinity_TR10034_c1_g1_i1 . 0 558 T C
2323-211_BBMergePreAssembled_Trinity_TR10054_c0_g1_i1 . 0 1134 G A
2323-211_BBMergePreAssembled_Trinity_TR10092_c0_g1_i1 . 0 1108 A T
2323-211_BBMergePreAssembled_Trinity_TR10114_c1_g3_i1 . 0 441 G C
2323-211_BBMergePreAssembled_Trinity_TR10129_c0_g1_i1 . 0 56 T C
(We don't have chromosome mapping data for this organism, these are probes from which SNPs are called).

Is there a way of converting from the alleles to the dosage information when using this format, or indeed, is this actually the problem? Someone else here suggested that their problem was with missing data, but even when removing all individuals with missing phenotype data I still run into the error.

Any help would be appreciated,
Thanks,
Tal
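
The two .bim excerpts above differ in more than the allele columns: the first column is an integer chromosome code in the simulated data and a long contig name in the real data. The thread does not establish which difference, if any, triggers the read error. The Python sketch below is only a diagnostic that flags lines deviating from the layout of the simulated example; the function name is hypothetical.

```python
def check_bim_lines(lines):
    """Flag .bim lines that differ from the layout of the simulated example.

    Returns a list of (line_number, message) tuples.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 6:
            problems.append((lineno, f"expected 6 fields, found {len(fields)}"))
            continue
        chrom, _snp, _cm, pos, a1, a2 = fields
        if not chrom.isdigit():
            problems.append((lineno, "chromosome code is not an integer"))
        if not pos.isdigit():
            problems.append((lineno, "position is not an integer"))
        if not (a1.isdigit() and a2.isdigit()):
            problems.append((lineno, "allele codes are not numeric"))
    return problems


if __name__ == "__main__":
    sample = [
        "1 rs1 0 88671 1 2",
        "2323-211_BBMergePreAssembled_Trinity_TR10034_c1_g1_i1 . 0 558 T C",
    ]
    for lineno, message in check_bim_lines(sample):
        print(lineno, message)
```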

How to include random effects in the model

Dear,

I am using this command line to include fixed effects in the model:

bayesR -bfile trainPopData -out simout -covar covariates.txt -numit 100000 -burnin 1000 -thin 10 -permute -snpout

It worked pretty well. However, I also have random effects that I would like to add to the model. Is there any way to do that?
Thank you

fam file

There seem to be two problems regarding the fam file:

1) An error occurs if the columns are tab delimited.
2) An error occurs if a phenotype is loaded with -n [num] where num > 55; see the error below.

forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
bayesR 0000000000432F0E Unknown Unknown Unknown
bayesR 00000000004594BD Unknown Unknown Unknown
bayesR 0000000000457C26 Unknown Unknown Unknown
bayesR 0000000000413E78 Unknown Unknown Unknown
bayesR 000000000042AE53 Unknown Unknown Unknown
bayesR 000000000040309E Unknown Unknown Unknown
libc-2.17.so 00002B8555068555 __libc_start_main Unknown Unknown
bayesR 0000000000402FA9 Unknown Unknown Unknown
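
Several reports in this listing trace an "end-of-file during read" error back to the .fam file (tab delimiters, NA phenotypes, high phenotype column numbers). The Python sketch below is a pre-flight check, not a fix. It assumes the usual PLINK layout of five leading columns followed by the phenotypes, so that phenotype n sits in field 5 + n; the function name is hypothetical.

```python
def check_fam_lines(lines, pheno_col=1):
    """Report .fam lines that match the conditions described in the issues.

    pheno_col is 1-based: 1 means the first phenotype (field 6 of the line).
    Returns a dict of line-number lists.
    """
    report = {"tab_lines": [], "short_lines": [], "missing_pheno": []}
    index = 5 + pheno_col - 1  # zero-based position of the phenotype field
    for lineno, line in enumerate(lines, start=1):
        if "\t" in line:
            report["tab_lines"].append(lineno)
        fields = line.split()
        if len(fields) <= index:
            report["short_lines"].append(lineno)
        elif fields[index] == "NA":
            report["missing_pheno"].append(lineno)
    return report


if __name__ == "__main__":
    sample = [
        "F1 I1 0 0 1 2.5",
        "F1\tI2\t0\t0\t2\tNA",
        "F1 I3 0 0 1",
    ]
    print(check_fam_lines(sample))
```

Dropping individuals from the .fam file alone would leave it out of step with the .bed file, so any filtering is better done in PLINK, as the poster of the missing-phenotype issue did.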

direction of effect estimates

Hi,

I'm running genomic prediction analyses using bayesR along with a few other methods. One thing I noticed is that BayesR seems to report effect sizes in the opposite direction to the other models I ran. Specifically, BayesR appears to report effects for A2 rather than A1. Can you confirm whether this is the correct reading of the output, so that I can simply reverse the sign of the SNP weights?

Thanks!

Using -dfvara -3.0

Hello, I am trying to specify the heritability of my trait when using BayesR. For my input I have entered:

bayesRv2 -bfile bfile_name -vara 0.49 -dfvara -3.0 -out outfile_name

However, I receive the following error:
Missing argumnet for :-dfvara 5
ERROR: Problem parsing the command line arguments

When I input -dfvara 3.0 it works, but the log file indicates 3.0 and not -3.0

Is there a way to resolve this issue? Thank you for any help you can provide!

At line 1736 of file baymods.f90; Fortran runtime error: End of file

Hello,
I'm trying to use this tool to run bayesR for some genomic prediction. I'm first trying to get my data formatted correctly. I have a VCF with several sites/individuals plus a quantitative phenotype. I used plink2 to turn the VCF into .bed, .bim, and .fam files. Then, when trying to run:
bayesRv2 -bfile input_prefix
I get this error
At line 1736 of file baymods.f90; Fortran runtime error: End of file
I tried several iterations of making sure my .fam and .bim files had all the information present in sample datasets and it looks right to me.
I compiled the code using gfortran and it did work on sample data.
Thanks,
Evan L
