syntheke / bayesR
Bayesian hierarchical model for complex trait analysis
Hi,
first of all, thanks for your bayesR model.
I'm starting heavy testing on simulated (WGS) data and have run into a couple of issues:
1) Line 60: write(21,903) 'SNP output ', snpout
Format 903 uses i8 for snpout, but snpout is a logical. Running the program gives:
At line 60 of file bayesR.f90 (unit = 21, file = 'simout.log')
Fortran runtime error: Expected INTEGER for item 3 in formatted transfer, got LOGICAL
(a,t30,'= ',i8)
^
I've patched it locally by adding a new format and editing line 60 as below.
write(21,909) 'SNP output ', snpout
909 format(a,t30,'= ',l1)
2) The README file should be updated
The command line it gives:
bayesR -bfile simdata -out simout -numit 10000 -burnin 5000 -seed 333
does not work with the provided example data.
It should be changed to:
bayesR -bfile example/simle -out simout -numit 10000 -burnin 5000 -seed 333
With this change, the simout.log file looks like this:
Program BayesR
Run started at 2015-01-30 09:34:47
Prefix for input files : example/simle
Prefix for output files : simout
Phenotype column = 1
No. of loci = 20000
No. of individuals = 1000
No. of training individuals = 1000
Prior Vara = 0.010000 -2.000000
Prior Vare = 0.010000 -2.000000
Model size = 0
No. of cycles = 100
Burnin = 50
Thinning rate = 10
No. of mixtures = 4
Variance of dist = 0.00000 0.00010 0.00100 0.01000
Dirichlet prior = 1.00000 1.00000 1.00000 1.00000
Seed = 333
SNP output = F
Again, thank you for your work. We will be testing it intensively in the next few weeks.
Best regards,
E. Nicolazzi
I tried to set the degrees of freedom of the prior on the SNP-effect variance (-dfvara) manually, but I get an error with negative values. Positive values work.
Example commands:
./bayesR -bfile d -out res -dfvara -2.0 (default value)
./bayesR -bfile d -out res -dfvara -3.0
Error message:
Missing argument for :-dfvara 5
STOP ERROR: Problem parsing the command line arguments
Thanks in advance for your help.
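The error suggests that the parser reads a value beginning with "-" as the start of a new option, which leaves -dfvara without an argument. That is an assumption; bayesR's own parser was not checked here. The hypothetical Python sketch below shows one way a command-line parser can accept negative numbers as option values: a token that starts with "-" counts as a new option only if it does not parse as a number.

```python
from __future__ import annotations


def looks_like_number(token: str) -> bool:
    """Return True if the token parses as a float, e.g. '-3.0' or '1e-4'."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_args(argv: list[str]) -> dict[str, str | bool]:
    """Hypothetical parser for '-name value' pairs and bare '-name' switches.

    A token starting with '-' is treated as a new option only if it is
    NOT a number, so '-dfvara -3.0' keeps '-3.0' as the value.
    """
    opts: dict[str, str | bool] = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("-") or looks_like_number(token):
            raise ValueError(f"unexpected token: {token}")
        name = token[1:]
        nxt = argv[i + 1] if i + 1 < len(argv) else None
        if nxt is not None and (not nxt.startswith("-") or looks_like_number(nxt)):
            opts[name] = nxt  # option with a value (possibly negative)
            i += 2
        else:
            opts[name] = True  # bare switch such as -snpout
            i += 1
    return opts
```

For example, parse_args(["-bfile", "d", "-out", "res", "-dfvara", "-3.0"]) keeps "-3.0" as the value of dfvara instead of reporting a missing argument.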
Hi, I think this is not an issue per se, but a specific case: I would like to print the effect sizes of all SNPs, not only the selected ones.
I need samples of the SNP effects from the posterior distribution. To run bayesR I'm using this command line:
bayesR -bfile trainPopData -out simout -numit 100000 -burnin 1000 -thin 10 -snpout
The -snpout argument gives me additional information for the SNPs selected into the model, in the format mixture class:SNP#:effect size. However, I would like the effect sizes of all SNPs, not only the selected ones. Do you know how I can obtain this full information?
Thank you very much
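SNPs that are not listed in a given sample sit in the first mixture class, whose variance is 0 (see "Variance of dist = 0.00000 ..." in the log earlier on this page), so their sampled effect is exactly zero. A dense effect vector can therefore be rebuilt from the -snpout records by filling in zeros. The Python sketch below assumes, hypothetically, that each line holds one sampled iteration as whitespace-separated "class:SNP#:effect" records with SNP numbers starting at 1; check this against the actual output before relying on it.

```python
from __future__ import annotations


def dense_effects(line: str, n_snps: int) -> list[float]:
    """Expand one sampled iteration into one effect per SNP.

    Assumes (hypothetically) whitespace-separated 'class:snp:effect'
    records with 1-based SNP numbers. SNPs not listed are taken to be
    in the zero-variance class, so their effect is 0.0.
    """
    effects = [0.0] * n_snps
    for record in line.split():
        _cls, snp, effect = record.split(":")
        effects[int(snp) - 1] = float(effect)
    return effects


def posterior_means(lines: list[str], n_snps: int) -> list[float]:
    """Average the dense effect vectors over all sampled iterations."""
    totals = [0.0] * n_snps
    for line in lines:
        for j, b in enumerate(dense_effects(line, n_snps)):
            totals[j] += b
    return [t / len(lines) for t in totals]
```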
Could you please tell me how to set the proportion? Any suggestions or tips would be greatly appreciated.
Hello-
I am running BayesR with the default parameters. As it runs, it prints the message "Shape parameter must be positive", and I am wondering what this refers to with respect to the input files.
Thanks
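The message most likely comes from a gamma-distributed draw (the source includes RandomDistributions.f90) whose shape parameter was zero or negative. Gibbs samplers of this kind commonly draw a variance from a scaled inverse chi-square posterior, i.e. as the reciprocal of a gamma variable with shape (df + n)/2. The Python sketch below assumes, without having checked the bayesR source, that the SNP variance is updated this way, with df the second "Prior Vara" value in the log (-2.0 by default) and n the number of SNPs in non-zero classes; the prior scale term is left out for brevity. Under that assumption the shape is not positive whenever two or fewer SNPs are in the model.

```python
import random


def posterior_shape(df_prior: float, n_in_model: int) -> float:
    """Shape of the gamma draw: (prior df + number of SNPs in the model) / 2."""
    return (df_prior + n_in_model) / 2.0


def sample_variance(sum_sq: float, n_in_model: int, df_prior: float,
                    rng: random.Random) -> float:
    """Hypothetical scaled inverse chi-square draw for the SNP variance.

    sum_sq is the sum of squared effects of the SNPs in the model. The
    prior scale term is omitted for brevity, so rate = sum_sq / 2.
    """
    shape = posterior_shape(df_prior, n_in_model)
    if shape <= 0.0:
        raise ValueError(f"Shape parameter must be positive (got {shape})")
    rate = sum_sq / 2.0
    # random.gammavariate(alpha, beta) takes shape alpha and SCALE beta,
    # so pass 1 / rate; the variance is the reciprocal of the gamma draw.
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```

With df = -2.0 the shape is (n - 2)/2, so under this model it is iterations with almost no SNPs in non-zero classes that would trigger the message.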
Hello, when I try to run a file with 50,000 markers, the program stalls without reporting any error: the process stays in sleep state and never finishes. Because there is no error message, I don't know what went wrong. Is it because the number of markers exceeds a limit?
Hello,
I was wondering whether there is a way to weight the phenotypes by the number of records that went into each phenotype, similar to the weighted analysis in GenSel?
Thanks
Hi,
I want to use the BayesR model for GWAS of my traits, and some covariates need to be taken into account. The manual does not describe how to add covariates, so is there any way to input a covariate file into the BayesR software?
Thanks,
Best!
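Other posts on this page pass a covariate file to bayesRv2 with -covar. An alternative that does not depend on any BayesR option is a two-step approach: regress the phenotype on the covariates first and analyse the residuals as the phenotype. The Python sketch below does this with ordinary least squares (intercept plus covariates); the function names are hypothetical, and pre-adjusting ignores any correlation between covariates and genotypes, so it is an approximation.

```python
from __future__ import annotations


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def adjust_for_covariates(y: list[float], covars: list[list[float]]) -> list[float]:
    """Residuals of y after an OLS fit on an intercept plus the covariates."""
    X = [[1.0] + list(row) for row in covars]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
```

With a single 0/1 covariate, for example, the fit reproduces the two group means, and the residuals are each phenotype's deviation from its group mean.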
Hi
I installed bayesR with the following command:
git clone https://github.com/syntheke/bayesR.git
I installed gfortran with:
brew install gcc
and when I tried to compile in the src folder, I got this error:
~/bayesR/src$ gfortran –o ../binary/bayesR –O2 -cpp RandomDistributions.f90 baymods.f90 bayesR.f90
ld: file not found: –o
collect2: error: ld returned 1 exit status
Could you help me on that?
Thank you
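The linker message points to the cause: "–o" and "–O2" in the command start with an en dash (U+2013) rather than an ASCII hyphen, so gfortran does not treat them as options and hands "–o" to ld as a file name. This typically happens when a command is copied from a PDF or word processor. Retyping the dashes gives gfortran -o ../binary/bayesR -O2 -cpp RandomDistributions.f90 baymods.f90 bayesR.f90. The small helper below is a hypothetical Python sketch, not part of bayesR, that flags and replaces such look-alike characters in a pasted command.

```python
from __future__ import annotations

# Characters that look like "-" but are not the ASCII hyphen-minus (U+002D).
DASH_LOOKALIKES = {
    "\u2010": "hyphen",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}


def find_bad_dashes(command: str) -> list[tuple[int, str]]:
    """Return (index, name) for every dash look-alike in a command line."""
    return [(i, DASH_LOOKALIKES[ch]) for i, ch in enumerate(command)
            if ch in DASH_LOOKALIKES]


def fix_dashes(command: str) -> str:
    """Replace each look-alike with an ASCII hyphen-minus."""
    for ch in DASH_LOOKALIKES:
        command = command.replace(ch, "-")
    return command
```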
Dear authors,
I built bayesR with ifort from Intel Parallel Studio 2020 Update 4:
ifort -o bayesR_ifort -O3 -axCORE-AVX512 -fpp RandomDistributions.f90 baymods.f90 bayesR.f90
Then I ran example 1 with this command:
./bayesRv2_ifort -bfile simdata -out simout -numit 10000 -burnin 5000 -seed 333 -nthreads 28
The whole run took more than 16 minutes.
According to the simout.log in your example, your run took only a little over 1 minute.
I then ran "strace ./bayesRv2_ifort -bfile simdata -out simout -numit 10000 -burnin 5000 -seed 333 -nthreads 28"
and near the end I found many sched_yield() calls, with the process waiting indefinitely.
So my questions are:
First, is there something wrong in what I did?
Second, what was your test environment for example 1 (OS version, gcc/ifort version, build command, etc.)?
Thanks a lot!
Thank you very much for developing this software. I have some questions about BayesR, mostly about computation time. My data have 202840 SNPs and 8438 subjects, and I include three covariates. To reduce the computational burden, I followed the BayesR manual on GitHub (https://github.com/syntheke/bayesR) and used the "500 SNPs" strategy by setting the two options -msize and -mrep. Based on the timing and memory figures in your manual, I expected my analysis to take less than six hours. However, the first step has been running for about 120 hours and has still not completed. I cannot find the reason, so I would be grateful for any guidance or suggestions. Thank you very much!
This is the command I used:
##train a model
./bayesRv2 -bfile traindata1 -out traindata1 \
  -numit 50000 -burnin 20000 -seed 333 \
  -blocksize 4 -nthreads 4 \
  -msize 500 -mrep 5000 \
  -covar cov_train1.txt
##prediction
./bayesRv2 -bfile testdata1 -out testdata1 \
  -predict -model traindata1.model -freq traindata1.frq \
  -param traindata1.param -covar cov_test1.txt \
  -alpha traindata1.alpha
Kind Regards,
Jiefang
According to 'BayesRmanual-0.75.pdf', missing phenotypes should be coded as 'NA', so I believe missing values are allowed in the .fam file. However, when my .fam file contains missing phenotypes, BayesRv2 stops with this error:
forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
BayesRv2 000000010EA2E15A Unknown Unknown Unknown
BayesRv2 000000010EA56B97 Unknown Unknown Unknown
BayesRv2 000000010EA55582 Unknown Unknown Unknown
BayesRv2 000000010E9FE812 Unknown Unknown Unknown
BayesRv2 000000010EA0D16D Unknown Unknown Unknown
When I generated a new file keeping only the individuals with valid phenotypes, BayesRv2 worked fine. As a workaround, I generated new PLINK binary files for each trait. It would be great if someone could check whether this is a known issue in BayesR.
FYI, my system is OS X.
Thanks
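The per-trait workaround can be scripted: list the individuals whose phenotype is present and pass that list to PLINK's --keep option together with --make-bed. The Python sketch below is a hypothetical helper; it assumes the phenotype is in column 6 of the .fam file and treats "NA", any other non-numeric code, and NaN as missing. PLINK's default -9 code is not treated as missing here, since whether BayesR reads it as missing was not checked.

```python
from __future__ import annotations

import math


def valid_phenotype_ids(fam_lines: list[str], column: int = 6) -> list[tuple[str, str]]:
    """Return (FID, IID) for individuals whose phenotype is present.

    column is 1-based; in a standard .fam file the phenotype is column 6.
    """
    keep: list[tuple[str, str]] = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < column:
            continue  # malformed or truncated line
        try:
            value = float(fields[column - 1])
        except ValueError:
            continue  # 'NA' or any other non-numeric code
        if math.isnan(value):
            continue  # 'nan' parses as a float but is still missing
        keep.append((fields[0], fields[1]))
    return keep


def write_keep_file(path: str, ids: list[tuple[str, str]]) -> None:
    """Write one 'FID IID' pair per line, the layout PLINK's --keep reads."""
    with open(path, "w") as handle:
        for fid, iid in ids:
            handle.write(f"{fid} {iid}\n")
```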
Hi,
I tried BayesRv2 with flat-file input and noticed that the frequency of the "2" allele reported in the .frq file is not correct (see below):
0.326044
0.324882
0.325784
0.326069
0.321921
0.325860
0.321845
0.325358
0.321918
0.324215
0.328116
0.324279
0.326674
0.326469
...
whereas the correct frequencies should be:
1
0
0
0.506483
0.780649
0.492503
0.870877
0.357338
0.142133
0.554992
0.532854
0.467058
0.800759
0.997883
...
Do you have any thoughts on this issue?
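As an independent cross-check, the frequency of the "2" allele at a SNP is the sum of its 0/1/2 dosages divided by twice the number of non-missing genotypes. The Python sketch below assumes each dosage counts copies of the "2" allele and marks missing calls as None; frequencies of exactly 1 or 0, as in the expected values above, correspond to SNPs that are monomorphic in the sample.

```python
from __future__ import annotations


def allele2_frequency(dosages: list[int | None]) -> float | None:
    """Frequency of the counted ('2') allele from 0/1/2 dosages.

    None marks a missing genotype; returns None if every call is missing.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))
```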
I want to run BayesR with 428895 SNPs and 7621 individuals. How can I give BayesR 120 GB of memory to reduce run time? Increasing the number of threads did not help. This is my PBS script:
#!/bin/bash
#PBS -S /bin/bash
#PBS -N bayesR_1_1
#PBS -l walltime=40:00:00
#PBS -l procs=32
#PBS -l mem=200000mb
workDir=/home/fz4/gp/bayesR/HD/mutipleBreed
bayesRv2=/home/fz4/bin/exeSofts/bayesRv2
cd $workDir
$bayesRv2 -bfile adjADG_HD_train_breed1_cv1 -out adjADG_HD_train_breed1_cv1 -nthreads 32
Any tips and suggestions will be greatly appreciated.
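When sizing a job, a rough lower bound is the size of a dense genotype matrix. The Python sketch below assumes, without having checked the bayesR source, that genotypes are held in memory as 8-byte double-precision values, one per SNP per individual; other arrays are ignored, so treat the result as an order-of-magnitude guide. For comparison it also computes the size of a SNP-major PLINK .bed file (2 bits per genotype plus a 3-byte header). Requesting more memory than the program allocates does not by itself make it run faster.

```python
import math


def dense_genotypes_gib(n_snps: int, n_individuals: int, bytes_per_value: int = 8) -> float:
    """GiB for a dense SNP-by-individual matrix (8 bytes = double precision)."""
    return n_snps * n_individuals * bytes_per_value / 2**30


def bed_file_gib(n_snps: int, n_individuals: int) -> float:
    """GiB of a SNP-major PLINK .bed file: 2 bits per genotype, 3-byte header."""
    bytes_per_snp = math.ceil(n_individuals / 4)
    return (3 + n_snps * bytes_per_snp) / 2**30
```

For 428895 SNPs and 7621 individuals this gives about 24.4 GiB for the dense matrix and about 0.76 GiB for the .bed file, so under the stated assumption the 200000mb request in the script above is already well above the genotype storage.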
Hi,
I am eager to use BayesR in my genomic prediction as I have heard good things; however, I seem to be having an issue. When I try to run the program with my data I get this error:
bayesR -bfile plink -out test_bayesR -numit 10000 -burnin 2000
forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
bayesR 00000000004A9A2E Unknown Unknown Unknown
bayesR 00000000004A84C6 Unknown Unknown Unknown
bayesR 0000000000461022 Unknown Unknown Unknown
bayesR 000000000041D5CB Unknown Unknown Unknown
bayesR 000000000041CB32 Unknown Unknown Unknown
bayesR 00000000004370D3 Unknown Unknown Unknown
bayesR 00000000004358FC Unknown Unknown Unknown
bayesR 000000000040BC57 Unknown Unknown Unknown
bayesR 00000000004146B1 Unknown Unknown Unknown
bayesR 000000000040322C Unknown Unknown Unknown
libc.so.6 00007F879EB1B445 Unknown Unknown Unknown
bayesR 0000000000403129 Unknown Unknown Unknown
I have no problem running the program on the example data, so I compared the simulated data with my own. In the simulated .bim file, columns 5 and 6 are either 0, 1 or 2, whereas in default PLINK output they are the allele letters themselves (A, T, G, C). For example:
Simulated data:
1 rs1 0 88671 1 2
1 rs2 0 114576 1 2
1 rs3 0 115699 2 1
1 rs4 0 155552 2 1
1 rs5 0 175528 1 2
My data:
2323-211_BBMergePreAssembled_Trinity_TR10034_c1_g1_i1 . 0 558 T C
2323-211_BBMergePreAssembled_Trinity_TR10054_c0_g1_i1 . 0 1134 G A
2323-211_BBMergePreAssembled_Trinity_TR10092_c0_g1_i1 . 0 1108 A T
2323-211_BBMergePreAssembled_Trinity_TR10114_c1_g3_i1 . 0 441 G C
2323-211_BBMergePreAssembled_Trinity_TR10129_c0_g1_i1 . 0 56 T C
(We don't have chromosome mapping data for this organism, these are probes from which SNPs are called).
Is there a way to convert the alleles to dosage codes in this format, or is this even the problem? Someone else here suggested that their problem was caused by missing data, but I still get the error even after removing all individuals with missing phenotypes.
Any help would be appreciated,
Thanks,
Tal
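One way to test whether the letter allele codes are the problem is to relabel them as 1 and 2 so the .bim file matches the example data. In PLINK binary files the genotype codes in the .bed refer to the first and second allele columns of the .bim by position, so renaming the labels without swapping their order leaves the genotypes unchanged. The first column, which here holds probe names instead of chromosome codes, is another candidate. The Python sketch below is hypothetical and assumes nothing about how bayesR reads the .bim: it rewrites columns 5 and 6 and, if asked, puts a numeric chromosome code in column 1 while moving the probe name into the empty ID column. Keep a copy of the original file.

```python
from __future__ import annotations


def relabel_bim_line(line: str, chrom_code: str | None = None) -> str:
    """Rewrite one .bim line so the allele columns read '1' and '2'.

    Columns 5 and 6 (A1, A2) become '1' and '2' in the same order, so the
    .bed genotype coding is unchanged. If chrom_code is given, column 1 is
    set to it, and the old column-1 text becomes the variant ID when
    column 2 is just '.' (an assumption that fits this dataset).
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    if chrom_code is not None:
        if fields[1] == ".":
            fields[1] = fields[0]
        fields[0] = chrom_code
    fields[4], fields[5] = "1", "2"
    return "\t".join(fields)
```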
Dear authors,
I am using this command line to include fixed effects in the model:
bayesR -bfile trainPopData -out simout -covar covariates.txt -numit 100000 -burnin 1000 -thin 10 -permute -snpout
It works well. However, I also have random effects that I would like to add to the model. Is there any way to do that?
Thank you
There seem to be two problems regarding the fam file
forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
bayesR 0000000000432F0E Unknown Unknown Unknown
bayesR 00000000004594BD Unknown Unknown Unknown
bayesR 0000000000457C26 Unknown Unknown Unknown
bayesR 0000000000413E78 Unknown Unknown Unknown
bayesR 000000000042AE53 Unknown Unknown Unknown
bayesR 000000000040309E Unknown Unknown Unknown
libc-2.17.so 00002B8555068555 __libc_start_main Unknown Unknown
bayesR 0000000000402FA9 Unknown Unknown Unknown
Hi,
I'm running genomic prediction with BayesR alongside a few other methods. I noticed that BayesR seems to report effect sizes in the opposite direction to the other models. Specifically, BayesR appears to report effects for A2 rather than A1. Can you confirm that this is the correct reading of the output, so that I can simply reverse the sign of the SNP weights?
Thanks!
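If the reported effects are per copy of A2, the A1 effects are simply their negatives. Recoding genotypes from A2 counts x to A1 counts 2 - x and flipping every effect changes each individual's genomic score only by a constant: the sum over SNPs of (2 - x)(-b) equals the sum of x*b minus 2 times the sum of b. Rankings and correlations with other methods are therefore unaffected. The Python sketch below (hypothetical function names) checks this numerically; which allele bayesR actually counts should still be confirmed by the authors.

```python
from __future__ import annotations


def genomic_score(genotypes: list[int], effects: list[float]) -> float:
    """Sum of allele counts times per-allele effects."""
    return sum(x * b for x, b in zip(genotypes, effects))


def to_a1_coding(genotypes: list[int], effects: list[float]) -> tuple[list[int], list[float]]:
    """Switch from A2 counts to A1 counts (2 - x) and flip the effect signs."""
    return [2 - x for x in genotypes], [-b for b in effects]
```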
Hello, I am trying to specify the heritability of my trait when using BayesR. For my input I have entered:
bayesRv2 -bfile bfile_name -vara 0.49 -dfvara -3.0 -out outfile_name
However, I receive the following error:
Missing argumnet for :-dfvara 5
ERROR: Problem parsing the command line arguments
With -dfvara 3.0 it runs, but the log file then shows 3.0, not -3.0.
Is there a way to resolve this issue? Thank you for any help you can provide!
Hello,
I'm trying to use this tool to run BayesR for genomic prediction, and I'm first trying to get my data into the correct format. I have a VCF with a number of sites and individuals, plus a quantitative phenotype. I used plink2 to convert the VCF into .bed, .bim and .fam files. When I then run:
bayesRv2 -bfile input_prefix
I get this error:
At line 1736 of file baymods.f90; Fortran runtime error: End of file
I checked several times that my .fam and .bim files contain all the information present in the sample datasets, and they look right to me.
I compiled the code with gfortran, and it works on the sample data.
Thanks,
Evan L