z0on / 2brad_denovo
Genome-wide de novo genotyping with 2bRAD
use doBcf 1 instead of doVcf 1
Line 419 in 0e4e270
Hi Misha,
I hope all is well! I have a question about a recent ANGSD update (v0.933), which no longer supports the -doVcf flag and instead requires the -doBcf flag. This version now creates a bcf file instead of a vcf. The file format looks similar to the vcf file created by earlier ANGSD versions, and both say they are format vcf v4.2, but I think they may be coding missing data differently?
When I use PGDspider to convert the old vcf file to bayescan input, the second column (twice the number of individuals in that pop) is the same across all loci for the pop. When I convert the bcf file to bayescan input, the second column differs slightly between loci within a pop. The bayescan manual says this can happen because the count accounts for missing data. See examples below:
# Converting vcf output from ANGSD v0.921 to Bayescan input following your code using PGDspider
less vcf.bayescan
[loci]=10120
[populations]=8
[pop]=1
1 30 2 2 28
2 30 2 28 2
3 30 2 28 2
4 30 2 4 26
5 30 2 2 28
6 30 2 25 5
7 30 2 27 3
8 30 2 28 2
9 30 2 5 25
10 30 2 27 3
# Converting bcf output from ANGSD v0.933 to Bayescan input following your code using PGDspider
less bcf.bayescan
[loci]=10120
[populations]=8
[pop]=1
1 28 2 2 26
2 28 2 26 2
3 28 2 27 1
4 30 2 4 26
5 30 2 2 28
6 26 2 21 5
7 28 2 25 3
8 28 2 26 2
9 28 2 5 23
10 28 2 25 3
I am currently running both to see whether the number of outliers differs much between the two, but I imagine there will be issues because the allele frequencies will be calculated differently. Which would be the better way to go? Thank you!
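One way to check whether the two files really code missing genotypes differently is to count the called gene copies per locus directly, since the second BayeScan column is described above as twice the number of individuals. The sketch below is only an illustration: it assumes the bcf has first been converted to text VCF (for example with `bcftools view`), that a `GT` field is present, and that `sample_indices` (a hypothetical argument) lists the zero-based sample columns belonging to one population. The example line is made up.

```python
def called_gene_copies(vcf_line, sample_indices):
    """Count non-missing alleles in the GT field for the given samples."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; find where GT sits within it.
    gt_pos = fields[8].split(":").index("GT")
    copies = 0
    for i in sample_indices:
        gt = fields[9 + i].split(":")[gt_pos]
        # Treat phased and unphased separators alike; "." marks a missing allele.
        alleles = gt.replace("|", "/").split("/")
        copies += sum(1 for a in alleles if a != ".")
    return copies


# Made-up line with three samples, one of them missing.
line = "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\t1/1"
print(called_gene_copies(line, [0, 1, 2]))  # 4
```

If the old vcf gives 30 at every locus while the converted bcf gives 28 or 26 at some loci, that would be consistent with one file storing missing calls as `./.` and the other storing a called genotype in the same place.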
We are having an issue running HetMajorityProb.py. Python version is 2.7.12. Can you please help? Thanks
zcat sfilt.geno.gz | python ~/2bRAD_denovo/HetMajorityProb.py | awk '$6 < 0.75 {print $1"\t"$2}' > allSites
awk: cmd. line:1: $6 < 0.75 {print $1"\t"$2}
awk: cmd. line:1: ^ backslash not last character on line
Traceback (most recent call last):
File "/home/2bRAD_denovo/HetMajorityProb.py", line 28, in
stdout.write("\t".join([chrom, pos, str(len(pr_heteroz)), str(num_heteroz), str(h_expected), str(utail_prob)]) + "\n")
IOError: [Errno 32] Broken pipe
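The `IOError: [Errno 32] Broken pipe` is most likely a side effect rather than the root problem: awk rejected its program and exited, so HetMajorityProb.py had nothing left to write to. If the awk quoting cannot be sorted out in the shell, the same filter can be written in Python. This is a minimal sketch, assuming the six tab-separated output columns shown in the traceback (the last being `utail_prob`); `filter_sites` and its parameters are hypothetical names, and the two example rows are made up.

```python
def filter_sites(lines, max_prob=0.75, column=5):
    """Yield (chrom, pos) for rows whose value in `column` is below max_prob."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # skip short or empty lines
        if float(fields[column]) < max_prob:
            yield fields[0], fields[1]


# Made-up rows in the layout: chrom, pos, n, num_heteroz, h_expected, utail_prob
rows = [
    "chr1\t10\t20\t5\t4.2\t0.10",
    "chr1\t20\t20\t15\t6.0\t0.99",
]
for chrom, pos in filter_sites(rows):
    print(chrom + "\t" + pos)
```

Passing `sys.stdin` instead of `rows` would let the function sit at the end of the same pipe in place of the awk step.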
Hello,
I'm trying to install the required packages and I have an issue, most likely because I'm a noob.
I'm installing ngsF but I receive the following error message:
andrea@andrea-HP:~/ngsF$ make HTSSRC=../htslib
g++ -O3 -Wall -I -I/home/andrea/htslib -I -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_USE_KNETFILE ngsF.cpp parse_args.o read_data.o EM.o shared.o -lgsl -lgslcblas -lm -L -lz -lpthread /home/andrea/htslib/libhts.a -o ngsF
/usr/bin/ld: read_data.o: in function `init_output(params*, out_data*)':
read_data.cpp:(.text+0x253): undefined reference to `gzopen'
/usr/bin/ld: read_data.cpp:(.text+0x271): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x28f): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x2ac): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x32f): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x38e): undefined reference to `gzclose'
collect2: error: ld returned 1 exit status
make: *** [Makefile:40: ngsF] Error 1
I checked for zlib and it's installed on my system.
Any idea how to get unstuck?
Thanks!
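One possible reading of the g++ line above: it contains bare `-I` and `-L` flags with no directory attached. GCC accepts the directory as a separate argument, so a bare `-L` followed by `-lz` would treat `-lz` as a search directory rather than a request to link zlib, which would explain the undefined `gz*` symbols even with zlib installed. The sketch below only flags such pairs in a command string; `swallowed_flags` is a hypothetical helper, and the command is abbreviated from the log.

```python
import shlex


def swallowed_flags(command):
    """Return (flag, next_token) pairs where a bare -I or -L precedes another option."""
    tokens = shlex.split(command)
    hits = []
    for tok, nxt in zip(tokens, tokens[1:]):
        # A bare -I/-L takes the following token as its directory argument.
        if tok in ("-I", "-L") and nxt.startswith("-"):
            hits.append((tok, nxt))
    return hits


# Abbreviated from the build log above.
cmd = ("g++ -O3 -Wall -I -I/home/andrea/htslib -I -D_FILE_OFFSET_BITS=64 "
       "-L -lz -lpthread -o ngsF")
for flag, victim in swallowed_flags(cmd):
    print(flag, "consumes", victim)
```

If this is the cause, the empty flags presumably come from Makefile variables that were never set, so checking which variables the Makefile expects would be the next step.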
Hi Misha,
For hard-calling SNPs using GATK, the UnifiedGenotyper tool in GenomeAnalysisTK.jar is no longer supported in the current version; it has been replaced by HaplotypeCaller.
https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller
There are some file formatting and other issues with adapting the UnifiedGenotyper steps to HaplotypeCaller in the updated version. Overall, though, I was wondering whether you had any suggestions on setting new parameters for HaplotypeCaller, versus downloading and using the old version of GATK with UnifiedGenotyper, or running the ANGSD caller instead.
Thanks,
Shelby
Hello,
I have been using your scripts to format and thin the data from angsd to the dadi format to run GADMA.
When I run GADMA (on the full or thinned file) it stops with the following error message:
raise SyntaxError("Construction of data_dict failed: " + str(e))
SyntaxError: Construction of data_dict failed: 'Allele2' is not in list
Allele2 is in the header of the input file; for example, the first lines look as follows:
REF OUT Allele1 West East NEG Allele2 West East NEG Gene Position
cag CGG a 0 0 0 T 16 14 18 NW_021703766.1 18085
The GADMA developer suggested that there may be a problem with the dadi format. I was wondering whether you had heard of similar issues and could advise on how to fix it.
Thank you very much,
Best wishes,
Marie
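The header quoted above does contain `Allele2`, so one thing worth ruling out is that the line actually reaching the parser differs from it (for instance an extra leading line, or a renamed column). The sketch below is a rough sanity check, assuming the parser looks for the literal tokens `Allele1` and `Allele2` in the header, as the error message suggests; `check_dadi_header` is a hypothetical helper, not part of dadi or GADMA.

```python
def check_dadi_header(header_line):
    """Return a list of problems found in a dadi SNP-format header line."""
    cols = header_line.split()
    problems = []
    for name in ("Allele1", "Allele2"):
        if name not in cols:
            problems.append("missing column: " + name)
    if problems:
        return problems
    a1, a2 = cols.index("Allele1"), cols.index("Allele2")
    # Population names after Allele1 should repeat, in order, after Allele2.
    pops1 = cols[a1 + 1:a2]
    pops2 = cols[a2 + 1:a2 + 1 + len(pops1)]
    if pops1 != pops2:
        problems.append("population columns differ: %s vs %s" % (pops1, pops2))
    return problems


header = "REF OUT Allele1 West East NEG Allele2 West East NEG Gene Position"
print(check_dadi_header(header))  # []
```

Running the check on the first line of the file as it exists on disk (rather than on a copy pasted from a viewer) would show whether the header the tool sees is the one shown here.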
Hi,
I'm trying to convert files for two pops using this script but getting the error
Error in `[.data.frame`(sfs, , 2) : undefined columns selected
The files look like this
==> ../SFS/wbm_par5_filtered.sfs <==
370852794.275136 796465.625156 671974.027855 455452.741808 352765.173275 270635.372224 212309.831657 171066.441767 139317.093726 117808.605686 106515.485767 97080.887813 84353.795309 75661.287553 71121.145427 68711.955636 67755.344356 68543.439203 73780.923838 77212.326136 94793.208109 94973.818405 11894313.194159
==> ../SFS/beng_par5_filtered.sfs <==
348398020.670957 616056.539667 434489.161227 304745.964758 242005.838217 199375.157620 167487.991092 143066.809409 125862.137423 109980.958992 99609.061586 90920.572153 83000.454505 77025.261765 74636.908585 69652.208053 68631.425787 66388.365671 66318.632284 74497.831215 107331.610928 160531.323274 11067692.114832
What could be wrong?
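Each file shown above holds a single line of 23 values, which is what a one-population (1D) SFS for 11 diploid individuals would look like, since such a spectrum has 2N+1 entries. If the script expects a different layout, selecting a column by position could fail in the way the error describes. The sketch below just reports the shape of an SFS file's contents; both function names are hypothetical, and the example text uses placeholder values rather than the real counts.

```python
def sfs_shape(text):
    """Return (number of non-empty rows, list of entry counts per row)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return len(rows), [len(r) for r in rows]


def diploids_from_1d(n_entries):
    """Number of diploid individuals implied by a 1D SFS with 2N+1 entries, or None."""
    if n_entries < 1 or (n_entries - 1) % 2 != 0:
        return None
    return (n_entries - 1) // 2


# Placeholder values; only the count of entries matters here.
text = " ".join(["1.0"] * 23) + "\n"
n_rows, widths = sfs_shape(text)
print(n_rows, widths, diploids_from_1d(widths[0]))  # 1 [23] 11
```

Comparing this shape with what the conversion script reads (one 1D file per population versus a joint two-population spectrum) may narrow down where the column selection goes wrong.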
Hello,
With the recent release of the NovaSeq X series, is it still recommended to spike in 20% PhiX library with 2bRAD samples to avoid problems with reading invariant bases (adaptor, restriction site), or do we not have to worry about that with the newer sequencers?