z0on / 2brad_denovo

Genome-wide de novo genotyping with 2bRAD

Perl 52.57% R 12.95% Python 21.82% Awk 0.16% HiveQL 1.01% Shell 11.49%

2brad_denovo's Issues

deprecated code

Use -doBcf 1 instead of -doVcf 1 in the ANGSD options:

2bRAD_denovo/2bRAD_README.sh

Line 419 in 0e4e270

 TODO="-doMajorMinor 1 -doMaf 1 -doCounts 1 -makeMatrix 1 -doIBS 1 -doCov 1 -doGeno 8 -doVcf 1 -doPost 1 -doGlf 2" 

Differences between ANGSD versions' vcf/bcf conversion to input file for Bayescan

Hi Misha,
I hope all is well! I have a question about a recent ANGSD update (v0.933), which no longer supports the -doVcf flag and instead requires the -doBcf flag. This version creates a bcf file instead of a vcf. The file looks similar to the vcf file created by earlier ANGSD versions, and both state that they are in VCF v4.2 format; however, I think they may be coding missing data differently.

When I use PGDspider to convert the old vcf file to bayescan input, the second column (twice the number of individuals in that pop) is the same across all loci for the pop. When I convert the bcf file to bayescan input, the second column differs slightly among loci within a pop. The bayescan manual says this can happen because it accounts for missing data. See the examples below:

# Converting vcf output from ANGSD v0.921 to Bayescan input following your code using PGDspider
less vcf.bayescan

[loci]=10120

[populations]=8

[pop]=1
1 30 2 2 28
2 30 2 28 2
3 30 2 28 2
4 30 2 4 26
5 30 2 2 28
6 30 2 25 5
7 30 2 27 3
8 30 2 28 2
9 30 2 5 25
10 30 2 27 3

# Converting bcf output from ANGSD v0.933 to Bayescan input following your code using PGDspider
less bcf.bayescan

[loci]=10120

[populations]=8

[pop]=1
1 28 2 2 26
2 28 2 26 2
3 28 2 27 1
4 30 2 4 26
5 30 2 2 28
6 26 2 21 5
7 28 2 25 3
8 28 2 26 2
9 28 2 5 23
10 28 2 25 3

I am currently running both to see whether the number of outliers differs much between the two, but I imagine there will be differences because the allele frequencies will be calculated from different gene counts. Which would be the better way to go? Thank you!
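The varying second column can be reproduced with a small sketch. It assumes the Bayescan codominant layout described above: locus index, number of genes sampled (twice the number of genotyped diploid individuals), number of alleles, then one count per allele. Under that assumption, every missing genotype removes two genes from that locus. The function name and the toy genotypes are hypothetical and are not taken from PGDspider or ANGSD.

```python
def bayescan_row(locus_index, genotypes):
    """Build one Bayescan-style row for a biallelic locus.

    genotypes: VCF-style GT strings such as "0/1", "1|1" or "./.".
    Returns [locus index, genes sampled, number of alleles, ref count, alt count].
    """
    counts = [0, 0]
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype: contributes no genes at this locus
        for allele in alleles:
            counts[int(allele)] += 1
    return [locus_index, sum(counts), 2, counts[0], counts[1]]


# 15 diploid individuals, all genotyped: 30 genes sampled.
complete = ["0/0"] + ["1/1"] * 14
# Same population with one individual missing: 28 genes sampled.
with_missing = ["./.", "0/0"] + ["1/1"] * 13

print(bayescan_row(1, complete))      # [1, 30, 2, 2, 28]
print(bayescan_row(1, with_missing))  # [1, 28, 2, 2, 26]
```

The two printed rows match locus 1 in the two listings above, which is consistent with the bcf-derived file carrying missing genotypes that the older vcf did not. This is only an illustration of the counting, not a statement about how either ANGSD version writes its genotypes.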

error running HetMajorityProb.py

We are having an issue running HetMajorityProb.py. The Python version is 2.7.12. Can you please help? Thanks.

zcat sfilt.geno.gz | python ~/2bRAD_denovo/HetMajorityProb.py | awk '$6 < 0.75 {print $1"\t"$2}' > allSites
awk: cmd. line:1: $6 < 0.75 {print $1"\t"$2}
awk: cmd. line:1: ^ backslash not last character on line
Traceback (most recent call last):
  File "/home/2bRAD_denovo/HetMajorityProb.py", line 28, in <module>
    stdout.write("\t".join([chrom, pos, str(len(pr_heteroz)), str(num_heteroz), str(h_expected), str(utail_prob)]) + "\n")
IOError: [Errno 32] Broken pipe
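In the log above, awk rejects its program and exits, so the Python script is left writing to a closed pipe; the IOError looks like a consequence of the awk failure rather than a separate bug. One way to sidestep the shell quoting is to do the same filter in Python. The sketch below assumes the six tab-separated columns written by the stdout.write call in the traceback, with the probability in the sixth column. The function name filter_sites is hypothetical and is not part of the repository.

```python
def filter_sites(lines, max_prob=0.75):
    """Yield "chrom<TAB>pos" for rows whose sixth column is below max_prob."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip empty or malformed rows
        if float(fields[5]) < max_prob:
            yield fields[0] + "\t" + fields[1]


# Toy rows in the assumed layout: chrom, pos, n, num_heteroz, h_expected, utail_prob.
sample = [
    "chr1\t10\t20\t4\t3.8\t0.50\n",
    "chr1\t25\t20\t15\t6.1\t0.99\n",
]
for site in filter_sites(sample):
    print(site)  # only the first row passes the 0.75 cutoff
```

Passing sys.stdin instead of the sample list would turn this into a drop-in replacement for the awk step in the pipeline. The code avoids syntax that is specific to Python 3, so it should also run under the 2.7 interpreter mentioned above.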

undefined reference to `gzopen'

Hello,
I'm trying to install the required packages and I have an issue, most likely because I'm a noob.
I'm installing ngsF but I receive the following error message:

andrea@andrea-HP:~/ngsF$ make HTSSRC=../htslib
g++ -O3 -Wall -I -I/home/andrea/htslib -I -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_USE_KNETFILE ngsF.cpp parse_args.o read_data.o EM.o shared.o -lgsl -lgslcblas -lm -L -lz -lpthread /home/andrea/htslib/libhts.a -o ngsF
/usr/bin/ld: read_data.o: in function `init_output(params*, out_data*)':
read_data.cpp:(.text+0x253): undefined reference to `gzopen'
/usr/bin/ld: read_data.cpp:(.text+0x271): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x28f): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x2ac): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x32f): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x38e): undefined reference to `gzclose'
collect2: error: ld returned 1 exit status
make: *** [Makefile:40: ngsF] Error 1

I checked for zlib and it is installed on my system.
Any idea how to get unstuck?
Thanks!

GATK update

Hi Misha,

For hard-calling SNPs using GATK, the UnifiedGenotyper tool (GenomeAnalysisTK.jar) is no longer supported in the current version; it has been replaced by HaplotypeCaller.

https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller

There are some file formatting and other issues with adapting the UnifiedGenotyper steps to HaplotypeCaller in the updated version. Overall, though, I was wondering whether you had any suggestions on setting new parameters for HaplotypeCaller, versus downloading and using the old version of GATK with UnifiedGenotyper, or running the ANGSD caller instead.

Thanks,
Shelby

issue with running GADMA using the dadi output from realsfs2dadi

Hello,

I have been using your scripts to format and thin the data from angsd to the dadi format to run GADMA.

When I run GADMA (on the full or thinned file) it stops with the following error message:
raise SyntaxError("Construction of data_dict failed: " + str(e))
SyntaxError: Construction of data_dict failed: 'Allele2' is not in list

Allele2 is in the header of the input file. For example, the first lines look as follows:

REF OUT Allele1 West East NEG Allele2 West East NEG Gene Position
cag CGG a 0 0 0 T 16 14 18 NW_021703766.1 18085

The GADMA developer suggested that there may be a problem with the dadi format. I was wondering whether you had heard of similar issues and could advise on how to fix it.

Thank you very much,
Best wishes,

Marie
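The text "'Allele2' is not in list" is the message Python raises when list.index() cannot find a value, so one plausible first check is whether the first non-comment line that a parser sees really contains the expected column names (for example, whether thinning dropped or shifted the header). The sketch below assumes the parser splits that line on whitespace; the function name check_dadi_header is hypothetical and belongs to neither dadi nor GADMA.

```python
import os
import tempfile


def check_dadi_header(path, required=("Allele1", "Allele2")):
    """Return (columns, missing) for the first non-comment, non-blank line."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            columns = line.split()
            missing = [name for name in required if name not in columns]
            return columns, missing
    return [], list(required)  # empty file: nothing was found


# Demo on a throwaway file that reuses the header and data line quoted above.
header = "REF OUT Allele1 West East NEG Allele2 West East NEG Gene Position\n"
row = "cag CGG a 0 0 0 T 16 14 18 NW_021703766.1 18085\n"
with tempfile.NamedTemporaryFile("w", suffix=".dadi", delete=False) as tmp:
    tmp.write(header + row)
columns, missing = check_dadi_header(tmp.name)
os.remove(tmp.name)
print(missing)  # [] when both column names are present
```

If the missing list is not empty for the real file, the header is probably not where the parser expects it. If it is empty, the cause lies elsewhere and this check does not explain the error.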

running sfs2dadi.R

Hi,
I'm trying to convert files for two pops using this script, but I'm getting the error
Error in `[.data.frame`(sfs, , 2) : undefined columns selected
The files look like this:

==> ../SFS/wbm_par5_filtered.sfs <==
370852794.275136 796465.625156 671974.027855 455452.741808 352765.173275 270635.372224 212309.831657 171066.441767 139317.093726 117808.605686 106515.485767 97080.887813 84353.795309 75661.287553 71121.145427 68711.955636 67755.344356 68543.439203 73780.923838 77212.326136 94793.208109 94973.818405 11894313.194159

==> ../SFS/beng_par5_filtered.sfs <==
348398020.670957 616056.539667 434489.161227 304745.964758 242005.838217 199375.157620 167487.991092 143066.809409 125862.137423 109980.958992 99609.061586 90920.572153 83000.454505 77025.261765 74636.908585 69652.208053 68631.425787 66388.365671 66318.632284 74497.831215 107331.610928 160531.323274 11067692.114832

What could be wrong?

sequencing 2bRAD libraries on Hiseq 4000 to Novaseq X series

Hello,

With the recent release of the Novaseq X series, is it still recommended to spike in 20% PhiX library with 2bRAD samples to avoid the problem of reading invariant bases (adaptor, restriction site), or do we not have to worry about that with the newer sequencers?
