2brad_denovo's Issues

Differences between ANGSD versions' vcf/bcf conversion to input file for Bayescan

Hi Misha,
I hope all is well! I have a question about a recent ANGSD update (v0.933), which no longer supports the -doVcf flag and instead requires the -doBcf flag. This version now creates a bcf file instead of a vcf. The file format looks similar to the vcf created by earlier ANGSD versions, and both say they are format vcf v4.2, but I think they may be coding missing data differently.

When I use PGDspider to convert the old vcf file to Bayescan input, the second column (twice the number of individuals in that pop) is the same across all loci for the pop. When I convert the bcf file to Bayescan input, the second column differs slightly among loci within a pop. The Bayescan manual says this can happen because the count accounts for missing data. See examples below:

#Converting vcf output from ANGSD v0.921 to Bayescan input following your code using PGDspider
less vcf.bayescan

[loci]=10120

[populations]=8

[pop]=1
1 30 2 2 28
2 30 2 28 2
3 30 2 28 2
4 30 2 4 26
5 30 2 2 28
6 30 2 25 5
7 30 2 27 3
8 30 2 28 2
9 30 2 5 25
10 30 2 27 3

#Converting bcf output from ANGSD v0.933 to Bayescan input following your code using PGDspider
less bcf.bayescan

[loci]=10120

[populations]=8

[pop]=1
1 28 2 2 26
2 28 2 26 2
3 28 2 27 1
4 30 2 4 26
5 30 2 2 28
6 26 2 21 5
7 28 2 25 3
8 28 2 26 2
9 28 2 5 23
10 28 2 25 3

I am currently running both to see whether the number of outliers differs much between the two, but I imagine there will be issues because the allele frequencies will be calculated differently. Which would be the better way to go? Thank you!
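
One way to see which loci differ between the two conversions is to compare the gene-copy column (second field) directly. Below is a minimal Python sketch, assuming each locus line has the five fields shown above (locus index, gene copies, number of alleles, then the allele counts). The function names are made up for illustration, and only the first four loci of pop 1 are used as input.

```python
def parse_pop_block(text):
    """Parse locus lines from one [pop] block of a Bayescan input file.

    Returns {locus_index: (gene_copies, allele_counts)}.
    Lines that are not made only of integers (headers, blanks) are skipped.
    """
    loci = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not all(f.isdigit() for f in fields):
            continue
        idx, copies, n_alleles = (int(f) for f in fields[:3])
        counts = tuple(int(f) for f in fields[3:3 + n_alleles])
        loci[idx] = (copies, counts)
    return loci


def copy_count_differences(old, new):
    """List (locus, old_copies, new_copies) where the gene-copy count differs."""
    return [(i, old[i][0], new[i][0])
            for i in sorted(old.keys() & new.keys())
            if old[i][0] != new[i][0]]


# First four loci of pop 1, copied from the two outputs in the post.
vcf_block = """[pop]=1
1 30 2 2 28
2 30 2 28 2
3 30 2 28 2
4 30 2 4 26
"""
bcf_block = """[pop]=1
1 28 2 2 26
2 28 2 26 2
3 28 2 27 1
4 30 2 4 26
"""

diffs = copy_count_differences(parse_pop_block(vcf_block),
                               parse_pop_block(bcf_block))
for locus, old_n, new_n in diffs:
    print(f"locus {locus}: {old_n} gene copies (vcf) vs {new_n} (bcf)")
```

Loci listed by this comparison are the ones where the bcf-derived file counts fewer gene copies, which is consistent with some genotypes being treated as missing there.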

error running HetMajorityProb.py

We are having an issue running HetMajorityProb.py. The Python version is 2.7.12. Can you please help? Thanks.

zcat sfilt.geno.gz | python ~/2bRAD_denovo/HetMajorityProb.py | awk '$6 < 0.75 {print $1"\t"$2}' > allSites
awk: cmd. line:1: $6 < 0.75 {print $1"\t"$2}
awk: cmd. line:1: ^ backslash not last character on line
Traceback (most recent call last):
File "/home/2bRAD_denovo/HetMajorityProb.py", line 28, in <module>
stdout.write("\t".join([chrom, pos, str(len(pr_heteroz)), str(num_heteroz), str(h_expected), str(utail_prob)]) + "\n")
IOError: [Errno 32] Broken pipe
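
The awk message suggests that the awk program reached awk with a stray backslash in it (for example from copied, escaped quotes). The IOError is then probably a side effect: awk exits, and Python is left writing into a closed pipe. As a way around the shell quoting, the same filter can be sketched in Python. This assumes the six tab-separated columns written by the stdout.write call in the traceback, with the sixth being the tail probability. The function name and the sample values are made up, and the sketch targets Python 3 rather than 2.7.

```python
def filter_sites(lines, max_prob=0.75):
    """Yield 'chrom<TAB>pos' for lines whose 6th column is below max_prob."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        try:
            prob = float(fields[5])
        except ValueError:
            continue  # skip lines where column 6 is not numeric
        if prob < max_prob:
            yield fields[0] + "\t" + fields[1]


# Made-up example lines in the six-column layout (not real data).
sample = [
    "chr1\t100\t20\t5\t4.8\t0.40\n",
    "chr1\t250\t20\t15\t6.1\t0.99\n",
]
kept = list(filter_sites(sample))
print("\n".join(kept))
```

To use it in place of the awk step, pass sys.stdin instead of the sample list and redirect the output to allSites.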

undefined reference to `gzopen'

Hello,
I'm trying to install the required packages and I have an issue, most likely because I'm a noob.
I'm installing ngsF but I receive the following error message:

andrea@andrea-HP:~/ngsF$ make HTSSRC=../htslib
g++ -O3 -Wall -I -I/home/andrea/htslib -I -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_USE_KNETFILE ngsF.cpp parse_args.o read_data.o EM.o shared.o -lgsl -lgslcblas -lm -L -lz -lpthread /home/andrea/htslib/libhts.a -o ngsF
/usr/bin/ld: read_data.o: in function `init_output(params*, out_data*)':
read_data.cpp:(.text+0x253): undefined reference to `gzopen'
/usr/bin/ld: read_data.cpp:(.text+0x271): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x28f): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x2ac): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x32f): undefined reference to `gzread'
/usr/bin/ld: read_data.cpp:(.text+0x38e): undefined reference to `gzclose'
collect2: error: ld returned 1 exit status
make: *** [Makefile:40: ngsF] Error 1

I checked for zlib and it's installed on my system.
Any idea how to get unstuck?
Thanks!
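
One possible reading of the g++ line in this report: it contains a bare -L directly followed by -lz, and bare -I flags as well, which can happen when a Makefile variable expands to nothing. If the compiler driver takes the next token as the argument of the bare flag, -lz would be consumed as a directory name and zlib would never be linked. This is an assumption about the cause, not a confirmed diagnosis. A small Python sketch (the function name is illustrative) that flags such tokens in a command line:

```python
import shlex


def bare_path_flags(command):
    """Return (flag, next_token) pairs where -I or -L has no path attached
    and is followed by another option instead of a directory."""
    tokens = shlex.split(command)
    hits = []
    for flag, nxt in zip(tokens, tokens[1:]):
        if flag in ("-I", "-L") and nxt.startswith("-"):
            hits.append((flag, nxt))
    return hits


# The g++ command copied from the make output in the report.
cmd = ("g++ -O3 -Wall -I -I/home/andrea/htslib -I -D_FILE_OFFSET_BITS=64 "
       "-D_LARGEFILE64_SOURCE -D_USE_KNETFILE ngsF.cpp parse_args.o read_data.o "
       "EM.o shared.o -lgsl -lgslcblas -lm -L -lz -lpthread "
       "/home/andrea/htslib/libhts.a -o ngsF")

hits = bare_path_flags(cmd)
for flag, swallowed in hits:
    print(f"bare {flag} is followed by {swallowed}")
```

If the bare -L turns out to be the culprit, the empty variable behind it in the Makefile is the place to look.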

GATK update

Hi Misha,

For hard-calling SNPs using GATK, the function for GenomeAnalysisTK.jar / UnifiedGenotyper is no longer supported in the current version; instead it's replaced by HaplotypeCaller.

https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller

There are some file formatting and other issues with adapting the UnifiedGenotyper steps to HaplotypeCaller in the updated version. Overall, though, I was wondering whether you had any suggestions on setting new parameters for HaplotypeCaller, versus downloading and using the old version of GATK with UnifiedGenotyper, or running the ANGSD caller instead.

Thanks,
Shelby

issue with running GADMA using the dadi output from realsfs2dadi

Hello,

I have been using your scripts to format and thin the data from angsd to the dadi format to run GADMA.

When I run GADMA (on the full or thinned file) it stops with the following error message:
raise SyntaxError("Construction of data_dict failed: " + str(e))
SyntaxError: Construction of data_dict failed: 'Allele2' is not in list

Allele2 is in the header of the input file. For example, the first lines look as follows:

REF OUT Allele1 West East NEG Allele2 West East NEG Gene Position
cag CGG a 0 0 0 T 16 14 18 NW_021703766.1 18085

The GADMA developer suggested that there may be a problem with the dadi format. I was wondering whether you had heard of similar issues and could advise on how to fix it.

Thank you very much,
Best wishes,

Marie
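
The error text says that the header tokens, as parsed, do not include Allele2. A quick way to check what a parser actually sees is to split the header line yourself. Below is a minimal Python sketch, assuming the reader splits the header line on whitespace (an assumption about the parser, not confirmed from the GADMA or dadi source). The function name is made up.

```python
def check_dadi_header(header_line):
    """Split a dadi SNP header on whitespace and locate the allele columns.

    Returns (tokens, population_names); population_names is None when
    'Allele1' or 'Allele2' is missing from the tokens.
    """
    tokens = header_line.split()
    if "Allele1" not in tokens or "Allele2" not in tokens:
        return tokens, None
    first = tokens.index("Allele1")
    second = tokens.index("Allele2")
    return tokens, tokens[first + 1:second]


# Header line copied from the post.
header = "REF OUT Allele1 West East NEG Allele2 West East NEG Gene Position"
tokens, pops = check_dadi_header(header)
print(repr(tokens))  # repr makes unexpected delimiters or hidden characters visible
print(pops)
```

If the tokens printed for the real file differ from the header as displayed (for example, names joined by commas or another delimiter), that would explain the message. If they match, the problem lies elsewhere.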

running sfs2dadi.R

Hi,
I'm trying to convert files for two pops using this script, but I'm getting this error:
Error in `[.data.frame`(sfs, , 2) : undefined columns selected
The files look like this:

==> ../SFS/wbm_par5_filtered.sfs <==
370852794.275136 796465.625156 671974.027855 455452.741808 352765.173275 270635.372224 212309.831657 171066.441767 139317.093726 117808.605686 106515.485767 97080.887813 84353.795309 75661.287553 71121.145427 68711.955636 67755.344356 68543.439203 73780.923838 77212.326136 94793.208109 94973.818405 11894313.194159
==> ../SFS/beng_par5_filtered.sfs <==
348398020.670957 616056.539667 434489.161227 304745.964758 242005.838217 199375.157620 167487.991092 143066.809409 125862.137423 109980.958992 99609.061586 90920.572153 83000.454505 77025.261765 74636.908585 69652.208053 68631.425787 66388.365671 66318.632284 74497.831215 107331.610928 160531.323274 11067692.114832

What could be wrong?
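
Each of the two files in this post is a single line of 23 numbers. If these are one-dimensional site frequency spectra for diploid samples, a spectrum has 2N+1 entries, so 23 entries would correspond to 11 individuals per population. Whether sfs2dadi.R expects this layout or a different input is not something the post settles. A minimal Python sketch for checking the shape of such a file (the function name is illustrative):

```python
def sfs_shape(line):
    """Return (number_of_entries, implied_diploid_individuals or None)
    for a whitespace-separated one-line SFS."""
    values = [float(v) for v in line.split()]
    n = len(values)
    # A 1D SFS for N diploids has 2N+1 entries, so n must be odd.
    individuals = (n - 1) // 2 if n % 2 == 1 else None
    return n, individuals


# Contents of wbm_par5_filtered.sfs, copied from the post.
wbm = (
    "370852794.275136 796465.625156 671974.027855 455452.741808 "
    "352765.173275 270635.372224 212309.831657 171066.441767 "
    "139317.093726 117808.605686 106515.485767 97080.887813 "
    "84353.795309 75661.287553 71121.145427 68711.955636 "
    "67755.344356 68543.439203 73780.923838 77212.326136 "
    "94793.208109 94973.818405 11894313.194159"
)

shape = sfs_shape(wbm)
print(shape)
```

A single-row input like this has no second row or second site to index, so it is worth checking which file the script reads and how it parses it before assuming the data are wrong.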

sequencing 2bRAD libraries on Hiseq 4000 to Novaseq X series

Hello,

With the recent release of the Novaseq X series, is it still recommended to spike in 20% PhiX library with 2bRAD samples to avoid the problem of reading invariant bases (adaptor, restriction site), or is that no longer a concern with the newer sequencers?
