Comments (10)
Hi,
I'm not sure what the issue might be. Could you share the top 3 lines from each of your geno files that you want to merge?
from genomics_general.
Hi,
I have deleted them, but they looked just like the standard geno file shown in your examples.
BTW, I put my samples together in one VCF and ran popgenWindows.py with the --popsFile and -p parameters to separate the populations. It finally worked, but the pi, Fst and dxy results are quite different from those of vcftools. I set the same window and step sizes for the two methods, so this really confuses me.
Please let me show you one example:
For pi by vcftools:
CHROM BIN_START BIN_END N_VARIANTS PI
scaffold_36 1 50000 579 0.00303697
For pi by popgenWindows.py:
scaffold,start,end,mid,sites,pi_sc
scaffold_36,1,50000,26960,582,0.2749
As you can see, the two results are hugely different, even though the CHROM and window are the same and the number of variants is similar. I have no idea what is happening here.
I would really appreciate any suggestions. Looking forward to your reply.
from genomics_general.
Regarding the pi value, the difference is probably because you have only variant sites. One of the major differences between my approach and vcftools is that my scripts don't assume that missing sites are invariant. In other words, you need to have both the variant and invariant sites in the geno file to get a meaningful pi value. There is some discussion of these issues in this preprint: https://doi.org/10.1101/2020.06.27.175091
For Fst it should have less effect, but note that the specific details of how Fst and pi are calculated can also differ between approaches, so you will see some variation. For example, the approaches can differ in how they handle sites that have incomplete data.
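As a rough sanity check of the explanation above, the two numbers quoted earlier can be put on the same scale. This is only a sketch: it assumes that every position in the 50 kb window that is absent from the geno file was callable and invariant, which may not hold for real data, and it is not how either tool computes pi internally. The variable names are made up for illustration.

```python
# Values quoted earlier in this thread
pi_variant_sites_only = 0.2749   # popgenWindows.py pi_sc, averaged over 582 sites
n_sites_in_geno = 582            # "sites" column from popgenWindows.py
window_length = 50000            # window size used with both tools
pi_vcftools = 0.00303697         # vcftools PI for the same window

# Per-site pi is (sum of per-site diversity) / (number of sites considered).
# Recover the sum, then divide by the full window length instead, which is
# what happens if all unlisted positions are ASSUMED to be invariant.
summed_diversity = pi_variant_sites_only * n_sites_in_geno
pi_rescaled = summed_diversity / window_length

print("rescaled pi:", round(pi_rescaled, 5))
print("vcftools pi:", pi_vcftools)
```

Under that assumption the rescaled value comes out at roughly 0.0032, the same order of magnitude as the vcftools figure, which is consistent with the denominator (variant sites only versus the whole window) explaining most of the gap.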
from genomics_general.
Thanks so much for the help!
BTW, when I try to view the help information for some scripts, such as ABBABABAwindows.py, I get:
line 64 print("Sorter received result", resNumber, file=sys.stderr) SyntaxError: invalid syntax
-------------------------------------------------------^
The "^" points to the equals sign.
Could you take a look at it? Or maybe I am using it the wrong way...
from genomics_general.
Do you get this error with the command python ABBABABAwindows.py -h?
That is strange. I can't reproduce the error.
from genomics_general.
Oh, I see, it's a Python version error. You need to use Python 3.
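For anyone unsure which interpreter the plain python command resolves to, a small check along these lines can help. It is a hypothetical helper, not part of genomics_general, and it deliberately avoids the print(..., file=...) form so that the file itself still parses under Python 2, where that keyword argument is a syntax error unless print_function is imported from __future__.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3 or later."""
    return version_info[0] >= 3


# sys.stdout.write is used instead of print so this file parses on Python 2 too.
sys.stdout.write("Running Python %d.%d\n" % (sys.version_info[0], sys.version_info[1]))
if not is_python3():
    sys.stdout.write("This is Python 2; try running the script with python3 instead.\n")
```

Saving this as a file and running it with the same command used for the scripts (python or python3) shows which major version that command starts.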
from genomics_general.
Hi willright and Simon,
I am having the same issue as the one stated at the top of this thread. Willright, did you ever resolve this? I get the errors "Could not retrieve index file" and "Could not load .tbi/.csi index" when I try to convert my VCF to geno using the script parseVCFs.py. Below you can see my command and the top several lines of the VCF. Any advice? I am running Python 3.
nohup python VCF_processing/parseVCFs.py -i scurra_viridula_zebrina_pileup_05jan2024.vcf.gz --skipIndels --minQual 30 --gtf flag=DP min=5 --threads 30 --fai jasmine-uni1728-mb-hirise-3bs35_08-29-2020__hic_output.fasta.fai > scurra_viridula_zebrina_pileup_17jan2024.geno &
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.9+htslib-1.9
##bcftoolsCommand=mpileup --threads 20 --skip-indels -q 30 -Q 20 -f /path/to/genome/jasmine-uni1728-mb-hirise-3bs35_08-29-2020__hic_output.fasta -Ov -o scurra_viridula_zebrina_pileup_05jan2024.vcf names_of_124.bam
##reference=file:///path/to/genome/jasmine-uni1728-mb-hirise-3bs35_08-29-2020__hic_output.fasta
##contig=<ID=Sc1nRTI_1;HRSCAF=9,length=9724429>
##contig=<ID=Sc1nRTI_2;HRSCAF=53,length=43341626>
##contig=<ID=Sc1nRTI_3;HRSCAF=77,length=26245687>
##contig=<ID=Sc1nRTI_4;HRSCAF=306,length=762119>
##contig=<ID=Sc1nRTI_5;HRSCAF=352,length=47867>
from genomics_general.
Do you have bgzip and tabix installed? parseVCFs.py only runs on a bgzipped VCF. Otherwise you can use the single-threaded parseVCF.py.
from genomics_general.
tabix is from samtools, correct? samtools is installed.
I definitely used something else to zip the vcf. Perhaps that is the problem. Will try again with bgzip.
Thank you for the quick reply!!
from genomics_general.
I discovered that parseVCFs.py only works if you have previously indexed the vcf with tabix. This requirement is not clearly written in the readme. Thanks!
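The two requirements mentioned in this thread (bgzip compression and a tabix index) can be checked before launching a long job. The sketch below uses hypothetical helper names that are not part of genomics_general. It assumes the standard BGZF layout, in which the gzip header carries an extra subfield tagged "BC", and it assumes the index sits next to the VCF with a .tbi or .csi suffix.

```python
import os
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header (as written by bgzip)."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12:
            return False
        # gzip magic bytes, deflate method, and the FEXTRA flag (0x04) must be set
        if header[0:3] != b"\x1f\x8b\x08" or not (header[3] & 0x04):
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the extra subfields looking for the BGZF one: 'B', 'C', length 2
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False


def has_index(path):
    """Return True if a .tbi or .csi index file exists alongside the VCF."""
    return any(os.path.exists(path + ext) for ext in (".tbi", ".csi"))
```

If is_bgzf returns False, the file was probably compressed with plain gzip and needs to be recompressed with bgzip. If has_index returns False, running tabix -p vcf on the bgzipped file should create the .tbi index.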
from genomics_general.