Comments (2)
Hi Tal,
Thanks for your question; it is a perfectly reasonable one! How closely pixy's estimates match those from estimators that assume missing sites are homozygous for the reference allele depends on the amount of missing data. The more missing data there are, the more biased those estimators' values of pi and dxy become. When missingness is very high (e.g. computing over windows with RADseq or similar data), pixy's estimates will differ a great deal from theirs.
If one wanted to standardize estimates post hoc, the key quantity would be the number of sites in the window of interest that had sufficient depth to call a genotype in the focal individuals (i.e. those that would have appeared as invariant sites in an all-sites VCF). Silas Tittes made a really nice command-line program for computing this quantity from BAM files: https://github.com/RILAB/mop. You would need access to the original BAM files to compute it, but that is unfortunately the only way I am aware of to do this!
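As a rough illustration of what such a post hoc standardization could look like, here is a minimal Python sketch. It assumes the published estimate was computed by dividing the summed pairwise differences by the full window length, so that swapping in the callable-site count as the denominator is a simple rescaling. The function name `rescale_pi` and its parameters are hypothetical, not part of pixy or mop, and this only adjusts the per-window denominator; it does not account for genotypes missing in some individuals at otherwise callable sites.

```python
import math


def rescale_pi(pi_reported, window_length, callable_sites):
    """Rescale a per-window pi (or dxy) to a callable-site denominator.

    Assumption (not from pixy or mop): pi_reported was computed as
    total_differences / window_length, i.e. uncalled sites were
    implicitly treated as invariant.
    """
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    if callable_sites < 0 or callable_sites > window_length:
        raise ValueError("callable_sites must be between 0 and window_length")
    if callable_sites == 0:
        # No site in the window had enough depth: the estimate is undefined.
        return math.nan
    # Undo the division by window length, then divide by callable sites.
    return pi_reported * window_length / callable_sites


# Hypothetical window: 10 kb long, only 2,500 sites with callable genotypes.
adjusted = rescale_pi(pi_reported=0.001, window_length=10_000, callable_sites=2_500)
print(adjusted)
```

With these made-up numbers the adjusted value comes out at about 0.004, four times the reported one, because only a quarter of the window was actually callable.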
I hope that helps. This is unfortunately a very common problem and there isn't really a great solution for it other than encouraging better practices!
from pixy.
Thanks Kieran. That's what I was expecting, I suppose :). Glad to see at least that more people have been using pixy in publications over the last year. I will check out the mop program.