The BCFtools/mochatools plugin will infer which allele is the A allele and which is the B allele as long as at least one homozygous AA or one homozygous BB genotype is observed. At sites where all samples are heterozygous, this cannot be inferred; it is simply not possible. If you have enough samples in the VCF, this should not be a problem. Are you running the tool on a single-sample VCF? My advice is to go back to the organization that provided your dataset and ask them to do the right thing and give you the IDAT files (or CEL files if it is Affymetrix data).
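The rule described above can be sketched as follows. This is a minimal illustration, not the mochatools code. The function name `infer_ab`, the one-line-per-sample `GT BAF` input, and the output encoding (allele indices, 0 = REF, 1 = ALT) are assumptions. The sketch relies on BAF being near 0 for a sample homozygous for the A allele and near 1 for a sample homozygous for the B allele. Heterozygous samples sit near 0.5 regardless of which allele is which, so a site where every sample is heterozygous returns `NA`.

```bash
# Hypothetical helper, NOT the mochatools implementation: infer ALLELE_A and
# ALLELE_B at one biallelic site from the genotypes and BAF values of all samples.
# Input (stdin): one "GT BAF" pair per sample, e.g. "0/0 0.02".
# Output: "ALLELE_A ALLELE_B" as allele indices (assumed: 0 = REF, 1 = ALT),
#         or "NA" if no homozygous sample is observed.
infer_ab() {
  awk '
    # Homozygous REF: BAF near 0 means REF is allele A, near 1 means REF is B.
    $1 == "0/0" || $1 == "0|0" { score += $2;     n++ }
    # Homozygous ALT: BAF near 1 means ALT is allele B, i.e. REF is allele A.
    $1 == "1/1" || $1 == "1|1" { score += 1 - $2; n++ }
    # Heterozygous samples sit near BAF 0.5 either way and carry no information.
    END {
      if (n == 0) { print "NA"; exit }
      # score / n estimates how strongly REF behaves like the B allele.
      if (score / n < 0.5) print "0 1"; else print "1 0"
    }'
}

# One heterozygous sample plus one homozygous REF sample with low BAF
printf '0/1 0.51\n0/0 0.02\n' | infer_ab   # prints "0 1"
```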
from mocha.
I have tried running it with a single-sample VCF (I now understand why that would give an error) and then with a test VCF containing 10 samples. The error persists.
Is this simply because there are still not enough samples? I want to resolve the issue before we run the tool to add LRR and BAF to many different VCFs.
I will try to follow up with the organization, but I have had little success with them on this issue in the past.
With 10 samples in a VCF, for a very common variant with a minor allele frequency close to 0.5, there is still roughly a 1 in 1,000 chance that all samples will be heterozygous (each sample is heterozygous with probability about 0.5, and 0.5^10 ≈ 1/1,000). So for a few markers you may still be unable to infer which allele is ALLELE_A and which is ALLELE_B. To be safe, I think you need a VCF with at least ~30 samples from independent participants. Otherwise it is simply not possible to retrieve this information. Remember that the root of the issue is that the organization that provided your dataset threw that information away. This is not a limitation of MoChA.
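These figures follow from a simple calculation. The calculation assumes Hardy-Weinberg equilibrium and unrelated samples. Under those assumptions, a marker with allele frequency p is heterozygous in a given sample with probability 2p(1 - p), so all n samples are heterozygous with probability (2p(1 - p))^n. This is largest at p = 0.5, where it equals 0.5^n: 1/1,024 for n = 10 and about 9.3e-10 for n = 30. The helper name `p_all_het` is hypothetical.

```bash
# Probability that all n samples are heterozygous at a biallelic marker with
# allele frequency p, assuming Hardy-Weinberg equilibrium and unrelated
# samples: (2p(1-p))^n. The worst case is p = 0.5.
# Usage: p_all_het P N
p_all_het() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.6g\n", (2 * p * (1 - p)) ^ n }'
}

# Worst case (p = 0.5) for a few sample sizes
for n in 10 20 30; do
  echo "n=$n P(all het)=$(p_all_het 0.5 "$n")"
done
```

Multiplying this probability by the number of markers with allele frequency near 0.5 gives a rough expected count of markers whose A and B alleles cannot be inferred.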
Hello, Figure 1 shows the VCF obtained by converting a .gtc file (itself generated from .idat files), which differs from the basic VCF format. Could you tell me how to add the ALLELE_A/ALLELE_B/GC/LRR/BAF fields mentioned in Figure 2?
BCFtools/gtc2vcf can automatically add ALLELE_A/ALLELE_B/GC/LRR/BAF when you convert .gtc files. I am not sure what you mean by "basic VCF format". One thing is certain: if a VCF does not have LRR/BAF information, there is no way to "add" this information.
Hello, sorry to bother you again. I have another problem: when I run the SHAPEIT step, it reports that there is no AC field. My VCF file was converted from GTC files. How can I fix this?
Unlike SHAPEIT4, SHAPEIT5 requires the INFO/AC and INFO/AN fields to be filled. You can quickly fill them with either of the following BCFtools commands:
```bash
bcftools view -c 0
bcftools +fill-AN-AC
```
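For reference, AN is the number of called alleles at a site and AC is the number of ALT alleles among them. The bcftools commands above compute these from the GT field. The sketch below only illustrates the arithmetic for a single biallelic site and is not a replacement for those commands. The `ac_an` helper and its one-GT-per-line input are hypothetical.

```bash
# Illustration only: INFO/AN and INFO/AC for one biallelic site.
# AN = number of called (non-missing) alleles, AC = number of ALT alleles.
# Input (stdin): one GT string per sample, e.g. "0/1", "1|1", "./.".
ac_an() {
  awk '{
    k = split($1, allele, "[/|]")     # split on "/" (unphased) or "|" (phased)
    for (i = 1; i <= k; i++) {
      if (allele[i] == ".") continue  # missing allele: not counted in AN
      an++
      if (allele[i] != "0") ac++      # non-REF allele counts toward AC
    }
  }
  END { print ac + 0, an + 0 }'
}

printf '0/1\n1/1\n./.\n0|0\n' | ac_an   # prints "3 6"
```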
Thank you. It sounds like SHAPEIT5 is a bit more complicated than SHAPEIT4. I've tried many methods found online to compile SHAPEIT4, but none of them succeeded. Could you provide a precompiled SHAPEIT4 binary?
SHAPEIT4 and phase_common from SHAPEIT5 are identical, except that phase_common requires the AC and AN fields and has the advantage of handling trios. You can find binaries for SHAPEIT5 here. In the past, to generate binaries for SHAPEIT4, I used the following Dockerfile:
```dockerfile
# The Boost 1.74 runtime packages pinned below are available in Debian bookworm
FROM debian:bookworm-slim
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get -qqy update --fix-missing && \
    apt-get -qqy install --no-install-recommends \
        wget \
        g++ \
        make \
        libboost-iostreams-dev \
        libboost-program-options-dev \
        libhts-dev \
        libbz2-dev \
        libssl-dev \
        libboost-iostreams1.74.0 \
        libboost-program-options1.74.0 \
        bcftools && \
    wget --no-check-certificate https://github.com/odelaneau/shapeit4/archive/v4.2.2.tar.gz && \
    tar xzf v4.2.2.tar.gz && \
    cd shapeit4-4.2.2 && \
    sed -i 's/^HTSLIB_INC=\$(HOME)\/Tools\/htslib-1.11$/HTSLIB_INC=-Ihtslib/' makefile && \
    sed -i 's/^HTSLIB_LIB=\$(HOME)\/Tools\/htslib-1.11\/libhts.a$/HTSLIB_LIB=-lhts/' makefile && \
    sed -i 's/^BOOST_LIB_IO=\/usr\/lib\/x86_64-linux-gnu\/libboost_iostreams.a$/BOOST_LIB_IO=-lboost_iostreams/' makefile && \
    sed -i 's/^BOOST_LIB_PO=\/usr\/lib\/x86_64-linux-gnu\/libboost_program_options.a$/BOOST_LIB_PO=-lboost_program_options/' makefile && \
    make && \
    mv bin/shapeit4.2 /usr/bin/ && \
    cd .. && \
    apt-get -qqy purge --auto-remove --option APT::AutoRemove::RecommendsImportant=false \
        wget \
        g++ \
        make \
        libboost-iostreams-dev \
        libboost-program-options-dev \
        libhts-dev \
        libbz2-dev \
        libssl-dev && \
    apt-get -qqy clean && \
    rm -rf v4.2.2.tar.gz \
        shapeit4-4.2.2 \
        /var/lib/apt/lists/*
```
Hello, when I try to add ALLELE_A and ALLELE_B to a VCF file using the code above, I get the error: "Error: BAF format field is not present, cannot infer ALLELE_A or ALLELE_B"
The VCF files were genotyped and exported with the Axiom™ Analysis Suite.
Your VCF does not include intensity data, so it would be pointless to identify which allele is the A allele and which is the B allele. I would advise you to go back to the table output generated by Affymetrix Power Tools when you genotyped your samples and then use BCFtools/affy2vcf to generate a VCF with BAF, LRR, ALLELE_A, and ALLELE_B. Then you don't have to worry about file formatting issues.
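ALLELE_A/ALLELE_B inference depends on the FORMAT/BAF field, as the error message above indicates. A quick way to check whether a VCF carries intensity data at all is to look for BAF and LRR declarations in its header. The following is a minimal sketch, assuming the header text is piped in on stdin. The `has_format_field` helper is hypothetical. It reads the whole input instead of using `grep -q`, so an upstream `bcftools view -h` is not cut off early by SIGPIPE.

```bash
# Hypothetical helper: succeed (exit 0) if the VCF header on stdin declares the
# FORMAT field named in $1 (e.g. BAF or LRR), fail (exit 1) otherwise.
# The trailing comma in the pattern keeps BAF from matching e.g. BAFX.
has_format_field() {
  grep "^##FORMAT=<ID=$1," > /dev/null
}

# Usage (requires bcftools; input.vcf.gz is a placeholder name):
#   hdr=$(bcftools view -h input.vcf.gz)
#   for f in BAF LRR; do
#     printf '%s\n' "$hdr" | has_format_field "$f" || echo "missing FORMAT/$f"
#   done
```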