Comments (11)

freeseek commented on August 28, 2024

The BCFtools/mochatools plugin infers which allele is the A allele and which is the B allele as long as at least one homozygous AA or one homozygous BB genotype is observed at the site. Sites where every sample is heterozygous cannot be resolved; it is simply not possible to do so. If you have enough samples in the VCF, this should not be a problem. Are you running the tool on a single-sample VCF? My advice is to go back to the organization that provides your dataset and ask them to do the right thing and give you the IDAT files (or CEL files if it is Affymetrix data).
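To make the condition concrete, here is a minimal Python sketch of the check described above: a site is only informative if at least one called genotype is homozygous. This is not the plugin's actual code; the function name `has_homozygous_call` and the plain `GT` string input are assumptions made for illustration.

```python
def has_homozygous_call(genotypes):
    """Return True if at least one called diploid genotype is homozygous.

    `genotypes` is a list of VCF GT strings such as "0/1" or "1|1".
    Hypothetical helper for illustration, not part of BCFtools/mochatools.
    """
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        if alleles[0] == alleles[1]:
            return True
    return False


# One homozygous sample is enough to make the site informative.
print(has_homozygous_call(["0/1", "0/1", "1/1"]))
# Every called sample is heterozygous, so A and B cannot be told apart.
print(has_homozygous_call(["0/1", "1|0", "./."]))
```

A single-sample VCF fails this check at every heterozygous site, which is why more samples help.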

from mocha.

AkshajD commented on August 28, 2024

I have tried running it both with a single-sample VCF (I now understand why that gives an error) and with a test VCF of 10 samples. The error persists.

Is this just a result of still not having a sufficient number of samples? Just want to resolve the issue before we run the algo to add LRR and BAF to tons of different VCFs.

Will try to follow up with the org but have had low success with them about this issue in the past.

freeseek commented on August 28, 2024

With 10 samples in a VCF, for a very common variant with minor allele frequency close to 0.5 there is still a ~1/1,000 chance that all samples will be heterozygous. So it is still possible that you will not be able to infer which one is ALLELE_A and which one is ALLELE_B for a few markers. To be safe, I think you need a VCF with at least ~30 samples from independent participants. Otherwise it is just not possible to retrieve this information. Remember that the root of the issue here is that the organization that provides your dataset tossed that information away. This is not a limitation of MoChA.
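The ~1/1,000 figure can be reproduced with a short calculation. The sketch below assumes Hardy-Weinberg proportions and independent samples, so the chance that one sample is heterozygous is 2p(1-p); the function name `prob_all_het` is made up for this example.

```python
def prob_all_het(maf, n_samples):
    """Probability that every one of n independent samples is heterozygous.

    Assumes Hardy-Weinberg proportions: P(het) = 2 * p * (1 - p).
    """
    het = 2.0 * maf * (1.0 - maf)
    return het ** n_samples


# Compare a single sample, 10 samples, and 30 samples at MAF = 0.5.
for n in (1, 10, 30):
    print(n, prob_all_het(0.5, n))
```

At a minor allele frequency of 0.5 each sample is heterozygous with probability 0.5, so 10 samples give 0.5^10 = 1/1024, while 30 samples bring the chance below one in a billion per marker.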

Tianwen-lab-star commented on August 28, 2024

Hello, Figure 1 shows the VCF obtained by converting a .gtc file (itself derived from an .idat file), which is different from the basic VCF format. Could you tell me how to add the ALLELE_A/ALLELE_B/GC/LRR/BAF fields shown in Figure 2?

freeseek commented on August 28, 2024

BCFtools/gtc2vcf can automatically add ALLELE_A/ALLELE_B/GC/LRR/BAF when you convert a .gtc file. I have no idea what you are referring to when you say basic VCF format. One thing is for sure: if a VCF does not have LRR/BAF information, then there is no way to "add" this information.

Tianwen-lab-star commented on August 28, 2024

Hello, sorry to bother you. I have another problem. When I run the SHAPEIT step, it says that there is no AC field. But my VCF file was converted from GTC files. How should I solve this?

freeseek commented on August 28, 2024

Unlike SHAPEIT4, SHAPEIT5 requires the AC and AN fields to be filled. You can quickly fill them with either of the following BCFtools commands:

bcftools view -c 0
bcftools +fill-AN-AC
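For reference, AC is the count of each ALT allele among called genotypes and AN is the total number of called alleles. The Python sketch below shows how the two values relate to the genotypes at one site; it is only an illustration of the definitions, not how BCFtools computes them, and the function name `count_ac_an` is an assumption.

```python
def count_ac_an(genotypes, n_alt=1):
    """Compute AC (per-ALT allele counts) and AN (called allele total).

    `genotypes` is a list of VCF GT strings such as "0/1" or "1|1".
    Hypothetical helper for illustration only.
    """
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles count toward neither AC nor AN
            an += 1
            index = int(allele)
            if index > 0:
                ac[index - 1] += 1  # allele 1 is the first ALT allele
    return ac, an


ac, an = count_ac_an(["0/1", "1/1", "./.", "0|0"])
print("AC =", ac, "AN =", an)
```

In practice the BCFtools commands above are the right tool; the sketch only shows what ends up in the INFO column.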

Tianwen-lab-star commented on August 28, 2024

Thank you. It sounds like SHAPEIT5 is a bit more complicated than SHAPEIT4. I have tried many methods found online to build SHAPEIT4, but none of them succeeded. Could you provide a SHAPEIT4 binary that has already been compiled?

freeseek commented on August 28, 2024

SHAPEIT4 and phase_common from SHAPEIT5 are identical other than the latter requiring the AC and AN fields, with the advantage that SHAPEIT5 can handle trios. You can find binaries for SHAPEIT5 here. In the past, to generate binaries for SHAPEIT4, I used the following Dockerfile:

FROM debian:testing-slim
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get -qqy update --fix-missing && \
    apt-get -qqy install --no-install-recommends \
                 wget \
                 g++ \
                 make \
                 libboost-iostreams-dev \
                 libboost-program-options-dev \
                 libhts-dev \
                 libbz2-dev \
                 libssl-dev \
                 libboost-iostreams1.74.0 \
                 libboost-program-options1.74.0 \
                 bcftools && \
    wget --no-check-certificate https://github.com/odelaneau/shapeit4/archive/v4.2.2.tar.gz && \
    tar xzf v4.2.2.tar.gz && \
    cd shapeit4-4.2.2 && \
    sed -i 's/^HTSLIB_INC=\$(HOME)\/Tools\/htslib-1.11$/HTSLIB_INC=-Ihtslib/' makefile && \
    sed -i 's/^HTSLIB_LIB=\$(HOME)\/Tools\/htslib-1.11\/libhts.a$/HTSLIB_LIB=-lhts/' makefile && \
    sed -i 's/^BOOST_LIB_IO=\/usr\/lib\/x86_64-linux-gnu\/libboost_iostreams.a$/BOOST_LIB_IO=-lboost_iostreams/' makefile && \
    sed -i 's/^BOOST_LIB_PO=\/usr\/lib\/x86_64-linux-gnu\/libboost_program_options.a$/BOOST_LIB_PO=-lboost_program_options/' makefile && \
    make && \
    mv bin/shapeit4.2 /usr/bin/ && \
    cd .. && \
    apt-get -qqy purge --auto-remove --option APT::AutoRemove::RecommendsImportant=false \
                 wget \
                 g++ \
                 make \
                 libboost-iostreams-dev \
                 libboost-program-options-dev \
                 libhts-dev \
                 libbz2-dev \
                 libssl-dev && \
    apt-get -qqy clean && \
    rm -rf v4.2.2.tar.gz \
           shapeit4-4.2.2 \
           /var/lib/apt/lists/*

Tianwen-lab-star commented on August 28, 2024

Hello, when I try to add ALLELE_A and ALLELE_B to a VCF file using the code in the attached screenshot, I get an error: "Error: BAF format field is not present, cannot infer ALLELE_A or ALLELE_B".
The VCF files were genotyped and exported by Axiom™ Analysis Suite.

freeseek commented on August 28, 2024

Your VCF does not include intensity data, so it would be pointless to identify which one is the A allele and which one is the B allele. I would advise you to go back to the table data generated by the Affymetrix Power Tools when you genotyped your samples and then use BCFtools/affy2vcf to generate a VCF with BAF, LRR, ALLELE_A, and ALLELE_B. Then you don't have to worry about file formatting issues.
