Comments (3)

kcleal commented on June 12, 2024

Hi @warthmann,

The documentation is not very clear, apologies. When you filter against a normal sample, the calls returned should be unique to the mutant line. In theory, this means all of those sites should be 0/0 in the normal. In practice, some sites may still have low support in the normal. When you supply a --sites file, the whole genome is analysed, not just the regions around the input sites. Any variant in the --sites file will make SV detection more sensitive around that site, but all SVs from the genome will still be reported, which is consistent with your output.
I think the behaviour you were expecting was a bit different: using dysgu only to re-genotype a file, rather than to do discovery as well. If you only want re-genotyping around the input sites, you can provide a bed file of regions to search using the --search option.
Hope that helps

from dysgu.

warthmann commented on June 12, 2024

Hello,
Thank you for this clarification.
Regarding the --search flag in dysgu run: I can parse the required positions from the vcf. When converting them into a .bed file, will I need to pad the positions in some way, or would you recommend using the same variant position as both BEGIN and END in the .bed? Alternatively, can I suggest including a feature similar to --all-sites for dysgu --filter? That is, when the flag is set, dysgu --filter would return a vcf that also contains entries for a list of specified samples, or for all samples for which a bam file is provided.
Thanks a lot


kcleal commented on June 12, 2024

That is a good suggestion, I can try and add that in the next release.

In the meantime, you will need to parse the dysgu vcf to extract the regions surrounding each breakpoint. I recommend using pysam for this. The padding should be similar to the insert size. Pseudocode would be along these lines:

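import pysam

insert = 200  # padding in bp; set this close to your library's insert size

vcf = pysam.VariantFile('dysgu.vcf')
with open('out.bed', 'w') as outbed:
    for v in vcf:
        # Region around the first breakpoint (start clamped at 0)
        outbed.write(f"{v.chrom}\t{max(0, v.pos - insert)}\t{v.pos + insert}\n")
        if v.info['SVTYPE'] != 'TRA':
            # Second breakpoint is on the same chromosome, at the record's END (v.stop in pysam)
            outbed.write(f"{v.chrom}\t{max(0, v.stop - insert)}\t{v.stop + insert}\n")
        else:
            # Translocation: second breakpoint is taken from the CHR2 / CHR2_POS INFO fields
            pos2 = v.info['CHR2_POS']
            outbed.write(f"{v.info['CHR2']}\t{max(0, pos2 - insert)}\t{pos2 + insert}\n")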

Dysgu will sort and merge intervals as needed.
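
If you want to preview what the padded regions look like once they are sorted and merged, a minimal sketch in plain Python is below. This is not dysgu's implementation; `padded_interval` and `merge_intervals` are hypothetical helper names used only for illustration. It assumes the usual BED convention of 0-based, half-open coordinates, under which using the same value for BEGIN and END describes an empty interval, so even an unpadded 1-based VCF position p is written as p-1 to p.

```python
def padded_interval(chrom, pos, pad, chrom_len=None):
    """Convert a 1-based VCF position into a padded 0-based, half-open BED interval."""
    start = max(0, pos - 1 - pad)   # clamp so the start never goes negative
    end = pos + pad
    if chrom_len is not None:
        end = min(end, chrom_len)   # optionally clamp to the chromosome length
    return chrom, start, end


def merge_intervals(intervals):
    """Sort (chrom, start, end) tuples and merge overlapping or touching ones."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Illustrative positions only, not taken from a real call set
regions = [
    padded_interval("chr1", 1000, 200),
    padded_interval("chr1", 1300, 200),
    padded_interval("chr2", 50, 200),
]
merged = merge_intervals(regions)
for chrom, start, end in merged:
    print(f"{chrom}\t{start}\t{end}")
```

With a pad of 200, the two chr1 breakpoints that are 300 bp apart collapse into a single region, which is why nearby breakpoints do not need to be de-duplicated by hand before writing the bed file.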
