
Comments (4)

kcleal commented on May 20, 2024

You might be better off with a small script. As a starting point you could try something like:

awk -F'\t' '/^#/ || $7 == "PASS"' input.vcf > output.vcf

Also, PROB is usually the best value to filter on, but a threshold of 0.95 is probably too stringent. If you want high precision, PROB >= 0.75 can work well without losing much sensitivity.
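A Python sketch of the same idea, extended with the PROB threshold suggested above. It assumes a tab-delimited VCF where PROB is stored either as a FORMAT key for the first sample (column 10) or as an INFO key; check the ##FORMAT/##INFO header lines of your own file. The function names and the script name `filter_prob.py` are hypothetical, and records without a usable PROB value are dropped.

```python
import sys
from typing import Iterable, Iterator, List, Optional


def get_field(fields: List[str], key: str) -> Optional[str]:
    """Return `key` from the first sample's FORMAT column, else from INFO (assumed layout)."""
    if len(fields) > 9:
        keys = fields[8].split(":")    # FORMAT keys, e.g. "GT:SU:PROB"
        values = fields[9].split(":")  # first sample's values
        if key in keys:
            idx = keys.index(key)
            if idx < len(values):
                return values[idx]
    # Fall back to INFO, e.g. "SVTYPE=DEL;PROB=0.8"
    for item in fields[7].split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def filter_vcf(lines: Iterable[str], min_prob: float = 0.75,
               require_pass: bool = True) -> Iterator[str]:
    """Yield header lines, plus records with FILTER == PASS and PROB >= min_prob."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if require_pass and fields[6] != "PASS":
            continue
        prob = get_field(fields, "PROB")
        if prob in (None, "", "."):
            continue  # drop records without a usable PROB value
        if float(prob) >= min_prob:
            yield line


if __name__ == "__main__":
    # Usage: python filter_prob.py < input.vcf > output.vcf
    sys.stdout.writelines(filter_vcf(sys.stdin, min_prob=0.75))
```

Passing `require_pass=False` keeps records regardless of the FILTER label and applies only the PROB threshold.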

from dysgu.

wenyuhaokikika commented on May 20, 2024

In addition to FILTER and PROB, are there other recommended filtering parameters and thresholds, such as SU, PE or SC?

kcleal commented on May 20, 2024

Hi @wenyuhaokikika, as of v1.5.0, there is a filtering command that makes filtering on PROB a bit easier:

dysgu filter --help
Usage: dysgu filter [OPTIONS] INPUT_VCF [NORMAL_BAMS]...

  Filter a vcf generated by dysgu. Unique SVs can be found in the input_vcf by
  supplying a --normal-vcf (single or multi-sample), and normal bam files.
  Bam/vcf samples with the same name as the input_vcf will be ignored

Options:
  --reference PATH              Reference for cram input files
  -o, --svs-out PATH            Output file, [default: stdout]
  -n, --normal-vcf PATH         Vcf file for normal sample, or panel of
                                normals. The SM tag of input bams is used to
                                ignore the input_vcf for multi-sample vcfs
  -p, --procs INTEGER RANGE     Reading threads for normal_bams  [default: 1;
                                1<=x<=8]
  -f, --support-fraction FLOAT  Minimum threshold support fraction / coverage
                                (SU/COV)  [default: 0.1]
  --target-sample TEXT          If input_vcf if multi-sample, use target-
                                sample as input
  --keep-all                    All SVs classified as normal will be kept in
                                the output, labelled as filter=normal
  --ignore-read-groups          Ignore ReadGroup RG tags when parsing sample
                                names. Filenames will be used instead
  --min-prob FLOAT              Remove SVs with PROB value < min-prob
                                [default: 0.1]
  --pass-prob FLOAT             Re-label SVs as PASS if PROB value >= pass-
                                prob  [default: 1.0]
  --interval-size INTEGER       Interval size for searching normal-vcf/normal-
                                bams  [default: 1000]
  --random-bam-sample INTEGER   Choose N random normal-bams to search. Use -1
                                to ignore  [default: -1]
  --help                        Show this message and exit.

See the --min-prob option. You can also filter with --support-fraction, which removes SVs whose support (SU) is low as a fraction of local coverage (COV). For example, to keep only high-quality SVs you could try:

dysgu filter --min-prob 0.5 --support-fraction 0.2 input.vcf > output.vcf
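A rough Python sketch of what `--min-prob 0.5 --support-fraction 0.2` checks for each record, based only on the help text above (PROB >= min-prob and SU/COV >= support fraction). It is not dysgu's implementation: reading the first sample's FORMAT values in preference to INFO, treating zero coverage as a failure, and the helper names `parse_record`/`keep_sv` are all assumptions. dysgu filter also does other work (for example the normal-VCF/BAM comparison) that is not modelled here.

```python
from typing import Dict


def parse_record(line: str) -> Dict[str, str]:
    """Merge INFO key=value pairs with the first sample's FORMAT values (FORMAT wins)."""
    fields = line.rstrip("\n").split("\t")
    values: Dict[str, str] = {}
    for item in fields[7].split(";"):
        name, _, value = item.partition("=")
        values[name] = value
    if len(fields) > 9:
        # Assumption: column 9 holds FORMAT keys, column 10 the first sample.
        values.update(zip(fields[8].split(":"), fields[9].split(":")))
    return values


def keep_sv(line: str, min_prob: float = 0.5, support_fraction: float = 0.2) -> bool:
    """Return True if PROB >= min_prob and SU / COV >= support_fraction."""
    rec = parse_record(line)
    try:
        prob = float(rec["PROB"])
        su = float(rec["SU"])
        cov = float(rec["COV"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric values: drop the record
    if prob < min_prob:
        return False
    if cov <= 0:
        return False  # assumption: zero local coverage fails the support test
    return su / cov >= support_fraction
```

For a whole file, keep header lines and apply the check to the rest, e.g. `[l for l in lines if l.startswith("#") or keep_sv(l)]`.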

wenyuhaokikika commented on May 20, 2024

Thank you, I got it!