Comments (4)
You might be better off with a small script. As a starting point, you could try something like:
awk -F'\t' '/^#/ || $7 == "PASS"' input.vcf > output.vcf
Also, the PROB column is usually the best to filter on, but 0.95 is probably too stringent. If you want high precision, PROB >= 0.75 can work well without destroying sensitivity.
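For filtering on PROB rather than FILTER, a small Python script along these lines might do. This is only a sketch: it assumes PROB is stored as a per-sample FORMAT key and reads the first sample column only, and the function names (`sample_fields`, `filter_by_prob`) are made up for illustration, not part of dysgu.

```python
def sample_fields(columns):
    """Map FORMAT keys to the first sample's values for one VCF record."""
    if len(columns) < 10:
        return {}
    return dict(zip(columns[8].split(":"), columns[9].split(":")))


def filter_by_prob(lines, min_prob=0.75):
    """Yield header lines plus records whose PROB is >= min_prob.

    Assumption: PROB is a FORMAT key. Records where it is missing or not
    numeric are dropped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        try:
            prob = float(sample_fields(columns).get("PROB", "nan"))
        except ValueError:
            continue
        if prob >= min_prob:
            yield line
```

To use it on a file, something like `sys.stdout.writelines(filter_by_prob(open("input.vcf")))` should work, with `input.vcf` standing in for your own file.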
In addition to FILTER and PROB, are there other recommended filtering parameters and thresholds, such as SU, PE, or SC?
Hi @wenyuhaokikika, as of v1.5.0, there is a filtering command that makes filtering on PROB a bit easier:
dysgu filter --help
Usage: dysgu filter [OPTIONS] INPUT_VCF [NORMAL_BAMS]...

  Filter a vcf generated by dysgu. Unique SVs can be found in the input_vcf by
  supplying a --normal-vcf (single or multi-sample), and normal bam files.
  Bam/vcf samples with the same name as the input_vcf will be ignored

Options:
  --reference PATH              Reference for cram input files
  -o, --svs-out PATH            Output file, [default: stdout]
  -n, --normal-vcf PATH         Vcf file for normal sample, or panel of
                                normals. The SM tag of input bams is used to
                                ignore the input_vcf for multi-sample vcfs
  -p, --procs INTEGER RANGE     Reading threads for normal_bams [default: 1;
                                1<=x<=8]
  -f, --support-fraction FLOAT  Minimum threshold support fraction / coverage
                                (SU/COV) [default: 0.1]
  --target-sample TEXT          If input_vcf if multi-sample, use target-
                                sample as input
  --keep-all                    All SVs classified as normal will be kept in
                                the output, labelled as filter=normal
  --ignore-read-groups          Ignore ReadGroup RG tags when parsing sample
                                names. Filenames will be used instead
  --min-prob FLOAT              Remove SVs with PROB value < min-prob
                                [default: 0.1]
  --pass-prob FLOAT             Re-label SVs as PASS if PROB value >= pass-
                                prob [default: 1.0]
  --interval-size INTEGER       Interval size for searching normal-vcf/normal-
                                bams [default: 1000]
  --random-bam-sample INTEGER   Choose N random normal-bams to search. Use -1
                                to ignore [default: -1]
  --help                        Show this message and exit.
See the --min-prob option. Also, with this command you can filter on --support-fraction, which removes SVs with low support as a fraction of local coverage. For example, if you wanted to keep only high-quality SVs, you could try:
dysgu filter --min-prob 0.5 --support-fraction 0.2 input.vcf > output.vcf
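To make the two thresholds concrete, here is a rough Python sketch of the checks described in the help text (PROB < min-prob, and SU/COV below the support fraction). It is not dysgu's implementation. It assumes SU, COV and PROB are all FORMAT keys of the sample, and the zero-coverage handling is my own guess. The function name `passes_thresholds` is hypothetical.

```python
def passes_thresholds(format_keys, sample_values, min_prob=0.5, support_fraction=0.2):
    """Apply a PROB cutoff and an SU/COV cutoff to one sample's FORMAT values.

    format_keys and sample_values are the colon-separated FORMAT and sample
    columns of a VCF record. Assumption: SU, COV and PROB are all present.
    """
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    prob = float(fields["PROB"])
    support = float(fields["SU"])
    coverage = float(fields["COV"])
    if prob < min_prob:
        return False
    # Assumption: with zero local coverage the fraction is undefined,
    # so the record is not rejected on this test.
    if coverage > 0 and support / coverage < support_fraction:
        return False
    return True
```

For example, a call with SU=10 and COV=30 gives a support fraction of about 0.33, which is above the 0.2 cutoff, while SU=3 and COV=30 gives 0.1 and would be removed.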
Thank you, I got it!