vardict's Introduction

This is the Final Version of VarDict. No longer maintained.

VarDict

VarDict is an ultra sensitive variant caller for both single and paired sample variant calling from BAM files. VarDict implements several novel features such as amplicon bias aware variant calling from targeted sequencing experiments, rescue of long indels by realigning bwa soft clipped reads and better scalability than many Java based variant callers.

Because VarDict's philosophy is to call "everything", several downstream strategies have been developed to filter variants down to, for example, the most likely cancer-driving events. These strategies are based on evidence in different databases and/or quality metrics. http://bcb.io/2016/04/04/vardict-filtering/ provides an overview of how to develop further filters for VarDict. The script at https://github.com/AstraZeneca-NGS/VarDict/blob/master/vcf2txt.pl can be used to put the variants into context by including information from dbSNP, COSMIC and ClinVar. We are open to suggestions from the community on how best to narrow down to the variants of most interest.

A Java-based drop-in replacement for vardict.pl is being developed at https://github.com/AstraZeneca-NGS/VarDictJava. The Java implementation is approximately 10 times faster than the original Perl implementation and does not depend on samtools.

To enable amplicon-aware variant calling (single sample mode only; not supported in paired variant calling), make sure the bed file has 8 columns, with the 7th and 8th columns containing the insert interval (which is therefore a subset of the interval given by the 2nd and 3rd columns). Such bed files typically look like the two overlapping intervals below:

chr1 115247094 115247253 NRAS 0 . 115247117 115247232

chr1 115247202 115247341 NRAS 0 . 115247224 115247323

For more information on amplicon aware calling please see https://github.com/AstraZeneca-NGS/VarDict/wiki/Amplicon-Mode-in-VarDict
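
As a rough illustration of the 8-column layout described above, the following Python sketch parses one line, checks that the insert interval (columns 7 and 8) lies within the amplicon interval (columns 2 and 3), and reports how many bases flank the insert on each side. The class and function names are hypothetical and are not part of VarDict; reading the flanks as primer sequence is an assumption.

```python
from typing import NamedTuple, Tuple


class Amplicon(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    insert_start: int
    insert_end: int


def parse_amplicon_line(line: str) -> Amplicon:
    """Parse one whitespace-delimited line of an 8-column amplicon bed file."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    chrom, start, end, name, _score, _strand, ins_start, ins_end = fields
    return Amplicon(chrom, int(start), int(end), name, int(ins_start), int(ins_end))


def insert_is_inside(amp: Amplicon) -> bool:
    """True when columns 7-8 form a non-empty interval within columns 2-3."""
    return amp.start <= amp.insert_start < amp.insert_end <= amp.end


def flank_lengths(amp: Amplicon) -> Tuple[int, int]:
    """Number of bases between the amplicon edges and the insert (left, right)."""
    return amp.insert_start - amp.start, amp.end - amp.insert_end


for line in (
    "chr1 115247094 115247253 NRAS 0 . 115247117 115247232",
    "chr1 115247202 115247341 NRAS 0 . 115247224 115247323",
):
    amp = parse_amplicon_line(line)
    print(amp.name, insert_is_inside(amp), flank_lengths(amp))
```

A line whose 7th or 8th column falls outside the 2nd to 3rd column interval makes `insert_is_inside` return False, which is a quick way to spot rows that do not follow the layout.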

VarDict is fully integrated in e.g. bcbio-nextgen, see https://github.com/chapmanb/bcbio-nextgen

Please cite VarDict:

Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, Johnson J, Dougherty B, Barrett JC, and Dry JR. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 2016, pii: gkw227.

The article can be accessed at: http://nar.oxfordjournals.org/cgi/content/full/gkw227?ijkey=Tk8eKQcYwNlQRNU&keytype=ref

Coded by Zhongwu Lai 2014.

Requirements

  • Perl (uses /usr/bin/env perl)
  • R (uses /usr/bin/env R)
  • samtools (must be in path, not required if using the Java implementation in place of vardict.pl)

Quick start

Make sure the VarDict folder (scripts vardict.pl, vardict, testsomatic.R, teststrandbias.R, var2vcf_valid.pl and var2vcf_paired.pl) is in your PATH before running the following commands.

  • Running in single sample mode:

    AF_THR="0.01" # minimum allele frequency
    vardict -G /path/to/hg19.fa -f $AF_THR -N sample_name -b /path/to/my.bam -c 1 -S 2 -E 3 -g 4 /path/to/my.bed | teststrandbias.R | var2vcf_valid.pl -N sample_name -E -f $AF_THR
    
  • Paired variant calling:

    AF_THR="0.01" # minimum allele frequency
    vardict -G /path/to/hg19.fa -f $AF_THR -N tumor_sample_name -b "/path/to/tumor.bam|/path/to/normal.bam" -c 1 -S 2 -E 3 -g 4 /path/to/my.bed | testsomatic.R | var2vcf_paired.pl -N "tumor_sample_name|normal_sample_name" -f $AF_THR
    

Contributors

License

The code is freely available under the MIT license.

vardict's People

Contributors

almiheenko, bioinfo, cbrueffer, chapmanb, clintval, gitshadowhub, jdagilliland, mjafin, multimeric, nh13, pcingola, polinabevad, popucui, tfenne, vladsavelyev, zhongwulai

vardict's Issues

Wrong depth reported in vcf file

Hi!

I'm interested in using your variant caller in my pipeline and have already done a test run on targeted resequencing data using the amplicon-based Illumina TruSight Myeloid Panel. I've compared the results with those of FreeBayes and noticed something strange: for the same variant call at a given position, FreeBayes reports a much higher depth (DP) than VarDict (e.g. 6000 compared to 40). I've checked the per-base coverage report and FreeBayes is accurate. Also, the QUAL for all bases is in a very narrow range, with ~95% of the variants falling between 30 and 37. These same variants have a high quality score with FreeBayes. Are there filtering steps carried out by VarDict that cause me to lose some reads?

My command line call for VarDict:
vardict -G ref.fa -N sample_name -f 0.01 -b sample.bam -c 1 -S 2 -E 3 -g 4 -Q 20 -a -F 0 targets.bed | teststrandbias.R | var2vcf_valid.pl -N sample_name -E -f 0.01

I'm using BWA for mapping and then performing a base quality recalibration and local realignment using GATK before doing the variant calling. The sequencing was done on an Illumina NextSeq in paired-end mode 2x150 bp.

Thanks for your help.

Deletion with AF>1

Ran into a variant with DP=7; VD=8:

20      62919376        .       TAAG    T       111     PASS    
TYPE=Deletion;DP=7;VD=8;AF=1.1429;BIAS=0:0;REFBIAS=0:0;VARBIAS=8:0;PMEAN=31.3;
PSTD=1;QUAL=37;QSTD=1;SBF=1;ODDRATIO=0;MQ=21.3;SN=16;HIAF=1;ADJAF=0.1429;
SHIFT3=0;MSI=3;MSILEN=3;NM=1.1;HICNT=8;HICOV=8;LSEQ=TAATTTTTAATTGAGTGGAA;
RSEQ=AATAATATTGATAAAAGTAG;dgv=CopyNumber 
GT:DP:VD:AD:AF:RD:ALD   1/1:7:8:0,8:1.1429:0,0:8,0

Has anyone seen this before? Is there a realistic reason it can happen, or is it a bug? I'm trying to avoid digging into the Perl.
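
The reported AF is consistent with VD divided by DP (8 / 7 ≈ 1.1429), so the value above 1 follows from VD exceeding DP rather than from rounding. A small Python sketch, not part of VarDict, that flags such records from the INFO column; the helper names are hypothetical.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a key -> value dict (bare flags map to True)."""
    out = {}
    for item in info.strip().split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def depth_exceeded(info: str) -> bool:
    """True when the variant depth (VD) is larger than the total depth (DP)."""
    fields = parse_info(info)
    return int(fields["VD"]) > int(fields["DP"])


info = "TYPE=Deletion;DP=7;VD=8;AF=1.1429"
fields = parse_info(info)
print(round(int(fields["VD"]) / int(fields["DP"]), 4))  # 1.1429
print(depth_exceeded(info))  # True
```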

VCF quality measurements: Difference between MQ and QUAL

Hi

I am not sure what the difference between the two fields MQ and QUAL is in the vcf output of VarDict as the explanations are very similar:

##FORMAT=<ID=MQ,Number=1,Type=Float,Description="Mean Mapping Quality">
##FORMAT=<ID=QUAL,Number=1,Type=Float,Description="Mean quality score in reads">

If QUAL does not refer to mapping quality, what does it refer to?

Thanks for any explanations!

Update:
Also, for other tools (varscan, freebayes), the field MQ is used to display the RMS Mapping quality. Is this also the case for vardict?
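
The header describes MQ as a mean, which is a different statistic from a root-mean-square (RMS) value. The sketch below only illustrates the numerical difference between the two; the quality values are made up, and it says nothing about how VarDict computes MQ internally.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def rms(values):
    """Root mean square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


mapqs = [60, 60, 20]  # hypothetical mapping qualities
print(round(mean(mapqs), 2), round(rms(mapqs), 2))
```

The two values coincide only when all reads have the same quality; otherwise the RMS is the larger one.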

TruSeq Custom Amplicon bed file

I'm interested in using VarDict with samples run using a TruSeq custom amplicon (TSCA) panel. I see that amplicon-aware mode uses an 8-column bed file. The trouble I'm having is generating the necessary bed file from the provided design files. Has anyone used TSCA design files to generate the necessary bed file for VarDict?
Thanks,
Bob

amplicons.bed

chr9	340100	340375	chr9:340100:340375:116489980	275	+
chr9	340658	340888	chr9:340658:340888:116489981	230	+
chrX	100615078	100615353	chrX:100615078:100615353:116489982	275	+
chr11	108160167	108160398	chr11:108160167:108160398:116489983	231	+

targets.bed

chr10	14978536	14978592	chr10:14978536:14978592:DCLRE1C.FiveUtrExon.chr10.14978537.14978592:UserDefined	56	+
chr10	14987103	14987188	chr10:14987103:14987188:DCLRE1C.FiveUtrExon.chr10.14987104.14987188:UserDefined	85	+
chrX	100630131	100630302	chrX:100630131:100630302:BTK.FiveUtrExon.chrX.100630132.100630302:UserDefined	171	+
chr10	14984112	14984434	chr10:14984112:14984434:DCLRE1C.FiveUtrExon.chr10.14984113.14984434:UserDefined	322	+
chr10	14981808	14981868	chr10:14981808:14981868:DCLRE1C.FiveUtrExon.chr10.14981809.14981868:UserDefined	60	+
chr8	61653817	61655656	chr8:61653817:61655656:CHD7.FiveUtrExon.chr8.61653818.61655656:UserDefined	1839	+

manifest file (image not preserved)

3 column bed file

Java VarDict gives an immediate java.lang.ArrayIndexOutOfBoundsException when I try a 3-column bed file. The bed file looks like (tab delimited)
chr1 65509 65625
chr1 65831 65973
chr1 69481 69600
chr1 721381 721519
chr1 721530 721806
...

Information from an individual line works with the -R option.
An 8-column bed file does work with columns 4,5 and 6 empty like
chr1 65509 65625 65509 65625
...

Does this suggest that "amplicon mode" is always on? I have tried the -z option and that hasn't changed this behavior. Where have I gone wrong?
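
The Quick start commands pass -g 4, which points VarDict at a fourth column for the region name, and a 3-column file has no such column. Whether that is the cause of the exception here is an assumption, not something confirmed in this thread. One workaround to try is adding a name column before calling; the helper below is a hypothetical sketch and the chrom:start-end naming scheme is arbitrary.

```python
def add_name_column(lines):
    """Yield tab-delimited 4-column bed lines (chrom, start, end, name)."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and common bed header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield f"{chrom}\t{start}\t{end}\t{chrom}:{start}-{end}"


three_col = ["chr1\t65509\t65625", "chr1\t65831\t65973"]
for row in add_name_column(three_col):
    print(row)
```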

Can't locate Stat/Basic.pm in @INC

When I run vardict2mut.pl, it reports "Can't locate Stat/Basic.pm in @INC".
But I can't find the module Stat::Basic in the cpan repository.
How can I install the module Stat::Basic?

Proper format for bed file in amplicon mode

My understanding is that the region start and end positions are for the Amplicon, inclusive of primer regions. And the segment start and end correspond to the targeted region excluding the primer region. Is this correct? I have not seen it documented.

Reported variant sequence is not consistent

I generated a VCF file using the vardict.pl script downloaded on Jul 15, 2016. I notice that in many instances the reported variant is inconsistent with the reference. For example, at chr6:18258254-18258255, GA was reported as REF, but the FASTA file contains AA at this location. I used the same FASTA files that were used for the alignment. Below is the whole record. Could you please help me with this?

chr6 18258255 . GA G 198 PASS SAMPLE=NS-16-04_BC09_HorizonFFPE;TYPE=Deletion;DP=678;VD=53;AF=0.0782;BIAS=2:2;REFBIAS=325:300;VARBIAS=26:27;PMEAN=47.2;PSTD=1;QUAL=34.6;QSTD=1;SBF=0.77494;ODDRATIO=1.1248;MQ=60;SN=52;HIAF=0.0799;ADJAF=0.0015;SHIFT3=6;MSI=7;MSILEN=1;NM=0.3;HICNT=52;HICOV=651;LSEQ=TTTCTTCTTACTTAGAAAAA;RSEQ=AAAAAATGTATCCTCTCAAT GT:DP:VD:AD:AF:RD:ALD 0/1:678:53:625,53:0.0782:325,300:26,27

Error Message when using VarDict

Hi,

I ran VarDict on my bam files and it threw me this error:

Use of uninitialized value $sample in concatenation (.) or string at /home/ttnpham/vardict/VarDict-master/var2vcf_somatic.pl line 35.
Error: Incorrect input detected in teststrandbias.R
Execution halted

My bam files were aligned using BWA-MEM and were also locally realigned and base-quality recalibrated before being input into VarDict.

The command I used is below

AF_THR=0.01 vardict -G ucsc.hg19.with_decoy.fasta -f $AF_THR -N case1_tumour -b case1_tumour.bam -c 1 -S 2 -E 3 -g 4 -Q 15 -q 15 TruSight_One_v1.1.bed | teststrandbias.R | var2vcf_valid.pl -N case1_tumour -E -f $AF_THR > case1_tumour.vcf

bam format

Hello,

I am pretty interested in this software. I installed the code and successfully ran your sample code. However when I switched to my sample bam file, the error messages were raised:

Use of uninitialized value in join or string at vardict.pl line 2365.
Use of uninitialized value in join or string at vardict.pl line 2367.

The process failed. I used the samtools to check the difference between your sample bam file and my sample bam file. I found my bam file contains more fields than yours. please see the below from my sample bam file:

HWI-ST1377:449:HVKW2ADXX:2:2203:3926:4692 81 17 7580123 60 101M = 7579813 -411 CACTCCTGCCCCACCCCTCACCAGCCATGCACTTCTTTGAGGAAAAGACAATCAGAGAGGGACTTCCAACCTTCCCACCACTAAATCCCCAAGACTTCCTA C?@AEGDGGGGCAGGGFCFAGGGEGGCDFHCGEDFDDEEDCECCCEECEBCBFDDDEDBBEBECBCEBBCDCBCBFACEAEBBBBACDCGBEFBGDBCA>A MC:Z:101M BD:Z:KPMMPSRQKKPKNJJMMMKNOPOPOOOPPJOLKLLCLMKLLKDDJKOJKLNLOKKKKLILOOLKLNKKMLLKLINJMNJONKDLNLIINKKLPPNMOMOLL MD:Z:101 PG:Z:MarkDuplicates RG:Z:sample1 BI:Z:PTQQTTUSNNSQRMMQQSPRRSTRRSRTTPSPOOPHPRMPOPHHNOPNROPRROMOMPLOPSPOPQRPQPPOPKPNPQNSQPGNOOKKPRNOPSQPRQSPP NM:i:0 MQ:i:60 OQ:Z:DC@?DDBDDDD@;DDDBCA=DDECFDFDFFEFHEHJJHHIIIJJIJIIJJGGIIGGIJJJIGGJIJIHGJIHGJIHFJIHJIIIIGJJHHHHHFFFFFCBB AS:i:101 XS:i:20

Does this bam format cause the program to fail?

Thanks.
Marissa

can not be run in background

When I run vardict as a background command, such as

vardict -G /path/to/hg19.fa -f $AF_THR -N sample_name -b /path/to/my.bam -c 1 -S 2 -E 3 -g 4 /path/to/my.bed &

the command finishes immediately with no output. Is something wrong?

Running Vardict on large chromosomal chunks - May not be error

I would like to run VarDict on total RNA data, allowing variants to be called anywhere they are observed along the chromosome. I have tried breaking the chromosome into 6 segments (by bed) and running VarDict on each segment. I've found that VarDict will run for many hours and will often return an empty variant file.

My question is, can Vardict java be run on such large segments or is this issue expected? If it is expected, what is your suggestion for running on total RNA data?

Thanks in advance!

no outcome

Hello, when I run the sample command, no output file is produced and only the usage is printed.
What's the problem?
I'm looking forward to your reply.

Error in fisher.test

I've received two errors on a pair of tumor-normal data set (hg19 reference). The ".var" file was generated with the following command:

vardict -G hg19.fa -f 0.01 -h -b 'tumor.bam|normal.bam' -z -F -c 1 -S 2 -E 3 -g 4 region.23.bed > region.23.var

The error occurred here: cat region.24.var | awk 'NR!=1' | testsomatic.R > region.24.testsomatic

Error in fisher.test(matrix(c(d[i, 10], d[i, 11], d[i, 12], d[i, 13]), :
all entries of 'x' must be nonnegative and finite
Execution halted

Have you guys run into these before?

Thanks.
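
R's fisher.test rejects a table that contains negative or non-finite counts, and the error message shows the table being built from columns 10 to 13 of the input. A hedged way to locate the offending rows before the R step is to scan those columns. The sketch below is not part of VarDict and assumes tab-delimited input, with 1-based column numbers matching the R indices.

```python
import math

COUNT_COLUMNS = (10, 11, 12, 13)  # 1-based, as in d[i, 10] ... d[i, 13]


def bad_rows(lines, columns=COUNT_COLUMNS):
    """Return (line_number, values) for rows whose count columns are
    missing, non-numeric, negative or non-finite."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        values = [fields[c - 1] if c <= len(fields) else "" for c in columns]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            problems.append((number, values))
            continue
        if any(n < 0 or not math.isfinite(n) for n in numbers):
            problems.append((number, values))
    return problems
```

It could be applied to the lines of the .var file (after dropping the header line, as in the awk step above) to print the row numbers that fisher.test would refuse.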

ROC generation

I successfully ran VarDict using bcbio-nextgen for the ICGC medulloblastoma dataset. I'd like to create an ROC curve to compare it with other tools. Is there a single score that controls recall/precision trade-off? I've tried SOR, SSF, and QUAL, but none of these produced a pretty ROC curve.

Wrong genotype reported in VCF file

In the var2vcf scripts, the frequency thresholds for the frequency filter and for the genotype get mixed when determining the genotype. The line should read
my $gt = (1-$af < $GTFreq) ? "1/1" : ($af >= 0.5 ? "1/0" : ($af >= $GTFreq ? "0/1" : "0/0"));
and similarly for my $gtm = ....

In var2vcf_valid.pl, s and o are mixed up in the getopts line. It should read
getopts('htaHSCEP:d:v:f:p:q:F:Q:o:N:m:I:c:r:O:X:k:V:M:') || Usage();.

Thanks! Best, Malte
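
For readers who do not read Perl, the nested ternary proposed above can be written out as follows. This is a Python transcription of that one expression only, with `gt_freq` standing in for `$GTFreq`; the threshold used in the example call is an arbitrary illustration, not a documented default.

```python
def genotype(af: float, gt_freq: float) -> str:
    """Transcription of the proposed Perl expression for the GT value."""
    if 1 - af < gt_freq:
        return "1/1"
    if af >= 0.5:
        return "1/0"
    if af >= gt_freq:
        return "0/1"
    return "0/0"


for af in (0.95, 0.6, 0.3, 0.05):
    print(af, genotype(af, gt_freq=0.2))
```

Under this expression, 1/0 marks a heterozygous call with an allele frequency of at least 0.5 and 0/1 one below 0.5.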

Complex variants clarification

Hello,
I'm missing something with the Complex variants: what exactly is the change for the following variant? The only thing I'm sure of is that the reference C at position 58017753 changes to an A, but then I get lost with the notation.

19 58017753 . CAGCTG AGAGGCTC 188 PASS LSEQ=GCAAGGAGCCAAGGCTGAGG;MSI=3.000;MSILEN=1;RSEQ=CTGAGCAGAGTGCTTCTGTA;SAMPLE=mut;SHIFT3=0;SOR=Inf;SSF=0;STATUS=StrongSomatic;TYPE=Complex GT:AD:ADJAF:AF:ALD:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD 0/0:81,0:0:0:0,0:2,0:128:0.648:60:0.1:0:15.4:1:1:36.7:27,54:1:162:0 0/1:72,39:0.0708:0.3451:18,21:2,2:113:0.3451:60:0.1:1.94:24.5:1:1:35.6:22,50:0.14667:78:39

Another question is, is it possible to convert complex variants to individual simple changes? Separate rows with SNPs and the indels that define the complex variation.

Thank you in advance,

VarDict on WGS tumor-normal pair

Is there any driver script to easily run vardict on a human WGS tumor-normal pair?

Due to memory requirements, it is not possible to run it on the whole genome. Even splitting by chromosome is not enough. Based on other posts, the suggestion is to split the genome into 5 kb regions that overlap by 150 bp. However, merging the downstream calls must be done carefully to consolidate the variants within the overlapping regions.
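
The tiling described above can be sketched as follows: 5 kb windows in which each window starts 150 bp before the previous one ends. The function is hypothetical, uses 0-based half-open bed coordinates, and the chromosome length in the example is made up.

```python
def tile(chrom, length, window=5000, overlap=150):
    """Yield (chrom, start, end) windows covering [0, length) with the given overlap."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    start = 0
    while start < length:
        end = min(start + window, length)
        yield chrom, start, end
        if end == length:
            break
        start += step


for chrom, start, end in tile("chr22", 12000):
    # Fourth column is a region name, usable with -g 4.
    print(f"{chrom}\t{start}\t{end}\t{chrom}:{start}-{end}")
```

A variant that falls inside an overlap can be reported from both neighbouring windows, so the merge step still has to de-duplicate such calls.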

empty output for PE sequence

I'm attempting to implement Vardict as part of an analysis plan for the FDA's SEQ QC targeted sequencing work group. I'm getting an empty output from PE hybrid capture sequences when using VarDict-1.5.1. I used bwa for alignment with the -f P flag to ensure outputs are paired and mapped. I used BaseRecalibrator to tune the Q scores and then sort and index with samtools. The resulting files give good VCF results from mpileup, and mutect2 VCF.

The command line below runs for a second and exits with no error and no data in the output file.
VarDict -b sample4rep2_S8.sorted.corr.bam -G /NGS/REFS/genome.fa -N sample4rep2_S8 -f 0.01 -v -c 1 -S 2 -E 3 -th 8 /NGS/MIX2.bed > sample4rep2_S8.vardict.txt

I can't get the Java VarDict version to install to see if an alternative version works (gradle error involving an issue with slf4j logger).

As per one suggestion in this forum, I reduced the bed file down to 3 small regions that have known VAF of >20%. Still empty output.

I've also tried removing -th 8 flag, adding -g 4 flag. I'm running Ubuntu AWS server using 16 processors x 64 Gb mem.

option -o: signal to noise

Hi,

The var2vcf_paired.pl help indicates that the parameter "-o" controls "The minimum signal to noise, or the ratio of hi/(lo+0.5). Default to 1.5. Set it higher for deep sequencing." Why should increasing sequencing coverage change the ratio between good-quality and bad-quality reads?

Thank you in advance,

Tamara
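
Taken at face value, the formula quoted from the help text can be computed as below. The function name is hypothetical, and reading `hi` and `lo` as counts of high-quality and low-quality variant reads is an interpretation of the help text rather than something it states.

```python
def signal_to_noise(hi: int, lo: int) -> float:
    """hi / (lo + 0.5), as quoted from the help text."""
    return hi / (lo + 0.5)


print(signal_to_noise(8, 0))    # 16.0
print(signal_to_noise(80, 0))   # 160.0
print(round(signal_to_noise(80, 10), 2))
```

When `lo` is 0, the ratio is simply twice `hi`, so under this formula it grows with the number of supporting reads and a fixed threshold becomes easier to pass at higher depth.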

Reported genotypes 1/0 vs 0/1

I noticed that VarDict reports both 1/0 and 0/1 genotypes. What does that mean exactly? The VCF spec does not explicitly mention this notation (another blind spot).

Hard coded filters in VarDict remove true variant calls

I am trying to get to the bottom of why VarDict has some false negatives in my high-coverage amplicon samples. I noted that VarDict performs a number of filters which are documented in the VCF header:

##FILTER=<ID=AMPBIAS,Description="Indicate the variant has amplicon bias.">
##FILTER=<ID=Bias,Description="Strand Bias">
##FILTER=<ID=Cluster0bp,Description="Two variants are within 0 bp">
##FILTER=<ID=InGap,Description="The variant is in the deletion gap, thus likely false positive">
##FILTER=<ID=InIns,Description="The variant is adjacent to an insertion variant">
##FILTER=<ID=LongMSI,Description="The somatic variant is flanked by long A/T (>=14)">
##FILTER=<ID=MSI12,Description="Variant in MSI region with 12 non-monomer MSI or 13 monomer MSI">
##FILTER=<ID=NM4.25,Description="Mean mismatches in reads >= 4.25, thus likely false positive">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=Q10,Description="Mean Mapping Quality Below 10">
##FILTER=<ID=SN1.5,Description="Signal to Noise Less than 1.5">
##FILTER=<ID=d3,Description="Total Depth < 3">
##FILTER=<ID=f0.1,Description="Allele frequency < 0.1">
##FILTER=<ID=p8,Description="Mean Position in Reads Less than 8">
##FILTER=<ID=pSTD,Description="Position in Reads has STD of 0">
##FILTER=<ID=q25,Description="Mean Base Quality Below 25">
##FILTER=<ID=v2,Description="Var Depth < 2">

Unfortunately, only a fraction of them seem to be accessible through the command line. I am particularly interested in modifying Cluster0bp and NM4.25. I assume that VarDict is not aware of phasing. I suspect Cluster0bp to remove variants on the second allele which are neatly called by freebayes and clearly visible to the eye like this one:

chr7	117199644	.	ATCT	GTCT,A

I wonder why Cluster0bp was introduced in the first place. I cannot imagine that it made much of a difference when optimizing for TP/FP ratios in gold standard data sets. Could you shed some light on that?

Phasing issues with MNVs

Is phasing and MNV calling a known issue with Vardict? In my sample, a dinucleotide variant was missed by Vardict and called as two independent variants. The MNV was clear in the pileup. Bizarrely, in subsequent dilutions of the same sample, Vardict was able to correctly identify and report the MNV. It also reported the two, adjacent variants. I wanted to point out this inconsistency and see if anyone else was getting this bug (b/c of phasing?) and if this issue is being addressed.

Thanks,
Jenn

Deletion not detected by VarDict

Dear all,
I am using VarDict in a sample with known genotype. The sample was sequenced with an Illumina TruSight custom amplicon for BRCA1/2 on a MiSeq sequencer.
We know it has a 70 bp deletion in the BRCA1 gene, but VarDict does not detect it (while HaplotypeCaller does). I generated the 8-column bed file as suggested, starting from the Target section of the Illumina manifest; here are some lines as an example:

chr13   32890519        32890777        BRCA2   0       +       32890547        32890714
chr13   32893133        32893406        BRCA2   0       +       32893163        32893512
chr13   32893349        32893611        BRCA2   0       -       32893163        32893512
chr13   32899118        32899337        BRCA2   0       +       32899162        32899371
chr13   32899280        32899538        BRCA2   0       -       32899162        32899371
chr13   32900061        32900308        BRCA2   0       +       32900187        32900800
chr13   32900253        32900530        BRCA2   0       -       32900187        32900800
chr13   32900475        32900751        BRCA2   0       +       32900187        32900800
chr13   32900697        32900969        BRCA2   0       -       32900187        32900800
chr13   32903497        32903767        BRCA2   0       +       32903529        32903679

Then I used vardict as suggested in the manual page, here my line:

perl ../VarDict-master/vardict.pl -G hg19.fa -N 14-0101 -b 14-101.bam -f 0.01 -c 1 -S 2 -E 3 -g 4 -I 1000 file_for_vardic.sorted.bed | ../VarDict-master/teststrandbias.R | ../VarDict-master/var2vcf_valid.pl -N 14-0101 -E > 14-101_vardict.vcf

But the deletion is not present in the final VCF. I tried different values of -f but nothing changed. Moreover, VarDict does not detect an SNV (known to be there) in the same gene.
Is there something that I am doing wrong? Am I missing something?

Thanks for the help,

Stefania

BED file for targeted sequencing

I saw in #7 that for amplicon-aware variant calling, the BED file needs to have 8 columns storing both the primer and insert positions. However, the BED file supplied by the manufacturer for the targeted capture contains only one set of positions. Is there a recommended strategy for creating a BED file suitable for VarDict to perform variant calling?

Given that I don't have the primer information, what's the appropriate -M INT setting for a typical targeted sequencing experiment?

First entry in the INFO field of the vcf file

Hi,

I ran a chromosome 21 of a dream challenge data set through the workflow. I have some question about the first item in the INFO field of the vcf file (or the 2nd to last column in the vardict.pl output).

  1. What's the "threshold" for LikelySomatic vs. StrongSomatic?
  2. There are a number of "Germlin" without the "e" at the end of "Germline." Is that a typo?
  3. What do AFDiff, SampleSpecific, and Deletion (not TYPE=Deletion) mean? How do they differ from LikelySomatic or LOH?

Thanks in advance.

Somatic variant detection error: Output empty

Dear all,
I need to use the Perl version of VarDict on my CentOS 5.5 machine. I use this command:

/illumina/software/PROG2/VarDict/vardict -G /illumina/software/database/database_2016/hg19_primary.fa -f 0.01 -N 415_tumor -b "/mnt/ALIGN/415/Coclean/415.sorted.dup.recal.cleaned.bam|/mnt/ALIGN/415/Coclean/416.sorted.dup.recal.cleaned.bam" -c 1 -S 2 -E 3 -g 4 /illumina/software/database/database_2016/TARGET/truseq-exome-targeted-regions-manifest-v1-2.bed| testsomatic.R |var2vcf_paired.pl -N "415_tumor|416_normal" -f 0.01

This generates an empty file with only the header, and no error message to debug. Could you please help me?

only prints help

Hi,
I need some help, as I only ever get the help message printed regardless of what I try.
Example command:
vardict -G b37.fa -f 0.01 -N NA12878 -b NA12878.bam -c 3 -S 60132054 -E 60214549 ~/VarDict/my.bed

It immediately pops up the help information, indicating that my command is malformed.
Any help?
If we already specify -c 3 -S 60132054 -E 60214549, what does the bed file do?

I just couldn't find any more useful information about the parameters other than the printed help, and I do not read Perl...

Thanks,
Shuoguo

vardict running extremely slow on 10G targeting sequencing data

Hi
When we run VarDict in paired mode on 10G of targeted sequencing data from an FFPE-source sample, it runs extremely slowly compared with a cf-source sample. The average coverage after duplicate removal is about 1200x. I am wondering why?

The cf-source samples, even with more data, run faster than the FFPE ones.

Mislabeling as AMPBIAS

Hi,

VarDict is mislabeling a SNV as being AMPBIAS for 2 of my three samples. As you can see in the picture, there are multiple overlapping amplicons that contain the SNV.

[image: igv_snapshot]

Any idea what could be causing this issue?

Thanks,
Tim

Warning messages

Hi,
I am trying VarDict on some RNA-seq data, I am getting the following message

Use of uninitialized value in numeric lt (<) at /home/shared/app/bcbio/tool/bin/vardict line 1329.

Should I be concerned about this warning? What does it mean?

All the best and thanks in advance

p0.05 Label

Hi

Just a quick question about the p0.05 filter annotation. Which test does the value correspond to? What is being tested?

Best

vardict did not combine nearby SNVs

Hi, I encountered a complex mutation composed of two SNVs separated by one bp, but VarDict did not report it as a complex mutation.

reference ACC
read TCT

input arguments and run time

Hi,

I think I've gotten VarDict to run the past few days, but there are a few questions I don't really understand.

  1. When I tried to run VarDict on WGS bam files without inputting region information or bed file, the program looks for things from stdin, hangs there and does nothing. Is it looking for a bed file? Is bed file required to run?

  2. If I specify a whole chromosome in the command line, it seems the program tries to read everything into the memory, and then it gets killed (probably due to too much memory request). Is this expected behavior?

  3. When I specify a region, at 1000 or 10,000 bp interval for each line, it runs okay. I ran it on a pair of tumor/normal chromosome 22 (about 800MB each), and it took 6-7 hours to complete. Is that more or less expected run time?

  4. When I specify successive regions in the bed file, should I indicate overlapping regions, (i.e., 1-5000 in line 1, and 4750-9750 in line 2)?

4/a) Can you elaborate a bit about the bed files you are using internally as the region?

Thank you very much.

-- Li Tai

I put the same Bam file as input, but the results are different

Dear all

I am calling variants with VarDict. However, even though the same BAM file was used as input, a specific variant was not detected in one case.

chr 13 : 32912299

vardict version : 1.4.7

mapping : bwa-0.7.12

dedup : picard--1.92

read_analysis & Indel call : GenomeAnalysisTKLite-2.3-9.jar

I do not know why I get different results from the same file.

vardict
-G $reference
-f 0.03
-b $bam_file
-c 1 -S 2 -E 3 -g 4
$target | $teststrandbias | \

These are the VarDict options. I used the same options each time, and there is no difference in the BAM file.

What happened?

Thanks for the help,

hunseong.

How can I filter variants with 0 or low read in normal sample for paired analysis?

Hello. I'm sorry if it is a trivial question.
When I performed a paired analysis, there were many more variants than with other analysis software, because most of the variants had 0 or very few reads in the normal sample.
How can I filter those out and keep high-quality variants present in both samples?
I use MacOS X 10.13.1, VarScan ver1.5.1, Java ver1.8.0_144, R ver3.4.4, perl ver5.18.2
The command is like this:

$ AF_THR="0.01"

$ /Users/sh/VarDictJava/VarDict-1.5.1/bin/VarDict -G /Users/sh/ucsc.hg19.fasta -f $AF_THR -N tumor -b "/Users/sh/tumor.bam|/Users/sh/normal.bam" -z 1 -c 1 -S 2 -E 3 -g 4 /Users/sh/ucsc.hg19.bed | /Users/sh/VarDictJava/VarDict/testsomatic.R | /Users/sh/VarDictJava/VarDict/var2vcf_paired.pl -N "tumor|normal" -f $AF_THR > X.vcf

Thanks in advance.

Calling structural variants

I have been using VarDict to call SNVs and INDELs, and I think it is a very good variant caller. I would also like to use it to call structural variants. What is the command line to do that? I know one would have to use the vardict_sv.pl script, but probably in combination with the main vardict call. Is that the case?

Thanks,
Maria Z

Warning messages

This is the command:
vardict -G human_g1k_v37_decoy.fasta -f 0.01 -h -b 'tumor.bam|normal.bam' -z -F -C -c 1 -S 2 -E 3 -g 4 11.per5000.bed > 11.var

I get messages such as this. I'm not sure what they mean.

Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 259.
Use of uninitialized value $ref in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $var in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 263.
Use of uninitialized value $ref in numeric gt (>) at /home/ltfang/apps/VarDict/vardict line 265.
Use of uninitialized value in numeric lt (<) at /home/ltfang/apps/VarDict/vardict line 471.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 259.
Use of uninitialized value $ref in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $var in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 263.
Use of uninitialized value $ref in numeric gt (>) at /home/ltfang/apps/VarDict/vardict line 265.
Use of uninitialized value in numeric lt (<) at /home/ltfang/apps/VarDict/vardict line 471.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 259.
Use of uninitialized value $ref in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $var in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 263.
Use of uninitialized value $ref in numeric gt (>) at /home/ltfang/apps/VarDict/vardict line 265.
Use of uninitialized value in numeric lt (<) at /home/ltfang/apps/VarDict/vardict line 471.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 259.
Use of uninitialized value $ref in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $var in substr at /home/ltfang/apps/VarDict/vardict line 261.
Use of uninitialized value $ref in numeric eq (==) at /home/ltfang/apps/VarDict/vardict line 263.
Use of uninitialized value $ref in numeric gt (>) at /home/ltfang/apps/VarDict/vardict line 265.
Use of uninitialized value in numeric lt (<) at /home/ltfang/apps/VarDict/vardict line 471.

Thanks.

somatic calling error 35

Hi!!
I have used this command for somatic calling:

/illumina/software/PROG2/VarDict/vardict -G /illumina/software/database/database_2016/hg19_primary.fa -f 0.01 -N 415_tumor -b "/mnt/ALIGN/415/Coclean/415.sorted.dup.recal.cleaned.bam|/mnt/ALIGN/415/Coclean/416.sorted.dup.recal.cleaned.bam" -c 1 -S 2 -E 3 -g 4 /illumina/software/database/database_2016/TARGET/BED_chromosome/my.chr12.bed| testsomatic.R |var2vcf_paired.pl -N "415_tumor|416_normal" -f 0.01 -v 5 > 415_416_vardict_chr12.vc

I now split my exome BED file by chromosome, but I get this error:

`Use of uninitialized value in concatenation (.) or string at /illumina/software/PROG2/VarDict/var2vcf_paired.pl line 35.`

What is the right way to do somatic calling with the new version?
Thanks so much!

Empty VCF file

Hi,

I'm attempting to use VarDict in single sample mode for analysis of a targeted gene panel, approximately 300 kb in size, sequenced on an Illumina NextSeq. However, when following the workflow, I get an empty VCF file containing only the headers and no variants. Other variant callers such as VarScan2 and MuTect2 have been used successfully on the same data.

I am using BWA mem for alignment.

Below is an example of the command I'm using:

vardict -G Homo_sapiens_assembly38.fasta -N BEL037 -b BEL037.rg.bam -c 1 -S 2 -E 3 -g 4 -D -F 0 -f 0.01 initial_design.bed | VarDict-master/teststrandbias.R | VarDict-master/var2vcf_valid.pl -N BEL037 -f 0.01 > BEL037_vardict.vcf

I have also tried using bioconda and the vardict-java and get the same problem.

Any help or advice would be greatly appreciated!
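
For reference, the `-c 1 -S 2 -E 3 -g 4` options in the command above are 1-based column numbers for the chromosome, region start, region end and gene name in the BED file. The sketch below shows how those numbers map onto a BED line, using the NRAS example line from the README. The helper names `parse_region` and `missing_contigs` are hypothetical, and the contig check is only one thing that could be ruled out (for example `chr1` in the BED file versus `1` in the reference); it is not a confirmed cause of the empty VCF.

```python
# Map VarDict's 1-based column options onto one whitespace-separated BED line.
# c = chromosome, S = start, E = end, g = gene name (as in the command above).
def parse_region(line, c=1, S=2, E=3, g=4):
    fields = line.split()
    return {
        "chrom": fields[c - 1],
        "start": int(fields[S - 1]),
        "end": int(fields[E - 1]),
        "gene": fields[g - 1],
    }


# Hypothetical helper: list BED chromosomes that are absent from the reference.
def missing_contigs(regions, reference_contigs):
    known = set(reference_contigs)
    return sorted({r["chrom"] for r in regions if r["chrom"] not in known})


# Example line taken from the amplicon example in the README.
region = parse_region("chr1 115247094 115247253 NRAS 0 . 115247117 115247232")
print(region["chrom"], region["start"], region["end"], region["gene"])

# Illustrative reference contig names only; real names come from the FASTA index.
print(missing_contigs([region], ["1", "2", "3"]))
```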

High FDR on dream synthetic dataset 3

I am seeing a 26% false discovery rate (FDR) on the DREAM dataset 3 for indels only.
I ran the tool with default parameters using the "paired variant calling" command from the documentation. For evaluation, I used only calls from the output VCF marked as "PASS" and labelled as "Somatic". Figure 6 from the VarDict paper also shows an FDR of about 16% on SNVs and indels combined. The rate seems too high.

Is such an FDR expected on the DREAM dataset 3 when VarDict is run with the default parameters?
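
To make the metric explicit: FDR here is taken as false positives divided by all evaluated calls, FP / (TP + FP). The sketch below is a minimal illustration, assuming calls and truth variants are compared as exact `(chrom, pos, ref, alt)` tuples; the function names and the example variants are hypothetical, and real indel evaluation usually needs normalisation that is not shown here.

```python
def false_discovery_rate(true_positives, false_positives):
    """FDR = FP / (TP + FP)."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no calls to evaluate")
    return false_positives / called


def fdr_from_sets(calls, truth):
    """Compare call and truth sets of (chrom, pos, ref, alt) tuples."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)
    fp = len(calls - truth)
    return false_discovery_rate(tp, fp)


# Hypothetical variants, chosen only to illustrate the arithmetic.
truth = {("chr1", 100, "A", "AT"), ("chr1", 200, "GC", "G"), ("chr2", 50, "T", "TA")}
calls = {("chr1", 100, "A", "AT"), ("chr1", 200, "GC", "G"),
         ("chr2", 50, "T", "TA"), ("chr3", 10, "C", "CA")}
print(fdr_from_sets(calls, truth))  # 1 false positive out of 4 calls
```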

AF, AD & DP values don't match!

Dear all,

I'm currently trying to use VarDict to call variants from a WXS experiment and I've noticed that most called variants have AD, AF and DP values that do not match. Here's one example:

chr1 880745 . A G 47 PASS . GT:AD:ADJAF:AF:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD 0/1:3,1:0:0.0132:2,2:302:0.0139:85.5:1:2.85:33.3:1:1:23.8:151,144:0.62316:8:4

As you can see, the AD is 3 and 1, AF is 0.0132 and DP is 302. I thought that AF estimates were obtained using the AD values for each allele. Do you have any idea why this is happening?
Here's how I'm running the tool:

AF_THR="0.01"
VarDict -th 4 -GHg19.karyo.custom.fa \
  -f $AF_THR -N SAMP12 -b "SAMP12.bam|SAMP26.bam" \
  -c 1 -S 2 -E 3 -q 15 \
  -g 4 /home/cancer_ngs/2015.SANTIAGO/MuTect2.Calls/Targets.table.bed | testsomatic.R | var2vcf_somatic.pl \
  -N "SAMP12|SAMP26" -f $AF_THR > SAMP12.q15.VarDict.vcf

Thank you very much in advance!
Joao
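
For reference, pairing the FORMAT keys with the sample values from the record above makes the fields easier to compare. The Python sketch below does only that, plus two divisions. In this one record, VD / DP (4 / 302) rounds to the reported AF of 0.0132, while the ratio implied by AD (1 of 3 + 1) does not. Whether VarDict actually derives AF from VD and DP is an assumption based on this single record, not something confirmed here.

```python
# FORMAT keys and sample values copied from the record quoted above.
FORMAT = "GT:AD:ADJAF:AF:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD"
SAMPLE = "0/1:3,1:0:0.0132:2,2:302:0.0139:85.5:1:2.85:33.3:1:1:23.8:151,144:0.62316:8:4"

# Pair each key with its value.
fields = dict(zip(FORMAT.split(":"), SAMPLE.split(":")))

dp = int(fields["DP"])                      # total depth
vd = int(fields["VD"])                      # variant depth
ad_ref, ad_alt = map(int, fields["AD"].split(","))

af_from_vd = vd / dp                        # 4 / 302
af_from_ad = ad_alt / (ad_ref + ad_alt)     # 1 / 4

print("reported AF:", fields["AF"])
print("VD / DP    :", round(af_from_vd, 4))
print("AD ratio   :", round(af_from_ad, 4))
```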

Output only Somatic option doesn't seem to work in the latest revision of var2vcf_paired.pl

In the last revision (and perhaps some earlier ones) the option -M in var2vcf_paired.pl doesn't seem to have any effect on the output. I cross-tested the outputs from the new previous tool with an older version, and the old TSV output from TestSomatic.R with the new tool, and all runs output STATUS=Germline mutations as well.

Do you mind taking a look?

Thanks in advance!

P.S. If I can help in any way, please tell me!
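
As a possible stopgap while `-M` is looked at, the `STATUS` tag in the INFO column can be filtered downstream of var2vcf_paired.pl. The sketch below is a minimal Python example, not part of VarDict. It assumes a tab-separated VCF with INFO in column 8, and it assumes that somatic calls carry a `STATUS` value containing the word "Somatic"; only `STATUS=Germline` is confirmed by the report above. The function names and the example records are hypothetical.

```python
def info_value(info, key):
    """Return the value of KEY=value in a VCF INFO string, or None."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def keep_somatic(vcf_lines):
    """Yield header lines and records whose STATUS contains 'Somatic'."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]
        status = info_value(info, "STATUS") or ""
        # Assumption: somatic STATUS values contain the substring "Somatic".
        if "Somatic" in status:
            yield line


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tSTATUS=Germline;TYPE=SNV\n",
    "chr1\t200\t.\tC\tT\t60\tPASS\tSTATUS=StrongSomatic;TYPE=SNV\n",
]
for kept in keep_somatic(example):
    print(kept, end="")
```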
