andersen-lab / ivar Goto Github PK

View Code? Open in Web Editor NEW

This project forked from gkarthik/ivar

115.0 115.0 39.0 16.48 MB

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.

Home Page: https://andersen-lab.github.io/ivar/html/

License: GNU General Public License v3.0

Makefile 1.66% Shell 0.78% M4 0.66% Python 5.95% C++ 90.44% Dockerfile 0.50%

amplicon-sequencing variant-calling

ivar's People

Contributors

Stargazers

Watchers

ivar's Issues

terminate called after throwing an instance of 'std::invalid_argument'

using 1.1 and htslib 1.10.2

% ivar trim -i ivar.bam -b nCoV-2019.scheme.bed -p out

terminate called after throwing an instance of 'std::invalid_argument'
 what():  stoi
Aborted (core dumped)

Maybe it's the BED:

% cut -f1-3 nCoV-2019.scheme.bed > new.bed

ivar trim -i ivar.bam -b nww.bed -p out
Number of references: 1
Reference Name: MN908947
Reference Length: 29903
Using Region: MN908947
Sorted By Coordinate
Sorted By Query Name
Segmentation fault (core dumped)

Any ideas?

"Could not retrieve index file" but still works?

I did not index prim.bam and got the error, but it all still worked?
Does it need to be indexed?
Related to #28 stdin.

ivar  trim -i prim.bam -b ARTIC-V1.bed -p prim-ivar
Found 196 primers in BED file
[E::idx_find_and_load] Could not retrieve index file for 'prim.bam'

Number of references in file: 1
MN908947.3
Using Region: MN908947.3

Found 1178203 mapped reads
Found 5122 unmapped reads
Sorted By Coordinate
-------
Processed 10% reads ... 
Processed 20% reads ... 
Processed 30% reads ... 
Processed 40% reads ... 
Processed 50% reads ...

missing data file for tests

tests/check_primer_trim fails because we're missing the data fie
"../data/test.unmapped.sorted.bam"

handle unmapped reads in BAM

Would be great if unmapped reads could be left alone and ignored?

% ivar trim -i ivar.bam
Segmentation fault

# remove unmapped reads
% samtools view -F 4 -o ivar.bam ivar2.bam
% ivar trim -i ivar2.bam
# works!

Document indel format for ivar variants

Indels are shows as [+-][ATGC]+. Make this format clear in ivar variants docs.

ivar 1.1 is now in homebrew

brew install brewsci/bio/ivar

feel free to add this to the docs for installing

[E::bam_read1] CIGAR and query sequence lengths differ

Describe the bug
A clear and concise description of what the bug is.

samtools sort is failing on output of ivar trim with v1.2.1 of iVar on Bioconda. This wasnt an issue with v1.2. Apparently, Conda will resolve the environment to contain v1.2.1 even though I had explicitly specified v1.2. Since this environment is automatically built within a Docker container whenever we merge to dev/master it took a bit of work figuring out why our pipeline CI tests were failing 😅

Caused by:
  Process `IVAR_TRIM (SAMPLE3_SE)` terminated with an error exit status (1)

Command executed:

  samtools view -b -F 4 SAMPLE3_SE.sorted.bam > SAMPLE3_SE.mapped.bam
  samtools index SAMPLE3_SE.mapped.bam

  ivar trim \
      -i SAMPLE3_SE.mapped.bam \
      -e \
      -b nCoV-2019.artic.V1.bed \
      -p SAMPLE3_SE.trim > SAMPLE3_SE.trim.ivar.log

  samtools sort -@ 2 -o SAMPLE3_SE.trim.sorted.bam -T SAMPLE3_SE.trim SAMPLE3_SE.trim.bam
  samtools index SAMPLE3_SE.trim.sorted.bam
  samtools flagstat SAMPLE3_SE.trim.sorted.bam > SAMPLE3_SE.trim.sorted.bam.flagstat
  samtools idxstats SAMPLE3_SE.trim.sorted.bam > SAMPLE3_SE.trim.sorted.bam.idxstats
  samtools stats SAMPLE3_SE.trim.sorted.bam > SAMPLE3_SE.trim.sorted.bam.stats

Command exit status:
  1

Command output:
  (empty)

Command error:
  [E::bam_read1] CIGAR and query sequence lengths differ for M03352:174:000000000-J3R29:1:1119:12450:3784
  samtools sort: truncated file. Aborting

The CIGAR string for that read is indeed astronomically large, and definitely not a product of long-read sequencing!

> samtools view SAMPLE3_SE.trim.bam | grep "M03352:174:000000000-J3R29:1:1119:12450:3784"
M03352:174:000000000-J3R29:1:1119:12450:3784    16      NC_045512.2     4967    22      48S173246631M   *       0       0       CTTCTTTCTTTGAGAGAAGTGAGGACTTGTTCTTACCTTCTTTTCCA      GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC AS:i:54 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:27 YT:Z:UU XA:i:32

To Reproduce
Steps to reproduce the behavior:

All files required to reproduce the error are in the attached tar archive. Please see command.sh for the commands used.
ivar_bug.tar.gz

ivar trim to stdout ?

currently need to do ivar trim -i in.bam -p out to get out.bam
it seems to only write 1 file, the bam?
can you support stdout please so I can pipe it to the next stage?

eg. if no -p do this?

ivar trim -i in.bam | samtools sort -n | samtools fastq > reads.fq

Document handling of overlapping primer pairs

I tried to find best-practice instructions for dealing with overlapping primer pairs like the ones found in the Artic V3 primer scheme for SARS-CoV-2, but couldn't really work it out.

In a case like this:

MN908947.3	4054	4077	nCoV-2019_14_LEFT	60	+
MN908947.3	4044	4068	nCoV-2019_14_LEFT_alt4	60	+
MN908947.3	4428	4450	nCoV-2019_14_RIGHT	60	-
MN908947.3	4402	4424	nCoV-2019_14_RIGHT_alt2	60	-

which primers will the current version of ivar (1.2.2) trim? All of them? The first it encounters in the BED file? The inner/outermost pair?
I found #18, but I don't understand how a change in the ivar tests (f274d3a) is supposed to affect this.

Similarly for ivar getmasked and ivar removereads: how should a set of primers like the above be translated into a proper tsv of primer pairs? My feeling would be that you want to remove reads for which binding of any of the primers involved in their generation was affected by a variant, but what would be the closest approximation of expressing this in terms of primer pairs?

Any hints would be appreciated ❤️

ivar variants -m 0 default?

Is this correct?
(-m 0 doesn't make sense, but maybe it means no min depth)

ivar variants
           -m    Minimum read depth to call variants (Default: 0)

Strand bias filter.

ivar already stores forward and reverse allele counts. Compute fisher's test

Suggested by @mpauthner

Allow naming of consensus ID

% ivar consensus -i myseqID  -p out ...

% head -n 1 out.fa
>myseqID
ATGCTATGCGATATATAT

Link to manual is not obvious

For quite a while i thought this tool had no documentation.

But then I realised the word Manual in the README.md was not a subheading but a link to a wonderful set of detailed information!

Would it be possible to make this more prominent?
Maybe a sentence under the word "Manual" saying where all the docs are?

ivar not trimming overlapping sets of primers correctly

Hi,
in #49 (comment) the intended behavior of ivar trim for reads amplified using overlapping sets of primers is described as:

Starting with v1.2, ivar handles overlapping primers by taking minimum of start pos of primers for reverse reads and maximum of end position of primers for forward reads. So it ensures that in case of overlapping primers, the innermost sequence is taken.

I think this is not what is happening though with version 1.2.2, but it seems that only the outer primer sequence gets trimmed instead. Here's a screenshot of an alignment after trimming for illustration (soft-clipped bases are highlighted to show the trimmed parts of each read):

As you can see reads that have the 18_LEFT primer sequence trimmed never show trimming of 18_LEFT_alt2.

In fact, when running ivar trim a second time on the already trimmed BAM, this is the stats it outputs:

Results: 
Primer Name	Read Count
nCoV-2019_1_LEFT	0
nCoV-2019_1_RIGHT	0
nCoV-2019_2_LEFT	0
nCoV-2019_2_RIGHT	0
nCoV-2019_3_LEFT	0
nCoV-2019_3_RIGHT	0
nCoV-2019_4_LEFT	0
nCoV-2019_4_RIGHT	0
nCoV-2019_5_LEFT	0
nCoV-2019_5_RIGHT	0
nCoV-2019_6_LEFT	0
nCoV-2019_6_RIGHT	0
nCoV-2019_7_LEFT	195
nCoV-2019_7_LEFT_alt0	0
nCoV-2019_7_RIGHT	0
nCoV-2019_7_RIGHT_alt5	6
nCoV-2019_8_LEFT	0
nCoV-2019_8_RIGHT	0
nCoV-2019_9_LEFT	0
nCoV-2019_9_LEFT_alt4	0
nCoV-2019_9_RIGHT	0
nCoV-2019_9_RIGHT_alt2	4
nCoV-2019_10_LEFT	0
nCoV-2019_10_RIGHT	0
nCoV-2019_11_LEFT	0
nCoV-2019_11_RIGHT	0
nCoV-2019_12_LEFT	0
nCoV-2019_12_RIGHT	0
nCoV-2019_13_LEFT	0
nCoV-2019_13_RIGHT	0
nCoV-2019_14_LEFT	324
nCoV-2019_14_LEFT_alt4	0
nCoV-2019_14_RIGHT	0
nCoV-2019_14_RIGHT_alt2	0
nCoV-2019_15_LEFT	0
nCoV-2019_15_LEFT_alt1	1
nCoV-2019_15_RIGHT	0
nCoV-2019_15_RIGHT_alt3	16
nCoV-2019_16_LEFT	0
nCoV-2019_16_RIGHT	0
nCoV-2019_17_LEFT	0
nCoV-2019_17_RIGHT	0
nCoV-2019_18_LEFT	0
nCoV-2019_18_LEFT_alt2	42
nCoV-2019_18_RIGHT	0
nCoV-2019_18_RIGHT_alt1	0
nCoV-2019_19_LEFT	0
nCoV-2019_19_RIGHT	0
nCoV-2019_20_LEFT	0
nCoV-2019_20_RIGHT	0
nCoV-2019_21_LEFT	0
nCoV-2019_21_LEFT_alt2	0
nCoV-2019_21_RIGHT	0
nCoV-2019_21_RIGHT_alt0	15
nCoV-2019_22_LEFT	0
nCoV-2019_22_RIGHT	0
nCoV-2019_23_LEFT	0
nCoV-2019_23_RIGHT	0
nCoV-2019_24_LEFT	0
nCoV-2019_24_RIGHT	0
nCoV-2019_25_LEFT	0
nCoV-2019_25_RIGHT	0
nCoV-2019_26_LEFT	0
nCoV-2019_26_RIGHT	0
nCoV-2019_27_LEFT	0
nCoV-2019_27_RIGHT	0
nCoV-2019_28_LEFT	0
nCoV-2019_28_RIGHT	0
nCoV-2019_29_LEFT	0
nCoV-2019_29_RIGHT	0
nCoV-2019_30_LEFT	0
nCoV-2019_30_RIGHT	0
nCoV-2019_31_LEFT	0
nCoV-2019_31_RIGHT	0
nCoV-2019_32_LEFT	0
nCoV-2019_32_RIGHT	0
nCoV-2019_33_LEFT	0
nCoV-2019_33_RIGHT	0
nCoV-2019_34_LEFT	0
nCoV-2019_34_RIGHT	0
nCoV-2019_35_LEFT	0
nCoV-2019_35_RIGHT	0
nCoV-2019_36_LEFT	0
nCoV-2019_36_RIGHT	0
nCoV-2019_37_LEFT	0
nCoV-2019_37_RIGHT	0
nCoV-2019_38_LEFT	0
nCoV-2019_38_RIGHT	0
nCoV-2019_39_LEFT	0
nCoV-2019_39_RIGHT	0
nCoV-2019_40_LEFT	0
nCoV-2019_40_RIGHT	0
nCoV-2019_41_LEFT	0
nCoV-2019_41_RIGHT	0
nCoV-2019_42_LEFT	0
nCoV-2019_42_RIGHT	0
nCoV-2019_43_LEFT	0
nCoV-2019_43_RIGHT	0
nCoV-2019_44_LEFT	0
nCoV-2019_44_LEFT_alt3	27
nCoV-2019_44_RIGHT	0
nCoV-2019_44_RIGHT_alt0	877
nCoV-2019_45_LEFT	713
nCoV-2019_45_LEFT_alt2	0
nCoV-2019_45_RIGHT	0
nCoV-2019_45_RIGHT_alt7	86
nCoV-2019_46_LEFT	0
nCoV-2019_46_LEFT_alt1	0
nCoV-2019_46_RIGHT	0
nCoV-2019_46_RIGHT_alt2	0
nCoV-2019_47_LEFT	0
nCoV-2019_47_RIGHT	0
nCoV-2019_48_LEFT	0
nCoV-2019_48_RIGHT	0
nCoV-2019_49_LEFT	0
nCoV-2019_49_RIGHT	0
nCoV-2019_50_LEFT	0
nCoV-2019_50_RIGHT	0
nCoV-2019_51_LEFT	0
nCoV-2019_51_RIGHT	0
nCoV-2019_52_LEFT	0
nCoV-2019_52_RIGHT	0
nCoV-2019_53_LEFT	0
nCoV-2019_53_RIGHT	0
nCoV-2019_54_LEFT	0
nCoV-2019_54_RIGHT	0
nCoV-2019_55_LEFT	0
nCoV-2019_55_RIGHT	0
nCoV-2019_56_LEFT	0
nCoV-2019_56_RIGHT	0
nCoV-2019_57_LEFT	0
nCoV-2019_57_RIGHT	0
nCoV-2019_58_LEFT	0
nCoV-2019_58_RIGHT	0
nCoV-2019_59_LEFT	0
nCoV-2019_59_RIGHT	0
nCoV-2019_60_LEFT	0
nCoV-2019_60_RIGHT	0
nCoV-2019_61_LEFT	0
nCoV-2019_61_RIGHT	0
nCoV-2019_62_LEFT	0
nCoV-2019_62_RIGHT	0
nCoV-2019_63_LEFT	0
nCoV-2019_63_RIGHT	0
nCoV-2019_64_LEFT	0
nCoV-2019_64_RIGHT	0
nCoV-2019_65_LEFT	0
nCoV-2019_65_RIGHT	0
nCoV-2019_66_LEFT	0
nCoV-2019_66_RIGHT	0
nCoV-2019_67_LEFT	0
nCoV-2019_67_RIGHT	0
nCoV-2019_68_LEFT	0
nCoV-2019_68_RIGHT	0
nCoV-2019_69_LEFT	0
nCoV-2019_69_RIGHT	0
nCoV-2019_70_LEFT	0
nCoV-2019_70_RIGHT	0
nCoV-2019_71_LEFT	0
nCoV-2019_71_RIGHT	0
nCoV-2019_72_LEFT	0
nCoV-2019_72_RIGHT	0
nCoV-2019_73_LEFT	0
nCoV-2019_73_RIGHT	0
nCoV-2019_74_LEFT	0
nCoV-2019_74_RIGHT	0
nCoV-2019_75_LEFT	0
nCoV-2019_75_RIGHT	0
nCoV-2019_76_LEFT	0
nCoV-2019_76_LEFT_alt3	1
nCoV-2019_76_RIGHT	0
nCoV-2019_76_RIGHT_alt0	69
nCoV-2019_77_LEFT	0
nCoV-2019_77_RIGHT	0
nCoV-2019_78_LEFT	0
nCoV-2019_78_RIGHT	0
nCoV-2019_79_LEFT	0
nCoV-2019_79_RIGHT	0
nCoV-2019_80_LEFT	0
nCoV-2019_80_RIGHT	0
nCoV-2019_81_LEFT	0
nCoV-2019_81_RIGHT	0
nCoV-2019_82_LEFT	0
nCoV-2019_82_RIGHT	0
nCoV-2019_83_LEFT	0
nCoV-2019_83_RIGHT	0
nCoV-2019_84_LEFT	0
nCoV-2019_84_RIGHT	0
nCoV-2019_85_LEFT	0
nCoV-2019_85_RIGHT	0
nCoV-2019_86_LEFT	0
nCoV-2019_86_RIGHT	0
nCoV-2019_87_LEFT	0
nCoV-2019_87_RIGHT	0
nCoV-2019_88_LEFT	0
nCoV-2019_88_RIGHT	0
nCoV-2019_89_LEFT	0
nCoV-2019_89_LEFT_alt2	5
nCoV-2019_89_RIGHT	0
nCoV-2019_89_RIGHT_alt4	211
nCoV-2019_90_LEFT	0
nCoV-2019_90_RIGHT	0
nCoV-2019_91_LEFT	0
nCoV-2019_91_RIGHT	0
nCoV-2019_92_LEFT	0
nCoV-2019_92_RIGHT	0
nCoV-2019_93_LEFT	0
nCoV-2019_93_RIGHT	0
nCoV-2019_94_LEFT	0
nCoV-2019_94_RIGHT	0
nCoV-2019_95_LEFT	0
nCoV-2019_95_RIGHT	0
nCoV-2019_96_LEFT	0
nCoV-2019_96_RIGHT	0
nCoV-2019_97_LEFT	0
nCoV-2019_97_RIGHT	0
nCoV-2019_98_LEFT	0
nCoV-2019_98_RIGHT	0

Clearly, further trimming occurs, and occurs exclusively for amplicons with complex primer sets.

Indels that not met minimum requierements called in consensus

Describe the bug
Hi I think Ivar (v1.2) is adding to consensus indels that do not meet the requirements set by user.

In a sample with an insertion in the consensus genome, when I see that insertion in the tsv file, it has 52% freq. However I have set minimum freq of 70% for consensus calling.

This happened for several other samples.

However not all insertions are added to the consensus.
For example, in another sample I saw the expected behavior. That sample had many indels in the tsv file, with the PASS flag, but only one, with 82% freq (thus passing the 70% threshold) was included in the consensus.

Isolate quality trimming

For item 1 in broadinstitute/viral-pipelines#28

Minimum read threshold for variant calling

Suggested by @mpauthner

is ivar getmasked broken at the current version?

Hi,
I just tested ivar getmasked at versions 1.2.2 and 1.2.1 and the tool seems completely broken!
Instead of reporting primer pairs, it simply lists primers with mismatches.

I checked against version 1.0.1 and that one works as expected. Seems something bad has happened to the tool in between?

Can you confirm that observation?

Deciding on overlapping variants for consensus

Testing out iVar with some tougher ONT samples, I've run in to an issue with overlapping annotations. Here is the ivar variants output for this region:

When using ivar consensus to apply the consensus, the one base deletion is applied, when using thresholds 0, 0.2, and 0.5--an 'N' is applied at thresholds 0.9 and 1. Despite the DP information, I know that the deletion is an ONT sequencing artifact (the G->T mutation creates a long string of Ts), and that there is a true SNV there.

How are overlapping indels decided upon as the "winning" variant? Can this be adjusted, particularly for ONT variants like this?

Also, thanks for all your hard work on this tool!

REF_DP + ALT_DP != TOTAL_DP

I've noticed that in the tsv variant file REF_DP + ALT_DP is usually not equal to TOTAL_DP, and sometimes is quite a bit off. For example, in the attached tsv file at position 8376
TOTAL_DP = 1527
REF_DP + ALT_DP = 2621
ALT_FREQ = 0.717092
ALT_DP / TOTAL_DP = 0.717092
ALT_DP / (REF_DP + ALT_DP) = 0.417779

variants.txt

Combine consensus and variants

It is inefficient to run both of these, when they do nearly the same thing, and both use the same mpileup.

Can you merge them?

Document altered primer trimming rules for PE data in v1.2.2

@gkarthik after noticing drastic changes in the number of trimmed primer seqs between ivar 1.2.1 and 1.2.2, I came across your comment in #42 (comment) stating that version 1.2.2 is now taking primer strand information into account.

Given how big the impact of this change seems to be, I would say it deserves some proper documentation. At least it should be highlighted in the release notes.

Also, just to make sure I understand the motivation behind this change: the idea is that you want to trim the primers from reads derived from the corresponding amplicon, but not from reads derived from overlapping (second-pool) amplicons that happen to end at those primer binding sites?
So a substantially reduced number of trimmed primer sequences upon a version upgrade (a bit alarming at first glance) should actually be considered an improvement?

missing #include <algorithm>

I had to add
#include <algorithm>
to primer_bed.h
in order to avoid the compilation error:
‘all_of’ is not a member of ‘std’

Read alignment off by one after primer trimming

Describe the bug

Some read alignments are shifted after ivar trim (v1.2) compared to untrimmed read alignments from Minimap2 (v2.17-r941) and NGMLR (v0.2.7).

To Reproduce

I aligned the Nanopore reads to the reference genome (MN908947.3) with Minimap2 (and NGMLR), trimmed primer sequences and low quality bases

$ minimap2 -ax map-ont -t16 ref.fa reads.fq | samtools sort -@16 | samtools view -F4 -b -o reads.sort.bam
$ ivar trim -b artic-ncov2019/primer_schemes/nCoV-2019/V2/nCoV-2019.bed -p reads.trim -i reads.sort.bam -q 1 -m 30 -s 4

Note: for some reason many of the reads had very low average quality scores for this run and we were mostly interested in trimming the primer sequences.

I viewed and compared alignments in IGV (v2.8.2 installed from BioConda) and noticed some big differences between the untrimmed and trimmed alignments of some reads in certain regions.

Top alignment is untrimmed Minimap2 and bottom is iVar trimmed alignment:

For example, here's the same read at position MN908947.3:29,383 where in the original alignment the base is G (QV=27) and in the trimmed alignment the base is A (QV=11):

Before ivar trim read alignment information in IGV

Read name = 04853829-6653-42bb-b303-351fbd160119
Read length = 515bp
----------------------
Mapping = Primary @ MAPQ 60
Reference span = MN908947.3:29,363-29,646 (+) = 284bp
Cigar = 89S15M1D11M1I3M1D20M1I14M5I3M2I2M2I18M1I33M1D7M1I7M2I12M1I28M2I8M5I3M1D12M2I3M4I12M1I6M2D5M1D23M1D31M120S
Clipping = Left 89 soft; Right 120 soft
----------------------
s1 = 79
s2 = 0
NM = 51
AS = 314
de = 0.1145
rl = 0
cm = 9
nn = 0
tp = P
ms = 314Location = MN908947.3:29,383
Base = G @ QV 27

After ivar trim

Read name = 04853829-6653-42bb-b303-351fbd160119
Read length = 515bp
----------------------
Mapping = Primary @ MAPQ 60
Reference span = MN908947.3:29,380-29,647 (+) = 268bp
Cigar = 104S11M1I3M1D20M1I14M5I3M2I2M2I18M1I33M1D7M1I7M2I12M1I28M2I8M5I3M1D12M2I3M4I12M1I6M2D5M1D23M1D31M120S
Clipping = Left 104 soft; Right 120 soft
----------------------
s1 = 79
s2 = 0
NM = 51
AS = 314
de = 0.1145
rl = 0
cm = 9
nn = 0
tp = P
ms = 314
Hidden tags: XALocation = MN908947.3:29,383
Base = A @ QV 11

It looks like the iVar trimmed alignment has been shifted when I zoom in on the read in both untrimmed and trimmed alignments:

Note that the position is incremented by 1 in the trimmed read alignment relative to the untrimmed alignment.

Expected behavior

I expected that the alignments should remain the same aside from primer and quality trimming.

Workstation:

OS: Arch Linux x86_64 (Kernel Release: 5.5.9-arch1-2)
CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
iVar Version 1.2 from BioConda

Additional context

Conda env YAML ($ conda env export) for running iVar, Minimap2, NGMLR and samtools:

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=0_gnu
  - bzip2=1.0.8=h516909a_2
  - ca-certificates=2020.4.5.1=hecc5488_0
  - curl=7.69.1=h33f0ec9_0
  - htslib=1.9=h4da6232_3
  - krb5=1.17.1=h2fd8d38_0
  - libcurl=7.69.1=hf7181ac_0
  - libdeflate=1.5=h516909a_0
  - libedit=3.1.20170329=hf8c457e_1001
  - libgcc-ng=9.2.0=h24d8f2e_2
  - libgomp=9.2.0=h24d8f2e_2
  - libssh2=1.8.2=h22169c7_2
  - libstdcxx-ng=9.2.0=hdf63c60_2
  - minimap2=2.17=h8b12597_1
  - ncurses=6.1=hf484d3e_1002
  - ngmlr=0.2.7=he860b03_1
  - openssl=1.1.1f=h516909a_0
  - samtools=1.9=h10a08f8_12
  - sniffles=1.0.11=hdbcaa40_1
  - tclap=1.2.1=h470a237_1
  - tk=8.6.10=hed695b0_0
  - xz=5.2.5=h516909a_0
  - zlib=1.2.11=h516909a_1006

consensus: "Minimum depth:" value never printed

It isn't prining the -m 1 value i provided?

samtools mpileup -A -d 300000 -Q 0 -F 0 ivar.sorted.bam | ivar consensus -p ivar -m 1

Minimum Quality: 20
Threshold: 0
Minimum depth: 
Regions with depth less than minimum depth covered by: -

Export variants to VCF format

ivar variants outputs results only in custom .tsv format so unusable with other tools.

Allow option to export variants in VCF v4.3 format.

Refactor awk script into C++

Refactor get_common_variants.sh.

Two reasons:

Not cool.
Package into a single binary
Causes issues if mawk implementation of awk is used - issue

consensus with ambiguities

Hi @gkarthik,

I was using ivar consensus with a -t set to 0.5 and I increased it to 0.75 to only have the major variants within my consensus but it created ambiguous bases (like Y or K for instance). If I don't want ambiguous bases should I, instead, lower the -t to 0.25 (and still have the variants with a support of at least 75%) or is there another way?

Thanks,
Paul

[warning] samtools mpileup option `F` is functional, but deprecated. Please switch to using bcftools mpileup in future.

Following the manual, these warnings are seen:

samtools mpileup -A -d 300000 -Q 0 -F 0 ivar.sorted.bam | ivar consensus -p ivar -m 1
[warning] samtools mpileup option `F` is functional, but deprecated. Please switch to using bcftools mpileup in future.

ivar trim accept stdin ?

ideally one should be able to do this:

read_aligner ref R1.fq R2.fq | ivar trim - - | samtools sort | something > answer.out

can ivar trim -i /dev/stdin work or is random access needed?
i think a raw SAM (and the REF + BED) has all the info you need to trim appropriately

feeding ivar pre-stitched PE reads

@mcroxen and i were wondering
if you pre-stitch your PE reads (eg. FLASH, bbmerge.sh, fastp --merge)
And then align these longer SE reads and feed to ivar
Will it still work in terms of primer trimming etc?

filtervariants -t 1 default?

-t    Minimum fration of files required to contain the same variant. Specify value within [0,1]. (Default: 1)

Does this mean you only keep a variant if it is in ALL (100%) of files?
If it's the same in all then it's not a variant?

Print out minimum depth

When running ivar consensus with the minimum depth option -m and the threshold (-t), the output to stdout does not provide the minimum depth.

samtools mpileup -B -A -d 1000000 -Q 0 input.bam | ivar consensus -p output_consensus -n N -t 0.6 -m 10
[mpileup] 1 samples in 1 input files
Minimum Quality: 20
Threshold: 0.6
Minimum depth: 

Regions with depth less than minimum depth covered by: N

The program is filtering based on depth so is working properly but first thought that it was not filtering based on depth due to the lack of info on stdout.

GFF3 translate in reverse for -ve strand

Current aa translation in variants does not account for translation on -ve strand.

Trim all primers in cases with overlapping primer sequences

Artic V3 scheme has overlapping primers. In these cases iVar will trim the first primer it encounters.

Change behavior to allow trimming of all primers at positions of overlapping primers.

Corrupt primer-trimmed bam file

Describe the bug
ivar trim creates a corrupt primer-trimmed bam file:

samtools view 178Ct21-C74045848_S178.primertrim.sorted.bam
[E::sam_format1] Corrupted aux data for read M02568:512:000000000-J52NT:1:1103:25289:20956
samtools view: writing to standard output failed: Invalid argument

To Reproduce
Command to generate the bam file:
ivar trim -i $sampleID.sorted.bam -b /seq/Development/Projects/covid/References/ARTIC-1k-primer.bed -p $sampleID.primertrim
samtools sort -@ 8 $sampleID.primertrim.bam -o $sampleID.primertrim.sorted.bam

Expected behavior
samtools view should behave normally

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):
Redhat linux 7

uname -a
Linux sjlsd0018.us.qdx.com 3.10.0-1062.7.1.el7.x86_64 #1 SMP Wed Nov 13 08:44:42 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

GFF file downloaded from NCBI not recognized as GFF3 format

I was using ivar to call variants in a bam file and used a GFF file downloaded from NCBI for feature information.

My command (with my sample bam and using MN908947.3 as a reference with corresponding gff file from NCBI:

samtools mpileup -A -d 600000 -B -Q 0 sample.primertrim.sorted.bam | ivar variants -p sample.variants -q 20 -t 0 -r MN908947.reference.fasta -g GCF_009858895.2_ASM985889v3_genomic.gff

The command executed, but I got the following output:

Tue Mar 31 16:44:33 MDT 2020
[mpileup] 1 samples in 1 input files
GFF file is not in GFF3 file format!
GFF file is not in GFF3 file format!
GFF file is not in GFF3 file format!
GFF file is not in GFF3 file format!

OS: Centos7
ivar version: 1.1 (conda install)

-Werror=maybe-uninitialized errors in `conda build` (GCC 7.3) of test_unpaired_trim.cpp

conda build current uses GCC 7.3.0. This causes an error when doing make check, as follows:

test_unpaired_trim.cpp: In function 'int main()':
test_unpaired_trim.cpp:31:10: error: 't.cigar_::nlength' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   cigar_ t;
          ^
test_unpaired_trim.cpp:128:9: error: 't.cigar_::cigar' may be used uninitialized in this function [-Werror=maybe-uninitialized]
       t = condense_cigar(t.cigar, t.nlength);
       ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think this is because t is assigned a value inside a conditional (e.g. at line 81) but used outside that conditional (.e.g at line 113 and elsewhere).

ivar trim doesn't fail when providing an improper BED file

Describe the bug
Previously in version 1.2, if an improper BED file was provided, then ivar trim would fail. However in 1.2.2, if an improper BED file is provided, then ivar trim won't fail, it will simply skip trimming, but still output a bam file, as if no -b option had been provided, potentially causing the error to go unnoticed. Related to #35

To Reproduce
Steps to reproduce the behavior:

Download the BED file from https://github.com/artic-network/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V3/nCoV-2019.bed. The 5th column is not an integer so ivar trim will consider it invalid.
Run ivar trim using this bed file. ivar trim will complain about the BED file format:

iVar uses the standard 6 column BED format as defined here - https://genome.ucsc.edu/FAQ/FAQformat.html#format1.
It requires the following columns delimited by a tab: chrom, chromStart, chromEnd, name, score, strand

But then proceed anyways, resulting in no reads being trimmed:

Trimmed primers from 0% (0) of reads.
0.73% (10132) of reads were quality trimmed below the minimum length of 30 bp and were not writen to file.
99.27% (1376097) of reads started outside of primer regions. Since the -e flag was given, these reads were written to file.

Multiple primer sequences at same start site

Make ivar primer sequence aware instead of trimming just based on positions. This will help differentiate between reads that originate at the same start site but with different primer sequences. This will allow ivar to handle cases where the virus has very high diversity.

Insertions called as "+N" instead of the corresponding nucleotide

I observed that some insertions in .tsv files (called as "+T" in this example), in consensus files appear as "+N". This happens even if providing the reference genome to mpileup. The command line I used with ivar 1.2:

samtools mpileup -aa -A --reference Sars-Cov-2_REFERENCE.fasta -d 0 -B -Q 0 COV000517.trim.sort.bam | ivar consensus -p COV000517 -q 20 -t 0.7 -m 30 -n N

I thought that maybe it could be related to low depths, so I repeated using - as low-depth character but again, a +N appeared.

I thought that maybe it could be related to quality filters, so I repeated setting -q 0 and then something even funnier happened: this and other indels disappeared.
I attach an image of the consensus genomes obtained with the different options used:

The .tsv file for that position using our ivar variants parameters (-q 20 -t 0.05 -m 20):

MN908947.3      9604    A       +T      2608    973     67      1313    0       20      0.277766        4727

And for (-q 0 -t 0):

MN908947.3      9604    A       T       4718    2355    37      8       4       36      0.00169241      4727
MN908947.3      9604    A       C       4718    2355    37      1       1       0       0.000211551     4727
MN908947.3      9604    A       +T      4718    2355    37      1313    0       0       0.277766        4727

Compilation warnings

I'm getting a lot of compilation warnings (with g++ 8.1.0) mostly concerning comparison of signed with unsigned variables, along with unused variables.

If I were to add -Werror to CXXFLAGS and fix the warnings, would you consider a pull request?

-m 1 and code m=0 do not match


  g_args.min_depth = 0;
<https://github.com/andersen-lab/ivar|andersen-lab/ivar>andersen-lab/ivar | Added by GitHub


           -m    Minimum depth to call consensus(Default: 1)

Invalid subcommands aren't reported clearly

As you can see i made a typo in the subcommand.
I would have expected to see an error Unknown command 'consenus
Eventually figured it out and posted this :)

% ivar consenus

Usage:  ivar [command <trim|variants|filtervariants|consensus|getmasked|removereads|version|help>]

        Command       Description
           trim       Trim reads in aligned BAM file
       variants       Call variants from aligned BAM file
 filtervariants       Filter variants across replicates or samples
      consensus       Call consensus from aligned BAM file
      getmasked       Detect primer mismatches and get primer indices for the amplicon to be masked
    removereads       Remove reads from trimmed BAM file
        version       Show version information

To view detailed usage for each command type `ivar <command>`

Check BED format

When reading a BED file and format is off, iVar throws an error and exists. Refer #7

Solution: Add format check for BED file and show useful error message.

variants & consensus "doing the same thing" ?

Are the variants and consensus tools doing a lot of similar work, or am I missing something?

At first I assume it was trim -> variants -> consensus but then it seems they are independent?

Are indels handled ok?

PS. trim and consensus are exactly the tools i needed, and i am so grateful this exists!

ivar version
iVar version 1.0

Please raise issues and bug reports at https://github.com/andersen-lab/ivar/

andersen-lab / ivar Goto Github PK

ivar's People

Contributors

Stargazers

Watchers

Forkers

ivar's Issues

Recommend Projects

Recommend Topics

Recommend Org