
fieldbioinformatics's People

Contributors

biowilko, gkarthik, jts, mattloose, nickloman, rambaut, rpoplawski0, sagrudd, trvrb, will-rowe


fieldbioinformatics's Issues

Error when running with too few reads/very low coverage

When running Artic on a sample with too few reads, and therefore very low coverage, Longshot automatically calculates the maximum coverage as 0 and fails. We're hitting this in an automated pipeline, where some of our samples keep failing because they have too few reads (as expected, since they were negative samples).

Running: longshot -P 0 -F -A --no_haps --bam barcode91.primertrimmed.rg.sorted.bam --ref primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.reference.fasta --out barcode91.merged.vcf --potential_variants barcode91.merged.vcf.gz
Command failed:longshot -P 0 -F -A --no_haps --bam barcode91.primertrimmed.rg.sorted.bam --ref primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.reference.fasta --out barcode91.merged.vcf --potential_variants barcode91.merged.vcf.gz

I've already reported it at pjedge/longshot#72, and I agree with them that the best solution would be to simply not use the -A flag (or, alternatively, to ignore the error in the Artic pipeline when it happens and create an empty file instead).
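
A minimal sketch of the empty-file fallback suggested above (not part of the pipeline; the function and file names are hypothetical): run Longshot, and if it exits non-zero, write a VCF that reuses the header of the potential-variants file but contains no records, so downstream steps can continue on negative controls.

import gzip
import subprocess

def run_longshot_or_empty(longshot_cmd, potential_vcf_gz, out_vcf):
    """Run longshot; if it fails (e.g. on a zero-coverage negative control),
    fall back to writing a header-only VCF so downstream steps can continue."""
    try:
        subprocess.run(longshot_cmd, shell=True, check=True)
    except subprocess.CalledProcessError:
        with gzip.open(potential_vcf_gz, "rt") as src, open(out_vcf, "w") as dst:
            for line in src:
                if line.startswith("#"):
                    dst.write(line)  # copy header lines only, no variant records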


I tried running the branch in 1.3.0-dev as well, but ended up with an error in the artic-tools check_vcf step:

Command failed:artic-tools check_vcf --summaryOut barcode91.vcfreport.txt --vcfOut barcode91.merged.filtered.vcf barcode91.merged.vcf.gz primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.scheme.bed 2> barcode91.vcfcheck.log

It seems the problem appears when the VCF file is empty:

bcftools view barcode91.merged.vcf.gz
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##medaka_version=1.2.3
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of reads at pos">
##INFO=<ID=DPS,Number=2,Type=Integer,Description="Depth of reads at pos by strand (fwd, rev)">
##INFO=<ID=DPSP,Number=1,Type=Integer,Description="Depth of reads spanning pos +-25">
##INFO=<ID=SR,Number=.,Type=Integer,Description="Depth of spanning reads by strand which best align to each allele (ref fwd, ref rev, alt1 fwd, alt1 rev, etc.)">
##INFO=<ID=AR,Number=2,Type=Integer,Description="Depth of ambiguous spanning reads by strand which align equally well to all alleles (fwd, rev)">
##INFO=<ID=SC,Number=.,Type=Integer,Description="Total alignment score to each allele of spanning reads by strand (ref fwd, ref rev, alt1 fwd, alt1 rev, etc.) aligned with parasail match 5, mismatch -4, open 5, extend 3">
##INFO=<ID=Pool,Number=1,Type=String,Description="The pool name">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Medaka genotype.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Medaka genotype quality score">
##bcftools_viewVersion=1.10.2+htslib-1.10.2
##bcftools_viewCommand=view barcode91.merged.vcf.gz; Date=Mon Aug 23 12:36:40 2021
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE

Running artic-tools check_vcf ends in a Segmentation fault (core dumped) when reading the empty VCF:

$ artic-tools check_vcf --summaryOut barcode91.vcfreport.txt --vcfOut barcode91.merged.filtered.vcf barcode91.merged.vcf.gz primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.scheme.bed
[12:35:34] [artic-tools::check_vcf] starting VCF checker
[12:35:34] [artic-tools::check_vcf] reading scheme
[12:35:34] [artic-tools::check_vcf] collecting scheme stats
[12:35:34] [artic-tools::check_vcf] 	primer scheme file:	primer_schemes/SARS-CoV-2/V1200/SARS-CoV-2.scheme.bed
[12:35:34] [artic-tools::check_vcf] 	reference sequence:	MN908947.3
[12:35:34] [artic-tools::check_vcf] 	number of pools:	2
[12:35:34] [artic-tools::check_vcf] 	number of primers:	58 (includes 0 alts)
[12:35:34] [artic-tools::check_vcf] 	minimum primer size:	22
[12:35:34] [artic-tools::check_vcf] 	maximum primer size:	30
[12:35:34] [artic-tools::check_vcf] 	number of amplicons:	29
[12:35:34] [artic-tools::check_vcf] 	mean amplicon size:	1091
[12:35:34] [artic-tools::check_vcf] 	maximum amplicon size:	1177
[12:35:34] [artic-tools::check_vcf] 	scheme ref. span:	30-29790
[12:35:34] [artic-tools::check_vcf] 	scheme overlaps:	6.5591393%
[12:35:34] [artic-tools::check_vcf] setting parameters
[12:35:34] [artic-tools::check_vcf] 	output report: barcode91.vcfreport.txt
[12:35:34] [artic-tools::check_vcf] 	filtering variants: true
[12:35:34] [artic-tools::check_vcf] 	output VCF: barcode91.merged.filtered.vcf
[12:35:34] [artic-tools::check_vcf] 	minimum quality threshold: 10.0
[12:35:34] [artic-tools::check_vcf] reading VCF file
Segmentation fault (core dumped)

align_trim strand consideration

When doing align_trim, I noticed a read could be soft-clipped by both forward and reverse primers.

Sorry, I am not familiar with the sequencing protocol in detail, but should align_trim take the strand into account while trimming, so that forward primers are trimmed only from forward reads and reverse primers only from reverse reads?

before line 231: if not segment.is_reverse:
before line 243: if segment.is_reverse:
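
A small sketch of the guard being proposed, assuming the primer direction is available as "+"/"-" alongside each pysam alignment segment (the helper name and the surrounding loop are hypothetical, not the actual align_trim code):

import pysam

def strand_matches(segment: pysam.AlignedSegment, primer_direction: str) -> bool:
    """Only let a primer trim a read when their orientations agree:
    forward (+/LEFT) primers act on forward reads, reverse (-/RIGHT)
    primers act on reverse reads."""
    if primer_direction == "+":
        return not segment.is_reverse
    return segment.is_reverse

# inside the trimming loop, roughly:
# if strand_matches(segment, primer["direction"]):
#     softmask(segment, primer_site, from_start=...)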

Duplicate variants in VCF outputs causing bcftools consensus error

Hi,

I've run into what seems to be a rare issue (the first time I've seen it in hundreds of samples) where there are duplicate variants in my *.pass.vcf.gz and *.merged.vcf files, which leads bcftools consensus to generate an error with the following output:

The site MN908947.3:27967 overlaps with another variant, skipping...
The fasta sequence does not match the REF allele at MN908947.3:28250:
     .vcf: [T]
     .vcf: [TCTG] <- (ALT)
     .fa:  [G]TTCATCTAAACGAACAAACTAAAATGTCTGATAATGGACCCCAAAATCAGCGAAATGCA...

I also get the output from vcf_merge of:

Found primer binding site mismatch: {}
Found primer binding site mismatch: {}

The issue looks similar to #21, although I checked and I am on the latest version, so it must be something else.

For reference I'm currently using the following:

Sorry I don't know too much more, I'll keep looking at it in the meantime. Thanks for any help you can offer and sorry if I missed a solution somewhere,

Darian
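
One possible workaround for the duplicate records described above, sketched with pysam (the helper is hypothetical and untested against the reporter's files): keep only the first record seen for each (chrom, pos, ref, alts) before handing the VCF to bcftools consensus.

import pysam

def drop_duplicate_records(vcf_in: str, vcf_out: str) -> None:
    """Write vcf_in to vcf_out, keeping only the first record at each
    (chrom, pos, ref, alts) so bcftools consensus sees no duplicates."""
    src = pysam.VariantFile(vcf_in)
    dst = pysam.VariantFile(vcf_out, "w", header=src.header)
    seen = set()
    for rec in src:
        key = (rec.chrom, rec.pos, rec.ref, rec.alts)
        if key not in seen:
            seen.add(key)
            dst.write(rec)
    src.close()
    dst.close()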

Question regarding align_trim.py

Hi all,

I am using the align_trim.py script to remove primers from the alignment. In my case, we performed SARS-CoV-2 amplification with the V3 primer set and then sequenced on Illumina (MiSeq).

I was expecting align_trim.py to trim primers from aligned reads only if a read starts or ends at the same position where a primer starts or ends, and both the read and the primer are in the same orientation (I think it makes no sense to remove a reverse primer from a forward read, as that forward read could come from an overlapping amplicon). But after running the program, we see that all positions in the alignment where a primer maps are removed, as well as all bases up to the end of the read (when the read is forward and the primer is reverse, for instance), among other strange things that would take long to explain...

Here is an image of a couple of reads (in a very sad region, with very few mapped reads):

[Screenshot, 2022-01-13 00:34]

So I do not know if I am missing something in the program, or whether this behaviour is expected for nanopore reads, where I do not have any experience.

Best,

Alberto

Test Datasets

Hello,

I was wondering if you knew of any good publicly available datasets for the V3 ARTIC tiling amplicon sequencing of hCoV-19. Ideally I would love to have a test dataset showing each of the variants of concern (UK, South Africa, and Brazil) along with the Wuhan strain.

I've tried looking through GISAID and SRA, but as far as I can tell GISAID only supplies the preassembled genomes in a strange format and I need the raw fast5 or fastq files. And SRA is just very challenging to search to get exactly the type of library/sequence set you need that has enough metadata to inform the analysis.

I apologize if there are datasets somewhere already, but I'm somewhat frantically trying to figure out how to do this ARTIC analysis before teaching a course on it that begins on Monday. We sequenced synthetic saliva that contained RNA for the Wuhan strain and three variants of concern, but for some reason the variant calling is coming up with nothing, and I'm struggling to figure out whether it's our data or something going wrong in the pipeline.

I would sincerely appreciate any help or a pointer to appropriate public datasets that have done the ARTIC V3 tiling amplicon approach and have worked with the standard SOP bioinformatics pipeline.

necessity of two-barcode requirement

Hello,

I apologize in advance for the long-winded post. I'm hoping to engage a discussion on the necessity of the "two-barcode" demultiplexing requirement.

In the ARTIC SARS-CoV-2 SOP, it states "For the current version of the ARTIC protocol it is essential to demultiplex using strict parameters to ensure barcodes are present at each end of the fragment." This is further elaborated on in the guide here, where it is explained that the main concern is due to the possible occurrence of in silico chimeric reads. This all seems fine and we have been following the recommended stringent Guppy demux settings for nanopore-based SARS-CoV-2 sequencing.

However, on some runs we are seeing a significant percentage of reads which are not being assigned into barcode bins because of the two-barcode requirement. The extent of the issue varies from run to run, but for example in the latest run, 78% of reads were thrown out because of this requirement. Based on the plot below, I suspect that the reason for this is that the barcode and adapter sequence are actually missing on one end of the reads. The reason for this is another issue that the lab doing the sequencing is talking to ONT about tomorrow (although any ideas are welcome).

length_by_class.pdf

In any case, if I remove the two-barcode requirement and leave the rest of the parameters as default, only 7% of the reads are unclassified. Because of uneven coverage for some of the amplicons, the result for many samples is a drastic difference in completeness of the consensus genome sequence.

In doing some benchmarking of demultiplexing and the effect of tuning various parameters, I'm questioning the need for the two-barcode requirement. My logic is based on the fact that there are multiple other filters and factors which mitigate the potential for issues caused by in silico chimeric reads. Although we've been using Guppy lately for demuxing at the same time as basecalling, the following points are specific to Porechop as that's what I've used to investigate and benchmark the issue:

  1. Even when two barcode matches are not required, Porechop will not classify a read that has mismatched high quality barcode hits at both ends. If one end has a best hit to NB1 and a passing score, and the other end has a best hit to NB2 and a passing score within a certain distance of the first, the read is not classified. This is controlled by the --barcode_diff parameter of Porechop.

  2. Porechop will not classify a read that has a matching adapter sequence in the middle, as would be expected for chimeric reads. This is enabled by default when demultiplexing and in fact cannot be disabled.

  3. Perhaps most importantly, we are applying the size filtering using artic guppyplex as recommended in the SOP. Because the ncov19 amplicon sizes are fairly uniform and thus the read length distribution is fairly tight, the upper limit can be set conservatively, essentially precluding the possibility of chimeric reads passing through unless they were too short to begin with (a minimal illustration of this follows the list).

  4. Again because of the narrow read length distributions, any significant population of chimeric reads would be obvious on the read length QC plot as a second peak. In practice, there is sometimes a small peak at approximately twice the amplicon size (this is before size filtering), but it is a very small fraction of the total population.
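
To make point 3 concrete, here is a minimal, self-contained sketch of a read-length window filter (this is not the artic guppyplex implementation, and the bounds are placeholders, not recommended values):

import gzip

def filter_by_length(fastq_gz_in, fastq_out, min_len=400, max_len=700):
    """Keep only reads whose sequence length falls within [min_len, max_len].
    With a tight amplicon length distribution, a conservative max_len
    excludes reads long enough to be full-length chimeras."""
    with gzip.open(fastq_gz_in, "rt") as src, open(fastq_out, "w") as dst:
        while True:
            header = src.readline()
            if not header:
                break
            seq = src.readline()
            plus = src.readline()
            qual = src.readline()
            if min_len <= len(seq.strip()) <= max_len:
                dst.write(header + seq + plus + qual)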

Clearly, the ideal solution is to fix the problem of apparent missing barcodes shown in the plot above. However, in the meantime we would like to make the best use of the data we have. My inclination, based on the above reasoning, is to remove the two-barcode requirement and re-process the data. I would appreciate any feedback from the ARTIC experts on the points above and if this would be considered acceptable practice.

Many thanks.

Muscle 5.1 breaks the execution of `artic minion`

Because the muscle dependency is not pinned to version 3.8, conda now picks the newer version 5.1.

The 5.1 version has a different command set and your code now raises the following error:

Running: muscle -in 20200311_1427_X1_FAK72834_a3787181_barcode07.muscle.in.fasta -out 20200311_1427_X1_FAK72834_a3787181_barcode07.muscle.out.fasta

Invalid command line
Unknown option in
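
A sketch of one way to work around the change until the dependency is pinned, assuming `muscle -version` prints a parseable version string in both the 3.8 and 5.x releases (the helper name is hypothetical):

import re
import subprocess

def muscle_command(in_fasta: str, out_fasta: str) -> list:
    """Build a muscle command matching the installed version: 3.8 uses
    -in/-out, while 5.x switched to -align/-output."""
    probe = subprocess.run(["muscle", "-version"], capture_output=True, text=True)
    match = re.search(r"(\d+)\.", probe.stdout + probe.stderr)
    major = int(match.group(1)) if match else 3
    if major >= 5:
        return ["muscle", "-align", in_fasta, "-output", out_fasta]
    return ["muscle", "-in", in_fasta, "-out", out_fasta]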

Medaka calls a 5 nt deletion instead of a 6 nt deletion

Hi there,

In our recent Omicron samples (21K aka BA.1) we found a few with a weird number of spike mutations, in particular only one AA substitution; see sample16 here:
[Report snippet screenshot]
(Lineage assignment and mutations by Nextclade.)

So I was looking into the data and tracked it down to a 5 nt deletion (21766-21770) in the spike, which causes a frame shift (S:70-1274).

That's why substitutions after S:A67V are missing and we see deletions S:I68- and S:H69- instead of S:H69-, S:V70-, which are normally found in 21K aka BA.1 (Omicron). At the nucleotide level we see a 6 nt deletion (21765-21770) in 'normal' 21K Omicron samples.

In the genome browser everything looks fine, including the coverage at position 21765, which should be a deletion:
[IGV snapshot of the deletion region, run 382]

The ARTIC pipeline is called as follows (within poreCov):

artic minion --medaka --medaka-model r941_min_hac_g507 --min-depth 20 --normalise 500 --threads 16 --scheme-directory external_primer_schemes

With nanopolish instead of medaka we see the expected 6 nt deletion.

Primer protocol is ARTIC V4.1.


Updates:

  • same behavior for medaka 1.5.0 and medaka 1.4.3 (inside ARTIC)
  • checked medaka model
  • solved with sup (super-accuracy) basecalling and the corresponding medaka model

The question is now: Can we fix that by adapting parameters for medaka?

environment.yml

Hi,
I tried to install the pipeline and didn't find artic in my env.
There seems to be an error in environment.yml:
dependencies:

  • artic-porechop==0.3.2pre

I solved it by separating artic from porechop.
Cheers

remove ebov-flongle/fast5_pass

Please can you remove or exclude the test-data/ebov-flongle/fast5_pass folder? It is nearly 1 GB in size, takes very long to clone during a conda install or update, and sometimes breaks on slow connections.
Unfortunately, it is not possible to shallow-clone a git repository using pip, nor to use the partial-clone filters introduced with git 2.19.

Could not find primer scheme

Hello,

I am using my own scheme of primers, the directory looks like:
fieldbioinformatics/schemes
fieldbioinformatics/schemes/vc07
fieldbioinformatics/schemes/vc07/vc07.insert.bed
fieldbioinformatics/schemes/vc07/vc07.primer.bed
fieldbioinformatics/schemes/vc07/vc07.reference.fasta
fieldbioinformatics/schemes/vc07/vc07.reference.fasta.fai

the primer.bed file looks like:

contig 2905711 2905730 Vc07_LEFT 1 +
contig 2907061 2907080 Vc07_RIGHT 1 -

if I type: artic-tools validate_scheme schemes/vc07/vc07.scheme.bed
I get:
[01:19:09] [artic-tools::validate_scheme] starting primer scheme validator
[01:19:09] [artic-tools::validate_scheme] reading scheme
[01:19:09] [artic-tools::validate_scheme] collecting scheme stats
[01:19:09] [artic-tools::validate_scheme] primer scheme file: schemes/vc07/vc07.scheme.bed
[01:19:09] [artic-tools::validate_scheme] reference sequence: VCAR10162
[01:19:09] [artic-tools::validate_scheme] number of pools: 1
[01:19:09] [artic-tools::validate_scheme] number of primers: 2 (includes 0 alts)
[01:19:09] [artic-tools::validate_scheme] minimum primer size: 19
[01:19:09] [artic-tools::validate_scheme] maximum primer size: 19
[01:19:09] [artic-tools::validate_scheme] number of amplicons: 1
[01:19:09] [artic-tools::validate_scheme] mean amplicon size: 1331
[01:19:09] [artic-tools::validate_scheme] maximum amplicon size: 1331
[01:19:09] [artic-tools::validate_scheme] scheme ref. span: 2905711-2907080
[01:19:09] [artic-tools::validate_scheme] scheme overlaps: 0.0%

but when I run: artic minion --read-file 'barcode19.fastq' --scheme-directory schemes --skip-nanopolish --medaka --medaka-model r941_min_fast_g303 --scheme-version vc07 barcode19

I get:

could not find primer scheme and reference sequence, attempting to download
Running: artic-tools get_scheme vc07 --schemeVersion 1
[01:15:00] [artic-tools::get_scheme] starting primer scheme downloader
[01:15:00] [artic-tools::get_scheme] requested scheme: vc07
[01:15:00] [artic-tools::get_scheme] requested version: 1
[01:15:00] [artic-tools::get_scheme] fetching manifest file
[01:15:00] [artic-tools::get_scheme] ARTIC manifest URL: https://raw.githubusercontent.com/artic-network/primer-schemes/master/schemes_manifest.json
[01:15:00] [artic-tools::get_scheme] ARTIC repository DOI: 10.5281/zenodo.4004423
[01:15:00] [artic-tools::get_scheme] finding primer scheme
[01:15:00] [artic-tools::get_scheme] scheme not found: vc07
[01:15:00] [artic-tools::get_scheme] listing available scheme aliases (case insensitive)
[01:15:00] [artic-tools::get_scheme] - ebola
[01:15:00] [artic-tools::get_scheme] - ebov
[01:15:00] [artic-tools::get_scheme] - zaire
[01:15:00] [artic-tools::get_scheme] - nipah
[01:15:00] [artic-tools::get_scheme] - niv
[01:15:00] [artic-tools::get_scheme] - sars-cov-2
[01:15:00] [artic-tools::get_scheme] - scov2
[01:15:00] [artic-tools::get_scheme] - ncov
[01:15:00] [artic-tools::get_scheme] - ncov-2019
error--> no primer scheme available for vc07

Why is it not finding the primer scheme? I'm running in a conda environment, artic 1.2.1.

Thanks,

Anyi

SNP filtering/masking/soft clipping with medaka

Hi - I have 2 SNP sites in a SARS-CoV-2 BAM that have become N in the resulting consensus sequence, and I was hoping you could help me understand why. I am using the V3 primer set and have read Quick J et al. 2017 and these docs: https://artic.readthedocs.io/en/latest/minion/ . My understanding of this paragraph - 'By softmasking, we refer to the process of adjusting the CIGAR of each alignment segment such that soft clips replace any reference or query consuming operations in regions of an alignment that fall outside of primer boundaries. The leftmost mapping position of alignments are also updated during softmasking.' - is that if the sequence in a read is meant to be excluded from contributing, it will be soft clipped. This fits with what I see in BAMs in IGV. The following two SNP sites do not fit with my understanding of how this is meant to work:

Position 28111, which is found between, but not contained within, V3 primer sites 93_LEFT and 92_RIGHT; 93_LEFT ends at 28105, so it is 6 bp away.

VCFs:

grep 28111 *vcf
2270.fail.vcf:MN908947.3	28111	.	A	G	30.82	PASS	DP=4;AC=0,4;AM=0;MC=0;MF=0.0;MB=0.0;AQ=16.84;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;	GT:GQ:PS:UG:UQ	1/1:9.19:.:1/1:9.19
2270.merged.vcf:MN908947.3	28111	.	A	G	30.82	PASS	DP=4;AC=0,4;AM=0;MC=0;MF=0.000;MB=0.000;AQ=16.84;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;	GT:GQ:PS:UG:UQ	1/1:9.19:.:1/1:9.19
2270.nCoV-2019_1.vcf:MN908947.3	28111	.	A	G	43.555	PASS		GT:GQ	1:44
2270.nCoV-2019_2.vcf:MN908947.3	28111	.	A	G	17.292	PASS		GT:GQ	1:17

depths

grep 28111 *depths
2270.coverage_mask.txt.nCoV-2019_1.depths:MN908947.3	nCoV-2019_1	28111	84
2270.coverage_mask.txt.nCoV-2019_2.depths:MN908947.3	nCoV-2019_2	28111	4

[IGV snapshot, position 28111]

RG1 is pink(ish) and all 84 of its reads look to have been filtered out; I'm not sure why. RG2 (blue) has 4 reads (only 2 show above, as only 2 are from that primer site), and the SNP A>G at 28111 is in the fail VCF due to having DP=4.

The second SNP site I'm interested in is 1596.

VCFs:

grep 1596 *vcf
2270.fail.vcf:MN908947.3	1596	.	A	G	275.91	PASS	DP=19;AC=0,18;AM=1;MC=0;MF=0.0;MB=0.0;AQ=13.78;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;	GT:GQ:PS:UG:UQ	1/1:48.9:.:1/1:48.9
2270.merged.vcf:MN908947.3	1596	.	A	G	275.91	PASS	DP=19;AC=0,18;AM=1;MC=0;MF=0.000;MB=0.000;AQ=13.78;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;	GT:GQ:PS:UG:UQ	1/1:48.90:.:1/1:48.90
2270.nCoV-2019_1.vcf:MN908947.3	1596	.	A	G	44.158	PASS		GT:GQ	1:44
2270.nCoV-2019_2.vcf:MN908947.3	1596	.	A	G	36.182	PASS		GT:GQ	1:36

depths

grep 1596 *depths
2270.coverage_mask.txt.nCoV-2019_1.depths:MN908947.3	nCoV-2019_1	1596	19
2270.coverage_mask.txt.nCoV-2019_2.depths:MN908947.3	nCoV-2019_2	1596	31

[IGV snapshot, position 1596]

This one is at the edge of the soft clipping. All 31 RG2 (blue) reads are filtered out by something, leaving 19 RG1 (pink) reads and thus an N in the consensus sequence.

Is this behavior intended? Can you please tell me why one RG is being filtered out in this region of overlap between primer sites?

Kind Regards,
Liam

artic minion fails with midnight primers

Hello,

I've run into a bug where the artic minion pipeline fails at artic_make_depth_mask in some samples which have reads mapping to near the end of the reference genome.

I did some tracing, and it looks like there is an off-by-one error caused by align_trim creating a bam file which goes beyond the end of the reference genome. So far, I've only seen this in midnight primers, but am unsure if that's the cause of this issue.

To replicate this issue, I've uploaded a sample read, which can be run with:

artic minion --medaka --no-longshot --normalise 1000 --threads 4 --scheme-directory primer_schemes --read-file single_broken.fastq.gz --medaka-model r941_min_high_g360 nCoV-2019/V1200 single_broken
single_broken.fastq.gz

Midnight primer files can be found: here
artic version = 1.2.1
Ubuntu 20.04
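
A small diagnostic sketch for the off-by-one described above: report alignments in the trimmed BAM whose reference_end runs past the reference length recorded in the header, which is what artic_make_depth_mask later stumbles on (a hypothetical pysam helper, not part of the pipeline):

import pysam

def reads_past_reference_end(bam_path: str):
    """Yield (read name, reference_end, reference length) for alignments whose
    end coordinate exceeds the length of the reference they map to."""
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        lengths = dict(zip(bam.references, bam.lengths))
        for segment in bam:
            if segment.is_unmapped:
                continue
            ref_len = lengths[segment.reference_name]
            if segment.reference_end > ref_len:
                yield segment.query_name, segment.reference_end, ref_len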

Edge case where N is called instead of the reference bp

tl;dr: Edge case where an N is called at a position with sufficient pileup coverage for the reference base pair, because nanopolish called a low-quality variant there that is then filtered out in downstream analyses.

We checked our consensus calls against the pileup and noticed 3 positions (7203; 24,891; 24,892) where Ns were present despite sufficient coverage in the pileup.

Example below.

[Screenshot of the pileup example]

In the case above, C should have been called since 99% of the reads matched; however, the nanopolish variant call indicates there was an insertion from C to CT, as indicated by the VCF entry from read group nCoV-2019_2 seen here (to my limited knowledge, read group nCoV-2019_2 corresponds to one primer pool's reads).

Based on the INFO column of that entry, it looks like nanopolish has a completely different view of what is in the pileup. I am sure there is a post-quality-filtering step happening.

In any case, it does seem to be an edge case, since downstream that variant is filtered into fail.vcf and then passed to artic_mask to be masked as an N base pair. Would it not make more sense to check failed variants to see if there is enough evidence to call the reference base pair?

Perhaps the artic_mask command could be modified to check the depth of coverage at every failed reference position to ensure the reference base can be called, indirectly calling non-variants.

What do you think?
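
A rough sketch of the idea, using pysam's count_coverage (the thresholds are illustrative only, and this is not how artic_mask currently works): before masking a failed-variant position with N, check whether the pileup strongly supports the reference base.

import pysam

def reference_well_supported(bam_path, chrom, pos, ref_base,
                             min_depth=20, min_ref_fraction=0.9):
    """Return True if the pileup at a 1-based position has enough depth and
    a high enough fraction of reads matching the reference base to justify
    emitting the reference instead of an N."""
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # count_coverage returns four arrays of per-position counts (A, C, G, T)
        counts = bam.count_coverage(chrom, pos - 1, pos)
    depth = sum(base_counts[0] for base_counts in counts)
    if depth < min_depth:
        return False
    ref_count = counts["ACGT".index(ref_base.upper())][0]
    return ref_count / depth >= min_ref_fraction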

read depth plot bug/feature?

I happened to examine one of the amplicon read depth plots (e.g. sample-barplot.png) that the ARTIC pipeline produces during some troubleshooting, and noticed that the reported read depths appear to be about half of that expected based on the values in the depth files (e.g. sample.coverage_mask.txt.nCoV-2019_1.depths). In looking at the code for the plotting script added back in May:

amplicons = sorted(rgAmplicons[rg])
starts = sorted(rgStarts[rg])

# bin read depths by amplicon for this readgroup
df['amplicon'] = pd.cut(
    x=df['position'], bins=starts, labels=amplicons)

# store the mean of each bin
bins = (df.groupby(['amplicon'])[
        'depth'].mean()).rename(depthFile.name)

...it seems like this is taking the mean of the depths between each start position, but wouldn't this include the adjacent amplicon in the opposite read group (which would be all zeros)? If so, this would explain why the values are about half those expected.

Sorry if I'm missing something obvious here.
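
A tiny self-contained illustration of the suspicion above (not the pipeline's plotting code): if a bin spans both the read group's own amplicon and the adjacent opposite-pool amplicon, the zero-depth positions pull the mean down by roughly half.

import pandas as pd

# eight positions: the first four belong to this read group's amplicon,
# the last four to the opposite pool and so carry zero depth in this file
df = pd.DataFrame({
    "position": range(1, 9),
    "depth":    [100, 100, 100, 100, 0, 0, 0, 0],
})
df["amplicon"] = pd.cut(x=df["position"], bins=[0, 8], labels=["amplicon_1"])

print(df.groupby("amplicon")["depth"].mean())                   # 50.0 (halved)
print(df[df["depth"] > 0].groupby("amplicon")["depth"].mean())  # 100.0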

artic_make_depth_mask now ignoring RG1 coverages

Since the new update to artic_make_depth_mask, the resulting coverage_mask.txt file masks any positions covered by read group 1 amplicons, even if they have sufficient coverage. Only regions covered by read group 2 amplicons are unmasked. When the script is run without the "--store-rg-depths" option, the resulting coverage_mask.txt file is correct.

More precise filtering of heterozygous variants

Hi,
as described in pull request #58, I found that more precise filtering of heterozygous variants is desirable for the artic minion --medaka pipeline.

Here is one of many examples I found in my samples of significant heterozygous variants that were filtered out by artic_vcf_filter --longshot. With the proposed patch such variants are left in the *.pass.vcf.gz files, while at the same time homopolymer false positives are still put in *.fail.vcf (all of them, across my more than a dozen samples).

Good heterozygous variant filtered out with the current pipeline, retained with the patch:
MN908947.3 24872 . G T 500.0 PASS DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:147.24:.:0/1:147.24

False positive homopolymer filtered out with current and patched pipelines:
MN908947.3 10527 . C CT 96.06 PASS DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:96.06:.:0/1:96.06

Do you think this more precise filtering could be the default? Maybe with different default filtering parameters, although I found the ones in the patch (0.5 minimum fraction of alternate reads, a minimum of 12 reads to retain) to be OK.
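
For reference, a sketch of the proposed filter (assuming longshot's INFO field encodes per-allele read counts as AC=<ref>,<alt>, as in the records quoted above; the thresholds follow the values mentioned in the issue and are not pipeline defaults):

import re

def keep_het_variant(info_field: str,
                     min_alt_fraction: float = 0.5,
                     min_alt_reads: int = 12) -> bool:
    """Keep a heterozygous call only if the alternate allele is supported by
    enough reads and a large enough fraction of the informative reads."""
    match = re.search(r"AC=(\d+),(\d+)", info_field)
    if not match:
        return False
    ref_count, alt_count = int(match.group(1)), int(match.group(2))
    informative = ref_count + alt_count
    if informative == 0 or alt_count < min_alt_reads:
        return False
    return alt_count / informative >= min_alt_fraction

# The retained example above (AC=120,227) gives an alt fraction of ~0.65 -> kept;
# the homopolymer example (AC=130,59) gives ~0.31 -> still filtered out.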

artic_get_stats fails because V4 primer names in align_trim report don't match names in .bed file

I'm running a pipeline which uses artic minion for consensus generation and then artic_get_stats for quality control. When trying to run this with the V4 primers, artic_get_stats fails with the following error:

amplicon in align_trim report but not in primer scheme SARSCoV2120_1_LEFT_SARSCoV2120_1_RIGHT

I used the V4 primer scheme in the primer-schemes repository, where the primers are named "SARS-CoV-2_1_LEFT" and "SARS-CoV-2_1_RIGHT" respectively; the other names are also affected. I suspect the error either gets introduced in parsing the bed file in read_bed_file in vcftagprimersites.py or in outputting the report in align_trim.py.
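
A small sketch of the kind of name handling that would avoid the mismatch, assuming amplicon identifiers are meant to be the primer name minus only the trailing _LEFT/_RIGHT (and an optional _alt suffix, as used in the ARTIC schemes); this is hypothetical, not the current read_bed_file/align_trim code:

import re

def amplicon_prefix(primer_name: str) -> str:
    """Strip only the trailing _LEFT/_RIGHT (and an optional _alt suffix), so
    underscores that are part of the scheme name itself survive intact."""
    return re.sub(r"_(LEFT|RIGHT)(_alt\d*)?$", "", primer_name)

assert amplicon_prefix("SARS-CoV-2_1_LEFT") == "SARS-CoV-2_1"
assert amplicon_prefix("nCoV-2019_7_LEFT_alt0") == "nCoV-2019_7"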

Variant in amplicon overlapping region

I have encountered a mutation that consistently arises in an amplicon overlap region. However, one amplicon has coverage exceeding 300x whereas the other has coverage <20 throughout my dataset. In the vcfreport.txt file the mutation is recorded as ‘located within an amplicon overlap region; nothing seen at position yet, holding var’, and then when the next variant is located in an overlapping region the message explains that the new var is being held and the old var is being dropped. This variant then does not pass checks despite having over 300x coverage and appearing present in the BAM file. Is there a way to explain this, or to mitigate the low coverage in one amplicon given the ample coverage in the other?

I also found that where the variant passes checks in other samples, the vcfreport.txt reports ‘multiple copies of var found at pos X in overlap region, keeping all copies’. What is the threshold that needs to be met to keep the variant?

Thanks for your help!

Multiqc issue

My colleague is using an older version of this pipeline and gets barplots; however, I see you've removed this feature and moved to MultiQC. When I run MultiQC I am not getting any files. How do I run MultiQC, or get the barplots used in the old version? Thanks!

artic minion Python 3.6 error

Hi sir,

I have installed the Miniconda Python 3.6 version, conda version 4.8.3. I got the following error; could you please help resolve this issue?

Thanks

artic minion --threads 8 --scheme-directory ~/artic-ncov2019/primer_schemes --read-file sample01.fastq --sequencing-summary /data/COVID_SET1/COVID-19/merged_fastq_files/ artic-ncov2019/primer_schemes/ nCoV-2019/V3 sample01

Traceback (most recent call last):
File "/home/grid/miniconda3/envs/artic-ncov2019/bin/artic", line 10, in
sys.exit(main())
File "/home/grid/miniconda3/envs/artic-ncov2019/lib/python3.6/site-packages/artic/pipeline.py", line 216, in main
args.func(parser, args)
File "/home/grid/miniconda3/envs/artic-ncov2019/lib/python3.6/site-packages/artic/pipeline.py", line 35, in run_subtool
submodule.run(parser, args)
File "/home/grid/miniconda3/envs/artic-ncov2019/lib/python3.6/site-packages/artic/minion.py", line 23, in run
scheme_name, scheme_version = args.scheme.split('/')
ValueError: too many values to unpack (expected 2)
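
The traceback points at `scheme_name, scheme_version = args.scheme.split('/')` receiving a scheme string with a different number of '/'-separated parts than expected. A sketch of a more forgiving parse that would at least give a clear message (hypothetical, not the current minion.py code):

def split_scheme(scheme: str):
    """Accept either 'name' or 'name/version' and fail with a clear message
    otherwise, instead of an unexplained ValueError."""
    parts = [p for p in scheme.strip("/").split("/") if p]
    if len(parts) == 1:
        return parts[0], "1"   # assume version 1 when none is given
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected scheme as 'name' or 'name/version', got: {scheme!r}")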

SARS-CoV-2 workflow comparison - kindly check if your work is represented correctly

Hello ARTIC Team,

I am from the University Hospital Essen, Germany, and we work extensively with SARS-CoV-2 in our research. We have also developed a SARS-CoV-2 workflow. In preparation for the publication of our workflow, we have looked at several other SARS-CoV-2 related workflows, including your work. We will present this review in the publication and want to ensure that your work is represented as accurately as possible.

Moreover, there is currently little to no overview of SARS-CoV-2 related workflows. Therefore, we have decided to make the above comparison publicly available via this GitHub repository. It contains a table with an overview of the functions of different SARS-CoV-2 workflows and the tools used to implement these functions.

We would like to give you the opportunity to correct any misunderstandings on our side. Please take a moment to make sure we are not misrepresenting your work or leaving out important parts of it by taking a look at this overview table. If you feel that something is missing or misrepresented, please feel free to give us feedback by contributing directly to the repository.

Thank you very much!

cc @alethomas

Tmp file issue while using medaka instead of longshot

I have been running a bunch of viruses in parallel using --no-longshot from the same directory. The medaka step appears to briefly use a temp file that is not uniquely named (tmp.medaka-annotate.vcf), and if several jobs reach that point at the same time, mayhem ensues. It works fine if run individually.

Point of failure:

Running: medaka tools annotate --pad 25 --RG nCoV-2019_2 barcode22/EDB14609.nCoV-2019_2.vcf primer_schemes/nCoV-2019/V3/nCoV-2019.reference.fasta barcode22/EDB14609.trimmed.rg.sorted.bam tmp.medaka-annotate.vcf
Running: mv tmp.medaka-annotate.vcf barcode22/EDB14609.nCoV-2019_2.vcf
Command failed:mv tmp.medaka-annotate.vcf barcode22/EDB14609.nCoV-2019_2.vcf

This seems suboptimal :)
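
A minimal sketch of one fix, using Python's tempfile module to give each invocation its own annotate output path (the names are illustrative; this is not the pipeline's current behaviour):

import os
import tempfile

def unique_annotate_vcf(directory: str = ".") -> str:
    """Create a uniquely named temporary VCF path so that parallel
    'medaka tools annotate' runs in the same directory cannot collide."""
    fd, path = tempfile.mkstemp(prefix="medaka-annotate.", suffix=".vcf",
                                dir=directory)
    os.close(fd)  # the downstream tool will rewrite the file itself
    return path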

Please port to Python3

Hello,
The Debian Med team is packaging fieldbioinformatics for official Debian. The recently released Debian 10 was the last Debian release featuring Python2 since this programming language is EOL. If you are interested that we package and maintain fieldbioinformatics in official Debian (and that users of other modern distributions will have no problems to install fieldbioinformatics on their systems) I'd recommend you port your code to Python3. The 2to3 tool might be of great help here.
Kind regards,
Mali

Question about --model in medaka pipeline

Hello,

I can see in the master branch, though not in any release yet, that --model has been added to the medaka command.
I imagine this is from using the latest medaka versions.

In your experience is there much difference in output quality? Have you done any testing in this area?

Thanks for your time.
James

Issue assigning variants occurring within an amplicon overlap region

Running into an issue with variant assignment in Omicron datasets: the ARTIC pipeline reverts variants to the reference despite sufficient read support for the variants in the BAM file.
The variants are occurring within amplicon overlap regions.
Looking at the VCF report I see the following:

[12:27:11] [artic-tools::check_vcf] variant at pos 23599: T->G
[12:27:11] [artic-tools::check_vcf] located within an amplicon overlap region
[12:27:11] [artic-tools::check_vcf] var pos does not match with that of previously identified overlap var, holding new var (and dropping held var at 23013)
[12:27:11] [artic-tools::check_vcf] variant at pos 23604: C->A
[12:27:11] [artic-tools::check_vcf] located within an amplicon overlap region
[12:27:11] [artic-tools::check_vcf] var pos does not match with that of previously identified overlap var, holding new var (and dropping held var at 23599)

Wondering if this is a known issue and how best to address this.

Longshot issue with test-runner.sh

Hello,

Describe the bug
I have installed artic v1.1.3 with Python 3.7.4. I tested my installation with the bash script ./test-runner.sh medaka. The artic gather, demultiplex and guppyplex pipelines pass the test, but minion fails at the longshot step.

Would be thankful for any advice. Thank you!
Baptiste

Logging
Running: artic_vcf_merge ebov-mayinga ../test-data/primer-schemes/IturiEBOV/V1/IturiEBOV.scheme.bed Ebov-DRC_1:ebov-mayinga.Ebov-DRC_1.vcf Ebov-DRC_2:ebov-mayinga.Ebov-DRC_2.vcf
Found primer binding site mismatch: {}
Running: bgzip -f ebov-mayinga.merged.vcf
Running: tabix -p vcf ebov-mayinga.merged.vcf.gz
Running: longshot -P 0 -F -A --no_haps --bam ebov-mayinga.primertrimmed.rg.sorted.bam --ref ../test-data/primer-schemes/IturiEBOV/V1/IturiEBOV.reference.fasta --out ebov-mayinga.longshot.vcf --potential_variants ebov-mayinga.merged.vcf.gz

2020-11-02 15:56:26 Automatically determining max read coverage.
2020-11-02 15:56:26 Estimating mean read coverage...
2020-11-02 15:56:27 Total reference positions: 18953
2020-11-02 15:56:27 Total bases in bam: 5999445
2020-11-02 15:56:27 Mean read coverage: 316.54
2020-11-02 15:56:27 Min read coverage set to 6.
2020-11-02 15:56:27 Max read coverage set to 405.
2020-11-02 15:56:27 Estimating alignment parameters...
2020-11-02 15:56:27 Done estimating alignment parameters.

				Transition Probabilities:
				match -> match:          0.950
				match -> insertion:      0.017
				match -> deletion:       0.033
				deletion -> match:       0.657
				deletion -> deletion:    0.343
				insertion -> match:      0.699
				insertion -> insertion:  0.301

				Emission Probabilities:
				match (equal):           0.943
				match (not equal):       0.019
				insertion:               1.000
				deletion:                1.000

2020-11-02 15:56:27 Reading potential variants from input VCF...
WARNING: Potential variant VCF contains contig b'BTB20484' not found in BAM contigs.
error: Error reading potential variants VCF file.
caused by: Error accessing tid from chrom2tid data structure
Command failed:longshot -P 0 -F -A --no_haps --bam ebov-mayinga.primertrimmed.rg.sorted.bam --ref ../test-data/primer-schemes/IturiEBOV/V1/IturiEBOV.reference.fasta --out ebov-mayinga.longshot.vcf --potential_variants ebov-mayinga.merged.vcf.gz

Tests fail on M1 Mac

The medaka command fails with 'zsh: illegal hardware instruction'. It seems to be a known issue importing TensorFlow in Python?

Installation with conda

Hi guys,

I am trying to get artic reinstalled on our server after having to reinstall miniconda3 (v.3.7).

I have tried to create the env from the environment.yml:

conda env create -f environment.yml

However, it has been stuck on this for hours...

I then thought I would just install with conda as suggested in the README.md, however, I am getting the below error.

Any advice on how to get back up and running again?!

conda install -c bioconda artic
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  • feature:/linux-64::__glibc==2.31=0
  • feature:|@/linux-64::__glibc==2.31=0

Your installed version is: 2.31
Thank you :)

Command failed:align_trim

Hi, I'm trying to run the artic pipeline on a MinION dataset, R9.4.1.
The earlier versions of this pipeline worked (https://github.com/artic-network/artic-ncov2019); I updated to this repository (version 1.2.1),
and the "align_trim" step

Command failed:align_trim --normalise 200 /Compute/CovidSeq/minION/artic-ncov2019/primer_schemes/nCoV-2019/V3/nCoV-2019.scheme.bed --start --remove-incorrect-pairs --report barcode01.alignreport.txt < barcode01.sorted.bam 2> barcode01.alignreport.er | samtools sort -T barcode01 - -o barcode01.trimmed.rg.sorted.bam

started giving this error:

samtools sort: failed to read header from "-"

cat barcode01.alignreport.er

Traceback (most recent call last):
  File "/staging/appdir/anaconda3/bin/align_trim", line 33, in <module>
    sys.exit(load_entry_point('artic==1.2.1', 'console_scripts', 'align_trim')())
  File "/staging/appdir/anaconda3/lib/python3.8/site-packages/artic-1.2.1-py3.8.egg/artic/align_trim.py", line 296, in main
    go(args)
  File "/staging/appdir/anaconda3/lib/python3.8/site-packages/artic-1.2.1-py3.8.egg/artic/align_trim.py", line 169, in go
    bam_header = infile.header.copy().to_dict()
AttributeError: 'dict' object has no attribute 'to_dict'
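
The traceback suggests a pysam version mismatch: in newer pysam, infile.header is an AlignmentHeader object with .to_dict(), while older releases expose a plain dict, so .copy() yields a dict with no to_dict(). A version-tolerant sketch (this is an assumption about the cause, not an official fix):

def header_as_dict(infile):
    """Return the BAM header as a dict regardless of whether pysam exposes
    it as an AlignmentHeader (with .to_dict()) or as a plain dict."""
    header = infile.header
    return header.to_dict() if hasattr(header, "to_dict") else dict(header)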

artic-tools check_vcf Segmentation fault on empty VCF files

Issue:

artic-tools check_vcf produces a segmentation fault when VCF files have no mutations (e.g. negative control samples).

Version:

$ artic --version
artic 1.3.0

Steps to reproduce:

Ran artic-tools check_vcf --summaryOut summary.out sample.merged.vcf.gz SARS-CoV-2.scheme.bed
sample.merged.vcf.gz has a VCF header but no mutations in the file

##fileformat=VCFv4.2
##nanopolish_window=MN908947.3:1-29902
##INFO=<ID=TotalReads,Number=1,Type=Integer,Description="The number of event-space reads used to call the variant">
##INFO=<ID=SupportFraction,Number=1,Type=Float,Description="The fraction of event-space reads that support the variant">
##INFO=<ID=SupportFractionByStrand,Number=2,Type=Float,Description="Fraction of event-space reads that support the variant for each strand">
##INFO=<ID=BaseCalledReadsWithVariant,Number=1,Type=Integer,Description="The number of base-space reads that support the variant">
##INFO=<ID=BaseCalledFraction,Number=1,Type=Float,Description="The fraction of base-space reads that support the variant">
##INFO=<ID=AlleleCount,Number=1,Type=Integer,Description="The inferred number of copies of the allele">
##INFO=<ID=StrandSupport,Number=4,Type=Integer,Description="Number of reads supporting the REF and ALT allele, by strand">
##INFO=<ID=StrandFisherTest,Number=1,Type=Integer,Description="Strand bias fisher test">
##INFO=<ID=SOR,Number=1,Type=Float,Description="StrandOddsRatio test from GATK">
##INFO=<ID=RefContext,Number=1,Type=String,Description="The reference sequence context surrounding the variant call">
##INFO=<ID=Pool,Number=1,Type=String,Description="The pool name">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample

Log output:

[11:51:06] [artic-tools::check_vcf] starting VCF checker
[11:51:06] [artic-tools::check_vcf] reading scheme
[11:51:06] [artic-tools::check_vcf] collecting scheme stats
[11:51:06] [artic-tools::check_vcf] 	primer scheme file:	SARS-CoV-2.scheme.bed
[11:51:06] [artic-tools::check_vcf] 	reference sequence:	MN908947.3
[11:51:06] [artic-tools::check_vcf] 	number of pools:	2
[11:51:06] [artic-tools::check_vcf] 	number of primers:	198 (includes 0 alts)
[11:51:06] [artic-tools::check_vcf] 	minimum primer size:	20
[11:51:06] [artic-tools::check_vcf] 	maximum primer size:	31
[11:51:06] [artic-tools::check_vcf] 	number of amplicons:	99
[11:51:06] [artic-tools::check_vcf] 	mean amplicon size:	356
[11:51:06] [artic-tools::check_vcf] 	maximum amplicon size:	373
[11:51:06] [artic-tools::check_vcf] 	scheme ref. span:	25-29854
[11:51:06] [artic-tools::check_vcf] 	scheme overlaps:	18.39485%
[11:51:06] [artic-tools::check_vcf] setting parameters
[11:51:06] [artic-tools::check_vcf] 	output report: /tmp/summary.out
[11:51:06] [artic-tools::check_vcf] 	filtering variants: false
[11:51:06] [artic-tools::check_vcf] 	minimum quality threshold: 10.0
[11:51:06] [artic-tools::check_vcf] reading VCF file
Segmentation fault
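
Until the segfault is fixed in artic-tools, one defensive option is to skip check_vcf for header-only files. A sketch with pysam (a hypothetical helper, not part of the pipeline):

import pysam

def vcf_has_records(vcf_path: str) -> bool:
    """Return True if the (possibly bgzipped) VCF contains at least one
    variant record beyond the header."""
    with pysam.VariantFile(vcf_path) as vcf:
        return next(iter(vcf), None) is not None

# e.g. only invoke artic-tools check_vcf when vcf_has_records(...) is True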

How to deal with overlapping mutations in pass.vcf and fail.vcf?

Hello!

This is somewhat related to #53

In my fail.vcf I have:

MN908947.3	694	.	T	A	54.15	PASS	DP=2813;AC=40,25;AM=2748;MC=0;MF=0.0;MB=0.0;AQ=4.55;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;	GT:GQ:PS:UG:UQ	0/1:54.15:.:0/1:54.15

And the pass.vcf I have:

MN908947.3      685     .       AAAGTCATTT      A       500.0   PASS    DP=2812;AC=2,2810;AM=0;MC=0;MF=0.0;MB=0.0;AQ=35.89;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ  1/1:500.0:.:1/1:500.0
MN908947.3      691     .       AT      A       500.0   PASS    DP=2813;AC=0,2489;AM=324;MC=0;MF=0.0;MB=0.0;AQ=10.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;       GT:GQ:PS:UG:UQ  1/1:500.0:.:1/1:500.0

When artic_mask is run, AAAGTCATTT becomes AAAGTCATTN in the preconsensus.fasta, and as a consequence bcftools complains:

Note: the --sample option not given, applying all records regardless of the genotype
The fasta sequence does not match the REF allele at MN908947.3:685:
   REF .vcf: [AAAGTCATTT]
   ALT .vcf: [A]
   REF .fa : [AAAGTCATTN]GACTTAG.....

What would be your recommended procedure in these very rare cases? Ignore the fail.vcf, mask the variant in the pass.vcf?
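
One way to spot these rare cases automatically before deciding how to handle them: list fail.vcf records whose position falls inside the REF span of a passing record (as 694 falls inside the 685 AAAGTCATTT>A deletion above). A pysam sketch, offered as an illustration rather than a recommendation from the authors:

import pysam

def fails_inside_passing_spans(pass_vcf: str, fail_vcf: str):
    """Yield (chrom, pos) of fail.vcf records that sit within the reference
    span of a PASS record, i.e. positions that artic_mask would turn into N
    inside an allele that bcftools consensus is about to apply."""
    spans = [(rec.chrom, rec.pos, rec.pos + len(rec.ref) - 1)
             for rec in pysam.VariantFile(pass_vcf)]
    for rec in pysam.VariantFile(fail_vcf):
        if any(rec.chrom == chrom and start <= rec.pos <= end
               for chrom, start, end in spans):
            yield rec.chrom, rec.pos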

Deletion no longer being called

Hello,
We run positive controls from previous specimens as part of our quality control, using the medaka pipeline. On the current version of Artic, a deletion at 28271 (Alpha, SARS-CoV-2) that was called on multiple previous samples now appears to be filtered out. This is a 1 nt deletion that appears real in the BAM files and is present in the pass VCF, but is not present in the final merged VCF, so it looks as if it is actively being filtered out. Depth in the region is good. Is this supposed to be filtered out now?
Thank you!
Paul

Longshot VCF causing bcftools to crash

Hi,

Since the introduction of Longshot to the artic minion pipeline, I am seeing that some genomes won't assemble. It seems the VCF from Longshot doesn't match the reference (preconsensus.fasta) passed to bcftools consensus, which causes an error resulting in an empty consensus.fasta file.

EPI2ME ARTIC failing for Midnight primers

I recently did a Nanopore sequencing run for SARS-CoV-2 on a MinION with Midnight primers (1200 bp amplicons, Nikki Freed protocol) and the native barcoding kit (EXP-NBD96). I have previously used the EPI2ME pipeline for data analysis with ARTIC V3 primers. This time the ARTIC analysis is failing for the Midnight primers. I posted my queries on the community forum, where one of the users says I should use the recommended wf-ARTIC bioinformatics pipeline with rapid barcodes. Can someone please support me with the data analysis? Which bioinformatics pipeline should I use here, and how do I fix this problem?

Thanks in advance!

Quality filter for pass.vcf

Hello,
I am using the artic pipeline, which filters the variants into pass and fail VCFs.
Can anyone tell me what filters are applied to merged.vcf, and what the cutoffs are for variant quality and depth?
Please let me know as soon as possible.

Thanks in advance.

Not seeing default argument values in pipeline.py

Hello,

I see that there is some code in place using ArgumentParserWithDefaults to include the specified argument defaults in the help string, but it's not picking them up for me. As a result, all the default values are hidden when you use artic <subcommand> -h. This tripped me up because I thought that simply removing --normalise <N> from artic minion would drop the normalisation step (instead it defaulted to --normalise 100). I'm not sure where in the argparser it's going wrong, but it'd be great if someone with a sharper eye than me could take a look.

Thanks,
John
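
For comparison, a minimal standalone argparse example showing defaults rendered in the help text via ArgumentDefaultsHelpFormatter (this illustrates plain argparse behaviour, not how ArgumentParserWithDefaults is implemented in pipeline.py):

import argparse

parser = argparse.ArgumentParser(
    formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--normalise", type=int, default=100,
                    help="cap on read alignments per amplicon")

# the help text now includes "(default: 100)" after the option description
print(parser.format_help())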

Artic-network for Aarch64 system

Does artic-network support aarch64 systems (like the Jetson family)? If yes, how can I install the artic pipeline on an aarch64 system?

Best
Han

Release 1.3.0

An issue to track progress on the next minor release. This is in response to the CLIMB/ARTIC workshop where there were some usability issues and improvement suggestions. Features to target for the next minor release (1.3.0):

QC report:

  • parameterise the coverage cutoff on the amplicon plot using the --depth flag provided to the artic_make_depth_mask. This is currently set to 50 on the plot but actually 20 is the minimum read depth used in the depth mask.
  • add more meaningful VCF fail output - include vars from the PASS and FAIL files
  • add some coverage stats from the alignment file (trimmed / primertrimmed)

Software:

  • update to the new medaka release on PyPi - test it and wait for conda recipe before releasing 1.3.0
  • optional - check the output of latest medaka vs longshot and evaluate if longshot can be replaced

Minion:

  • parameterise minimum read depth for masking, this will feed into the amplicon plotting
  • add the QC report to the core pipeline (remove requirement of --strict)
  • enforce full length amplicon alignment during align_trim when pipeline run with --strict

VCF report:

  • update the report to be more intuitive and informative (issue for artic-tools)

README/Tests:

  • looks like Travis is retiring/limiting free service plan? Try GitHub actions out instead
  • add the DOI to the readme, which has been minted since 1.2.0

V1200/midnight primer scheme

Hi guys,

We've been testing out the Midnight primers in our lab, which look good when running the EPI2ME analysis; however, I am now at the command line and it says it does not recognise the V1200 scheme... I have added it to the directory with the other schemes locally. Is there a reason why these aren't included in the pipeline? Should I be using a different pipeline?

Thanks :)

Compatibility between new MinKNOW and Artic pipeline

Hello,

With the release of the new MinKNOW, basecalling models for Guppy 4.0 were made the default.

Since the Medaka software used with the Artic pipeline seems to be based on older versions of Guppy,
we were wondering whether it is fine to use Fastq files from the new MinKNOW (HAC in real-time GPU) with your pipeline?

Artic_vcf_merge failed

Hi, I am getting the following error when I run the minion command:

Command failed:artic_vcf_merge CERI-KRISP-K032245_ONT /home/idowu/Downloads/Bioinfo/fieldbioinformatics/test-data/primer-schemes/nCoV-2019/V3/nCoV-2019.scheme.bed 2> CERI-KRISP-K032245_ONT.primersitereport.txt nCoV-2019_2:CERI-KRISP-K032245_ONT.nCoV-2019_2.vcf nCoV-2019_1:CERI-KRISP-K032245_ONT.nCoV-2019_1.vcf

The command I entered was:

artic minion --threads 10 --normalise 200 --skip-nanopolish --scheme-directory ~/Downloads/Bioinfo/fieldbioinformatics/test-data/primer-schemes --strict --read-file ../CERI-KRISP-K032245_ONT.fastq.gz nCoV-2019/V3 CERI-KRISP-K032245_ONT

I don't know what seems to be wrong here

Question: read group assignment in align_trim

Hi,

I am just learning how to process Nanopore sequencing data. I am starting from a FASTQ file and trying to understand the artic pipeline.

Could you please explain to me how align_trim assigns the read groups?

Thanks

Question regarding --normalise

Hello,
Can you provide more detail as to how the normalisation option in artic minion works? In the documentation, you state "normalise/reduce the number of read alignments to each amplicon." Does this mean that if I use --normalise 200, once an amplicon reaches 200 reads the rest will be discarded? Does anything happen to amplicons with fewer than 200 reads?

Thanks for the clarification,
Candice
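
While waiting for an authoritative answer, here is a conceptual sketch of what capping reads per amplicon generally means; it is an illustration of the idea behind --normalise, not the align_trim implementation (whose bookkeeping differs, e.g. it works on alignments and pools rather than a simple list):

from collections import defaultdict

def normalise(read_assignments, cap=200):
    """Keep at most `cap` reads per amplicon and discard the excess;
    amplicons with fewer than `cap` reads are left untouched.
    `read_assignments` is an iterable of (read_id, amplicon_id) pairs."""
    counts = defaultdict(int)
    kept = []
    for read_id, amplicon_id in read_assignments:
        if counts[amplicon_id] < cap:
            counts[amplicon_id] += 1
            kept.append(read_id)
    return kept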
