Comments (8)
Thanks for the report. First, could you check that you are using the latest nanopolish? Indel behaviour has been adjusted slightly in recent releases.
So from what I can see, nanopolish is reporting:
- BaseCalledReadsWithVariant=64
- BaseCalledFraction=0.160804
- SupportFraction=0.59658
This means that 16% of the basecalled reads support an insertion at this position, but the nanopolish model believes that actually 60% of reads support an insertion.
So it could well be that there is an insertion at this position (at some frequency).
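As a rough sanity check on those three values, here is a minimal Python sketch that parses them from an INFO-style string and derives the read depth they imply. It assumes that BaseCalledFraction is simply BaseCalledReadsWithVariant divided by depth, which this thread does not confirm; `parse_info` is a hypothetical helper, not part of nanopolish or artic.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'A=1;B=2' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# INFO values as quoted in this thread; the rest of the VCF record is omitted.
info = (
    "BaseCalledReadsWithVariant=64;"
    "BaseCalledFraction=0.160804;"
    "SupportFraction=0.59658"
)
fields = parse_info(info)

reads_with_variant = int(fields["BaseCalledReadsWithVariant"])
basecalled_fraction = float(fields["BaseCalledFraction"])
support_fraction = float(fields["SupportFraction"])

# Assumption: BaseCalledFraction = BaseCalledReadsWithVariant / depth.
implied_depth = round(reads_with_variant / basecalled_fraction)

# Gap between the signal-level model's support and the basecalls' support.
gap = support_fraction - basecalled_fraction

print(f"implied depth: {implied_depth}")
print(f"model minus basecalls: {gap:.3f}")
```

Under that assumption the values imply about 398 reads at the site, and the model's support fraction exceeds the basecalled fraction by roughly 0.44, which is the uncertainty being discussed here.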
So I think the behaviour to mask that region with Ns is intended, particularly as there is uncertainty there.
You can turn off indel calling with the --no-indels flag to artic minion if you desire.
@jts may have an opinion on this too.
from fieldbioinformatics.
Hi @Takadonet,
The QUAL score in the VCF record is very low relative to the number of reads at the position (133.0, 364 reads) so this variant fails our filters. I think this particular variant is a calling artifact and filtering it is the correct behaviour.
We consider positions that fail our filters as being unreliable to make a definitive call at, so we mask them with an N so they don't confuse phylogenetic analysis.
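The idea of judging QUAL relative to depth and then masking failed positions can be sketched in Python as follows. This is an illustration only: the function names and the threshold of 3.0 QUAL units per read are assumptions made for the example, not the pipeline's confirmed filter.

```python
def passes_qual_filter(qual, depth, min_qual_per_read=3.0):
    """Hypothetical check: require a minimum QUAL per read at the site.

    min_qual_per_read is an illustrative assumption, not a value
    confirmed by this thread.
    """
    if depth <= 0:
        return False
    return qual / depth >= min_qual_per_read


def mask_positions(sequence, failed_positions):
    """Replace the 0-based positions that failed filtering with 'N'."""
    bases = list(sequence)
    for pos in failed_positions:
        if 0 <= pos < len(bases):
            bases[pos] = "N"
    return "".join(bases)


# Values quoted above: QUAL 133.0 at a position covered by 364 reads.
qual, depth = 133.0, 364
print(f"QUAL per read: {qual / depth:.3f}")
print("passes:", passes_qual_filter(qual, depth))

# Toy consensus with two failed positions, purely for illustration.
print(mask_positions("ACGTACGT", [2, 3]))
```

With the quoted values the ratio is about 0.37 QUAL units per read, so the variant fails any per-read threshold above that, and the affected position would be written as N rather than as either allele.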
Thanks for pointing this variant out. I recently tweaked the indel calling behaviour, so I appreciate the feedback. I'll keep an eye on these spurious insertions.
Jared
Thanks for the quick reply!
We are currently on version 0.13.0 for nanopolish (latest env.yml from artic-ncov2019) and see that you just updated a few days ago to 0.13.1 in this repo.
We will make a new conda env and report back the results at these positions.
Since there is such a large disparity between what an end user can see in IGV/Tablet and what nanopolish is calling, is there another method to review the bases being called?
Is there any way to address their concerns about the differences? Since there are over 10k samples in GISAID now, there is no point rushing to be first; we are more concerned with submitting quality, reviewed consensus sequences.
You could try the medaka pipeline and compare with that?
I am confused about the difference in basecalled frequency (16%) and the IGV view which suggests only 2% (8 insertions at depth 395) though. Would you be able to share the BAM file with me (perhaps privately?).
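One way to cross-check the viewer's figure (8 insertions at depth 395 is about 2%) without relying on a pileup display is to count insertions directly from each read's alignment start and CIGAR string. The Python sketch below does this in plain Python; the function names and the example reads are hypothetical, and in practice the start positions and CIGARs would come from the BAM file.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def scan_cigar(ref_start, cigar):
    """Return (insertion_sites, ref_end) for one alignment.

    insertion_sites holds 0-based reference coordinates; an insertion
    recorded at site p lies immediately before reference base p.
    ref_end is the 0-based coordinate one past the last aligned base.
    """
    sites = []
    ref_pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        if op == "I":
            sites.append(ref_pos)
        elif op in "MDN=X":
            ref_pos += int(length)
        # S, H and P do not consume reference bases.
    return sites, ref_pos


def insertion_fraction(reads, pos):
    """Fraction of reads covering pos that carry an insertion at pos."""
    covering = 0
    with_insertion = 0
    for ref_start, cigar in reads:
        sites, ref_end = scan_cigar(ref_start, cigar)
        if ref_start <= pos < ref_end:
            covering += 1
            if pos in sites:
                with_insertion += 1
    return with_insertion / covering if covering else 0.0


# Hypothetical alignments as (0-based start, CIGAR) pairs.
reads = [
    (100, "10M2I10M"),  # insertion before reference base 110
    (100, "20M"),       # covers 110, no insertion
    (105, "5M1I15M"),   # insertion before reference base 110
    (90, "5M"),         # does not cover 110
]
print(insertion_fraction(reads, 110))
```

This counts only insertions placed at exactly the queried coordinate. Aligners can place the same insertion a base or two away in homopolymers, so a looser window may be needed before comparing against the basecalled fraction.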
I will try out medaka as well and see how they differ.
I am confused about the differences as well, which is why I am asking for help. I am just asking for permission before sharing the BAM file with you privately.
Sent the BAM file link to your email, @nickloman.
Thanks. From what I can see in the alignment (judging from Tablet), there is some support for that variant, but not enough for it to be called. I agree you could confidently call reference there. An easy way to do that is to add '--no-indels' to artic minion (but be cautious: you won't get any indels with that). I might add another optional argument to the pipeline to let it drop those low-confidence variants instead of masking them.
Thanks for taking a look. Using medaka beta addressed this particular issue for these strains but failed for others. We are investigating what is going on.
For the moment we are just going to fix them 'by hand', since there are only 3 SNVs across a dozen strains, but in future we will use the optional argument when it is available.