tdlr : Edge case where N being called in position where sufficient co

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Sent bam file link to your email <a class="user-mention notranslate" data-hovercard-ty

Edge where N being called instead of reference bp about fieldbioinformatics HOT 8 OPEN

Takadonet commented on August 16, 2024

Edge where N being called instead of reference bp

from fieldbioinformatics.

Comments (8)

nickloman commented on August 16, 2024

Thanks for the report. Firstly please can you check you are using latest nanopolish, as indel behaviour has been slightly adjusted in recent times.

So from what I can see, nanopolish is reporting:

BaseCalledReadsWithVariant=64
BaseCalledFraction=0.160804
SupportFraction=0.59658

This means that 16% of the basecalled reads support an insertion at this position, but the nanopolish model believes that actually 60% of reads support an insertion.

So it could well be that there is an insertion at this position (at some frequency).

So I think the behaviour to mask that region with Ns is intended, particularly as there is uncertainty there.

You can turn off indel calling with the --no-indels flag to artic minion if you desire.

@jts may have an opinion on this too.

from fieldbioinformatics.

jts commented on August 16, 2024

Hi @Takadonet ,

The QUAL score in the VCF record is very low relative to the number of reads at the position (133.0, 364 reads) so this variant fails our filters. I think this particular variant is a calling artifact and filtering it is the correct behaviour.

We consider positions that fail our filters as being unreliable to make a definitive call at, so we mask them with an N so they don't confuse phylogenetic analysis.

Thanks for pointing this variant out, I recently tweaked the indel calling behaviour so I appreciate the feedback. I'll keep on an eye these spurious insertions.

Jared

from fieldbioinformatics.

Takadonet commented on August 16, 2024

Thanks for quick reply!

We are currently on version 0.13.0 for nanopolish (latest env.yml from artic-ncov2019) and see that you just updated a few days ago to 0.13.1 in this repo.

Will make a new conda env and will report back the results at theses positions.

Sine there such a great disparity between what end user can review in the IGV/Tablet and what nanopolish is calling, is there another method to review base pair being called?

Anyway to address their concerns of the differences? Since there are over 10k samples in gisaid now, no point rushing to be first and more worried of submitting quality and reviewed consensus sequences.

from fieldbioinformatics.

nickloman commented on August 16, 2024

You could try the medaka pipeline and compare with that?

I am confused about the difference in basecalled frequency (16%) and the IGV view which suggests only 2% (8 insertions at depth 395) though. Would you be able to share the BAM file with me (perhaps privately?).

from fieldbioinformatics.

Takadonet commented on August 16, 2024

I will try out medaka as well and see how they differ.

I am confused as well about the differences and that why seeking for help. Just asking permission before sharing the BAM file with you privately.

from fieldbioinformatics.

Takadonet commented on August 16, 2024

Sent bam file link to your email @nickloman

from fieldbioinformatics.

nickloman commented on August 16, 2024

Thanks. So from what I can see from the alignment there is some support for that variant (judging from Tablet), but not sufficient for it to be called a variant. I agree you could confidently call reference there. One way of doing that easily is to add '--no-indels' to artic minion (but be cautious, you won't get any indels with that). I might add another optional argument to the pipeline to let it drop those low confidence variants instead of masking it.

from fieldbioinformatics.

Takadonet commented on August 16, 2024

Thanks for taking a look. Using medaka beta addressed this particular issue for these strains but failed for others. We are investigating what is going on.

For moment we are just going to fix them 'by hand' since only 3 SNV's across a dozen strains but in future will use the optional argument when it is available.

from fieldbioinformatics.

Edge where N being called instead of reference bp about fieldbioinformatics HOT 8 OPEN

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent