
Comments (3)

smehringer commented on May 26, 2024

Hi @prasundutta87,

thanks for your interest in SViper!
I try to maintain SViper, but I often have trouble finding the time.

  1. When I was testing the software with one set of samples, I found that most of the FAILs were FAIL5 ("The variant was polished away."). What is the reason behind this failure?

This means that after polishing, the variant is not visible anymore in the data (i.e. in the corrected long reads).

  2. What is meant by FAIL3 ("The long read regions do not fit")? Can this please be elaborated?

This means that for the given variant, no proper region could be extracted from the long read. For example, although a long read may carry the desired deletion of 200 bp, the flanking regions of this deletion are mapped very poorly, or the mapping indicates a complex variant. SViper can only polish simple deletions and insertions.
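To make these two failure modes more concrete, here is a minimal, hypothetical C++17 sketch (not SViper's actual implementation) that inspects a corrected read's CIGAR string: if no indel of roughly the expected size remains, the variant was effectively polished away (FAIL5-like), and if the remaining operations are not plain matches, the region does not fit a simple variant (FAIL3-like). The CIGAR string, expected length, and thresholds are made up for illustration.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Parse a CIGAR string like "350M200D352M" into (length, operation) pairs.
std::vector<std::pair<int, char>> parse_cigar(std::string const & cigar)
{
    std::vector<std::pair<int, char>> ops;
    std::string number;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
            number += c;
        else
        {
            ops.emplace_back(std::stoi(number), c);
            number.clear();
        }
    }
    return ops;
}

int main()
{
    std::string const cigar = "350M200D352M"; // hypothetical corrected-read alignment
    int const expected_length = 200;          // size of the deletion we expect to see

    int largest_indel = 0;
    bool flanks_clean = true;
    for (auto const & [length, op] : parse_cigar(cigar))
    {
        if (op == 'D' || op == 'I')
            largest_indel = std::max(largest_indel, length);
        else if (op != 'M' && op != '=' && op != 'X')
            flanks_clean = false; // clips or other operations suggest a poorly fitting or complex region
    }

    if (largest_indel < expected_length / 2)
        std::cout << "variant no longer visible in the corrected read (FAIL5-like)\n";
    else if (!flanks_clean)
        std::cout << "long read region does not fit a simple indel (FAIL3-like)\n";
    else
        std::cout << "simple indel of ~" << largest_indel << " bp still present\n";
}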

  3. I am aware that no tags should be present, and SViper skips such variants. With bcftools, I am getting the error that SKIP is not defined. If you are still developing the tool, can that please be added? Although I have changed the SVTYPE to INS, SViper checks the variant type by tags rather than by SVTYPE.

I'll try to change this! But I can't promise that it will happen in the next few days.

  4. I observed that the SViper score is written to the QUAL field. How is the score calculated, and does it have any biological significance?

The score is computed here:

// Score computation
// -----------------
double error_rate = ((double)length(record.cigar) - 1.0)/ (config.flanking_region * 2.0);
double fuzzyness = (1.0 - error_rate/0.15) * 100.0;
variant.quality = std::max(fuzzyness, 0.0);
record.mapQ = variant.quality;

It does not have a biological significance! As far as I remember my own code, the formula was derived experimentally and proved to work well under manual inspection.
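For intuition, here is the same formula evaluated with hypothetical numbers (a flanking region of 400 bp and a final alignment whose CIGAR has 7 operations; both values are made up for illustration):

#include <algorithm>
#include <iostream>

int main()
{
    double const flanking_region = 400.0; // hypothetical config.flanking_region
    double const cigar_length = 7.0;      // hypothetical number of CIGAR operations of the final alignment

    double const error_rate = (cigar_length - 1.0) / (flanking_region * 2.0); // 6 / 800 = 0.0075
    double const fuzzyness = (1.0 - error_rate / 0.15) * 100.0;               // (1 - 0.05) * 100 = 95
    double const score = std::max(fuzzyness, 0.0);

    std::cout << score << '\n'; // 95: few extra alignment events around the variant -> high confidence
}

So a perfectly clean alignment (a single indel plus two match blocks, i.e. 3 CIGAR operations) scores close to 100, and the score drops to 0 once the extra alignment events amount to a 15% "error rate" over the two flanking regions.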

Should I filter my SVs based on the SViper score again? Is there a threshold based on which I should remove SVs?

Unfortunately this is very hard to answer and it heavily depends on your use case. In general, I can say that you should filter out variants with a FAIL tag; those are very unlikely to be true. A low score, on the other hand, might just mean that the polishing didn't work well. I would not filter by the score but only regard it as a confidence measure.
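If it helps, here is a minimal sketch of such a FAIL filter in plain C++, assuming the FAIL label ends up in the FILTER column of the polished VCF and using a made-up file name; if SViper writes the tag elsewhere, the check has to be adjusted accordingly.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream in("polished.vcf"); // hypothetical output file name
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line[0] == '#') // keep all header lines
        {
            std::cout << line << '\n';
            continue;
        }

        std::istringstream fields(line);
        std::string column;
        for (int i = 0; i < 7 && std::getline(fields, column, '\t'); ++i)
            ; // after the loop, column holds the 7th field (FILTER)

        if (column.rfind("FAIL", 0) != 0) // keep records that are not FAIL-tagged
            std::cout << line << '\n';
    }
}

If the FAIL labels are indeed written to the FILTER column, something like bcftools view -f PASS should achieve the same result without custom code.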

I actually filter my SVs using the QUAL value of the variant caller (cuteSV). So, replacing this value with the score affects my pipeline.

Can you filter the SVs before polishing them with SViper? Otherwise I might need to see if I can add an option that does not overwrite the quality scores but adds them to the INFO field instead.

Best,
Svenja


prasundutta87 commented on May 26, 2024

Hi @smehringer ,

Thank you so much for answering my queries. Is there a document anywhere that describes the algorithm or the inner workings of SViper? I am just trying to get my head around the algorithm (not the numerical/quantification part, but the general concept of polishing). For example, when you say that after polishing the variant is not visible anymore in the corrected long reads, could this please be elaborated?

Thanks for the suggestion to use SViper after my final filtering.

Also, do we need to sort the BAMs by name? It seems to be specifically mentioned for the utilities_merge_split_alignments tool. Currently, I have coordinate-sorted my BAMs.

This may be trivial, but the master branch of SViper reports version 2.0.0 (at least the help page shows 2.0.0), while the most recent release is 2.1.0. Could you kindly clarify this?

Regards,
Prasun


smehringer commented on May 26, 2024

Is there a document anywhere that describes the algorithm or the inner workings of SViper?

I've sent you an email with my thesis, which hopefully answers most of your questions.

Also, do we need to sort the BAMs by name? It seems to be specifically mentioned for the utilities_merge_split_alignments tool. Currently, I have coordinate-sorted my BAMs.

For sviper, sorting by coordinate is fine (in fact, it is required).
Only when using the utility utilities_merge_split_alignments do you need the BAM sorted by name. But do you want to use this utility at all? It's a rather advanced utility that I used in my validation pipelines.
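As a quick way to double-check which order a given BAM declares, the @HD line's SO tag records the sort order. Below is a small sketch using htslib (an assumption on my side that htslib >= 1.10 is available and the program is linked with -lhts); note that it only reads the declared tag and does not verify the actual record order.

#include <cstring>
#include <iostream>

#include <htslib/sam.h> // requires htslib; compile with -lhts

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
        std::cerr << "usage: check_sort <file.bam>\n";
        return 1;
    }

    samFile * in = sam_open(argv[1], "r");
    if (in == nullptr)
    {
        std::cerr << "could not open " << argv[1] << '\n';
        return 1;
    }

    sam_hdr_t * hdr = sam_hdr_read(in);
    if (hdr == nullptr)
    {
        std::cerr << "could not read the BAM header\n";
        sam_close(in);
        return 1;
    }

    char const * text = sam_hdr_str(hdr); // full SAM header text, including the @HD line

    if (std::strstr(text, "SO:coordinate") != nullptr)
        std::cout << "coordinate-sorted (what sviper expects)\n";
    else if (std::strstr(text, "SO:queryname") != nullptr)
        std::cout << "name-sorted (what utilities_merge_split_alignments expects)\n";
    else
        std::cout << "no sort order declared in the @HD line\n";

    sam_hdr_destroy(hdr);
    sam_close(in);
}

If you ever do need the utility, samtools sort -n produces a name-sorted BAM.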

This may be trivial, but the master branch of SViper reports version 2.0.0 (at least the help page shows 2.0.0), while the most recent release is 2.1.0. Could you kindly clarify this?

I'm very sorry for the confusion! This is a documentation bug. I forgot to change the version in the help page. I'll correct this.

