
kcleal commented on May 20, 2024

Hi brent, thanks for looking at this; I can see why this is confusing! Basically, the records you pulled out that have no insertion sequence called are events that were identified from split/supplementary mappings (they have an RMS re-mapping score of 0 and WR within-read support of 0). These events need a bit more handling to infer the insertion sequence. I will try to get this fixed in the next few days.
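A minimal Python sketch of the check described above, for pulling such records out of a VCF. It assumes a tab-separated dysgu VCF line whose FORMAT column includes RMS and WR and whose first sample column follows FORMAT; the function names are hypothetical and this heuristic is not part of dysgu itself.

```python
from __future__ import annotations

import math


def _as_float(value: str) -> float:
    """Convert a FORMAT value to float; '.' or other non-numbers become NaN."""
    try:
        return float(value)
    except ValueError:
        return math.nan


def parse_sample(format_col: str, sample_col: str) -> dict[str, str]:
    """Pair FORMAT keys (e.g. GT:GQ:...:RMS) with one sample's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def looks_split_mapping_only(vcf_line: str) -> bool:
    """Hypothetical heuristic: True when the first sample has RMS == 0 and WR == 0."""
    fields = vcf_line.rstrip("\n").split("\t")
    # VCF columns (0-based): 7 INFO, 8 FORMAT, 9 first sample
    sample = parse_sample(fields[8], fields[9])
    # Missing keys or values become NaN, which never compares equal to 0
    rms = _as_float(sample.get("RMS", "."))
    wr = _as_float(sample.get("WR", "."))
    return rms == 0 and wr == 0
```

Treating a missing RMS or WR as "not matched" keeps the filter conservative rather than flagging records that simply lack those keys.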

from dysgu.

brentp commented on May 20, 2024

Could use some help on this, @kcleal.

kcleal commented on May 20, 2024

Hi brentp,

For the first record you highlighted, it is possible to get the insertion sequence by slicing the first 36 bases (svlen=36) from the soft-clipped region, i.e.:

tttctttctttctttctttctttctttctttctttc
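A minimal sketch of that slicing step, assuming the soft clip is the trailing lowercase run of the contig (as in a right-clipped read); the function name is hypothetical. It returns None when there is no trailing clip or the clip is shorter than SVLEN, which is roughly the partial-mapping case mentioned next.

```python
from __future__ import annotations

import re


def insertion_from_clip(contig: str, svlen: int) -> str | None:
    """Return the first `svlen` bases of the trailing lowercase (soft-clipped) run.

    Assumes the clip sits at the 3' end of the contig; a clip on the 5' end
    would presumably need the bases nearest the breakpoint instead.
    """
    match = re.search(r"[a-z]+$", contig)
    if match is None or len(match.group()) < svlen:
        return None
    return match.group()[:svlen]


# Toy contig: aligned prefix, then a clip made of 9 x "tttc" plus extra bases
toy = "ACGTACGT" + "tttc" * 9 + "ggaa"
print(insertion_from_clip(toy, 36))
```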

This is not guaranteed to work for all insertions, though, as some have only partial mappings (or no mappings). Full insertion sequences could probably be added to the ALT allele field; however, it is not clear what to do for partial or missing mappings.

The lowercase letters are simply non-reference aligned bases, so any insertion or soft-clipped sequences will be represented as lowercase.
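Since case carries the alignment state, a contig can be split into aligned and non-reference runs with `itertools.groupby`. This is a small illustrative sketch (hypothetical function name) that reports each run with 0-based, half-open offsets into the contig.

```python
from __future__ import annotations

from itertools import groupby


def case_segments(contig: str) -> list[tuple[int, int, str, str]]:
    """Split a contig into runs of uppercase (reference-aligned) and lowercase
    (non-reference: inserted or soft-clipped) bases.

    Each tuple is (start, end, label, sequence) with 0-based half-open offsets.
    """
    segments = []
    offset = 0
    for is_lower, run in groupby(contig, key=str.islower):
        seq = "".join(run)
        label = "non-reference" if is_lower else "aligned"
        segments.append((offset, offset + len(seq), label, seq))
        offset += len(seq)
    return segments


print(case_segments("ACgtT"))
```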

In the second example, there appears to be a 2 bp insertion sequence in the consensus, but it is not related to the deletion. The actual deleted sequence is quite difficult to extract from the record, but it can be sliced from the reference using POS and END. This is obviously a pain to do, so I think I will try to get it added to the ALT allele field.
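A minimal sketch of slicing the deleted sequence from the reference, assuming the usual VCF convention that POS is the padding base before the deletion and END is the last deleted base (END = POS + len(REF) - 1); whether dysgu's END follows exactly this convention is an assumption here. The reference is a plain dict of sequences so the example runs on its own.

```python
from __future__ import annotations


def deletion_alleles(reference: dict[str, str], chrom: str,
                     pos: int, end: int) -> tuple[str, str]:
    """Build explicit REF/ALT for a deletion from 1-based POS and END."""
    seq = reference[chrom]
    ref = seq[pos - 1:end]  # 1-based inclusive POS..END as a 0-based slice
    alt = ref[0]            # ALT keeps only the padding base
    return ref, alt


toy_reference = {"chr1": "ACGTACGTAC"}
print(deletion_alleles(toy_reference, "chr1", 2, 5))
```

With a real genome, the same slice could come from an indexed FASTA via pysam, e.g. `pysam.FastaFile(path).fetch(chrom, pos - 1, end)`, which uses 0-based half-open coordinates.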

brentp commented on May 20, 2024

Thanks for the reply.
Having the insertion/deletion sequence in the ALT field, where possible, helps with downstream tools. I can certainly add the deletion sequence.
For Manta, in cases where it can't assemble the entire insertion, it reports LEFT_SVINSSEQ and RIGHT_SVINSSEQ, and I can stitch those together separated by 'N' * 100 so that a tool like Paragraph can still be used for genotyping.
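A minimal sketch of that stitching step, assuming INFO is a standard `key=value;flag` string; the function names are hypothetical. One design choice worth noting: when only one flank is present, the sketch still adds the N spacer on the missing side so the unknown part of the insertion stays represented.

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO string; flags without '=' map to an empty string."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else ""
    return parsed


def stitched_insertion(info: dict[str, str], spacer_len: int = 100) -> str | None:
    """Join LEFT_SVINSSEQ and RIGHT_SVINSSEQ with a run of Ns standing in for
    the unassembled middle; returns None when neither flank is present."""
    left = info.get("LEFT_SVINSSEQ", "")
    right = info.get("RIGHT_SVINSSEQ", "")
    if not left and not right:
        return None
    return left + "N" * spacer_len + right


info = parse_info("SVTYPE=INS;LEFT_SVINSSEQ=ACG;RIGHT_SVINSSEQ=TT;IMPRECISE")
print(len(stitched_insertion(info)))
```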

kcleal commented on May 20, 2024

Thanks, I think that is a good solution; I will get this fixed.

brentp commented on May 20, 2024

That would be awesome! And, if it can be assembled, Manta puts the insertion sequence in SVINSSEQ in the INFO. Though, of course, having it in the ALT would be even better.
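Moving an assembled SVINSSEQ into the ALT follows the usual VCF convention for insertions: REF is the base at POS and ALT is that same padding base followed by the inserted bases. A minimal sketch, with hypothetical function names and the reference base supplied by the caller:

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO string; flags without '=' map to an empty string."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else ""
    return parsed


def explicit_insertion(ref_base: str, info: dict[str, str]) -> tuple[str, str] | None:
    """Turn INFO SVINSSEQ into VCF-style REF/ALT (ALT = padding base + insert)."""
    inserted = info.get("SVINSSEQ")
    if not inserted:
        return None  # keep the symbolic <INS> allele in this case
    return ref_base, ref_base + inserted


print(explicit_insertion("C", parse_info("SVTYPE=INS;SVINSSEQ=GATTACA")))
```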

kcleal commented on May 20, 2024

Hi brent, I've added some improvements to insertion reporting. Where possible, the insertion sequence is now written in the ALT field; however, this applies only to re-mapped sequences where the whole insertion was likely mapped, and to 'within-read' events. Other insertion sequences that have partial mappings (or no mappings) are available in the LEFT_SVINSSEQ and RIGHT_SVINSSEQ fields.
Also of note, there is an option to set the contig verbosity: for example, --verbosity 0 will hide all contigs in the output (but keep the ALT and insertion sequences).
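Given the reporting rules just described, each insertion record falls into one of three cases. A hypothetical classifier, assuming a tab-separated VCF line and treating any ALT that starts with `<` as symbolic:

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO string; flags without '=' map to an empty string."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else ""
    return parsed


def insertion_seq_source(vcf_line: str) -> str:
    """Return 'alt', 'partial' or 'none' depending on where the sequence is."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt, info = fields[4], parse_info(fields[7])
    if not alt.startswith("<"):
        return "alt"  # explicit sequence written into ALT
    if info.get("LEFT_SVINSSEQ") or info.get("RIGHT_SVINSSEQ"):
        return "partial"  # only flanking pieces are available
    return "none"
```

Tallying these categories over a whole VCF gives a quick sense of how many insertions remain symbolic-only.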

brentp commented on May 20, 2024

Excellent, thank you! I am evaluating this now.

brentp commented on May 20, 2024

I can run this and get some INS sequences. For those that do not have one, what does that mean?
E.g.:

1       1413382 35      C       <INS>   .       PASS    SVMETHOD=DYSGUv1.2.6;SVTYPE=INS;END=1413382;CHR2=1;GRP=829;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=133;CONTIGA=TTGTATTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCTAACTCCCGACCTCAGGTGATCCACCCGCCTCGGCCTCTCAAACTGTTGGGATTACAGGCATGTGCCACCACGCCTGGCtaatgttgtattttagtagagacggg;RIGHT_SVINSSEQ=taatgttgtattttagtagagacggg;KIND=extra-regional;GC=51.95;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=0;OL=0;SU=9;WR=0;PE=0;SR=2;SC=9;BND=5;LPREC=1;RT=pe     GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB   0/1:145:26.0:9:0:0:2:9:5:21.11:1:6:5:52:0:6:0.521:1.46:0.76:0.739

This one has only RIGHT_SVINSSEQ,
and:

1       1362768 33      .       <INS>   .       PASS    SVMETHOD=DYSGUv1.2.6;SVTYPE=INS;END=1362769;CHR2=1;GRP=773;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=63;SVLEN=31;CONTIGA=tcccctgcatcaccctgccctgccccttcccctccaccaccctgccctgcccccTCCCCTCCATCACCCTGCCCTGCCCCCTCCCCTCCATCACCCTGCCCTGCCCCCACCCCTCCATCATCCCGCCCGCTCCCCTCTCCACCCCTCC;KIND=extra-regional;GC=73.65;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=31;OL=0;SU=3;WR=0;PE=0;SR=3;SC=1;BND=0;LPREC=1;RT=pe    GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB   0/1:34:43.0:3:0:0:3:1:0:20.86:1:2:4:0:0:22:0.667:0.36:0.24:0.517

This one has neither LEFT_SVINSSEQ nor RIGHT_SVINSSEQ.
Thanks!
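Going by the earlier explanation that insertions with partial (or no) mappings end up in LEFT_SVINSSEQ/RIGHT_SVINSSEQ rather than in ALT, one hypothetical way to gauge how much of each insertion is actually present is to compare those fields with SVLEN. For the first record above, RIGHT_SVINSSEQ holds 26 bases against SVLEN=133; the second has none of its 31 bases. The sketch below simply sums the two flank lengths and ignores any overlap between them.

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO string; flags without '=' map to an empty string."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else ""
    return parsed


def known_insertion_bases(info: dict[str, str]) -> tuple[int, int]:
    """Return (bases present in LEFT/RIGHT_SVINSSEQ, |SVLEN|)."""
    svlen = abs(int(info["SVLEN"]))
    known = len(info.get("LEFT_SVINSSEQ", "")) + len(info.get("RIGHT_SVINSSEQ", ""))
    return known, svlen


first = parse_info("SVTYPE=INS;SVLEN=133;RIGHT_SVINSSEQ=taatgttgtattttagtagagacggg")
print(known_insertion_bases(first))
```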

brentp commented on May 20, 2024

Here is another one:

chr2  134621015       35526   .       <INS>   .    PASS     SVMETHOD=DYSGUv1.2.6;SVTYPE=INS;END=134621065;CHR2=chr2;GRP=432474;NGRP=1;CT=5to3;CIPOS95=0;CIEND95=0;SVLEN=47;CONTIGA=agttgataaccgaatatagtcaaaataaaattttctgtgcttcaaaaaatatctttaagaaaatgaaaagacaagctacttactgtgaaaaaatAATATCTTTAAGAAAATGAAAAGACAAGCTACTTACTGTGAAAAAATAATTGCAAATCATATTTCTGATAAACTACTTGCATCCAGAATATATATCCC;CONTIGB=AGTTGATAACCGAATATAGTCAAAATAAAATTTTCTGTGCTTCAAAAAATATCTTTAAGAAAATGAAAAGACAAGCTACTTACTGTGAAAAAATAATatctttaagaaaatgaaaagacaagctacttactgtgaaaaaataattgcaaatcatatttctgataaactacttgcatccagaatatatatccc;KIND=extra-regional;GC=25.52;NEXP=0;STRIDE=0;EXPSEQ;RPOLY=3;OL=3;SU=7;WR=0;PE=0;SR=7;SC=14;BND=0;LPREC=1;RT=pe       GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:200:60:7:0:0:7:14:0:35.08:1:6:8:0:0:2:0.725:1.5:1.088:0.794

This one has CONTIGA and CONTIGB, but no LEFT_SVINSSEQ or RIGHT_SVINSSEQ fields.
