Comments (10)
Hi brent, thanks for looking at this, I can see why this is confusing! Basically, the records you pulled out that do not have insertion sequences called are events that were identified from split/supplementary mappings (they have RMS, the re-mapping score, == 0 and WR, the within-read support, == 0). These events will take a bit more handling to infer the insertion sequence. I will try to get this fixed in the next few days.
from dysgu.
Could use some help on this @kcleal .
Hi brentp,
For the 1st record you have highlighted it is possible to get the insertion sequence by slicing the first 36 bases (svlen=36) from the soft-clipped region i.e:
tttctttctttctttctttctttctttctttctttc
This is not guaranteed to work for all insertions though, as some only have partial mappings (or no mappings). I guess full insertion sequences could be added to the ALT allele field; however, it is not clear what to do for partial mappings or no mappings.
The lowercase letters are simply non-reference aligned bases, so any insertion or soft-clipped sequences will be represented as lowercase.
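As a rough sketch of the slicing described above (not dysgu's own code), the lowercase runs can be pulled out of a contig string and the first SVLEN bases taken from one of them. The function names and the `which` parameter are made up for illustration, and the caveat above still applies: this only makes sense when the whole insertion is covered by the soft-clipped region.

```python
import re


def soft_clipped_runs(contig):
    """Return the lowercase (non-reference aligned) runs of a contig, in order."""
    return re.findall(r"[a-z]+", contig)


def guess_insertion(contig, svlen, which=0):
    """Slice the first `svlen` bases from the chosen lowercase run.

    Returns None when there is no such run or it is shorter than `svlen`,
    i.e. when the insertion is only partially covered.
    """
    runs = soft_clipped_runs(contig)
    if which >= len(runs) or len(runs[which]) < svlen:
        return None
    return runs[which][:svlen]


# Toy contig: 8 reference-aligned bases followed by a 40 bp soft-clip.
toy_contig = "ACGTACGT" + "tttc" * 10
print(guess_insertion(toy_contig, 36))
```

Whether the first or last run is the relevant one depends on which side of the breakpoint the clip sits, so `which` would need to be chosen per record.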
In the second example, there looks to be a 2 bp insertion sequence in the consensus, but this is not related to the deletion. The actual deletion sequence is quite difficult to extract from the record, but can be sliced from the reference using POS and END; this is obviously a pain to do, so I will try to get that added to the ALT allele field.
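Until that is in the ALT field, slicing from the reference could look roughly like the sketch below. It assumes the usual VCF convention for symbolic alleles, where POS (1-based) is the base just before the deletion and END is the last deleted base; whether every dysgu record follows that exactly is an assumption here. The reference is a plain Python string to keep the example self-contained, and `deleted_sequence` is a made-up name.

```python
def deleted_sequence(reference, pos, end):
    """Return the deleted bases for a <DEL> record.

    reference: the chromosome sequence as a string (index 0 is base 1).
    pos, end:  1-based VCF POS and INFO/END.

    Assumes POS is the base before the deletion, so the deleted bases are
    1-based POS+1..END, which is the 0-based half-open slice [pos:end].
    """
    if end < pos:
        raise ValueError("END must not be smaller than POS")
    return reference[pos:end]


# Toy reference; a real one would come from an indexed FASTA.
toy_reference = "AACCGGTT"
print(deleted_sequence(toy_reference, 2, 6))  # bases 3..6 of the toy reference
```

With a real genome, the same 0-based half-open coordinates could be passed to something like pysam's `FastaFile.fetch` instead of slicing a string.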
Thanks for the reply.
Having the insertion/deletion sequence in the ALT field where possible helps for use with downstream tools. I can certainly add the deletion sequence.
For manta, in cases where it can't assemble the entire insertion, it reports LEFT_SVINSSEQ and RIGHT_SVINSSEQ, and I can stitch those together separated by 'N' * 100 so that a tool like paragraph can still be used for genotyping.
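The stitching mentioned above is small enough to sketch. `stitch_insertion` is a made-up helper, not part of manta, paragraph, or dysgu; the 100-base gap is the value from the comment above, and returning a lone side unpadded is a choice made for this example.

```python
def stitch_insertion(left, right, gap=100):
    """Join LEFT_SVINSSEQ and RIGHT_SVINSSEQ with a run of `gap` N bases.

    If only one side is present, it is returned on its own with no padding;
    if neither is present, the result is an empty string.
    """
    sides = [seq for seq in (left, right) if seq]
    return ("N" * gap).join(sides)


print(stitch_insertion("ACG", "TTA", gap=5))
```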
Thanks, I think that is a good solution, I will get this fixed.
That would be awesome! And, if it can be assembled, manta puts the insertion sequence in SVINSSEQ in the INFO. Though, of course, having it in the ALT would be even better.
Hi brent, I've added some improvements to insertion reporting. Where possible, the insertion sequence is now written in the ALT field; however, this applies only to re-mapped sequences where the whole insertion was likely mapped, and to 'within-read' events. Other insertion sequences that have partial mappings (or no mappings) are available in the LEFT_SVINSSEQ and RIGHT_SVINSSEQ fields.
Also of note, there is an option to set the contig verbosity, e.g. --verbosity 0 will hide all contigs in the output (but keep the ALT and insertion sequences).
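For downstream scripts, a small check like the one below can report where a record carries its insertion sequence: in ALT, in one or both SVINSSEQ fields, or nowhere. This is a sketch only; `parse_info` and `insertion_source` are made-up helpers, and the INFO strings in the example are shortened, invented fragments rather than real dysgu output.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; keys without '=' become True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def insertion_source(alt, info):
    """Return 'ALT', the names of the SVINSSEQ fields present, or 'none'."""
    # A symbolic allele such as <INS> carries no sequence itself.
    if alt and alt != "." and not alt.startswith("<"):
        return "ALT"
    fields = parse_info(info)
    present = [
        key
        for key in ("LEFT_SVINSSEQ", "RIGHT_SVINSSEQ")
        if isinstance(fields.get(key), str) and fields[key]
    ]
    return "+".join(present) if present else "none"


print(insertion_source("<INS>", "SVTYPE=INS;RIGHT_SVINSSEQ=taatg;EXPSEQ=;WR=0"))
```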
Excellent, Thank you! I am evaluating this now.
I can run this and get some INS sequences. For those that do not have it, what does that mean?
E.g.:
1 1413382 35 C <INS> . PASS SVMETHOD=DYSGUv1.2.6;SVTYPE=INS;END=1413382;CHR2=1;GRP=829;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=133;CONTIGA=TTGTATTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCTAACTCCCGACCTCAGGTGATCCACCCGCCTCGGCCTCTCAAACTGTTGGGATTACAGGCATGTGCCACCACGCCTGGCtaatgttgtattttagtagagacggg;RIGHT_SVINSSEQ=taatgttgtattttagtagagacggg;KIND=extra-regional;GC=51.95;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=0;OL=0;SU=9;WR=0;PE=0;SR=2;SC=9;BND=5;LPREC=1;RT=pe GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:145:26.0:9:0:0:2:9:5:21.11:1:6:5:52:0:6:0.521:1.46:0.76:0.739
has only RIGHT_SVINSSEQ
and:
1 1362768 33 . <INS> . PASS SVMETHOD=DYSGUv1.2.6;SVTYPE=INS;END=1362769;CHR2=1;GRP=773;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=63;SVLEN=31;CONTIGA=tcccctgcatcaccctgccctgccccttcccctccaccaccctgccctgcccccTCCCCTCCATCACCCTGCCCTGCCCCCTCCCCTCCATCACCCTGCCCTGCCCCCACCCCTCCATCATCCCGCCCGCTCCCCTCTCCACCCCTCC;KIND=extra-regional;GC=73.65;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=31;OL=0;SU=3;WR=0;PE=0;SR=3;SC=1;BND=0;LPREC=1;RT=pe GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:34:43.0:3:0:0:3:1:0:20.86:1:2:4:0:0:22:0.667:0.36:0.24:0.517
has neither.
Thanks!
Here is another one:
chr2 134621015 35526 . <INS> . PASS SVMETHOD=DYSGUv1.2.6;SVTYPE=INS;END=134621065;CHR2=chr2;GRP=432474;NGRP=1;CT=5to3;CIPOS95=0;CIEND95=0;SVLEN=47;CONTIGA=agttgataaccgaatatagtcaaaataaaattttctgtgcttcaaaaaatatctttaagaaaatgaaaagacaagctacttactgtgaaaaaatAATATCTTTAAGAAAATGAAAAGACAAGCTACTTACTGTGAAAAAATAATTGCAAATCATATTTCTGATAAACTACTTGCATCCAGAATATATATCCC;CONTIGB=AGTTGATAACCGAATATAGTCAAAATAAAATTTTCTGTGCTTCAAAAAATATCTTTAAGAAAATGAAAAGACAAGCTACTTACTGTGAAAAAATAATatctttaagaaaatgaaaagacaagctacttactgtgaaaaaataattgcaaatcatatttctgataaactacttgcatccagaatatatatccc;KIND=extra-regional;GC=25.52;NEXP=0;STRIDE=0;EXPSEQ;RPOLY=3;OL=3;SU=7;WR=0;PE=0;SR=7;SC=14;BND=0;LPREC=1;RT=pe GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:200:60:7:0:0:7:14:0:35.08:1:6:8:0:0:2:0.725:1.5:1.088:0.794
has CONTIGA and CONTIGB, but no SVINSSEQ fields.