Comments (12)
Yes, absolutely! My colleague and I have put together a utility to extract splice junctions/exon positions as well as the transcripts that contain them. I'm going to run some tests on it and finalize the details, and then once I'm comfortable all is going as intended, I'll let you know so you can try it out!
from talon.
Hey, we just fixed how long things were taking. It should run MUCH faster now, and you should be able to run it with full GTFs. Let us know if it's working for you!
from talon.
Hi! So to clarify, are you attempting to look at cases where you have novel splice junctions in a known gene, and then see how far away they are from known junctions? I don't think we have an existing formal utility that outputs the splice junctions, but it wouldn't be too difficult to make one!
from talon.
Yes! I am trying to quantify the distance from the reference splice junctions to the novel junctions I'm seeing in my samples, which have splicing aberrations. The mis-splicing can be transcriptome-wide so I am trying to generate an exon-based matrix for my cells. Does that make sense?
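For anyone following along, here is a minimal sketch of the distance calculation described above. It assumes you already have a sorted list of reference splice site positions for one chromosome and strand; the function name and inputs are hypothetical and not part of TALON.

```python
from bisect import bisect_left


def nearest_junction_distance(novel_pos, ref_positions):
    """Signed distance from a novel splice site to the closest reference site.

    ref_positions must be a sorted, non-empty list of positions from the same
    chromosome and strand. The result is in genomic coordinates: positive
    means the novel site has the larger coordinate. It is not strand-aware.
    """
    if not ref_positions:
        raise ValueError("ref_positions must not be empty")
    # Index of the first reference position >= novel_pos.
    i = bisect_left(ref_positions, novel_pos)
    # Only the neighbours on either side of that index can be the closest.
    candidates = ref_positions[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda ref: abs(novel_pos - ref))
    return novel_pos - nearest


ref_sites = [100, 200, 350]
print(nearest_junction_distance(195, ref_sites))
print(nearest_junction_distance(400, ref_sites))
```

Donor and acceptor sites would normally be kept in separate lists, so that a novel 5' site is only ever compared with reference 5' sites.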
from talon.
Hi, I am very happy to find this tool for my analysis. I tried a first run today and ran into a problem. The error message was "SAM transcript xxx lacks an MD tag". My samples were DirectRNA Nanopore-seq mapped by Minimap2.
By the way, will you develop a tool like MISO or rMATS to help detect changes in alternative splicing?
from talon.
Hi iam2b,
You should be able to fix this issue by running Minimap2 with the --MD flag (see issue #45). Currently we are not in the business of developing our own downstream alt splicing tool, but you might consider trying this one https://bioconductor.org/packages/release/bioc/html/IsoformSwitchAnalyzeR.html. The developer has added support for TALON abundance files.
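If you want to confirm that a SAM file carries MD tags before running TALON, a rough check could look like the sketch below. It relies only on the SAM layout (11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields); the function names are hypothetical and this is not how TALON itself performs the check.

```python
def has_md_tag(sam_line):
    """Return True if a SAM alignment line has an MD:Z: optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional tags start at column 12.
    return any(field.startswith("MD:Z:") for field in fields[11:])


def count_missing_md(sam_lines):
    """Count mapped alignments that lack an MD tag."""
    missing = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & 4:
            continue  # unmapped reads are not expected to have MD
        if not has_md_tag(line):
            missing += 1
    return missing


example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:0\tMD:Z:10",
    "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:0",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]
print(count_missing_md(example))
```

For a real file, you would pass an open file handle instead of the example list. A non-zero count suggests the alignment step needs to be rerun with --MD.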
from talon.
Thank you very much. I have solved this problem.
Merry Christmas!
from talon.
Hi dewyman,
I wanted to clarify my question a little. The reason I was asking for the distance from the canonical splice junction is that I am trying to identify (and quantify) alternate 3' and 5' splice site usage and thought that the positional information for each junction would be useful since it could be compared with the reference. Thanks for your help and any thoughts/suggestions would be welcome! Hope you're having a good new year.
from talon.
Hi! Don't worry, your question makes total sense. We've been working on a utility to help address your question. It's technically complete and passed our tests, but is running slowly so we were hoping to do a bit more work on it to make it run faster. In the meantime though, you're welcome to try it out:
usage: talon_get_sjs [-h] [--gtf GTF] [--db DB] [--ref REF_GTF] [--mode MODE]
                     [--outprefix OUTPREFIX]

Extracts the locations, novelty, and transcript assignments of exons/introns
in a TALON database or GTF file. All positions are 1-based.

optional arguments:
  -h, --help            show this help message and exit
  --gtf GTF             TALON GTF file from which to extract exons/introns
  --db DB               TALON database from which to extract exons/introns
  --ref REF_GTF         GTF reference file (ie GENCODE). Will be used to label
                        novelty.
  --mode MODE           Choices are 'intron' or 'exon' (default is 'intron').
                        Determines whether to include introns or exons in the
                        output
  --outprefix OUTPREFIX
                        Prefix for output file

As a side note, when you run this script in 'intron' mode, the start/end positions currently include the exon base that flanks the intron on each side.
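To make that concrete, here is a small sketch of how the reported coordinates could be converted to intron-only coordinates. It assumes 1-based inclusive positions with start less than end, and the helper name is hypothetical (it is not part of talon_get_sjs).

```python
def trim_flanking_exon_bases(start, end):
    """Convert intron-mode coordinates to the first and last intronic base.

    Assumes start and end are 1-based, start < end, and that each one is the
    exon base immediately flanking the intron, as described above.
    """
    if end - start < 2:
        raise ValueError("no intronic bases between start and end")
    return start + 1, end - 1


# An intron reported as 100-201 spans intronic bases 101 through 200.
intron_start, intron_end = trim_flanking_exon_bases(100, 201)
print(intron_start, intron_end, intron_end - intron_start + 1)
```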
Another approach you might try for extracting splice junctions from a TALON GTF file would be to use the TranscriptClean utility described here. Outputs from this script follow the STAR splice junction output format, which is described in the STAR manual (section 4.4) here.
I hope this helps, but don't hesitate to reach out if you have more questions!
Best,
Dana
from talon.
Hi Dana,
Thanks for your help. I am trying to run this script and it's either extremely slow or getting stuck. I subsetted my gtf by chromosome and took the smallest one (chrM in my case, with 48 total lines in the gtf) and the script is still running. Is that the expected speed or do you think there is another issue?
Here is the code I'm running:
~/talon/talon-4.4.2/python/bin/talon_get_sjs --gtf ${file} --ref ~/gencode.v31.annotation.gtf --mode intron --outprefix intron
from talon.
Thanks for letting us know. We'll look into it some more.
from talon.
It's taking so long with your current command because your --ref file is the entire annotation. If you want to run just chrM, consider subsetting the reference GTF to chrM as well.
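One way to subset the reference is sketched below. It assumes a standard tab-delimited GTF with the chromosome name in the first column; the function name and the file paths in the usage note are hypothetical.

```python
def subset_gtf_by_chrom(in_path, out_path, chrom):
    """Write only the records for one chromosome to out_path.

    Comment lines (starting with '#') are copied through unchanged.
    Returns the number of feature records kept.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)
                continue
            # The first tab-separated column of a GTF record is the seqname.
            if line.split("\t", 1)[0] == chrom:
                dst.write(line)
                kept += 1
    return kept
```

For example, `subset_gtf_by_chrom("gencode.v31.annotation.gtf", "gencode.v31.chrM.gtf", "chrM")` would produce a smaller file that can be passed to --ref in place of the full annotation.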
from talon.