Comments (3)
Really? Are you really going to argue that you just don't see the duplicate ID gene29892 in the first two lines of the ref.gff excerpt you quoted here? :)
The error message might not be very accurate (there is no actual overlap between the two features sharing that gene ID), but the problem still stands: the ID gene29892 has obvious transcript properties (it is the parent of CDS and exon features), yet it is declared twice in the file (first at 74631-74744, then again at 146276-147073). This is not valid GFF3, as my understanding is that feature IDs should be unique (though my GFF parser is indeed a bit more lenient about this, accepting non-unique IDs if the features are on separate reference sequences, which is not the case here).
Then things went downhill ("downstream") from there, according to the merciless GIGO (garbage in, garbage out) principle. I hope you understand that it's hard to automagically fix invalid input reference annotation data, or to guess what the authors of those data really meant, e.g. that it was somehow a 2-part gene (?!), so it was OK to break the ID uniqueness rule on a whim.
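For anyone who wants to check their own reference file for this situation, here is a minimal sketch of such a check. It is only an illustration, not how the gffcompare parser works: the function names, the `seq1` reference name, the `src` source column and the strands are made up, and only the ID and the two coordinate pairs come from the file discussed above. It reports IDs that are declared more than once on the same reference sequence.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_duplicate_ids(gff_lines):
    """Return {(seqid, ID): [(start, end), ...]} for IDs declared more than once.

    IDs are tracked per reference sequence, so the same ID on two different
    sequences is not reported (the lenient behaviour described above).
    """
    seen = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        feature_id = parse_attributes(fields[8]).get("ID")
        if feature_id is not None:
            seen[(seqid, feature_id)].append((start, end))
    return {key: spans for key, spans in seen.items() if len(spans) > 1}


# Hypothetical rows: only the ID and the coordinates come from the thread.
example = [
    "seq1\tsrc\tgene\t74631\t74744\t.\t-\t.\tID=gene29892",
    "seq1\tsrc\tgene\t146276\t147073\t.\t+\t.\tID=gene29892",
    "seq1\tsrc\tgene\t200\t300\t.\t+\t.\tID=geneA",
]
duplicates = find_duplicate_ids(example)
for (seqid, feature_id), spans in duplicates.items():
    print(f"{feature_id} declared {len(spans)} times on {seqid}: {spans}")
```

With a real file, the list of lines could simply be replaced by an open file handle.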
from gffcompare.
Leaving aside the cheeky fun I had with my reply above, it turns out that I was in fact wrong there: not about the duplicate IDs, but about the validity of that annotation. This is a special case of trans-splicing, and the current GFF3 specification does allow the same ID to be reused for discontinuous features such as trans-spliced genes and fusions.
So please accept my belated apologies for the incorrect/incomplete answer -- and thanks for this trans-splicing example! I ran into this old closed issue while looking for trans-splicing examples so I can add trans-splicing support to my GFF parser (and thus to gffread, gffcompare etc.).
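To make the "discontinuous feature" reading concrete, here is a small sketch of how rows sharing an ID on the same reference sequence could be collected into one multi-segment feature instead of being rejected. This is an assumption about one possible approach, not the planned implementation in the GFF parser; `group_segments`, `segments_overlap`, the `seq1` name and the strands are hypothetical, and only the ID and coordinates are taken from the example above.

```python
import re
from collections import defaultdict

# Matches the ID attribute at the start of column 9 or after a semicolon.
ID_PATTERN = re.compile(r"(?:^|;)\s*ID=([^;]+)")


def group_segments(gff_lines):
    """Collect all rows sharing (seqid, ID) into one sorted list of segments."""
    features = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        match = ID_PATTERN.search(fields[8])
        if match:
            key = (fields[0], match.group(1))
            features[key].append((int(fields[3]), int(fields[4])))
    return {key: sorted(spans) for key, spans in features.items()}


def segments_overlap(spans):
    """True if any two consecutive segments (sorted by start) overlap."""
    ordered = sorted(spans)
    return any(nxt[0] <= cur[1] for cur, nxt in zip(ordered, ordered[1:]))


# Hypothetical rows: only the ID and the coordinates come from the thread.
rows = [
    "seq1\tsrc\tgene\t146276\t147073\t.\t+\t.\tID=gene29892",
    "seq1\tsrc\tgene\t74631\t74744\t.\t-\t.\tID=gene29892",
]
features = group_segments(rows)
segments = features[("seq1", "gene29892")]
print(segments, "overlap:", segments_overlap(segments))
```

The two segments here do not overlap, which matches the earlier remark that the original error message was not very accurate.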
Hello Geo!
Glad to hear that. It's good that we are coming across such scenarios, and I am happy that the project is under constant development. I'm taking it on a positive note.
Your hard work is much appreciated!
Regards
Vijay