Comments (8)
Hi dleopold,
Thank you very much for reporting the issue. I will check it when I have more time.
I agree with your point that "It would seem more appropriate to stop at the same position as the reference if possible, particularly if the reference is highly similar", provided the reference annotation is correct (but in most cases, we don't know the truth).
MitoZ takes the initial CDS ranges from the file DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.cds.position
(https://github.com/linzhi2013/MitoZ/wiki/Tutorial#3-the-annotation-step), and then tries to find the stop codons or T/TA endings.
So, first, we can look at the file DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.cds.position
to confirm that the annotation hits the reference mitogenome (NC_030597). If it hits another reference, that would explain the result you are seeing.
Secondly, if it does hit the reference genome (NC_030597), the gene length information might be affecting the annotation process (the COX2 on NC_030597 might be too short). You can compare the COX2 on NC_030597 with the COX2 of other closely related species in a codon-based multiple sequence alignment and see whether all species end at the same position.
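To make "find the stop codons or T/TA endings" concrete, here is a minimal sketch of the two checks involved. It is not MitoZ's actual code: the function names and the toy sequence are made up for illustration, and the only assumption is the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA and AGG are stop codons.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
VERT_MT_STOPS = {"TAA", "TAG", "AGA", "AGG"}

def first_inframe_stop(seq, cds_start, search_from):
    """Return the 0-based index of the first in-frame stop codon at or after
    search_from, or None if there is none. cds_start defines the frame."""
    seq = seq.upper()
    # Move forward to the next codon boundary in the CDS reading frame.
    offset = (search_from - cds_start) % 3
    pos = search_from + ((3 - offset) % 3)
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in VERT_MT_STOPS:
            return pos
        pos += 3
    return None

def truncated_stop(seq, pos):
    """Return 'TA', 'T' or None for a possible incomplete stop starting at pos
    (an incomplete stop is completed to TAA by polyadenylation)."""
    seq = seq.upper()
    if seq[pos:pos + 2] == "TA":
        return "TA"
    if seq[pos:pos + 1] == "T":
        return "T"
    return None

# Toy sequence (hypothetical): three sense codons, then a T that could be an
# incomplete stop, then a complete TAA stop three codons further downstream.
toy = "ATGGCAGCT" + "TCACTGGCCTAA"
print(truncated_stop(toy, 9))         # candidate T ending right after the hit
print(first_inframe_stop(toy, 0, 9))  # index of the downstream complete stop
```

Both candidates exist in the toy sequence, which is the situation discussed in this thread: an annotator has to decide between the short T/TA ending and the complete stop codon further downstream.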
from mitoz.
In this case, the "...position.cds" file shows that the sequence being annotated is hitting itself in the reference database for COX2. Here is the relevant part of the file:
>gi_NC_030597_COX2_Sargocentron_spiniferum_230_aa_Ref1:230aa [mRNA] locus=NC_030597.1:7206:7895:+
ATGGCACATCCCTCTCAACTAGGATTCCAAGATGCGGCTTCACCCGTTAT
AGAAGAGCTCCTTCACTTCCACGACCACGCTTTAATAATCGTCTTTCTAA
TTAGCACACTAGTTCTTTACATTATTGTGGCGATAGTCTCCACTAAACTA
ACCAACAAATATATCCTCGACTCCCAAGAAATCGAAATTATCTGAACAGT
ACTCCCTGCAGTAATTCTTATCCTAATTGCCCTCCCCTCACTACGAATTC
TTTATCTTATGGATGAAATTAATGACCCACACCTAACTATTAAAGCAATA
GGACACCAATGATACTGAAGCTACGAATATACTGATTACGAGGATCTTGG
CTTCGACTCTTATATAATTCCTACCCAAGACCTTACCCCAGGACAATTCC
GCCTCCTAGAAGCAGACCATCGAATAGTTATCCCAATTGAATCCCCTATT
CGTGTTCTAGTCTCAGCCGAAGACGTCCTACACTCATGAGCAGTTCCAGC
ACTAGGCGTTAAAATAGACGCAGTGCCTGGCCGACTAAACCAAACAGCCT
TTATTACATCCCGCCCAGGTGTATTCTACGGTCAATGCTCCGAAATCTGC
GGCGCAAACCACAGCTTTATACCCATCGTCGTTGAAGCTGTCCCACTAGA
ACACTTTGAAAACTGATCCTCTATAATACTTGAAGACGCT
However, although the alignment occurs at positions 7206:7895, the final annotation of the gene is 7206:7916, extending 19 bp into the tRNA-Lys despite a T/TA in the 'correct' stop position. As far as I can tell this is not due to an issue with length. I downloaded mitogenomes for 4 similar taxa and they all align very well (codon or nucleotide based alignment) and end at the same T or TA. They are also all the same length, differing by only 1 bp depending on whether the stop codon was identified as T or TA.
I do realize that we often do not know the "correct" position when working with novel taxa, but I chose this Ref Seq mitogenome as a reproducible example of something that I am seeing quite frequently in my data (~800 fish mitogenomes). To me, this seems like a clear case where the stop position in the original annotation should be properly re-identified by MitoZ. Perhaps I am missing something about how length enters into the delineation of the start/stop positions? I did not see anything about that in the original paper or documentation, so I am just trying to understand why the annotation is incorrect.
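For anyone checking their own files, the coordinates above can be compared with a few lines of Python. This is a rough sketch: the `locus=` field layout is inferred only from the single header quoted above, coordinates are assumed to be 1-based and inclusive, and `parse_locus` is a made-up helper, not part of MitoZ.

```python
import re

# Field layout inferred from the one header quoted in this thread (assumption).
LOCUS_RE = re.compile(
    r"locus=(?P<acc>[^:\s]+):(?P<start>\d+):(?P<end>\d+):(?P<strand>[+-])"
)

def parse_locus(header):
    """Extract (accession, start, end, strand) from a cds.position header."""
    m = LOCUS_RE.search(header)
    if m is None:
        raise ValueError("no locus= field in header")
    return m["acc"], int(m["start"]), int(m["end"]), m["strand"]

header = (">gi_NC_030597_COX2_Sargocentron_spiniferum_230_aa_Ref1:230aa "
          "[mRNA] locus=NC_030597.1:7206:7895:+")
acc, start, end, strand = parse_locus(header)

hit_len = end - start + 1        # length of the initial hit, in nt
annotated_end = 7916             # end of the final MitoZ annotation (from above)
extension = annotated_end - end  # nt added beyond the initial hit

print(acc, hit_len, hit_len // 3, extension, extension // 3)
# prints: NC_030597.1 690 230 21 7
```

The hit is exactly 230 codons, matching the 230 aa in the header, and the final annotation ends 21 nt (7 codons) further downstream. The 19 bp quoted above is the overlap with tRNA-Lys, whose start coordinate is not given in this thread.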
Could you please tell me the accessions for the "4 similar taxa"? Thanks! I would like to check them further.
When I check this COX2 of NC_030597.1, I do find a standard stop codon after the extension (the first line below). We can check (at the protein level) whether this is also true for the other 4 similar taxa. If all taxa show a similar pattern (conserved protein sequence in the extended region), then we can probably say that the NCBI Ref annotation for this gene is wrong.
Perhaps in the future, we can add an option to ask MitoZ not to do such a greedy extension if users don't want it, to make the annotation result more similar to the reference gene (although the reference is not necessarily correct).
It looks like you compared the results with a similar set of reference sequences as the ones I used. Thank you for looking into it. I don't have any additional data to determine which is the 'correct' annotation. However, it appears that for this gene, in the Holocentridae, the NCBI annotations are fairly consistent, all ending before the neighboring tRNA. This is also consistent with annotations produced by other annotation software (e.g. Mitos2, MitoAnnotator). I have also had many of my NCBI submissions rejected when the annotation differs from all of the annotations of currently accepted submissions in this way (for this and other taxonomic groups / genes). So, whether or not there is uncertainty about which is correct, there does seem to be a prevailing consensus.
Thanks for the information.
Maybe we can first determine the boundaries of the tRNA genes and then use them to constrain the boundaries of the PCGs, to avoid overlaps between PCGs and tRNAs. (However, I remember that in some clades at least some genes do overlap, such as ATP6 and ATP8, right?)
But anyway, if the goal of a study is phylogenetic analysis using PCGs, or gene order rearrangement analysis, this kind of issue doesn't matter at all as far as I can see.
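The tRNA-first idea could look roughly like the sketch below. It is only an illustration, not an existing MitoZ feature: `clip_cds_end` is a made-up name, it handles plus-strand genes only, and coordinates are assumed to be 1-based and inclusive. The tRNA-Lys start (7898) is derived from the 19 bp overlap reported earlier in the thread, and its end (7970) is a placeholder.

```python
def clip_cds_end(cds, trnas, allowed_overlap=0):
    """Clip the 3' end of a plus-strand CDS so that it overlaps a downstream
    tRNA by at most allowed_overlap bases.

    Returns the clipped (start, end) and the number of bases left over after
    the last complete codon (0, 1 or 2). A leftover of 1 or 2 would have to
    be T or TA in the sequence to count as a truncated stop.
    """
    start, end = cds
    for t_start, t_end in sorted(trnas):
        # Only a tRNA that begins inside the CDS can shorten it.
        if start < t_start <= end:
            end = min(end, t_start - 1 + allowed_overlap)
    leftover = (end - start + 1) % 3
    return (start, end), leftover

cox2 = (7206, 7916)        # final annotation discussed in this thread
trna_lys = (7898, 7970)    # start derived from the 19 bp overlap; end is a placeholder
print(clip_cds_end(cox2, [trna_lys]))
# prints: ((7206, 7897), 2)
```

Because only tRNA intervals are used as constraints, overlaps between two PCGs such as ATP8 and ATP6 are left untouched, and `allowed_overlap` can be raised for clades where a short PCG/tRNA overlap is expected.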