MitoZ version? 3.4 How did you install MitoZ? (


Incorrect stop codon about mitoz HOT 8 OPEN

dleopold commented on July 24, 2024

Incorrect stop codon


Comments (8)

linzhi2013 commented on July 24, 2024

Hi dleopold,

Thank you very much for reporting the issue. I will check it when I have more time.

I agree with your point "It would seem more appropriate to stop at the same position as the reference if possible, particularly if the reference is highly similar", provided the reference annotation is correct (but in most cases, we don't know the truth).

MitoZ checks the initial CDS ranges from the file DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.cds.position (https://github.com/linzhi2013/MitoZ/wiki/Tutorial#3-the-annotation-step), and tries to find the stop codons or T/TA endings.
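As a rough illustration of the two kinds of ending described above, the sketch below reports both candidates for a CDS whose reference hit ends at a known position: an incomplete T/TA ending right after the hit, and the first complete in-frame stop codon found by extending downstream. This is not MitoZ's code. The function name `stop_candidates`, the toy sequence, and the choice of the vertebrate mitochondrial genetic code (NCBI translation table 2) are assumptions made for the example.

```python
# Assumption: vertebrate mitochondrial code (NCBI translation table 2).
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def stop_candidates(seq, cds_start, hit_end):
    """Return (incomplete_ending, extended_end) for one CDS.

    Coordinates are 1-based and inclusive. The hit is assumed to cover
    whole codons, so in-frame scanning starts right after hit_end.
    """
    if (hit_end - cds_start + 1) % 3 != 0:
        raise ValueError("hit must cover whole codons")

    # Candidate 1: an incomplete stop (T or TA) directly after the hit.
    tail = seq[hit_end:hit_end + 2]
    incomplete = None
    if tail == "TA":
        incomplete = "TA"
    elif tail[:1] == "T":
        incomplete = "T"

    # Candidate 2: the first complete in-frame stop codon downstream.
    extended_end = None
    for i in range(hit_end, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            extended_end = i + 3  # 1-based position of the stop's last base
            break
    return incomplete, extended_end


# Toy sequence (hypothetical): ATG GCA | TAC GGT TAA GG
toy = "ATGGCATACGGTTAAGG"
print(stop_candidates(toy, 1, 6))  # ('TA', 15)
```

Both candidates exist in the toy case, which is the situation discussed in this thread: which one ends up in the final annotation depends on the annotator's policy, not on the sequence alone.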

So, we can look at the file DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.cds.position to confirm that the annotation hits the reference mitogenome (NC_030597). If it hits another reference, that would explain the current result.

Secondly, if it hits the reference genome (NC_030597), the gene length information might start to have an effect on the annotation process (the COX2 on NC_030597 might be too short). You can compare the COX2 on NC_030597 with the COX2 of other closely related species, do a multiple sequence alignment (codon-based version), and see if all species end at the same position.


dleopold commented on July 24, 2024

In this case, the "...cds.position" file shows that the sequence being annotated is hitting itself in the reference database for COX2. Here is the relevant part of the file:

>gi_NC_030597_COX2_Sargocentron_spiniferum_230_aa_Ref1:230aa	[mRNA]	locus=NC_030597.1:7206:7895:+
ATGGCACATCCCTCTCAACTAGGATTCCAAGATGCGGCTTCACCCGTTAT
AGAAGAGCTCCTTCACTTCCACGACCACGCTTTAATAATCGTCTTTCTAA
TTAGCACACTAGTTCTTTACATTATTGTGGCGATAGTCTCCACTAAACTA
ACCAACAAATATATCCTCGACTCCCAAGAAATCGAAATTATCTGAACAGT
ACTCCCTGCAGTAATTCTTATCCTAATTGCCCTCCCCTCACTACGAATTC
TTTATCTTATGGATGAAATTAATGACCCACACCTAACTATTAAAGCAATA
GGACACCAATGATACTGAAGCTACGAATATACTGATTACGAGGATCTTGG
CTTCGACTCTTATATAATTCCTACCCAAGACCTTACCCCAGGACAATTCC
GCCTCCTAGAAGCAGACCATCGAATAGTTATCCCAATTGAATCCCCTATT
CGTGTTCTAGTCTCAGCCGAAGACGTCCTACACTCATGAGCAGTTCCAGC
ACTAGGCGTTAAAATAGACGCAGTGCCTGGCCGACTAAACCAAACAGCCT
TTATTACATCCCGCCCAGGTGTATTCTACGGTCAATGCTCCGAAATCTGC
GGCGCAAACCACAGCTTTATACCCATCGTCGTTGAAGCTGTCCCACTAGA
ACACTTTGAAAACTGATCCTCTATAATACTTGAAGACGCT

However, although the alignment occurs at positions 7206:7895, the final annotation of the gene is 7206:7916, extending 19 bp into the tRNA-Lys despite a T/TA in the 'correct' stop position. As far as I can tell, this is not due to an issue with length: I downloaded mitogenomes for 4 similar taxa and they all align very well (codon or nucleotide based alignment) and end at the same T or TA. They are also all the same length, differing by only 1 bp depending on whether the stop codon was identified as T or TA.
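The coordinate arithmetic can be checked directly. The short sketch below uses only the positions quoted in this comment and treats them as 1-based, inclusive ranges (an assumption, though it agrees with the 230 aa stated in the FASTA header); `span_length` is a helper named for this example.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


hit = span_length(7206, 7895)    # alignment to the reference COX2
final = span_length(7206, 7916)  # final annotation in the output

print(hit, hit // 3, hit % 3)        # 690 230 0
print(final, final // 3, final % 3)  # 711 237 0
print(final - hit)                   # 21
```

The hit spans exactly 230 whole codons, so a T or TA ending would sit immediately after position 7895. The final annotation is 21 nt (7 codons) longer and is also a multiple of 3, which is what one would expect if the extension ran in frame to a complete stop codon.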

I do realize that we often do not know the "correct" position when working with novel taxa, but I chose this Ref Seq mitogenome as a reproducible example of something that I am seeing quite frequently in my data (~800 fish mitogenomes). To me, this seems like a clear case where the stop position in the original annotation should be properly re-identified by MitoZ. Perhaps I am missing something about how length enters into the delineation of the start/stop positions? I did not see anything about that in the original paper or documentation, so I am just trying to understand why the annotation is incorrect.


linzhi2013 commented on July 24, 2024

Could you please tell me the accessions for the "4 similar taxa"? Thanks! I would like to check them further.


linzhi2013 commented on July 24, 2024

When I check this COX2 of NC_030597.1, I do find a standard stop codon after the extension (the first line below). We can check (at the protein level) if this is also true for the other 4 similar taxa. If all taxa show a similar pattern (conserved proteins in the extended region), then the NCBI Ref annotation for this gene is probably wrong.
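A protein-level check like this needs a translation under the right genetic code. The sketch below is a minimal, dependency-free translator, assuming the vertebrate mitochondrial code (NCBI translation table 2), in which TGA encodes W, ATA encodes M, and AGA/AGG are stops. The helper name `translate_mito` is hypothetical; the input is the first 48 nt of the COX2 sequence quoted earlier in this thread.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons ordered
# with bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_mito(seq):
    """Translate whole codons; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 48 nt of the COX2 record quoted above.
cox2_start = "ATGGCACATCCCTCTCAACTAGGATTCCAAGATGCGGCTTCACCCGTT"
print(translate_mito(cox2_start))  # MAHPSQLGFQDAASPV
```

Translating the extended region of each taxon this way and aligning the peptides would show whether the residues after the reference stop position are conserved, which is the evidence proposed here.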


linzhi2013 commented on July 24, 2024


linzhi2013 commented on July 24, 2024

Perhaps in the future, we can add an option that tells MitoZ not to do such a greedy extension if users don't want it, so that the annotation result is more similar to the reference gene (although the reference is not necessarily correct).


dleopold commented on July 24, 2024

It looks like you compared the results with a similar set of reference sequences as the ones I used. Thank you for looking into it. I don't have any additional data to determine which is the 'correct' annotation. However, it appears that for this gene, in the Holocentridae, the NCBI annotations are fairly consistent, all ending before the neighboring tRNA. This is also consistent with annotations produced by other annotation software (e.g. Mitos2, MitoAnnotator). I have also had many of my NCBI submissions rejected when the annotation differs from all of the annotations of currently accepted submissions in this way (for this and other taxonomic groups / genes). So, whether or not there is uncertainty about which is correct, there does seem to be a prevailing consensus.


linzhi2013 commented on July 24, 2024

Thanks for your information.

Maybe we can try to first determine the boundaries of tRNA genes and then set some constraints on the boundaries of PCGs, to avoid the overlapping of PCGs and tRNAs. (However, I remember that in at least some clades, some PCGs do overlap, for example the ATP6 and ATP8 genes, right?)
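The idea of constraining PCG ends by tRNA boundaries could look roughly like the sketch below. It is not an existing MitoZ feature or option: the function name `constrain_cds_end`, its return labels, and the rule that an incomplete T/TA stop must sit immediately before the downstream tRNA are all assumptions made for illustration. Coordinates are 1-based and inclusive.

```python
def constrain_cds_end(seq, cds_start, extended_end, trna_start):
    """Pick a CDS end that avoids the downstream tRNA when possible.

    extended_end is the end of the complete stop codon found by greedy
    in-frame extension; trna_start is the first base of the next tRNA.
    """
    if extended_end < trna_start:
        return extended_end, "complete"  # no overlap, nothing to do

    # Try ending the CDS on the base just before the tRNA.
    end = trna_start - 1
    length = end - cds_start + 1
    if length % 3 == 1 and seq[end - 1] == "T":
        return end, "T"
    if length % 3 == 2 and seq[end - 2:end] == "TA":
        return end, "TA"

    # No plausible incomplete stop: keep the extension and flag the overlap.
    return extended_end, "complete (overlaps tRNA)"


# Toy sequence (hypothetical): CDS from 1, tRNA from 9, in-frame TAA ends at 15.
toy = "ATGGCATACGGTTAAGG"
print(constrain_cds_end(toy, 1, 15, 9))   # (8, 'TA')
print(constrain_cds_end(toy, 1, 15, 16))  # (15, 'complete')
```

The fallback branch keeps the extended end instead of forcing a truncation, so overlaps that cannot be explained by a T/TA ending are reported and left for manual review.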

But anyway, if the goal of a study is phylogenetic analysis using PCGs, or gene order rearrangement analysis, this kind of issue doesn't matter at all as far as I can see.
