Hi! Thank you for creating this excellent tool. I have a questio

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

About read_gff3 and intron visualization about gggenomes HOT 4 CLOSED

thackl commented on May 26, 2024

About read_gff3 and intron visualization

from gggenomes.

Comments (4)

thackl commented on May 26, 2024 2

@xvtyzn just wanted to say thanks again for reporting this issue. read_gff3() now by defaults checks and if warranted sorts exon/CDS coordinates for each gene in a file based on start coordinates.

Also, your data made me aware of an issue with reading gffs produced by Augustus. The CDS annotations in those files don't follow proper gff format, and therefore are not properly read/displayed - you probably noticed that as well. I might try to find a workaround for that in the future. See #65 for more info

Cheers
Thomas

from gggenomes.

thackl commented on May 26, 2024 1

Hi Keigo,

thanks a lot for bringing this to our attention. The intron/exon support is still quite experimental and getting feedback like this really helps! And great job on spotting the issue in the code.

Your fix is a nice one! Although, I think that ideally read_gff() should not get the order of exons wrong in the first place, i.e. read them as they are given in the gff file. Then resorting would not be necessary. I'm currently on holidays. I'll look into this once I'm back!

from gggenomes.

xvtyzn commented on May 26, 2024

I solved the cause of this error.
It is caused by mrna_exon_introns in the read_gff3 function.

mrna_exon_introns <- filter(x, type == "exon") %>% select(exon_id = feat_id, 
        start, end, feat_id = parent_ids) %>% unchop(feat_id) %>% 
        group_by(feat_id) %>% summarize(introns = list(coords2introns(start, end)))

# A tibble: 14 x 4
# Groups:   feat_id [1]
   exon_id                     start     end feat_id           
   <chr>                       <int>   <int> <chr>             
 1 aten_0.1.m1.802.m1.exon1  9916243 9919080 aten_0.1.m1.802.m1
 2 aten_0.1.m1.802.m1.exon10 9930781 9931032 aten_0.1.m1.802.m1
 3 aten_0.1.m1.802.m1.exon11 9932680 9932805 aten_0.1.m1.802.m1
 4 aten_0.1.m1.802.m1.exon12 9935662 9935778 aten_0.1.m1.802.m1
 5 aten_0.1.m1.802.m1.exon13 9935983 9936081 aten_0.1.m1.802.m1
 6 aten_0.1.m1.802.m1.exon14 9938879 9939091 aten_0.1.m1.802.m1
 7 aten_0.1.m1.802.m1.exon2  9920151 9920639 aten_0.1.m1.802.m1
 8 aten_0.1.m1.802.m1.exon3  9921529 9921776 aten_0.1.m1.802.m1
 9 aten_0.1.m1.802.m1.exon4  9922434 9922598 aten_0.1.m1.802.m1
10 aten_0.1.m1.802.m1.exon5  9924117 9924315 aten_0.1.m1.802.m1
11 aten_0.1.m1.802.m1.exon6  9924592 9924801 aten_0.1.m1.802.m1
12 aten_0.1.m1.802.m1.exon7  9927854 9927979 aten_0.1.m1.802.m1
13 aten_0.1.m1.802.m1.exon8  9928552 9928674 aten_0.1.m1.802.m1
14 aten_0.1.m1.802.m1.exon9  9930577 9930693 aten_0.1.m1.802.m1

The problem can be solved by sorting the start column for this exon tibble using arrange function.

mrna_exon_introns <- filter(x, type == "exon") %>% select(exon_id = feat_id, 
        start, end, feat_id = parent_ids) %>% unchop(feat_id) %>% 
        group_by(feat_id) %>% arrange(start) %>% summarize(introns = list(coords2introns(start, end)))

# A tibble: 14 x 4
# Groups:   feat_id [1]
   exon_id                     start     end feat_id           
   <chr>                       <int>   <int> <chr>             
 1 aten_0.1.m1.802.m1.exon1  9916243 9919080 aten_0.1.m1.802.m1
 2 aten_0.1.m1.802.m1.exon2  9920151 9920639 aten_0.1.m1.802.m1
 3 aten_0.1.m1.802.m1.exon3  9921529 9921776 aten_0.1.m1.802.m1
 4 aten_0.1.m1.802.m1.exon4  9922434 9922598 aten_0.1.m1.802.m1
 5 aten_0.1.m1.802.m1.exon5  9924117 9924315 aten_0.1.m1.802.m1
 6 aten_0.1.m1.802.m1.exon6  9924592 9924801 aten_0.1.m1.802.m1
 7 aten_0.1.m1.802.m1.exon7  9927854 9927979 aten_0.1.m1.802.m1
 8 aten_0.1.m1.802.m1.exon8  9928552 9928674 aten_0.1.m1.802.m1
 9 aten_0.1.m1.802.m1.exon9  9930577 9930693 aten_0.1.m1.802.m1
10 aten_0.1.m1.802.m1.exon10 9930781 9931032 aten_0.1.m1.802.m1
11 aten_0.1.m1.802.m1.exon11 9932680 9932805 aten_0.1.m1.802.m1
12 aten_0.1.m1.802.m1.exon12 9935662 9935778 aten_0.1.m1.802.m1
13 aten_0.1.m1.802.m1.exon13 9935983 9936081 aten_0.1.m1.802.m1
14 aten_0.1.m1.802.m1.exon14 9938879 9939091 aten_0.1.m1.802.m1

Thanks!

Keigo

from gggenomes.

xvtyzn commented on May 26, 2024

Thank you very much!

I really like gggenomes package, and I'm willing to help test it. Have a nice holiday!

from gggenomes.

About read_gff3 and intron visualization about gggenomes HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent