brennanpincardiff / drawproteins
Creating package to draw proteins from Uniprot API
License: Other
Hi,
Thank you for the great package!
I wonder if there is a way to draw multiple isoforms under a single UniProt ID. For example, there are three human NFkB isoforms annotated from the gene: P19838-1, P19838-2, and P19838-3. Can I draw them together to compare? It would be very helpful if I could do this.
Best,
Hi. I'm using drawProteins to plot the domains of various genes. As a test, I followed the script on this page:
http://rforbiochemists.blogspot.com/2018/02/my-drawproteins-package-has-been.html
This worked perfectly well.
When trying to draw the chains for the gene MARCH5 (https://www.uniprot.org/uniprot/Q9NX47#family_and_domains), I'm proceeding as follows:
`
prot_data <- drawProteins::get_features("Q9NX47")
# produce data frame
prot_data <- drawProteins::feature_to_dataframe(prot_data)
# make protein schematic
p <- draw_canvas(prot_data)
p <- draw_chains(p, prot_data)
p <- draw_domains(p, prot_data)
`
I get a grey chain with no domains (see attached picture).
However, when checking prot_data, I can see that there are domain features listed:
> prot
type
featuresTemp CHAIN
featuresTemp.1 TRANSMEM
featuresTemp.2 TRANSMEM
featuresTemp.3 TRANSMEM
featuresTemp.4 TRANSMEM
featuresTemp.5 ZN_FING
featuresTemp.6 MUTAGEN
featuresTemp.7 MUTAGEN
featuresTemp.8 MUTAGEN
description
featuresTemp   E3 ubiquitin-protein ligase MARCH5
featuresTemp.1 Helical
featuresTemp.2 Helical
featuresTemp.3 Helical
featuresTemp.4 Helical
featuresTemp.5 RING-CH-type
featuresTemp.6 Loss of ubiquitin ligase activity, formation of highly interconnected mitochondria, change in mitochondria morphology that in turns triggers senescence, and perinuclear accumulation
featuresTemp.7 Loss of E3 ubiquitin ligase activity. Formation of highly interconnected mitochondria and perinuclear accumulation; when associated with S-68
featuresTemp.8 Loss of E3 ubiquitin ligase activity. Formation of highly interconnected mitochondria and perinuclear accumulation; when associated with S-65
begin end length accession entryName taxid order
featuresTemp 1 278 277 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.1 99 119 20 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.2 139 159 20 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.3 209 229 20 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.4 238 258 20 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.5 6 75 69 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.6 43 43 0 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.7 65 65 0 Q9NX47 MARH5_HUMAN 9606 1
featuresTemp.8 68 68 0 Q9NX47 MARH5_HUMAN 9606 1
Did I do something wrong, or is there simply no more data to be retrieved?
Thanks in advance
`Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8 LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] ggplot2_3.1.0 drawProteins_1.2.0
[3] ensembldb_2.6.3 AnnotationFilter_1.6.0
[5] GenomicFeatures_1.34.1 AnnotationDbi_1.44.0
[7] Biobase_2.41.2 GenomicRanges_1.32.4
[9] GenomeInfoDb_1.18.1 IRanges_2.15.16
[11] S4Vectors_0.19.19 BiocGenerics_0.28.0
[13] nvimcom_0.9-58
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 lattice_0.20-38
[3] prettyunits_1.0.2 Rsamtools_1.33.3
[5] Biostrings_2.49.0 assertthat_0.2.0
[7] digest_0.6.15 R6_2.3.0
[9] plyr_1.8.4 RSQLite_2.1.1
[11] httr_1.3.1 pillar_1.3.0
[13] zlibbioc_1.28.0 rlang_0.3.0.1
[15] progress_1.2.0 lazyeval_0.2.1
[17] curl_3.2 blob_1.1.1
[19] Matrix_1.2-15 labeling_0.3
[21] BiocParallel_1.15.8 stringr_1.3.1
[23] ProtGenerics_1.14.0 RCurl_1.95-4.11
[25] bit_1.1-14 biomaRt_2.38.0
[27] munsell_0.5.0 DelayedArray_0.7.21
[29] compiler_3.5.1 rtracklayer_1.41.3
[31] pkgconfig_2.0.2 tidyselect_0.2.4
[33] SummarizedExperiment_1.10.1 tibble_1.4.2
[35] GenomeInfoDbData_1.2.0 matrixStats_0.54.0
[37] XML_3.98-1.12 withr_2.1.2
[39] crayon_1.3.4 dplyr_0.7.7
[41] GenomicAlignments_1.16.0 bitops_1.0-6
[43] grid_3.5.1 jsonlite_1.5
[45] gtable_0.2.0 DBI_1.0.0
[47] magrittr_1.5 scales_1.0.0
[49] stringi_1.2.4 XVector_0.21.3
[51] bindrcpp_0.2.2 tools_3.5.1
[53] bit64_0.9-7 glue_1.3.0
[55] purrr_0.2.5 hms_0.4.2
[57] colorspace_1.3-2 memoise_1.1.0
[59] bindr_0.1.1 `
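One possible explanation, offered as a sketch and not a confirmed diagnosis: the modified draw_domains() posted further down this page filters on data$type == "DOMAIN", and the Q9NX47 table above contains TRANSMEM and ZN_FING rows but no DOMAIN rows, so there may simply be nothing for draw_domains() to draw. The example below does not use the package's own drawing functions. It rebuilds a subset of the table by hand and draws those feature types directly with ggplot2::geom_rect(). The object names chain, feats and n_domain are illustrative.

```r
library(ggplot2)

# Subset of the Q9NX47 feature table, copied by hand from the output above
prot_data <- data.frame(
  type = c("CHAIN", "TRANSMEM", "TRANSMEM", "TRANSMEM", "TRANSMEM", "ZN_FING"),
  description = c("E3 ubiquitin-protein ligase MARCH5",
                  "Helical", "Helical", "Helical", "Helical", "RING-CH-type"),
  begin = c(1, 99, 139, 209, 238, 6),
  end = c(278, 119, 159, 229, 258, 75),
  order = 1,
  stringsAsFactors = FALSE
)

# Count rows of type DOMAIN: zero here, which would leave nothing to draw
n_domain <- sum(prot_data$type == "DOMAIN")

chain <- prot_data[prot_data$type == "CHAIN", ]
feats <- prot_data[prot_data$type %in% c("TRANSMEM", "ZN_FING"), ]

# Draw the chain in grey, then the other feature types coloured by type
p <- ggplot() +
  geom_rect(data = chain,
            aes(xmin = begin, xmax = end,
                ymin = order - 0.2, ymax = order + 0.2),
            fill = "grey80", colour = "grey40") +
  geom_rect(data = feats,
            aes(xmin = begin, xmax = end,
                ymin = order - 0.25, ymax = order + 0.25, fill = type)) +
  labs(x = "Amino acid number", y = NULL, fill = "Feature type")
```

Printing p should show the grey chain with four transmembrane boxes and one zinc-finger box.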
Hello there,
Today I started to learn how to use this great package. It is going to be very useful for at least two publications I am trying to wrap up :)
I am interested in using these schematics to show the effect of some alternative splicing events on domains. Do you know of an easy way of plotting the exons onto the proteins? I think I should be able to extract this information manually and then use the draw_domains function. However, it would be handy if you know of a smoother way to extract this information automatically.
Any guidance or help on this would be much appreciated!
Thanks
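For the manual route mentioned above, a minimal sketch of the coordinate arithmetic: cumulative coding-exon lengths in nucleotides are divided by 3 to get amino-acid positions. The exon lengths, the accession "EXAMPLE", the feature type "EXON" and the helper exon_to_protein_coords() are all hypothetical and only illustrate the conversion. Real exon boundaries would have to come from an annotation source.

```r
# Hypothetical coding-exon lengths in nucleotides (illustration only, not real data)
exon_cds_nt <- c(120, 90, 150)

# Convert consecutive coding-exon lengths into amino-acid coordinates.
# A codon split across an exon junction is assigned to both neighbouring exons.
exon_to_protein_coords <- function(cds_lengths_nt, accession, order = 1) {
  cds_end <- cumsum(cds_lengths_nt)               # last nucleotide of each exon
  cds_start <- c(1, head(cds_end, -1) + 1)        # first nucleotide of each exon
  data.frame(
    type = "EXON",                                # hypothetical feature type
    description = paste("Exon", seq_along(cds_lengths_nt)),
    begin = ceiling(cds_start / 3),               # residue containing the first nucleotide
    end = ceiling(cds_end / 3),                   # residue containing the last nucleotide
    accession = accession,
    order = order,
    stringsAsFactors = FALSE
  )
}

exons <- exon_to_protein_coords(exon_cds_nt, accession = "EXAMPLE")
```

The resulting rows use the same begin, end and order columns as the feature tables on this page, so they could be appended to a feature data frame and drawn with geom_rect(), or with draw_domains() if the installed version accepts a type argument like the modified function posted on this page.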
Hi, I was wondering if it is possible to get a description for the legend. When drawing all features of the protein, it is unclear what the legend is actually describing: is it a motif, a repeat, etc.?
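One way this could be approached, as a sketch assuming the feature data frame has the type and description columns seen in the tables on this page: paste the two columns into a single label and map that to fill, so each legend entry states what kind of feature it is. The rows are copied from the Q8I1S9 entries in a table on this page, and the column name legend_label is an assumption.

```r
library(ggplot2)

# Three feature rows for Q8I1S9, copied by hand from a table on this page
feats <- data.frame(
  type = c("REGION", "COMPBIAS", "DOMAIN"),
  description = c("Disordered", "Polar", "RING-type"),
  begin = c(560, 560, 79),
  end = c(611, 599, 117),
  order = 1,
  stringsAsFactors = FALSE
)

# Combine feature type and description into one legend label
feats$legend_label <- paste0(feats$type, ": ", feats$description)

p <- ggplot(feats) +
  geom_rect(aes(xmin = begin, xmax = end,
                ymin = order - 0.25, ymax = order + 0.25,
                fill = legend_label)) +
  labs(fill = "Feature (type: description)")
```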
As discussed @CardiffRUG, there must be a way to export an SVG from ggplot and then set up a unit test to compare the objects.
This shouldn't be too difficult but will really help plotting control.
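An alternative to comparing exported SVG files, sketched here as an assumption and not something the package is known to do: compare the data ggplot2 computes for a layer, using ggplot2::layer_data(). The chain coordinates are copied from a table on this page, and stopifnot() stands in for whatever test framework is used.

```r
library(ggplot2)

# Two chains (begin, end, row position) copied from a table on this page
chains <- data.frame(begin = c(1, 1), end = c(1446, 875), order = c(1, 2))

p <- ggplot() +
  geom_rect(data = chains,
            aes(xmin = begin, xmax = end,
                ymin = order - 0.2, ymax = order + 0.2))

# Extract the computed data for the first layer of the plot
built <- layer_data(p, 1)

# Check that the drawn rectangles match the input coordinates
stopifnot(
  isTRUE(all.equal(built$xmin, chains$begin)),
  isTRUE(all.equal(built$xmax, chains$end)),
  isTRUE(all.equal(built$ymin, chains$order - 0.2))
)
```

This checks geometry but not appearance. For image-level comparisons, a snapshot tool such as the vdiffr package may be worth evaluating.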
I think it's a normal function in ggplot2...
This can be done manually I think - maybe we need a use case....
Hey, great package!
I am very, very new to R.
Would it be possible to use drawProteins to create the protein schematic and put it directly below an X axis showing amino acid number, with Y as something else (e.g. FATHMM scores)?
Figure from: Millán Ortiz, Nicolas Guex, Etienne Patin, Olivier Martin, Ioannis Xenarios, Angela Ciuffi, Lluís Quintana-Murci, Amalio Telenti, Evolutionary Trajectories of Primate Genes Involved in HIV Pathogenesis, Molecular Biology and Evolution, Volume 26, Issue 12, December 2009, Pages 2865–2875, https://doi.org/10.1093/molbev/msp197
My thought process was:
Draw the protein,
set it to y = -n (where n is whatever looks best!),
then plot my actual Y values.
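The steps above can be sketched with plain ggplot2, under some loudly stated assumptions: the scores are synthetic placeholders (not FATHMM output), the feature coordinates are copied from the Q9NX47 table on this page, and the names y_chain and half are made up for the vertical placement. Whether the same layering works on top of a drawProteins canvas has not been checked here.

```r
library(ggplot2)

# Synthetic placeholder scores, one per residue; replace with real per-residue values
scores <- data.frame(position = 1:278)
scores$score <- abs(sin(scores$position / 20))

y_chain <- -0.2   # vertical centre of the schematic, below y = 0
half <- 0.05      # half-height of the chain bar

# Chain and features for Q9NX47, copied from the table on this page
chain <- data.frame(begin = 1, end = 278,
                    ymin = y_chain - half, ymax = y_chain + half)
feats <- data.frame(
  type = c("ZN_FING", "TRANSMEM", "TRANSMEM", "TRANSMEM", "TRANSMEM"),
  begin = c(6, 99, 139, 209, 238),
  end = c(75, 119, 159, 229, 258),
  ymin = y_chain - 2 * half,
  ymax = y_chain + 2 * half
)

p <- ggplot() +
  geom_line(data = scores, aes(x = position, y = score)) +
  geom_rect(data = chain,
            aes(xmin = begin, xmax = end, ymin = ymin, ymax = ymax),
            fill = "grey80") +
  geom_rect(data = feats,
            aes(xmin = begin, xmax = end, ymin = ymin, ymax = ymax, fill = type)) +
  scale_y_continuous(breaks = seq(0, 1, 0.25)) +   # label only the score range
  labs(x = "Amino acid number", y = "Score", fill = "Feature type")
```

Keeping the rectangle heights in data columns, instead of passing them as fixed values, lets ggplot2 extend the y axis far enough to show the schematic.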
I'm trying to show 7 proteins of different CoVs. For 4 of these proteins (red arrows), only part of the protein, S2 (subunit 2), is shown.
I don't understand why this happens. I checked whether the proteins in UniProt are complete, and everything looks fine.
Here is my script:
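library(drawProteins)
library(ggplot2)
# download the features for the seven accessions and convert them to a data frame
data <- drawProteins::get_features("P0DTC2 A0A6B9WHD3 A0A6M3G9R1 P59594 E0XIZ3 K9N5Q8 U5NJG5")
real_data <- drawProteins::feature_to_dataframe(data)
# build the schematic layer by layer
p <- draw_canvas(real_data)
p <- draw_chains(p, real_data, label_size = 2.5)
p <- draw_domains(p, real_data, show.legend = FALSE, label_domains = FALSE)
p <- draw_motif(p, real_data)
# strip grid lines, y-axis text, ticks and the panel border
p <- p + theme_bw(base_size = 10) +
    theme(panel.grid.minor = element_blank(),
          panel.grid.major = element_blank()) +
    theme(axis.ticks = element_blank(),
          axis.text.y = element_blank()) +
    theme(panel.border = element_blank())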
So there is clearly an issue in these protein drawings: only S2 is shown.
Is there a way to force the code to show the complete protein?
Here is the plot:
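A possible cause, not verified against these accessions: an entry may list several CHAIN features (for example a full-length chain plus processed subunits), in which case overlapping chains could be drawn on top of each other and only the last label would remain readable. Under that assumption, one workaround is to keep only the longest CHAIN row per accession before drawing. The helper keep_longest_chain() and the toy data below are hypothetical and are not part of drawProteins.

```r
# Keep only the longest CHAIN row per accession; all other feature rows are kept.
keep_longest_chain <- function(df) {
  is_chain <- df$type == "CHAIN"
  chains <- df[is_chain, ]
  span <- chains$end - chains$begin
  # Within each accession, flag the first chain with the largest span
  longest <- ave(span, chains$accession,
                 FUN = function(x) seq_along(x) == which.max(x)) == 1
  rbind(chains[longest, ], df[!is_chain, ])
}

# Toy data (not real UniProt entries): TOY1 has one full-length and two partial chains
toy <- data.frame(
  type = c("CHAIN", "CHAIN", "CHAIN", "DOMAIN", "CHAIN"),
  description = c("Full length", "Subunit 1", "Subunit 2",
                  "Example domain", "Full length"),
  begin = c(1, 1, 51, 10, 1),
  end = c(100, 50, 100, 40, 80),
  accession = c("TOY1", "TOY1", "TOY1", "TOY1", "TOY2"),
  order = c(1, 1, 1, 1, 2),
  stringsAsFactors = FALSE
)

filtered <- keep_longest_chain(toy)
```

With real data, the filtered data frame would then be passed to draw_canvas() and draw_chains() in place of real_data.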
Hello,
I installed the software following the instructions on the GitHub page. It can read; however, despite multiple reinstallations I always get the same error: Error in draw_canvas(rel_data) : could not find function "draw_canvas".
Do you know what can be the problem?
Roberto
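This kind of error usually means the package is installed but not attached in the current session, because R only finds unqualified function names from attached packages. A minimal check, assuming nothing beyond base R and that the package exports draw_canvas (as the scripts on this page suggest); the variable canvas_found is illustrative:

```r
# Attach drawProteins if it is installed, then check that draw_canvas is visible
if (requireNamespace("drawProteins", quietly = TRUE)) {
  library(drawProteins)
  canvas_found <- exists("draw_canvas", mode = "function")
} else {
  canvas_found <- FALSE
  message("drawProteins is not installed in this R library")
}
```

Alternatively, the function can be called with its namespace prefix, as in drawProteins::draw_canvas(rel_data), which does not require library().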
I'm trying to plot a figure similar to Fig. 1A at https://academic.oup.com/view-large/figure/77823485/molbiolevolmss081f01_3c.jpeg.
How can I plot the chain labels at the top left of each chain? It could also be useful to allow additional labels (e.g. protein size) to convey additional information.
Best,
Jan
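A minimal sketch of one way to do this with plain ggplot2, not a drawProteins feature: place a left-aligned geom_text() just above the start of each chain and build the label from the entry name and the chain end position. The two chains are copied from a table on this page. Using end as the protein size assumes the chain starts at residue 1, and the column name label is illustrative.

```r
library(ggplot2)

# Two chains copied from a table on this page
chains <- data.frame(
  entryName = c("C0H4G8_PLAF7", "Q8I1S9_PLAF7"),
  begin = c(1, 1),
  end = c(1446, 875),
  order = c(1, 2),
  stringsAsFactors = FALSE
)

# Label text: entry name plus size in amino acids
chains$label <- paste0(chains$entryName, " (", chains$end, " aa)")

p <- ggplot(chains) +
  geom_rect(aes(xmin = begin, xmax = end,
                ymin = order - 0.2, ymax = order + 0.2),
            fill = "grey80", colour = "grey40") +
  # hjust = 0 anchors the text at its left edge, vjust = 0 at its bottom edge
  geom_text(aes(x = begin, y = order + 0.3, label = label),
            hjust = 0, vjust = 0, size = 3)
```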
By default, drawProteins appears to use some sort of rainbow colour scheme, which becomes hard to read when more than 5 feature types/categories are displayed in the legend.
I think it would be great to have more control of the legend colours for protein features.
Best,
Jan
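Until such an option exists, one approach that may work is to add a manual fill scale, since the scripts on this page add ggplot2 themes to the returned plot in the same way. The sketch below uses plain ggplot2 with three feature rows copied from a table on this page. The colour values and the name feature_colours are arbitrary choices for illustration.

```r
library(ggplot2)

# Three feature rows for C0H4G8, copied from a table on this page
feats <- data.frame(
  type = c("TRANSMEM", "REGION", "COILED"),
  begin = c(20, 568, 328),
  end = c(39, 599, 348),
  order = 1,
  stringsAsFactors = FALSE
)

# Named vector: one colour per feature type (values chosen arbitrarily)
feature_colours <- c(TRANSMEM = "#1b9e77", REGION = "#d95f02", COILED = "#7570b3")

p <- ggplot(feats) +
  geom_rect(aes(xmin = begin, xmax = end,
                ymin = order - 0.25, ymax = order + 0.25, fill = type)) +
  scale_fill_manual(values = feature_colours, name = "Feature type")

# Colours actually used for the drawn rectangles
drawn <- layer_data(p, 1)$fill
```

A named vector keeps each colour tied to its category even when the set of features changes between plots.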
Sometimes the info in the description or other names are more useful.
I need to add a way to include/allow this.
While drawing receptors, if we want to draw type I and type II receptors on the same image, we need to be able to flip some of the chains. This is interesting and creates some challenges.
Might be better to plot separately and combine the plots... Can we keep the scale?
It's possible to get UniProt accession numbers out of biomaRt. Writing a vignette or at least a script to demonstrate this would be very helpful.
I was trying to figure out how to vertically flip domain labels for a large protein.
The issue was solved by modifying draw_domains function. Switched to 'geom_text' layer on top of 'geom_rect' layer (latter serves as a box for the text). The original function uses geom_label layer instead.
Also added scale_fill_npg() function from 'ggsci' package which is totally optional. Dropping this here in case it is useful for others.
library(ggsci)  # optional: only needed for scale_fill_npg()
draw_domains <- function(p,
                         data = data,
                         label_domains = TRUE,
                         label_size = 4,
                         show.legend = TRUE,
                         type = "DOMAIN"){
    # declare column names used inside aes() to avoid undefined-variable notes
    begin <- end <- description <- order <- NULL
    # keep only the rows of the requested feature type
    features <- data[data$type == type, ]
    # draw each feature as a filled rectangle on its chain
    p <- p + ggplot2::geom_rect(data = features,
                                mapping = ggplot2::aes(xmin = begin,
                                                       xmax = end,
                                                       ymin = order - 0.25,
                                                       ymax = order + 0.25,
                                                       fill = description),
                                show.legend = show.legend) +
        ggsci::scale_fill_npg()
    if(label_domains == TRUE){
        ## xmin, xmax values can be optimized for better visualization
        # grey box behind each label, placed around the feature midpoint
        p <- p + ggplot2::geom_rect(data = features,
                                    mapping = ggplot2::aes(xmin = begin + (end - begin)/2 - 40,
                                                           xmax = begin + (end - begin)/2 + 50,
                                                           ymin = order - 0.05,
                                                           ymax = order + 0.05),
                                    fill = "grey80", alpha = 0.75) +
            # label text rotated by 90 degrees, drawn on top of the box
            ggplot2::geom_text(data = features,
                               mapping = ggplot2::aes(x = begin + (end - begin)/2,
                                                      y = order,
                                                      label = description),
                               size = label_size, angle = 90)
    }
    return(p)
}
It would be good to have a way to draw receptors aligned by their transmembrane proteins. Could just rotate the image but that's crude. A proper way needs to be conceived.
Very useful package!
I have a similar issue as described at #13 (comment) trying to find the best solution to plot types currently not supported by any of the draw_* functions.
I'm trying to draw schematics for multiple proteins and I'm currently looking for the best way to draw coiled coil domains (prot_data$type == "COILED") and compositional bias regions (prot_data$type == "COMPBIAS").
My prot_data frame looks as follows:
> my.prot_data
type description begin end length accession entryName taxid order
1 CHAIN PF3D7_0530300 1 1446 1445 C0H4G8 C0H4G8_PLAF7 36329 1
2 TRANSMEM Helical 20 39 19 C0H4G8 C0H4G8_PLAF7 36329 1
3 TRANSMEM Helical 91 115 24 C0H4G8 C0H4G8_PLAF7 36329 1
4 TRANSMEM Helical 1422 1441 19 C0H4G8 C0H4G8_PLAF7 36329 1
5 REGION Disordered 568 599 31 C0H4G8 C0H4G8_PLAF7 36329 1
6 REGION Disordered 611 648 37 C0H4G8 C0H4G8_PLAF7 36329 1
7 COILED NONE 328 348 20 C0H4G8 C0H4G8_PLAF7 36329 1
8 TRANSMEM Helical 779 805 26 C0H4G8 C0H4G8_PLAF7 36329 1
9 TRANSMEM Helical 857 880 23 C0H4G8 C0H4G8_PLAF7 36329 1
10 TRANSMEM Helical 886 906 20 C0H4G8 C0H4G8_PLAF7 36329 1
11 TRANSMEM Helical 1252 1272 20 C0H4G8 C0H4G8_PLAF7 36329 1
12 TRANSMEM Helical 1292 1314 22 C0H4G8 C0H4G8_PLAF7 36329 1
13 TRANSMEM Helical 1326 1343 17 C0H4G8 C0H4G8_PLAF7 36329 1
14 TRANSMEM Helical 1363 1381 18 C0H4G8 C0H4G8_PLAF7 36329 1
15 TRANSMEM Helical 1393 1416 23 C0H4G8 C0H4G8_PLAF7 36329 1
16 CHAIN PF3D7_0415800 1 875 874 Q8I1S9 Q8I1S9_PLAF7 36329 2
17 REGION Disordered 560 611 51 Q8I1S9 Q8I1S9_PLAF7 36329 2
18 COMPBIAS Polar 560 599 39 Q8I1S9 Q8I1S9_PLAF7 36329 2
19 DOMAIN RING-type 79 117 38 Q8I1S9 Q8I1S9_PLAF7 36329 2
20 CHAIN PF3D7_0508900 1 3134 3133 Q8I414 Q8I414_PLAF7 36329 3
21 COILED NONE 3073 3093 20 Q8I414 Q8I414_PLAF7 36329 3
22 COMPBIAS Polar 728 745 17 Q8I414 Q8I414_PLAF7 36329 3
23 COMPBIAS Polyampholyte 746 794 48 Q8I414 Q8I414_PLAF7 36329 3
24 COMPBIAS Polyampholyte 931 954 23 Q8I414 Q8I414_PLAF7 36329 3
25 COMPBIAS Polyampholyte 1739 1759 20 Q8I414 Q8I414_PLAF7 36329 3
26 COMPBIAS Polar 1760 1799 39 Q8I414 Q8I414_PLAF7 36329 3
27 COMPBIAS Acidic 2487 2771 284 Q8I414 Q8I414_PLAF7 36329 3
28 REGION Disordered 817 844 27 Q8I414 Q8I414_PLAF7 36329 3
29 REGION Disordered 931 965 34 Q8I414 Q8I414_PLAF7 36329 3
30 REGION Disordered 1739 1801 62 Q8I414 Q8I414_PLAF7 36329 3
31 REGION Disordered 2335 2371 36 Q8I414 Q8I414_PLAF7 36329 3
32 REGION Disordered 2476 2771 295 Q8I414 Q8I414_PLAF7 36329 3
33 COILED NONE 660 680 20 Q8I414 Q8I414_PLAF7 36329 3
34 COILED NONE 862 882 20 Q8I414 Q8I414_PLAF7 36329 3
35 COILED NONE 1520 1540 20 Q8I414 Q8I414_PLAF7 36329 3
36 COILED NONE 2875 2895 20 Q8I414 Q8I414_PLAF7 36329 3
37 REGION Disordered 714 797 83 Q8I414 Q8I414_PLAF7 36329 3
38 CHAIN PF3D7_1229300 1 990 989 Q8I5C6 Q8I5C6_PLAF7 36329 4
39 REGION Disordered 83 106 23 Q8I5C6 Q8I5C6_PLAF7 36329 4
40 REGION Disordered 333 355 22 Q8I5C6 Q8I5C6_PLAF7 36329 4
41 REGION Disordered 429 453 24 Q8I5C6 Q8I5C6_PLAF7 36329 4
42 REGION Disordered 751 771 20 Q8I5C6 Q8I5C6_PLAF7 36329 4
43 COMPBIAS Polyampholyte 38 58 20 Q8I5C6 Q8I5C6_PLAF7 36329 4
44 COMPBIAS Polyampholyte 86 105 19 Q8I5C6 Q8I5C6_PLAF7 36329 4
45 REGION Disordered 38 71 33 Q8I5C6 Q8I5C6_PLAF7 36329 4
46 CHAIN PF3D7_0822900 1 1176 1175 Q8IB63 Q8IB63_PLAF7 36329 5
47 COMPBIAS Acidic 266 372 106 Q8IB63 Q8IB63_PLAF7 36329 5
48 COMPBIAS Polar 373 417 44 Q8IB63 Q8IB63_PLAF7 36329 5
49 REGION Disordered 976 995 19 Q8IB63 Q8IB63_PLAF7 36329 5
50 REGION Disordered 1010 1032 22 Q8IB63 Q8IB63_PLAF7 36329 5
51 COILED NONE 7 30 23 Q8IB63 Q8IB63_PLAF7 36329 5
52 COMPBIAS Basic 55 69 14 Q8IB63 Q8IB63_PLAF7 36329 5
53 COMPBIAS Polyampholyte 70 91 21 Q8IB63 Q8IB63_PLAF7 36329 5
54 COMPBIAS Polar 92 173 81 Q8IB63 Q8IB63_PLAF7 36329 5
55 COMPBIAS Polyampholyte 175 196 21 Q8IB63 Q8IB63_PLAF7 36329 5
56 COMPBIAS Basic 197 214 17 Q8IB63 Q8IB63_PLAF7 36329 5
57 COMPBIAS Polyampholyte 235 257 22 Q8IB63 Q8IB63_PLAF7 36329 5
58 REGION Disordered 53 425 372 Q8IB63 Q8IB63_PLAF7 36329 5
59 CHAIN PF3D7_1318700 1 749 748 Q8IEC9 Q8IEC9_PLAF7 36329 6
60 REGION Disordered 705 749 44 Q8IEC9 Q8IEC9_PLAF7 36329 6
61 COILED NONE 232 259 27 Q8IEC9 Q8IEC9_PLAF7 36329 6
62 COILED NONE 274 332 58 Q8IEC9 Q8IEC9_PLAF7 36329 6
63 COILED NONE 432 466 34 Q8IEC9 Q8IEC9_PLAF7 36329 6
64 COILED NONE 495 515 20 Q8IEC9 Q8IEC9_PLAF7 36329 6
65 COILED NONE 562 600 38 Q8IEC9 Q8IEC9_PLAF7 36329 6
66 COMPBIAS Polar 385 412 27 Q8IEC9 Q8IEC9_PLAF7 36329 6
67 REGION Disordered 385 415 30 Q8IEC9 Q8IEC9_PLAF7 36329 6
68 CHAIN PF3D7_1312800 1 2361 2360 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
69 COILED NONE 1001 1028 27 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
70 REGION Disordered 148 195 47 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
71 COMPBIAS Polyampholyte 61 87 26 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
72 COMPBIAS Polyampholyte 148 185 37 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
73 COMPBIAS Polyampholyte 1242 1315 73 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
74 COMPBIAS Polyampholyte 1646 1685 39 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
75 COMPBIAS Polar 1686 1718 32 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
76 COMPBIAS Polyampholyte 1719 1736 17 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
77 COMPBIAS Polyampholyte 1935 1969 34 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
78 COMPBIAS Acidic 1970 2017 47 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
79 COMPBIAS Polyampholyte 2046 2064 18 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
80 COMPBIAS Polar 2065 2109 44 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
81 COMPBIAS Polyampholyte 2110 2177 67 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
82 COMPBIAS Polar 2178 2194 16 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
83 COMPBIAS Polyampholyte 2195 2245 50 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
84 REGION Disordered 1229 1315 86 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
85 REGION Disordered 1404 1436 32 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
86 REGION Disordered 1638 1753 115 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
87 REGION Disordered 1786 1813 27 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
88 REGION Disordered 1935 2252 317 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
89 REGION Disordered 2341 2361 20 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
90 COILED NONE 282 302 20 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
91 COILED NONE 433 453 20 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
92 REGION Disordered 61 92 31 Q8IEJ4 Q8IEJ4_PLAF7 36329 7
93 CHAIN PF3D7_0308300 1 337 336 O77324 O77324_PLAF7 36329 8
I tried to use the following to draw coiled coil domains (which works):
### add COILED block in blue
p <- p + ggplot2::geom_rect(data = my.prot_data[my.prot_data$type == "COILED",],
mapping=ggplot2::aes(xmin=begin,
xmax=end,
ymin=order-0.2,
ymax=order+0.2),
fill = "blue")
p
Yet, I'm currently not sure what the best way is to add coiled coils to the legend?
Alternatively, I think I could just (manually) define coiled coils as domain types and maybe compositional bias as region type?!
I would be very happy about feedback and suggestions.
Many thanks in advance!
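One way to get such blocks into the legend, sketched with plain ggplot2 and not taken from the package: put a constant string inside aes(fill = ...) so that ggplot2 treats it as a legend category, then assign its colour with scale_fill_manual(). The rows are the Q8IEC9 entries copied from the table above. The legend labels "Coiled coil" and "Compositional bias", the colours and the object name demo are illustrative choices.

```r
library(ggplot2)

# Four Q8IEC9 rows copied from the table above
demo <- data.frame(
  type = c("CHAIN", "COILED", "COILED", "COMPBIAS"),
  description = c("PF3D7_1318700", "NONE", "NONE", "Polar"),
  begin = c(1, 232, 274, 385),
  end = c(749, 259, 332, 412),
  order = 1,
  stringsAsFactors = FALSE
)

p <- ggplot() +
  geom_rect(data = demo[demo$type == "CHAIN", ],
            aes(xmin = begin, xmax = end,
                ymin = order - 0.2, ymax = order + 0.2),
            fill = "grey80") +
  # A constant inside aes() becomes a legend entry
  geom_rect(data = demo[demo$type == "COILED", ],
            aes(xmin = begin, xmax = end,
                ymin = order - 0.2, ymax = order + 0.2,
                fill = "Coiled coil")) +
  geom_rect(data = demo[demo$type == "COMPBIAS", ],
            aes(xmin = begin, xmax = end,
                ymin = order - 0.2, ymax = order + 0.2,
                fill = "Compositional bias")) +
  scale_fill_manual(name = "Feature",
                    values = c("Coiled coil" = "blue",
                               "Compositional bias" = "orange"))
```

If the plot already has domains filled by description, those descriptions share the same fill legend, so the manual scale would need a colour for every category or could be left out to fall back on the default palette.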
Hello, I found this tool very useful for protein visualization. Would it be possible to include glycosylation sites in the graphs, based on the information on the UniProt website? I am quite new to R and it is very likely someone has already asked this, apologies.
Hello. I am very glad to know about this package.
I am working on a way to represent peptide coverage over a protein sequence (i. e. represent peptides identified by MS over the whole protein). I think this could be a very good implementation for this package.