karstenslab / microshades

This repo contains the R microshades package, which contains a color blind accessible color palette with 30 unique colors and functions for applying these colors to microbiome data.

Home Page: https://karstenslab.github.io/microshades

License: Other

R 100.00%

microbiome-data color-palette microshades-cvd-palettes phyloseq cvd r data-visualization

microshades's Introduction

microshades

The microshades R package is designed to provide custom color shading palettes that improve accessibility and data organization. Approximately 300 million people in the world have Color Vision Deficiency (CVD), which is comparable to the most recent estimate of the US population. When creating figures and graphics that use color, it is important to consider that individuals with CVD will interact with this material, and may not perceive all of the information tied to the colors as intended. This package includes carefully crafted palettes that improve CVD accessibility and may be applied to any plot. Additionally, there are functions to apply microshades palettes to microbiome data in a color-oriented organization.

Installation

remotes::install_github("KarstensLab/microshades", dependencies = TRUE)

If you plan to use microshades with microbiome data, you may need to install packages such as phyloseq or speedyseq.

If you experience a lazy-load error, try clearing and restarting your R session.

The shades

Here are the two crafted color palettes, microshades_cvd_palettes and microshades_palettes. Each color palette contains six base colors with five incremental light to dark shades, for a total of 30 available colors per palette type that can be directly applied to any plot.

CVD Visualization

The palettes above provide accessibility to individuals with CVD. Below are visualizations of the palettes under a CVD simulation.

The three main types of CVD

Deuteranope: The most common type of CVD, also known as “red-green color blindness”
Protanope: Less common than Deuteranope, described as a mutated red pigment with reduced ability to discriminate colors
Tritanope: Relatively rare, also known as “blue-yellow color blindness”

microshades_cvd_palettes

The microshades_cvd_palettes contain colors that are universally CVD friendly.

microshades_palettes

The individual microshades_palettes are CVD friendly, and some microshades_palettes combinations are universally CVD friendly. When using multiple microshades_palettes, carefully consider color palette choices, since the combination may not be universally accessible.

To learn more about the different functions and shades in microshades, please visit the reference section of our website.

Phyloseq Compatibility

In addition to color palettes, this package can be used in conjunction with the R package phyloseq to enhance microbiome data visualization. The microshades functions create stacked bar plots from phyloseq objects, with colors organized by a data-driven hierarchy. These accessibility and color organization features help data reviewers and consumers notice visual patterns and trends more easily.

For detailed tutorials on how to use microshades functions with phyloseq objects, please review the vignettes provided:

Below is an example of a plot generated with microshades on Curated Metagenomic Data of the Human Microbiome. On the left is the original stacked barplot made using phyloseq. On the right are two barplots of the same data, with microshades palettes and functions applied. microshades uses color to correspond with taxonomic group and subgroup levels.

In this example, the phylum and genus information are explored. Darker shades indicate the most abundant genera for each phylum, and lighter shades are less abundant. Users can additionally reorder the samples based on a specified taxonomic rank and name, or reorder the phylum groups.

Apply the microshades palette to non-microbiome data

Users can apply a microshades palette color to a plot using scale_fill_manual() to set the custom colors and microshades_palette() to select a desired palette.

Please refer to the following examples with microshades on non-microbiome data:

The example below uses the palmerpenguins dataset to show how to apply the color palettes to non-microbiome data. The first plot uses the default ggplot colors and is colored by species. The second plot uses microshades colors for a combination variable of penguin species and year of data collection. This organization can help data reviewers notice trends more easily when more variables are visible in the data.
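A minimal sketch of the scale_fill_manual() approach described above. It assumes ggplot2 and microshades are installed, uses the built-in iris data instead of palmerpenguins to avoid an extra dependency, and assumes that "micro_cvd_blue" is an available palette name and that microshades_palette() takes the number of shades as its second argument. Check the package reference if either differs.

```r
library(ggplot2)
library(microshades)

# Three shades from a single CVD-friendly palette, one per species.
# The palette name and the number of shades are illustrative choices.
shades <- microshades_palette("micro_cvd_blue", 3)

# Bar chart of species counts, filled with the selected shades
p <- ggplot(iris, aes(x = Species, fill = Species)) +
  geom_bar() +
  scale_fill_manual(values = shades)
p
```

Because the shades are supplied as an unnamed vector, ggplot2 assigns them to the factor levels in order.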


microshades's Issues

re: no Microshades hexagon logo

Hello !
I was noticing there is a banner header but no R hexagon logo. Can I design one for this package ? I will adhere strictly to the colours used in the banner.
I have made a logo for the Terra package. I will do it for free, need more examples for my portfolio. Please and thank you.

custom legend configuration

Create a custom ggplot microshades legend that uses the group name as a title and the subgroup section of colors

Readme vignette links

The readme file links to the vignettes do not work. Check to see if they work once github pages is activated

reorder_samples_by() group level

Currently the function allows samples to be reordered by a particular subgroup. Add a feature so they can be organized by subgroup or group level.

flip stack orientation

flip stack abundance so most abundant is at the top

TreeSummarizedExperiment support

Thanks for the great work.

TreeSummarizedExperiment is a new data container for microbiome data in Bioconductor. Also the curatedMetagenomicData used in the examples is providing the data in this format.

-> Would be great to have support for TreeSummarizedExperiment in addition to phyloseq.

Error with plot_microshades() with R 4.3.0

Hi,
Thank you for developing this very nice package.

I have used microshades with R version 4.1 without problems, but after updating to R 4.3.0, the plot_microshades() function fails with the following message:

Error in is.na(cdf$hex) || is.na(cdf$group) :
'length = 29' in coercion to 'logical(1)'

I found by googling that this might be due to changes in R:

https://stackoverflow.com/questions/72848442/r-warning-lengthx-2-1-in-coercion-to-logical1
https://cran.r-project.org/doc/manuals/r-devel/NEWS.html

Would it be possible to update the code to work with the newer R versions?

Thank you in advance.
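A minimal base R sketch of the likely cause, using made-up vectors in place of cdf$hex and cdf$group. Since R 4.3.0, `||` and `&&` raise an error when either operand has length greater than one. Reducing each side to a single logical first is one possible fix; whether "any value is missing" is what the package check intends is an assumption.

```r
# Hypothetical stand-ins for the cdf$hex and cdf$group columns
hex   <- c("#AAAAAA", NA, "#BBBBBB")
group <- c("A", "B", NA)

# is.na(hex) || is.na(group) would fail on R >= 4.3.0,
# because each operand is a logical vector of length 3.

# anyNA() returns a single TRUE/FALSE, so `||` gets scalar operands
has_missing <- anyNA(hex) || anyNA(group)
has_missing
```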

Reference Links

@KarstensLab Review the reference links for the datasets in each vignette. Currently there is a github link, and I also added a secondary bioconductor link. Please determine what link is optimal to use and/or if there is a different link that should be used.

Reduce parameters

Reduce parameters in match_color_df() to determine selected_groups from mdf_group.

Issue when generating a color object

Hi,

Thanks for the useful tool! However, I get an error when I try to generate a color object using this code:

color_objs_GP <- create_color_dfs(mdf_prep,selected_groups = c("Verrucomicrobia", "Proteobacteria", "Actinobacteria", "Bacteroidetes", "Firmicutes") , cvd = TRUE)

I got the message:

Error in create_color_dfs(mdf_prep, selected_groups = c("Verrucomicrobia", :
some 'selected_groups' do not exist in the dataset. Consider SILVA 138 c('Proteobacteria', 'Actinobacteriota', 'Bacteroidota', 'Firmicutes')

Any help would be appreciated :)

Jesus
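The error above suggests the requested phylum names follow an older naming scheme than the one in the dataset. A small base R sketch for finding the mismatched names before calling create_color_dfs(). The vector of dataset phyla here is made up for illustration; with real data it could be taken from the Phylum column of the prepared data frame, assuming that column exists.

```r
# Hypothetical phylum names present in the dataset (SILVA 138 style)
dataset_phyla <- c("Proteobacteria", "Actinobacteriota", "Bacteroidota",
                   "Firmicutes", "Verrucomicrobiota")

# Names requested in the failing call
selected <- c("Verrucomicrobia", "Proteobacteria", "Actinobacteria",
              "Bacteroidetes", "Firmicutes")

# Requested names that do not occur in the dataset
missing_groups <- setdiff(selected, dataset_phyla)
missing_groups
```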

Custom legend below plot

Hi! Thank you for developing this very useful package!
I was wondering if it is possible to display the custom legend at the bottom of the plot. I tried to play around with the custom_legend function, but couldn't get it to work. Is there an argument one could use to choose how the rows and columns of the custom_legend are split?

Again, thank you for the package! The color combinations and visual display are so beautiful!

color_reassign()

create color_reassign function to change the color assignment of the groups

Install bug/fix

remotes::install_github("KarstensLab/microshades")

in the install instructions in README doesn't work consistently - suggest to change to...

remotes::install_github("KarstensLab/microshades", dependencies=TRUE)

[BUG/Version Control] `fct_explicit_na()` was deprecated in forcats 1.0.0.

Hello!

Just to let you know about a function change in one of your dependencies!

Warning message:
There was 1 warning in mutate().
ℹ In argument: Genus = fct_explicit_na(Genus, "Unknown").
Caused by warning:
! fct_explicit_na() was deprecated in forcats 1.0.0.
ℹ Please use fct_na_value_to_level() instead.
ℹ The deprecated feature was likely used in the microshades package.
  Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.

Best,

Erfan
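A short sketch of the replacement the warning recommends, assuming forcats 1.0.0 or later is installed. The genus values are made up for illustration, and this is not the package's actual code.

```r
library(forcats)

# Hypothetical genus column with one unclassified entry
genus <- factor(c("Faecalibacterium", NA, "Turicibacter"))

# Replacement for the deprecated fct_explicit_na(genus, "Unknown"):
# missing values become an explicit "Unknown" level
genus_filled <- fct_na_value_to_level(genus, level = "Unknown")
levels(genus_filled)
```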

Add to CRAN & conda?

These are nice looking palettes! Would you consider submitting the package to CRAN and, once accepted by CRAN, submitting a corresponding recipe to conda-forge? This will allow users to install your package with conda in reproducible projects.

microshades GP example

Examine different sample types as groups (Soil and Sediment) and run microshades on each grouping to show why it might be favorable to run

inconsistent bar width

Hello,

Thanks for this cool package! I'm wondering if you have any suggestions on inconsistent section widths within bars in the plots I've been creating. As you can see in the example posted below, the coloured sections in each bar/sample are sometimes different widths to the other sections/colours in the same bar (particularly obvious in the top end of the graph). Any ideas why this might be happening?

Here is the plotting code used

plot_tads_T_t2 <- plot_microshades(mdf_tads_neworder_T_t2, cdf_tads_neworder_T_t2)
tads_legend_T_t2 <- custom_legend(mdf_tads_neworder_T_t2, cdf_tads_neworder_T_t2, group_level = "Class", subgroup_level = "Genus", legend_key_size = 0.5, legend_text_size = 9)

plot_tads_T1 <- plot_tads_T_t2 +
  scale_y_continuous(labels = scales::percent, expansion(0)) +
  theme(legend.position = "none", panel.background = element_blank(), axis.title = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  ggtitle("Treated") +
  facet_wrap(~ Mother.ID, scales = "free_x", labeller = labeller(Mother.ID = (c("Mother A" = "Sibship A", "Mother B" = "Sibship B", "Mother C" = "Sibship C"))))
plot_tads_T2 <- plot_grid(plot_tads_T1, tads_legend_T_t2)
plot_tads_T2

sink_abundant_groups not working

Hello,

Thanks so much for developing this fantastic package. I'm trying to plot some of my own data following the Human Microbiome Project tutorial. I'm successfully able to reorder samples by abundance of, for example, Flavobacteriaceae, but sink_abundant_groups=TRUE doesn't seem to be bringing most abundant groups to the bottom (there is no difference between setting it to false vs. setting it to true). Is there something I'm missing? Here is some example code below and the plot output. Thanks!

new.sample.order <- reorder_samples_by(mdf, cdf, order = "Flavobacteriaceae", group_level = "Phylum", subgroup_level = "Family", sink_abundant_groups = TRUE)

mdf.new.sample.order <- new.sample.order$mdf
cdf.new.sample.order <- new.sample.order$cdf

plot_2 <- plot_microshades(mdf.new.sample.order, cdf.new.sample.order, group_label = "Phylum Family")

plot_2 + scale_y_continuous(labels = scales::percent, expand = expansion(0)) +
  theme(legend.key.size = unit(0.2, "cm"), text = element_text(size = 10)) +
  theme(axis.text.x = element_text(size = 6))

Relative abundance calculation after tax_glom() gives non-representative relative abundance in figures

Hi microshades team,

I've been noticing a bit of a bug in how relative abundances are calculated for microshades.

Issue description

That is, when using lower levels of taxonomy that might have unclassified taxa, microshades is calculating the relative abundance using only the reads that were classified at that level. If microshading at the genus level, any taxa that aren't classified at the genus level are thrown out. Because of this, the relative abundances shown in the figure are not representative of the actual relative abundances. This also impacts the relative abundances shown for higher taxonomic levels.

Example

For example, if we had a community with 2 taxa in the family Ruminococcaceae:
1. Faecalibacterium (50%)
2. Unclassified at the genus level (50%)

Which in phyloseq results in a taxonomy table that has:

Family           Genus
Ruminococcaceae  Faecalibacterium
Ruminococcaceae  <NA>

When this is plotted in microshades with subgroup="Genus", it throws out the <NA> values before calculating relative abundance. In this simple example community, Faecalibacterium would then be plotted as 100% of the taxa present.
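The arithmetic of this example can be reproduced in base R. This sketch does not call microshades or speedyseq; it only contrasts normalizing after dropping the <NA> genus with normalizing after relabeling it. The relabeling step and the "Unclassified Ruminococcaceae" label are illustrative assumptions, not necessarily the workaround proposed in the attached file.

```r
# Two taxa with equal read counts, one unclassified at the genus level
tax <- data.frame(
  Family = c("Ruminococcaceae", "Ruminococcaceae"),
  Genus  = c("Faecalibacterium", NA),
  count  = c(50, 50)
)

# Dropping unclassified genera before normalizing inflates what remains
kept <- tax[!is.na(tax$Genus), ]
kept$rel_abund <- kept$count / sum(kept$count)   # Faecalibacterium: 1.0

# Labeling unclassified genera first keeps their reads in the denominator
tax$Genus[is.na(tax$Genus)] <- "Unclassified Ruminococcaceae"
tax$rel_abund <- tax$count / sum(tax$count)      # 0.5 for each genus
```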

Further documentation (easy to see the effects in a real data set)

I’ve attached a knitted R Markdown example in which you can click through the different levels and see major changes in relative abundance depending on the subgroup level (PDF at the bottom). The knitted Rmd file also has some proposed workarounds.

Addressing this?

I think this is happening during speedyseq’s tax_glom function (in prep_mdf). If it is from speedyseq, would it be possible to have an argument that disables this behavior? Or could there be some sort of note in the documentation/examples telling users to handle taxa unclassified at lower taxonomic levels prior to using microshades?

Real dataset example

In these figures, look in particular at the "control" microbiomes on the far right. These are contaminated germ-free mice with very low diversity microbiomes, which is why the effect is so evident. On the family-level microshades plot (the first one), it appears that Peptostreptococcaceae makes up ~75% of some of the microbiomes. However, the genus-level plot shows that they're 90% Turicibacter, which is not in the Peptostreptococcaceae family. This is because the main ASV in Peptostreptococcaceae was unclassified at the genus level and was thrown out for relative abundance calculations, resulting in a misleading relative abundance for Turicibacter. You can also see differences in the phylum-level relative abundances of Firmicutes and Bacteroidota in the samples labeled 74. I go into more depth in the knitted Rmd. I've attached it below as a PDF, but it reads better as HTML (for tabbing through the figures), so I can email that version if you'd like it.

Taxa-bar.pdf

Thanks again for this package! It's helped our lab make some fantastic plots. I just wanted to bring this behavior to your attention, since it confused me for quite a few days :)

Thanks,
John

sink most abundant group taxonomy

Add a flag in reorder_samples_by(), or create a new function, to sink the most abundant group after the color object is created