tidyverse / ggplot2

An implementation of the Grammar of Graphics in R

Home Page: https://ggplot2.tidyverse.org

License: Other

ggplot2's Introduction

ggplot2

Overview

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Installation

# The easiest way to get ggplot2 is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just ggplot2:
install.packages("ggplot2")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/ggplot2")

Usage

It’s hard to succinctly describe how ggplot2 works because it embodies a deep philosophy of visualisation. However, in most cases you start with ggplot(), supply a dataset and aesthetic mapping (with aes()). You then add on layers (like geom_point() or geom_histogram()), scales (like scale_colour_brewer()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).

library(ggplot2)

ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()
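
To illustrate how the other pieces named above attach to a plot, here is a minimal sketch that extends the example with a scale, a faceting specification and a coordinate system. The choice of the drv and year columns and of the "Set1" palette is an assumption made purely for illustration.

```r
library(ggplot2)

# Start from data + aesthetic mapping, then add one component of each kind.
p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +                          # layer
  scale_colour_brewer(palette = "Set1") + # scale
  facet_wrap(~year) +                     # faceting specification
  coord_flip()                            # coordinate system

p
```

Each `+` adds one independent component, so any of the four lines after `ggplot()` can be dropped or swapped without touching the others.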

Lifecycle

ggplot2 is now over 10 years old and is used by hundreds of thousands of people to make millions of plots. That means, by and large, ggplot2 itself changes relatively little. When we do make changes, they will generally be to add new functions or arguments rather than to change the behaviour of existing functions, and if we do change existing behaviour we will do so for compelling reasons.

If you are looking for innovation, look to ggplot2’s rich ecosystem of extensions. See a community-maintained list at https://exts.ggplot2.tidyverse.org/gallery/.

Learning ggplot2

If you are new to ggplot2 you are better off starting with a systematic introduction, rather than trying to learn from reading individual documentation pages. Currently, there are several good places to start:

The Data Visualization and Communication chapters in R for Data Science. R for Data Science is designed to give you a comprehensive introduction to the tidyverse, and these two chapters will get you up to speed with the essentials of ggplot2 as quickly as possible.
If you’d like to take an online course, try Data Visualization in R With ggplot2 by Kara Woo.
If you’d like to follow a webinar, try Plotting Anything with ggplot2 by Thomas Lin Pedersen.
If you want to dive into making common graphics as quickly as possible, I recommend The R Graphics Cookbook by Winston Chang. It provides a set of recipes to solve common graphics problems.

If you’ve mastered the basics and want to learn more, read ggplot2: Elegant Graphics for Data Analysis. It describes the theoretical underpinnings of ggplot2 and shows you how all the pieces fit together, which will help you create new types of graphics specifically tailored to your needs.

Getting help

There are two main places to get help with ggplot2:

The RStudio community is a friendly place to ask any questions about ggplot2.
Stack Overflow is a great source of answers to common ggplot2 questions. It is also a great place to get help, once you have created a reproducible example that illustrates your problem.

ggplot2's Issues

Order aesthetic does not affect order of dodged groups

In the following example, the groups should be displayed in alphabetical order: gain on the left and loss on the right.

df <- data.frame( 
  FRAME = factor(c("Loss", "Gain", "Loss", "Gain")), 
  Mean = c(5.28, 5.23, 4.95, 5.12), 
  FOCUS = factor(c("Prevention", "Prevention", "Promotion", "Promotion"))
)

dodge <- position_dodge(width=0.90)
ggplot(df, aes(FOCUS, Mean, order=FRAME, colour=FRAME)) + 
  geom_point(position=dodge, size = 4)

Store labels in options

By default, scale names should be stored in options. This will allow simplification of the current default scale production so that scales only need to be added when the plot is drawn, and fix the following bugs:

cannot use ylab() to set the label for histograms
annotate overrides the scale defaults

geom_text should be able to produce expressions by parsing text

mydata <- data.frame(x = 1:10, y = 1:10)

ggplot(mydata, aes(x, y)) +
  geom_point() +
  annotate("text", x = 6, y = 7, label = "beta[1] == 1", parse = TRUE)

geom_abline doesn't work with scale_log10

myd <- data.frame(
  myvar = c(1,5,10,50,100,500,1000,5000,10000),
  myvarb = c(1:9)
)

base <- ggplot(myd, aes(myvarb, myvar))+
  geom_point()+
  geom_abline(intercept = 1, slope = 100)

base + scale_y_log10()

geom_rect and geom_line should support infinite coordinates

This would be useful for annotations, and for geom_vline and geom_hline. The main complication would be ensuring correct translation in coordinate systems.

ggplot(data.frame(x = 0:1, y = 0:1)) +
  geom_rect(aes(x = x, y = y), xmin = .1, xmax = .2, ymin = -Inf, ymax = Inf)
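
As a sketch of the annotation use case described above: to the best of our knowledge, recent ggplot2 releases accept infinite extents in annotate(), treating them as "extend to the panel edge". The band position and the alpha value below are illustrative assumptions.

```r
library(ggplot2)

df <- data.frame(x = 0:1, y = 0:1)

# A vertical band spanning the full height of the panel: -Inf/Inf are
# interpreted as the panel edges rather than as data values.
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  annotate("rect", xmin = 0.1, xmax = 0.2, ymin = -Inf, ymax = Inf,
           alpha = 0.3)

p
```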

scale_***_ should not have trans argument

Because setting it is confusing.

For data, NULL != data.frame()

For example,

qplot(mpg, wt, data = mtcars) + 
  geom_point(data = data.frame(), colour = "red")

should display black points.

This can occur when subsetting and is annoying.

coord_equal does not work with log scales

df <- data.frame(a = 2 ^ (1:10), b = 3 ^ (1:10))
myplot <- ggplot(df, aes(a,b)) + 
  geom_line() + 
  scale_x_log10() + scale_y_log10()

myplot
myplot + coord_equal()

Transformations of scale_colour_gradient don't work

The following two plots are identical:

qplot(carat, price, data = diamonds, geom = "bin2d") +
  scale_colour_gradient()

qplot(carat, price, data = diamonds, geom = "bin2d") +
  scale_colour_gradient(trans = "log10")

Better interaction with theme font size and pointsize

I'm not sure what to do here, but it should at least be documented.

p = qplot(runif(5),runif(5)) + opts(title="A dumb plot")

# this is the default
theme_set(theme_grey(12))
quartz(width=5, height=4, pointsize=12)
p

# this has display issues (overwriting in legends etc.) because the
# font size is too large compared to what the device "expects"
theme_set(theme_grey(25))
quartz(width=5, height=4, pointsize=8)
p

# when the pointsize is set accordingly, the proportions are
# "harmonious" again (well, it still does not look great, but that's
# because 25 is much too big for this size of plot)
theme_set(theme_grey(25))
quartz(width=5, height=4, pointsize=25)
p

# Ahhh, that's better
theme_set(theme_grey(10))
quartz(width=5, height=4, pointsize=10)
p

Ordering across geoms is inconsistent

df <- data.frame( 
  FRAME = factor(c("Loss", "Gain", "Loss", "Gain")), 
  Mean = c(5.28, 5.23, 4.95, 5.12), 
  FOCUS = factor(c("Prevention", "Prevention", "Promotion", "Promotion"))
)

dodge <- position_dodge(width=0.90)
base <- ggplot(df, aes(FOCUS, Mean, fill=FRAME, colour=FRAME)) + 
  geom_point(position=dodge, size = 4)

# Do work:
base + geom_text(aes(label = Mean), position = dodge)
base + geom_bar(position=dodge, stat="identity")
# Don't work
base + geom_bar(position=dodge)
base + geom_boxplot(position=dodge)

facet_grid doesn't work with multiple variables

qplot(mpg, wt, data = mtcars) + facet_grid(vs + am ~ .)

Ordinal scales

For integers with limited range and ordered factors, ggplot2 should use ordinal scales. In particular, colour should default to a sequential brewer palette, and shape should use shapes with an increasing number of edges.
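
A minimal sketch of the colour half of this request, done by hand: an ordered factor mapped to a sequential brewer palette with scale_colour_brewer(). The data frame, the level names and the "Blues" palette are assumptions for illustration; this shows the desired result, not how a default would be implemented.

```r
library(ggplot2)

# Hypothetical data with an ordered factor.
lvls <- c("low", "medium", "high", "extreme")
df <- data.frame(
  x = 1:4,
  y = 1:4,
  level = factor(lvls, levels = lvls, ordered = TRUE)
)

# Request a sequential palette explicitly, so that colour lightness
# follows the order of the factor levels.
p <- ggplot(df, aes(x, y, colour = level)) +
  geom_point(size = 4) +
  scale_colour_brewer(palette = "Blues")

p
```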

coord_equal does not use space efficiently

In the following example, it expands the y-axis too much - it should change the aspect ratio instead.

qplot(mpg, wt, data = mtcars) + coord_equal()

Break names should be used as labels

If breaks is a named vector, the names should be used as labels.
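
Until names are picked up automatically, the same effect can be spelled out by passing the values and the names separately. This is a sketch of a workaround, with a hypothetical `brks` vector:

```r
library(ggplot2)

# Hypothetical named breaks: the names are the desired labels.
brks <- c(low = 15, medium = 25, high = 35)

# Pass the values as breaks and the names as labels explicitly.
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  scale_x_continuous(breaks = unname(brks), labels = names(brks))

p
```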

coord_map should apply same transformation as coord_equal

To ensure that map aspect ratios are correct.

scale should have option not to produce legend

geom_path & geom_line should have lineEnd parameter

To give control over low-level grid params.
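
For reference, a sketch of what this looks like in recent ggplot2 releases, where geom_path() takes a lineend argument (spelled in lower case). The data are made up, and the linewidth aesthetic assumes ggplot2 3.4.0 or later (older releases use size instead).

```r
library(ggplot2)

# Hypothetical three-point path, drawn thick so the line ends are visible.
df <- data.frame(x = c(1, 2, 3), y = c(1, 3, 2))

p <- ggplot(df, aes(x, y)) +
  geom_path(linewidth = 5, lineend = "round")

p
```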

scale_colour_gradientn ignores labels

Problem with end collision detection in coord_polar

df <- data.frame(x = runif(100) * 360, y = runif(100))
base <- ggplot(df, aes(x, y)) + geom_point() + coord_polar()

base
base + scale_x_continuous(breaks = c(0, 180, 270))

facet_wrap messes up data order

This is caused by the use of merge(). Replace it with the join function from the development version of plyr.
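
The reordering can be reproduced in base R without ggplot2. The sketch below uses hypothetical stand-ins for a layer's data (`obs`) and a panel lookup table (`panels`), and a match()-based lookup in place of the plyr join mentioned above:

```r
# Hypothetical stand-ins for a layer's data and the facet panel lookup table.
obs <- data.frame(g = c("b", "a", "c", "a"), val = 1:4)
panels <- data.frame(g = c("a", "b", "c"), PANEL = 1:3)

# merge() sorts the result by the key column, so the original row order is lost.
merged <- merge(obs, panels, by = "g")

# A match()-based lookup keeps every row of `obs` in its original position.
joined <- obs
joined$PANEL <- panels$PANEL[match(obs$g, panels$g)]

merged$val
joined$val
```

Row order matters for geoms such as geom_path(), which connect observations in the order they appear in the data.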

Layer with all coordinates set does not work

ggplot(mtcars, aes(mpg, wt)) + 
  geom_point() + 
  geom_rect(xmin = 15, xmax = 20, ymin = 3, ymax = 5)

vs.

ggplot() + 
  geom_point(aes(mpg, wt), data = mtcars) + 
  geom_rect(xmin = 15, xmax = 20, ymin = 3, ymax = 5)

coord_trans doesn't respect expand

qplot(rlnorm(100), rlnorm(100)) + 
  coord_trans(x="log") + 
  scale_y_continuous(limits=c(0,10),expand=c(0,0))

coord_polar incorrectly combines expression tick marks

df <- data.frame(x = runif(10) * 2 * pi)

base <- ggplot(df, aes(x, 1)) + 
  geom_point() + 
  ylim(0,1.1) + 
  coord_polar()

breaks <- seq(0, 2 * pi, pi/2)
base + scale_x_continuous(limits = range(breaks), breaks = breaks)

labels <- c("0", expression(frac(pi,2)), expression(pi),
   expression(frac(3*pi,2)), expression(2*pi))
base + scale_x_continuous(limits = range(breaks), 
  breaks = breaks, labels = labels)
# 0 point converted to string

geom_contour

geom_contour generates errors and does not correctly draw contours for density_2d

qplot(rating, length, data = movies, geom = "density2d",
  colour = factor(Comedy), ylim = c(0, 150))

facet_grid free space calculations incorrect

da <- data.frame(
  x = rep(1:5, 2),
  y = c(1+(0:4), 1+(0:4)/10),
  z = rep(1:2, each=5))

qplot(x, y, data=da, group=z, geom="line") +
  facet_grid(z ~ ., scales="free_y", space="free")

Panel two is about half the size it should be.

Should be possible to set right = FALSE in stat_bin
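
To make the meaning of right concrete, here is a base R sketch using cut(), which has an argument of the same name. It only illustrates which end of each bin is closed; it is not how stat_bin computes its bins, and the values are made up.

```r
x <- c(0, 1, 1, 2)
breaks <- c(0, 1, 2)

# right = TRUE: intervals are (0,1] and (1,2], so the value 0 falls outside.
counts_right <- table(cut(x, breaks, right = TRUE))

# right = FALSE: intervals are [0,1) and [1,2), so the value 2 falls outside.
counts_left <- table(cut(x, breaks, right = FALSE))

counts_right
counts_left
```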

scale_shape and scale_size are named inconsistently

This makes it impossible to set defaults, e.g. with set_default_scale("shape", "discrete", "manual", value = c(1)), and adds ugly special-case code to Scales$add_defaults.

Add geom_curve to match book

Should be geom_line + stat_function
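
A minimal sketch of that combination, using stat_function() with a line geom. The plotted function (dnorm), the x range and the number of evaluation points are illustrative assumptions.

```r
library(ggplot2)

# Only the x range is needed as data; stat_function() evaluates `fun`
# at `n` evenly spaced points across it and draws the result as a line.
p <- ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm, geom = "line", n = 50)

p
```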

Facetting doesn't work with partial margins

qplot(mpg, wt, data = mtcars) + facet_grid(vs ~ am, margins = "grand_col")

coord_flip and free scales don't work

Although in principle it should be fairly straightforward.

coord_polar should have theta and r limits

Scales don't train on set values

qplot(mpg, wt, data=mtcars) + geom_point(x = 0)
qplot(mpg, wt, data=mtcars) + geom_point(x = 20)

hcl colour scale

That takes hue, chroma and luminance as inputs and outputs colour.
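
Base R already provides this conversion through grDevices::hcl(), so one way to approximate the request is to compute the colours by hand and pass them to a manual scale. The hue, chroma and luminance values and the use of the drv column below are assumptions for illustration.

```r
library(ggplot2)

# Three hues spread evenly around the colour wheel, at fixed chroma
# and luminance; hcl() returns hex colour strings.
hues <- c(15, 135, 255)
cols <- hcl(h = hues, c = 100, l = 65)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  scale_colour_manual(values = cols)

p
```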

Legend with three constraints doesn't work

data <- data.frame(x = 1:10, y1 = 1:10, y2 = 1:10)
ggplot(data=data, aes(x=x)) + geom_smooth(aes(y=y1, linetype="y1"))
ggplot(data=data, aes(x=x)) + geom_smooth(aes(y=y1, color="y1"))
ggplot(data=data, aes(x=x)) + geom_smooth(aes(y=y1, color="a", linetype="b"))

stat_ecdf

Create a stat for computing the empirical cumulative distribution function (ECDF). It should work with geom_step to draw a plot of the ECDF.
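
The computation such a stat would wrap can be sketched by hand with stats::ecdf() and geom_step(). The simulated data and the seed are assumptions for illustration.

```r
library(ggplot2)

set.seed(42)
x <- rnorm(50)

# ecdf() returns a function; evaluating it at the sorted observations
# gives the cumulative proportion at each step.
Fn <- ecdf(x)
df <- data.frame(x = sort(x), y = Fn(sort(x)))

p <- ggplot(df, aes(x, y)) +
  geom_step()

p
```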

Dates not correctly transformed when used as colour scale

a <- structure(list(date = structure(c(1227633710, 1227633725, 1227633740, 
1227633756, 1227633771, 1227633786, 1227633802, 1227633818, 1227633833, 
1227633848, 1227633863, 1227633878), class = c("POSIXt", "POSIXct"
), tzone = ""), x = c(10.2516422, 11.1053127, 11.6594761, 13.2709195, 
14.0127093, 14.7598268, 14.6802017, 15.2590637, 15.5089719, 15.5101305, 
15.1716798, 14.9050077), y = c(-11.5090033, -10.8087639, -10.0120475, 
-7.9856268, -6.5551218, -5.1593914, -4.1750486, -2.3780065, -0.5245896, 
1.3263083, 3.3031301, 4.5791333)), .Names = c("date", "x", "y"
), row.names = c(NA, -12L), class = "data.frame")

ggplot(a) + geom_point(aes(x=x, y=y, colour=date))

legend should respect layer data subset

In this example, points and lines are used for both values.

df.actual <- data.frame(x = 1:10, y = 1:10)
df.approx <- data.frame(x = 1:15, y = 1:15 + rnorm(15, sd = 3))
ggplot(df.actual, aes(x, y)) +
  geom_point(aes(colour="Actual")) +
  geom_line(aes(colour="Approximate"), data=df.approx)

coord_cartesian and non-linear scales

xlim and ylim should be specified in the original data units, and use the scales to transform them into plotted units.

Problem with coord_polar, geom_path and x limits

x <- seq(-5,5,by=0.1)
d <- data.frame(x = x, y = dnorm(x))

base <- ggplot(d, aes(x, y)) + geom_path() + geom_point() + ylim(0, 0.4)

base + coord_polar()
base + coord_polar() + xlim(-1, 1)

coord_map and geom_tile

library(maps)
data <- data.frame(
  lat = c(-41,-42,-41,-42),
  lon = c(170,170,171,171),
  var = c(1,2,3,4))
coast <- data.frame(map("nz", plot=FALSE)[c("x","y")])

plot <- ggplot(data,aes(lon, lat)) + 
  geom_tile(aes(fill = var)) +
  geom_polygon(data = coast,aes(x, y))
plot #OK
plot + coord_equal() #OK
plot + coord_map()

scale_date and scale_datetime should warn if dates not in correct format

coord_flip and geom_hline/geom_vline

qplot(mpg, wt, data = mtcars) + geom_vline(xintercept = 20) + coord_flip()
qplot(mpg, wt, data = mtcars) + geom_hline(yintercept = 5) + coord_flip()

dummy <- data.frame(
       x = rnorm(1000),
       y = rnorm(1000),
       z = gl(2, 500, labels = c("A rather long label", "An even longer
label than the first label"))
)
ggplot(dummy, aes(x = x, y = y, colour = z)) + geom_point()
last_plot() + theme_gray(base_size = 4)

Reported by Thierry Onkelinx

set.seed(1)
d = data.frame(x=1:10, y=runif(10), z=runif(10))
ggplot(d) + geom_point(aes(x,y, size=z, colour=z)) + scale_area()
# fails

set.seed(7)
d = data.frame(x=1:10, y=runif(10), z=runif(10))
ggplot(d) + geom_point(aes(x,y, size=z, colour=z)) + scale_area()
# works