Total number of cases

Hello, I was wondering if you could add the total number of cases to the intersection size plot, i.e. the sum of all intersection sizes.
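
To illustrate what such a total would show: when every observation falls into exactly one (exclusive) intersection, the intersection sizes add up to the number of rows. A minimal base R sketch, where the `membership` data frame and its column names are made up for illustration:

```r
# Hypothetical membership table: one row per observation, one logical column per set
membership <- data.frame(
  A = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  B = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Label each row with the exclusive intersection it falls into
# (rows that belong to no set get the empty label "")
label <- apply(membership, 1, function(row) paste(names(which(row)), collapse = "&"))

# Size of every intersection, and the total across all of them
intersection_sizes <- table(label)
total_cases <- sum(intersection_sizes)
```

With exclusive intersections, `total_cases` is the same as `nrow(membership)`.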

Highlight multiple intersections

Thanks for the great package. Is it possible to highlight / query multiple intersections in the intersection matrix using different colors?

If I update the example from the doc, using

      intersect=c('Drama', 'Comedy'),
      intersect=c('Romance', 'Drama'),

I receive the error

Error: Aesthetics must be either length 1 or the same as the data (4): colour


How to use the aes parameter?


this code

upset(DT3, sampName, name='SNP', width_ratio=0.5, min_size=1000, sort_sets=FALSE, 
        'Intersection size'=intersection_size(text_colors=c(on_background='black', on_bar='black')
          upset_query(set='H4A4', fill=vc[2]), upset_query(intersect = sampName, color=vc[3], fill=vc[3], only_components=c('intersections_matrix', 'Intersection size'))


Error: Mapping should be created with `aes()` or `aes_()`.

It seems to be expecting an aesthetic mapping for fill and color argument? I am just trying to color some bars and intersections as per the doc:

    movies, genres, name='genre', width_ratio=0.1, min_size=10,
    annotations = list(
            aes=aes(x=intersection, y=length),
            intersect=c('Drama', 'Comedy'),
            only_components=c('intersections_matrix', 'Intersection size')
            intersect=c('Romance', 'Comedy'),

Large counts numbers do not fit the bars

Hello, it's me again
I noticed sometimes the numbers displayed in the bars are not correct

The bar for the first intersection set is obviously not 39123 but more than 1 million. I would guess as the figure is big and doesn't fit inside the bar it's just chopped off.

In upset() plot: Vertical lines connecting dots are not displayed anymore

Hi Mike,

Today I noticed that, while everything else in the upset() plot is fine, the vertical lines connecting the dots in a column are no longer displayed.

The black dots display fine, but without the connector lines between them.

With the Simple Upset Plot example (or with any other, more complex example), no vertical connector lines are shown between the dots.

movies, genres,
base_annotations=list('Intersection size'=intersection_size(counts=FALSE)),

ComplexUpset pkg v 052,
w/ Ubuntu Linux 18.04, latest Rstudio and R.
see image below:


Put a geom_text related to the max intersection

Hi, I wanted to know how to position a geom_text at the max intersection of the plot.

  aes=aes(x=intersection, y=EDAD)
             ,geom_hline(aes(yintercept = median(EDAD))
             ,geom_text(aes(max(intersection),median(EDAD),label = round(median(EDAD),1), vjust = -1))

It is currently placed at the red circle, but I want it where the red arrow points.
Screen Shot 2020-06-15 at 12 35 55

Double/dodged bars

Great package!

I'm wondering if there is an easy way to have two side-by-side bars for each intersection?
I mainly want to plot my observed data, but next to each intersection I also want the expected intersection size (based on my null model), so that I can clearly see how much the observed differs from the expected. Sometimes expected is > observed and sometimes < observed, so stacked bars aren't clear for that.

Add geom_text to geom_bar

Sorry about this basic question...
I have this:

annotations = list(
    aes=aes(x=intersection, fill=FALLECIDOS)
      geom_bar(stat='count', position='fill')


I need to add the geom_text in black color only in the red bars

Order intersections

Is it possible to prespecify the order of the intersections? I know there are the options sort_intersections and sort_intersections_by, but they don't allow ordering by, say, a factor in the input data frame.

custom titles

I can't find a way to change the words "Intersection size" at the y-axis. Would be great to have an option to set all the titles manually, same as in ggplot. Or if I am blind, please show me the way.
Thank you in advance

Causes error with knitr?


I don't know if it's a knitr or upset issue, but trying to knit the Rstudio markdown returns

Quitting from lines 734-744 (NDPD_calling.Rmd) 
Error in `$<`(x, name, value) : 
  replacement has 0 rows, data has 1585017
Calls: <Anonymous> ... upset_data -> $<- -> $< -> $<
In addition: Warning message:
In get_engine(options$engine) :
  Unknown language engine 'bas' (must be registered via knit_engines$set()).
Execution halted

Here is the code of lines 734 - 744

DT2 = DT%>%mutate_at(-1, ~case_when(grepl('^0/1', .) ~ 1L, 
                           grepl('^0/0', .) ~ 0L,     
                           grepl('^1/1', .) ~ 1L, 
                           TRUE~ NA_integer_))
sum(Reduce('|', lapply(DT2,
DT3 = na.omit(DT2)
DT3 =
sampName = list(colnames(test)[2:12])
upset(DT3, sampName, name='SNP', width_ratio=0.5, min_size=1000, sort_sets=FALSE)

Commenting out the upset command resolves the error message.

Don't use hyphens in variable names

I don't know why it behaves like this, but if you use e.g. Co-amoxiclav instead of Co amoxiclav, you get an incorrect upset plot in which all of the points for Co-amoxiclav are missing. Using Co amoxiclav fixes this. It took me about an hour to work out what was going on, so hopefully this is helpful to someone else (and maybe even easily fixable).
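
In the meantime, one possible workaround is to rename the columns before plotting. A minimal base R sketch; `drugs` is a made-up data frame, and replacing the hyphen with an underscore is just one possible choice:

```r
# Hypothetical data with a hyphen in one column name
drugs <- data.frame(
  `Co-amoxiclav` = c(TRUE, FALSE, TRUE),
  Gentamicin = c(FALSE, TRUE, TRUE),
  check.names = FALSE  # keep the hyphen exactly as typed
)

# Replace hyphens in the column names before passing them to upset()
names(drugs) <- gsub("-", "_", names(drugs), fixed = TRUE)
sets <- names(drugs)
```

The cleaned `sets` vector can then be used as the set names in the upset() call.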

Error in unlist(intersect) : object 'genres' not found

Using the sample data from the readme:


  annotations = list(
      aes=aes(x=intersection, y=length),
      aes=aes(x=intersection, y=rating),
        # if you do not want to install ggbeeswarm, you can use geom_jitter
        geom_violin(width=1.1, alpha=0.5)
      intersect=c('Drama', 'Comedy'),
      only_components=c('intersections_matrix', 'Intersection size')
      intersect=c('Romance', 'Drama'),

I get the following error:

Error in unlist(intersect) : object 'genres' not found

Is this related to the ggplot2 update?

Here is my sessionInfo():

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ComplexUpset_0.4.0 ggplot2_3.3.0.9000 testthat_2.3.2     devtools_2.2.2     usethis_1.5.1     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3        compiler_3.6.2    pillar_1.4.3      prettyunits_1.1.1 remotes_2.1.1    
 [6] tools_3.6.2       digest_0.6.25     pkgbuild_1.0.6    pkgload_1.0.2     memoise_1.1.0    
[11] lifecycle_0.2.0   tibble_2.1.3      gtable_0.3.0      pkgconfig_2.0.3   rlang_0.4.5      
[16] cli_2.0.2         rstudioapi_0.11   patchwork_1.0.0   withr_2.1.2       dplyr_0.8.5      
[21] desc_1.2.0        fs_1.3.2          tidyselect_1.0.0  rprojroot_1.3-2   grid_3.6.2       
[26] glue_1.3.1        R6_2.4.1          processx_3.4.2    fansi_0.4.1       sessioninfo_1.1.1
[31] purrr_0.3.3       callr_3.4.2       magrittr_1.5      backports_1.1.5   scales_1.1.0     
[36] ps_1.3.2          ellipsis_0.3.0    assertthat_0.2.1  colorspace_1.4-1  munsell_0.5.0    
[41] crayon_1.3.4     

min_degree not working as intended

The parameter min_degree fails to work as intended. With min_degree = 0, it plots all intersections as expected, including the null set (observations that belong to no set). However, with min_degree = 1, it excludes not only the null set but also every intersection consisting of just one set, which makes it impossible to exclude only the null set.
0,0,0
0,0,0
0,0,1
0,1,0
0,1,1
When tested on the above csv, we get the following outcomes:
In my case, I am working with categorical data on cases of a medical event. The data is massively sparse, so the vast majority of people don't have any events recorded, but I need to exclude them from this analysis.
Hope that this can be resolved as otherwise I am very impressed with the package.
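
Until this is resolved, a possible workaround is to drop the rows that belong to no set before calling upset(), which leaves the single-set intersections in place. A minimal base R sketch; the data frame mirrors the csv above, and the column names `a`, `b`, `c` are assumptions:

```r
# Same five rows as the csv above (column names are hypothetical)
events <- data.frame(
  a = c(0, 0, 0, 0, 0),
  b = c(0, 0, 0, 1, 1),
  c = c(0, 0, 1, 0, 1)
)
sets <- c("a", "b", "c")

# Degree of each row = number of sets it belongs to
degree <- rowSums(events[sets] > 0)

# Keep only rows that belong to at least one set
non_empty <- events[degree >= 1, , drop = FALSE]
```

`non_empty` keeps the three rows with at least one event and can be plotted with the default min_degree.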

sort intersections by degree?

Is it possible to sort intersections by degree? I would like to sort the intersections with the highest degree first (e.g., have the intersection shared between the largest number of sets come first). This is possible in the UpSetR package with UpSetR::upset( = "degree")
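
To make the request concrete: the degree of an intersection is the number of sets it involves, so sorting by degree amounts to ordering the distinct membership patterns by their row sums. A minimal base R sketch with made-up data (this is not ComplexUpset's implementation):

```r
# Hypothetical membership table: 1 = observation belongs to the set
membership <- data.frame(
  A = c(1, 1, 0, 1, 0, 1),
  B = c(1, 0, 0, 1, 1, 1),
  C = c(1, 0, 1, 0, 0, 1)
)

# One row per distinct intersection (membership pattern)
patterns <- unique(membership)

# Degree = number of sets taking part in the intersection
patterns$degree <- rowSums(patterns[c("A", "B", "C")])

# Highest degree first; ties keep their original order
by_degree <- patterns[order(-patterns$degree), ]
```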

Highlight the empty intersection

Is it possible to highlight the "empty" intersection / complement (i.e. the intersection where all the binary variables are false)?

There is no problem plotting the empty intersection, but none of these hypothetical queries work:

upset_query(intersect = character(), fill='yellow')
upset_query(intersect = c(), fill='yellow')
upset_query(intersect = NA, fill='yellow')

Missing 'x' argument in gsub

First of all great package, I really like the extensibility of it !
I have discovered that the 'x' parameter in the gsub function in names_of_true is missing, which causes an error if the colnames contain a '-' character. Reprex is below:

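test_frame <- data.frame(
  a = c(TRUE, TRUE, TRUE),
  b = c(FALSE, FALSE, FALSE),  # all FALSE, as implied by the output below
  c = c(TRUE, FALSE, TRUE)
)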

colnames(test_frame) <- c("a","b","C-")
apply(test_frame, 1, ComplexUpset::names_of_true)
#> Error in gsub("-", "_"): argument "x" is missing, with no default

#add name as 'x' parameter for gsub
names_of_true2 <- function(row) {
  sanitized_names = c()
  for (name in names(which(row))) {
    if (grepl("-", name, fixed = TRUE)) {
      name = gsub("-", "_", name)
      if (name %in% names(which(row))) {
        stop("The group names contain a combination of minus characters (-) which could not be simplified; please remove those.")
      }
    }
    sanitized_names = c(sanitized_names, name)
  }
  paste(sanitized_names, collapse = "-")
}

apply(test_frame, 1, names_of_true2)
#> [1] "a-C_" "a"    "a-C_"

Created on 2020-03-31 by the reprex package (v0.3.0)

Additionally, I was wondering what the reason was for using a for loop in this function, as grepl and gsub support vector inputs. Removing the for loop could give significant performance gains, especially for very 'wide' data.

#test_data 10 obs 4000 variables
test_data <-data.frame(replicate(4000,sample(c(TRUE,FALSE),10,rep=TRUE)))

#Simple mock-up
names_of_true3 <- function(row) {
  true_row_names <- names(which(row))
  fixed_row_names <- gsub("-", "_", true_row_names)
  if (any(grepl("-", fixed_row_names, fixed = TRUE))) {stop("- detected")}
  paste(fixed_row_names, collapse = "-")
}

bench::mark(
  apply(test_data, 1, names_of_true3),
  apply(test_data, 1, ComplexUpset::names_of_true)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression                                           min  median `itr/sec`
#>   <bch:expr>                                       <bch:t> <bch:t>     <dbl>
#> 1 apply(test_data, 1, names_of_true3)               24.4ms  27.2ms     37.0 
#> 2 apply(test_data, 1, ComplexUpset::names_of_true) 124.1ms 129.8ms      7.76
#> # ... with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>

Created on 2020-03-31 by the reprex package (v0.3.0)

upset doesn't show all overlapping elements in sets


I've been trying to get upset to display all overlapping elements between sets but I can't, and its behaviour is puzzling:

# Example data
test <- data.frame(
  stringsAsFactors = FALSE,
           id_set1 = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
           id_set2 = c(NA, NA, NA, NA, "e", NA, NA, "h", "i", "j"),
           id_set3 = c("a", "b", NA, "d", "e", "f", "g", "h", "i", "j"),
           id_set4 = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
)

# Turn the values into Boolean
elements <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

test2 = data.frame(
	set1 = ifelse(test[,1] %in% elements, 1, 0),
	set2 = ifelse(test[,2] %in% elements, 1, 0),
	set3 = ifelse(test[,3] %in% elements, 1, 0),
	set4 = ifelse(test[,4] %in% elements, 1, 0))

upset(test2, colnames(test2))


I thought that the height of the bars should show counts of common elements in the given sets (their intersections, in the sense of intersect()), but it doesn't. For example, the plot suggests there are five common elements in sets 1, 3 and 4. But we can see from the table that there are nine (the only missing element is c in set3). On the other hand, it correctly shows that there are four common elements across all four sets (e, h, i, j). Then again, sets 1 and 4 are identical, so they have ten common elements. Where does the 1 on the plot come from? (Not to mention that not all combinations of sets are shown.)

Could someone please clarify this behaviour?
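
One reading that matches all three numbers is that each bar counts the elements belonging to exactly that combination of sets and to no others (an exclusive intersection), rather than the result of intersect(). A minimal base R sketch on the data above, offered as an assumption about what the plot counts rather than a statement of the package's internals:

```r
elements <- letters[1:10]
test <- data.frame(
  id_set1 = elements,
  id_set2 = c(NA, NA, NA, NA, "e", NA, NA, "h", "i", "j"),
  id_set3 = c("a", "b", NA, "d", "e", "f", "g", "h", "i", "j"),
  id_set4 = elements,
  stringsAsFactors = FALSE
)

# 1 if the element is present in the set, 0 otherwise (NA counts as absent)
test2 <- data.frame(
  set1 = as.integer(test$id_set1 %in% elements),
  set2 = as.integer(test$id_set2 %in% elements),
  set3 = as.integer(test$id_set3 %in% elements),
  set4 = as.integer(test$id_set4 %in% elements)
)

# Exclusive intersections: group rows by their exact membership pattern
pattern <- with(test2, paste0(set1, set2, set3, set4))
exclusive <- table(pattern)

# Inclusive intersections: what intersect() on the sets would count
inclusive_134 <- sum(test2$set1 & test2$set3 & test2$set4)
inclusive_14 <- sum(test2$set1 & test2$set4)

print(exclusive)
```

Here the pattern 1011 (sets 1, 3 and 4 but not 2) has 5 rows, 1001 (sets 1 and 4 only, i.e. element c) has 1 row, and 1111 has 4, while the inclusive counts are 9 and 10. Under this reading, combinations with no rows of their own would not get a bar.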


Change height of annotations

Hi, I wanted to know how to change the height of these annotations:

                 ,name='TERAPIAS ANTI COVID19'
                 ,stripes=c("white", "#B4D5DE")
                 ,keep_empty_groups = TRUE
                   'Intersection size'=intersection_size(text_aes=aes(label=paste0(round(intersection_size/union_size * 100), "%|",intersection_size,"")))
                 ,annotations = list(
                     aes=aes(x=intersection, y=LOS_DIAS),
                     aes=aes(x=intersection, fill=FALLECIDOS)
                       geom_bar(stat='count', position='fill')

Screen Shot 2020-06-12 at 13 49 55

I need to obtain something like this:
Screen Shot 2020-06-12 at 14 06 36

Add an example showing how to display counts on "Set size"s

Based on the suggestion from #23 (comment) by @sfd99.

One quick solution:

    movies, genres,
        geom=function(...) {
                geom_text(..., aes(label=..count..))

but I will also consider:

  • adding a dedicated helper function
  • a new parameter to upset_set_size
  • and other possible ways of achieving that without modifying the package

Single legend for side-by-side plots?

Thanks for this great package. I really like the ability to stack the intersection plots and I have been looking for something like this for a long time. I'm presenting two upsets side-by-side, but I would like to be able to show a single legend. In the image below I would like to have just one legend showing the colors for the phyla.
Thanks! Chuck

Highlight covers on-bar counts

I might be doing something wrong, but it seems that if a bar on the intersection sizes plot is highlighted, then any on-bar counts are hidden.

In the example below, the count is on the bar for the first 2 bars, but when the second is highlighted the count isn't visible on that one. I tried changing the text colours, but it didn't have any effect on the highlighted bar.

model <-
    guess = c(T,T,T,T,F,F,F,F),
    real  = c(F,T,T,T,T,F,T,T)

  intersect = c("guess", "real"),
  queries = list(
      intersect = c("guess", "real"),
      fill = "blue"
      intersect = NA,
      fill = "blue"

Created on 2020-06-18 by the reprex package (v0.3.0)

Add ignore_tag=TRUE to the intersection matrix

For better compatibility with plot composition, ignore_tag should be set on the intersection matrix; it should be optional. An article on composing multiple upset plots should also be added.

Error message when running GitHub R example of ComplexUpset

Hi Mike, 

ComplexUpset is GREAT! .

A quick question...

...when I run your R code movies example,
in link:
(the example R source code just under the nice upset plot!...),

I get this error message:

[1] "Converting non-logical columns to binary: Action, Animation, Comedy, Drama, Documentary, Romance"
Error in, intersect) : no vector columns were selected

ggplot2movies, beeswarm, etc. 
+all required pkgs are loaded ok...

Help! what am I missing?
San Francisco
Ubuntu Linux 18.04,
latest versions of Rstudio and R.

Consider renaming the parameters of intersection_size

intersection_size and intersection_ratio accept:

  • bar_number_threshold (could be bar_label_threshold?)
  • text and text_aes can be merged into a list, same way as annotations?
  • aest - rename to mapping?
  • text_colors - allow to specify one color for both easily

Maybe something like:

  # could be "text" instead of "counts"
  counts=upset_counts(aes=aes(), on_bar_when_lt=0.75, geom=list()),

when counts=FALSE, then do not display counts. Remember to update upset_text_percentage.

Slow plotting of a big dataset (10^6 observations, 11 sets)


Thanks for this package, it looks very cool and I am glad you are helping to keep the upset concept alive in R. I have a dataset that looks like this:

              ID control D2A1 D2B3 D3A1 D4A3 D5B3 H2A3 H2C3 H4A4 H4C2 H5A3
3  Chrom_3_1398_C_T            1    0    0    0    0    0    0    0    0    1    0
10 Chrom_3_4061_A_T            0    1    1    1    1    0    1    1    1    1    1
11 Chrom_3_4064_C_T            0    1    1    1    1    0    1    1    1    1    1
12 Chrom_3_4065_G_A            0    1    1    1    1    0    1    1    1    1    1
13 Chrom_3_4069_C_T            0    1    1    1    1    0    1    1    1    1    1
14 Chrom_3_4093_A_C            1    1    1    1    1    0    1    1    1    1    1

Basically it's just a list of genomic mutations (ID column), across different biological samples (other columns).
My code is

sampName = list(colnames(DT3)[2:12])
upset(DT3, sampName, name='SNP', width_ratio=0.1)

The data set is roughly 1.5 million rows long and it seems to be taking forever to plot.

sort_intersections_by degree, then by cardinality

Hi- I just started playing with ComplexUpset (0.5.15), thank you - great stuff!

As per the title, it would be useful to have the option to order the columns in the intersection matrix by degree and then by cardinality, e.g. sort_intersections_by = c('degree', 'cardinality').
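
The proposed two-key ordering can be sketched in base R as follows, assuming degree comes first and cardinality (intersection size) breaks ties, both descending; the `sizes` table is made up for illustration:

```r
# Hypothetical summary: one row per intersection
sizes <- data.frame(
  intersection = c("A", "B", "A-B", "A-C", "A-B-C"),
  degree       = c(1, 1, 2, 2, 3),
  cardinality  = c(40, 55, 12, 30, 7),
  stringsAsFactors = FALSE
)

# Sort by degree first, then by cardinality within each degree
ordered <- sizes[order(-sizes$degree, -sizes$cardinality), ]
```

The resulting column order would be A-B-C, A-C, A-B, B, A.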

Probably a separate feature request: It would be nice if the intersect parameter (or some other parameter) could take a list of lists of intersections and plot only those (similar to the original upset in UpSetR), optionally respecting the order in the outer list

