Comments (6)
Hi @aderzelle - I know it was a long wait, but I finally got to optimize the code, in a few steps:
- I vectorized loops in `compute_matrix()` and `compute_unions()`: `upset_data()` was benchmarked with TCGA BRCA mutations data, showing a speed-up by a factor of:
  - two (from 0.8s to 0.4s) for ~1000 observations x 70 sets
  - six (from 26s to 4s) for ~1000 observations x 1000 sets
- Further re-writing of `compute_matrix()`, `compute_unions()`, and `names_of_members()` led to another improvement:
  - down to 0.12s for ~1000 observations x 70 sets
  - down to 0.65s for ~1000 observations x 1000 sets
- Finally I narrowed down another performance issue, this time specific to datasets with a very large number of observations (incorrect assignment to the `with_sizes` data frame), bringing the times down to:
  - 0.10s for ~1000 observations x 70 sets
  - 0.52s for ~1000 observations x 1000 sets
  - 6.1s for 1.5 million rows x 11 sets, as in your question @aderzelle.
I used the TCGA BRCA data set (84723 unique SNPs), randomly selected 11 patients, and duplicated the data 18 times to get over 1.5 million rows. Plotting this dataset after all the optimizations took 7.9s.
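For anyone who wants to try a benchmark of a similar shape without the TCGA data, here is a minimal sketch in Python (not the package's R code). It assumes random boolean memberships in place of real mutations; the 0.3 membership probability and the names `make_benchmark_rows` and `time_call` are made up for illustration. Only the dimensions (84723 rows, 11 sets, 18 copies) come from the comment above, and the timed workload is a simple stand-in for the real plotting call.

```python
import random
import time
from collections import Counter

N_SNPS = 84_723   # unique SNPs, as reported above
N_SETS = 11       # randomly selected patients, as reported above
N_COPIES = 18     # duplication factor, as reported above


def make_benchmark_rows(n_rows=N_SNPS, n_sets=N_SETS, copies=N_COPIES, seed=0):
    """Build synthetic boolean membership rows and replicate them `copies` times."""
    rng = random.Random(seed)
    # Assumption: each observation belongs to each set with probability 0.3.
    base = [tuple(rng.random() < 0.3 for _ in range(n_sets)) for _ in range(n_rows)]
    return base * copies


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


rows = make_benchmark_rows()
# Stand-in workload: count how many rows share each membership pattern.
sizes, elapsed = time_call(Counter, rows)
print(f"{len(rows):,} rows x {N_SETS} sets; counting patterns took {elapsed:.2f}s")
```

The timing printed by this sketch is not comparable to the figures quoted above; it only reproduces the size of the input.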
I only optimized the code for the plain use case of calling `upset(data, columns)`, so there may still be arguments that execute non-optimized code (e.g. filtering or sorting) - I will address these subsequently (please feel welcome to highlight any such cases by opening a new issue!). These changes will be available in version 0.7.4 (it can already be installed directly from GitHub).
I would love to learn if the performance is now satisfactory for you.
from complex-upset.
Thanks for bringing this up. I also noted some slowdown when using omics datasets - I will have a look at improving the performance later tonight!
from complex-upset.
Perfect, I will be happy to do the testing.
from complex-upset.
It actually took roughly 1h to get the final plot.
from complex-upset.
A recent commit, df58c0d, should have a side effect of slightly improving the performance (although it would not be a big difference).
from complex-upset.
There are more speed-ups possible:
- everything so far is single-core, yet most of the functions are embarrassingly parallelizable. `parallel` could become a suggested package and, if installed, be used for some of the heavier tasks.
- my code is to a degree still influenced by a Python mindset, where the key is not to vectorise everything but to make clever use of hashing. It is odd that R does not expose hash tables as a concept, but it seems those are indeed in use: https://www.r-bloggers.com/2015/04/hash-table-performance-in-r-part-iiin-part-i-of-this-series-i-explained-how-r-hashed/
- use lazy evaluation for union size calculation
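To make the hashing and lazy-evaluation ideas above concrete, here is a rough sketch in Python (the "Python mindset" mentioned above), not code from the package. The helper names `intersection_sizes` and `make_union_size` are hypothetical. The first groups observations by hashing their membership pattern in a single pass; the second computes a union size only when it is requested and caches the result.

```python
from collections import Counter
from functools import lru_cache


def intersection_sizes(rows):
    """Count observations per exclusive intersection by hashing each membership tuple."""
    return Counter(tuple(bool(v) for v in row) for row in rows)


def make_union_size(rows, set_names):
    """Return a function that computes union sizes lazily and caches them."""
    index = {name: i for i, name in enumerate(set_names)}

    @lru_cache(maxsize=None)
    def union_size(names):
        # `names` must be hashable, e.g. a frozenset of set names.
        cols = [index[n] for n in names]
        return sum(1 for row in rows if any(row[c] for c in cols))

    return union_size


set_names = ("A", "B", "C")
rows = [
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 1),
    (0, 0, 0),
    (1, 1, 0),
]

sizes = intersection_sizes(rows)
union_size = make_union_size(rows, set_names)

print(sizes[(True, True, False)])          # observations in exactly A and B
print(union_size(frozenset({"A", "C"})))   # observations in A or C
```

With this layout no union is computed until something asks for it, so a plot that shows only a handful of intersections would pay only for those. Whether the same trade-off holds in R would need to be measured.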
from complex-upset.