paulgavrikov / parallel-matplotlib-grid
This Python 3 module helps you speed up the generation of subplots in pseudo-parallel mode using matplotlib and multiprocessing. This can be useful if you are dealing with expensive preprocessing or plotting tasks, such as one violin plot per subplot.

License: MIT License

Python 100.00%
python matplotlib matplotlib-pyplot matplotlib-python multiprocessing

parallel-matplotlib-grid's Introduction

Parallel generation of grid-like plots using matplotlib

This Python 3 module helps you speed up the generation of subplots in pseudo-parallel mode using matplotlib and multiprocessing. This can be useful if you are dealing with expensive preprocessing or plotting tasks, such as one violin plot per subplot.

Operation overview

How does it work?

This library uses Python's multiprocessing module to plot each cell individually. If provided, each process first evaluates a user-defined preprocessing function. Afterwards, every process calls a second user-defined plotting function, passing it matplotlib axes to plot on. All created plots are then stored as images, which the main process retrieves and assembles into a grid of subplots without any decoration.
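
The steps above can be sketched roughly as follows. This is a minimal illustration and not the library's actual implementation: the helper names render_cell, assemble and render_grid, the in-memory PNG transport, and the small cell size are all assumptions made for this example.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.figure import Figure


def render_cell(task):
    """Worker step (hypothetical helper): draw one cell, return PNG bytes."""
    plot_fn, cell_data = task
    fig = Figure(figsize=(2, 2), dpi=50)  # standalone figure, no pyplot state
    axes = fig.subplots()
    plot_fn(cell_data, fig, axes)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    return buffer.getvalue()


def assemble(png_cells, grid_shape):
    """Main-process step: paste the rendered images into an undecorated grid."""
    rows, cols = grid_shape
    fig, grid = plt.subplots(rows, cols, squeeze=False)
    for axes, png in zip(grid.ravel(), png_cells):
        axes.imshow(plt.imread(io.BytesIO(png), format="png"))
        axes.axis("off")  # the decoration already lives inside the image
    return fig


def render_grid(plot_fn, data, grid_shape, pool=None):
    """Render every cell, in a worker pool if one is given, then assemble."""
    tasks = [(plot_fn, cell) for cell in data]
    mapper = pool.map if pool is not None else map
    return assemble(list(mapper(render_cell, tasks)), grid_shape)


def violin(data, fig, axes):
    axes.violinplot(data)
```

Without a pool the cells are rendered one after another, which is handy for debugging. To render them in parallel, one would pass a multiprocessing.Pool as pool, with the call placed inside an if __name__ == "__main__": guard.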

How do I install this module?

This module is at a very early stage, so no PyPI releases are currently provided. However, you can simply install it from git:

pip install git+https://github.com/paulgavrikov/parallel-matplotlib-grid/

How do I use it?

Aside from the data, all you need to provide is the grid layout grid_shape and a plotting function plot_fn. Here is an example:

from parallelplot import parallel_plot

import matplotlib.pyplot as plt
import numpy as np


def violin(data, fig, axes):
    axes.violinplot(data)


# Gen some fake data 
X = np.random.uniform(low=-1, high=1, size=(30, 512, 512))

parallel_plot(plot_fn=violin, data=X, grid_shape=(3, 10))
plt.show()

Want to preprocess your data before plotting? No problem! Just provide preprocessing_fn. Here is an example where we apply a PCA transformation:

from parallelplot import parallel_plot

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


def preprocess(data):
    return PCA().fit_transform(data)


def violin(data, fig, axes):
    axes.violinplot(data)


# Gen some fake data
X = np.random.uniform(low=-1, high=1, size=(30, 512, 512))

parallel_plot(plot_fn=violin, data=X, grid_shape=(3, 10), preprocessing_fn=preprocess)
plt.show()

When should I not use this library?

There are some cases where this module is either useless or adds overhead. Here are a few of those:

  • Your plot and preprocessing functions execute fast, but your data is big. multiprocessing pickles the inputs and outputs of every task, so all data has to be serialized. This can introduce significant overhead.
  • Your data is larger than 4 GiB. Older pickle protocols only support objects up to 4 GiB in size (protocol 4, the default since Python 3.8, lifts that limit). There are ways to work around this, but it is probably not worth it, because pickling that much data is slow.
  • You only have one core available. Sorry 'bout that.
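
To get a feeling for the serialization overhead mentioned in the list above, you can time a pickle round trip for one cell of your data before deciding to parallelize. The helper pickle_cost below is hypothetical and only a rough estimate: it uses the standard library alone and ignores the cost of moving the bytes between processes.

```python
import pickle
import time


def pickle_cost(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Return (payload size in bytes, seconds) for one pickle round trip."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=protocol)
    pickle.loads(payload)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed


if __name__ == "__main__":
    cell = bytes(8 * 512 * 512)  # same byte count as one 512x512 float64 cell
    size, seconds = pickle_cost(cell)
    print(f"{size / 2**20:.1f} MiB pickled and restored in {seconds * 1e3:.2f} ms")
```

If this time, multiplied by the number of cells, is close to the time your plot function needs, running in parallel is unlikely to pay off.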

How do I contribute?

Just create a PR or feel free to raise an issue for questions, feature-requests etc.

parallel-matplotlib-grid's Issues

[REQUEST] Windows support

In WSL (Windows Subsystem for Linux, which is just Linux), this package works well.

But on plain Windows, the simple code below does not run properly and hangs forever,
and the terminal shows a lot of multiprocessing errors.

What I tried

import numpy as np
from parallelplot import parallel_plot

def scatter(data, fig, axes):
    axes.scatter(data[:, 0], data[:, 1])

X = np.random.uniform(low=-1, high=1, size=(5, 50, 50))

parallel_plot(plot_fn=scatter, data=X, grid_shape=(1, 5), show_progress=False)

The result in the terminal is:

Process SpawnPoolWorker-3:
Traceback (most recent call last):
  File "\Python310\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "\Python310\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "\Python310\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "\Python310\lib\multiprocessing\queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'scatter' on <module '__main__' (built-in)>
Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "\Python310\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "\Python310\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "\Python310\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "\Python310\lib\multiprocessing\queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'scatter' on <module '__main__' (built-in)>
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-4:
Process SpawnPoolWorker-5:
Traceback (most recent call last):
  File "\Python310\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "\Python310\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "\Python310\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
Traceback (most recent call last):
  File "\Python310\lib\multiprocessing\queues.py", line 367, in get
    return _ForkingPickler.loads(res)
  File "\Python310\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
AttributeError: Can't get attribute 'scatter' on <module '__main__' (built-in)>
  File "\Python310\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "\Python310\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "\Python310\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "\Python310\lib\multiprocessing\queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'scatter' on <module '__main__' (built-in)>
  File "\Python310\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "\Python310\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "\Python310\lib\multiprocessing\queues.py", line 367, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'scatter' on <module '__main__' (built-in)>

Describe the solution you'd like
Not sure, but multiprocessing is probably the key point.
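
A possible explanation, offered as an assumption and not confirmed in this thread: Windows only supports the "spawn" start method, where every worker starts a fresh interpreter and has to import plot_fn from the main module. The AttributeError above says exactly that the workers could not find scatter there, which typically happens when the code is typed into an interactive session or runs without an if __name__ == "__main__": guard. The sketch below only reports which start method is active; start_method_info and main are hypothetical names.

```python
import multiprocessing as mp


def start_method_info():
    """Return the active start method and whether a __main__ guard is needed."""
    method = mp.get_start_method()
    # Only "fork" copies the parent's memory. "spawn" and "forkserver"
    # workers must be able to import plot_fn from the main module.
    needs_guard = method != "fork"
    return method, needs_guard


def main():
    method, needs_guard = start_method_info()
    print(f"start method: {method}, __main__ guard needed: {needs_guard}")
    # Hypothetical: the parallel_plot(...) call from the report would go here,
    # with scatter defined at module level in a saved .py file.


if __name__ == "__main__":
    main()
```

On WSL the default has traditionally been "fork", which would explain why the same code works there.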

[BUG] labels overlap

Describe the bug
Row and column labels do not wrap; labels that are too long overlap instead.

Minimal Code To Reproduce

from parallelplot import parallel_plot
import matplotlib.pyplot as plt
import numpy as np


def violin(data, fig, axes):
    axes.violinplot(data)


X = np.random.uniform(low=-1, high=1, size=(45, 512, 512))
# _parallel_plot_worker((0, X[0]), violin, figsize=(3, 6))

labels = ["this is a very long label blah blah blah"] * len(X)

parallel_plot(plot_fn=violin, data=X, grid_shape=(3, 15), cleanup=False,
              col_labels=labels, row_labels=labels)
plt.show()

Expected behavior
Labels should wrap.
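
Until labels wrap automatically, one possible workaround is to insert the line breaks yourself before passing the labels on. This is a sketch using only the standard library; wrap_labels is a hypothetical helper, and it assumes the labels are drawn as ordinary matplotlib text, which renders "\n" as a line break.

```python
import textwrap


def wrap_labels(labels, width=16):
    """Break each label into lines of at most `width` characters."""
    return [textwrap.fill(label, width=width) for label in labels]


if __name__ == "__main__":
    for line in wrap_labels(["this is a very long label blah blah blah"])[0].split("\n"):
        print(line)
```

The wrapped list could then be passed as col_labels or row_labels in place of the original one.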

[BUG] Top / Right Axis of subplots not always visible

Describe the bug
The top and right axes of subplots are not always visible. EDIT: This seems to apply to everything close to the edges.

Minimal Code To Reproduce

from parallelplot import parallel_plot

import matplotlib.pyplot as plt
import numpy as np


def violin(data, fig, axes):
    axes.violinplot(data)


# Gen some fake data 
X = np.random.uniform(low=-1, high=1, size=(30, 512, 512))

parallel_plot(plot_fn=violin, data=X, grid_shape=(3, 10))
plt.show()

Expected behavior
All axes should be visible.

[BUG]

Describe the bug
TypeError: parallel_plot() got an unexpected keyword argument 'figsize'

I saw your blog post (https://towardsdatascience.com/plotting-in-parallel-with-matplotlib-and-python-f7efb3d944de)
and tried the scatter example, which uses a figsize option:

parallel_plot(plot_fn=scatter, data=X, grid_shape=(1, 5), figsize=(25, 5))

but it doesn't work.

Minimal Code To Reproduce
parallel_plot(plot_fn=scatter, data=X, grid_shape=(1, 5), figsize=(25, 5))

Expected behavior
figsize option accepted

Environment (please complete the following information):

  • OS: WSL
  • Python: 3.8
  • latest
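
When a call fails with "unexpected keyword argument", it can help to check which keywords the installed version of a function really accepts, since a blog post may describe a different version. The sketch below uses the standard inspect module; accepted_keywords is a hypothetical helper and demo_plot is a stand-in, not the real parallel_plot signature.

```python
import inspect


def accepted_keywords(fn):
    """Return the keyword names fn accepts, or None if it takes **kwargs."""
    params = list(inspect.signature(fn).parameters.values())
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return None  # any keyword is accepted
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [p.name for p in params if p.kind in keyword_kinds]


def demo_plot(plot_fn, data, grid_shape):
    """Stand-in with an assumed signature, for illustration only."""


if __name__ == "__main__":
    print(accepted_keywords(demo_plot))
```

Calling accepted_keywords(parallel_plot) on the installed module would show whether figsize, or a differently named size option, is available.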

