schwimmbad's Introduction

das Schwimmbad

schwimmbad provides a uniform interface to parallel processing pools and makes it easy to switch between local development (e.g., serial processing or Python's multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or joblib).
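
As a quick illustration (a minimal sketch, not taken verbatim from the documentation), the same worker function and the same map call run unchanged whichever pool you construct:

import schwimmbad

def worker(task):
    a, b = task
    return a**2 + b**2

if __name__ == "__main__":
    tasks = [(i, 2 * i) for i in range(16)]
    # Swap SerialPool for MultiPool() or MPIPool() without touching the worker
    with schwimmbad.SerialPool() as pool:
        results = list(pool.map(worker, tasks))
    print(results[:4])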

Installation

The easiest way to install is via pip:

pip install schwimmbad

See the installation instructions in the documentation for more information.

Documentation

The documentation for schwimmbad is hosted on Read the Docs.

Attribution

If you use this software in a scientific publication, please cite the JOSS article:

@article{schwimmbad,
  doi = {10.21105/joss.00357},
  url = {https://doi.org/10.21105/joss.00357},
  year  = {2017},
  month = {sep},
  publisher = {The Open Journal},
  volume = {2},
  number = {17},
  author = {Adrian M. Price-Whelan and Daniel Foreman-Mackey},
  title = {schwimmbad: A uniform interface to parallel processing pools in Python},
  journal = {The Journal of Open Source Software}
}

License

Copyright 2016-2024 the contributors.

schwimmbad is free software made available under the MIT License. For details see the LICENSE file.

schwimmbad's People

Contributors

adrn, ahnitz, ajshajib, dependabot[bot], dfm, manodeep, uweschmitt

schwimmbad's Issues

Get coverage to properly report for mpi.py

Right now, the MPI tests are run separately from the standard tests. After many attempts, I couldn't get the MPI tests to be included in the coverage report, so mpi.py is currently ignored in .coveragerc.

I fixed some issues I had with MPIPool

I had two issues with your current implementation of MPIPool:

  1. Exceptions in the worker processes did not shut down the pool, so my applications froze.
  2. MPIPool starts the worker processes inside the map implementation, and these calls then return None after some time. This does not cause problems if your code only calls map once and exits afterwards, but calling map multiple times in a script gave strange results and errors.

I implemented solutions for these in https://gitlab.com/uweschmitt/mpipool; maybe you want to integrate them into schwimmbad. I did not have time to create a proper pull request.
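
Until something along those lines is merged, one common workaround for the first problem (a sketch only, not part of the linked patch; do_work stands in for the real worker function) is to catch exceptions inside the worker and return them as results, so a failing task cannot leave the pool hanging:

def safe_worker(task):
    # Return the exception as a result instead of letting it kill
    # or deadlock a worker process; the master can inspect and re-raise.
    try:
        return do_work(task)
    except Exception as exc:
        return exc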

Error using v0.4.0 in tests with github actions

Dear Adrian,

Great that you are updating schwimmbad again. I find it a fantastic package.

Unfortunately, my GitHub Actions runs with Python >= 3.9 now fail for packages that use schwimmbad. See, for example, pyeee in Run tests. pytest reports:

def choose_pool(
...
>       if processes != 1 and MultiPool.enabled():
E       UnboundLocalError: local variable 'MultiPool' referenced before assignment

Kind regards,
Matthias

Workers exit in choose_pool when using mpi

I noticed the following lines in __init__.py::choose_pool:

if not pool.is_master():
    sys.exit(0)

This seems to cause all workers to exit after constructing the MPIPool. Am I missing something?
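
For context, a sketch of the explicit MPIPool usage pattern (as I recall it from the schwimmbad docs): worker ranks block in pool.wait(), serve tasks until the master closes the pool, and then exit, so only the master runs the rest of the script. The sys.exit(0) in choose_pool appears to apply the same idea on the workers' behalf.

import sys
from schwimmbad import MPIPool

def worker(x):
    return x**2

pool = MPIPool()
if not pool.is_master():
    # Worker ranks serve tasks in the wait loop, then exit;
    # only the master rank continues past this point.
    pool.wait()
    sys.exit(0)

results = pool.map(worker, range(100))
pool.close()
print(results[:5])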

SerialPool not starting

Hi there,

I'm currently using version 0.3.0:

In [1]: import schwimmbad
In [2]: schwimmbad.__version__
Out[2]: '0.3.0'

I'm finding that I can't get SerialPool to begin working. Here's a simple script which reproduces the problem:

import schwimmbad

with schwimmbad.MultiPool() as pool:
    inputs = [i for i in range(0,10)]
    pool.map(print, inputs)

with schwimmbad.SerialPool() as pool:
    inputs = [i for i in range(10,20)]
    pool.map(print, inputs)

The only output I get is 0..10. I've found the same problem on more complex examples, with my own functions, and with and without with statements.

Cheers
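
One hedged guess, not confirmed in this thread: in some versions SerialPool.map returns a lazy iterator (like Python 3's built-in map), so nothing executes until the result is consumed. A minimal check is to materialize the result:

import schwimmbad

with schwimmbad.SerialPool() as pool:
    inputs = range(10, 20)
    # list() forces a possibly lazy map to actually run
    results = list(pool.map(print, inputs))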

Always one MPI worker blocked when passing in a class

I have a lot of parameters to set, so I first pass the whole class as one argument to the function that handles the MPI task. However, I find that sometimes one MPI process stays blocked: e.g., with 60 tasks distributed to 15 MPI processes, only 59 tasks are worked out, one task stays blocked, and the program cannot exit successfully because of that one blocked MPI process!
Then I tried to use a class with a __call__ method to deal with the 60 tasks, and the result is the same: sometimes all 60 tasks pass, sometimes only 59 do.
[Screenshot: 2022-06-29 054021]
As you can see, when that one MPI process is blocked, all MPI processes show 100 percent CPU usage.
Maybe it is because my class is a little bit big?

imap in SerialPool and MPIPool

It'd be sweet if imap was implemented in MPIPool and SerialPool like in MultiPool (inherited) so that I could get a progress bar for big jobs using tqdm.

Right now I'm using:

with schwimmbad.choose_pool(mpi=args.mpi, processes=args.np) as pool:
    # make it so we can use imap in serial and mpi mode
    if not isinstance(pool, schwimmbad.MultiPool):
        pool.imap = pool.map
    main(pool)

and

with tqdm(total=nmc) as pbar:
    for i, d in enumerate(pool.imap(worker, arg_gen)):
        dat.append(d)
        pbar.update()

This works fine for SerialPool since map doesn't block (I guess?), but when using MPI it doesn't update the progress bar until it is entirely finished.

Previously I had MPI implemented just using mpi4py directly and was able to get the behavior I wanted, but I like the idea of using a single code for all possible run scenarios.

MPIPool is not working in a Python 3 environment

I am using exactly the same mpi-demo.py code and running it in a Python 3 environment with the command:
mpiexec -n 2 python3 mpi-demo.py
But I am getting the error shown below:

bikash@b:~$ cd Documents/data_simulation/
bikash@b:~/Documents/data_simulation$ mpiexec -n 2 python3 mpi-demo.py
[b:04791] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[b:04791] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: /usr/lib/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[b:04791] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[b:4791] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[37877,1],0]
  Exit code:    1
--------------------------------------------------------------------------
bikash@b:~/Documents/data_simulation$ 

Can anyone please suggest how to resolve this issue?
Thanks!

Move CI from travis -> actions

The current CI tests haven't been running because we were using Travis (RIP). We need to figure out how to install various MPI implementations on GitHub actions and run the tests there.

cc @dfm

Option to reuse a worker instance for different tasks

I wanted to gauge interest for introducing the option to reuse a worker object for multiple tasks in the MPIPool (instead of unpacking a separate instance of the worker before executing every task). Doing this would allow a worker object to cache some intermediate results (that may be non-trivial to pickle) and reuse it for subsequent tasks.

In one of my applications, I'm using a workaround in which I store the cached data in a global variable. However, this feature would facilitate the garbage collection of the cached result after the thread pool is closed.

I'm more than happy to submit a PR for this myself, but I wanted to make sure that there isn't strong opposition to this feature before I start (especially since this is a bit of a departure from the interface of multiprocessing.Pool).
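
For anyone hitting this before such an option exists, the global-variable workaround mentioned above looks roughly like this (a sketch; expensive_setup and process are placeholder names). As noted, the downside is that the cached object lives for the lifetime of the worker process rather than being released when the pool closes.

_cache = None  # per-process cache, built lazily by each worker

def worker(task):
    global _cache
    if _cache is None:
        # Build the expensive, hard-to-pickle state once per worker process
        # instead of shipping it with every task.
        _cache = expensive_setup()
    return _cache.process(task)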

Annoying coverage reporting

Related to the move to GitHub Actions

We're getting a lot of noise from coveralls which I don't totally understand because I don't see that in my other projects that use this workflow. I expect there's some leftover setting from the Travis days either on GitHub or on coveralls. I'm not sure what it would be because I don't have access to either, but maybe you can audit the settings @adrn? Let me know what you discover.

[Screenshot: 2021-02-25, repeated coveralls comments]

Large speed differences between MPIPool and MultiPool

Hi,

I'm testing the performance of both pools with the given demo script:

def worker(task):
    a, b = task
    return a**2 + b**2

def main(pool):
    # Here we generate some fake data
    import random
    n = 1000000
    a = [random.random() for _ in range(n)]
    b = [random.random() for _ in range(n)]

    tasks = list(zip(a, b))
    tic = time.time()
    results = pool.map(worker, tasks)
    toc = time.time()
    print(f'Elapsed time: {toc-tic}')
    pool.close()

    print(results[:8])

if __name__ == "__main__":
    import schwimmbad
    import time

    from argparse import ArgumentParser
    parser = ArgumentParser(description="Schwimmbad example.")

    group = parser.add_mutually_exclusive_group()
    group.add_argument("--ncores", dest="n_cores", default=32,
                       type=int, help="Number of processes (uses multiprocessing).")
    group.add_argument("--mpi", dest="mpi", default=False,
                       action="store_true", help="Run with MPI.")
    args = parser.parse_args()

    pool = schwimmbad.choose_pool(mpi=args.mpi, processes=args.n_cores)
    
    main(pool)

These are the results on a single Linux machine with 32 cores, running python script-demo.py --ncores 32 and mpiexec -n 32 python script-demo.py --mpi respectively:

n = 100000
MultiPool : 0.03s
MPIPool : 0.58s

n = 1000000
MultiPool : 0.22s
MPIPool : 6.65s

n = 10000000
MultiPool : 2.37s
MPIPool : 68.76s

I've also run it on OS X, with similarly large gaps.

I understand that MPI may introduce some extra overhead, but are these large differences to be expected?
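
Some gap is expected: MPIPool sends each task and each result as its own pickled message, so for a worker as cheap as a**2 + b**2 the runtime is likely dominated by per-task pickling and communication rather than computation. One common mitigation, sketched here against the demo script above (this is not schwimmbad API, just plain batching of the tasks list and pool from that script), is to group many tasks into each message:

def worker_batch(batch):
    # Each MPI message now carries a whole batch of cheap tasks
    return [a**2 + b**2 for a, b in batch]

def chunk(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

batches = chunk(tasks, 10000)
results = [r for batch in pool.map(worker_batch, batches) for r in batch]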

Trouble running MPI demo: need at least two MPI processes

When I try to run the MPI demo on my local machine:

mpiexec -n 2 python mpi-demo.py

I get the following error:

Traceback (most recent call last):
  File "mpi-demo.py", line 21, in <module>
    pool = MPIPool()
  File "/Users/avaj0001/anaconda3/envs/venv/lib/python3.7/site-packages/schwimmbad/mpi.py", line 89, in __init__
    raise ValueError("Tried to create an MPI pool, but there "
ValueError: Tried to create an MPI pool, but there was only one MPI process available. Need at least two.
Traceback (most recent call last):
  File "mpi-demo.py", line 21, in <module>
    pool = MPIPool()
  File "/Users/avaj0001/anaconda3/envs/venv/lib/python3.7/site-packages/schwimmbad/mpi.py", line 89, in __init__
    raise ValueError("Tried to create an MPI pool, but there "
ValueError: Tried to create an MPI pool, but there was only one MPI process available. Need at least two.

Any suggestions?
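
The error means mpi4py is only seeing a single rank even though two were requested, which often indicates that the mpiexec on your PATH belongs to a different MPI installation than the one mpi4py was built against. A quick diagnostic, independent of schwimmbad:

# Save as check_mpi.py and run: mpiexec -n 2 python check_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")

If both processes report a size of 1, the launcher and mpi4py are not using the same MPI library.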

broadcast support?

I am using schwimmbad with some extensions for our particular workload. Specifically, we periodically need to guarantee that we run a function on each member of the pool. Would that be something that would be within the scope of this library, or is it too specialized? If it's within scope, I'd be interested in contributing the code here.
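
For context, outside of schwimmbad this is usually done with an MPI broadcast. A minimal mpi4py sketch (not schwimmbad API; apply_settings is a placeholder for the function to run on every rank):

from mpi4py import MPI

comm = MPI.COMM_WORLD

# Rank 0 builds the shared state once and broadcasts it; every rank then
# runs the same setup function on its own copy.
settings = {"seed": 42} if comm.Get_rank() == 0 else None
settings = comm.bcast(settings, root=0)
apply_settings(settings)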

MultiPool always uses only one core on my computer

Hello Adrian,

thank you very much for your code for parallel computing. I would like to use it to switch easily between my PC and the server.
I must have some local problem on my computer, because schwimmbad.MultiPool() doesn't use more than one core. It runs, but always uses only one core.
I have checked that the pool is correctly defined:
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=8>
and have also checked that the Python multiprocessing package (without schwimmbad) works correctly, in the sense that it uses all the available cores.
As for MPI, it works correctly with schwimmbad.
I have installed schwimmbad with conda.

Do you have any idea what information I should provide in order to track down the problem?

Thank you very much in advance,

Martín
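
One thing worth checking (a sketch, not taken from this thread): that the worker is CPU-bound enough to keep several cores busy and that the map result is actually consumed. Something like this should visibly load multiple cores under MultiPool:

import math
import schwimmbad

def burn(x):
    # deliberately CPU-bound so any parallel speedup is visible
    return sum(math.sqrt(i + x) for i in range(2_000_000))

if __name__ == "__main__":
    with schwimmbad.MultiPool() as pool:
        results = list(pool.map(burn, range(32)))
    print(len(results))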

JoblibPool on a cluster?

According to the documentation, using the joblib pool allows deployment on a cluster. Is this true in the case of multiple nodes? Or is it just limited to 1 node (like multiprocessing)?

Cheers,

Progress bar in MultiPool

Hi,

Is it possible to use something like tqdm to show a progress bar in a MultiPool call?

Thanks!
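
A sketch of one way to do this, assuming MultiPool inherits imap from multiprocessing.Pool (as the imap issue above suggests):

import schwimmbad
from tqdm import tqdm

def worker(x):
    return x * x

if __name__ == "__main__":
    tasks = list(range(1000))
    with schwimmbad.MultiPool() as pool:
        # imap yields results as they complete, so tqdm can update per task
        results = list(tqdm(pool.imap(worker, tasks), total=len(tasks)))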

extend all pool interfaces to have common constructor

In some applications, there is read-only global state that needs to be distributed among workers. This can be implemented via multiprocessing.Pool's initializer and initargs options. The nonuniformity of the pool interfaces means that this workflow is not possible on an MPI pool. I think MPI broadcast allows this functionality to be built into this package. Would this addition be possible?
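
For reference, the multiprocessing-style pattern the issue refers to looks like this (a sketch using multiprocessing.Pool directly; the names are illustrative):

import multiprocessing

_state = None  # read-only global state, set once per worker process

def init_worker(state):
    global _state
    _state = state

def worker(x):
    return _state["scale"] * x

if __name__ == "__main__":
    shared = {"scale": 2.0}
    with multiprocessing.Pool(4, initializer=init_worker, initargs=(shared,)) as pool:
        print(pool.map(worker, range(8)))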

Bug w/ mpi4py >=3.0?

Via astrocrash on twitter:

there appears to be a bug in schwimmbad 0.3.1+ that makes it incompatible with mpi4py 3.0+. Are you aware?

File "/site-packages/schwimmbad/mpi.py", line 122, in wait
func, arg = task
ValueError: not enough values to unpack (expected 2, got 1)

How to use MPIPool in slurm?

I tried to use MPIPool under Slurm, but I get the following error:

Traceback (most recent call last):
  File "mpi_test.py", line 12, in <module>
    with MPIPool() as p:
  File "/home/nfs/admin0/miniconda3/lib/python3.8/site-packages/schwimmbad/mpi.py", line 89, in __init__
    raise ValueError("Tried to create an MPI pool, but there "
ValueError: Tried to create an MPI pool, but there was only one MPI process available. Need at least two.

Here is my python script:

import sys
from schwimmbad import MPIPool
import time
import random

def worker(task):
    time.sleep(1)
    return random.random()


if __name__ == "__main__":
    with MPIPool() as p:
        result = p.map(worker,range(100))
    print(result)

and here is my slurm script:

#!/bin/bash   
#SBATCH -J mpi_test      
#SBATCH -n 20         
#SBATCH -p cpu40   
#SBATCH -o mpi_test.out        # Output file name   

mpiexec -n 20 python mpi_test.py

Could anyone give me some suggestions? Is there something wrong with my Python or Slurm script?

Overcome the 2GB limit in MPI pickling

MPI returns an error if an object is too large to be pickled during communication. This issue has been reported before (see this). It raises an error like this:

File "/***/lib/python3.7/site-packages/schwimmbad/mpi.py", line 196, in map
    status=status)
File "mpi4py/MPI/Comm.pyx", line 1173, in mpi4py.MPI.Comm.recv
File "mpi4py/MPI/msgpickle.pxi", line 302, in mpi4py.MPI.PyMPI_recv
File "mpi4py/MPI/msgpickle.pxi", line 263, in mpi4py.MPI.PyMPI_recv_match
File "mpi4py/MPI/msgpickle.pxi", line 139, in mpi4py.MPI.Pickle.alloc
SystemError: Negative size passed to PyBytes_FromStringAndSize

Recently, mpi4py provided a workaround in mpi4py.util (see this). To overcome this memory limit in MPI, I added two lines to schwimmbad/mpi.py as follows, and it worked. In the function _import_mpi:

def _import_mpi(quiet=False, use_dill=False):
    global MPI 
    try:
        from mpi4py import MPI as _MPI
        # see https://mpi4py.readthedocs.io/en/stable/mpi4py.util.pkl5.html
        from mpi4py.util import pkl5 # NEW, use pickle5
        _MPI.COMM_WORLD = pkl5.Intracomm(_MPI.COMM_WORLD) # NEW, use comm in pkl5
       ...

It would be great to support this in a future release of schwimmbad.
