
adrn commented on May 23, 2024

Sorry for the delay @martinmestre!

Can you say a bit more about how you are using the MultiPool, and what kind of machine you are running on?

When I test this on my MacBook Pro, running a Python script from the command line, it seems to work fine. Here's the script I ran:

import sys
import time
import random


def worker(task):
    i, num = task
    time.sleep(0.1)
    return num ** 2


if __name__ == '__main__':
    from schwimmbad import MultiPool

    size = None
    if len(sys.argv) > 1:
        size = int(sys.argv[1])

    tasks = [(i, random.random()) for i in range(100)]

    with MultiPool(processes=size) as pool:
        print(pool)

        results = []
        for r in pool.map(worker, tasks):
            results.append(r)

    sys.exit(0)

Some timing:

% time python schwimmbad_test.py 1
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=1>
python schwimmbad_test.py 1  0.59s user 0.51s system 9% cpu 11.184 total
% time python schwimmbad_test.py 2
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=2>
python schwimmbad_test.py 2  0.83s user 0.65s system 24% cpu 6.169 total
% time python schwimmbad_test.py 4
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=4>
python schwimmbad_test.py 4  1.45s user 1.02s system 63% cpu 3.885 total

So, the timings seem to be scaling as I would expect.
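As a rough check of that expectation: with 100 tasks that each sleep 0.1 s, a perfectly even split over N processes gives a lower bound of about 10/N s. This ignores interpreter startup and how the pool actually chunks tasks, which are assumptions of this sketch. It compares that bound with the totals reported above:

```python
import math

# Totals (in seconds) reported by `time` above, keyed by pool size.
OBSERVED_TOTAL_S = {1: 11.184, 2: 6.169, 4: 3.885}


def ideal_sleep_time(n_tasks, sleep_s, n_procs):
    """Lower bound on wall time when n_tasks sleeps are split as evenly as possible.

    Assumes zero startup cost and perfectly balanced task assignment.
    """
    tasks_per_proc = math.ceil(n_tasks / n_procs)
    return tasks_per_proc * sleep_s


if __name__ == "__main__":
    for n_procs, total_s in OBSERVED_TOTAL_S.items():
        ideal_s = ideal_sleep_time(100, 0.1, n_procs)
        print(f"{n_procs} procs: ideal {ideal_s:.2f} s, observed {total_s:.2f} s, "
              f"overhead {total_s - ideal_s:.2f} s")
```

The leftover of roughly 1.2 to 1.4 s in each run is consistent with a roughly fixed per-run cost (such as starting Python and the worker processes) on top of near-linear scaling.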

adrn commented on May 23, 2024

Oh, and just in case you are using an interactive interpreter (like an IPython session or notebook), multiprocessing pools do not work in these settings: https://docs.python.org/3.8/library/multiprocessing.html#using-a-pool-of-workers

martinmestre commented on May 23, 2024

Happy New Year @adrn!

Thanks for your answer.
My computer is a Lenovo G50-80 with the following CPU info (Linux: Debian 9):

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
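For reference, these lscpu fields imply 2 physical cores with 2 hardware threads each, i.e. the 4 logical CPUs listed under CPU(s); os.cpu_count() reports the logical count. A small sketch that derives both numbers from the text above (parse_lscpu and core_counts are hypothetical helper names, not part of schwimmbad):

```python
# Subset of the lscpu output quoted above.
LSCPU_TEXT = """\
CPU(s): 4
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
"""


def parse_lscpu(text):
    """Turn 'Key: value' lines from lscpu output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def core_counts(fields):
    """Return (physical_cores, logical_cpus) from parsed lscpu fields."""
    sockets = int(fields["Socket(s)"])
    physical = int(fields["Core(s) per socket"]) * sockets
    logical = physical * int(fields["Thread(s) per core"])
    return physical, logical


if __name__ == "__main__":
    fields = parse_lscpu(LSCPU_TEXT)
    physical, logical = core_counts(fields)
    print(f"physical cores: {physical}, logical CPUs: {logical}")
```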

I made some tests with your example. First I give the output from your script. Then I address the original problem I ran into, which concerns your example of selecting a pool from a command-line argument here:
https://schwimmbad.readthedocs.io/en/latest/examples/index.html.

  1. Using your original script above (with the worker just performing a sleep task), I obtain the following outputs:

$ \time python script-adrn.py 1
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=1>
0.32user 0.02system 0:10.31elapsed 3%CPU
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=2>
0.25user 0.04system 0:05.50elapsed 5%CPU
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=4>
0.26user 0.04system 0:03.10elapsed 9%CPU

Looking at the elapsed times, I seem to get the same behaviour as your total times.
But my %CPU values are very different from yours.
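Both GNU time and zsh's time compute %CPU as 100 × (user + system) / elapsed. Since the workers mostly sleep, the percentages are small, and with similar elapsed times the gap comes down to the CPU time spent: 1.10 s (0.59 user + 0.51 system) in the macOS run versus 0.34 s (0.32 + 0.02) in the Debian run. A minimal sketch of that arithmetic for the 1-process runs:

```python
def cpu_percent(user_s, sys_s, elapsed_s):
    """%CPU as reported by `time`: 100 * (user + system) / elapsed."""
    return 100.0 * (user_s + sys_s) / elapsed_s


# (user, system, elapsed) in seconds for the 1-process sleep runs quoted above.
RUNS = {
    "macOS (adrn)": (0.59, 0.51, 11.184),
    "Debian (martinmestre)": (0.32, 0.02, 10.31),
}

if __name__ == "__main__":
    for label, (user_s, sys_s, elapsed_s) in RUNS.items():
        print(f"{label}: {cpu_percent(user_s, sys_s, elapsed_s):.1f}% CPU")
```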

When I used htop to see how many processors were being used simultaneously, I couldn't see much because the task is basically sleeping. So I modified the worker function to this:

import numpy as np  # needed for np.linspace

def worker(task):
    i, num = task
    time.sleep(0.1)
    sum(np.linspace(0, 1000, 10000000))  # CPU-bound busy work
    return num ** 2

I repeated the analysis and obtained:
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=1>
86.76user 0.94system 1:37.75elapsed 89%CPU
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=2>
106.42user 0.94system 1:01.06elapsed 175%CPU
<schwimmbad.multiprocessing.MultiPool state=RUN pool_size=4>
190.03user 0.99system 0:54.65elapsed 349%CPU

Looking at htop, I verified that the number of processors used in each case is correct,
although the elapsed times do not scale as expected: 1:37, 1:01 and 0:54 (minutes:seconds).
If you know of any reason for this behaviour please let me know.
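One way to quantify this is the speedup S_N = T_1 / T_N and the parallel efficiency S_N / N, computed from the elapsed times above. The sketch below only does that arithmetic and does not diagnose the cause; one plausible factor (an assumption, not verified here) is that this CPU has 2 physical cores with 2 hardware threads each, so 4 processes share the execution units of 2 cores.

```python
# Elapsed (wall-clock) times reported by GNU time above, as "m:ss.xx".
ELAPSED = {1: "1:37.75", 2: "1:01.06", 4: "0:54.65"}


def parse_elapsed(text):
    """Convert GNU time's 'm:ss.xx' elapsed format to seconds."""
    minutes, seconds = text.split(":")
    return 60 * int(minutes) + float(seconds)


def speedups(elapsed):
    """Return {n_procs: (speedup, efficiency)} relative to the 1-process run."""
    t1 = parse_elapsed(elapsed[1])
    out = {}
    for n_procs, text in elapsed.items():
        speedup = t1 / parse_elapsed(text)
        out[n_procs] = (speedup, speedup / n_procs)
    return out


if __name__ == "__main__":
    for n_procs, (s, e) in speedups(ELAPSED).items():
        print(f"{n_procs} procs: speedup {s:.2f}x, efficiency {e:.0%}")
```

This gives a speedup of about 1.6x with 2 processes and about 1.8x with 4. Total user time also roughly doubles from 1 to 4 processes (86.76 s to 190.03 s) for the same 100 tasks, i.e. each process gets less work done per CPU-second when all four run at once.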

  2. Now I address the original problem that made me open this issue. I ran your example about
    selecting a pool with command-line arguments, using a somewhat heavier worker function
    and adding my own timing print:
import schwimmbad
import math
import time
import random

def worker(task):
    a, b = task
    return math.cos(a) + math.sin(b)

def main(pool):
    # Here we generate some fake data
    a = [random.uniform(0, 2*math.pi) for _ in range(10000000)]
    b = [random.uniform(0, 2*math.pi) for _ in range(10000000)]

    tasks = list(zip(a, b))
    results = list(pool.map(worker, tasks))
    pool.close()

if __name__ == "__main__":
    from argparse import ArgumentParser
    parser = ArgumentParser(description="Schwimmbad example.")

    group = parser.add_mutually_exclusive_group()
    group.add_argument("--ncores", dest="n_cores", default=1,
                       type=int, help="Number of processes (uses "
                                      "multiprocessing).")
    group.add_argument("--mpi", dest="mpi", default=False,
                       action="store_true", help="Run with MPI.")
    args = parser.parse_args()

    pool = schwimmbad.choose_pool(mpi=args.mpi, processes=args.n_cores)


    start_time = time.time()
    main(pool)
    execution_time = (time.time()-start_time)
    print('execution time =', execution_time)

The outputs are as follows:

$ \time python script-demo.py
execution time = 13.054332256317139
12.62user 0.62system 0:13.24elapsed 99%CPU

$ \time python script-demo.py --ncores=2
execution time = 15.358909130096436
21.56user 1.70system 0:15.56elapsed 149%CPU

$ \time python script-demo.py --ncores=4
execution time = 14.996244668960571
22.05user 2.18system 0:15.20elapsed 159%CPU

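As you can see, the execution/elapsed times do not decrease as expected with more processes, and I am not sure the %CPU values (149% with 2 and 159% with 4) are as expected either.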
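Note that execution_time in the script above also covers generating the 20,000,000 random numbers and building the task list, which runs serially in the main process regardless of the pool size. A minimal sketch, assuming the same worker, that times data generation and the map step separately; timed_run is a hypothetical helper, and map_func can be the builtin map or a pool's map method:

```python
import math
import random
import time


def worker(task):
    a, b = task
    return math.cos(a) + math.sin(b)


def timed_run(map_func, n_tasks):
    """Return (results, generation_seconds, map_seconds).

    map_func must accept (function, iterable) like the builtin map or pool.map.
    """
    start = time.perf_counter()
    a = [random.uniform(0, 2 * math.pi) for _ in range(n_tasks)]
    b = [random.uniform(0, 2 * math.pi) for _ in range(n_tasks)]
    tasks = list(zip(a, b))
    generated = time.perf_counter()
    results = list(map_func(worker, tasks))
    finished = time.perf_counter()
    return results, generated - start, finished - generated


if __name__ == "__main__":
    # Serial baseline with the builtin map; a smaller n_tasks keeps this quick.
    _, gen_s, map_s = timed_run(map, 1_000_000)
    print(f"generate: {gen_s:.2f} s, map: {map_s:.2f} s")
```

Inside the script's main(pool), the same split could be obtained with timed_run(pool.map, 10000000). If the generation phase turns out to be a large share of the 13 s serial run, that share cannot shrink with more processes, which would cap the achievable speedup (Amdahl's law).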

Please let me know if anything is unclear.
Thank you very much in advance.
All the best!
Martín
