parallelization's Introduction

Overview

Personal project for understanding multithreading and multiprocessing in Python.
The idea is based on the following Medium post.

The scripts were tested on a MacBook Pro (14", 2021) with an Apple M1 Pro and 8 CPU cores.

Multi Threading in Python

As described in the article, multithreading can make sense for simple I/O (input/output) tasks.

We are going to compare a simple download task without any multithreading (see script simple_io_task.py) with the same task using multithreading (see script multithreaded_io_task.py).
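The comparison can be sketched roughly as follows. This is not the code from simple_io_task.py or multithreaded_io_task.py, which is not shown here. It is a minimal illustration that assumes a hypothetical fake_download function, where time.sleep stands in for waiting on the network, and made-up example URLs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.2):
    """Hypothetical stand-in for a real download.

    time.sleep releases the GIL while waiting, much like a blocking
    network read does, so it behaves like an I/O-bound task.
    """
    time.sleep(delay)
    return f"{url}: done"


def download_sequential(urls):
    # One download after the other: the waiting times add up.
    return [fake_download(url) for url in urls]


def download_threaded(urls, max_workers=8):
    # Each download waits in its own thread, so the waits overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, urls))


def timed(func, *args):
    """Return (result, elapsed wall-clock seconds) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative URLs only; nothing is actually requested.
    urls = [f"https://example.com/file{i}" for i in range(8)]
    _, t_seq = timed(download_sequential, urls)
    _, t_thr = timed(download_threaded, urls)
    print(f"sequential: {t_seq:.2f}s, threaded: {t_thr:.2f}s")
```

In a real script, fake_download would be replaced by an actual HTTP request. The structure of the two variants stays the same.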

Results

To measure the execution time of the scripts, I used the command time python simple_io_task.py and, for the multithreaded script, time python multithreaded_io_task.py.

simple_io_task.py: 1,95s user 0,28s system 18% cpu 12,333 total

multithreaded_io_task.py: 1,00s user 0,14s system 46% cpu 2,485 total

The multithreaded version of the script is clearly faster: about 2.5 s of total time instead of about 12.3 s.

You may also want to look into why multithreading in Python is not always a good idea (or faster than a normal implementation), as described in this article. Spoiler alert: as explained in the Medium article, the reason is the Python GIL (Global Interpreter Lock), which in standard CPython lets only one thread execute Python bytecode at a time.
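To see the effect of the GIL, one can run a purely CPU-bound function once sequentially and once in threads. The sketch below is an illustration only: count_down and the loop size are hypothetical and not taken from the article. On a standard CPython build with the GIL, the threaded run is typically not faster than the sequential one. The exact timings depend on the machine and the Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """Hypothetical CPU-bound task: pure Python arithmetic, no I/O."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, repeats):
    # Run the task `repeats` times, one after the other.
    return [count_down(n) for _ in range(repeats)]


def run_threaded(n, repeats):
    # Run the same tasks in threads; with the GIL they still take turns.
    with ThreadPoolExecutor(max_workers=repeats) as pool:
        return list(pool.map(count_down, [n] * repeats))


if __name__ == "__main__":
    n = 5_000_000  # illustrative size, chosen to run in about a second
    for label, func in (("sequential", run_sequential), ("threaded", run_threaded)):
        start = time.perf_counter()
        func(n, 2)
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```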

Multiprocessing in Python

In this part we look at parallelism in Python, i.e. using the multiple CPU cores available on my Mac to gain performance by executing multiple tasks literally at the same time. To benefit from multiprocessing we need a CPU-bound task. We will use the same function as proposed in the Medium article above, which appends random integers to a list.
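A minimal sketch of such a comparison, assuming two worker processes, could look like this. It is not the content of simple_cpu_task.py or multiprocessed_cpu_task.py. The function names, the value range of the integers and the list size are assumptions made for illustration.

```python
import multiprocessing
import random
import time


def append_random_ints(count):
    """CPU-bound task: append `count` random integers to a list."""
    out_list = []
    for _ in range(count):
        out_list.append(random.randint(0, 100))  # value range is an assumption
    return out_list


def run_sequential(count, tasks):
    # Run the task `tasks` times in the current process.
    for _ in range(tasks):
        append_random_ints(count)


def run_in_processes(count, tasks):
    """Run the task in `tasks` separate processes and return their exit codes.

    The lists built in the child processes are discarded; only the
    work itself matters for the timing comparison.
    """
    procs = [
        multiprocessing.Process(target=append_random_ints, args=(count,))
        for _ in range(tasks)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    count = 500_000  # illustrative size, kept small so the demo runs quickly

    start = time.perf_counter()
    run_sequential(count, 2)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    run_in_processes(count, 2)
    print(f"2 processes: {time.perf_counter() - start:.2f}s")
```

The if __name__ == "__main__" guard matters here: on macOS, where new processes are started with the spawn method by default, each child re-imports the main module, and the guard keeps it from starting processes of its own.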

Results

simple_cpu_task.py: 11,42s user 0,23s system 99% cpu 11,651 total

multiprocessed_cpu_task.py: 12,65s user 0,31s system 197% cpu 6,572 total

With multiprocessing, the task takes only a little more than half the time (about 6.6 s instead of about 11.7 s). The CPU usage of 197% also confirms that 2 CPU cores were used during multiprocessing.

Interestingly, the CPU user time in the multiprocessing example is higher than the total time needed to execute the task. This is because CPU user time is accumulated over all CPUs. Since we are using 2 CPUs, their times are added, which results in a CPU user time higher than the total time. For more on the meaning of user, sys and total, check out this Stack Overflow post.
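The relation between the three numbers can be checked with a little arithmetic: the reported CPU percentage is roughly (user + system) / total. The helper below is a hypothetical illustration using the measurements from the results above.

```python
def cpu_utilization(user_s, system_s, total_s):
    """Average number of busy CPUs: accumulated CPU time / wall-clock time."""
    return (user_s + system_s) / total_s


# (user, system, total) in seconds, taken from the results above
runs = {
    "simple_cpu_task.py": (11.42, 0.23, 11.651),
    "multiprocessed_cpu_task.py": (12.65, 0.31, 6.572),
}

for name, (user_s, system_s, total_s) in runs.items():
    ratio = cpu_utilization(user_s, system_s, total_s)
    print(f"{name}: {ratio:.2f} CPUs busy on average")
```

For the multiprocessing run this gives about 1.97, in line with the 197% reported by time. For the simple run it gives about 1.0.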
