Citi/parfun

Lightweight parallelisation library for Python.


Parfun is a lightweight library that provides helpers for writing Python functions and running them on parallel and distributed systems.

The main feature of the library is its @parfun decorator that transparently executes standard Python functions following the map-reduce pattern:

from typing import List

from parfun import parfun
from parfun.combine.collection import list_concat
from parfun.partition.api import per_argument
from parfun.partition.collection import list_by_chunk

@parfun(
    split=per_argument(
        values=list_by_chunk
    ),
    combine_with=list_concat,
)
def list_pow(values: List[float], factor: float) -> List[float]:
    return [v**factor for v in values]
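
To make the map-reduce pattern concrete, here is a minimal sketch of what the decorator does conceptually, written with the standard library only. It is not Parfun's implementation: `list_by_chunk_sketch`, `list_concat_sketch`, `parallel_list_pow` and the fixed `chunk_size` are hypothetical names chosen for illustration, and a thread pool stands in for a real backend (threads will not speed up CPU-bound code; a real backend would use processes or remote workers).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List


def list_by_chunk_sketch(values: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Split step: yield consecutive chunks of at most `chunk_size` items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def list_concat_sketch(chunks: Iterable[List[float]]) -> List[float]:
    """Combine step: concatenate the partial results, preserving order."""
    return [item for chunk in chunks for item in chunk]


def list_pow(values: List[float], factor: float) -> List[float]:
    """The undecorated function from the example above."""
    return [v**factor for v in values]


def parallel_list_pow(values: List[float], factor: float, chunk_size: int = 4) -> List[float]:
    """Hand-written equivalent of split -> map -> combine (illustration only)."""
    chunks = list(list_by_chunk_sketch(values, chunk_size))
    with ThreadPoolExecutor() as pool:
        # `factor` is not partitioned: every chunk receives the same value.
        partial_results = pool.map(lambda chunk: list_pow(chunk, factor), chunks)
        return list_concat_sketch(partial_results)
```

Unlike this sketch, where the caller picks the chunk size, the library is described as estimating the partition size on its own.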

Features

  • Provides significant speedups to existing Python functions
  • Does not require any deep knowledge of parallel or distributed computing systems
  • Automatically estimates the optimal sub-task splitting (the partition size)
  • Automatically handles data transmission, caching and synchronization
  • Supports various distributed computing backends, including Python's multiprocessing, Scaler and Dask

Benchmarks

Parfun efficiently parallelizes short-duration functions.

When running a short 0.28-second ML function on a 16-core AMD EPYC 7313 processor, Parfun provides a 7.4x speedup. The source code for this experiment is available here.

Benchmark Results

Quick Start

The built-in Sphinx documentation contains detailed usage instructions, implementation details, and an exhaustive API reference.

Use the doc Make target to build the HTML documentation from the source code:

make doc

The documentation's main page can then be found at docs/build/html/index.html.

Take a look at our documentation's quickstart tutorial to get more examples and a deeper overview of the library.

Contributing

Your contributions are at the core of making this a true open source project. Any contributions you make are greatly appreciated.

Please review our community contribution guidelines and functional contribution guidelines to get started 👍.

Code of Conduct

We are committed to making open source an enjoyable and respectful experience for our community. See CODE_OF_CONDUCT for more information.

License

This project is distributed under the Apache-2.0 License. See LICENSE for more information.

Contact

If you have a query or require support with this project, raise an issue. Otherwise, reach out to [email protected].

parfun's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Detected dependencies

github-actions
.github/workflows/codeql.yml
  • actions/checkout v4
  • github/codeql-action v3
  • github/codeql-action v3
.github/workflows/dependency-review.yml
  • actions/checkout v4
  • actions/dependency-review-action v4
.github/workflows/linter.yml
  • actions/checkout v4
  • actions/setup-python v5
.github/workflows/pypi.yml
  • actions/checkout v4
  • actions/checkout v3
  • actions/setup-python v5
pep621
pyproject.toml
pip_requirements
requirements.txt

Enable DependencyReview GitHub Action 🔐 📦

The dependency review action scans your pull requests for dependency changes, and will raise an error if any vulnerabilities or invalid licenses are being introduced. The action is supported by an API endpoint that diffs the dependencies between any two revisions on your default branch.

name: 'Dependency Review'
on: [pull_request]

permissions:
  contents: read

jobs:
  dependency-review:
    runs-on: ubuntu-latest
    steps:
      - name: 'Checkout Repository'
        uses: actions/checkout@v4
      - name: 'Dependency Review'
        uses: actions/dependency-review-action@v4

Adds a new API for delayed computations

Adding a new decorator that executes the function in the background.

When a decorated function is called, it returns immediately, and the computation is executed in the background using the currently set up BackendEngine.

The return value of the function behaves like a regular value, except that reading it blocks until the computation finishes.

Example:

import math

# `delayed` is the decorator proposed in this issue.
@delayed
def delayed_pow(a: float, b: float) -> float:
    return math.pow(a, b)

# This will compute all the `delayed_pow()` calls in parallel:
total_sum = sum([delayed_pow(x, 2) for x in range(0, 1000)])

# Operators should work too
print((delayed_pow(2, 2) + delayed_pow(4, 2)) * 4)

# Can be used without a decorator
a = delayed(math.sqrt)(16)
b = delayed(math.sqrt)(9)
print(a + b)

While simpler to use than @parfun, it has a few disadvantages:

  • It does not try to find the optimal partitioning size of the parallelized tasks;
  • Exceptions are harder to understand, as they are raised when the function's return value is first accessed, not when the function is called.
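
The behaviour described above, including the deferred exception, can be sketched with the standard library. This is only an illustration under stated assumptions, not the implementation in new_delayed_api: `DelayedValue` and `_unwrap` are hypothetical names, a thread pool stands in for the BackendEngine, and only `+` and `*` are supported.

```python
import math
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Stand-in for the configured backend (assumption: a local thread pool).
_executor = ThreadPoolExecutor()


def _unwrap(value):
    """Return the computed result of a DelayedValue, or the value unchanged."""
    return value.result() if isinstance(value, DelayedValue) else value


class DelayedValue:
    """Hypothetical proxy around a Future; reading it blocks until it is done."""

    def __init__(self, future: Future):
        self._future = future

    def result(self):
        # Blocks until the task finishes and re-raises its exception, if any.
        return self._future.result()

    def __add__(self, other):
        return self.result() + _unwrap(other)

    def __radd__(self, other):
        return _unwrap(other) + self.result()

    def __mul__(self, other):
        return self.result() * _unwrap(other)

    def __rmul__(self, other):
        return _unwrap(other) * self.result()


def delayed(fn):
    """Submit each call to the executor and return a proxy immediately."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return DelayedValue(_executor.submit(fn, *args, **kwargs))
    return wrapper


@delayed
def delayed_pow(a: float, b: float) -> float:
    return math.pow(a, b)


# All calls are submitted first; sum() then blocks on each result in turn.
total_sum = sum([delayed_pow(x, 2) for x in range(0, 10)])

# The call itself does not fail; the error only surfaces on first access.
bad = delayed(math.sqrt)(-1)
```

Here `bad + 1` raises the `ValueError` from `math.sqrt`, far from the call site, which is the second disadvantage listed above.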

As Pargraph already has a @delayed decorator, I'm thinking about using a different name. Here are some ideas:

  • @task;
  • @parasync;
  • @parallel_task;
  • @concurrent_task;

I already have a basic implementation working in new_delayed_api.

Remove deprecated APIs

There are a few deprecated APIs that should be discarded in Parfun 7.0.0.

Removing these old APIs will simplify the decorator's type signature.

Deprecated APIs include:

  • @parfun(partition_on=..., partition_with=...);
  • dfs_by_row(), partition_dfs_by_chunk(), dfs_by_group(), partition_dfs_group_by();
  • lists_by_chunk(), zip_partition_on_args();
  • lists_concat(), concat_lists(), unzip();
  • partition_nested();
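
Until such APIs are removed, a common way to keep an old name alive is a thin wrapper that emits a `DeprecationWarning` and forwards to the replacement. The sketch below shows that general technique only; `deprecated`, `new_concat` and `old_concat` are hypothetical names and do not reflect how Parfun marks its own deprecated functions.

```python
import warnings
from functools import wraps


def deprecated(replacement: str):
    """Hypothetical helper: warn on every call, then run the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__}() is deprecated, use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def new_concat(chunks):
    """Illustrative replacement API: flatten a list of lists."""
    return [item for chunk in chunks for item in chunk]


@deprecated(replacement="new_concat()")
def old_concat(chunks):
    """Illustrative old API kept only for backward compatibility."""
    return new_concat(chunks)
```

Deleting `old_concat` and the helper in a major release is then a mechanical change, and callers have had a warning pointing at the replacement in the meantime.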
