helmholtz-analytics / heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

Home Page: https://heat.readthedocs.io/

License: MIT License

Python 94.31% Dockerfile 0.02% Jupyter Notebook 5.54% Shell 0.13%
gpu tensors distributed machine-learning mpi numpy python pytorch array-api data-analytics data-processing data-science hpc massive-datasets mpi4py multi-gpu multi-node-cluster parallelism

heat's Introduction


Heat is a distributed tensor framework for high performance data analytics.

Project Status

CPU/CUDA/ROCm tests Documentation Status coverage license: MIT PyPI Version Downloads Anaconda-Server Badge fair-software.eu OpenSSF Scorecard OpenSSF Best Practices DOI Benchmarks Code style: black


What is Heat for?

Heat builds on PyTorch and mpi4py to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem.

With Heat you can:

  • port existing NumPy/SciPy code from single-CPU to multi-node clusters with minimal coding effort;
  • exploit the entire, cumulative RAM of your many nodes for memory-intensive operations and algorithms;
  • run your NumPy/SciPy code on GPUs (CUDA, ROCm, coming up: Apple MPS).

For an example that highlights the benefits of multi-node parallelism and hardware acceleration, and how easily they can be achieved with Heat, see, e.g., our blog post on truncated SVD of a 200 GB data set.

Check out our coverage tables to see which NumPy, SciPy, scikit-learn functions are already supported.

If you need functionality that is not yet supported, please file a feature request on GitHub.

Check out our features and the Heat API Reference for a complete list of functionalities.

Features

  • High-performance n-dimensional arrays
  • CPU, GPU, and distributed computation using MPI
  • Powerful data analytics and machine learning methods
  • Seamless integration with the NumPy/SciPy ecosystem
  • Python array API (work in progress)

Getting Started

Go to Quick Start for a quick overview. For more details, see Installation.

You can test your setup by running the heat_test.py script:

mpirun -n 2 python heat_test.py

It should print something like this:

x is distributed:  True
Global DNDarray x:  DNDarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=ht.int32, device=cpu:0, split=0)
Global DNDarray x:
Local torch tensor on rank  0 :  tensor([0, 1, 2, 3, 4], dtype=torch.int32)
Local torch tensor on rank  1 :  tensor([5, 6, 7, 8, 9], dtype=torch.int32)

Check out our Jupyter Notebook Tutorials, choose local to try things out on your machine, or hpc if you have access to an HPC system.

The complete documentation of the latest version is always deployed on Read the Docs.

Installation

Requirements

Basics

  • python >= 3.8
  • MPI (OpenMPI, MPICH, Intel MPI, etc.)
  • mpi4py >= 3.0.0
  • pytorch >= 1.11.0

Parallel I/O

  • h5py
  • netCDF4

GPU support

In order to do computations on your GPU(s):

  • your CUDA or ROCm installation must match your hardware and its drivers;
  • your PyTorch installation must be compiled with CUDA/ROCm support.

HPC systems

On most HPC systems you will not be able to install or compile MPI or CUDA/ROCm yourself. Instead, you will most likely need to load pre-installed MPI and/or CUDA/ROCm modules from the system's module environment; you may even find PyTorch, h5py, or mpi4py provided as (part of) such a module. Note that for optimal performance on GPU, you need to use an MPI library that has been compiled with CUDA/ROCm support (so-called "CUDA-aware MPI").

pip

Install the latest version with

pip install heat[hdf5,netcdf]

where the part in brackets is a list of optional dependencies. You can omit it if you do not need HDF5 or NetCDF support.

conda

The conda package ships with all dependencies, including OpenMPI.

conda install -c conda-forge heat

Support Channels

Go ahead and ask questions on GitHub Discussions. If you found a bug or are missing a feature, then please file a new issue. You can also get in touch with us on Mattermost (sign up with your GitHub credentials). Once you log in, you can introduce yourself on the Town Square channel.

Contribution guidelines

We welcome contributions from the community. If you want to contribute to Heat, be sure to review the Contribution Guidelines and Resources before getting started!

We use GitHub issues for tracking requests and bugs; please use Discussions for general questions and discussion.

If you're unsure where to start or how your skills fit in, reach out! You can ask us here on GitHub by leaving a comment on a relevant open issue.

If you are new to contributing to open source, this guide helps explain why, what, and how to get involved.

Resources

Parallel Computing and MPI:

mpi4py

License

Heat is distributed under the MIT license, see our LICENSE file.

Citing Heat

If Heat helped your research, please mention it in your publications. You can cite:

  • Götz, M., Debus, C., Coquelin, D., Krajsek, K., Comito, C., Knechtges, P., Hagemeier, B., Tarnawa, M., Hanselmann, S., Siggel, S., Basermann, A. & Streit, A. (2020). HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 276-287). IEEE, DOI: 10.1109/BigData50022.2020.9378050.
@inproceedings{heat2020,
    title={{HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics}},
    author={
      Markus Götz and
      Charlotte Debus and
      Daniel Coquelin and
      Kai Krajsek and
      Claudia Comito and
      Philipp Knechtges and
      Björn Hagemeier and
      Michael Tarnawa and
      Simon Hanselmann and
      Martin Siggel and
      Achim Basermann and
      Achim Streit
    },
    booktitle={2020 IEEE International Conference on Big Data (Big Data)},
    year={2020},
    pages={276-287},
    month={December},
    publisher={IEEE},
    doi={10.1109/BigData50022.2020.9378050}
}

FAQ

Work in progress...

Acknowledgements

This work is supported by the Helmholtz Association Initiative and Networking Fund under project number ZT-I-0003 and the Helmholtz AI platform grant.

This project has received funding from Google Summer of Code (GSoC) in 2022.


heat's People

Contributors

asrani1, ben-bou, bhagemeier, cdebus, claudiacomito, coquelin77, dependabot[bot], dhruv454000, fosterfeld, github-actions[bot], inzlinger, juanpedroghm, krajsek, lehr-fa, lenablind, lscheib, lucaspataro, markus-goetz, mrfh92, mtar, mystic-slice, neosunhan, pre-commit-ci[bot], rainman110, sai-suraj-27, sebimarkgraf, shahpratham, simon-schmitz, step-security-bot, theslimvreal


heat's Issues

Fix project description on PyPI

PyPI requires both a description and long_description to be set, with the former being used for listing a project among others and the latter for the detailed project page.
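In a setuptools-based project both fields are typically set in `setup.py`; a hypothetical sketch (the metadata values below are illustrative, not the project's actual configuration):

```python
from setuptools import setup

# Read the README so PyPI can render the detailed project page.
with open("README.md", encoding="utf-8") as f:
    readme = f.read()

setup(
    name="heat",
    description="Distributed tensors and ML with GPU and MPI acceleration",  # short text for listings
    long_description=readme,  # detailed PyPI project page
    long_description_content_type="text/markdown",
)
```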

ht.floor: fix docstring to properly render math

Every time I edit tensor.py and run pytest, pytest prints a warning that the docstring of ht.floor contains an unknown escape sequence \lfloor. Mysteriously, this warning does not show up in the second run of pytest, which might explain why it was not caught in CI yet.

Introduce possibility to allocate tensors on different devices

Possible design:

  • An allocation parameter for all factory functions:

    import heat as ht
    a = ht.zeros((10, 2,), device='gpu<:id>')

  • .cpu() and .gpu() methods for the tensor class that allow ad-hoc switching between devices

  • A global default device toggle, using a function call and/or a context manager:

    import heat as ht
    ht.device('gpu<:id>')

    # alternative
    with ht.gpu(<':id'>):
        ht.zeros()

Fix coverage report

Apparently, our coverage report is not generated again on every build. It seems like the 'coverage' executable is not among the cached contents for build, but the Python library is.

Add optional netCDF support

  • Allow rudimentary netCDF support akin to what the HDF5 layer can already do
  • Offer a possibility to write netCDF files

ht.randn(): unit test expects pytorch to detect malformed input

The unit test that checks whether ht.randn throws a ValueError relies on PyTorch detecting the malformed input. Unfortunately, this is not the case for my PyTorch installation (trying to print the resulting tensor, however, results in an infinite loop, so we can be sure that PyTorch does not magically know how to handle negative dimensions ... )

For completeness the pytest output:

        with self.assertRaises(ValueError):
>           ht.randn(-1, 3, dtype=ht.float64)
E           AssertionError: ValueError not raised

heat/core/tests/test_tensor.py:278: AssertionError

There already seems to be an all_ints variable available in ht.randn, which could be augmented to also check for positive arguments?
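Such a check could live on Heat's side rather than relying on PyTorch; a hypothetical sketch (the name `validate_shape` is illustrative, not Heat API):

```python
def validate_shape(*dims):
    """Raise ValueError for any dimension that is not a positive int."""
    for d in dims:
        if not isinstance(d, int) or d <= 0:
            raise ValueError(f"invalid dimension {d!r}; expected a positive integer")
    return dims

validate_shape(2, 3)     # OK
# validate_shape(-1, 3)  # raises ValueError, independent of the torch version
```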

Provide an array() tensor factory function

Similar to NumPy, it would be nice to have an array() factory function to create tensors from local data.
The major implementation challenge is how to deal with imbalanced data distributions in the initially passed arrays. Do we need to balance the data?
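The balancing question can be sketched in plain Python: given ranks holding unequal local chunks, compute the per-rank counts of an even redistribution (illustrative only, not Heat code):

```python
def balanced_counts(total, nprocs):
    """Split `total` elements across `nprocs` ranks as evenly as possible."""
    base, rem = divmod(total, nprocs)
    # The first `rem` ranks receive one extra element each.
    return [base + (1 if r < rem else 0) for r in range(nprocs)]

# Ranks pass in local chunks of sizes 7, 1, 2 -> 10 elements in total.
local_sizes = [7, 1, 2]
print(balanced_counts(sum(local_sizes), len(local_sizes)))  # [4, 3, 3]
```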
