
Dragon's Introduction

Dragon is a composable distributed run-time for managing dynamic processes, memory, and data at scale through high-performance communication objects. Some of the key use cases for Dragon include distributed applications and workflows for analytics, HPC, and converged HPC/AI.

Dragon brings scalable, distributed computing to a wide range of programmers. It runs on your laptop, cluster, supercomputer, or any collection of networked computers. Dragon provides an environment where you can write programs that transparently use all the computers or nodes in your cluster. And the same program will run on your laptop with no change, just at smaller scale.

While Dragon implements many APIs, the primary one is Python multiprocessing. If you write your programs using Python's multiprocessing API, your program will run now and in the future. Dragon does not re-define the multiprocessing API, but it does extend it in some circumstances. For example, Dragon enables multiprocessing's Process and Pool objects to execute across the nodes of a distributed system, extending those objects which previously only supported parallel execution on a single node.
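
To illustrate, here is a minimal sketch (not taken from the Dragon documentation) of the familiar multiprocessing Process API used with Dragon's start method; with Dragon these processes may be placed on any node of the system rather than only the local one:

 import dragon                      # import dragon first, as in the hello world example below
 import multiprocessing as mp


 def work(i):
     # an ordinary multiprocessing worker function; nothing Dragon-specific here
     print(f"worker {i} says hello", flush=True)


 if __name__ == "__main__":
     mp.set_start_method("dragon")  # select Dragon's start method instead of fork/spawn
     procs = [mp.Process(target=work, args=(i,)) for i in range(4)]
     for p in procs:
         p.start()
     for p in procs:
         p.join()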

Dive into Dragon in the next section to see how easy it is to begin writing and running distributed multiprocessing programs!

Dragon requires a Linux POSIX environment. It is not supported on Microsoft Windows® or Apple macOS®. However, it can run in a container on either platform if Docker is installed. On Microsoft Windows you must install Docker with WSL (Windows Subsystem for Linux). The Dragon repository is configured for use as a Docker container, or it can run within a Linux OS, either on bare hardware or within a VM.

You can download or clone the repository from here and open it inside a container to run Dragon in single-node mode. The setup below allows you to configure and run Dragon either inside a container or directly on a Linux machine. If you have not used containers before, download Docker and Visual Studio Code (VSCode) to your laptop, start Docker, and open the Dragon repository in VSCode. VSCode will prompt you to reopen the directory inside a container. If you run inside a container, you will need to build Dragon first.

Whether in a container or not, once you have a terminal window, navigate to the root directory of the repository and type the following.

. hack/clean_build

Once the build has completed, Dragon is built and you are ready to write and run Dragon programs.

If you wish to run multi-node, or you don't want to run in a container, you must set up your environment to run Dragon programs. Choose the version of Dragon to download that matches your installed version of Python. Python 3.9+ is required to run Dragon, and the Python executable must be on your path.

The untarred distribution contains several subdirectories. Run the ./dragon-install file in the root of the distribution directory to create a Python virtual environment and install two wheel files. For further details, follow the instructions in the README.md file in the distribution directory.
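
For example, the installation steps look roughly like the following sketch, where the tarball name is hypothetical and depends on the release and Python version you downloaded:

tar -xzf dragon-<version>.tar.gz
cd dragon-<version>
./dragon-install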

This set of steps shows you how to run a parallel "Hello World" application using Python multiprocessing with Dragon. Because Dragon's execution context in multiprocessing is not dependent upon file descriptors in the same way as the standard "fork", "forkserver", or "spawn" execution contexts, Dragon permits multiprocessing to scale to orders of magnitude more processes on a single node (think tens of thousands instead of hundreds). On single nodes with a large number of cores, starting one process per core is not always possible with standard multiprocessing, but it is possible when using multiprocessing with Dragon.

This demo program prints a string of the form "Hello World from $PROCESSID with payload=$RUNNING_INT" using every CPU on your system. Beware: if you're on a supercomputer and inside an allocation, your console will be flooded.

Create a file hello_world.py containing:

 import dragon
 import multiprocessing as mp
 import time


 def hello(payload):
     p = mp.current_process()
     print(f"Hello World from {p.pid} with payload={payload} ", flush=True)
     time.sleep(1)  # force all cpus to show up


 if __name__ == "__main__":
     mp.set_start_method("dragon")
     cpu_count = mp.cpu_count()
     with mp.Pool(cpu_count) as pool:
         result = pool.map(hello, range(cpu_count))

and run it by executing dragon hello_world.py. This will produce output like the following:

$ dragon hello_world.py
Hello World from 4294967302 with payload=0
Hello World from 4294967301 with payload=1
Hello World from 4294967303 with payload=2
Hello World from 4294967300 with payload=3
+++ head proc exited, code 0

Dragon can run on a supercomputer with a workload manager or on your cluster. The hello world example from the previous section can be run across multiple nodes without any modification. The only requirement is that you have an allocation of nodes (obtained, for example, with salloc under the Slurm workload manager or qsub under PBS) and that you execute dragon within that allocation. Dragon will launch across all nodes in the allocation by default, giving you access to all processor cores on every node. If you don't have Slurm installed on your system or cluster, there are other means of running Dragon multi-node as well. For more details see Running Dragon on a Multi-Node System.
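
For example, on a Slurm system the same program can be launched across an allocation with something like the following (the salloc option shown is an ordinary Slurm option, not Dragon-specific):

salloc --nodes=2
dragon hello_world.py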

Dragon seeks to foster an open and welcoming environment. Please see the Dragon Code of Conduct for more details.

We welcome contributions from the community. Please see our contributing guide.

The Dragon team is:

Contributors: kentdlee, mendygral
