

Slurm on Google Cloud Platform

Guidance on usage

Caution

Terraform modules in this repo are not meant to be used directly. Instead, Cloud HPC Toolkit is the recommended way to use Slurm in GCP.

Notices

Important

The naming scheme for SchedMD-published images changed with release 5.7.3. The change keeps the Terraform modules from being paired with older, incompatible images that could share a family with a newer release. From now on, the image family name includes the slurm-gcp major and minor version instead of the Slurm version.

See Images doc for the latest published images.
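The effect of keying image families on the slurm-gcp major and minor version can be sketched as follows. The helper `image_family` and the "slurm-gcp-<major>-<minor>-<os_tag>" layout are illustrative assumptions, not the published naming format; check the Images doc for the real family names. The sketch only shows the property described above. Patch releases of one slurm-gcp minor version share a family, and a new minor release starts a new family, so pulling the latest image from a family never crosses a minor-version boundary.

```python
import re


def image_family(slurm_gcp_version: str, os_tag: str) -> str:
    """Build a hypothetical image family name keyed by slurm-gcp major.minor.

    The "slurm-gcp-<major>-<minor>-<os_tag>" layout is an illustrative
    assumption; consult the Images doc for the real published family names.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", slurm_gcp_version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {slurm_gcp_version!r}")
    major, minor, _patch = match.groups()
    # The patch number is dropped: patch releases share one family,
    # while a new minor release starts a new family.
    return f"slurm-gcp-{major}-{minor}-{os_tag}"


print(image_family("5.7.3", "debian-11"))  # slurm-gcp-5-7-debian-11
print(image_family("5.7.4", "debian-11"))  # slurm-gcp-5-7-debian-11
print(image_family("5.8.0", "debian-11"))  # slurm-gcp-5-8-debian-11
```

Dropping the patch number is the design choice the notice describes: the family tracks the slurm-gcp release line the Terraform modules were built against, rather than the Slurm version bundled in the image.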

FAQ | Troubleshooting | Glossary

Overview

slurm-gcp is an open-source software solution that enables setting up Slurm clusters on Google Cloud Platform with ease. With it, you can create and manage Slurm cluster infrastructure in GCP, deployed in different configurations.

Google's HPC Toolkit, available on GitHub, can be used to manage and deploy Slurm clusters and other supporting infrastructure via HPC Blueprints.

Image Support

See the supported Operating Systems and published Image Families for machine image support.

SchedMD

SchedMD provides professional services and commercial support to help you get up and running and stay running.

Issues and/or enhancement requests can be submitted to SchedMD's Bugzilla.

Also, join community discussions on either the Slurm User mailing list or the Google Cloud & Slurm Community Discussion Group.

Cluster Configurations

slurm-gcp can be deployed and used in different configurations, and with different methods, to meet your computing needs.

See HPC Blueprints for HPC Toolkit example cluster configurations that are production ready.

Cloud

All Slurm cluster resources will exist in the cloud.

See the Cloud Cluster Guide for details.

Hybrid

Only Slurm compute nodes will exist in the cloud. The Slurm controller and other Slurm components will remain in the on-premises environment.

See the Hybrid Cluster Guide for details.

Multi-Cluster/Federation

Two or more clusters are connected, allowing jobs to be submitted from and run on different clusters. The connected clusters can be a mix of on-premises and cloud clusters.

See the Federated Cluster Guide for details.

Upgrade to v6

See the Upgrade to v6 Guide for details.

TPU support

slurm-gcp supports using TPU VM nodes. See the TPU guide for details.

Help and Support

Please reach out to us here. We will be happy to support you!
