azlustre

Deploy to Azure

This project provisions a Lustre cluster as quickly as possible. All the Lustre setup scripting is taken from AzureHPC; the difference in this project is that the Lustre cluster is provisioned through an ARM template using a custom image.

This project includes the following:

  • A packer script to build an image with the Lustre packages installed.
  • An ARM template to deploy a Lustre cluster using the image.

The ARM template performs the installation with cloud-init, where the installation scripts are embedded. The azuredeploy.json file already includes the embedded scripts, and the repo includes a script to generate it from azuredeploy_template.json.

Getting Started

Check out the repository:

git clone https://github.com/Azure/azlustre

Building the image

Packer is required for the build, so download the latest version for your operating system from https://www.packer.io. It is distributed as a single file, so just put it in a directory that is on your PATH. Go into the packer directory:

cd azlustre/packer
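
Before building, it can be worth confirming that the required command-line tools are actually reachable from your PATH. The following is a minimal sketch, not part of the repository: the function name missing_tools is hypothetical, and it only checks that each named command can be found, not that its version is suitable.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of azlustre.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example usage: prints nothing when both packer and the Azure CLI are found.
# missing_tools packer az
```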

The following options are required to build:

Variable             Description
var_subscription_id  Azure subscription ID
var_tenant_id        Tenant ID for the service principal
var_client_id        Client ID for the service principal
var_client_secret    Client password for the service principal
var_resource_group   The resource group to put the image in (must exist)
var_image            The image name to create

These can be read by packer from a JSON file. Use this template to create options.json and populate the fields:

{
    "var_subscription_id": "",
    "var_tenant_id": "",
    "var_client_id": "",
    "var_client_secret": "",
    "var_resource_group": "",
    "var_image": "centos-7.8-lustre-2.12.5"
}
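
If you would rather not type the service principal details into the file by hand, options.json can be generated from environment variables. This is a sketch under stated assumptions: the function name and the environment variable names (AZ_SUBSCRIPTION_ID and so on) are hypothetical, and the values are assumed to contain no double quotes or backslashes, since they are written into the JSON without escaping.

```shell
# Write a packer options file from environment variables.
# $1 is the output path (defaults to options.json).
# The function and environment variable names are hypothetical.
write_packer_options() (
    # Stop early if any required value is missing or empty.
    : "${AZ_SUBSCRIPTION_ID:?}" "${AZ_TENANT_ID:?}" "${AZ_CLIENT_ID:?}"
    : "${AZ_CLIENT_SECRET:?}" "${AZ_RESOURCE_GROUP:?}" "${LUSTRE_IMAGE_NAME:?}"

    # The file holds a secret, so create it readable by the owner only.
    umask 077
    cat > "${1:-options.json}" <<EOF
{
    "var_subscription_id": "${AZ_SUBSCRIPTION_ID}",
    "var_tenant_id": "${AZ_TENANT_ID}",
    "var_client_id": "${AZ_CLIENT_ID}",
    "var_client_secret": "${AZ_CLIENT_SECRET}",
    "var_resource_group": "${AZ_RESOURCE_GROUP}",
    "var_image": "${LUSTRE_IMAGE_NAME}"
}
EOF
)
```

The function body runs in a subshell, so the umask change and any early exit do not affect the calling shell.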

Use the following command to build with packer:

packer build -var-file=options.json centos-7.8-lustre-2.12.5.json

Once the build completes successfully, the image will be available in the resource group you specified.

Deploying the Lustre cluster

The "Deploy to Azure" button can be used once the image is available (alternatively the CLI can be used with az deployment group create). Below is a description of the parameters:

Parameter                      Description
name                           The name for the Lustre filesystem
mdsSku                         The SKU for the MDS VMs
ossSku                         The SKU for the OSS VMs
instanceCount                  The number of OSS VMs
rsaPublicKey                   The RSA public key to access the VMs
imageResourceGroup             The name of the resource group containing the image
imageName                      The name of the Lustre image to use
existingVnetResourceGroupName  The resource group containing the VNET where Lustre is to be deployed
existingVnetName               The name of the VNET where Lustre is to be deployed
existingSubnetName             The name of the subnet where Lustre is to be deployed
mdtStorageSku                  The SKU to use for the MDT disks
mdtCacheOption                 The caching option for the MDT disks (e.g. None or ReadWrite)
mdtDiskSize                    The size of each MDT disk
mdtNumDisks                    The number of disks in the MDT RAID (set to 0 to use the VM ephemeral disks)
ostStorageSku                  The SKU to use for the OST disks
ostCacheOption                 The caching option for the OST disks (e.g. None or ReadWrite)
ostDiskSize                    The size of each OST disk
ostNumDisks                    The number of OST disks per OSS (set to 0 to use the VM ephemeral disks)
ossDiskSetup                   Either separate, where each disk is an OST, or raid, to combine the disks into a single OST
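
For the CLI route, a deployment could look roughly like the sketch below. It assumes you run it from the directory containing azuredeploy.json and that the parameters above have been saved in an ARM parameters file. The function name deploy_lustre, the DRY_RUN switch, and the example file name are hypothetical and not part of the repository.

```shell
# Deploy the template with the Azure CLI.
# $1 = resource group to deploy into, $2 = ARM parameters file.
# Set DRY_RUN=1 to print the command instead of running it.
# (deploy_lustre and DRY_RUN are hypothetical names for this sketch.)
deploy_lustre() {
    rg="$1"
    params="$2"
    set -- az deployment group create \
        --resource-group "$rg" \
        --template-file azuredeploy.json \
        --parameters "@$params"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example usage:
# DRY_RUN=1 deploy_lustre my-resource-group lustre.parameters.json
```

Printing the command first makes it easy to check the resource group and file paths before anything is created.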

Options for Lustre Hierarchical Storage Management (HSM)

The following additional parameters can be used to enable HSM for the Lustre deployment.

Parameter         Description
storageAccount    The storage account to use for HSM
storageContainer  The container name to use
storageKey        The key for the storage account

Options for Logging with Log Analytics

The following additional parameters can be used to log metrics for the Lustre deployment.

Parameter                Description
logAnalyticsWorkspaceId  The Log Analytics workspace ID to use
logAnalyticsKey          The key for the Log Analytics workspace

Example configurations

When creating a Lustre configuration, pay attention to what limits throughput on each OSS, such as the VM's NIC bandwidth or its uncached disk throughput. This section provides options for three types of setup:

  1. Ephemeral: This is the cheapest option and uses disks local to the VMs. It can also provide the lowest latency because the physical storage resides on the host. Any VM failure will result in data loss, but this is a good option for scratch storage.

    Size: 7.6 TB per OSS

    Expected performance: 1600 MB/s per OSS (limited by NIC on VM)

  2. Persistent Premium: This option uses premium disks attached to the VMs. A VM failure will not result in data loss.

    Size: 6 TB per OSS

    Expected performance: 1152 MB/s per OSS (limited by uncached disk throughput)

  3. Persistent Standard: This option uses standard disks attached to the VMs. It requires more storage per OSS because larger disks are needed to maximise the storage bandwidth available to a VM.

    Size: 32 TB per OSS

    Expected performance: 1152 MB/s per OSS (limited by uncached disk throughput)
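
The per-OSS figures for the persistent options follow from the ostDiskSize and ostNumDisks parameters, and a rough total for the filesystem can be estimated by multiplying by instanceCount. The sketch below shows that arithmetic. It assumes disk sizes are given in GB and that bandwidth scales linearly with the number of OSS VMs, which is an upper bound rather than a measured result. Both function names are hypothetical.

```shell
# Raw OST capacity in GB for the whole filesystem:
# $1 = ostDiskSize (GB), $2 = ostNumDisks, $3 = instanceCount.
lustre_capacity_gb() {
    echo $(( $1 * $2 * $3 ))
}

# Upper-bound aggregate bandwidth in MB/s, assuming linear scaling:
# $1 = expected MB/s per OSS, $2 = instanceCount.
lustre_bandwidth_mbs() {
    echo $(( $1 * $2 ))
}

# Example: four OSS VMs, each with six 1024 GB disks at 1152 MB/s.
# lustre_capacity_gb 1024 6 4     # prints 24576
# lustre_bandwidth_mbs 1152 4     # prints 4608
```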

These are the parameters that can be used when deploying:

Parameter       Ephemeral         Persistent Premium  Persistent Standard
mdsSku          Standard_L8s_v2   Standard_D8s_v3     Standard_D8s_v3
ossSku          Standard_L48s_v2  Standard_D48s_v3    Standard_D48s_v3
mdtStorageSku   Premium_LRS       Premium_LRS         Standard_LRS
mdtCacheOption  None              ReadWrite           ReadWrite
mdtDiskSize     0                 1024                1024
mdtNumDisks     0                 2                   2
ostStorageSku   Premium_LRS       Premium_LRS         Standard_LRS
ostCacheOption  None              None                None
ostDiskSize     0                 1024                8192
ostNumDisks     0                 6                   4
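
As an illustration, the storage-related values from the Persistent Premium column can be captured in an ARM parameters file. This is a partial sketch only: the function name is hypothetical, and the remaining parameters (name, SKUs, image, network, and so on) still have to be added before deploying. Writing the sizes and counts as numbers rather than strings is an assumption, so check the parameter types declared in azuredeploy.json.

```shell
# Write the storage parameters for the "Persistent Premium" setup.
# $1 is the output path. (Hypothetical helper, not part of azlustre.)
# The file is incomplete: the other template parameters must be added.
write_premium_storage_params() {
    cat > "$1" <<'EOF'
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "mdtStorageSku": { "value": "Premium_LRS" },
        "mdtCacheOption": { "value": "ReadWrite" },
        "mdtDiskSize": { "value": 1024 },
        "mdtNumDisks": { "value": 2 },
        "ostStorageSku": { "value": "Premium_LRS" },
        "ostCacheOption": { "value": "None" },
        "ostDiskSize": { "value": 1024 },
        "ostNumDisks": { "value": 6 }
    }
}
EOF
}
```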

Generating the embedded ARM template

This is only required when making changes to the scripts.

The scripts are placed in a self-extracting compressed tar archive and embedded into the ARM template to be executed by cloud-init. The cloud-ci.sh script performs this step, and build.sh runs it with the parameters used for the ARM template currently distributed in the repository.

Note: The makeself tool is required for this step.

Contributors

  • edwardsp