
Azure CycleCloud Autoscaling Library

The cyclecloud-scalelib project provides Python helpers that simplify autoscaler development for any scheduler on Azure, using Azure CycleCloud and the Azure CycleCloud REST API to orchestrate resource creation in Microsoft Azure.

Autoscale Example

The primary use-case of this library is to facilitate and standardize scheduler autoscale integrations. An example of such an integration with Celery is included in this project.

Building the project

The cyclecloud-scalelib project is generally used in a Python 3 virtualenv and has several standard python dependencies, but it also depends on the Azure CycleCloud Python Client Library.

Pre-requisites

The instructions below assume that:

  • you have python 3 available on your system
  • you have access to an Azure CycleCloud installation

Before attempting to build the project, obtain a copy of the Azure CycleCloud Python Client library. You can get the wheel distribution from the /opt/cycle_server/tools/ directory in your Azure CycleCloud installation or you can download the wheel from the CycleCloud UI following the instructions here.

The instructions below assume that you have copied the cyclecloud_api wheel (cyclecloud_api*.whl) to your working directory.

Creating the virtualenv

    # If CycleCloud is installed on the current machine:
    # cp /opt/cycle_server/tools/cyclecloud_api*.whl .

    python3 -m venv ~/.virtualenvs/autoscale/
    . ~/.virtualenvs/autoscale/bin/activate
    pip install -r ./dev-requirements.txt
    pip install ./cyclecloud_api*.whl
    python setup.py build
    pip install -e .

Testing the project

The project includes several helpers for contributors to validate, test and format changes to the code.

    # OPTIONAL: use the following to type check / reformat code
    python setup.py types
    python setup.py format
    python setup.py test

Resources

Default Resources

The cyclecloud-scalelib application matches scheduler resources to Azure cloud resources to provide rich autoscaling and cluster configuration tools. We call these default_resources because they apply even to nodes that have not been materialized yet, in contrast to resources that are defined after the node has joined the cluster, which may override these defaults.

Here is an example for matching PBSPro:

{"default_resources": [
   {
      "select": {},
      "name": "ncpus",
      "value": "node.vcpu_count"
   },
   {
      "select": {},
      "name": "group_id",
      "value": "node.placement_group"
   },
   {
      "select": {},
      "name": "host",
      "value": "node.hostname"
   },
   {
      "select": {},
      "name": "mem",
      "value": "node.memory"
   },
   {
      "select": {},
      "name": "vm_size",
      "value": "node.vm_size"
   },
   {
      "select": {},
      "name": "disk",
      "value": "size::20g"
   }]
}

Note that disk is currently hardcoded to size::20g because of platform limitations in determining how much disk a node will have. The select statement filters how the resources are applied, e.g. by VM Size or nodearray. Here is an example of handling a VM Size-specific disk size and GPUs for a nodearray.

   {
      "select": {"node.vm_size": "Standard_F2"},
      "name": "disk",
      "value": "size::20g"
   },
   {
      "select": {"node.vm_size": "Standard_H44rs"},
      "name": "disk",
      "value": "size::2t"
   },
   {
      "select": {"node.nodearray": "gpuarray"},
      "name": "ngpus",
      "value": 8
   }

Note that these are applied in order, and once a default value is defined for a matching potential node, later defaults for the same resource are ignored. This means that you should always list your most restrictive select filters first.

In other words, if we want to override ncpus for just one nodearray, this will work:

{
 "select": {"node.nodearray": "special-nodearray"},
 "name": "ncpus",
 "value": 42
},
{"select": {},
"name": "ncpus",
"value": "node.pcpu_count"
}

However, in the following the second definition is ignored, because the unrestricted select already matches every node:

{"select": {},
"name": "ncpus",
"value": "node.pcpu_count"
},
{
 "select": {"node.nodearray": "special-nodearray"},
 "name": "ncpus",
 "value": 42
},
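The first-match-wins rule above can be sketched in plain Python. This is an illustrative simplification, not the library's implementation: `select` here is reduced to exact-match on node attributes, whereas the real library supports richer expressions.

```python
# Illustrative sketch of the first-match-wins rule for default_resources.
# Earlier (more restrictive) definitions win; later ones for the same
# resource name are ignored.

def apply_defaults(node: dict, default_resources: list) -> dict:
    resources: dict = {}
    for dr in default_resources:
        # an earlier definition already set this resource name
        if dr["name"] in resources:
            continue
        # simplified select: every key must match the node attribute exactly
        if all(node.get(key) == value for key, value in dr["select"].items()):
            resources[dr["name"]] = dr["value"]
    return resources

defaults = [
    {"select": {"node.nodearray": "special-nodearray"}, "name": "ncpus", "value": 42},
    {"select": {}, "name": "ncpus", "value": "node.pcpu_count"},
]

print(apply_defaults({"node.nodearray": "special-nodearray"}, defaults))
# {'ncpus': 42}
print(apply_defaults({"node.nodearray": "execute"}, defaults))
# {'ncpus': 'node.pcpu_count'}
```

Reversing the two entries in `defaults` reproduces the broken ordering above: the unrestricted select matches every node first, so the special-nodearray value is never applied.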

Node Properties

| Property | Type | Description |
| --- | --- | --- |
| node.bucket_id | BucketId | UUID for this combination of NodeArray, VM Size and Placement Group |
| node.colocated | bool | Will this node be put into a VMSS with a placement group, allowing InfiniBand |
| node.cores_per_socket | int | CPU cores per CPU socket |
| node.create_time | datetime | When was this node created, as a datetime |
| node.create_time_remaining | float | How much time is remaining for this node to reach the Ready state |
| node.create_time_unix | float | When was this node created, in unix time |
| node.delete_time | datetime | When was this node deleted, as a datetime |
| node.delete_time_unix | float | When was this node deleted, in unix time |
| node.exists | bool | Has this node actually been created in CycleCloud yet |
| node.gpu_count | int | GPU count |
| node.hostname | Optional[Hostname] | Hostname for this node. May be None if the node has not been given one yet |
| node.hostname_or_uuid | Optional[Hostname] | Hostname or a UUID. Useful when partitioning a mixture of real and potential nodes by hostname |
| node.infiniband | bool | Does the VM Size of this node support InfiniBand |
| node.instance_id | Optional[InstanceId] | Azure VM Instance Id, if the node has a backing VM |
| node.keep_alive | bool | Is this node protected by CycleCloud to prevent it from being terminated |
| node.last_match_time | datetime | The last time this node was matched with a job, as a datetime |
| node.last_match_time_unix | float | The last time this node was matched with a job, in unix time |
| node.location | Location | Azure location for this VM, e.g. westus2 |
| node.memory | Memory | Amount of memory, as reported by the Azure API. OS-reported memory will differ |
| node.name | NodeName | Name of the node in CycleCloud, e.g. execute-1 |
| node.nodearray | NodeArrayName | NodeArray name associated with this node, e.g. execute |
| node.pcpu_count | int | Physical CPU count |
| node.placement_group | Optional[PlacementGroup] | If set, this node is put into a VMSS where all nodes with the same placement group are tightly coupled |
| node.private_ip | Optional[IpAddress] | Private IP address of the node, if it has one |
| node.spot | bool | If true, this node is taking advantage of unused capacity for a cheaper rate |
| node.state | NodeStatus | State of the node, as reported by CycleCloud |
| node.vcpu_count | int | Virtual CPU count |
| node.version | str | Internal version property to handle upgrades |
| node.vm_family | VMFamily | Azure VM Family of this node, e.g. standardFFamily |
| node.vm_size | VMSize | Azure VM Size of this node, e.g. Standard_F2 |

Constraints

And

The logical 'and' operator ensures that all of its child constraints are met.

{"and": [{"ncpus": 1}, {"mem": "memory::4g"}]}

Note that and is implied when combining multiple resource definitions in the same dictionary. e.g. the following have identical semantic meaning, the latter being shorthand for the former.

{"and": [{"ncpus": 1}, {"mem": "memory::4g"}]}
{"ncpus": 1, "mem": "memory::4g"}

ExclusiveNode

Defines whether, during allocation, a node will exclusively run this job.

{"exclusive": true}

-> One and only one iteration of the job can run on this node.

{"exclusive-task": true}

-> One or more iterations of the same job can run on this node.

InAPlacementGroup

Constrains whether the allocated nodes may be in a placement group at all. This is most useful for preventing a job from being allocated to a node in a placement group.

{"in-a-placement-group": true}
{"in-a-placement-group": false}

MinResourcePerNode

Filters for nodes that have at least a certain amount of a resource left to allocate.

{"ncpus": 1}
{"mem": "memory::4g"}
{"ngpus": 4}

Or, shorthand for combining the above into one expression

{"ncpus": 1, "mem": "memory::4g", "ngpus": 4}
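The check can be sketched as follows. This is a hedged illustration of the semantics, not the library's code: memory strings like "memory::4g" are parsed naively here, whereas the real library uses typed Memory/Size objects, and multiple keys in one dict are treated as the implied logical 'and' described above.

```python
# Sketch: does a node have at least the requested amount of each resource?

def parse_amount(value) -> float:
    # naive parser for "memory::<n><unit>" strings; only handles g and t
    if isinstance(value, str) and value.startswith("memory::"):
        qty = value[len("memory::"):]
        units = {"g": 1.0, "t": 1024.0}  # normalize to gigabytes
        return float(qty[:-1]) * units[qty[-1]]
    return float(value)

def satisfies_min_resources(available: dict, constraint: dict) -> bool:
    # multiple keys in one dict are an implied logical 'and'
    return all(
        parse_amount(available.get(name, 0)) >= parse_amount(needed)
        for name, needed in constraint.items()
    )

node = {"ncpus": 4, "mem": "memory::8g", "ngpus": 0}
print(satisfies_min_resources(node, {"ncpus": 1, "mem": "memory::4g"}))  # True
print(satisfies_min_resources(node, {"ngpus": 4}))                       # False
```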

Never

Rejects every node. Most useful when generating a complex node constraint that cannot be determined to be satisfiable until it is generated. For example, say a scheduler supports an 'excluded_users' list for scheduler-specific "projects". When constructing a set of constraints, you may realize that this user will never be able to run a job on a node with that project.

{"or":
    [{"project": "open"},
     {"project": "restricted",
      "never": "User is denied access to this project"}
    ]
}

NodePropertyConstraint

Similar to NodeResourceConstraint, but these are constraints based purely on the read only node properties, i.e. those starting with 'node.'

{"node.vm_size": ["Standard_F16", "Standard_E32"]}
{"node.location": "westus2"}
{"node.pcpu_count": 44}

Note that the last example does not allocate 44 node.pcpu_count, but simply matches nodes that have a pcpu_count of exactly 44.

NodeResourceConstraint

These are constraints that filter out which node is matched based on read-only resources.

{"custom_string1": "custom_value"}
{"custom_string2": ["custom_value1", "custom_value2"]}

For read-only integers you can programmatically call NodeResourceConstraint("custom_int", 16) or use a list of integers

{"custom_integer": [16, 32]}

For shorthand, you can combine the above expressions

{
    "custom_string1": "custom_value",
    "custom_string2": ["custom_value1", "custom_value2"],
    "custom_integer": [16, 32]
}

Not

Logical 'not' operator negates the single child constraint.

Only allocate machines with GPUs

{"not": {"node.gpu_count": 0}}

Only allocate machines with no GPUs available

{"not": {"ngpus": 1}}

Or

Logical 'or' for matching a set of child constraints. Given a list of child constraints, the first constraint that matches is the one used to decrement the node. No further constraints are considered after the first child constraint has been satisfied. For example, say we want to use a GPU instance if we can get a spot instance, otherwise we want to use a non-spot CPU instance.

{"or": [{"node.vm_size": "Standard_NC6", "node.spot": true},
        {"node.vm_size": "Standard_F16", "node.spot": false}]
}
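The first-match semantics can be sketched in plain Python. This is an illustration of the rule described above, with child matching simplified to exact attribute comparison:

```python
# Sketch of 'or' semantics: the first child constraint the node satisfies is
# the one used to decrement the node; later children are never considered.

def first_matching_child(node: dict, children: list):
    for child in children:
        if all(node.get(key) == value for key, value in child.items()):
            return child
    return None  # no child matched; the 'or' rejects this node

children = [
    {"node.vm_size": "Standard_NC6", "node.spot": True},
    {"node.vm_size": "Standard_F16", "node.spot": False},
]

spot_gpu = {"node.vm_size": "Standard_NC6", "node.spot": True}
print(first_matching_child(spot_gpu, children))
# {'node.vm_size': 'Standard_NC6', 'node.spot': True}
```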

SharedConsumableConstraint

Represents a shared consumable resource, for example a queue quota or number of licenses. Please use the SharedConsumableResource object to represent this resource.

While there is a JSON representation of this object, it is up to the author to create the SharedConsumableResource objects programmatically, so programmatic creation of this constraint is recommended.

# global value
SHARED_RESOURCES = [SharedConsumableResource(resource_name="licenses",
                                             source="/path/to/license_limits",
                                             initial_value=1000,
                                             current_value=1000)]

def make_constraint(value: int) -> SharedConsumableConstraint:
    return SharedConsumableConstraint(SHARED_RESOURCES, value)

SharedNonConsumableConstraint

Similar to a SharedConsumableConstraint, except that the shared resource is not consumable (a string, for example). Please use the SharedNonConsumableResource object to represent this resource.

While there is a JSON representation of this object, it is up to the author to create the SharedNonConsumableResource objects programmatically, so programmatic creation of this constraint is recommended.

# global value
SHARED_RESOURCES = [SharedNonConsumableResource(resource_name="prodversion",
                                                source="/path/to/prod_version",
                                                current_value="1.2.3")]
    
def make_constraint(value: str) -> SharedNonConsumableConstraint:
    return SharedNonConsumableConstraint(SHARED_RESOURCES, value)

XOr

Similar to the or operator, except that one and only one of the child constraints may be satisfied by the node. Here is a trivial example where we have a failover for allocating to a second region, but we ensure that only one of them is valid at a time.

{"xor": [{"node.location": "westus2"},
         {"node.location": "eastus"}]
}
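The exactly-one rule can be sketched as counting matching children. As with the earlier sketches, child matching is simplified to exact attribute comparison:

```python
# Sketch of xor semantics: the node is accepted only when exactly one of the
# child constraints matches it.

def satisfies_xor(node: dict, children: list) -> bool:
    matches = sum(
        1 for child in children
        if all(node.get(key) == value for key, value in child.items())
    )
    return matches == 1

children = [{"node.location": "westus2"}, {"node.location": "eastus"}]
print(satisfies_xor({"node.location": "westus2"}, children))   # True
print(satisfies_xor({"node.location": "centralus"}, children)) # False
```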

Timeouts

By default we set idle and boot timeouts across all nodes.

   "boot_timeout": 3600

You can also set these per nodearray.

   "boot_timeout": {"default": 3600, "nodearray1": 7200, "nodearray2": 900},
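Resolving such a setting might look like the following sketch. The lookup rule (per-nodearray value, falling back to the "default" key, with a bare integer applying everywhere) is an assumption based on the example configurations above:

```python
# Sketch: resolve a timeout setting for a given nodearray.
# A bare integer applies to every nodearray; a mapping is looked up by
# nodearray name with a "default" fallback.

def resolve_timeout(setting, nodearray: str) -> int:
    if isinstance(setting, dict):
        return setting.get(nodearray, setting["default"])
    return setting

cfg = {"default": 3600, "nodearray1": 7200, "nodearray2": 900}
print(resolve_timeout(cfg, "nodearray1"))  # 7200
print(resolve_timeout(cfg, "other"))       # 3600
print(resolve_timeout(3600, "anything"))   # 3600
```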

Incorrect VM Size Information

In some regions or subscriptions, CycleCloud cannot get the proper VM Size information for all VM Sizes. Often this results in an incorrect number of GPUs being reported, or in other attributes being incorrect. By default, as of 1.0.2, an internal record of all public regions and VM sizes - hpc/autoscale/node/vm_sizes.json - falls back on a common US region for US Gov/DoD regions.

At the top of this file, you will find the following

{
  "proxied-locations": {
    "_comment_": "This is a mapping of locations that are not available in the Azure API, but are proxied to another location.",
    "usdodcentral": "southcentralus",
    "usdodeast": "southcentralus",
    "usdodtexas": "southcentralus",
    "usgovarizona": "southcentralus",
    "usgoviowa": "southcentralus",
    "usgovtexas": "southcentralus",
    "usgovvirginia": "southcentralus",
    "usseceast": "southcentralus",
    "ussecwest": "southcentralus"
  },

This states that for these locations we should simply use the data on hand for southcentralus. A user can modify this mapping however they want after installation; there is no requirement that these map to southcentralus.
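The fallback itself is a simple mapping lookup, sketched here with two entries copied from the excerpt above:

```python
# Sketch of the proxied-location fallback: before looking up VM size data,
# map unsupported locations to their configured proxy location.

PROXIED_LOCATIONS = {
    "usgovvirginia": "southcentralus",
    "usdodeast": "southcentralus",
}

def effective_location(location: str) -> str:
    # unknown locations pass through unchanged
    return PROXIED_LOCATIONS.get(location, location)

print(effective_location("usgovvirginia"))  # southcentralus
print(effective_location("westus2"))        # westus2
```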

Note: remember that you can always define a default_resource for your GPU resource with an explicit integer. This is the preferred way to deal with this issue, but it can become painful when dealing with many VM Sizes that are simply missing basic information.

Lastly, note that if the GPU count defined in this file is higher than what CycleCloud reports, then this file takes precedence. This is because, in some locked-down regions, the subscription CycleCloud is using is told that the VM Size has 0 GPUs when the GPU count should be higher.
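One plausible reading of that precedence rule is "the higher of the two counts wins"; the source only states the file wins when its count is higher, so treat this sketch as an assumption:

```python
# Sketch (assumption): if the bundled vm_sizes.json reports more GPUs than
# CycleCloud does, the file's value wins; otherwise CycleCloud's value is used.

def effective_gpu_count(from_file: int, from_cyclecloud: int) -> int:
    return from_file if from_file > from_cyclecloud else from_cyclecloud

print(effective_gpu_count(4, 0))  # 4 (locked-down region reporting 0 GPUs)
print(effective_gpu_count(0, 2))  # 2
```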

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


