A Terraform module to create an Azure Scale-Set of Spot VMs with GPUs, designed for use with this azure-tf-starter guide.
Azure Scale-Sets are load-balanced pools of identical VMs that can be scaled horizontally, either manually or automatically, up to 1000 VMs or down to 0. Scale-Sets can be created using Spot VMs for significant price discounts (usually 40-80%).
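Manual scaling can be done with the Azure CLI's `az vmss scale` command. The sketch below wraps it in a small shell function that validates the requested instance count first. The function name `scale_vmss` and the example resource names are hypothetical placeholders, and the 0-1000 bound simply mirrors the figure above; scale sets that use a single placement group are limited to 100 instances.

```shell
# Hypothetical helper: set a scale set's instance count by hand.
# Assumes the Azure CLI is installed and authenticated with `az login`.
scale_vmss() {
  local resource_group="$1" name="$2" capacity="$3"

  # Reject empty or non-numeric capacities before calling Azure.
  case "$capacity" in
    '' | *[!0-9]*)
      echo "capacity must be a whole number" >&2
      return 1
      ;;
  esac

  # Reject values above 1000 (the upper bound quoted above); the length
  # check also keeps very long numbers away from the integer comparison.
  if [ "${#capacity}" -gt 4 ] || [ "$capacity" -gt 1000 ]; then
    echo "capacity must be between 0 and 1000" >&2
    return 1
  fi

  az vmss scale \
    --resource-group "$resource_group" \
    --name "$name" \
    --new-capacity "$capacity"
}
```

For example, `scale_vmss my-resource-group scale-set 0` would remove every instance from a scale set named `scale-set` in the (hypothetical) resource group `my-resource-group`.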
Azure VMs come in two hypervisor generations: Gen1, which is based on legacy BIOS, and Gen2, which is based on UEFI. Many VM families support only one generation, though some support both, and the OS image you choose must match the generation of the VM. Check here which generation the VM family you want to use requires. You can find the list of Azure's instance families here.
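One way to check the generation from the command line is to read the `HyperVGenerations` capability that `az vm list-skus` reports for each VM size. The sketch below uses two hypothetical helper functions, `hyperv_generations` and `supports_gen2`, and assumes the Azure CLI is installed and logged in. Because the module example below uses a `-gen2` image SKU, the chosen size needs to list `V2`.

```shell
# Hypothetical helper: print the hypervisor generations a VM size supports
# in a region, e.g. "V1,V2". An empty result means the capability was not
# reported for that size (or the size is not offered in that region).
hyperv_generations() {
  local location="$1" size="$2"
  az vm list-skus \
    --location "$location" \
    --size "$size" \
    --resource-type virtualMachines \
    --query "[?name=='${size}'].capabilities[] | [?name=='HyperVGenerations'].value | [0]" \
    --output tsv
}

# Hypothetical helper: succeed if a comma-separated generation list
# (as printed by hyperv_generations) includes Gen2.
supports_gen2() {
  case ",$1," in
    *,V2,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   supports_gen2 "$(hyperv_generations westeurope Standard_NV6ads_A10_v5)" \
#     && echo "Gen2 images are supported"
```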
Once you know which instance type you want, you can check its current Spot pricing here. Like the other major clouds, Azure enforces quotas, and these may be too low for you to create certain types of virtual machines, GPUs, Spot instances, or Low-Priority VMs. You can request quota increases via the portal here.
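Both checks can also be scripted. `az vm list-usage` shows current usage against each compute quota in a region, and the public Azure Retail Prices API (`https://prices.azure.com/api/retail/prices`) returns list prices without authentication. The sketch below assumes the Azure CLI (logged in) and `curl` are installed; the function names are hypothetical, and the filter uses the API's `armSkuName`, `armRegionName`, and `priceType` fields. The JSON response mixes regular and Spot rows, and the Spot rows have "Spot" in their `skuName`.

```shell
# Hypothetical helpers for checking quotas and prices before deploying.
# Assumes the Azure CLI (logged in) and curl are installed.

# List current usage and limits for each compute quota in a region.
# Spot vCPUs are typically counted against the regional low-priority vCPU quota.
show_quotas() {
  az vm list-usage --location "$1" --output table
}

# Build an OData filter for the public Azure Retail Prices API.
retail_price_filter() {
  printf "armSkuName eq '%s' and armRegionName eq '%s' and priceType eq 'Consumption'" "$1" "$2"
}

# Fetch retail prices (JSON) for a VM size in a region. The result mixes
# regular and Spot rows; Spot rows have "Spot" in their skuName.
retail_prices() {
  curl -sG "https://prices.azure.com/api/retail/prices" \
    --data-urlencode "\$filter=$(retail_price_filter "$1" "$2")"
}

# Example:
#   show_quotas westeurope
#   retail_prices Standard_NV6ads_A10_v5 westeurope
```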
Not every Azure datacenter offers every type of machine, so check that the machine you want is available in the region you will be using. The example query below can help; just change the `--size` field to the SKU you want to check.

```shell
az vm list-skus --location "westeurope" \
  --size Standard_N \
  --output table
```
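To choose an OS image, you can list the available Ubuntu images. This example lists the 22.04 daily Gen2 builds:

```shell
az vm image list --all \
  --publisher Canonical \
  --query "[?contains(sku,'22_04-daily-lts-gen2')].{offer:offer,version:version,sku:sku,architecture:architecture}" \
  --output table
```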
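The columns from this query map onto the module's image inputs: `sku` goes into `vm_source_image_sku` and `version` into `vm_source_image_verson`, alongside a publisher and offer. Before applying, it can help to confirm that the exact publisher:offer:sku:version combination exists in your region. The sketch below uses `az vm image show --urn`; the helper names `image_urn` and `check_image` are hypothetical, and it assumes the Azure CLI is installed and logged in.

```shell
# Hypothetical helpers: confirm that the image the module will use exists.
# Assumes the Azure CLI is installed and authenticated with `az login`.

# Join publisher, offer, sku and version into a URN (publisher:offer:sku:version).
image_urn() {
  printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

# Show the image in a region; the command fails if the exact version is not found.
check_image() {
  local location="$1"
  shift
  az vm image show --location "$location" --urn "$(image_urn "$@")" --output table
}

# Example, using the image values from the module example:
#   check_image westeurope Canonical 0001-com-ubuntu-server-focal-daily \
#     20_04-daily-lts-gen2 20.04.202303090
```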
```hcl
module "scale-set" {
  source = "github.com/cloudymax/modules-azure-tf-scale-set"

  # Project settings
  environment    = local.environment
  location       = local.location
  resource_group = local.resource_group
  allowed_ips    = local.allowed_ips

  # Scale Set VM settings
  scale_set_name                  = "scale-set"
  vm_sku                          = "Standard_NV6ads_A10_v5"
  vm_instances                    = 1
  priority                        = "Spot"
  spot_restore_enabled            = true
  spot_restore_timeout            = "PT1H30M" # ISO 8601 duration
  eviction_policy                 = "Deallocate"
  max_bid_price                   = "0.24" # USD per hour
  overprovision                   = false
  ultra_ssd_enabled               = false
  scale_in_rule                   = "NewestVM"
  scale_in_force_deletion_enabled = true
  cloud_init_path                 = "cloud-init.txt"
  vm_admin_username               = local.admin_identity
  vm_name_prefix                  = "${local.environment}-"
  vm_network_interface            = "vm-nic"

  # Network options
  vnet_name        = module.environment-base.vnet_name
  vnet_subnet_name = "scale-set-subnet"
  subnet_prefixes  = ["192.168.1.0/24"]

  # OS Disk options
  vm_os_disk_caching                   = "ReadWrite"
  vm_os_storage_account_type           = "Premium_LRS"
  vm_os_disk_size_gb                   = "32"
  vm_os_disk_write_accelerator_enabled = false

  # Storage Disk options
  vm_data_disk_caching                   = "None"
  vm_data_storage_account_type           = "PremiumV2_LRS"
  vm_data_disk_size_gb                   = "32"
  vm_data_disk_write_accelerator_enabled = false
  vm_data_disk_create_option             = "Empty"

  # OS Images settings
  vm_source_image_publisher = "Canonical"
  vm_source_image_offer     = "0001-com-ubuntu-server-focal-daily"
  vm_source_image_sku       = "20_04-daily-lts-gen2"
  vm_source_image_verson    = "20.04.202303090"

  # Storage account
  storage_account_url = module.environment-base.storage_account.primary_blob_endpoint

  # Key Vault
  keyvault_id = module.environment-base.kv_id

  # Managed Identity
  admin_users = [module.environment-base.managed_identity_id]

  # Network Settings
  vm_net_iface_name                          = "vm-nic"
  vm_net_iface_ipconfig_name                 = "vm-nic-config"
  vm_net_iface_private_ip_address_allocation = "Dynamic"
}
```
- The Terraform documentation for `azurerm_linux_virtual_machine_scale_set` can be found here.
No requirements.

Providers:

| Name | Version |
|---|---|
| azurerm | n/a |
| random | n/a |
| template | n/a |
No modules.

Resources:

| Name | Type |
|---|---|
| azurerm_key_vault_secret.vm_admin_password | resource |
| azurerm_linux_virtual_machine_scale_set.scale_set | resource |
| azurerm_network_security_group.scaleset_security_group | resource |
| azurerm_network_security_rule.ssh | resource |
| azurerm_subnet.vm_subnet | resource |
| random_password.vm_admin_password | resource |
| random_pet.vm_name | resource |
| template_cloudinit_config.config | data source |
| template_file.cloudconfig | data source |
Inputs:

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| admin_users | n/a | list(string) | n/a | yes |
| allowed_ips | n/a | list(string) | n/a | yes |
| cloud_init_path | n/a | string | n/a | yes |
| environment | n/a | string | n/a | yes |
| eviction_policy | n/a | string | n/a | yes |
| keyvault_id | n/a | string | n/a | yes |
| location | n/a | string | n/a | yes |
| max_bid_price | n/a | string | n/a | yes |
| overprovision | n/a | string | n/a | yes |
| priority | n/a | string | n/a | yes |
| resource_group | n/a | string | n/a | yes |
| scale_in_force_deletion_enabled | n/a | string | n/a | yes |
| scale_in_rule | n/a | string | n/a | yes |
| scale_set_name | n/a | string | n/a | yes |
| spot_restore_enabled | n/a | string | n/a | yes |
| spot_restore_timeout | n/a | string | n/a | yes |
| storage_account_url | n/a | string | n/a | yes |
| subnet_prefixes | n/a | list(string) | n/a | yes |
| ultra_ssd_enabled | n/a | string | n/a | yes |
| vm_admin_username | n/a | string | n/a | yes |
| vm_data_disk_caching | n/a | string | n/a | yes |
| vm_data_disk_create_option | n/a | string | n/a | yes |
| vm_data_disk_size_gb | n/a | string | n/a | yes |
| vm_data_disk_write_accelerator_enabled | n/a | string | n/a | yes |
| vm_data_storage_account_type | n/a | string | n/a | yes |
| vm_instances | n/a | string | n/a | yes |
| vm_name_prefix | n/a | string | n/a | yes |
| vm_net_iface_ipconfig_name | n/a | string | n/a | yes |
| vm_net_iface_name | n/a | string | n/a | yes |
| vm_net_iface_private_ip_address_allocation | n/a | string | n/a | yes |
| vm_network_interface | n/a | string | n/a | yes |
| vm_os_disk_caching | n/a | string | n/a | yes |
| vm_os_disk_size_gb | n/a | string | n/a | yes |
| vm_os_disk_write_accelerator_enabled | n/a | string | n/a | yes |
| vm_os_storage_account_type | n/a | string | n/a | yes |
| vm_sku | n/a | string | n/a | yes |
| vm_source_image_offer | n/a | string | n/a | yes |
| vm_source_image_publisher | n/a | string | n/a | yes |
| vm_source_image_sku | n/a | string | n/a | yes |
| vm_source_image_verson | n/a | string | n/a | yes |
| vnet_name | n/a | string | n/a | yes |
| vnet_subnet_name | n/a | string | n/a | yes |
No outputs.