From 1deb37c553a27fbb065c89c593fd7197deb35599 Mon Sep 17 00:00:00 2001
From: Nathan Mannall
Date: Wed, 7 Aug 2024 12:00:28 +0100
Subject: [PATCH] Add documentation for the high memory node

---
 docs/index.md            |  8 ++++---
 docs/user-guide/batch.md | 46 ++++++++++++++++++++++++++++++++++------
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 118bf288..7e3cc57f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,9 +12,11 @@ information on how to get access to the system please see the
 [Cirrus website](http://www.cirrus.ac.uk).
 
 The Cirrus facility is based around an SGI ICE XA system. There are 280
-standard compute nodes and 38 GPU compute nodes. Each standard compute
-node has 256 GiB of memory and contains two 2.1 GHz, 18-core Intel Xeon
-(Broadwell) processors. Each GPU compute node has 384 GiB of memory,
+standard compute nodes, 1 high memory compute node and 38 GPU compute
+nodes. Each standard compute node has 256 GiB of memory and contains two
+2.1 GHz, 18-core Intel Xeon (Broadwell) processors. Each high memory
+compute node has 3 TiB of memory and contains four 2.7 GHz, 28-core Intel
+Xeon (Platinum) processors. Each GPU compute node has 384 GiB of memory,
 contains two 2.4 GHz, 20-core Intel Xeon (Cascade Lake) processors and
 four NVIDIA Tesla V100-SXM2-16GB (Volta) GPU accelerators connected to
 the host processors and each other via PCIe. All nodes are connected
diff --git a/docs/user-guide/batch.md b/docs/user-guide/batch.md
index a2ff3ad7..ca1303af 100644
--- a/docs/user-guide/batch.md
+++ b/docs/user-guide/batch.md
@@ -199,16 +199,49 @@ you request 1 GPU card, then you will be assigned a maximum of 384/4 =
+### Primary resources on high memory (CPU) compute nodes
+
+The *primary resource* you request on the high memory compute node is CPU
+cores. The maximum amount of memory you are allocated is computed as the
+number of CPU cores you requested multiplied by 1/112th of the total
+memory available (as there are 112 CPU cores per node). So, if you
+request the full node (112 cores), then you will be allocated a maximum
+of all of the memory (3 TB) available on the node; however, if you
+request 1 core, then you will be assigned a maximum of 3000/112 = 26.8 GB
+of the memory available on the node.
+
+!!! Note
+
+    Using the `--exclusive` option in jobs will give you access to the full
+    node memory even if you do not explicitly request all of the CPU cores
+    on the node.
+
+
+!!! Warning
+
+    Using the `--exclusive` option will charge your account for the usage of
+    the entire node, even if you do not request all the cores in your job
+    script.
+
+!!! Note
+
+    You will not generally have access to the full amount of memory resource
+    on the node as some is retained for running the operating system and
+    other system processes.
+
+
+
 ### Partitions
 
 On Cirrus, compute nodes are grouped into partitions. You will have to
 specify a partition using the `--partition` option in your submission
 script. The following table has a list of active partitions on Cirrus.
 
-| Partition | Description | Total nodes available | Notes |
-|-----------|--------------------------------------------------------------------------------|-----------------------|-------|
-| standard | CPU nodes with 2x 18-core Intel Broadwell processors | 352 | |
-| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors | 36 | |
+| Partition | Description | Total nodes available | Notes |
+|-----------|-----------------------------------------------------------------------------------------------|-----------------------|-------|
+| standard | CPU nodes with 2x 18-core Intel Broadwell processors, 256 GB memory | 352 | |
+| highmem | CPU node with 4x 28-core Intel Xeon Platinum processors, 3 TB memory | 1 | |
+| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors, 384 GB memory | 36 | |
 
 Cirrus Partitions
 
@@ -232,12 +265,13 @@ resource limits. The following table has a list of active QoS on Cirrus.
 | QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
 |--------------|-----------------------|----------------------|--------------|-----------------------------------------|-----------------------|-------|
 | standard | No limit | 500 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
+| highmem | 1 job | 2 jobs | 24 hours | 1 node | highmem | |
 | largescale | 1 job | 4 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
 | long | 5 jobs | 20 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
-| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
+| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
 | gpu | No limit | 128 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
 | short | 1 job | 2 jobs | 20 minutes | 2 nodes or 4 GPUs | standard, gpu | |
-| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |
+| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |
 
 #### Cirrus QoS
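
The sketch below is not part of the patch; it illustrates how the new `highmem` partition and QoS might be combined with the per-core memory rule described above in a Slurm submission script. The job name, budget code, core count, walltime and application name are placeholders, not values taken from the Cirrus documentation.

```bash
#!/bin/bash
#
# Illustrative sketch only: a Slurm job script targeting the high memory
# node via the highmem partition and QoS added by this patch.

#SBATCH --job-name=highmem_example
#SBATCH --partition=highmem   # the single 112-core, 3 TB node
#SBATCH --qos=highmem         # 24 hour max walltime, 1 running job per user
#SBATCH --ntasks=56           # cores are the primary resource on this node
#SBATCH --time=12:00:00
#SBATCH --account=t01         # replace with your own budget code

# Adding "#SBATCH --exclusive" would expose the full node memory, but the
# account would then be charged for the whole node.

srun ./my_large_memory_app    # placeholder application
```

By the rule in the new section, requesting 56 of the 112 cores would give access to roughly half of the node's memory (56 x 3000/112 = 1500 GB).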