From 1deb37c553a27fbb065c89c593fd7197deb35599 Mon Sep 17 00:00:00 2001
From: Nathan Mannall
Date: Wed, 7 Aug 2024 12:00:28 +0100
Subject: [PATCH] Add documentation for the high memory node

---
 docs/index.md            |  8 ++++---
 docs/user-guide/batch.md | 46 ++++++++++++++++++++++++++++++++++------
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 118bf288..7e3cc57f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,9 +12,11 @@ information on how to get access to the system please see the
 [Cirrus website](http://www.cirrus.ac.uk).
 
 The Cirrus facility is based around an SGI ICE XA system. There are 280
-standard compute nodes and 38 GPU compute nodes. Each standard compute
-node has 256 GiB of memory and contains two 2.1 GHz, 18-core Intel Xeon
-(Broadwell) processors. Each GPU compute node has 384 GiB of memory,
+standard compute nodes, 1 high memory compute node and 38 GPU compute
+nodes. Each standard compute node has 256 GiB of memory and contains two
+2.1 GHz, 18-core Intel Xeon (Broadwell) processors. Each high memory
+compute node has 3 TiB of memory and contains four 2.7 GHz, 28-core Intel
+Xeon (Platinum) processors. Each GPU compute node has 384 GiB of memory,
 contains two 2.4 GHz, 20-core Intel Xeon (Cascade Lake) processors and
 four NVIDIA Tesla V100-SXM2-16GB (Volta) GPU accelerators connected to
 the host processors and each other via PCIe. All nodes are connected
diff --git a/docs/user-guide/batch.md b/docs/user-guide/batch.md
index a2ff3ad7..ca1303af 100644
--- a/docs/user-guide/batch.md
+++ b/docs/user-guide/batch.md
@@ -199,16 +199,49 @@ you request 1 GPU card, then you will be assigned a maximum of 384/4 =
+### Primary resources on high memory (CPU) compute nodes
+
+The *primary resource* you request on the high memory compute node is CPU
+cores. The maximum amount of memory you are allocated is computed as the
+number of CPU cores you requested multiplied by 1/112th of the total
+memory available (as there are 112 CPU cores per node). So, if you
+request the full node (112 cores), then you will be allocated a maximum
+of all of the memory (3 TB) available on the node; however, if you
+request 1 core, then you will be assigned a maximum of 3000/112 = 26.8 GB
+of the memory available on the node.
+
+!!! Note
+
+    Using the `--exclusive` option in jobs will give you access to the full
+    node memory even if you do not explicitly request all of the CPU cores
+    on the node.
+
+
+!!! Warning
+
+    Using the `--exclusive` option will charge your account for the usage of
+    the entire node, even if you do not request all the cores in your job
+    script.
+
+!!! Note
+
+    You will not generally have access to the full amount of memory resource
+    on the node as some is retained for running the operating system and
+    other system processes.
+
+
+
 ### Partitions
 
 On Cirrus, compute nodes are grouped into partitions. You will have to
 specify a partition using the `--partition` option in your submission
 script. The following table has a list of active partitions on Cirrus.
 
-| Partition | Description | Total nodes available | Notes |
-|-----------|--------------------------------------------------------------------------------|-----------------------|-------|
-| standard | CPU nodes with 2x 18-core Intel Broadwell processors | 352 | |
-| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors | 36 | |
+| Partition | Description | Total nodes available | Notes |
+|-----------|-----------------------------------------------------------------------------------------------|-----------------------|-------|
+| standard | CPU nodes with 2x 18-core Intel Broadwell processors, 256 GB memory | 352 | |
+| highmem | CPU node with 4x 28-core Intel Xeon Platinum processors, 3 TB memory | 1 | |
+| gpu | GPU nodes with 4x Nvidia V100 GPU and 2x 20-core Intel Cascade Lake processors, 384 GB memory | 36 | |
 
 Cirrus Partitions
 
@@ -232,12 +265,13 @@ resource limits. The following table has a list of active QoS on Cirrus.
 | QoS Name | Jobs Running Per User | Jobs Queued Per User | Max Walltime | Max Size | Applies to Partitions | Notes |
 |--------------|-----------------------|----------------------|--------------|-----------------------------------------|-----------------------|-------|
 | standard | No limit | 500 jobs | 4 days | 88 nodes (3168 cores/25%) | standard | |
+| highmem | 1 job | 2 jobs | 24 hours | 1 node | highmem | |
 | largescale | 1 job | 4 jobs | 24 hours | 228 nodes (8192+ cores/65%) or 144 GPUs | standard, gpu | |
 | long | 5 jobs | 20 jobs | 14 days | 16 nodes or 8 GPUs | standard, gpu | |
-| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
+| highpriority | 10 jobs | 20 jobs | 4 days | 140 nodes | standard | charged at 1.5 x normal rate |
 | gpu | No limit | 128 jobs | 4 days | 64 GPUs (16 nodes/40%) | gpu | |
 | short | 1 job | 2 jobs | 20 minutes | 2 nodes or 4 GPUs | standard, gpu | |
-| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |
+| lowpriority | No limit | 100 jobs | 2 days | 36 nodes (1296 cores/10%) or 16 GPUs | standard, gpu | usage is not charged |
 
 #### Cirrus QoS
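
The sketch below is not part of the patch; it illustrates how the new `highmem` partition and QoS might be combined with the per-core memory rule described above in a Slurm submission script. The job name, budget code, core count, walltime and application name are placeholders, not values taken from the Cirrus documentation.

```bash
#!/bin/bash
#
# Illustrative sketch only: a Slurm job script targeting the high memory
# node via the highmem partition and QoS added by this patch.

#SBATCH --job-name=highmem_example
#SBATCH --partition=highmem   # the single 112-core, 3 TB node
#SBATCH --qos=highmem         # 24 hour max walltime, 1 running job per user
#SBATCH --ntasks=56           # cores are the primary resource on this node
#SBATCH --time=12:00:00
#SBATCH --account=t01         # replace with your own budget code

# Adding "#SBATCH --exclusive" would expose the full node memory, but the
# account would then be charged for the whole node.

srun ./my_large_memory_app    # placeholder application
```

By the rule in the new section, requesting 56 of the 112 cores would give access to roughly half of the node's memory (56 x 3000/112 = 1500 GB).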