A collection of various machine learning benchmarks, together with Slurm scripts for CSC's supercomputers.
The benchmarks themselves (Python code) can be found in the `benchmarks` directory. The main run scripts are in the root directory as `*.sh` files. The Slurm settings have been separated into their own scripts in the `slurm` directory.
Typical usage is to first select a benchmark (e.g., PyTorch synthetic) and then the appropriate Slurm settings (e.g., Mahti with 4 GPUs, single node, no MPI). The command would then be:

```bash
sbatch slurm/mahti-gpu4.sh pytorch-synthetic.sh
```
Slurm run scripts can be found in the `slurm` directory. These are named `[puhti|mahti|lumi]-[cpu|gpu]N.sh`, where `N` is the number of CPUs or GPUs reserved.
All scripts are single-node, single MPI task, unless the name ends with `-mpi.sh`. Scripts with the `-mpi.sh` ending launch a separate MPI task for each GPU, assuming 4 GPUs per node. For example, `mahti-gpu8-mpi.sh` reserves two nodes with 4 GPUs (and thus 4 MPI tasks) per node, giving a total of 8 GPUs (and 8 MPI tasks).
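For reference, a minimal Slurm settings script along these lines might look like the following sketch. The partition, account, and resource values are illustrative placeholders, and the final `srun` line is an assumption about how the wrapper passes the benchmark script through, not the exact contents of the scripts in `slurm/`:

```bash
#!/bin/bash
#SBATCH --partition=gpumedium        # illustrative partition name
#SBATCH --account=project_XXXXXXX    # placeholder project number
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1          # single MPI task (no -mpi.sh suffix)
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:a100:4            # reserve 4 GPUs on one node
#SBATCH --time=1:00:00

# Run the benchmark script given as the first argument, passing along
# any further arguments (e.g., --batch-size=32).
srun bash "$@"
```

An `-mpi.sh` variant would instead set `--ntasks-per-node=4`, so that each of the 4 GPUs per node gets its own MPI task.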
Benchmark | Script name | Data |
---|---|---|
PyTorch synthetic | `pytorch-synthetic.sh` | synthetic |
PyTorch DDP | `pytorch-ddp.sh` | synthetic/ImageNet |
PyTorch DDP Lightning | `pytorch-ddp-lightning.sh` | synthetic/ImageNet |
PyTorch DeepSpeed | `pytorch-deepspeed.sh` | synthetic/ImageNet |
run_clm | `pytorch-clm.sh` | WikiText-2 |
TensorFlow CNN | `tensorflow-cnn.sh` | synthetic/ImageNet |
The different benchmarks are described below in more detail.
## PyTorch synthetic

Originally based on Horovod's example script of the same name. Note that the original script used a single fixed random batch which was fed to the network again and again. Some systems and setups are able to optimize for this scenario, giving very unrealistic results. We have modified the script to generate a new random batch each time.

Runs with the `resnet50` model by default, but also supports `inception_v3` and other models from `torchvision.models`.
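The gist of that modification can be sketched as follows (simplified, with hypothetical variable names rather than the benchmark's exact code): a fresh random batch is drawn inside the training loop instead of reusing one fixed tensor.

```python
import torch
import torchvision.models as models

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

batch_size, num_classes = 64, 1000
for step in range(100):
    # Generate a *new* random batch every iteration, so the system
    # cannot cache or otherwise optimize for a single fixed input.
    data = torch.randn(batch_size, 3, 224, 224, device="cuda")
    target = torch.randint(0, num_classes, (batch_size,), device="cuda")

    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()
```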
Run example with a single GPU:

```bash
sbatch slurm/mahti-gpu1.sh pytorch-synthetic.sh
```

Run example with 4 GPUs. Note that you can also add arguments to be passed to the Python script:

```bash
sbatch slurm/mahti-gpu4.sh pytorch-synthetic.sh --batch-size=32
```

Using 8 GPUs (i.e., 2 nodes) with Horovod and MPI (not supported in newer PyTorch installations):

```bash
sbatch slurm/mahti-gpu8-mpi.sh pytorch-synthetic.sh
```
## PyTorch DDP

PyTorch benchmark using DistributedDataParallel (DDP) for handling multiple GPUs.
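For background, the core DDP pattern looks roughly like this (a minimal sketch assuming the launcher provides the usual `RANK`/`WORLD_SIZE`/`MASTER_ADDR` environment variables; not the benchmark's exact code):

```python
import os
import torch
import torch.distributed as dist
import torchvision.models as models
from torch.nn.parallel import DistributedDataParallel as DDP

# Reads RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT from the environment.
dist.init_process_group(backend="nccl")

# Pin each process to its own GPU.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

model = models.resnet50().cuda()
# DDP replicates the model and all-reduces gradients across processes.
model = DDP(model, device_ids=[local_rank])
```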
Run example with 4 GPUs on Puhti using synthetic data:

```bash
sbatch slurm/puhti-gpu4.sh pytorch-ddp.sh
```

Run example with 8 GPUs (on 2 nodes) using real ImageNet data:

```bash
sbatch slurm/puhti-gpu8.sh pytorch-ddp.sh --data
```

Run example with 8 GPUs (2 nodes) with fp16:

```bash
sbatch slurm/puhti-gpu8.sh pytorch-ddp.sh --fp16
```
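The `--fp16` mode corresponds to PyTorch's automatic mixed precision; the usual pattern is sketched below (generic AMP usage, not necessarily the exact mechanism in this benchmark):

```python
import torch
import torchvision.models as models

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    data = torch.randn(32, 3, 224, 224, device="cuda")
    target = torch.randint(0, 1000, (32,), device="cuda")

    optimizer.zero_grad()
    # Run the forward pass in half precision where it is safe to do so.
    with torch.cuda.amp.autocast():
        loss = criterion(model(data), target)
    # Scale the loss to avoid fp16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```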
## PyTorch DDP Lightning

PyTorch Lightning example using DDP. Runs with the `resnet50` model by default, but also supports `inception_v3` and other models from `torchvision.models`.

DDP on Lightning (as of PyTorch 1.13) needs to be run as a single task per GPU:

```bash
sbatch slurm/puhti-gpu4-mpi.sh pytorch-ddp-lightning.sh  # single node
sbatch slurm/puhti-gpu8-mpi.sh pytorch-ddp-lightning.sh  # two nodes
```
The script supports the `--data` option to use real ImageNet data instead of synthetic data, and `--fp16` to enable 16-bit precision for some operations.
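In Lightning, the distributed setup is delegated to the `Trainer`; conceptually it looks something like this sketch (made-up module details, assuming a PyTorch Lightning version of roughly the 1.x era mentioned above):

```python
import torch
import pytorch_lightning as pl
import torchvision.models as models

class ImageClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = models.resnet50()
        self.criterion = torch.nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        data, target = batch
        return self.criterion(self.model(data), target)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

# With strategy="ddp" Lightning expects one process per GPU, which is
# why the -mpi.sh Slurm scripts (one task per GPU) are used here.
trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")
# trainer.fit(ImageClassifier(), train_dataloaders=train_loader)
```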
## PyTorch DeepSpeed

DeepSpeed example. 4 GPUs with synthetic data (note: one node = one task):

```bash
sbatch slurm/puhti-gpu4.sh pytorch-deepspeed.sh
```

8 GPUs, 2 nodes with ImageNet data (note: one GPU = one task):

```bash
sbatch slurm/puhti-gpu8-mpi.sh pytorch-deepspeed.sh --data
```
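At the code level, DeepSpeed wraps the model and optimizer roughly as follows (a minimal sketch; the config values are illustrative, not the benchmark's actual settings):

```python
import deepspeed
import torchvision.models as models

model = models.resnet50()

# Illustrative config; the benchmark's actual settings may differ.
ds_config = {
    "train_micro_batch_size_per_gpu": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 0.001}},
    "fp16": {"enabled": False},
}

# deepspeed.initialize() sets up distributed training and returns an
# engine that takes over the usual backward/step calls:
#   model_engine.backward(loss); model_engine.step()
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```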
## run_clm

Fine-tuning a GPT-like model on WikiText-2, directly from the Hugging Face language modeling examples.

Run example with a full node of GPUs (in this case 8 GPUs on LUMI):

```bash
sbatch slurm/lumi-gpu8.sh pytorch-clm.sh
```

Run example with two full nodes of GPUs (in this case 16 GPUs on LUMI):

```bash
sbatch slurm/lumi-gpu16.sh pytorch-clm.sh
```
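For reference, `pytorch-clm.sh` wraps Hugging Face's `run_clm.py`; a standalone invocation of that script typically looks something like this (illustrative arguments from the upstream example's documentation; the wrapper's exact flags may differ):

```bash
python run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 8 \
    --do_train \
    --output_dir /tmp/test-clm
```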
## TensorFlow CNN

Uses `tf_cnn_benchmarks.py` directly from TensorFlow's GitHub (included here as a git submodule).
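The upstream script can also be run directly; a typical standalone invocation looks something like this (illustrative flags from the upstream benchmark, not necessarily what `tensorflow-cnn.sh` passes):

```bash
python tf_cnn_benchmarks.py --num_gpus=1 --model=resnet50 --batch_size=64
```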
Run example:

```bash
sbatch slurm/mahti-gpu1.sh tensorflow-cnn.sh
```

Horovod:

```bash
sbatch slurm/mahti-gpu8-mpi.sh tensorflow-cnn.sh
```

With real data:

```bash
sbatch slurm/mahti-gpu1.sh tensorflow-cnn.sh --data
```

Horovod with real data:

```bash
sbatch slurm/mahti-gpu8-mpi.sh tensorflow-cnn.sh --data
```