Feature: parallel inference without slurm #112

cathalobrien · 2025-01-23T07:47:41Z

Is your feature request related to a problem? Please describe.

Currently Anemoi Inference requires Slurm to run in parallel (see PR #108).

Slurm is used in 3 ways:

You must call 'srun anemoi-inference run ...' to launch the parallel processes.
When running in parallel, Anemoi Inference reads the Slurm env vars 'SLURM_LOCALID' (to match a GPU to a process), 'SLURM_PROCID' (to determine process 0), and 'SLURM_NTASKS' (to determine the total number of parallel processes).
If a 'MASTER_ADDR' env var is not set, Slurm's 'scontrol' program is used to determine the master address.
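
For illustration, the Slurm dependency described above could look roughly like the sketch below. This is not the actual Anemoi Inference code: the `ParallelContext` type and the `context_from_slurm` function are hypothetical names, and using `scontrol show hostnames` on `SLURM_NODELIST` with the first host taken as the master is an assumption about how the address might be resolved.

```python
import os
import subprocess
from typing import Mapping, NamedTuple, Optional


class ParallelContext(NamedTuple):
    """Hypothetical container for the values read from the environment."""
    local_rank: int             # index of this process on its node (selects the GPU)
    global_rank: int            # index of this process across the job (0 is the lead)
    world_size: int             # total number of parallel processes
    master_addr: Optional[str]  # address the processes use to rendezvous


def context_from_slurm(env: Mapping[str, str] = os.environ) -> ParallelContext:
    """Build the parallel context from the variables that srun sets."""
    local_rank = int(env["SLURM_LOCALID"])
    global_rank = int(env["SLURM_PROCID"])
    world_size = int(env["SLURM_NTASKS"])

    master_addr = env.get("MASTER_ADDR")
    if master_addr is None and "SLURM_NODELIST" in env:
        # Assumption: expand the node list with scontrol and use the first host.
        result = subprocess.run(
            ["scontrol", "show", "hostnames", env["SLURM_NODELIST"]],
            capture_output=True,
            text=True,
            check=True,
        )
        master_addr = result.stdout.split()[0]

    return ParallelContext(local_rank, global_rank, world_size, master_addr)
```

Every input here comes from Slurm, which is why the current implementation cannot run on a system without it.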

Adding the option to run in parallel without Slurm on a single node would simplify debugging and make it possible to run on systems without Slurm installed (e.g. some cloud setups).

Describe the solution you'd like

Running in parallel without Slurm on a single node should be straightforward:

Instead of launching the parallel processes with 'srun anemoi-inference run ...', we could launch one anemoi-inference process as normal and have it spawn the required number of subprocesses.
We could pass the world size via the config and determine LOCALID/PROCID based on the pids of the spawned processes.
MASTER_ADDR can just be 'localhost' when running on a single node.
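
A minimal sketch of the single-node idea, under several assumptions: the function names `rank_env` and `launch_local` are hypothetical, ranks are assigned by spawn order rather than derived from pids, and the variable names `RANK`, `LOCAL_RANK`, `WORLD_SIZE` and `MASTER_PORT` (with port 29500) follow the `torch.distributed` environment-variable convention rather than anything stated in this issue. It is not the implementation in the linked PR.

```python
import os
import subprocess
from typing import Dict, List, Sequence


def rank_env(rank: int, world_size: int, port: int = 29500) -> Dict[str, str]:
    """Return a copy of the current environment with per-rank variables added."""
    env = dict(os.environ)
    env.update(
        {
            "RANK": str(rank),        # global rank; equals the local rank on one node
            "LOCAL_RANK": str(rank),  # used to pick the GPU for this process
            "WORLD_SIZE": str(world_size),
            "MASTER_ADDR": "localhost",  # single node, so no lookup is needed
            "MASTER_PORT": str(port),    # assumed default port
        }
    )
    return env


def launch_local(command: Sequence[str], world_size: int) -> List[int]:
    """Start world_size copies of command, one per rank, and return their exit codes."""
    procs = [
        subprocess.Popen(list(command), env=rank_env(rank, world_size))
        for rank in range(world_size)
    ]
    return [proc.wait() for proc in procs]
```

With the world size taken from the config, a call such as `launch_local(["anemoi-inference", "run", "config.yaml"], world_size=4)` would start four processes on the local machine; the config file name is a placeholder, and each process would need to read these variables in place of the Slurm ones.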

Describe alternatives you've considered

No response

Additional context

No response

Organisation

ECMWF

rosinaderks · 2025-01-30T15:48:33Z

Additional context: At KNMI we are also interested in the option to run parallel inference without Slurm because of our cloud setup (AWS). In the coming weeks we will test Slurm and, if possible, the combination with AWS ParallelCluster. However, the option to run parallel inference without Slurm would be much more interesting and easier to implement for us.

cathalobrien · 2025-01-30T16:41:22Z

thanks for the use-case! I'll move this up the list of things to do

cathalobrien · 2025-02-03T15:53:17Z

Hi @rosinaderks, we have a PR for this now at #121 :)

rosinaderks · 2025-02-03T16:11:13Z

Thanks for the quick response and PR!

cathalobrien added the enhancement New feature or request label Jan 23, 2025

cathalobrien self-assigned this Jan 23, 2025

cathalobrien moved this to Under Review in Anemoi-dev Feb 4, 2025

cathalobrien added this to Anemoi-dev Feb 4, 2025

cathalobrien changed the title ~~Add the option to run parallel inference without slurm~~ parallel inference without slurm Feb 4, 2025

cathalobrien changed the title ~~parallel inference without slurm~~ Feature: parallel inference without slurm Feb 4, 2025

cathalobrien removed this from Anemoi-dev Feb 6, 2025
