Prototype of MPIEvaluator for multi-node workloads #292
Conversation
Migrate the Python script from mpi4py.MPI to mpi4py.futures. Note that mpi4py.futures requires a separate launcher script to create the MPI environment and launch the main script. When using mpi4py.futures, the MPIPoolExecutor needs to be started in a separate process: the MPIPoolExecutor must be created, and tasks submitted to it, within an if __name__ == '__main__': block to prevent recursive creation of MPIPoolExecutor instances.

Also, in the bash commands:
- Modify the bash code to remove or clear the python-test directory if it already exists. This ensures a clean environment for every test run.
- Simplify the scp file transfer code to one line.
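For reference, a minimal sketch of the pattern described above; `run_experiment` and the number of tasks are hypothetical placeholders rather than the actual workbench code:

```python
# launcher.py -- minimal sketch of the mpi4py.futures pattern described above.
from mpi4py.futures import MPIPoolExecutor


def run_experiment(experiment_id):
    # Placeholder workload; the real script would run one EMA experiment here.
    return experiment_id**2


if __name__ == "__main__":
    # The executor must be created (and tasks submitted) inside this guard,
    # so that dynamically spawned workers do not recursively create new pools.
    with MPIPoolExecutor() as executor:
        results = list(executor.map(run_experiment, range(20)))
        print(results)
```

Depending on the MPI setup, such a script is typically launched either directly (letting `MPIPoolExecutor` spawn its workers dynamically) or via `mpiexec -n <N> python -m mpi4py.futures launcher.py`.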
Make sure that the scripts also work when the number of cores (in this case 10) is smaller than the number of jobs (20). In the EMAworkbench, when testing a million experiments, there won't always be a million cores available.
The error you're encountering is related to how mpi4py's `MPIPoolExecutor` works under the hood. When you launch your script with `mpirun`, it spawns multiple MPI processes. If you then use `MPIPoolExecutor`, it tries to spawn additional processes (or threads, in this case) for each MPI process. This is why you see a conflict: you're essentially trying to spawn processes on cores that are already allocated, leading to the "All nodes which are allocated for this job are already filled" error.

Here's how to address this:

1. **Avoid nested parallelism**: Don't combine `mpirun` with `MPIPoolExecutor`. Either use the typical MPI approach (using send/receive) or use the `MPIPoolExecutor`.

2. **Use `MPIPoolExecutor` without `mpirun`**: The way the `MPIPoolExecutor` works is that you run your Python script normally (i.e., without `mpirun`), and the `MPIPoolExecutor` will manage the creation and distribution of tasks across the MPI processes. Adjust your SLURM script:

```bash
#!/bin/bash
# Other directives...

# You don't need mpirun here
python my_model.py > py_test.log
```

3. **Adjust the code**: Since you're launching without `mpirun`, you don't have to worry about the world size or checking against the number of jobs. The `MPIPoolExecutor` will automatically manage the tasks for you.

4. **Optionally, use `MPI.COMM_WORLD.Spawn`**: If you want more control, you can consider using `MPI.COMM_WORLD.Spawn()` to launch the worker processes instead of using the `MPIPoolExecutor`.

Lastly, always be cautious when working in an HPC environment. Nested parallelism can exhaust resources and potentially harm other users' jobs. Always test on a smaller subset of cores/nodes and monitor your jobs to ensure they're behaving as expected.
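As a small addition to point 2, the pool size can also be capped explicitly so the spawned workers stay within the SLURM allocation; `max_workers` is a documented `MPIPoolExecutor` argument, while the job function and the worker count here are illustrative assumptions:

```python
# Sketch: keep the dynamically spawned workers within the allocated cores.
from mpi4py.futures import MPIPoolExecutor


def run_job(i):
    # Illustrative placeholder for a single experiment.
    return i * i


if __name__ == "__main__":
    # max_workers caps how many worker processes the executor spawns;
    # 10 matches the example allocation used in this thread.
    with MPIPoolExecutor(max_workers=10) as executor:
        results = list(executor.map(run_job, range(20)))
```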
callback(experiment, outcomes)
def run_experiment_mpi(packed_data):
You now instantiate an ExperimentRunner for each separate experiment. In principle, you should have to instantiate this just once, by making it a global. I'm not sure about the performance overhead of this. Relatedly, you now pack all models for each experiment; again, this might not be the most efficient. Using an initializer function can solve both problems (see MultiprocessingEvaluator and ema_multiprocessing.py).
I implemented the global ExperimentRunner in b8bf0e7. Is this what you had in mind?
yes, this is now also consistent with how it is handled in multiprocessing
Create some commands to use locally using WSL2 with Ubuntu
This commit introduces two primary architectural modifications to the MPIEvaluator class, aimed at improving efficiency when running experiments on HPC systems:

1. **Singleton ExperimentRunner**: Previously, the `ExperimentRunner` was instantiated for every individual experiment. This approach could introduce unnecessary overhead, especially when dealing with a large number of experiments. Instead, we've adopted a pattern where the `ExperimentRunner` is instantiated once and shared among all the worker processes in the MPI pool. This is achieved using an initializer function, `mpi_initializer`, which sets up a global `ExperimentRunner` for all the worker processes.

2. **Optimized model packing**: Before this change, all models were packed and sent with each experiment. This was potentially inefficient, especially when the model objects were large. Now, we've altered the architecture to send only the model name with each experiment. Since the `ExperimentRunner` already has access to all models (being initialized once with all of them), it can easily fetch the necessary model using the provided model name.

The primary motivation behind these changes is to reduce the overhead related to object instantiation and data transfer, especially when running experiments on large-scale HPC systems with SLURM.
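A simplified, hypothetical sketch of the architecture described in this commit message; the names mirror the commit message (`mpi_initializer`, `run_experiment_mpi`, a global runner), but the bodies are illustrative stand-ins rather than the actual workbench implementation, and it assumes an mpi4py version whose `MPIPoolExecutor` accepts `initializer`/`initargs`:

```python
from collections import namedtuple

from mpi4py.futures import MPIPoolExecutor

# Stand-in for a workbench model; only here to make the sketch self-contained.
Model = namedtuple("Model", ["name"])

# Global set up once per worker by the initializer, instead of being
# re-created for every single experiment.
experiment_runner = None


def mpi_initializer(models):
    # Runs once in each worker process: build the shared runner from all models.
    global experiment_runner
    experiment_runner = {model.name: model for model in models}  # stand-in for ExperimentRunner


def run_experiment_mpi(packed):
    # Only the model *name* travels with each experiment; the worker looks the
    # model up in the runner that the initializer already created.
    model_name, experiment = packed
    model = experiment_runner[model_name]
    return model.name, experiment  # stand-in for actually running the experiment


if __name__ == "__main__":
    models = [Model("model_a"), Model("model_b")]
    experiments = [("model_a", i) for i in range(10)] + [("model_b", i) for i in range(10)]
    with MPIPoolExecutor(initializer=mpi_initializer, initargs=(models,)) as pool:
        print(list(pool.map(run_experiment_mpi, experiments)))
```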
6.2.2 works, 6.3.0 gives very strange errors pointing to a Java ASM incompatibility. Possibly an error in PyNetLogo. See NetLogo/NetLogo#2171
Identical to the current DelftBlue commands.txt, to be able to track changes
Rewrite for NetLogo. Current problem: extracting with tar -xzf NetLogo-6.2.2-64.tgz takes an extremely long time.
Update paths used in NetLogo.py for DelftBlue
If you only need it in one method of one class, just import it in that one method of that one class (and not in the __init__). This makes sure mpi4py is only required when the MPIEvaluator is used. The challenging part will be conditionally importing MPI to establish the rank based on which evaluator is used.
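A minimal sketch of such a deferred import; the class and method names here are illustrative, not the actual evaluators.py code:

```python
class MPIEvaluator:
    # Hypothetical, simplified example: mpi4py is imported only inside the
    # method that needs it, so the package is required only when this
    # evaluator is actually used.
    def initialize(self):
        from mpi4py.futures import MPIPoolExecutor  # deferred import

        self._executor = MPIPoolExecutor()
        return self

    def finalize(self):
        self._executor.shutdown()
```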
The logging statements in the run_experiment_mpi() function don't adhere to the model logger level and formatting. TODO: Research and fix this.
- Introduced logging level propagation to the `MPIEvaluator` to ensure consistent logging across MPI processes.
- Adjusted the `mpi_initializer` function to accept a logger level and configure the logging for each worker accordingly.
- Changed the logging statements within `MPIEvaluator` and `run_experiment_mpi` from DEBUG to INFO for better visibility.
- Removed a redundant TODO regarding logging in `run_experiment_mpi`.
- Modified the `log_to_stderr` method in `ema_logging.py` to allow setting logging levels on all root module loggers.
- Simplified the logging setup in the `ema_model.py` script and ensured proper propagation of logging levels.
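A rough sketch of how a log level might be propagated to the workers through the initializer; the name and signature are illustrative, the actual changes live in evaluators.py and ema_logging.py:

```python
import logging


def mpi_initializer(loglevel):
    # Runs once in each MPI worker: configure that worker's logging so its
    # level and format match what was chosen in the main process.
    logging.basicConfig(
        level=loglevel,
        format="[%(processName)s/%(levelname)s] %(message)s",
    )
```

The main process would then pass the chosen level to the workers via the executor's `initargs`, in the same way the models are passed in the architecture sketch above.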
@quaquel Finally fixed the propagation of logging levels in 89817ae. Could you review the changes to evaluators.py and ema_logging.py? From my perspective, the function changes are done. Running on multiple nodes and logging are implemented successfully, but non-Python models are not (they might still work, but I didn't manage to get them running). My current plan is to spend next week writing some documentation, examples, and maybe a short tutorial, and doing some quick performance measurements. Then I will clean up all commits on a new branch and open a PR.
Very raw prototype of a multi-node evaluator that uses the MPIPoolExecutor from mpi4py.futures.
Ignore all the scripts, the interesting stuff is the new MPIEvaluator class in the evaluators.py file. It is currently tested with pure Python models, on multiple nodes of the DelftBlue HPC system.
For todo and discussion, see: #266
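For context, usage would presumably mirror the existing workbench evaluators; a hypothetical sketch (the import path and constructor arguments of this prototype's MPIEvaluator may differ):

```python
from ema_workbench import Model, RealParameter, ScalarOutcome, ema_logging
from ema_workbench.em_framework.evaluators import MPIEvaluator  # assumed location


def some_model(x=None):
    # Trivial pure-Python model used only to illustrate the call pattern.
    return {"y": x**2}


if __name__ == "__main__":
    ema_logging.log_to_stderr(ema_logging.INFO)

    model = Model("simple", function=some_model)
    model.uncertainties = [RealParameter("x", 0.0, 1.0)]
    model.outcomes = [ScalarOutcome("y")]

    # Mirrors how MultiprocessingEvaluator is used; the prototype's actual
    # MPIEvaluator API may differ.
    with MPIEvaluator(model) as evaluator:
        results = evaluator.perform_experiments(scenarios=100)
```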