
Remove MPI from multi-GPU example #268

Open · wants to merge 6 commits into main
Conversation

jwallwork23 (Contributor)
Closes #253.

I decided to close #258 and split it into two separate PRs. This is the first, which removes MPI from the multi-GPU example. (The second will introduce a CPU-only example that uses MPI.)

Note that I switched the names of the multigpu.py module and MultiGPU class back to simplenet.py and SimpleNet in this example, because the class is a direct copy (unlike MultiIONet, which is modified).

@jwallwork23 jwallwork23 added documentation Improvements or additions to documentation testing Related to FTorch testing labels Jan 30, 2025
@jwallwork23 jwallwork23 self-assigned this Jan 30, 2025
@jwallwork23 (Contributor, Author) commented Jan 30, 2025

Tested on a laptop with a single CUDA GPU device, applying the following patch:

diff --git a/examples/3_MultiGPU/multigpu_infer_fortran.f90 b/examples/3_MultiGPU/multigpu_infer_fortran.f90
index 297844e..cfba096 100644
--- a/examples/3_MultiGPU/multigpu_infer_fortran.f90
+++ b/examples/3_MultiGPU/multigpu_infer_fortran.f90
@@ -27,7 +27,7 @@ program inference
    type(torch_tensor), dimension(1) :: out_tensors

    ! Variables for multi-GPU setup
-   integer, parameter :: num_devices = 2
+   integer, parameter :: num_devices = 1
    integer :: device_index, i

    ! Get TorchScript model file as a command line argument
diff --git a/examples/3_MultiGPU/multigpu_infer_python.py b/examples/3_MultiGPU/multigpu_infer_python.py
index 1b49398..c504063 100644
--- a/examples/3_MultiGPU/multigpu_infer_python.py
+++ b/examples/3_MultiGPU/multigpu_infer_python.py
@@ -53,7 +53,7 @@ def deploy(saved_model: str, device: str, batch_size: int = 1) -> torch.Tensor:
 if __name__ == "__main__":
     saved_model_file = "saved_multigpu_model_cuda.pt"

-    for device_index in range(2):
+    for device_index in range(1):
         device_to_run = f"cuda:{device_index}"

         batch_size_to_run = 1
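As an aside, both hunks above hard-code the device count, which is why the patch is needed for single-GPU testing. A minimal sketch of deriving the device strings from a count instead (not part of this PR; the helper name `device_names` is hypothetical):

```python
# Hypothetical helper (not in this PR): build the "cuda:<index>" device
# strings that multigpu_infer_python.py loops over, for any device count.
def device_names(num_devices: int) -> list:
    return [f"cuda:{i}" for i in range(num_devices)]

print(device_names(2))  # prints ['cuda:0', 'cuda:1']
```

In the real script, `num_devices` could come from `torch.cuda.device_count()`, so the example would run unmodified on machines with any number of GPUs.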

@jwallwork23 jwallwork23 marked this pull request as ready for review January 30, 2025 14:03
@jwallwork23 jwallwork23 mentioned this pull request Jan 30, 2025
@jwallwork23 jwallwork23 added the gpu Related to buiding and running on GPU label Jan 30, 2025
Successfully merging this pull request may close these issues.

Move Example 3 to not require MPI