
Remove MPI from multi-GPU example #268

Open · wants to merge 6 commits into main
Conversation

jwallwork23 (Contributor)
Closes #253.

I decided to close #258 and split it into two separate PRs. This is the first, which removes MPI from the multi-GPU example. (The second will introduce a CPU-only example that uses MPI.)

Note that I switched the names of the multigpu.py module and MultiGPU class back to simplenet.py and SimpleNet in this example, because the class is a direct copy (unlike MultiIONet, which is modified).

@jwallwork23 jwallwork23 added documentation Improvements or additions to documentation testing Related to FTorch testing labels Jan 30, 2025
@jwallwork23 jwallwork23 self-assigned this Jan 30, 2025
@jwallwork23 (Contributor, Author) commented Jan 30, 2025

Tested on a laptop with a single CUDA GPU device, applying the following patch:

diff --git a/examples/3_MultiGPU/multigpu_infer_fortran.f90 b/examples/3_MultiGPU/multigpu_infer_fortran.f90
index 297844e..cfba096 100644
--- a/examples/3_MultiGPU/multigpu_infer_fortran.f90
+++ b/examples/3_MultiGPU/multigpu_infer_fortran.f90
@@ -27,7 +27,7 @@ program inference
    type(torch_tensor), dimension(1) :: out_tensors

    ! Variables for multi-GPU setup
-   integer, parameter :: num_devices = 2
+   integer, parameter :: num_devices = 1
    integer :: device_index, i

    ! Get TorchScript model file as a command line argument
diff --git a/examples/3_MultiGPU/multigpu_infer_python.py b/examples/3_MultiGPU/multigpu_infer_python.py
index 1b49398..c504063 100644
--- a/examples/3_MultiGPU/multigpu_infer_python.py
+++ b/examples/3_MultiGPU/multigpu_infer_python.py
@@ -53,7 +53,7 @@ def deploy(saved_model: str, device: str, batch_size: int = 1) -> torch.Tensor:
 if __name__ == "__main__":
     saved_model_file = "saved_multigpu_model_cuda.pt"

-    for device_index in range(2):
+    for device_index in range(1):
         device_to_run = f"cuda:{device_index}"

         batch_size_to_run = 1
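As an aside, both hunks above hard-code the device count, which is why the patch is needed for single-GPU testing. A minimal sketch of deriving the device strings from a count instead (not part of this PR; the helper name `device_names` is hypothetical):

```python
# Hypothetical helper (not in this PR): build the "cuda:<index>" device
# strings that multigpu_infer_python.py loops over, for any device count.
def device_names(num_devices: int) -> list:
    return [f"cuda:{i}" for i in range(num_devices)]

print(device_names(2))  # prints ['cuda:0', 'cuda:1']
```

In the real script, `num_devices` could come from `torch.cuda.device_count()`, so the example would run unmodified on machines with any number of GPUs.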

@jwallwork23 jwallwork23 marked this pull request as ready for review January 30, 2025 14:03
@jwallwork23 jwallwork23 mentioned this pull request Jan 30, 2025
@jwallwork23 jwallwork23 added the gpu Related to buiding and running on GPU label Jan 30, 2025
Successfully merging this pull request may close these issues.

Move Example 3 to not require MPI