Separate out MPI and multi-GPU examples #258

Closed
jwallwork23 wants to merge 24 commits into main from 257_separate-mpi-multigpu

@jwallwork23 jwallwork23 commented Jan 27, 2025

Closes #257.
Closes #253.
Note: this merges into the branch for PR #222.

Checklist

  • Remove MPI from example 3.
  • Test revised example 3 on two GPU devices.
  • Write CPU-only MPI example.
  • Add MPI example to test suite.

@jwallwork23 jwallwork23 added labels: documentation (Improvements or additions to documentation), testing (Related to FTorch testing), gpu (Related to building and running on GPU) Jan 27, 2025
@jwallwork23 jwallwork23 self-assigned this Jan 27, 2025
@jwallwork23 jwallwork23 force-pushed the 257_separate-mpi-multigpu branch from 06223fb to 23ad6e2 Compare January 27, 2025 17:52

jwallwork23 commented Jan 27, 2025

The new MPI example is working fine locally for me, but crashes on the CI.

See

The Windows workflow will need MPI installing, too:
https://github.com/Cambridge-ICCS/FTorch/actions/runs/12995413835/job/36242065775?pr=258

jwallwork23 commented

The examples are no longer ordered by increasing complexity. We should follow this PR by reordering them so that they flow better when read in sequence. Perhaps:
simplenet -> looping -> resnet -> multi I/O -> MPI -> multi-GPU -> autograd.

@jwallwork23 jwallwork23 changed the base branch from 208_multi-gpu-build to main January 29, 2025 12:50
This was referenced Jan 30, 2025
Development

Successfully merging this pull request may close these issues.

  • Separate MPI out of example 3
  • Move Example 3 to not require MPI