bugfix: use larger mempool parameters to avoid running out of pinnables #3810

Merged 4 commits on May 16, 2024

ericjbohm
Contributor

Very wide (i.e., 128 PEs per node) non-SMP runs of message-intensive applications ran into message allocation errors with the prior memory pool settings. These appear as failures in fi_enable with an error message that maps to "address already in use". That message appears to be misleading: the actual bug appears to be exhaustion of the number of pinned segments that can be enabled on the fabric.

This can be avoided by increasing the expand size we use in the memory pool, along with the related quantities that govern the overall sizing of the pool, thereby reducing the total number of base addresses being pinned, bound, and enabled.
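
To illustrate why a larger expand size helps, here is a minimal sketch of the arithmetic, assuming the pool grows in fixed-size chunks and that each chunk becomes one separately pinned, bound, and enabled segment. The function name `segments_needed` and the sizes used are hypothetical and chosen only for illustration; they are not the values or code changed in this PR.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the machine layer): number of pool
 * segments needed to cover `demand` bytes when the pool grows in chunks
 * of `expand_size` bytes. Under the assumption stated above, each segment
 * is pinned, bound, and enabled once, so this is also the number of base
 * addresses registered with the fabric.
 */
size_t segments_needed(size_t demand, size_t expand_size)
{
    assert(expand_size > 0);
    /* Ceiling division: a partially used chunk still counts as a segment. */
    return (demand + expand_size - 1) / expand_size;
}
```

With these illustrative numbers, covering 1 GiB of message buffers per process takes 128 segments at an 8 MiB expand size but only 16 at 64 MiB. Multiplied across 128 processes per node, the smaller setting is the kind of configuration that could plausibly exhaust a per-node limit on enabled segments.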

@ericjbohm ericjbohm added Cray OFI The Ofi machine layer CMake The CMake build system Buildold The old non-CMake build system. labels May 8, 2024
@ericjbohm ericjbohm added this to the 8.1 milestone May 8, 2024
@ericjbohm ericjbohm self-assigned this May 8, 2024
Contributor

@ritvikrao ritvikrao left a comment

I tested this PR with the ofi-linux-x86 build on Delta, and it fixed all my bugs, so it looks good to me.

@ericjbohm ericjbohm modified the milestones: 8.1, 8.0 May 16, 2024
@ericjbohm ericjbohm merged commit ede740a into main May 16, 2024
23 checks passed
@ericjbohm ericjbohm deleted the ericjbohm/increase-mempool-defaults branch May 16, 2024 19:26