Non-BFB behavior with ne30pg2_ne30pg2.F2010-SCREAMv1 cases on pm-cpu when changing NTASKS #3025

Open
ndkeen opened this issue Oct 1, 2024 · 1 comment
Labels
Non-B4B (Not bit for bit), pm-cpu (CPU nodes of Perlmutter)

Comments

@ndkeen
Contributor

ndkeen commented Oct 1, 2024

I get different results when I change the number of MPI tasks for SCREAM CPU jobs. I have only tested this on pm-cpu (and muller-cpu). I've been running scaling tests for both E3SM and SCREAM. All E3SM cases are BFB, but it looks like every different node count used for a SCREAM case results in a different set of hashes. For a given MPI task count, re-running the case looks BFB, as expected.
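A minimal sketch of the comparison described above: given the hash sequences from two runs, report the first position where they differ. It assumes the hashes have already been extracted into lists of strings (the log format is not shown here). The function name `first_divergence` and the hash values are hypothetical, for illustration only.

```python
from itertools import zip_longest


def first_divergence(hashes_a, hashes_b):
    """Return the index of the first entry where the two hash
    sequences differ, or None if they are identical (BFB)."""
    # zip_longest pads the shorter sequence with None, so a length
    # mismatch is also reported as a divergence.
    for i, (a, b) in enumerate(zip_longest(hashes_a, hashes_b)):
        if a != b:
            return i
    return None


# Hypothetical hash values standing in for two different task counts
# and a repeat run at the first task count.
run_a = ["a1f3", "09bc", "77de"]
run_b = ["a1f3", "09bd", "1c20"]
run_a_again = list(run_a)

print(first_divergence(run_a, run_b))        # differing task counts
print(first_divergence(run_a, run_a_again))  # same task count, re-run
```

Knowing the first hash that diverges can help narrow down which step of the run first becomes sensitive to the task count.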

I also just tried PEM.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_intel, which fails:
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/PEM.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_intel.r00

It looks like the test passes with DEBUG:
PEM_D_P1024_Ld1.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_intel

@ndkeen ndkeen added the Non-B4B (Not bit for bit) and pm-cpu (CPU nodes of Perlmutter) labels Oct 1, 2024
@ambrad
Member

ambrad commented Oct 1, 2024

This might be specific to Intel. Our nightly tests include PEM_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_gnu.scream-spa_remap--scream-output-preset-4, which is GNU on pm-cpu.
