Internal compiler error with some tests using nvidia compiler (version 24.5) on pm-cpu #6931

Open
ndkeen opened this issue Jan 22, 2025 · 1 comment
Labels: nvidia compiler (formerly PGI), pm-cpu (Perlmutter at NERSC, CPU-only nodes)

Comments

ndkeen commented Jan 22, 2025

After a recent PR, I see a few build failures on pm-cpu with the nvidia compiler. The failure is an internal compiler error.

I think these 4 tests have the same failure:

SMS_Lh4.ne4pg2_ne4pg2.F2010-SCREAMv1.pm-cpu_nvidia.eamxx-output-preset-1--eamxx-prod 
SMS_Lh4.ne4_ne4.F2010-SCREAMv1.pm-cpu_nvidia.eamxx-output-preset-1
ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.pm-cpu_nvidia.mpaso-ocn_glcshelf
ERS.f09_g16_g.MALISIA.pm-cpu_nvidia

and it may be caused by something in this file:

core_landice/mode_forward/mpas_li_subglacial_hydro.f90
NVFORTRAN-S-0000-Internal compiler error. items were added to sem.p_dealloc but not freed       0  (/ascratch/sd/n/ndk/e3sm_scratch/alvarez/SMS_D.f09_g16_g.MALISIA.alvarez_nvidia.gh6931/bld/cmake-bld/core_landice/mode_forward/mpas_li_subglacial_hydro.f90: 1813)
  0 inform,   0 warnings,   1 severes, 0 fatal for calc_pressure_diag_vars
Target CMakeFiles/glc.dir/__/__/core_landice/mode_forward/mpas_li_subglacial_hydro.f90.o built in 0.982074 seconds
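
To check whether the other failing builds stop at the same place, a small script can pull these diagnostics out of a build log. The following is only a sketch: the regular expression is inferred from the single message quoted above, not from NVIDIA documentation, and `find_diagnostics` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the one message quoted in this issue:
#   NVFORTRAN-<severity>-<code>-<message> (<path>: <line>)
DIAG = re.compile(
    r"NVFORTRAN-(?P<sev>[A-Z])-(?P<code>\d+)-(?P<msg>.*?)\s*"
    r"\((?P<path>[^()\s]+):\s*(?P<line>\d+)\)"
)


def find_diagnostics(log_text):
    """Return one dict per NVFORTRAN diagnostic found in log_text."""
    # Undo terminal wrapping: a trailing backslash continues on the next line.
    joined = log_text.replace("\\\n", "")
    return [
        {
            "severity": m.group("sev"),
            "code": m.group("code"),
            "message": " ".join(m.group("msg").split()),
            "path": m.group("path"),
            "line": int(m.group("line")),
        }
        for m in DIAG.finditer(joined)
    ]
```

Running this over each failing test's build log and comparing the `path` and `line` fields would show whether all of them hit the error in the same routine.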

I don't see a more recent version of the compiler, so I am asking whether one is coming.
It may be better to wait for a newer compiler version before trying to find a workaround via compiler flags.

Note that we run one test suite every other night on pm-cpu using the nvidia compiler.
Other than one nagging ERS failure (ERS_D.ne4pg2_oQU480.F2010.pm-cpu_nvidia.eam-hommexx), the tests pass.

ndkeen added the pm-cpu and nvidia compiler labels Jan 22, 2025

ndkeen commented Feb 13, 2025

Verified that I still see the internal compiler error. I also tried simpler versions of the tests above.
For example, these hit the error:

SMS.f09_g16_g.MALISIA --compiler=nvidia 
SMS_D.f09_g16_g.MALISIA --compiler=nvidia
SMS.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5 --compiler=nvidia
SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5 --compiler=nvidia
SMS.ne4_ne4.F2010-SCREAMv1  --compiler=nvidia
SMS_D.ne4_ne4.F2010-SCREAMv1  --compiler=nvidia

I tested both DEBUG and OPT builds, which makes me think it may not be something easily handled by a compiler flag change.
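
For readers less familiar with the test names in this thread, here is a rough sketch of how they break down. It assumes the usual CIME layout `TYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER[.TESTMODS]]`, with multiple testmods joined by `--`. `split_test_name` is a hypothetical helper, not part of CIME, and the rule that the compiler follows the last underscore is an assumption.

```python
def split_test_name(name):
    """Split a CIME-style test name into its dot-separated fields."""
    fields = name.split(".")
    if len(fields) < 3:
        raise ValueError(f"expected at least TYPE.GRID.COMPSET, got {name!r}")
    # The first field is the test type plus optional modifiers, e.g. SMS_D or SMS_Lh4.
    test_type, *modifiers = fields[0].split("_")
    info = {
        "test_type": test_type,
        "modifiers": modifiers,
        "grid": fields[1],
        "compset": fields[2],
        "machine": None,
        "compiler": None,
        "testmods": [],
    }
    if len(fields) > 3:
        # Assumption: the compiler is whatever follows the last underscore.
        info["machine"], _, info["compiler"] = fields[3].rpartition("_")
    if len(fields) > 4:
        info["testmods"] = fields[4].split("--")
    return info
```

With this, `split_test_name("SMS_D.f09_g16_g.MALISIA")` reports the modifier `D`, which as far as I know selects a DEBUG build, so each SMS/SMS_D pair above covers both the optimized and debug cases.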
