Install BUFR 12.1.0 #10

Open
DavidHuber-NOAA opened this issue Sep 18, 2024 · 27 comments

Comments

@DavidHuber-NOAA

BUFR 12.1.0 has optimizations in it that make it possible to maintain operational runtime performance of the GSI. Installing this version on WCOSS2 is required for migration to spack-stack 1.8.0 on all other RDHPCS machines as it requires code changes to the GSI.

FYI @jbathegit @RussTreadon-NOAA

@jbathegit

Hi David, as I noted yesterday, I understand and appreciate that this needs to be done, but once I release and announce a new version, then it's out of my hands at that point w.r.t. how quickly it gets installed on all of the various machines, and there's nothing I can do to change that reality.

For the record, a new install request for v12.1.0 was never opened in this particular WCOSS2-requests repository because the library update was released on July 10th, but this repository wasn't even set up until 3 weeks later on July 31st, and I didn't learn about its existence until several weeks after that.

I've already made a note to open an issue in this repository for all future releases. But that shouldn't be preventing this v12.1.0 release from moving forward, because as you're already aware there's been a lot of back-and-forth discussion about getting this installed within various threads in several different repositories including spack-stack#1060, spack-stack#1194, spack#45459, JCSDA-spack#463 and others. The v12.1.0 release is part of the spack-stack-1.8.0 recipe, so everyone's aware of the need and things are supposedly moving in the right direction. But again, there's nothing I can do personally to speed up that process.

@DavidHuber-NOAA
Author

@jbathegit Understood, but this is the official request form to notify Hang and NCO parties that an installation is needed, hence this open issue -- not a request for any action from you (though I would appreciate testing once the library is installed on Acorn and later Cactus/Dogwood), just keeping you in the loop.

@edwardhartnett
Contributor

@jbathegit you are doing everything perfectly. Indeed, you should release new versions when you are ready and not worry about installing them on WCOSS2. Nor do you need to have each release installed on WCOSS2, unless you know of a reason to install it. It's perfectly reasonable for applications to skip releases. (For example, UFS is using netcdf-c-4.7.4 and is going to jump to netcdf-4.9.2, skipping a bunch of netcdf-c releases.)

As @DavidHuber-NOAA notes this is the place to request installs and @Hang-Lei-NOAA will ensure that this is installed.

@jbathegit

> As @DavidHuber-NOAA notes this is the place to request installs and @Hang-Lei-NOAA will ensure that this is installed.

Understood going forward. But as I noted above, v12.1.0 was released on July 10th, and this WCOSS2-requests repository wasn't even established until 3 weeks later, at which point the wheels were already in motion for this particular release and so (I thought?) opening up a new issue for this particular release would have been unnecessary at that point.

@jbathegit

As far as testing is concerned, and unless I'm missing something, that should all be done using ctest whenever the library is initially built on any given platform. We've done a lot of work to include CI testing in the repository with more than 96% code coverage including all of the utilities, so IMO that should already be done as part of the automated process for all future builds, rather than asking me to go back later and manually build and run a bunch of tests to verify an already-existing build. In other words, if the ctest is successful before doing the make install, then that should be all that's needed to verify the integrity of any new build.

Or am I missing something here?
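The "test before install" gate described in this comment could be sketched roughly as follows. This is only an illustration, not the procedure actually used on WCOSS2: the function name `test_then_install` and the `CTEST` and `INSTALL_CMD` override variables are hypothetical, and it assumes the library has already been configured and built in the given build directory.

```bash
# Hypothetical helper: run the test suite in a build directory and
# install only if every test passes.
# CTEST and INSTALL_CMD are illustrative overrides, not NCEPLIBS-bufr options.
test_then_install() {
    build_dir="$1"
    ctest_bin="${CTEST:-ctest}"
    install_cmd="${INSTALL_CMD:-make install}"

    # --output-on-failure prints the log of any failing test.
    if (cd "$build_dir" && "$ctest_bin" --output-on-failure); then
        # Unquoted on purpose so that "make install" splits into two words.
        (cd "$build_dir" && $install_cmd)
    else
        echo "tests failed; skipping install" >&2
        return 1
    fi
}
```

Calling `test_then_install build` from the source tree would then refuse to install a build whose tests do not pass.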

@edwardhartnett
Contributor

Yes, make test is being run and will be run on all WCOSS2 installs. @AlexanderRichert-NOAA is running automated weekly tests on the spack-based install. (And soon this will be the only install.)

While we transition to spack, when @Hang-Lei-NOAA arranges manual installs, he will ensure that make test is run.

@DavidHuber-NOAA
Author

Great, thanks for the info about testing @jbathegit!

@edwardhartnett
Contributor

Can we close this ticket? @Hang-Lei-NOAA ?

@edwardhartnett
Contributor

@Hang-Lei-NOAA ?

@Hang-Lei-NOAA

Yes

@jbathegit

Hmm, unless I'm missing something, I don't see where v12.1.0 has been installed on WCOSS2 yet. So should we really be closing this ticket before that happens?

@Hang-Lei-NOAA Hang-Lei-NOAA reopened this Oct 7, 2024
@Hang-Lei-NOAA

@jbathegit This has been installed on Acorn. But to deliver it to the WCOSS2 operational machines, we need a request from an operational model, plus testing. Currently, no operational model uses this version. So, as stated above, we can wait for spack-stack to take all of these to NCO.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Oct 7, 2024

Please test on acorn:
module use /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/compiler/intel/19.1.3.304
module load bufr/12.1.0
module show bufr/12.1.0

/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/compiler/intel/19.1.3.304/bufr/12.1.0.lua:

help([[]])
conflict("bufr")
setenv("bufr_ROOT","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0")
setenv("bufr_VERSION","12.1.0")
setenv("BUFR_INC4","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/include_4")
setenv("BUFR_INC8","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/include_8")
setenv("BUFR_INCd","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/include_d")
setenv("BUFR_LIB4","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_4.a")
setenv("BUFR_LIB8","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_8.a")
setenv("BUFR_LIBd","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_d.a")
prepend_path("PATH","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/bin")
whatis("Name: bufr")
whatis("Version: 12.1.0")
whatis("Category: library")
whatis("Description: bufr library")

@DavidHuber-NOAA
Author

@Hang-Lei-NOAA Is there an approximate timeline for spack-stack delivery to NCO for installation on Cactus/Dogwood?

I will test this installation out with the GSI on Acorn.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Oct 7, 2024

Thanks @DavidHuber-NOAA
Not sure of the exact timeline.
I have reopened this ticket. Let's wait for David to incorporate it into the GSI. If any adjustment to the library is needed during your testing, please let Jeff and me know, and I will adjust the installation.

@jbathegit

> Please test on acorn:
> module use /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/compiler/intel/19.1.3.304
> module load bufr/12.1.0
> module show bufr/12.1.0
>
> /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/compiler/intel/19.1.3.304/bufr/12.1.0.lua:
>
> help([[]])
> conflict("bufr")
> setenv("bufr_ROOT","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0")
> setenv("bufr_VERSION","12.1.0")
> setenv("BUFR_INC4","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/include_4")
> setenv("BUFR_INC8","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/include_8")
> setenv("BUFR_INCd","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/include_d")
> setenv("BUFR_LIB4","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_4.a")
> setenv("BUFR_LIB8","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_8.a")
> setenv("BUFR_LIBd","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_d.a")
> prepend_path("PATH","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/bin")
> whatis("Name: bufr")
> whatis("Version: 12.1.0")
> whatis("Category: library")
> whatis("Description: bufr library")

Thanks @Hang-Lei-NOAA, but could you please update whatever modulefile template you're using for this library on WCOSS2? Since v12.0.0 there's only a single _4 build of the library, so there should no longer be any BUFR_INC8, BUFR_INCd, BUFR_LIB8, or BUFR_LIBd envvars being set for any version beyond that. Instead, the only such envvars that still exist are BUFR_INC4 and BUFR_LIB4, so those should be the only ones being set.

Here's what the modulefile currently looks like for v12.0.0, at least on cactus and dogwood:

% module show bufr/12.0.0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /apps/ops/prod/libs/modulefiles/compiler/intel/19.1.3.304/bufr/12.0.0.lua:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
help([[]])
conflict("bufr")
setenv("bufr_ROOT","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0")
setenv("bufr_VERSION","12.0.0")
setenv("BUFR_INC4","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/include/bufr_4")
setenv("BUFR_LIB4","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib64/libbufr_4.a")
prepend_path("PATH","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/bin")
whatis("Name: bufr")
whatis("Version: 12.0.0")
whatis("Category: library")
whatis("Description: bufr library")

So we should only see these same envvars defined for v12.1.0 as well.
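A quick way to confirm this after `module load bufr/12.1.0` might look like the sketch below. The function name `check_bufr_env` is hypothetical. It only inspects the environment of the current shell, using the variable names listed above, and it assumes nothing about how the modulefile itself is generated.

```bash
# Hypothetical check: BUFR_INC4 and BUFR_LIB4 must be set, and the
# retired _8 and _d variables must not be.
check_bufr_env() {
    status=0
    for var in BUFR_INC4 BUFR_LIB4; do
        eval "val=\${$var:-}"          # look up the variable named by $var
        if [ -z "$val" ]; then
            echo "missing: $var"
            status=1
        fi
    done
    for var in BUFR_INC8 BUFR_INCd BUFR_LIB8 BUFR_LIBd; do
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "unexpected: $var"
            status=1
        fi
    done
    return "$status"
}
```

Run against the first v12.1.0 modulefile posted above, this would report the four `_8` and `_d` variables as unexpected. Against the v12.0.0 layout it should print nothing and return 0.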

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Oct 7, 2024

@jbathegit Thanks, Jeff, updated it.


/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/modulefiles/compiler/intel/19.1.3.304/bufr/12.1.0.lua:

help([[]])
conflict("bufr")
setenv("bufr_ROOT","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0")
setenv("bufr_VERSION","12.1.0")
setenv("BUFR_INC4","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/include/bufr_4")
setenv("BUFR_LIB4","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/lib64/libbufr_4.a")
prepend_path("PATH","/lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel/19.1.3.304/bufr/12.1.0/bin")
whatis("Name: bufr")
whatis("Version: 12.1.0")
whatis("Category: library")
whatis("Description: bufr library")

@edwardhartnett
Contributor

Is this issue complete?

@DavidHuber-NOAA
Author

No, apologies for not giving an update recently. I was out sick for a week and I let it fall off my plate.

I was able to build the GSI with BUFR 12.1.0, but I could not get 1-to-1 reproducibility against a GSI compiled with BUFR 11.7.0. I was waiting on official support of the GSI on Acorn to be merged (NOAA-EMC/GSI#793), which it now is, before digging any deeper. I will look into this further over the next couple of days.

@jbathegit

Hi All - just checking in to see if there are any updates on this?

@DavidHuber-NOAA
Author

Hi Jeff, I wasn't able to access Acorn last week while Cactus was down for maintenance and now it is switching to production. I am hoping to get somewhere with this today when Dogwood opens up for developers.

@DavidHuber-NOAA
Author

@RussTreadon-NOAA @jbathegit I performed another round of testing on Acorn and I am still seeing significant differences in the global 4DEnVar, RRFS 3DEnVar, and HAFS 3DEnVar and 4DEnVar tests between BUFR 11.7 and 12.1.

I believe that the calls to the ufbqcd subroutine in the BUFR 12.1.0 version of the GSI (feature/bufr12_acorn) have all been updated correctly. And there are no calls to ufbqcp. These are the only subroutines I see listed in the release notes as having undergone significant changes. Are there any others that I am missing?

Focusing on the global_4denvar test, initial differences are reported in temperature. The ufbqcd subroutine is used during the extraction of virtual temperature from the prepbufr file (i.e. in the read_prepbufr.f90 module), and so seems like a likely place where things may differ. I would appreciate it if either/both of you could take a look at my code changes in read_prepbufr.f90 and verify that I am handling the changes to ufbqcd correctly.

@DavidHuber-NOAA
Author

Printing off values, it looks like the issue doesn't lie with the BUFR 12.1.0 version, but in develop. Printing off values of vtcd and glcd here and here, respectively, returns the following values for develop (BUFR 11.7.0) and BUFR 12.1.0:

vtcd, develop:

74.0980989933014
74.6707918643951
75.1140902042389
77.4317171573639
77.5594637393951
77.6920320987701
77.7783968448639
81.3507845401764

vtcd, BUFR 12.1.0:

8
8
8
8
8
8
8
8

glcd, develop:

5.431915495183320E-315
5.431915495183320E-315
5.431915495183320E-315
5.431915495183320E-315
5.431915495183320E-315
5.431915495183320E-315
5.431915495183320E-315
5.431915495183320E-315

glcd, BUFR 12.1.0:

17
17
17
17
17
17
17
17

I'm going to repeat this test with BUFR 11.7.0 on Cactus and Hera to check if these values are present there as well. As first mentioned by @RussTreadon-NOAA here, the values in develop used to be floating point representation of the integers shown above.
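As a side check on the develop numbers above: the repeated glcd value is numerically what an 8-byte real looks like when its low 32 bits hold the IEEE-754 single-precision pattern for 17.0 and its high 32 bits are zero. This is only an arithmetic observation, not a diagnosis of the library. The helper name `f32_as_f64_low_word` is made up for illustration, and the sketch assumes an awk that uses IEEE doubles and an input that is a positive value of at least 1 and exactly representable in single precision.

```bash
# Hypothetical helper: take a value >= 1 that is exact in single precision,
# build its 32-bit IEEE-754 pattern, and print the 8-byte real whose low
# 32 bits are that pattern and whose high 32 bits are zero.
f32_as_f64_low_word() {
    awk -v x="$1" 'BEGIN {
        x = x + 0
        e = 0
        while (2^(e + 1) <= x) e++          # unbiased exponent
        mant = (x / 2^e - 1) * 2^23         # 23-bit fraction field
        bits = (e + 127) * 2^23 + mant      # sign bit is 0
        # A double with a zero exponent field is a subnormal equal to
        # bits * 2^-1074; divide in two steps to avoid overflowing 2^1074.
        scale = 2^537
        printf "%.15e\n", bits / scale / scale
    }'
}
```

`f32_as_f64_low_word 17` prints `5.431915495183320e-315`, the same digits as the glcd values reported for develop.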

@DavidHuber-NOAA
Author

DavidHuber-NOAA commented Nov 13, 2024

This issue does not exist on WCOSS2. I tested on Acorn again using a different version of BUFR 11.7.0. The one that I was using is located in /apps/ops/prod/libs/intel/19.1.3.304/bufr/11.7.0. I tried again using the one in /lfs/h1/emc/nceplibs/noscrub/hpc-stack/libs/hpc-stack/intel-19.1.3.304/bufr/11.7.0 and it also produces the spurious values.

It seems that the BUFR 11.7.0 library on Acorn has an issue. However, I believe the newly installed BUFR 12.1.0 library is working properly. I am OK with proceeding with the installation of 12.1.0 on Cactus/Dogwood.

@DavidHuber-NOAA
Author

@Hang-Lei-NOAA I have tested this version of BUFR 12.1.0 to my satisfaction on Acorn. Could you please forward this request to GDIT for installation on Cactus/Dogwood?

@Hang-Lei-NOAA

@DavidHuber-NOAA I will prepare the code delivery soon, but please be patient. Due to the WCOSS2 issues in recent months, NCO has not yet started on the delivery request sent last month.

@DavidHuber-NOAA
Author

Thanks @Hang-Lei-NOAA!
