Taking a closer look at the log file:
It looks like curl had some difficulty. Running grep to find where curl was referenced I saw:
That led me to search for the URL in the base package:
which led me to:
Looking in ./test/CMakeLists.txt, I see:
and looking at https://ftp.emc.ncep.noaa.gov/static_files/public, I can see that bufr.tar does not exist:
I had previously attempted to compile version 11.7.0 of BUFRLIB, but ran into difficulty at a later step. However, I should note that test/CMakeLists.txt for version 11.7.0 still points to 11.6.0:
although perhaps it doesn't matter, since neither bufr-11.7.0.tgz nor bufr.tar exists at that URL. I don't fully understand why only bufr-11.6.0.tgz versions exist at that URL but a generic bufr.tar is being referenced there.
The problem in compiling BUFRLIB version 11.7.0 on Derecho lies in the ctest step:
Please see:
Realizing I hadn't tried installing version 11.6.0 of BUFRLIB, I went ahead and tried that, and fortunately got a successful compilation of BUFRLIB using the new oneAPI compilers on Derecho.
@jbathegit, I see you are the code manager for this repo. Is this Discussions forum the appropriate place to post for help with BUFRLIB, or is there another preferred way to request help?
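The manual check against the download directory described in this comment could also be scripted. The following is only a minimal sketch: the helper names are hypothetical, and joining the file name directly onto the base URL is an assumption, since the post does not give the exact subdirectory that holds the test data.

```python
import urllib.error
import urllib.request

# Base URL quoted in the post. The exact subdirectory holding the test data
# is NOT given there, so appending the file name directly is an assumption.
BASE_URL = "https://ftp.emc.ncep.noaa.gov/static_files/public"


def build_url(base: str, filename: str) -> str:
    """Join a base URL and a file name with exactly one slash between them."""
    return base.rstrip("/") + "/" + filename.lstrip("/")


def remote_file_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds, False on an HTTP error (e.g. 404)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        return False


def report(filenames, base: str = BASE_URL) -> dict:
    """Map each file name to whether it appears to exist under the base URL."""
    return {name: remote_file_exists(build_url(base, name)) for name in filenames}
```

Calling `report(["bufr.tar", "bufr-11.6.0.tgz", "bufr-11.7.0.tgz"])` on a machine with network access would show which of the three names resolve. Network failures other than HTTP errors (DNS, timeouts) are deliberately left to propagate.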
-
Hello @jprestop. Thanks for reaching out to us, and I'm sorry to hear that you're encountering difficulties!
First off, and to answer your last question, we typically use the "Discussions" forum to bandy about ideas internally among all of the library developers, and we ask users to reach out to us with any questions or concerns by opening a new issue under the "Issues" tab. That will typically get it seen by a wider audience and therefore hopefully get a more timely response, but no worries, and I'll give you some initial feedback here.
At this point, we didn't expect that anyone would still be trying to build v11.5.0, because it's more than two and a half years old and we've fixed a number of bugs and added many useful new features to the library across several releases since then. So we're really trying to wean people off of those older versions, and we have reached out to the WCOSS2 folks several times to ask them to link their "default" label to a newer version, but it's been slow trying to get any movement on that.
Along those lines, the bufr.tar was only intended for versions 11.5.0 and older, and any newer library version beyond 11.6.0 only needs the bufr-11.6.0.tgz. In other words, if you're trying to build 11.7.0 or later, then it should indeed be trying to pull the bufr-11.6.0.tgz file (i.e. that's not a mistake, and there really is no corresponding .tgz file for 11.7.0) ;-)
That said, and looking at your LastTest.log file, it looks like there was a mistake on our end in omitting the test output file "sinv.out" from the latest bufr-11.6.0.tgz. This happened because we recently changed the sinv utility and its associated tests in a more recent version of the library, and we apparently forgot to include the old "sinv.out" in the latest tarfile for backwards compatibility with testing earlier versions.
I apologize for that, and I've now gone ahead and fixed that on the server, so it should work now for your 11.7.0 build if you go ahead and pull that tarfile again and rerun that test for that build.
However, that still leaves 3 other tests of your 11.7.0 build that apparently segfaulted, and I wasn't able to reproduce those locally, so I'm a bit stuck. Admittedly we only have Intel 19.1.3.304 on our WCOSS2 system, whereas you apparently have a much newer version of Intel that you're trying to build with, so I'd probably need to be able to access the artifacts of those failed tests (including the .x files) on that particular machine in order to hook them up to a debugger or stack tracer on our end to try and narrow down where the segfaults are actually happening. And I'd probably also need you to try rebuilding your library with
I'm sorry I don't have a better answer at this point. We do test the library extensively on a number of platforms and OS variants including Intel, macOS, Linux, etc., but again we don't have access to any machines running more recent versions of Intel, so that may well at least partially explain the problem.
If you really need it, I can try to reconstruct a bufr.tar from some old records and put it back on the server, but again that wouldn't allow you to build and test anything beyond v11.5.0 of the library, and we are really trying to focus our limited support resources on newer versions.
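The tarfile rule given in this reply can be written down as a small lookup. This is only an illustrative sketch: the function names are hypothetical, it assumes that 11.6.0 itself uses bufr-11.6.0.tgz, and it says nothing about releases newer than the ones discussed here.

```python
def parse_version(version: str) -> tuple:
    """Turn '11.7.0' or 'v11.7.0' into the tuple (11, 7, 0)."""
    return tuple(int(part) for part in version.lstrip("vV").split("."))


def archive_for_version(version: str) -> str:
    """Pick the test-data archive name using the rule stated in this reply.

    Versions 11.5.0 and older map to bufr.tar; anything newer maps to
    bufr-11.6.0.tgz (treating 11.6.0 itself this way is an assumption).
    """
    if parse_version(version) <= (11, 5, 0):
        return "bufr.tar"
    return "bufr-11.6.0.tgz"
```

For example, `archive_for_version("11.7.0")` returns `"bufr-11.6.0.tgz"`, which matches the point that there is no separate .tgz file for 11.7.0.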
-
Sorry I got sidetracked on a different task, and thanks for the heads-up about the intel-oneapi/2022.2.0.262 compiler! I wasn't aware that was even an option on WCOSS2, so I just tried it myself with a clone of 11.7.0, but it almost immediately conked out with a weird error in the make step:
The only thing I changed here was swapping in the newer Intel compiler module in place of the older 19.1.3.304 version, so I'm not sure if maybe there's something else I also need to change. At least based on a quick Google search, it looks like it may have been a compiler bug that wasn't fixed until a later version 2022.12.30.005, but that was also using ifx and I'm still only using ifort. So I'm not sure if that's really the same thing, and I'll have to dig some more to try to figure that out.
In the meantime, one of the artifacts that might be useful is if you could maybe somehow share copies of your debufr and test_OUT_6_8.x executables - the ones that you were able to build with the oneAPI compiler but which were failing in ctest. Those executables should be in your build/utils and build/test directories, respectively, and maybe you could share them as email attachments, or post them somewhere on public ftp or some other server that I could access? My thinking was that if you were able to build those with the
-
Follow-up - I reached out to one of our WCOSS2 GDIT support folks, and he confirmed that the above error is something internal to the compiler, so not much we could do there other than open a ticket with Intel. But instead, he pointed me to a test installation of a newer version 2023.2.0 of the oneAPI compiler that he'd already built on acorn (i.e. the WCOSS2 test and development machine), and he asked me to try that. I did (with bufr v11.7.0), and everything built and ran fine with that version of the oneAPI compiler, and with no failed tests. Could you please clarify what version of the oneAPI compiler you've been using in your tests? I'd definitely like to help you nail this down if we can - thanks!
-
I know this has become a stale thread, but for the record any oneAPI compilation issues should now be fixed in the latest v12.1.0 release of the library. See #536 for more details, and so long as you're using oneAPI 2024.2 or later, you should be good to go now!
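For anyone scripting a build, the two minimums mentioned in this comment could be checked up front. This is a rough sketch with hypothetical names; it encodes only the thresholds quoted here (library v12.1.0, oneAPI 2024.2) and assumes plain dotted numeric version strings.

```python
# Minimums quoted in the comment; nothing else about compatibility is implied.
MIN_BUFR = (12, 1, 0)
MIN_ONEAPI = (2024, 2)


def as_tuple(version: str) -> tuple:
    """Convert 'v12.1.0' or '2024.2' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.lstrip("vV").split("."))


def meets_stated_minimums(bufr_version: str, oneapi_version: str) -> bool:
    """True when both versions are at or above the minimums stated in the thread."""
    return (
        as_tuple(bufr_version) >= MIN_BUFR
        and as_tuple(oneapi_version) >= MIN_ONEAPI
    )
```

With these definitions, `meets_stated_minimums("11.7.0", "2024.2")` is false because the library version is older than v12.1.0.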
-
I am a software engineer on the METplus team. I am working on getting our MET code to compile on the Derecho supercomputer using the new Intel LLVM-based compilers Intel® oneAPI C++/Fortran Compiler (ICX/IFX). I am having trouble compiling bufr_v11.5.0.tar.gz. I first tried using 11.7.0, but also had problems. Then, I switched to using 11.5.0 because that is the default version being used on WCOSS2.
Have you compiled bufr_v11.5.0.tar.gz using the new Intel compilers yet? Can you help me resolve these problems? Please see the attached bufr.make.log file from Derecho.
bufr.make.log