Allow for open PRs when checking missing installations in build script #493

Closed

@ocaisa (Member) commented Mar 8, 2024

As mentioned in #479, with the changes we've made to checking for missing installations, the errors from the bot are cryptic when a referenced PR has not been merged in EasyBuild. This PR allows open PRs during the build, but still fails them in GitHub Actions (with somewhat more helpful text).

eessi-bot bot commented Mar 8, 2024

Instance eessi-bot-mc-aws is configured to build:

  • arch x86_64/generic for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/generic for repo eessi-hpc.org-2023.06-software
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-software
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-software
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-software

@casparvl (Collaborator) left a comment

I've tried this interactively using an EasyStack file containing:

easyconfigs:
  - GCC-13.2.0.eb
  - foss-2023b.eb
  - SciPy-bundle-2023.11-gfbf-2023b.eb
  - netCDF-4.9.2-gompi-2023b.eb:
      options:
        from-pr: 19534
  - matplotlib-3.8.2-gfbf-2023b.eb:
      options:
        from-pr: 19552
  - CDO-2.2.2-gompi-2023b.eb:
      options:
        from-pr: 19792
  - CFITSIO-4.3.1-GCCcore-13.2.0.eb:
      options:
        from-pr: 19840

I used wget to download https://github.com/EESSI/software-layer/pull/479.diff, a diff that creates an exception for CFITSIO.

Then, I ran

./check_missing_installations.sh easystacks/software.eessi.io/2023.06/eessi-2023.06-eb-4.9.0-2023b.yml 479.diff

interactively. That resulted in a successful run, i.e. it did not print anything containing the message (are you sure all PRs referenced have been merged in EasyBuild?).

Was this your intention? I can't imagine it is: the only situation in which this error would be printed is if some software listed in the EasyStack file from a previous PR had a --from-pr that wasn't merged yet. That should not have happened if we did our reviews well... :)

Also, you mentioned that this will fail in GitHub Actions, but I don't see any change to the GitHub CI. It also made me think that this might be too late, as we deploy even when the GitHub CI is failing (we are used to that, since one of the CI checks is 'was this PR deployed' for every architecture).

As mentioned in #479 (comment), I don't disagree that it results in a failure (so maybe we should not make an exception), but I would like to give a clearer message (reported back to the PR by the bot) as to why it failed.

I'm currently working on something that will add a "Reason" to the failure, similar to what I did for the test step in https://github.com/EESSI/software-layer/pull/467/files. I think the work you did here will align well with that (I'd feel bad telling you it wouldn't after you spent a day on this ;-)). But it would require some slight modifications:

  • Create a 2nd EasyStack file in which the exception for the --from-pr is made.
  • Do your original check first, i.e. run only against develop.
  • If the exit status is 1 and the output file contains a pattern like ERROR: One or more files not found: AOFlagger-3.4.0-foss-2023b.eb (search paths: /tmp/tmp.d83nFSn6v6/easyconfigs/easybuild/easyconfigs), run again with your exception.
  • If that 2nd run succeeds, we know the problem was an unmerged PR, and we print the failure message you envisioned ...(are you sure all PRs referenced have been merged in EasyBuild?)
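The four steps above could be sketched roughly as follows. This is a hypothetical illustration, not the code that ended up in #494: `two_pass_check` and its `check_cmd` argument (anything that runs the missing-installations check on a single EasyStack file and exits with 1 on failure) are assumptions, and the error wording is made up apart from the hint quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the proposed two-pass check. "check_cmd" stands in
# for whatever runs the missing-installations check on one EasyStack file;
# it is an assumption, not the real check_missing_installations.sh interface.

# The very specific failure seen when an easyconfig only exists in an open PR
UNMERGED_PATTERN='ERROR: One or more files not found: .*\.eb'

two_pass_check() {
    local check_cmd=$1      # command or function taking one EasyStack file
    local easystack=$2      # original EasyStack, checked against develop only
    local easystack_exc=$3  # 2nd EasyStack with the --from-pr exception
    local out rc

    out=$(mktemp)
    # Pass 1: check against develop only, capturing all output
    "$check_cmd" "$easystack" > "$out" 2>&1
    rc=$?
    cat "$out"  # show the output of the first pass

    # Only retry when the failure looks like a missing (unmerged) easyconfig
    if [ "$rc" -eq 1 ] && grep -Eq "$UNMERGED_PATTERN" "$out"; then
        # Pass 2: allow the --from-pr exception
        if "$check_cmd" "$easystack_exc" > /dev/null 2>&1; then
            echo "ERROR: check only passes with the --from-pr exception" \
                 "(are you sure all PRs referenced have been merged in EasyBuild?)"
        fi
    fi
    rm -f "$out"
    # Keep the original exit status, so the build still fails
    return "$rc"
}
```

The sketch keeps the non-zero exit status even when the second pass succeeds, so the check still fails and the only thing that changes is that the output explains why.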

In my own PR (which modifies check-build), I can pick up on that pattern (which is very specific) and have the bot print it as a "Reason" in its response on the PR. My own PR isn't there yet, but you can take a sneak peek at https://github.com/casparvl/software-layer/tree/improve_error_on_unmerged_pr
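On the check-build side, picking up that pattern and turning it into a "Reason" might look something like the sketch below. It only assumes one way such a check could work; the function name `reason_from_job_output` and the exact Reason wording are hypothetical and not taken from check-build.

```bash
#!/usr/bin/env bash
# Hypothetical helper: scan a build job's output for the very specific
# "files not found" error and turn it into a human-readable Reason line.
# Name and wording are illustrative, not the actual check-build code.
reason_from_job_output() {
    local job_output=$1
    local pattern='ERROR: One or more files not found: [^ ]+\.eb'
    local match
    # -o prints only the matched part; keep the first match
    match=$(grep -Eo "$pattern" "$job_output" | head -n 1)
    if [ -n "$match" ]; then
        echo "Reason: ${match#ERROR: } (are you sure all PRs referenced have been merged in EasyBuild?)"
        return 0
    fi
    return 1  # no known pattern found
}
```

Because the pattern is so specific, a match can be reported with some confidence; other failures fall through and would need their own patterns.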

@casparvl (Collaborator) commented Mar 9, 2024

N.B. After finishing my part, I'll pull in your changes to my PR, make the changes I suggest above, and test with the CFITSIO test case above to demonstrate the behaviour...

@ocaisa (Member Author) commented Mar 9, 2024

This does work as intended: when the script is called by the bot, it allows the open PR and passes. There is another GitHub Action that we call that only checks against develop... so this will stop us from merging something that has not been integrated upstream.

@casparvl (Collaborator) commented Mar 9, 2024

> This does work as intended, when the script is called by the bot it will allow the open PR and pass.

Ok, but then when does that error message ever get printed? Only in the case where past PRs have added software with --from-pr that still isn't merged?

> There is another GitHub Action that we call that will only check against develop...so this will stop us from merging something that has not been integrated upstream

You mean that already exists? Or do you have another PR for that? The biggest issue there is that it would still allow us reviewers to add the deploy label and get stuff deployed that is not merged. In that sense, the CI check is 'too late', isn't it?

Anyway, this was my idea: #494. It integrates your awk logic from this PR to figure out whether an unmerged PR is indeed the reason for the failure. I still need to debug it; it's not actually printing the Reason: section yet, and I need to figure out why :P
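The awk logic itself is not shown in this thread. As a hypothetical illustration of the kind of parsing involved, the sketch below lists the entries of an EasyStack file that carry a from-pr option. It assumes the layout of the EasyStack file tested earlier (entries as `- Name.eb`, options indented beneath them) and is a simple line-based scan, not a YAML parser; the function name `list_from_pr_entries` is made up.

```bash
#!/usr/bin/env bash
# Hypothetical illustration (not the actual awk logic from the PR): print
# "<easyconfig> <PR number>" for every EasyStack entry with a from-pr option.
list_from_pr_entries() {
    awk '
        # Entry lines look like "  - Name-version.eb" or "  - Name-version.eb:"
        /^ *- .*\.eb:? *$/ {
            ec = $2
            sub(/:$/, "", ec)   # drop the trailing colon of entries with options
        }
        # Option lines look like "        from-pr: 19534"
        /^ *from-pr:/ { print ec, $2 }
    ' "$1"
}
```

For the EasyStack file above, this would print the netCDF, matplotlib, CDO and CFITSIO entries, each with its PR number.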

@casparvl (Collaborator) commented Mar 9, 2024

To answer my own first question: I see you add that to the message if the check_missing_installations.sh script is called without a PR diff, i.e. without exceptions. OK, then I see what your intention was with this; it's just a different one than I anticipated :) You wanted to make it pass (provided the caller passes the PR diff to allow for exceptions). My idea would still be to make it fail, to prevent reviewers from deploying the PR. (Does the bot hard-fail on deploying tarballs if their latest status was FAILURE? I don't know; I assumed/hoped so, but maybe not...)

@casparvl (Collaborator) commented Mar 9, 2024

OK, #494 (comment) does what I want now; see the new "Reason" header in there.

Honestly, I think we should grep more extensively, so that we can better expose to the end user what went wrong during a build, without the end user immediately having to ask us (for context from the slurm-XYZ.log) or having to replicate the build locally with https://www.eessi.io/docs/adding_software/debugging_failed_builds/. Both add quite a bit of latency to figuring out what's wrong... :)

@ocaisa (Member Author) commented Mar 11, 2024

This has been rolled into #494, so closing this.

@ocaisa closed this Mar 11, 2024
@ocaisa deleted the tweak_missing_installs branch March 11, 2024 18:03