Automatically build and test RDASPP (Hera: using role.rrfs-fv3-cam, Jet: using role.wrfruc) #177

Open
guoqing-noaa opened this issue Sep 18, 2024 · 12 comments
@guoqing-noaa
Collaborator

guoqing-noaa commented Sep 18, 2024

I expect it will take at least 6 months to establish CI tests for RDASApp (through Jenkins, I think).

Until then, we still need some kind of automatic build and test of RDASApp so that mergers no longer have to test manually before merging a PR. (If a PR changes neither the build behavior nor the code, we can skip this build_and_test process.)

And we can establish this functionality right now.

The general idea is to set up a cron job on Hera/Jet/Hercules. It runs every 5 minutes and uses the GitHub command line tool to check for open PRs that have been marked with the label "read_to_build_test". It then clones that PR, builds it, runs the rrfs tests, and posts the test results to the PR webpage. If the tests succeed, it adds the label "hera_passed"; if they fail, it adds the label "hera_failed" (and similarly for Jet and Hercules).
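A minimal sketch of that loop, assuming Python on the HPC side and an authenticated `gh` CLI. The repo slug is a placeholder, and `build_and_test` is a hypothetical callable standing in for the actual clone/build/ctest step, so treat this as an illustration rather than the final script:

```python
import json
import subprocess

REPO = "OWNER/RDASApp"                # placeholder repo slug (assumption)
TRIGGER_LABEL = "read_to_build_test"  # label name as written in this comment


def gh(*args):
    """Run a GitHub CLI command and return its stdout."""
    return subprocess.run(["gh", *args], check=True,
                          capture_output=True, text=True).stdout


def result_label(machine, passed):
    """Label to attach after a run, e.g. hera_passed or hera_failed."""
    return f"{machine}_{'passed' if passed else 'failed'}"


def pending_prs():
    """Numbers of open PRs that carry the trigger label."""
    out = gh("pr", "list", "--repo", REPO, "--state", "open",
             "--label", TRIGGER_LABEL, "--json", "number")
    return [pr["number"] for pr in json.loads(out)]


def report(machine, number, passed, log_tail):
    """Post the outcome as a PR comment and swap the labels."""
    status = "PASSED" if passed else "FAILED"
    gh("pr", "comment", str(number), "--repo", REPO,
       "--body", f"{machine}: {status}\n{log_tail}")
    gh("pr", "edit", str(number), "--repo", REPO,
       "--remove-label", TRIGGER_LABEL,
       "--add-label", result_label(machine, passed))


def run_once(machine, build_and_test):
    """One cron invocation; build_and_test(number) -> (passed, log_tail)."""
    for number in pending_prs():
        passed, log_tail = build_and_test(number)
        report(machine, number, passed, log_tail)
```

Removing the trigger label in `report` is a design choice in this sketch (not something stated above) so that the next 5-minute run does not pick up the same PR again.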

@SamuelDegelia-NOAA Are you interested in working on this together?

@ShunLiu-NOAA

@guoqing-noaa I remember that RDASApp inherited a CI function from GDASApp. @TingLei-daprediction, @CoryMartin-NOAA and @delippi used to work on this. We need to add this function to the role account. Let's discuss this later.

@guoqing-noaa
Collaborator Author

@ShunLiu-NOAA Thanks for the information! It looks like we do have some scripts there under the ci/ directory. We can start from there. And we need this as soon as possible. Ideally we run thorough, fresh tests on at least Hera/Jet/Hercules before merging any PR that changes the code or the build behavior.

@CoryMartin-NOAA Could you help us understand how GDASApp launches CI tests on RDHPCS? Is it through a cron job or through another mechanism such as Jenkins? Thanks!

@SamuelDegelia-NOAA
Contributor

@guoqing-noaa I'm open to helping with this but I will wait for others to chime in with thoughts.

@guoqing-noaa
Collaborator Author

Update on this: The RDHPCS admin confirmed that there is no Jenkins server on any RDHPCS system and that CI/CD through GitHub is not allowed on on-prem systems (although it is allowed on RDHPCS cloud).

RDHPCS IMT Decision
It is the decision of the RDHPCS Integrated Management Team (IMT) to deny this request for on-premise HPC systems. Although the IMT recognizes the need to have Continuous Integration code development on HPC resources, it has been deemed too risky to allow this functionality to exist on a shared on-premise HPC resource, including Jet, Hera, Niagara, and PPAN. Due to the transient and isolated nature of HPC in the Cloud, the RDHPCS IMT authorizes the use of these GitHub Runners on RDHPCS Cloud resources.

We could use https://github.com/jenkinsci/jenkinsfile-runner, but that loses the benefit of a Jenkins server, which can trigger CI/CD automatically.

So, @SamuelDegelia-NOAA let's go ahead and implement the cron-job-based CI/CD.

@guoqing-noaa
Collaborator Author

I can understand that we run ctests inside RDASApp. But why would we want to put the git clone part inside RDASApp, the very repo we intend to clone and test? Does it mean we use a previous version of RDASApp to clone the PRs' repos and branches?

@ShunLiu-NOAA

@guoqing-noaa Let's discuss this after I collect more information from EMC colleagues.

@guoqing-noaa
Collaborator Author

I've set up cron jobs (every 5 minutes) on the following platforms using the corresponding role accounts:

hera        <->   role.rrfs-fv3-cam
jet         <->   role.wrfruc
hercules    <->   role-wrfruc

Once a PR gets two approvals and is ready for a potential merge, the repo maintainers manually add the 'test_hera', 'test_jet', and 'test_hercules' labels to the PR, which triggers the cron jobs on those platforms.
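To illustrate the trigger logic, here is a small sketch of how each platform's cron job could pick out its own PRs. It assumes the PR list comes from `gh pr list --json number,labels` (where each label is an object with a `name` field); the function names and the sample PR numbers are made up for illustration:

```python
# Platform -> role account, as listed in the table above.
ROLE_ACCOUNTS = {
    "hera": "role.rrfs-fv3-cam",
    "jet": "role.wrfruc",
    "hercules": "role-wrfruc",
}


def trigger_label(platform):
    """Label that triggers a run on the given platform, e.g. test_hera."""
    return f"test_{platform}"


def triggered_prs(prs, platform):
    """PR numbers whose labels include this platform's trigger label.

    `prs` is the parsed JSON from `gh pr list --json number,labels`.
    """
    wanted = trigger_label(platform)
    return [pr["number"] for pr in prs
            if any(label["name"] == wanted for label in pr.get("labels", []))]


# Made-up sample data in the shape gh returns.
sample = [
    {"number": 1, "labels": [{"name": "test_hera"}, {"name": "test_jet"}]},
    {"number": 2, "labels": []},
    {"number": 3, "labels": [{"name": "test_hercules"}]},
]
print(triggered_prs(sample, "hera"))
```

Because each platform only looks for its own label, the three cron jobs stay independent of one another.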

I tried this automatic build_and_test on a fork:
https://github.com/comgsi/RDASApp/pulls
It worked as expected. I will set this up for this authoritative repo soon.

NOTE: there are NO changes to the RDASApp repo itself.
It works as if one person "manually" clones/builds/tests RDASApp on each of the above platforms.

@guoqing-noaa
Collaborator Author

guoqing-noaa commented Sep 23, 2024

To clarify, this is NOT the normal CI/CD we would usually expect (i.e. no Jenkins step at the moment). But we will use this to automatically build and test every PR until a streamlined CI/CD is in place.

guoqing-noaa changed the title from "Automatically build and test RDASPP (Hera: using role.rrfs-fv3-cam, Jet: using role.rtrr)" to "Automatically build and test RDASPP (Hera: using role.rrfs-fv3-cam, Jet: using role.wrfruc)" Sep 23, 2024

@ShunLiu-NOAA

@guoqing-noaa great progress.

@guoqing-noaa
Collaborator Author

Update on this: @ShunLiu-NOAA, @SamuelDegelia-NOAA, and I had a tag-up today and decided to test-drive the CI tests on RDASApp. The first try was on PR #175 and it worked well.

Also, as Sam suggested, the cron job on each HPC will automatically remove the corresponding testing directory once a PR is merged.
To compensate for HPC downtime, an extra mechanism removes all testing directories older than 14 days.
Together, these ensure that the CI tests do NOT consume too much disk space.
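
The two cleanup rules could look roughly like the sketch below. It assumes one testing directory per PR named `pr_<number>` under a common root (a hypothetical layout) and uses the directory's modification time as its age, which is only an approximation:

```python
import shutil
import time
from pathlib import Path

MAX_AGE_DAYS = 14  # retention period mentioned above


def stale_dirs(root, now=None, max_age_days=MAX_AGE_DAYS):
    """Testing directories under root whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(p for p in Path(root).iterdir()
                  if p.is_dir() and p.stat().st_mtime < cutoff)


def merged_dirs(root, merged_numbers):
    """Existing directories for merged PRs (hypothetical pr_<number> naming)."""
    candidates = (Path(root) / f"pr_{n}" for n in merged_numbers)
    return sorted(p for p in candidates if p.is_dir())


def remove_dirs(dirs):
    """Delete each directory tree."""
    for path in dirs:
        shutil.rmtree(path)
```

The list of merged PR numbers would come from the GitHub CLI (for example `gh pr list --state merged --json number`); the age-based rule then catches anything that was missed while a machine was down.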

The code management policy was updated accordingly.

@CoryMartin-NOAA
Contributor

Just to chime in here, though I think you all have figured most of it out. The capability uses a cron job with the GitHub CLI that checks for labels attached to open pull requests. If the labels match, it runs the tests, and depending on whether they pass or fail, a new label is applied. I'm happy to discuss details on how it works for the global.

@guoqing-noaa
Collaborator Author

@CoryMartin-NOAA Thanks for the information.
