Limit to one simultaneous job per user #109

Open
wants to merge 2 commits into
base: main
1 change: 1 addition & 0 deletions .github/workflows/pylint.yml
@@ -28,6 +28,7 @@ jobs:
pip install matplotlib
pip install bokeh
pip install pylint
+pip install fasteners

- name: Validate
run: pylint --errors-only *.py
19 changes: 17 additions & 2 deletions regtest.py
@@ -21,6 +21,9 @@
import time
import re
import json
+import fasteners
+import tempfile
+import getpass

import params
import test_util
@@ -1293,5 +1296,17 @@ def email_developers():


 if __name__ == "__main__":
-    n = test_suite(sys.argv[1:])
-    sys.exit(n)
+    temp_dir = tempfile.gettempdir()
+    temp_file = os.path.join(temp_dir, 'amrex_regtest_lock_file_'+getpass.getuser())
+    test_lock = fasteners.InterProcessLock(temp_file)
+    gotten = test_lock.acquire(blocking=False)
+    status = 255
+    try:
+        if gotten:
+            status = test_suite(sys.argv[1:])
+        else:
+            print("Wait please! Another process is running regression tests.")
+    finally:
Member

What would happen if we never reach finally, e.g., if the try: block calls sys.exit or the process is aborted for some other reason? Should we somehow clean up the file lock on fresh runs?

I am also a bit confused by the tempdir above: would multiple users not receive different temp dirs and thus never see the same locks?

Member Author

If a process dies without calling release, a fresh run will still be able to acquire the lock, because the lock is held on an open file handle, which the operating system releases when the process exits. It does not depend on whether the file exists.
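
The behavior described here can be checked with a small experiment. The sketch below does not use fasteners; it assumes the lock is an OS-level advisory file lock of the kind `fcntl.lockf` provides on POSIX systems, which is an assumption about the underlying mechanism. The helper names `try_lock` and `demo` are hypothetical. A child process takes the lock and is then killed without any cleanup, after which the parent tries to take the same lock while the file is still on disk.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child script: take the lock, report it, then block on stdin so the lock stays held.
CHILD = """
import fcntl, sys
handle = open(sys.argv[1], "a")
fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
sys.stdin.read()
"""


def try_lock(path):
    """Try a non-blocking exclusive lock; return the open file, or None if busy."""
    handle = open(path, "a")
    try:
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def demo():
    """Return (blocked_while_alive, acquired_after_kill, file_still_exists)."""
    fd, path = tempfile.mkstemp(prefix="demo_lock_")
    os.close(fd)
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    child.stdout.readline()          # wait until the child holds the lock
    while_alive = try_lock(path)     # should fail: the child owns the lock
    child.kill()                     # hard kill: no release, no finally block
    child.wait()
    child.stdin.close()
    child.stdout.close()
    after_kill = try_lock(path)      # the stale lock file is still present
    exists = os.path.exists(path)
    if after_kill is not None:
        after_kill.close()
    os.remove(path)
    return while_alive is None, after_kill is not None, exists
```

On a POSIX system, all three returned values are expected to be true: the lock is busy while the child lives, it can be taken again after the child is killed, and the leftover file does not get in the way.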

This is meant to prevent simultaneous runs by the same user, not across different users. At CCSE, we use a shared account on our regression testing machine, and multiple people use that account to run jobs manually. People often started a run without realizing that another person had already started one. The second run would kill the first, leaving the first person wondering what had just happened.
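
To make the per-user scoping concrete, here is a minimal sketch of how the lock file path is built. The function names `lock_path_for` and `current_user_lock_path` are hypothetical; the prefix is the one used in the diff. `tempfile.gettempdir()` is chosen from environment variables such as `TMPDIR` and platform defaults rather than from the user name, so accounts with the same environment usually share a directory, and it is the user-name suffix that keeps their locks apart.

```python
import getpass
import os
import tempfile


def lock_path_for(user, prefix="amrex_regtest_lock_file_"):
    """Build the lock file path for a given user name (hypothetical helper)."""
    return os.path.join(tempfile.gettempdir(), prefix + user)


def current_user_lock_path():
    """Lock file path for whoever is running this process."""
    return lock_path_for(getpass.getuser())
```

Two people logged into the same shared account get the same path and therefore contend for the same lock, while two distinct accounts get distinct paths and never block each other.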

+        if gotten:
+            test_lock.release()
+    sys.exit(status)
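
On the question of whether finally is reached when the try: block calls sys.exit: in Python, `sys.exit` raises `SystemExit`, so the finally clause still runs before the interpreter exits. The sketch below illustrates this with a hypothetical `run` function standing in for `test_suite`. It does not cover a hard kill such as SIGKILL, where no Python cleanup code runs at all.

```python
import sys


def run(events):
    """Stand-in for a test run that exits from inside the try block."""
    try:
        events.append("try")
        sys.exit(3)               # raises SystemExit(3)
    finally:
        events.append("finally")  # still executed on the way out


def demo():
    """Return the recorded events and the exit code carried by SystemExit."""
    events = []
    code = None
    try:
        run(events)
    except SystemExit as exc:     # caught here only so the demo can inspect it
        code = exc.code
    return events, code
```

One consequence for the diff above, if `test_suite` itself called `sys.exit`, is that the lock would still be released by the finally clause, but the trailing `sys.exit(status)` would not be reached and the process would exit with the code passed by `test_suite`.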