add pidlock to public functions #45
base: main
Do you know if this will work on a cluster if the shared filesystem does not support unix file locking? The "pidfile" approach, which stores the hostname and process ID of the acquirer in the lock file, does not depend on filesystem support for locking. I don't think |
Filelock advertises itself as platform-independent and has Windows- and Unix-specific functions. I haven't been able to test this on Windows, however.
One question I have about this PR is where to write the lock. Currently it is written to the source directory of pyjuliapkg, but I suspect it would be better to use a location specified by the STATE dict. I just don't know which one makes sense to use here.
It can be platform-independent and still not work well across multiple hosts on a shared filesystem.
I see what you're asking. The filelock library seems to advise the use of the |
Yeah, I understand this is a nontrivial choice. I shopped around for a Python package that ticked all the boxes (basically replicating the Julia solution) but didn't find one that worked well out of the... well, box. Any locking is definitely better than none here and I appreciate your taking this on! Edit: And I don't mean to sound like I'm a maintainer here. I'm not! Just a user quite invested in making it easier to have Python packages that use Julia (including for use on clusters).
Looking at package managers in Python:
In case it's useful, I used a somewhat modified version of pidlock to externally lock
Also, the GitHub version has a defect: it checks whether the file exists and then, if it does, opens the file without exception handling. The delay between the existence check and the open allows other processes or hosts to delete the file in between, which causes a crash. This can be fixed by handling exceptions on open instead of checking for existence separately.
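To illustrate the fix described above, here is a minimal sketch of reading a lock file without a separate existence check. The function name `read_lock_owner` and the idea that the file holds a short text record are assumptions for illustration, not pidlock's actual code.

```python
def read_lock_owner(path):
    """Return the stripped contents of the lock file, or None if it is gone."""
    try:
        # Open directly. There is no os.path.exists() call beforehand, so
        # there is no window in which the file can vanish between a check
        # and the open.
        with open(path, "r") as f:
            return f.read().strip()
    except FileNotFoundError:
        # Another process or host deleted the lock file first; treat the
        # lock as not held instead of crashing.
        return None
```

The caller then treats `None` the same way it would have treated a failed existence check.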
Looks like the conda approach is quite similar to pidlock, but using a directory instead of a file. It also doesn't seem to include the hostname at the moment. Maybe worth stealing?
I'm happy to choose whichever pidlock we would like, but it seems out of scope to try to do better than e.g. virtualenv. @cjdoris, do you have a preference for which file lock to use? Can we take on the burden of maintaining our own lock here, or is this best-effort choice of FileLock satisfactory?
I suppose I would also like to see an example of a failing test with the current setup before I roll my own file lock, so that we can verify that a different choice of file lock fixes the test. @amilsted, do you have additional tests to contribute to the PR?
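For reference, a directory-based lock in general can be sketched as follows. This is not conda's code; the function names and the `owner.json` file are hypothetical. It relies on `os.mkdir` failing if the directory already exists, and it records the hostname alongside the PID.

```python
import json
import os
import socket


def acquire_dir_lock(lock_dir):
    """Try to take the lock by creating lock_dir. Return True on success."""
    try:
        os.mkdir(lock_dir)  # raises FileExistsError if someone else holds it
    except FileExistsError:
        return False
    # Record who holds the lock (hypothetical file name and format).
    owner = {"hostname": socket.gethostname(), "pid": os.getpid()}
    with open(os.path.join(lock_dir, "owner.json"), "w") as f:
        json.dump(owner, f)
    return True


def release_dir_lock(lock_dir):
    """Release the lock by removing the owner record and the directory."""
    try:
        os.remove(os.path.join(lock_dir, "owner.json"))
    except FileNotFoundError:
        pass
    os.rmdir(lock_dir)
```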
Very fair. This is tricky, because I think it's hard to break FileLock on a single host and a non-network filesystem. I will think about it. I suppose one could argue that |
Counterpoint: Julia core uses https://github.com/vtjnash/Pidfile.jl to do hostname/PID-based locking to protect precompilation and Pkg operations, because this type of locking is robust on HPC clusters.
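A rough Python sketch of hostname/PID-based locking, for comparison. This is not a port of Pidfile.jl: the function names and the `hostname pid` file format are assumptions, and the liveness check via `os.kill(pid, 0)` is POSIX-only.

```python
import os
import socket


def try_acquire_pidfile(path):
    """Atomically create the lock file. Return True if we now own it."""
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{socket.gethostname()} {os.getpid()}\n")
    return True


def read_owner(path):
    """Return (hostname, pid) from the lock file, or None if unreadable."""
    try:
        with open(path) as f:
            host, pid = f.read().split()
        return host, int(pid)
    except (FileNotFoundError, ValueError):
        # File was removed, or is empty or partially written.
        return None


def owner_is_stale(owner):
    """Report staleness only for owners on this host (POSIX-only check)."""
    host, pid = owner
    if host != socket.gethostname():
        return False  # cannot inspect processes on another host
    try:
        os.kill(pid, 0)  # signal 0: existence check, sends nothing
    except ProcessLookupError:
        return True  # no such process, so the lock is stale
    except PermissionError:
        return False  # process exists but belongs to another user
    return False
```

Because the owner record names the host, a process on another node can tell that it is not in a position to judge whether the lock is stale, which is the case that matters on a shared cluster filesystem.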
This PR locks the basic functions of pyjuliapkg. I used FileLock instead of pidlock because I needed a reentrant lock. I also used a thread lock. The PR includes a test that I have verified shows this fix avoids redundant resolves. Fixes #19.
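A minimal sketch of what reentrant locking of public functions can look like, combining a thread lock with a file lock. This is not the PR's code: it uses a standard-library stand-in (an exclusively created lock file) rather than FileLock, and `ReentrantFileLock` and `locked` are hypothetical names.

```python
import functools
import os
import threading
import time


class ReentrantFileLock:
    """Thread lock plus lock file; nested use in one thread is allowed."""

    def __init__(self, path, timeout=10.0, poll=0.05):
        self.path = path
        self.timeout = timeout
        self.poll = poll
        self._thread_lock = threading.RLock()
        self._depth = 0  # nesting depth within the owning thread

    def _acquire_file(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # Exclusive creation fails while another process holds it.
                os.close(os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
                return
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {self.path}")
                time.sleep(self.poll)

    def __enter__(self):
        self._thread_lock.acquire()
        try:
            if self._depth == 0:  # only the outermost entry takes the file
                self._acquire_file()
            self._depth += 1
        except BaseException:
            self._thread_lock.release()
            raise
        return self

    def __exit__(self, *exc_info):
        try:
            self._depth -= 1
            if self._depth == 0:  # outermost exit releases the file
                os.remove(self.path)
        finally:
            self._thread_lock.release()
        return False


def locked(lock):
    """Decorator that runs the wrapped function while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator
```

With this shape, one decorated public function can call another decorated function without deadlocking, because only the outermost call touches the lock file.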