-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More thread-safe GC #529
More thread-safe GC #529
Conversation
BTW, could be worth trying this benchmark: main...ericphanson:PythonCall.jl:eph/timing |
@ericphanson @lassepe @MilesCranmer I think this is the version I'm going to merge. I'd be grateful if you could give it a try to check it solves any issues, and also review the code. Some timings on my machine:
I just need to copy over the extra tests from #520. |
PySR test suite is passing 👍 |
Cool, this also fixes avik-pal/Wandb.jl#27! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docs also need to be updated with the warning about multithreading (and suggestion that GC
be turned off).
Also, how does this interact with the signal handling, does the warning in the docs still apply?
src/GC/GC.jl
Outdated
end | ||
|
||
function unsafe_free_queue() | ||
lock(QUEUE_LOCK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the lock(...) do ...
API since it will still unlock in the case of an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I considered it but (a) it adds 30ns and (b) the code in between should not throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's the additional function call from the closure. In that case I would try
lock(QUEUE_LOCK)
try
#= do stuff =#
finally
unlock(QUEUE_LOCK)
end
And hopefully that's a bit faster.
If anything it's just a good habit. I think there's always a possibility for some kind of error in Julia code (until the day we can actually impose these types of constraints on code), so locks should always be safely handled. e.g., perhaps the user could <Ctrl-C>
while the lock is in place, or there's an OOM error, or something else we haven't thought of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW Base.@lock
(which is what I used in my PR) expands to exactly the code you wrote there @MilesCranmer, and is exported on more recent julia versions
src/GC/GC.jl
Outdated
else | ||
lock(QUEUE_LOCK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer lock(...) do ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(As above)
I think we need SpinLock since we are in a finalizer! We aren’t allowed to yield like ReentrantLock does |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One out of date comment, otherwise lgtm (though I should say I’m not expert and was just learning as I went in #520)
Regarding the iterate(items) === nothing This is how Base checks for an array being empty. I think what can happen is: thread A can (1) call the In other words, use of Therefore I think all uses of Basically anytime |
I commented elsewhere that you're looking at the wrong method of isempty(a::AbstractArray) = (length(a) == 0) which I believe is safe. It might give wrong answers occasionally in the presence of a race condition, but it's safe. |
Oh yeah, you are totally right, sorry. |
Ok, no other comments from me. I think it's ready to go 👍 |
Thanks for your help reviewing! |
No problem 👍 (By the way, could we get a release so we can fix the Wandb.jl issue?) |
Yeah I'll make a release once #535 is merged (going to sleep on it and bikeshed the new function names). |
Sounds good. In case #535 seems like it will take some more discussion to settle on an API I wonder if we could have a patch release just so we can get the bug fixes through? |
It's released |
Thanks! |
A draft of an alternative to #520 to ensure Python GC only runs on thread 1.
This version avoids using tasks. Instead, when a Python object is finalized:
I also added
PythonCall.GC.gc()
to explicitly clear the queue (if possible).Furthermore I added an immortal object
GCHook()
which callsPythonCall.GC.gc()
whenever it is finalized, then adds its own finalizer back on. This means the queue can be cleared periodically when Julia GC runs, because it keeps trying to finalizeGCHook()
.I used
PyGILState_Check() == 1
to check that the GIL is currently held. This should make it safe to interact with Python from any thread in the future, provided you use the C APIPyGILState_*
andPyThreadState_*
functions correctly.Very much a draft. Need to use
Channel
s etc (like #520) to make this totally thread-safe.