Hold a hard reference to Ruby threads #6143
The second commit here may actually be the only fix needed. We hold a hard reference to the thread in a local variable, but once we are no longer using that variable, the JVM may consider the thread unreachable. Starting the Thread via the WeakReference therefore opens up the possibility that GC will already have claimed the object. Instead, I modified the logic to start the native Thread directly, so it remains referenced until (at least) the Thread#start call has completed.
I got some confirmation from our JVM friends that this theory could very well be true. Even though we hold a reference to the Thread object in a local variable, by the time we call start we are no longer using that local variable. The start call itself happens after traversing the WeakReference, so if the JVM happens to GC after the last use of the local variable but before that point, we might see the WeakReference get cleared. Incredible "luck" to catch this on film.
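The race described above can be sketched in plain Java. This is a minimal illustration, not JRuby's code: `WeakStartDemo`, `WeakThreadHandle` and `simulateCollection` are hypothetical names. Because real GC timing is nondeterministic, the sketch calls `WeakReference.clear()` to stand in for the collector clearing the reference at the unlucky moment. Once `Thread#start` has been called, a live thread is kept reachable by the JVM itself, which is why holding the hard reference only "until (at least) the Thread#start call has completed" is enough.

```java
import java.lang.ref.WeakReference;

// Hypothetical names throughout; this only reproduces the pattern from the PR discussion.
public class WeakStartDemo {

    // Old pattern (sketch): the Thread is reached only through a WeakReference,
    // and start() silently does nothing if the reference has been cleared.
    static final class WeakThreadHandle {
        private final WeakReference<Thread> ref;

        WeakThreadHandle(Thread thread) {
            this.ref = new WeakReference<>(thread);
        }

        /** Returns true if the thread was started, false if the reference was already cleared. */
        boolean start() {
            Thread t = ref.get();
            if (t == null) {
                return false; // silently ignored: the caller never learns the thread didn't run
            }
            t.start();
            return true;
        }

        // Test hook (assumption): simulates GC clearing the weak reference.
        void simulateCollection() {
            ref.clear();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> System.out.println("worker ran"));
        WeakThreadHandle handle = new WeakThreadHandle(worker);

        // 'worker' is still in scope, but it is never used again, so the JVM may
        // treat it as dead and GC may clear the WeakReference before start().
        handle.simulateCollection();
        System.out.println("started via weak ref? " + handle.start());

        // Fix (sketch): call start() on the hard reference itself, so the Thread
        // stays strongly reachable at least until Thread#start has been invoked.
        Thread fixed = new Thread(() -> System.out.println("fixed worker ran"));
        fixed.start();
        fixed.join();
    }
}
```

The important detail is that `handle.start()` returns without any error when the reference is gone, which matches the "no indication that the thread never actually ran" symptom.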
A strange failure in concurrent-ruby makes me reluctant to merge this just yet. The CyclicBarrier test tries to wait for all threads to finish, but at least one of them still seems to be waiting, with this stack trace:
I'm not sure whether this problem is related to my change (which seems unlikely) or to something flaky in the concurrent-ruby tests. I will make the simpler change on master (ensure the thread remains hard referenced until
The old logic maintained a weak reference to the native thread associated with a Ruby thread, in order to avoid keeping its resources alive longer than the thread's lifecycle. However, in the case shown in jruby#6142, it seems likely that under heavy load the native thread gets collected before it starts running. Because our logic silently ignores collected references, the NativeThread.start method simply returns if the thread has gone away, giving us no indication that the thread never actually ran.

This patch splits our NativeThread into two types:

* RubyNativeThread, which holds a hard reference to the native Thread and implements start to call Thread#start.
* AdoptedNativeThread, which holds a weak reference to the native Thread and errors (BUG!) if we attempt to call start on that adopted thread.

I believe this will fix the issue in jruby#6142. This may also explain other "one in a million" failures we have never quite tracked down.
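A rough sketch of that split, assuming a small `ThreadLike` interface with just `start()` and `nativeThread()`. The type names come from the PR text, but the method set and bodies are assumptions, not JRuby's actual implementation. `IllegalStateException` stands in for JRuby's "BUG!" error. "Adopted" is taken here to mean a thread JRuby did not create itself, presumably an existing Java thread that has entered Ruby code.

```java
import java.lang.ref.WeakReference;

// Sketch only: ThreadLike's methods and these bodies are assumptions, not JRuby code.
public class ThreadLikeSketch {

    interface ThreadLike {
        void start();
        Thread nativeThread(); // may return null for an adopted thread that was collected
    }

    // Thread created by JRuby for a Ruby thread: hard reference, so the native
    // Thread cannot be collected before start() runs.
    static final class RubyNativeThread implements ThreadLike {
        private final Thread thread;

        RubyNativeThread(Thread thread) {
            this.thread = thread;
        }

        @Override
        public void start() {
            thread.start(); // Thread#start called directly on the hard reference
        }

        @Override
        public Thread nativeThread() {
            return thread;
        }
    }

    // Thread JRuby did not create: weak reference, so JRuby does not extend its lifetime.
    static final class AdoptedNativeThread implements ThreadLike {
        private final WeakReference<Thread> ref;

        AdoptedNativeThread(Thread thread) {
            this.ref = new WeakReference<>(thread);
        }

        @Override
        public void start() {
            // An adopted thread is already running; starting it is an internal error.
            throw new IllegalStateException("BUG! attempted to start an adopted thread");
        }

        @Override
        public Thread nativeThread() {
            return ref.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLike created = new RubyNativeThread(new Thread(() -> System.out.println("ruby thread ran")));
        created.start();
        created.nativeThread().join();

        ThreadLike adopted = new AdoptedNativeThread(Thread.currentThread());
        try {
            adopted.start();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly in `AdoptedNativeThread.start()` replaces the old silent return, so a misuse shows up as an error instead of a thread that quietly never runs.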
This also calls start on the native thread directly, rather than reaching it through the NativeThread wrapper's WeakReference.
Also some miscellaneous cleanup around the new ThreadLike implementations.
The concurrent-ruby failure appears to be a bad test, which I filed as ruby-concurrency/concurrent-ruby#862.