Hold a hard reference to Ruby threads #6143

headius · 2020-03-27T09:00:11Z

The old logic maintained a weak reference to the native thread
associated with a Ruby thread, in order to avoid keeping its
resources alive longer than the thread's lifecycle. However in
the case shown in #6142, it seems likely that under heavy load the
native thread gets collected before it starts running. Because of
the logic we have to silently ignore collected references, the
NativeThread.start method simply returns if the thread has gone
away, giving us no indication that the thread never actually ran.

This patch splits our NativeThread into two types:

RubyNativeThread, which holds a hard reference to the native
Thread and implements start to call Thread#start.
AdoptedNativeThread, which holds a weak reference to the native
Thread and errors (BUG!) if we attempt to call call start on
that adopted thread.

I believe this will fix the issue in #6142. This may explain other
"one in a million" failures we have never quite tracked down.

headius · 2020-03-27T09:13:40Z

The second commit here may actually be the only fix needed. We hold a hard reference to the thread but once we are no longer using that local variable the JVM may consider it dereferenced. Doing the Thread.start via the WeakReference opens up the possibility that GC will have claimed the object. Instead, I modified the logic to start the native Thread directly, so it remains referenced until (at least) the Thread#start call has completed.

headius · 2020-03-27T09:17:24Z

I got some confirmation from our JVM friends that this theory could very well be true. Even though we hold a reference to the Thread object in a local variable, by the time we call start we are no longer using that local variable. The start call itself happens after traversing the WeakReference, so if the JVM happens to GC before that point but after the last use of the local variable, we might see the WR get cleared.

Incredible "luck" to catch this on film.

headius · 2020-03-27T19:41:01Z

Strange failure in concurrent-ruby here makes me reluctant to merge this just yet. The CyclicBarrier test tries to wait for all threads to finish, but at least one of them seems to still be waiting with this stack trace:

  #number_waiting
    without any waiting thread
      should be equal to zero
    with waiting threads
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:460)
org.jruby.RubyThread$SleepTask.run(RubyThread.java:1531)
org.jruby.RubyThread.executeBlockingTask(RubyThread.java:1584)
org.jruby.RubyThread.wait_timeout(RubyThread.java:2132)
com.concurrent_ruby.ext.SynchronizationLibrary$JRubyLockableObject.nsWait(SynchronizationLibrary.java:254)
com.concurrent_ruby.ext.SynchronizationLibrary$JRubyLockableObject$INVOKER$i$0$1$nsWait.call(SynchronizationLibrary$JRubyLockableObject$INVOKER$i$0$1$nsWait.gen)
org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:815)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:203)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:172)
org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:316)
org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)
org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:86)
org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:156)
org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:143)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:191)
org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:337)
org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)
org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:116)
org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:137)
org.jruby.runtime.IRBlockBody.doYield(IRBlockBody.java:172)
org.jruby.runtime.BlockBody.yield(BlockBody.java:108)
org.jruby.runtime.Block.yield(Block.java:184)
com.concurrent_ruby.ext.SynchronizationLibrary$JRubyLockableObject.rubySynchronize(SynchronizationLibrary.java:232)

I'm not sure if this is a problem related to my change (which seems unlikely) or something flaky about the concurrent-ruby tests. I will make the simpler change on master (ensure the thread remains hard referenced until start returns) and use this PR to work on additional cleanup and refactoring of this thread lifecycle.

The old logic maintained a weak reference to the native thread associated with a Ruby thread, in order to avoid keeping its resources alive longer than the thread's lifecycle. However in the case shown in jruby#6142, it seems likely that under heavy load the native thread gets collected before it starts running. Because of the logic we have to silently ignore collected references, the NativeThread.start method simply returns if the thread has gone away, giving us no indication that the thread never actually ran. This patch splits our NativeThread into two types: * RubyNativeThread, which holds a hard reference to the native Thread and implements start to call Thread#start. * AdoptedNativeThread, which holds a weak reference to the native Thread and errors (BUG!) if we attempt to call call start on that adopted thread. I believe this will fix the issue in jruby#6142. This may explain other "one in a million" failures we have never quite tracked down.

This also calls start on the native thread directly, rather than by traversing the native thread object.

Also some misc cleanup surrounding the new ThreadLike impls.

headius · 2020-04-14T21:50:00Z

The concurrent-ruby failure appears to be a bad test, which I filed in ruby-concurrency/concurrent-ruby#862

headius added this to the JRuby 9.3.0.0 milestone Mar 27, 2020

headius linked an issue Mar 27, 2020 that may be closed by this pull request

Thread didn't execute? #6142

Closed

headius added 2 commits April 14, 2020 15:42

Remove start from ThreadLike since only Ruby threads can start

0c8b19d

This also calls start on the native thread directly, rather than by traversing the native thread object.

headius force-pushed the hard_ref_ruby_threads branch from d36f744 to 0c8b19d Compare April 14, 2020 20:46

Remove last vestiges of future threading

bfc2216

Also some misc cleanup surrounding the new ThreadLike impls.

headius merged commit a38cc23 into jruby:master Apr 14, 2020

headius deleted the hard_ref_ruby_threads branch April 14, 2020 22:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hold a hard reference to Ruby threads #6143

Hold a hard reference to Ruby threads #6143

headius commented Mar 27, 2020

headius commented Mar 27, 2020

headius commented Mar 27, 2020

headius commented Mar 27, 2020

headius commented Apr 14, 2020

Hold a hard reference to Ruby threads #6143

Hold a hard reference to Ruby threads #6143

Conversation

headius commented Mar 27, 2020

headius commented Mar 27, 2020

headius commented Mar 27, 2020

headius commented Mar 27, 2020

headius commented Apr 14, 2020