Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do nonblocking socket read before select for packet receive #104

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

camlow325
Copy link

I've periodically run into some cases where the receive_packet method hangs indefinitely on an IO.select even though the underlying network socket has actually received some bytes from the network for incoming MQTT packets. I've only been able to consistently reproduce this when running with JRuby 9k for some reason but I believe the underlying condition affects MRI Ruby as well.

Specifically, I believe the condition that I'm running into is the same as the buffering issue documented for OpenSSL::SSL::SSLSocket and IO.select calls at http://ruby-doc.org/core-2.2.0/IO.html#method-c-select.

Especially, the combination of nonblocking methods and IO.select is preferred for IO like objects such as OpenSSL::SSL::SSLSocket. It has to_io method to return underlying IO object. IO.select calls to_io to obtain the file descriptor to wait. This means that readability notified by IO.select doesn’t mean readability from OpenSSL::SSL::SSLSocket object. Most possible situation is OpenSSL::SSL::SSLSocket buffers some data. IO.select doesn’t see the buffer. So IO.select can block when OpenSSL::SSL::SSLSocket#readpartial doesn’t block.

I experimented with changing the logic in receive_packet to use a combination of a read_nonblock call (reading only a single byte at most) followed by an IO.select call in the event that no data is available to be read and an IO.WaitReadable-derived exception is thrown. This PR shows the approach that I used. It does make the logic a bit more messy in that for cases that the first byte in the packet is read before MQTT:Packet.read() is called, it passes along the byte so that the Packet.read() method can use it.

With the approach on this PR, I'm no longer able to reproduce the indefinite IO.select() hangs, on MRI Ruby or JRuby. The read throughput also seems like it might be a bit better with a heavy amount of publish packets coming into the socket since available data seems to be received much more quickly.

Thanks for your consideration for this PR and thanks much for all of the great work that you've put into this library!

@camlow325 camlow325 force-pushed the handle-buffered-read-data-for-select branch from 59edb0f to d23a669 Compare November 15, 2017 00:11
@camlow325
Copy link
Author

The original commit that I had on the PR, b6cda35, failed the Travis run for 1.8.7. I hadn't remembered that the non-blocking APIs that the commit was using did not exist until 1.9. In a new commit, d23a669, I added some logic which allows for the non-blocking APIs to only be used if available. The tests appear to be passing across Ruby versions and continue to no longer trigger the indefinite hang on select that I have been seeing for JRuby 9k.

jbarlow-mcafee added a commit to opendxl-community/opendxl-client-ruby that referenced this pull request Oct 18, 2018
This commit changes the ruby-mqtt dependency to be pinned to the 0.5.0
version instead of a branched version. An exact pin was used to guard
against potential breaking changes in a newer version of the upstream
project (specifically around client connection handling). The previous
branched version included a change which avoids frequent hangs when
running under JRuby -- see njh/ruby-mqtt#104.
By reverting to the upstream release, the JRuby hangs would be
reintroduced for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant