Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate race conditions and fix tests #3

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vrurg
Copy link

@vrurg vrurg commented Mar 7, 2021

This PR solves two problems:

  1. Stopping a service results in X::AdHoc produced by Net::ZMQ::poll_one because of a closed socket.
  2. Polling threads started by Cro::ZeroMQ::Source::Impure and Cro::ZeroMQ::Source::Pure are not always ready to accept messages by the time the first ones are submitted. It was breaking t/zeromq-xpub-xsub.t test.

vrurg added 2 commits March 6, 2021 18:55
Since `poll_one` is a blocking call, when service is stopped it throws
with `X::AdHoc` due to -1 returned by `zmq_poll`. Unfortunately,
`poll_one` doesn't use `zmq_die` and therefore there is no way to
determine if the error is caused by socket closing or not. For this
reason this PR relies on `$closer` value to find out if the exception
must be rethrows or dropped.

PR arnsholt/Net-ZMQ#22 is submitted to replace plain `die` with
`zmq_die`. But until then this PR should be ok get things straighten
out.
`Cro::ZeroMQ::Source::Impure` and `Cro::ZeroMQ::Source::Pure` both
suffer from a race where their polling threads are started too late to
actually receive first submitted messages. In particular, this caused
t/zeromq-xpub-xsub.t to fail with nearly 100% probability.

This commit makes sure that the polling thread is actually ready to
receive data before completing a supply.
start {
$poller-ready.keep;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure I forgot a lot of this codebase, but shouldn't the keep call be closer to the poll_one call? This PR makes the race gap smaller, but it can be even smaller, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would. But here is how I see it. The closest point is right before the call itself. But it would have to be like this:

unless ($already-set) { $poller-ready.keep; $already-set = True; }

I don't feel really good about sticking in a piece of code which is basically nop 99.99999% of time. Sure, moar can optimize it away, but it's not guaranteed.

Another way is to

FIRST $poller-ready.keep;

But the only difference this makes with the PR code is that it would be done right after the loop which makes too little difference to make sense.

The last option is once. It could be placed right before poll_one. But there is risk of immediate closing of supply which would result in last activated before once is ever reached – and therefore await will stuck.

Ultimately, I wonder why such a complex approach is used? Why not simple supply { loop { emit poll_one } }?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, I wonder why such a complex approach is used? Why not simple supply { loop { emit poll_one } }?

I am quite sure it's just that this code was not written with best practices on mind. If you have time and mind to rewrite this piece in a better way (not just patch) with something simpler and more robust - you are welcome.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do it straightforward way, but the thing I missed is that when poll_one blocks – it blocks the whole supply chain. Combined with non-guaranteed message delivery by zmq in certain scenarios, I currently don't see a completely risk-free solution.

Unfortunately, I only study zmq for I possibly will have tasks for it in the future. It's unlikely I will be able to find a better solution until then. And even then it feels to me that Net::ZMQ would need to be changed too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have no more comments on this one, maybe @jnthn has.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants