The clone pattern describes how to get an out-of-band snapshot for a late subscriber. For this it uses a second, independent socket (different port number). Since TCP does not guarantee the order of packets across sockets, there is no (obvious) guarantee that the subscription is made before the snapshot is taken. If indeed the snapshot is taken before the subscription is ready, there is again the chance of missing an update.
I have implemented a similar pattern, but I have replaced the out-of-band snapshot communication with a different (non-ZeroMQ) protocol. When writing a test for this, I discovered that the race condition is actually quite likely. The test is sending an update through the PUB-SUB channel immediately after the snapshot has been read. This update is lost with about 50% probability in my setup.
The title of the example "Reliable Pub-Sub (Clone Pattern)" suggests that such race conditions would not take place. If there is in fact a true guarantee that only holds when the out-of-band communication is also done via ZeroMQ, this should IMO be mentioned in the description. If the guarantee cannot be given, I would recommend mentioning that as well.
There may be situations where such loss of information is acceptable, but in other situations it is not. A simple example would be a rarely and irregularly changing value that happens to change at exactly the moment the late subscriber joins. Maybe the next update comes a week later. The late subscriber then sees an outdated value for one week. One at least needs to know about this potential issue in order to think of a workaround (can someone point me to the best option here, please?).
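One way a subscriber could at least detect such a loss, even when no further update follows for a long time, is sketched below. It rests on assumptions that nothing in this thread confirms: every update carries a monotonically increasing sequence number, the snapshot records the sequence number of the last update it contains, and the publisher periodically sends a heartbeat advertising its current sequence number. The `Replica` class and its method names are hypothetical and only illustrate the bookkeeping; this is not the guide's implementation.

```python
class Replica:
    """Subscriber-side copy of one value plus the sequence number it reflects."""

    def __init__(self):
        self.value = None
        self.sequence = 0

    def apply_snapshot(self, value, sequence):
        # The snapshot is assumed to carry the sequence of its last included update.
        self.value = value
        self.sequence = sequence

    def apply_update(self, value, sequence):
        """Return 'stale', 'applied', or 'gap' (gap means: re-request the snapshot)."""
        if sequence <= self.sequence:
            return "stale"  # already contained in the snapshot
        if sequence > self.sequence + 1:
            return "gap"    # at least one update in between was never received
        self.value = value
        self.sequence = sequence
        return "applied"

    def on_heartbeat(self, publisher_sequence):
        """Return True if the publisher is ahead of us, i.e. an update was lost."""
        return publisher_sequence > self.sequence


replica = Replica()
replica.apply_snapshot("old", 5)
# Update 6 was published before the subscription became active and is lost.
# The next heartbeat reveals this without waiting for update 7.
if replica.on_heartbeat(6):
    replica.apply_snapshot("new", 6)  # stands in for fetching a fresh snapshot
```

With this bookkeeping the window of staleness is bounded by the heartbeat interval rather than by the time until the next real update, at the cost of an extra snapshot request whenever a loss is detected.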
Hi @mhier,
you are making very valid points here. It would be awesome if you could supply some code that shows the race condition(s).
Currently we're working on a different solution for the late subscriber, which borrows heavily from Kafka. The protocol and its reference implementation are called dafka (https://github.com/zeromq/dafka). The protocol is almost finished, but the user API is missing some finishing touches.
Thanks for confirming my assumptions :-) I am relatively new to ZeroMQ and hence could overlook something easily.
Unfortunately I have no pure ZeroMQ code to reproduce the race condition (only code that uses our proprietary protocol for the out-of-band communication, and it has heavy dependencies). I can cook something up later, but I am under some time pressure right now, so please be a bit patient (could be a few weeks...).
What I can offer you right now is proof-of-concept code that does not have the race condition:
It uses an XPUB server, which sends out an in-band snapshot as soon as it detects a new subscription. The code needs refining, though, since all clients receive the snapshot, not just the one that made the subscription. A possible implementation would be to use a special prefix for the snapshot: the client subscribes to it after making the main subscription and unsubscribes again once the snapshot has been received.
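The proof-of-concept itself is not reproduced here. As an independent illustration of the prefix idea, here is a minimal sketch using pyzmq. The topic names `data` and `snapshot.<client id>`, the endpoint `inproc://clone-demo`, and the helper functions are assumptions made for this example. It relies on XPUB delivering subscription events as a frame whose first byte is `\x01` (subscribe) or `\x00` (unsubscribe) followed by the topic, and on both subscriptions travelling in order over the same connection, so the data subscription should already be registered when the snapshot subscription is seen.

```python
import zmq

SNAPSHOT_PREFIX = b"snapshot."  # hypothetical prefix for per-client snapshot topics
DATA_TOPIC = b"data"            # hypothetical topic for regular updates


def serve_one_subscription_event(xpub, state, timeout_ms=1000):
    """Read one XPUB subscription event; publish a snapshot if one was requested."""
    if not xpub.poll(timeout_ms):
        return None
    event = xpub.recv()
    subscribed, topic = event[:1] == b"\x01", event[1:]
    if subscribed and topic.startswith(SNAPSHOT_PREFIX):
        # Only the client subscribed to this unique topic receives the snapshot.
        xpub.send_multipart([topic, state])
    return topic


def run_demo():
    ctx = zmq.Context()
    xpub = ctx.socket(zmq.XPUB)
    sub = ctx.socket(zmq.SUB)
    try:
        xpub.bind("inproc://clone-demo")
        sub.connect("inproc://clone-demo")
        sub.setsockopt(zmq.RCVTIMEO, 1000)  # fail instead of hanging forever
        # Order matters: main subscription first, snapshot subscription second.
        sub.setsockopt(zmq.SUBSCRIBE, DATA_TOPIC)
        snapshot_topic = SNAPSHOT_PREFIX + b"client-1"
        sub.setsockopt(zmq.SUBSCRIBE, snapshot_topic)

        serve_one_subscription_event(xpub, b"value=1")  # sees the data subscription
        serve_one_subscription_event(xpub, b"value=1")  # sees the snapshot request
        xpub.send_multipart([DATA_TOPIC, b"value=2"])   # update right after snapshot

        snapshot = sub.recv_multipart()
        sub.setsockopt(zmq.UNSUBSCRIBE, snapshot_topic)  # snapshot no longer needed
        update = sub.recv_multipart()
        return snapshot, update
    finally:
        sub.close(linger=0)
        xpub.close(linger=0)
        ctx.term()
```

Because the snapshot and the updates share one socket, the update published immediately after the snapshot arrives behind it instead of being dropped. A real server would run the event handling in a loop next to its publishing logic and would need a scheme for generating unique client ids.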