Bug related to Reliable resend of old sample after emission of a new one #344
Comments
Thank you for such a detailed report and a brilliant reproduction example! This does indeed look like a bug in RustDDS's Reliable Reader logic, so we'll investigate. Initial findings so far:
I am able to reproduce the issue, but the result is not the same! My receiver reports "Received 2" instead of the "Received 3" you were getting, and sample number 3 is not received. We will continue looking into this.
This was indeed a bug in DataReader's caching and metadata generation logic. It was possible for a Reliable DataReader to deliver samples to the application in the wrong order if they were received out of order, which is what you are testing here. I just released RustDDS 0.10.2, where this is fixed. However, the result that you are receiving only sample
My test results with History depth=2 are as follows:
Naturally, in order to cope with more severe data reordering, the History depth would have to be increased further. I am still uncertain whether this is the intention of the DDS spec. Is a DDS implementation allowed to lose samples with QoS Reliability::Reliable if the QoS History does not provide a buffer large enough to reorder the samples?
If you have any thoughts on this, or manage to find a part of the DDS Spec that clarifies this, please leave a comment.
I think the DDS Spec part
IMO:
Thanks a lot for your answer and for the fix. I'm grateful to be able to participate in this open-source project!
Sorry, but I cannot follow your logic here. The term "send" here is a bit ambiguous. The word "it" after "receive" is also ambiguous, but I assume it refers to the "1st sample". If we send samples 1, 2, 3, ..., we can send them in order, with Reliability=Reliable, without waiting for an acknowledgement from all readers in between, provided that we can buffer them in the Writer for possible retransmission.
In my reading, the two spec paragraphs have different meanings. The first paragraph says that a Reliable DataWriter may (or even should) block until timeout if the write operation would cause data to be lost. This is a mechanism to eventually throttle writing if a DataWriter tries to write faster than the matched Reliable DataReaders can acknowledge that they have the data. If the write call results in a timeout, that indicates the write did not succeed.
The second paragraph states that a Reliable DataReader should receive the samples in order. If an RTPS Reader receives 1 and 3, it should deliver only 1 to the DataReader and request retransmission of 2 from the Writer. Note that a Writer may respond either by sending 2, or by sending a Gap message indicating that 2 is no longer available. In the latter case the DataReader will deliver 3, and 2 is never delivered. There is no explicit indication to the application about this, but it can be detected, for example, by inspecting the sample lost status in a DataReader or DomainParticipant. This is where DDS reliability differs from, for example, TCP, which will absolutely refuse to continue while there is a gap in the received data.
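To make the second paragraph concrete, here is a minimal toy model of the in-order delivery rule described above. It is only a sketch and is not RustDDS code: the names Event, ReaderModel, handle, missing and replay are hypothetical, sequence numbers are assumed to start at 1, and History depth, heartbeats and acknowledgements are ignored entirely.

```rust
use std::collections::BTreeSet;

/// Hypothetical events seen by a reliable reader (not RustDDS types).
#[derive(Debug, Clone, Copy)]
pub enum Event {
    /// A sample with this sequence number arrived.
    Data(u64),
    /// The writer declared this sequence number no longer available.
    Gap(u64),
}

/// Toy model: deliver samples strictly in sequence-number order.
pub struct ReaderModel {
    next_expected: u64,     // lowest number neither delivered nor skipped
    held: BTreeSet<u64>,    // received early, waiting for earlier samples
    skipped: BTreeSet<u64>, // declared unavailable by a Gap
    pub delivered: Vec<u64>,
}

impl ReaderModel {
    pub fn new() -> Self {
        ReaderModel {
            next_expected: 1, // assumption: numbering starts at 1
            held: BTreeSet::new(),
            skipped: BTreeSet::new(),
            delivered: Vec::new(),
        }
    }

    pub fn handle(&mut self, event: Event) {
        match event {
            Event::Data(sn) if sn >= self.next_expected => {
                self.held.insert(sn);
            }
            Event::Gap(sn) if sn >= self.next_expected => {
                self.skipped.insert(sn);
            }
            _ => {} // duplicate, or already delivered/skipped
        }
        // Advance as far as the contiguous prefix allows.
        loop {
            let sn = self.next_expected;
            if self.held.remove(&sn) {
                self.delivered.push(sn);
            } else if !self.skipped.remove(&sn) {
                break; // still waiting for `sn`
            }
            self.next_expected += 1;
        }
    }

    /// Sequence numbers the reader would ask the writer to retransmit.
    pub fn missing(&self) -> Vec<u64> {
        let highest = self.held.iter().chain(self.skipped.iter()).max().copied();
        match highest {
            Some(h) => (self.next_expected..h)
                .filter(|sn| !self.held.contains(sn) && !self.skipped.contains(sn))
                .collect(),
            None => Vec::new(),
        }
    }
}

/// Feed a sequence of events to a fresh reader and return what was delivered.
pub fn replay(events: &[Event]) -> Vec<u64> {
    let mut reader = ReaderModel::new();
    for &event in events {
        reader.handle(event);
    }
    reader.delivered
}
```

In this model, replaying Data(1), Data(3) delivers only sample 1 and reports 2 as missing. A following Data(2) releases 2 and then 3, whereas a following Gap(2) releases 3 alone, so sample 2 is never delivered.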
Summary
In Reliable mode, if sample A is sent first but not received, then sample B is sent, the middleware will drop sample B, waiting for sample A, to make sure to have a proper history. Then, sample B should be re-emitted and received by the DataReader. This doesn't seem to be the case.
It seems that sample A is never received, even if it is re-sent.
Steps to reproduce
Test setup configuration
Tested on Ubuntu 22.04. The test uses iptables to drop packets.
We give iptables rights to the user executing the test:
/etc/sudoers
Minimal reproducible example
minimal_reproducible_example.zip
The following diagram explains the minimal reproducible example:
On the network capture, we see that samples are re-sent correctly, and ACK packets are good too, but we don't see any logs about receiving Sample 2 in the application.
What is the expected correct behavior?
Receiving application receives all samples.
Relevant logs and/or screenshots
Minimal reproducible example logs:
Minimal reproducible example commented network capture
Complete test plan sent by e-mail (Test Reliability_QOS_3).
Analysis
There must be a problem in the user API of RustDDS, between the RTPS reception and the addition to the sample queue. We may want to take a look at src/dds/with_key/datasample_cache.rs.