Fix/dbus call issue #2838

YJDoc2 · 2024-06-30T11:19:07Z

This fixes the issues with dbus method call not receiving reply. The RC was that the current implementation stopped after receiving one message, and did not continue. Thus in some cases the signal type message would be returned, and we would discard it as its not the method reply type. As we did not look for further messages, it seemed that there was no reply from dbus. Now if the sent message was of type method call, we keep looping until we get either a method reply or method error type of message. Ideally the correct way would be to keep collecting messages in a stateful buffer and return when we get a message corresponding to the serial number we sent, but it can be tricky to implement that correctly, and at least for now we do not need that.

This should still be considered WIP, I needed review so opened this, but I need to add some related shell scripts I used for future needs.

Apart from that this also has fix for when intermediate process would error and leave a broken socket, without any error. Now it properly attempts to send the error message over the socket so that main process can correctly identify and report it.

Signed-off-by: Yashodhan Joshi <[email protected]>

YJDoc2 · 2024-06-30T13:35:21Z

crates/libcgroups/src/systemd/dbus_native/dbus.rs

+            // in Youki, we only ever do method call apart from initial auth
+            // in case it is, we don't really have a specific message to look
+            // out of, so we take the buffer and break
+            if mtype != MessageType::MethodCall {
+                break;
+            }
+
+            // check if any of the received message is method return or error type
+            let return_message_count = ret
+                .iter()
+                .filter(|m| {
+                    m.preamble.mtype == MessageType::MethodReturn
+                        || m.preamble.mtype == MessageType::Error
+                })
+                .count();
+
+            if return_message_count > 0 {
+                break;
+            }


I'd like suggestion for other ways to implement this. Ideally I feel that send_message function should not have to deal with the specific message type, but as we collect the response here, we need to account for it. Other way I can think is if we separate the send message and get response functions, so that the higher level calling functions can choose to loop over the get response. But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?

But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?

I do not think it is. The complexity of should be contained in send_message. The caller should not have to think about it.

YJDoc2 · 2024-07-04T06:44:55Z

As per the #2826 (comment) , this PR likely solves the issue, so requesting a review of the code now (not urgent).

Furisto · 2024-07-05T15:35:34Z

crates/libcgroups/src/systemd/dbus_native/dbus.rs


+            let reply_rcvd = match reply_res {


Furisto · 2024-07-05T15:51:09Z

crates/libcgroups/src/systemd/dbus_native/dbus.rs

+            // in Youki, we only ever do method call apart from initial auth
+            // in case it is, we don't really have a specific message to look
+            // out of, so we take the buffer and break
+            if mtype != MessageType::MethodCall {
+                break;
+            }
+
+            // check if any of the received message is method return or error type
+            let return_message_count = ret
+                .iter()
+                .filter(|m| {
+                    m.preamble.mtype == MessageType::MethodReturn
+                        || m.preamble.mtype == MessageType::Error
+                })
+                .count();
+
+            if return_message_count > 0 {
+                break;
+            }


But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?

I do not think it is. The complexity of should be contained in send_message. The caller should not have to think about it.

Furisto · 2024-07-05T15:53:48Z

crates/libcgroups/src/systemd/dbus_native/dbus.rs

+        // for method calls, we need to have an error or method return type message, so
+        // we keep looping until we get either of these. see https://github.com/containers/youki/issues/2826
+        // for more detailed analysis.
+        loop {


I am not sure I understand why this loop is necessary. Shouldn't receive_complete_response have read everything from the socket the first time?

The way receive_complete_response is implemented, it loops internally to read from socket into a buffer, and if read_bytes < buffer_size, it returns the buffer. This is because dbus uses same socket for communications, and the read will block if there is no further data to read. Hence we return once the received data is less than buffer.
Unfortunately, the dbus messages can come in multiple bursts, so we can have two messages that get delivered together, or they can have few milliseconds between them, and thus one read call might only give single message < size of buffer. We need to loop here to keep consuming messages until we find one that we are looking for, which in this case is the method_return or error type.

Furisto · 2024-07-05T15:54:28Z

crates/libcgroups/src/systemd/dbus_native/dbus.rs

-            buf = &buf[ctr..];
-            ret.push(msg);
+
+        // it is possible that while receiving messages, we get some extra/previous message


Do we know why we get these extra messages?

As of now the only extra messages that are observed are of type signal, which basically broadcast some event to all connected clients. The thing is we used to get these before as well, but usually these would get consumed in buffers of some other method calls or such, and get ignored. In few cases that would not happen, which lead to the dbus error in question. To solve this, now we keep consuming all messages until we get method return or error message (in case of method call message).

Furisto · 2024-07-09T21:53:26Z

Thanks @YJDoc2 !

loop in recieving messages for dbus method call

56753c3

Signed-off-by: Yashodhan Joshi <[email protected]>

YJDoc2 added the kind/bug label Jun 30, 2024

send error if intermediate fails proeprly to the main process

1d94b9b

Signed-off-by: Yashodhan Joshi <[email protected]>

YJDoc2 force-pushed the fix/dbus_call_issue branch from c1843c2 to 1d94b9b Compare June 30, 2024 12:18

YJDoc2 mentioned this pull request Jun 30, 2024

bug: unreliable behavior in stress tests failed to delete container #2826

Closed

YJDoc2 commented Jun 30, 2024

View reviewed changes

YJDoc2 requested a review from a team July 4, 2024 06:45

Furisto reviewed Jul 5, 2024

View reviewed changes

Furisto approved these changes Jul 9, 2024

View reviewed changes

Furisto merged commit d6c5996 into youki-dev:main Jul 9, 2024
28 checks passed

github-actions bot mentioned this pull request Jul 9, 2024

Release for v0.4.0 #2791

Merged

YJDoc2 mentioned this pull request Jul 11, 2024

Youki does not have reply from dbus #2559

Closed

github-actions bot mentioned this pull request Aug 23, 2024

Release for v0.4.0 #2891

Closed

YJDoc2 deleted the fix/dbus_call_issue branch September 18, 2024 12:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/dbus call issue #2838

Fix/dbus call issue #2838

YJDoc2 commented Jun 30, 2024 •

edited

Loading

YJDoc2 Jun 30, 2024

Furisto Jul 5, 2024

YJDoc2 commented Jul 4, 2024

Furisto Jul 5, 2024

Furisto Jul 5, 2024

Furisto Jul 5, 2024

YJDoc2 Jul 7, 2024

Furisto Jul 5, 2024

YJDoc2 Jul 7, 2024

Furisto commented Jul 9, 2024

Fix/dbus call issue #2838

Fix/dbus call issue #2838

Conversation

YJDoc2 commented Jun 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

YJDoc2 commented Jul 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Furisto commented Jul 9, 2024

YJDoc2 commented Jun 30, 2024 •

edited

Loading