Fix/dbus call issue #2838
Signed-off-by: Yashodhan Joshi <[email protected]>
c1843c2 to 1d94b9b
// in Youki, we only ever do method call apart from initial auth
// in case it is, we don't really have a specific message to look
// out of, so we take the buffer and break
if mtype != MessageType::MethodCall {
    break;
}

// check if any of the received message is method return or error type
let return_message_count = ret
    .iter()
    .filter(|m| {
        m.preamble.mtype == MessageType::MethodReturn
            || m.preamble.mtype == MessageType::Error
    })
    .count();

if return_message_count > 0 {
    break;
}
I'd like suggestions for other ways to implement this. Ideally, I feel that the send_message function should not have to deal with specific message types, but since we collect the response here, we need to account for it. The other way I can think of is to separate the send-message and get-response functions, so that the higher-level calling functions can choose to loop over the get-response call. But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?
But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?
I do not think it is. The complexity should be contained in send_message.
The caller should not have to think about it.
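To make the trade-off concrete, here is a minimal sketch of the "keep it inside send_message" option. It is not Youki's code: the `Transport` trait, `Client`, `MockTransport`, and the `reply:`/`signal:` string prefixes are all hypothetical stand-ins for the real socket and message types. The point is only that the send step and the wait-for-reply step can exist as separate private helpers while callers still make a single call.

```rust
use std::collections::VecDeque;

/// Hypothetical transport: sends a payload and hands back the next incoming message.
trait Transport {
    fn send(&mut self, payload: &str);
    fn next_message(&mut self) -> Option<String>;
}

struct Client<T: Transport> {
    transport: T,
}

impl<T: Transport> Client<T> {
    // Private building block: only writes the message.
    fn send_only(&mut self, payload: &str) {
        self.transport.send(payload);
    }

    // Private building block: skips anything that is not a reply.
    fn wait_for_reply(&mut self) -> Option<String> {
        while let Some(msg) = self.transport.next_message() {
            if msg.starts_with("reply:") {
                return Some(msg);
            }
        }
        None
    }

    /// Single entry point for callers: send, then wait for the reply.
    fn send_message(&mut self, payload: &str) -> Option<String> {
        self.send_only(payload);
        self.wait_for_reply()
    }
}

/// In-memory transport used only for this illustration.
struct MockTransport {
    sent: Vec<String>,
    inbox: VecDeque<String>,
}

impl Transport for MockTransport {
    fn send(&mut self, payload: &str) {
        self.sent.push(payload.to_string());
    }

    fn next_message(&mut self) -> Option<String> {
        self.inbox.pop_front()
    }
}
```

With this shape, a caller never sees the intermediate signal: it calls `send_message` once and gets the reply or `None`.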
As per #2826 (comment), this PR likely solves the issue, so I am requesting a review of the code now (not urgent).
let reply_rcvd = match reply_res {
👍
// for method calls, we need to have an error or method return type message, so
// we keep looping until we get either of these. see https://github.com/containers/youki/issues/2826
// for more detailed analysis.
loop {
I am not sure I understand why this loop is necessary. Shouldn't receive_complete_response have read everything from the socket the first time?
The way receive_complete_response is implemented, it loops internally to read from the socket into a buffer, and if read_bytes < buffer_size, it returns the buffer. This is because dbus uses the same socket for all communication, and the read will block if there is no further data to read. Hence we return once the received data is smaller than the buffer.
Unfortunately, dbus messages can arrive in multiple bursts: two messages may be delivered together, or they may arrive a few milliseconds apart, in which case one read call might return only a single message that is smaller than the buffer. We need to loop here to keep consuming messages until we find the one we are looking for, which in this case is the method_return or error type.
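The behaviour described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the PR's code: the `Message` struct, `receive_burst`, and `collect_until_reply` are hypothetical names, and the socket is modelled as a queue of bursts, so an empty queue ends the loop where a real socket read would block.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageType {
    MethodCall,
    MethodReturn,
    Error,
    Signal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Message {
    mtype: MessageType,
    body: String,
}

/// Hypothetical stand-in for one receive call: returns whatever
/// messages arrived together in the next burst, if any.
fn receive_burst(socket: &mut VecDeque<Vec<Message>>) -> Option<Vec<Message>> {
    socket.pop_front()
}

/// Keeps consuming bursts until a method return or error shows up,
/// but only when the message we sent was a method call.
fn collect_until_reply(
    sent: MessageType,
    socket: &mut VecDeque<Vec<Message>>,
) -> Vec<Message> {
    let mut ret = Vec::new();
    while let Some(burst) = receive_burst(socket) {
        ret.extend(burst);

        // For anything other than a method call there is no specific
        // reply to wait for, so one burst is enough.
        if sent != MessageType::MethodCall {
            break;
        }

        let has_reply = ret
            .iter()
            .any(|m| matches!(m.mtype, MessageType::MethodReturn | MessageType::Error));
        if has_reply {
            break;
        }
    }
    ret
}
```

If the first burst holds only a signal and the reply arrives in the second, a single receive would miss the reply, while the loop returns both messages and leaves later bursts unread.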
buf = &buf[ctr..];
ret.push(msg);

// it is possible that while receiving messages, we get some extra/previous message
Do we know why we get these extra messages?
As of now, the only extra messages observed are of type signal, which broadcast an event to all connected clients. We used to get these before as well, but usually they would be consumed in the buffers of some other method call and ignored. In a few cases that would not happen, which led to the dbus error in question. To solve this, we now keep consuming all messages until we get a method return or error message (when the sent message is a method call).
Thanks @YJDoc2!
This fixes the issue with dbus method calls not receiving a reply. The root cause was that the current implementation stopped after receiving one message and did not continue. Thus, in some cases a signal-type message would be returned, and we would discard it because it is not a method reply. As we did not look for further messages, it seemed that there was no reply from dbus. Now, if the sent message is a method call, we keep looping until we get either a method return or an error message. Ideally, the correct way would be to keep collecting messages in a stateful buffer and return when we get a message corresponding to the serial number we sent, but that can be tricky to implement correctly, and at least for now we do not need it.
This should still be considered WIP. I needed a review, so I opened this, but I still need to add some related shell scripts I used, for future needs.
Apart from that, this also includes a fix for the case where the intermediate process would error out and leave a broken socket without reporting any error. Now it properly attempts to send the error message over the socket so that the main process can correctly identify and report it.
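For reference, the serial-number approach mentioned above could look something like the sketch below. It is not part of this PR: `ReplyBuffer`, `Msg`, and `Kind` are hypothetical names, and the sketch ignores parsing entirely. It relies only on the fact that D-Bus method returns and errors carry a reply serial that matches the serial of the originating call.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Kind {
    MethodReturn,
    Error,
    Signal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Msg {
    kind: Kind,
    /// Serial of the call this message answers; `None` for signals.
    reply_serial: Option<u32>,
}

/// Hypothetical stateful buffer that outlives individual calls, so a
/// reply that arrives early (or alongside signals) is not lost.
struct ReplyBuffer {
    pending: Vec<Msg>,
}

impl ReplyBuffer {
    fn new() -> Self {
        ReplyBuffer { pending: Vec::new() }
    }

    /// Stores every message that was read from the socket.
    fn push(&mut self, msgs: impl IntoIterator<Item = Msg>) {
        self.pending.extend(msgs);
    }

    /// Removes and returns the method return or error that answers
    /// `serial`, leaving all other messages in the buffer.
    fn take_reply(&mut self, serial: u32) -> Option<Msg> {
        let idx = self
            .pending
            .iter()
            .position(|m| m.kind != Kind::Signal && m.reply_serial == Some(serial))?;
        Some(self.pending.remove(idx))
    }
}
```

A caller would push every received burst into the buffer and poll `take_reply` with the serial it sent until it gets `Some`. Deciding when to drop unclaimed signals is one of the parts that makes a real implementation tricky.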