Expected Behavior of Cluster Providers with Inaccessible Members #2034
-
After digging deeper into the Proto.Actor ecosystem, I took a closer look at the Kubernetes cluster provider. Its member monitor appears to work as expected: it returns only the cluster members that are actively running, and it seems to rely on Kubernetes itself for the status of each pod (running or not).

The Azure Container Apps cluster provider might benefit from a similar approach. In addition to reading from its internal member store, the provider could ping each member to verify that it is reachable. That would give a more accurate picture of active members and could reduce connection attempts to inaccessible ones.

I'd love to hear thoughts on this approach, or whether there are other considerations I'm overlooking.
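To make the ping idea concrete, here is a minimal sketch of a reachability filter. It is not Proto.Actor or Azure Container Apps provider code: the `(host, port)` member shape and the names `is_reachable` and `filter_reachable` are assumptions for illustration, and Python is used only for brevity. The probe is a plain TCP connect, which is one possible check among several.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Hypothetical member shape: (host, port). A real provider would carry more metadata.
Member = Tuple[str, int]


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def filter_reachable(
    members: Iterable[Member],
    probe: Callable[[str, int], bool] = is_reachable,
) -> List[Member]:
    """Keep only the members for which the probe succeeds, preserving order."""
    return [(host, port) for host, port in members if probe(host, port)]
```

A provider could run something like `filter_reachable(members_from_store)` before publishing the member list. The probe is injectable so that a different check (for example an HTTP health endpoint) can be swapped in; whether an extra probe per member is worth the added latency is a design question for the provider.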
-
Proto.Actor Cluster Providers - Insights

I had a very insightful conversation on the Slack channel about the behavior of cluster providers, specifically around managing inaccessible members. I thought it would be useful to share a summarized version here for the broader community.

Summary:
A big shoutout to everyone on Slack for the insights and feedback. Super helpful stuff. For those keen on diving deeper, the Slack thread has all the nitty-gritty details. Cheers
-
Hello Proto.Actor community,
I'm currently working with the Azure Container Apps cluster member provider and have encountered a scenario I'd like to clarify.
When the cluster provider fetches members from its internal member store, some of those members may no longer be accessible (e.g., a revision/pod that has been terminated). In my observation, the Proto.Remote.ServerConnector repeatedly attempts to connect to such members and eventually gives up after the connection failures.
This leads me to my main questions:
Is it the inherent responsibility of the cluster provider to ensure that it only returns members that are currently online?
Should the cluster provider actively monitor these members and update the cluster member list if any of them go offline?
I'm trying to understand if this behavior is by design or if there's an expectation for the cluster provider to manage the member list more proactively.
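To illustrate what "managing the member list more proactively" could mean, here is a small sketch that compares two successive snapshots of the member list and reports which members joined and which left. The function name `diff_members` and the use of plain string IDs are assumptions for illustration, not part of any Proto.Actor API.

```python
from typing import Iterable, Set, Tuple


def diff_members(
    previous: Iterable[str], current: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Compare two member snapshots and return (joined, left) as sets of member IDs."""
    prev, curr = set(previous), set(current)
    joined = curr - prev  # present now, absent before
    left = prev - curr    # present before, absent now
    return joined, left


# Example: member "b" went offline and member "c" appeared between two polls.
joined, left = diff_members(["a", "b"], ["a", "c"])
```

A provider that polls its member store on an interval could use the `left` set to drop offline members from the published list instead of waiting for connection attempts to fail.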
Thank you for your insights!