harunzengin changed the title from "Xandra crashes when a newly added node gets an :up message" to "Xandra crashes when a newly added node gets sends :up message" on Oct 29, 2024
harunzengin changed the title from "Xandra crashes when a newly added node gets sends :up message" to "Xandra crashes when a newly added node sends :up message" on Oct 29, 2024
Looking closer, this looks more like a failure in Cassandra gossip than in Xandra. We have added nodes before and didn't have any problems like this.
When a new host is added, we are supposed to get a NEW_NODE event in Xandra.Cluster.ControlConnection and consequently refresh the topology. But in this case, it seems like there was an error in Cassandra, so we didn't get that event, and somehow got a HOST_UP event instead, which made Xandra crash. Not sure whether we should actually be fixing this or let it crash? @whatyouhide
@harunzengin the C* gossip protocol seems quite messy with these kinds of events. It seems to happen really often that we don't get the events we're supposed to get. I think a safer course of action is to refresh the topology if we get a HOST_UP for a host we've never seen before, effectively treating it like a NEW_NODE event?
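The suggestion above can be sketched roughly as follows. This is a minimal, hypothetical sketch and not Xandra's actual code: the module name `HostUpSketch`, the `handle_event/2` function, the event tuples, and the idea of tracking known peers in a `MapSet` are all assumptions made for illustration. It only shows the decision being proposed: a HOST_UP for a known host marks it as up, while a HOST_UP for an unknown host falls back to the same action as NEW_NODE.

```elixir
defmodule HostUpSketch do
  @moduledoc """
  Hypothetical sketch only. The state shape and event tuples are assumptions,
  not the structures Xandra uses internally.
  """

  # `known_peers` is assumed to be a MapSet of {address, port} tuples
  # that the pool has already seen in a topology refresh.
  def handle_event(known_peers, {:host_up, address, port}) do
    peer = {address, port}

    if MapSet.member?(known_peers, peer) do
      # Known host: the usual path, just mark it as up.
      {:mark_up, peer}
    else
      # Unknown host: we probably missed its NEW_NODE event, so instead of
      # crashing we request a topology refresh, as NEW_NODE would.
      {:refresh_topology, peer}
    end
  end

  def handle_event(_known_peers, {:new_node, address, port}) do
    {:refresh_topology, {address, port}}
  end
end

known = MapSet.new([{{10, 0, 0, 1}, 9042}])

# Known host comes back up.
IO.inspect(HostUpSketch.handle_event(known, {:host_up, {10, 0, 0, 1}, 9042}))
# prints {:mark_up, {{10, 0, 0, 1}, 9042}}

# HOST_UP for a host we have never seen is treated like NEW_NODE.
IO.inspect(HostUpSketch.handle_event(known, {:host_up, {10, 0, 0, 2}, 9042}))
# prints {:refresh_topology, {{10, 0, 0, 2}, 9042}}
```

Returning an action tuple instead of performing the refresh directly keeps the sketch free of side effects; how the refresh would actually be triggered depends on the real control connection and pool code.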
Recently, we added a new node to Xandra which caused some issues. The trigger was this:

Looking closer, this is where we get the error: https://github.com/whatyouhide/xandra/blob/main/lib/xandra/cluster/pool.ex#L245

Since we get a HOST_UP event for a machine that we don't know about, Xandra crashes. Adding nodes to the Cassandra cluster shouldn't cause crashes though.