When the node goes down, it will close client connections (probably not always - if it dies unexpectedly it has no way to), and the connections in the driver will notice it. The logs look like this:
```
18:51:41,609 cassandra.io.libevreactor DEBUG libevreactor.py:373 | Connection <LibevConnection(140694180404560) 127.0.10.1:9042> closed by server
18:51:41,609 cassandra.io.libevreactor DEBUG libevreactor.py:287 | Closing connection (140694180404560) to 127.0.10.1:9042
18:51:41,610 cassandra.io.libevreactor DEBUG libevreactor.py:291 | Closed socket to 127.0.10.1:9042
18:51:41,610 cassandra.io.libevreactor DEBUG libevreactor.py:373 | Connection <LibevConnection(140694185696976) 127.0.10.1:9042> closed by server
18:51:41,610 cassandra.io.libevreactor DEBUG libevreactor.py:287 | Closing connection (140694185696976) to 127.0.10.1:9042
18:51:41,610 cassandra.io.libevreactor DEBUG libevreactor.py:291 | Closed socket to 127.0.10.1:9042
18:51:41,610 cassandra.io.libevreactor DEBUG libevreactor.py:373 | Connection <LibevConnection(140694185158224) 127.0.10.1:19042> closed by server
18:51:41,610 cassandra.io.libevreactor DEBUG libevreactor.py:287 | Closing connection (140694185158224) to 127.0.10.1:19042
18:51:41,611 cassandra.io.libevreactor DEBUG libevreactor.py:291 | Closed socket to 127.0.10.1:19042
18:51:41,611 cassandra.io.libevreactor DEBUG libevreactor.py:373 | Connection <LibevConnection(140694180402832) 127.0.10.1:19042> closed by server
18:51:41,611 cassandra.io.libevreactor DEBUG libevreactor.py:287 | Closing connection (140694180402832) to 127.0.10.1:19042
18:51:41,611 cassandra.io.libevreactor DEBUG libevreactor.py:291 | Closed socket to 127.0.10.1:19042
```
The problem is that the information about those connections closing is not propagated anywhere: the driver still thinks it has a fully functioning connection pool, and if the dead node was the one the driver had its control connection open to, the driver also still thinks it has a functioning control connection and keeps waiting for events.
The driver will only notice that those connections are dead when it tries to use them - to send a heartbeat, run a CQL query, refresh the schema, etc.
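As a partial mitigation (not a fix), the detection window can be shortened by making the driver exercise idle connections more often via the idle heartbeat, and a host state listener makes it visible when the host is finally marked down. A minimal sketch using the Python driver's `Cluster` options and `HostStateListener`; the contact point and interval values below are placeholders:

```python
from cassandra.cluster import Cluster
from cassandra.policies import HostStateListener


class LogHostState(HostStateListener):
    """Logs when the driver finally marks a host up/down."""

    def on_up(self, host):
        print("host up:", host)

    def on_down(self, host):
        print("host down:", host)

    def on_add(self, host):
        print("host added:", host)

    def on_remove(self, host):
        print("host removed:", host)


cluster = Cluster(
    contact_points=["127.0.10.1"],
    idle_heartbeat_interval=5,  # default is 30s; a shorter interval means dead connections are probed sooner
    idle_heartbeat_timeout=5,
)
cluster.register_listener(LogHostState())
session = cluster.connect()
```

This only narrows the window in which the pool looks healthy while it isn't; the underlying propagation problem described below remains.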
TCP keep-alive is not the solution here. The connection itself (by connection I mean an instance of the Connection class) was closed gracefully, and the connection knows that it was closed.
The issue is that the connection doesn't propagate this information to the Cluster object.
Discovered when investigating https://github.com/scylladb/scylla-dtest/issues/4364
This is a problem in the scenario exercised in https://github.com/scylladb/scylla-dtest/issues/4364.
What the driver should do is propagate the information from the single connection upwards and reopen the connections / mark the host as down, roughly as in the sketch below.
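A hypothetical sketch of that propagation, not the driver's actual code: the names (`OwnedConnection`, `ConnectionPool`, `on_connection_closed`, `_try_reopen`, `mark_down`) are illustrative only. The point is that the reactor-level close handler calls back into the owning pool, which can then reopen the connection or mark the host down, instead of the close being swallowed at the Connection level:

```python
class OwnedConnection:
    def __init__(self, host, on_closed):
        self.host = host
        self.is_closed = False
        self._on_closed = on_closed          # callback installed by the owning pool

    def handle_close(self):
        """Called by the reactor when the server closes the socket
        ("closed by server" in the logs above)."""
        self.is_closed = True
        self._on_closed(self)                # propagate upwards immediately


class ConnectionPool:
    def __init__(self, host, mark_down):
        self.host = host
        self._mark_down = mark_down          # e.g. a callback that marks the host down in the cluster
        self.connections = []

    def open_connection(self):
        conn = OwnedConnection(self.host, self.on_connection_closed)
        self.connections.append(conn)
        return conn

    def on_connection_closed(self, conn):
        # Remove the dead connection, then either reopen it or, if nothing
        # is left, tell the cluster the host is down.
        self.connections.remove(conn)
        if not self._try_reopen() and not self.connections:
            self._mark_down(self.host)

    def _try_reopen(self):
        # Placeholder: a real pool would attempt to open a replacement
        # connection here and return True on success.
        return False
```

The same callback could be installed by the control connection, so that losing the node it is attached to immediately triggers reconnecting the control connection to another host.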