-
-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix IPv6 support in Xandra.Cluster
#328
Fix IPv6 support in Xandra.Cluster
#328
Conversation
…peer_address configuration
We don't just need to validate the IPv6 addresses. As shown here, we need to actually be able to use the RPC addresses because the addresses shown in |
lib/xandra/cluster.ex
Outdated
@@ -345,6 +345,15 @@ defmodule Xandra.Cluster do | |||
*Available since v0.18.0*. | |||
""" | |||
], | |||
use_rpc_address_for_peer_address: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the cases when we'd want to use peer
instead of rpc_address
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peer
was the original implementation. I don't know why you would use peer
instead of rpc_address
and not much information when I google this, so going to phone a friend. @relistan do you know or think this should just solely use rpc_address
and no longer use peer
? (I think that is what @whatyouhide is asking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I’m saying. @jacquelineIO I also recommend doing just what any other C* driver maintained by Datastax is doing. I always reference the Python driver. I don't have much time to look into this until maybe Sunday, so if you want you can just take a look there and use what they're using 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is using rpc_address
with a fallback to peer
if rpc_address
isn't available
- https://github.com/datastax/python-driver/blob/8ba0a5ccd71b66c77ee58994ed9da9ea34ff4cbe/cassandra/cluster.py#L3782
- https://github.com/datastax/python-driver/blob/8ba0a5ccd71b66c77ee58994ed9da9ea34ff4cbe/cassandra/metadata.py#L3381
So I can remove this flag and have a preference for using rpc_address
, which the code is basically already set up to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's do that so that we don't have to put this decision on users 🙃 Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, catching up here. That sounds right to me, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacquelineIO any news on this and timeline on when I can expect these changes to go in? I'm planning the 0.18.0 release, that's why I ask 🙃
queried_peer_to_host(peer_attrs, use_rpc_address) | ||
end | ||
|
||
defp queried_peer_to_host(%{"rpc_address" => _} = peer_attrs, true = use_rpc_address) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is it that we get rpc_address
as a string? Xandra should return :inet.ip_address/0
for IPs, with a 4-element tuple for IPv4 and 8-element tuple for IPv6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the peers table is queried, it was using the string.
For example this was one of the logs (when address is ipv6 this was another issue because the validation for ipv6 strings fails in xandra)
[] 21:35:58.910 [info] Xandra: [:connected]: %{address: '10.4.0.12', connection_name: nil, port: 11172, protocol_module: Xandra.Protocol.V4, supported_options: %{"COMPRESSION" => ["snappy", "lz4"], "CQL_VERSION" => ["3.4.5"], "PROTOCOL_VERSIONS" => ["3/v3", "4/v4", "5/v5", "6/v6-beta"]}}
Now the log looks like
[] 16:52:41.083 [info] Xandra Cluster: [:change_event]: %{cluster_name: {:local, FastSummariesConsumer.Store}, cluster_pid: #PID<0.5488.0>, event_type: :host_added, host: %Xandra.Cluster.Host{address: {10, 4, 0, 12},
There is still another place where strings are being used for the original host for the control connection if I'm not mistaken but it broke a lot of tests trying to change that out. That might be good for a different PR to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other location where string is in use https://github.com/Shimmur/xandra/blob/c563d737b99bcdec195a38408db490fec046a5e8/lib/xandra/cluster/pool.ex#L469 and I think might be good to convert this to a tuple as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the peers table is queried, it was using the string.
I’m confused. If you look at the schema for the system.peers
table in C* itself, rpc_address
is of type inet
, so how can the result of the query to that table include rpc_address
as a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also adding the validate_node options. There was only one previously and it took a string https://github.com/lexhide/xandra/blob/main/lib/xandra/options_validators.ex#L51
if it were not a string, it would have failed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about in this particular case, but we were seeing IPv6 addresses getting returned as strings. This was then causing DNS lookups which failed. @jacquelineIO can confirm if this is the reason or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide you are right about the query returning the inet format.
It looks like there are some back n forth translations happening, and maybe it shouldn't be.
https://github.com/lexhide/xandra/blob/96303f07bae820ff693d85f2e1585cb343118e00/lib/xandra/cluster/pool.ex#L464-L466 part of the code is formatting the :node
as a string.
e.g.
host = %Xandra.Cluster.Host{address: {10, 4, 0, 12}, port: 1234}
Xandra.Cluster.Host.format_address(host)
"10.4.0.12:1234"
so conn_options
has a string and the id
is a tuple. https://github.com/lexhide/xandra/blob/96303f07bae820ff693d85f2e1585cb343118e00/lib/xandra/cluster/pool.ex#L469-L471
When validate_node
is run on the node address it is fine because it was a string.
So now I'm confused too. I do not completely follow the flow here...
I assumed this was pulling a string and wanted to make sure it was only using a tuple. BUT then also validate node started failing (if my memory is correct) so I had to update that too. So there might be multiple (at least one more) places in the code that a connection is happening, the first one beingstart_pool
...
I got the desired result but the original connection is still a string. I this it makes sense to not use format_address
for code (maybe only for logging things).
(I also feel like I'm talking in circles)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacquelineIO okay yeah this code I have in Xandra is a little bit messy. I'll do another pass and fix this up later on then, so that we can standardize on having a single way of representing addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide that went in already 8340b4d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool awesome! Will review this soon then
Xandra.Cluster
@whatyouhide those test failures seem to have occurred with ScyllaDB. I am suspicious about that. Is there a way to get it to re-run these without committing? |
@relistan yep I can re-run the CI suite any time, if you can't (top-right on GH Actions) because you don't have commit access to the repo then just ping me and I'll do it for ya 🙃 |
I can't trigger a rebuild. But it looks like @jacquelineIO needs to run |
So these test runs feel sporadic, I was running the formatter and it was down to 1 build failure (not counting the linter) now it's up to 4. And previously I had none before taking out the option to use rpc_address. So I'm not sure what to do here. |
We have some flaky tests (check the issue tracker), so I would suggest to not worry about tests too much for now, let's get the changes in and when we're ready I can take a look at the failures and merge even if they're there, no worries 🙃 |
The change to use |
Thanks @jacquelineIO! 💟 |
rpc_address
when availableXandra will now use tuples for IP addresses instead of strings, which are valid Erlang types. And the tuples are able to run through node validations.
History of why we started working on these changes:
Started this work on 0.16 of Xandra with this PR. Even with these changes we still encountered problems with deploying and Karl opened up an issue with Xandra.
New changes were released for Xandra and reapplying the old PR changes to v0.17 of Xandra.
We are using this lib and the host structs are returning ipv6 addresses for our clusters. The
validate_node
function is failing on ipv6 addresses"The pool supervisor will run the validations again on the ipv6 values returned in https://github.com/lexhide/xandra/blob/v0.16.0/lib/xandra/cluster.ex#L793 and in the logs we see that our hosts are returned with both ipv4 and ipv6 addresses. The ipv6 addresses are being used and fail to validate." - @britto
These changes are tagged v0.17.x-community.1.