-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v2.1: excludes deprecated values from pull responses (backport of #3653) #4981
base: v2.1
Are you sure you want to change the base?
Conversation
(cherry picked from commit cba5d47) # Conflicts: # gossip/src/crds_data.rs
Cherry-pick of cba5d47 has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
5ed7771
to
d0c4563
Compare
d0c4563
to
5c32f11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
@@ -435,7 +454,7 @@ impl Version { | |||
|
|||
#[cfg_attr(feature = "frozen-abi", derive(AbiExample))] | |||
#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize)] | |||
pub struct NodeInstance { | |||
pub(crate) struct NodeInstance { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we still need NodeInstance::new()
. Although looks like needed for tests...
can we get bp justification? we're skipping testnet here no? do we have metrics for frequency at which these messages are being propagated? |
justification for removal of NodeInstance is that we need to speed up the removal of NodeInstance in gossip. Vast majority of gossip traffic spikes are from NodeInstance. This value is deprecated. TBH tho, we don't know yet why we are seeing huge spikes in NodeInstance yet. |
I will re-check all the assumptions leading up to this just in case. It may be that this is actually part of the cause and not of the solution. |
Regardless of what is causing the spikes, we need to get the deprecated values out of gossip asap because they are maintenance burden and a waste of bandwidth and can be a source of bugs/issues. I will also assure you a change like this is a lot more impactful and effective than a bounded channel which indiscriminately drops packets. |
@alexpyattaev can you also please help with debugging gossip spikes as well? |
Collected gossip during a few spikes, investigating. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks appropriate for removing deprecated stuff from PullResponses, but I think it will do nothing for the current spikes. Spikes in load are 90% push messages, not pull responses, so my understanding is that these things are unrelated. Though I would prefer similar logic to be applied to push messages first to mitigate ongoing problems.
There are 2 steps we need to take:
We cannot do (2) before the cluster has upgraded to (1) because then whatever is not pushed is returned in pull-responses anyways which adds even more overhead since pull requests are more expensive to process. Also we cannot upgrade the cluster atomically to the next release, and that is why we need to do (1) and (2) in separate releases, and wait for the cluster to adequately upgrade to (1) first. Otherwise during the upgrade process where half of the cluster is running the old code, there will be a lot more packets because of nodes running old version sending the deprecated types to other nodes through pull-responses. |
Problem
Shouldn't waste bandwidth on propagating deperecated values.
Summary of Changes
The commit excludes deprecated values from pull responses.
This is an automatic backport of pull request #3653 done by Mergify.