Support version skew between Antrea Agent and Flow Aggregator #6777

Open
antoninbas opened this issue Oct 29, 2024 · 1 comment
Labels: area/flow-visibility/aggregator, area/flow-visibility, kind/feature

antoninbas commented Oct 29, 2024

Describe the problem/challenge you have
At the moment, we do not support version skew between the Antrea Agent and the Flow Aggregator. Let's take a concrete example:
In Antrea v2.0, we introduced a new Information Element (IE), egressNodeName (#6012). This one happened to land in a major version release, but we also routinely introduce new IEs in minor version releases. If one updates the Antrea Agent (from v1.15 to v2.0) before the Flow Aggregator, the existing Flow Aggregator (v1.15) will reject the new IPFIX templates sent by the new Antrea Agent (v2.0); see https://github.com/vmware/go-ipfix/blob/main/pkg/collector/process.go. If one updates the Flow Aggregator first, it will not work either: the aggregation process reuses the IPFIX data "record" received from the Agent, and that record will not match the template sent by the Flow Aggregator to the external IPFIX collector.
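To make the second failure mode concrete, here is a minimal Go sketch under simplifying assumptions: a template is modeled as a list of IE names and a record as a map keyed by IE name. The `Template` and `Record` types and `matchesTemplate` are hypothetical and are not the go-ipfix API (real templates also carry element IDs, enterprise IDs and lengths). It shows that a record from a v1.15 Agent, which lacks egressNodeName, does not line up with the template a v2.0 Flow Aggregator would export.

```go
package main

import "fmt"

// Template is a hypothetical, simplified IPFIX template: the list of IE names it declares.
type Template []string

// Record is a hypothetical decoded data record, keyed by IE name.
type Record map[string]any

// matchesTemplate reports whether the record carries exactly the IEs declared
// by the template: every template IE is present and there are no extra IEs
// (assuming IE names in a template are unique).
func matchesTemplate(t Template, r Record) bool {
	if len(r) != len(t) {
		return false
	}
	for _, name := range t {
		if _, ok := r[name]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Template exported by an updated (v2.0) Flow Aggregator, which knows egressNodeName.
	aggregatorTemplate := Template{"sourceIPv4Address", "destinationIPv4Address", "egressNodeName"}
	// Record received from an older (v1.15) Agent and reused as-is by aggregation.
	oldAgentRecord := Record{"sourceIPv4Address": "10.0.0.1", "destinationIPv4Address": "10.0.0.2"}
	fmt.Println(matchesTemplate(aggregatorTemplate, oldAgentRecord)) // false
}
```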

For large clusters, a rolling update of the antrea-agent DaemonSet can take a while, so a version mismatch between some Agents and the Flow Aggregator is expected and that situation will remain until the update completes.

Describe the solution you'd like
We should tolerate some version skew between the Antrea Agent and the Flow Aggregator (N-2/N+2) to allow graceful updates and to ensure that connection data can still be exported during the update window.

For example:

  1. The Flow Aggregator could gracefully discard unknown IEs in the records received from the Agents, in order to support "newer" Agents
  2. The Flow Aggregator could add missing IEs using a default value in the records received from the Agents, in order to support "older" Agents
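Both strategies can be sketched as a single normalization step applied to each received record before aggregation. This is a minimal Go illustration, assuming records are modeled as maps keyed by IE name and that the Flow Aggregator knows the list of IEs it exports; `Record`, `normalizeRecord` and the `defaults` map are hypothetical names, not existing Antrea or go-ipfix code.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical decoded data record, keyed by IE name.
type Record map[string]any

// normalizeRecord returns a copy of r restricted to the IEs in known (the IEs
// this Flow Aggregator version exports):
//   - strategy 1: IEs absent from known (sent by a newer Agent) are dropped;
//   - strategy 2: IEs in known but absent from r (older Agent) are filled
//     from defaults when a default exists, and reported as missing otherwise.
func normalizeRecord(known []string, defaults map[string]any, r Record) (out Record, dropped, missing []string) {
	out = make(Record, len(known))
	knownSet := make(map[string]bool, len(known))
	for _, name := range known {
		knownSet[name] = true
		if v, ok := r[name]; ok {
			out[name] = v
		} else if d, ok := defaults[name]; ok {
			out[name] = d
		} else {
			missing = append(missing, name)
		}
	}
	for name := range r {
		if !knownSet[name] {
			dropped = append(dropped, name)
		}
	}
	sort.Strings(dropped) // map iteration order is random; sort for stable output
	return out, dropped, missing
}

func main() {
	// IEs exported by an older (v1.15) Flow Aggregator.
	known := []string{"sourceIPv4Address", "destinationIPv4Address"}
	// Record from a newer (v2.0) Agent that also carries egressNodeName.
	newAgentRecord := Record{
		"sourceIPv4Address":      "10.0.0.1",
		"destinationIPv4Address": "10.0.0.2",
		"egressNodeName":         "node-1",
	}
	_, dropped, _ := normalizeRecord(known, nil, newAgentRecord)
	fmt.Println(dropped) // [egressNodeName]
}
```

The sketch deliberately leaves an IE unfilled when no default is configured, since choosing a meaningful default is the hard part of strategy 2.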

Because item 2 is more problematic than item 1 (what's an appropriate default value?), we could specify that in order to achieve a graceful update, the Flow Aggregator should be updated last. In that case, we would have version(FlowAggregator) <= version(Agent), and item 1 would be sufficient. Once the Flow Aggregator is itself updated, it will be able to "forward" the newly introduced IEs.
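The ordering rule can be checked mechanically, for example by a pre-update check. A minimal Go sketch, assuming versions are strings of the form vMAJOR.MINOR[.PATCH] (as in v1.15 and v2.0); `parseVersion` and `aggregatorUpdatedLast` are hypothetical helpers, and the sketch checks only the ordering, not the size of the N+2 window.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a hypothetical major.minor pair parsed from strings like "v1.15".
type version struct{ major, minor int }

// parseVersion parses "vMAJOR.MINOR" or "vMAJOR.MINOR.PATCH"; the patch part is ignored.
func parseVersion(s string) (version, error) {
	parts := strings.SplitN(strings.TrimPrefix(s, "v"), ".", 3)
	if len(parts) < 2 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return version{}, fmt.Errorf("invalid major version in %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return version{}, fmt.Errorf("invalid minor version in %q: %w", s, err)
	}
	return version{major, minor}, nil
}

// lessOrEqual reports whether a <= b, comparing major first, then minor.
func (a version) lessOrEqual(b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	return a.minor <= b.minor
}

// aggregatorUpdatedLast reports whether version(FlowAggregator) <= version(Agent),
// the ordering under which discarding unknown IEs is sufficient.
func aggregatorUpdatedLast(aggregator, agent string) (bool, error) {
	fa, err := parseVersion(aggregator)
	if err != nil {
		return false, err
	}
	ag, err := parseVersion(agent)
	if err != nil {
		return false, err
	}
	return fa.lessOrEqual(ag), nil
}

func main() {
	ok, err := aggregatorUpdatedLast("v1.15", "v2.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // true
}
```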

@antoninbas added the kind/feature, area/flow-visibility, and area/flow-visibility/aggregator labels on Oct 29, 2024.
@antoninbas (Contributor, Author) commented:

@tnqn @heanlan @yuntanghsu for visibility
