Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetworkPolicy Evaluation E2E Tests #6111

Merged
merged 1 commit into from
May 9, 2024
Merged

Conversation

qiyueyao
Copy link
Contributor

The PR will add E2E tests for networkpolicyevaluation introduced in #5740 .

This framework design and test example is for discussion before moving on.

@qiyueyao
Copy link
Contributor Author

Current Design

  • Adds flag --networkpolicy_evaluation="" or "basic" or "extended" when running e2e tests.
    • "": evaluation will be skipped
    • "basic": only a few number of evaluation test cases marked "basic" will be run
    • "extended": all predefined evaluation test cases marked "basic" and "extended" will be run (perhaps periodically on Github)
  • For each Antrea Policy test case, if we want to add evaluation test cases, create an npEvaluation matrix initialized with no test. expect with 1) from, 2) to Pods, 3) the expected NetworkPolicy name, 4) and the level of the test case "basic" or "extended"
  • When executing test cases, go through the evaluation matrix for evaluation tests to run. Use antctl remote access.

Work left

  • Review every Antrea Policy e2e test cases to add suitable evaluation test cases. Similar to the example given.
  • Add flags to scripts.

@qiyueyao qiyueyao force-pushed the e2e-npeval branch 2 times, most recently from 62f5a99 to c8d89cd Compare March 19, 2024 18:57
@qiyueyao
Copy link
Contributor Author

Hi @Dyanngg @tnqn @antoninbas , this is the framework for E2E tests of networkpolicyevaluation, I've included an example in ACNPPriorityOverride. Could you help pre-review the design before I add more test cases based on this? Thanks!

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my understanding is that with this proposal, we "piggy-back" NP evaluation testing on existing e2e policy tests, which sounds fine.
However, I do not understand why we want 2 different levels (basic vs extended). I think that just a boolean toggle would be good enough. We can choose to run e2e policy tests with or without NP evaluation.

@qiyueyao
Copy link
Contributor Author

However, I do not understand why we want 2 different levels (basic vs extended). I think that just a boolean toggle would be good enough. We can choose to run e2e policy tests with or without NP evaluation.

The idea was for "basic" tests, we run them triggered by each PR, for "extended" tests, we run them periodically to reduce test workload (since probing antctl for each evaluation adds to the time cost). How does that sound?

For each antreapolicy test, there are ~9*9 possible pairs to evaluate, we could pick one of the directly matched pair for "basic", some others for "extended" (pairs not matching the NP should return empty, other matched pairs should return correct NP etc.)

@antoninbas
Copy link
Contributor

The idea was for "basic" tests, we run them triggered by each PR, for "extended" tests, we run them periodically to reduce test workload (since probing antctl for each evaluation adds to the time cost). How does that sound?

The problem with periodic tests is that people rarely check them or act on failures.
How much time would it add concretely? I don't think we have to check every single pair for each subtest.

@qiyueyao
Copy link
Contributor Author

The idea was for "basic" tests, we run them triggered by each PR, for "extended" tests, we run them periodically to reduce test workload (since probing antctl for each evaluation adds to the time cost). How does that sound?

The problem with periodic tests is that people rarely check them or act on failures. How much time would it add concretely? I don't think we have to check every single pair for each subtest.

Sure, I'll first make some must test cases, run them on a binary condition. If they cause too much latency, we can then split them for extended.

@qiyueyao qiyueyao changed the title [ForDiscussion] NetworkPolicy Evaluation E2E Test Framework NetworkPolicy Evaluation E2E Tests Apr 9, 2024
@qiyueyao qiyueyao marked this pull request as ready for review April 9, 2024 20:27
@qiyueyao qiyueyao requested a review from Dyanngg April 9, 2024 20:27
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
@qiyueyao
Copy link
Contributor Author

qiyueyao commented Apr 16, 2024

Depends on #6218 and #6217 . Both merged.

@qiyueyao qiyueyao force-pushed the e2e-npeval branch 3 times, most recently from 6f50b55 to de419fd Compare April 29, 2024 18:50
@qiyueyao qiyueyao requested review from antoninbas, Dyanngg and tnqn April 29, 2024 20:23
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what was the impact on test runtime?

test/e2e/reachability.go Outdated Show resolved Hide resolved
Comment on lines 4304 to 4305
assert.Containsf(t, stdout, eval.NPName, "Failure -- wrong results for evaluation: NetworkPolicy: %s, expected: %s", stdout, eval.NPName)
assert.Contains(t, stdout, eval.Action, "Failure -- wrong results for evaluation: Action: %s, expected: %s", stdout, eval.Action)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should do something better here, doesn't the antctl command support structured output with -o yaml or -o json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas An issue arose for Isolate again, as currently Isolate action is filled in antctl when processing Table output. For Yaml and Json output, they directly marshel response from API, where action is still nil.

The only ideal way is to fill Isolate in API, but we had this discussion of not creating a real action for isolation. So I'm thinking just adding isolate string to RuleAction? This way it's not a real action but still fits the response.

reason: 1) antctl processes all Yaml and Json response together, so I assume it's not ideal to change that
2) I can also modify here in antreapolicy_test when comparing, but it would be confusing and Yaml and Json issue remains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I currently implemented this solution, let me know of any other suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot modify the API as part of an "e2e tests" PR. There needs to be a dedicated PR.

What should have been done when designing the API is that the Action field in the RuleRef struct used for the NetworkPolicyEvaluation response should not have been of type crdv1beta1.RuleAction:

Action *crdv1beta1.RuleAction `json:"action,omitempty" protobuf:"bytes,3,opt,name=action,casttype=antrea.io/antrea/pkg/apis/security/v1beta1.RuleAction"`

It should have been either of type string or a different type which would have included the Isolate action.

The API should also have been versioned with v1alpha1, instead of being made beta right away.

I feel like we should not modify the API at this stage, without introducing a new version (e.g. v1beta3). Given that this is quite a process, we should probably not do that simply for the sake of e2e testing. So we can revert back to your initial version and do string matching, or maybe do some more structured parsing with a regex.

That being said, I feel like other consumers of this API (besides antctl) may find it inconvenient that the "Isolate" action is not part of the API itself. We can wait until we get some feedback before making any change to the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks for the detailed instruction. I've reverted the change of API, and modified the test case with better processing of Table output.

test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antctl_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/reachability.go Outdated Show resolved Hide resolved
test/e2e/reachability.go Outdated Show resolved Hide resolved
test/e2e/reachability.go Outdated Show resolved Hide resolved
@qiyueyao
Copy link
Contributor Author

qiyueyao commented May 2, 2024

so what was the impact on test runtime?

For individual tests, there is no significant impact. For running Kind E2E tests, it is not identifiable when comparing with other full run all-features-enabled.

// Should not be possible.
action = "Unknown"
}
response.Rule.Action = (*v1beta1.RuleAction)(&action)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not good. We cannot cast to v1beta1.RuleAction knowing that "Isolate" and "Unknown" are not supposed to be possible options for v1beta1.RuleAction

Comment on lines 4304 to 4305
assert.Containsf(t, stdout, eval.NPName, "Failure -- wrong results for evaluation: NetworkPolicy: %s, expected: %s", stdout, eval.NPName)
assert.Contains(t, stdout, eval.Action, "Failure -- wrong results for evaluation: Action: %s, expected: %s", stdout, eval.Action)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot modify the API as part of an "e2e tests" PR. There needs to be a dedicated PR.

What should have been done when designing the API is that the Action field in the RuleRef struct used for the NetworkPolicyEvaluation response should not have been of type crdv1beta1.RuleAction:

Action *crdv1beta1.RuleAction `json:"action,omitempty" protobuf:"bytes,3,opt,name=action,casttype=antrea.io/antrea/pkg/apis/security/v1beta1.RuleAction"`

It should have been either of type string or a different type which would have included the Isolate action.

The API should also have been versioned with v1alpha1, instead of being made beta right away.

I feel like we should not modify the API at this stage, without introducing a new version (e.g. v1beta3). Given that this is quite a process, we should probably not do that simply for the sake of e2e testing. So we can revert back to your initial version and do string matching, or maybe do some more structured parsing with a regex.

That being said, I feel like other consumers of this API (besides antctl) may find it inconvenient that the "Isolate" action is not part of the API itself. We can wait until we get some feedback before making any change to the API.

test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one nit, otherwise LGTM
@Dyanngg could you take a look as well?

}
processResponse := func(stdout string) map[string]string {
// the first row is the header row, the second row contains the actual value
rows := strings.Split(stdout, "\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend just using rows := strings.Split(stdout, "\n").TrimSpace() then. This way we end up with 2 rows, and it's consistent with the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done.

antoninbas
antoninbas previously approved these changes May 7, 2024
Copy link
Contributor

@Dyanngg Dyanngg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit

test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
Comment on lines 4315 to 4316
assert.Equal(t, eval.NPName, gotEval["NAME"], "Failure -- wrong policy rule matched for evaluation")
assert.EqualValues(t, eval.Action, gotEval["ACTION"], "Failure -- wrong policy rule action for evaluation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.Equal(t, eval.NPName, gotEval["NAME"], "Failure -- wrong policy rule matched for evaluation")
assert.EqualValues(t, eval.Action, gotEval["ACTION"], "Failure -- wrong policy rule action for evaluation")
assert.Equal(t, eval.NPName, gotEval["NAME"], "Policy name mismatch in NetworkPolicyEvaluation response")
assert.EqualValues(t, eval.Action, gotEval["ACTION"], "Policy rule action mismatch in NetworkPolicyEvaluation response")

IMO, "wrong policy rule matched" is not accurate because policy rule != policy. A policy can have multiple rules, and we are checking the policy name here not the policy rule name? I don't think we even include the rule name in the API response, we just include the rule index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so perhaps I can also check rule index. As for Failure -- prefix, I was trying to keep it consistent with other error messages and easier to search if that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, you can keep the formatting that you want as long as you replace "policy rule" with "policy name" in the first message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the message, as for rule index, since all of them are 0 or maxint32 (even for the test-case that checks rule priority), I'll not complicate the tests.

Adds networkpolicyevaluation test options,
test steps and test cases into the existing
antreapolicy e2e tests.

Signed-off-by: Qiyue Yao <[email protected]>
@antoninbas
Copy link
Contributor

/test-e2e

Copy link
Contributor

@Dyanngg Dyanngg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well

@antoninbas antoninbas merged commit 38e0575 into antrea-io:main May 9, 2024
51 of 57 checks passed
@qiyueyao qiyueyao deleted the e2e-npeval branch May 11, 2024 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants