Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split GEP-713 into a Memorandum and separate GEPs for Direct and Inherited Policy #2813

Merged
merged 6 commits into from
Apr 10, 2024

Conversation

youngnick
Copy link
Contributor

/kind gep

What this PR does / why we need it:

This PR splits up GEP-713 (Metaresources and Policy Attachment), changing GEP-713 into a Memorandum about the general idea of Polkicy Attachment, with GEP-2648 and GEP-2649 being Direct Attached and Inherited Policy designs respectively.

The idea is to allow us to iterate on and graduate each type of Policy separately.

This is a big change, so I'll summarize the changes here:

  • GEP-713 is now a Memorandum GEP
  • GEP-2648 defines Direct Attached Policies and includes some changes to required and recommended status designs for that type
  • GEP-2649 now holds the documentation about defaults and overrides, and what makes an Inherited Policy
  • The definition of what makes a Policy Direct or Inherited has been updated:
    • Direct Policies now must only affect the targeted object
    • Inherited Policies are any Policy that affects objects other than the targeted one

Which issue(s) this PR fixes:
Updates #713
Updates #2648
Updates #2649

Does this PR introduce a user-facing change?:

The recommendations for the Policy Attachment pattern have been substantially updated and clarified.

I'm pretty sure this makes sense, but I've read this over so many times now that I can't see it properly, and I need review. Please feel free to ping me about this to discuss, or bring it to #sig-network-gateway-api or the community meeting.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 26, 2024
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 26, 2024
geps/gep-713/metadata.yaml Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2024
@youngnick youngnick force-pushed the policy-attachment-split branch from ecdffa7 to e06fa3d Compare February 27, 2024 06:04
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2024
@youngnick
Copy link
Contributor Author

I've merged in #2442's changes to the Direct Attached Policy document.

@youngnick youngnick force-pushed the policy-attachment-split branch from e06fa3d to a01cdd6 Compare February 27, 2024 06:05
@youngnick
Copy link
Contributor Author

And also fixed @gcs278's comments. Thanks!

Copy link
Contributor

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the revised GEP organization. I couldn't find any comments other than superficial markdown rendering issues and some minor typos. Reviewing these GEP updates helps me prepare for getting the Session Persistence Policy Attachment GEP through. Hope this helps.

geps/gep-713/index.md Outdated Show resolved Hide resolved
geps/gep-2648/index.md Show resolved Hide resolved
geps/gep-2648/index.md Outdated Show resolved Hide resolved
geps/gep-713/index.md Show resolved Hide resolved
geps/gep-2648/index.md Show resolved Hide resolved
geps/gep-2649/index.md Show resolved Hide resolved
geps/gep-2649/index.md Show resolved Hide resolved
geps/gep-2649/index.md Outdated Show resolved Hide resolved
geps/gep-2649/index.md Outdated Show resolved Hide resolved
geps/gep-2648/metadata.yaml Outdated Show resolved Hide resolved
geps/gep-2649/index.md Outdated Show resolved Hide resolved
geps/gep-2648/index.md Outdated Show resolved Hide resolved
@youngnick youngnick force-pushed the policy-attachment-split branch 2 times, most recently from 93cb69e to 40a7589 Compare February 29, 2024 07:28
@youngnick
Copy link
Contributor Author

youngnick commented Feb 29, 2024

I've got two TODOs left that I will get to early next week, both are relatively minor though, so please review away.

- Add changelog including the original PRs plus Flynn's one
- Update example to be BackendTLSPolicy

These TODOs are now done, this is ready for final review.

@youngnick youngnick force-pushed the policy-attachment-split branch from 40a7589 to 87efcb6 Compare March 5, 2024 06:36
using Direct Policy Attachment:

* The number or scope of objects is exactly _one_ object.
* The modifications to be made to the objects don’t have any transitive information -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its clear what it means to "only affect the single object". I don't feel like there are any definitions for what the scope of such objects are.

To make things more concrete, what if I apply a BackendTimeoutPolicy to a Gateway. What are the implications here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the backend timeouts are set per-backend, then it's an Inherited Policy, because there's no backend config in Gateway.

The intent here is to tightly constrain what a Direct Attached Policy is so that we don't have to boil the ocean of finding a good way to tell everyone whose objects are affected by a policy about that in the status of all the resources.

It's also about ensuring that we don't have "Direct Attached" Policy objects that effectively set defaults, even if the defaults stanza isn't used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more text here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goals make sense,I am just trying to fully grok the meaning of this since it seems like this is an important aspect

What does it mean to have a timeout set per-backend?

I can say logically that a timeout seems to apply to a backend in my mental model. And maybe in proxy implementations. But I could have an implementation that has one global value "timeout", and Gateway is the only logical place to put that policy in that implementation. Is that ok, or does that violate this?

What if my data plane implementation does happen to have per-backend config, but I don't want to deal with it so I only support gateway level config - is that ok?

More broadly, what exactly defined what is the scope of impact of an object? Is it the ability to attach a single type to multiple places? Is it the implementation details of how a policy is implemented? Is it a users mental model of how it works?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this question overlaps with the session persistence GEP @gcs278 has been working on (#2634). In that case we've found that some implementations (Envoy or GRPC-based in particular) can only attach this config at Route level. So even if a user attaches the config at the Backend layer, the underlying implementation is going to be implementing it at the Routing layer. Would that go against this guideline?

Similarly, that GEP currently proposes that Routes will include inline overrides that could override the values specified on a Service. That's not really policy at that point, but it is some config overriding a direct-attached policy which I believe would be a new concept for us.

To be clear, I actually like the approach that GEP is proposing, I'm just trying to make sure that the corresponding changes we're making here can line up with that so we don't end up with conflicting GEPs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need take in policies that are additive? They don't make sense to have defaults and overrides, but do target multiple layers. Authorization policies are likely to fall info this category

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 that's a good point. Are you referring to policies that essentially just have implicit defaulting behavior all the way down the stack? For example:

policy attached to GW:

foo: 1
bar: 2

policy attached to Route:

bar: 4
baz : 3

computed result:

foo: 1
bar: 4
baz: 3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking:
GatewaY:

rules: [a]

Route:

rules: [b]

Result: rules: [a,b]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end of the day, what I'm trying to do is make it clear that Direct Policies have an easier time with status. If you do any cross object inheritance as mentioned in the previous two comments, those count as inherited, because you have to solve the Inherited status problem to have full status.

"Direct Policies can be attached at exactly one layer" is a good start, but if they have knock-on effects further down the hierarchy, then again, they have the same status problem as Inherited Policies.

Maybe I just make it more explicit here:

  • Direct Policies can be attached at exactly one layer
  • Direct Policies can have effects only on the layer they attach to
  • Direct Policies can effects only on the object withing the layer that they attach to

This stops the "but my Direct Policy affected another Route attached to the same Gateway when attached to a Route" problem which could arise if we only had the first two items.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and @howardjohn, additive policies being Inherited is definitely intended, that's one of the things I was worried that the original definition missed. That kind of implicit default still has all the status problems of a more explicit Inherited policy with a defined defaults section, so I wanted to make sure that we caught that in the definitions.

geps/gep-2649/index.md Outdated Show resolved Hide resolved
@youngnick youngnick force-pushed the policy-attachment-split branch from 87efcb6 to f9acc42 Compare March 6, 2024 04:32
@youngnick
Copy link
Contributor Author

Turned out I had missed some of @guicassolato's great feedback. Fixed now!

geps/gep-2648/index.md Outdated Show resolved Hide resolved
`gateway.networking.k8s.io/PolicyAffected: true` (or with an
implementation-specific domain prefix) added instead.

Because these Conditions or annotations are namespaced per-implementation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotation or label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated to label, although I originally chose annotation because these are machine-generated and machine-added. I updated because it seems useful to be able to search objects to find policy affected ones (which you can't do with annotations).

// an invalid target resource
PolicyReasonTargetNotFound PolicyConditionReason = "TargetNotFound"
)
```
Copy link
Contributor

@candita candita Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No section on targeted resources? If it's not possible or not recommended, then can you add a note with the same title as GEP-2648? Same for PolicyAncestorStatus or a discussion of why it isn't applicable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to keep this PR as small as possible, since it's already enormous. But I'll add some updates to the status sections to roll your and @robscott's feedback in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some information suggesting the PolicyAffected condition, as in Direct Attached Policy, but I want us all to spend some time considering apiserver fanout load concerns before going ahead with any more recommendations here.

geps/gep-2648/index.md Outdated Show resolved Hide resolved
This pattern enables different conditions to be set for different "Ancestors"
of the target resource. This is particularly helpful for policies that may be
implemented by multiple controllers or attached to resources with different
capabilities. This pattern also provides a clear view of what resources a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "capabilities" mean here? Can you include an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified and added.

@youngnick
Copy link
Contributor Author

As discussed in #2927, I've marked GEP-2658 and GEP-2649 as provisional, and included notices at the top of each file that the file may change soon.

Hopefully this lets us get this one across the line for now, PTAL @robscott @howardjohn @mikemorris @kflynn @guicassolato @kate-osborn.

Comment on lines +192 to +199
When more than one object matches the same object _and_ `sectionName`, the usual
conflict-resolution rules (as defined in GEP-713 should be used). These boil down
to "oldest by creation date wins".
Copy link
Contributor

@mikemorris mikemorris Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@candita I don't think this clarification actually solves for your use case, where, if I understand correctly, you're trying to select multiple backendRef matches of the same named object when referenced by multiple rules in the same HTTPRoute?

Refs #2813 (comment), #2813 (comment) and the implementation of BackendTLSPolicy in #2448.

I'm unclear from the implementation whether that policy is intended to directly target a Service object, or instead as it seems you're implying, to target a Service only in its capacity as a backendRef for a specific HTTPRoute object.

After reading through the merged GEP-1897 it feels pretty clear that the intended usage of BackendTLSPolicy is to target a Service directly, and sectionName would only be for scoping the policy to a specific port on a multi-port Service. If there's a need to narrow the scope of which Gateways, HTTPRoutes or HTTPRouteRules should respect a BackendTLSPolicy, I think we would want to add something like a from clause as suggested in #2813 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a "SHOULD NOT use backendRef" above, which should clear this up, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikemorris I wanted information on what was considered a SectionName in the object. I heard from Rob that we have known SectionNames for each object. I do hope we have this documented somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #2946 to improve documentation for SectionName

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work on this @youngnick!

I think this is a great update that represents where we are now, but also really appreciate you starting the broader discussion in #2927 to cover what's next (and linking it from all the GEPs here).

hostnames:
- "foo.com"
rules:
- name: bar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I believe the same name can be used in different stanzas that are different objects, which breaks the application of policy to section name. For example, couldn't I have a rule and a backendRef with the same name

Although rule names on Routes are still provisional (x-ref #2593), so far we've clearly defined what sectionName means within each Gateway API resource it's relevant, and I think the same would apply to Routes. I think sectionName would be used exclusively to refer to rule names, and that rule names would need to be unique within a Route when they were specified. I think the discussion in #2927 could also lead to the ability to have multiple targets for the same policy.

The idea of what I'd call multi-dimensional targets (only x within y) is interesting but I'd rather hold off on that until we've got a better handle on policy attachment targeting a single resource.

geps/gep-2648/index.md Show resolved Hide resolved
Comment on lines 224 to 285
Implementations that use Direct Policy objects SHOULD put a Condition into
`status.Conditions` of any objects affected by a Direct Policy, if that field
is present.

If they do, that Condition MUST have a `type` ending in `PolicyAffected` (like
`gateway.networking.k8s.io/PolicyAffected`),
and have the optional `observedGeneration` field kept up to date when the `spec`
of the Policy-attached object changes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we do have nesting/namespacing available like Routes, can we suggest that implementations populate a unique section for their controller instead of adding conditions to an existing set of conditions?

Comment on lines +233 to +290
Implementations SHOULD use their own unique domain prefix for this Condition
`type` - it is recommended that implementations use the same domain as in the
`controllerName` field on GatewayClass (or some other implementation-unique
domain for implementations that do not use GatewayClass).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controllers are supposed to update Conditions lists additively, putting back any Condition they don't understand or update as they found it

This seems to require a controller to be aware of any condition it may have theoretically set in the past instead of assuming full ownership of the conditions and replacing with the current set of conditions it knows about. I generally would prefer some form of namespacing/nesting in status as broadly as possible that allows each controller to clearly own separate lists. This PR doesn't need to solve this, but it would be nice to have some kind of note somewhere in the GEP covering this as a follow up area and/or unresolved.

mkdocs.yml Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 5, 2024
Copy link
Contributor

@mikemorris mikemorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate all the work you've put into this @youngnick, and sorry I haven't been more involved earlier to help support! This feels like it's in adequate shape to get in as provisional while we continue to work on refining the structure over in #2927

@robscott robscott force-pushed the policy-attachment-split branch from 8475c8b to ef824ed Compare April 10, 2024 00:05
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2024
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mikemorris, robscott, shaneutt, youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [robscott,shaneutt,youngnick]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@robscott
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit c1d306b into kubernetes-sigs:main Apr 10, 2024
8 checks passed
mkdocs.yml Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.