docs(MADR): resource identifier format #12756

Open. Wants to merge 12 commits into base: master

@lobkovilya (Contributor) commented Feb 5, 2025

Motivation

The goal is to improve the Inspect API and introduce an identifier as part of the URL path, i.e. `:5681/_rules/<identifier>`. See the discussion.

Better to review the rendered version as it contains tables.

@lobkovilya lobkovilya requested a review from a team as a code owner February 5, 2025 10:16
@lobkovilya lobkovilya added the ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) label Feb 5, 2025
github-actions bot (Contributor) commented Feb 5, 2025

Reviewer Checklist

🔍 Each of these sections needs to be checked by the reviewer of the PR 🔍:
If something doesn't apply, please check the box and add a justification if the reason is non-obvious.

  • Is the PR title satisfactory? Is this part of a larger feature and should be grouped using > Changelog?
  • PR description is clear and complete. It links to the relevant issue as well as docs and UI issues
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry)
  • IPv6 is taken into account (e.g. no string concatenation of host port)
  • Tests (Unit test, E2E tests, manual test on universal and k8s)
    • Don't forget ci/ labels to run additional/fewer tests
  • Does this contain a change that needs to be notified to users? In this case, UPGRADE.md should be updated.
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label)

| VirtualHost | legacy listeners - `<kuma.io/service>`<br>new outbounds - `<mesh>_<name>_<namespace>_<zone>_<short-name>_<port>` | Mesh*Service (with sectionName to select port) |
| Inbound Cluster | `localhost:<port>` | Dataplane (with sectionName to select port) |
| Outbound Cluster | legacy clusters - `<kuma.io/service>-hash(dst.tags)`<br>legacy clusters cross-mesh - `<kuma.io/service>-hash(dst.tags)_<mesh>`<br>new clusters - `<mesh>_<name>_<namespace>_<zone>_<short-name>_<port>` | Mesh*Service (with sectionName to select port) |
| Route | Routes are set on Listener on VirtualHost.<br>On inbound - `inbound:<kuma.io/service>`<br>On outbound - `<hash_sha256([]Match{...})>` | Correlates with a set of MeshHTTPRoutes |
Contributor

What about the `matches` section in the new inbound policies API? We might not have a route as a resource but instead create a route from a policy.

Contributor Author

Even for outbound routes it says "Correlates with a set of MeshHTTPRoutes", so it's not really clear how to name the route without hashing.

There is an identifier format from Amazon called [ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html). We can adopt a similar approach, but using `_`:

```
kri_<mesh>_<zone>_<namespace>_<resource-type>_<resource-name>_<section-name>
```
Contributor

Did we consider having some placeholder for missing values, to avoid multiple `_`, which can be hard to read?

Contributor

You need to be explicit about resource-type. Is it the plural/camelCase...?

Contributor

Why not have the resource type before the mesh or at least before the zone?

Contributor Author

> You need to be explicit about resource-type. Is it the plural/camelCase...?

It should be a lowercased singular name as we use in kumactl, i.e. `meshservice` or `meshtimeout`.

> Why not have the resource type before the mesh or at least before the zone?

I kind of like how `<resource-type>` stands next to `<resource-name>`, i.e. `kri_default___meshservice_backend`. Type and name are always present, and I think it's easier to catch what the identifier is referring to. Compare with `kri_meshservice_default__backend`: you might think for a sec that the meshservice is called `default`.
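To make the proposed segment order concrete, here is a minimal Go sketch of how such an identifier could be assembled. The `KRI` struct and its field names are hypothetical and not taken from the Kuma codebase; only the segment order and the lowercased type come from this thread. Dropping a trailing empty section name is an assumption made so that the output matches the informal example `kri_default___meshservice_backend`.

```go
package main

import (
	"fmt"
	"strings"
)

// KRI holds the segments of the proposed identifier.
// The type and field names are hypothetical; only the order is from the MADR.
type KRI struct {
	Mesh, Zone, Namespace, ResourceType, ResourceName, SectionName string
}

// String joins the segments with "_" in the proposed order. Empty segments
// stay empty, which yields the consecutive underscores discussed above.
// Assumption: a trailing empty section name is omitted entirely.
func (k KRI) String() string {
	parts := []string{
		"kri",
		k.Mesh,
		k.Zone,
		k.Namespace,
		strings.ToLower(k.ResourceType), // lowercased name, as in kumactl
		k.ResourceName,
	}
	if k.SectionName != "" {
		parts = append(parts, k.SectionName)
	}
	return strings.Join(parts, "_")
}

func main() {
	// Mesh-scoped resource without zone, namespace or section name.
	fmt.Println(KRI{Mesh: "default", ResourceType: "MeshService", ResourceName: "backend"})
	// Fully populated identifier; the zone, namespace and section values are made up.
	fmt.Println(KRI{
		Mesh: "default", Zone: "zone-1", Namespace: "kuma-demo",
		ResourceType: "MeshService", ResourceName: "backend", SectionName: "http",
	})
}
```

The sketch only builds identifiers. Parsing one back is not shown, because splitting on `_` is only unambiguous if the individual segments are guaranteed not to contain `_` themselves.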

Contributor Author

> Did we consider having some placeholder for missing values, to avoid multiple `_`, which can be hard to read?

What would you use as a placeholder?

@johncowen (Contributor), Feb 14, 2025

> Why not have the resource type before the mesh or at least before the zone?

I kinda agree with this: if the resource-type comes before anything else, you can immediately tell whether to expect, say, an empty mesh when the kri points to a resource type that doesn't have a mesh.


#### [Issue #12093](https://github.com/kumahq/kuma/issues/12093): xds configs, outbound listeners should use the clustername instead of an IP/port combo

We name outbounds like `outbound:10.43.205.116:6379`, where the IP address doesn't give any useful information.
Contributor

Even more so when there are multiple IPs, right?

Co-authored-by: Charly Molter <[email protected]>
Signed-off-by: Ilya Lobkov <[email protected]>
@lahabana (Contributor) left a comment

Great stuff

Also, there was [work](https://docs.google.com/document/d/1OIZK82Tr-4El2FfdlBn7WNRZ7FatkTuEcZKH0FlSTMA/edit?tab=t.0#heading=h.n6cmlf1eel2z) related to Envoy cluster name unification, but it's not finished.
Discoveries in this work helped me to fill the tables.

There is no restriction on the name format on Envoy's side.
Contributor

Even in length?

Contributor Author

Envoy doesn't specify a length limit. I tried to create a cluster with the max expected length of a resource identifier:

253(name) + 253(zone) + 63(mesh) + 63(namespace) + 15(sectionName) + 30(resourcetype) + 3(kri) + 6(_) = 686

and it worked as expected.
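The arithmetic above can be double-checked with a short Go sketch. The per-segment limits are the ones quoted in the comment; the variable and function names are hypothetical, and the placeholder identifier is filled with `a` characters purely to reach the maximum length.

```go
package main

import (
	"fmt"
	"strings"
)

// Per-segment maximums quoted in the comment above, in identifier order:
// mesh, zone, namespace, resource type, resource name, section name.
var segmentLimits = []int{63, 253, 63, 30, 253, 15}

// maxKRILength adds the "kri" prefix and one "_" delimiter per segment.
func maxKRILength() int {
	total := len("kri")
	for _, n := range segmentLimits {
		total += len("_") + n
	}
	return total
}

// longestKRI builds a placeholder identifier in which every segment is at its limit.
func longestKRI() string {
	parts := []string{"kri"}
	for _, n := range segmentLimits {
		parts = append(parts, strings.Repeat("a", n))
	}
	return strings.Join(parts, "_")
}

func main() {
	fmt.Println(maxKRILength(), len(longestKRI())) // prints "686 686"
}
```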

| Inbound Listener | `inbound:10.43.205.116:8080`<br>`inbound:[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080` | Dataplane (with sectionName to select port) |
| Outbound Listener | `outbound:10.43.205.116:8080`<br>`outbound:[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:8080` | Mesh*Service (with sectionName to select port) |
| VirtualHost | legacy listeners - `<kuma.io/service>`<br>new outbounds - `<mesh>_<name>_<namespace>_<zone>_<short-name>_<port>` | Mesh*Service (with sectionName to select port) |
| Inbound Cluster | `localhost:<port>` | Dataplane (with sectionName to select port) |
Contributor

local envoy clusters should have a better name than localhost_

Do we still use localhost?

Contributor Author

Yes, the function `GetLocalClusterName` is used in multiple places:

func GetLocalClusterName(port uint32) string {


| | Name | Correlated Resources |
|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| Cluster | `meshpassthrough_<protocol>_<match-value>_<port>`<br>when `<port> == 0` Kuma sets port equal to `*`<br>`match-value = CIDR \| IP \| Domain`<br>`CIDR = e.g. "192.0.2.0/24" or "2001:db8::/32"`<br>`IP = e.g. "192.0.2.1", or "2001:db8::68", or "::ffff:192.0.2.1"`<br>`Domain = <dns-name> \| *.<dns-name>`<br>`dns-name = ^([a-zA-Z0-9_]{1}[a-zA-Z0-9_-]{0,62}){1}(\.[a-zA-Z0-9_]{1}[a-zA-Z0-9_-]{0,62})*[\._]?$`<br> | – |
@Icarus9913 (Contributor), Feb 13, 2025

> passthroughMode: (Optional) Defines behaviour for handling traffic. Allowed values: All, None and Matched. Default: None

What should the identifier be if the mode is All or None?

Our previous default in/outbound passthrough cluster names are:

inbound:passthrough:ipv4
inbound:passthrough:ipv6
outbound:passthrough:ipv4
outbound:passthrough:ipv6

These aren't tied to a MeshPassthrough resource, so we just keep using them, right?

Contributor Author

Yeah, we're actually removing them when MeshPassthrough is used:

func removeDefaultPassthroughCluster(rs *core_xds.ResourceSet) {

Comment on lines +96 to +98
### Places to use resource identifier

#### URL path
@schogges (Contributor), Feb 13, 2025

Could it be that the resource identifier is also being used in URL search query, i.e. for filtering?

Contributor Author

As it's not a hard requirement, I added a note on what the resource identifier and delimiter charsets would look like if we wanted to support URL queries:

resource-identifier = *(ALPHA / DIGIT / "-" / "." / "_" / "~" )
delimiter           = "_" / "~"

They're significantly smaller than those without query support.

But the good news is that if we go with `_`, the resource identifier can be used in a query, so I added this to the Pros list.
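The query-safe charset above can be checked mechanically. The following Go sketch, in which the function name `IsQuerySafe` is hypothetical, translates the ABNF into a regular expression and confirms that `url.QueryEscape` from the standard library leaves such an identifier untouched.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// unreserved mirrors the ABNF above: *(ALPHA / DIGIT / "-" / "." / "_" / "~").
var unreserved = regexp.MustCompile(`^[A-Za-z0-9._~-]*$`)

// IsQuerySafe reports whether id uses only characters from that set.
// The function name is hypothetical and not part of Kuma.
func IsQuerySafe(id string) bool {
	return unreserved.MatchString(id)
}

func main() {
	id := "kri_default___meshservice_backend"
	// An identifier built from the query-safe charset survives escaping unchanged.
	fmt.Println(IsQuerySafe(id), url.QueryEscape(id) == id)
	// A hypothetical "/" delimiter falls outside the charset.
	fmt.Println(IsQuerySafe("kri/default/meshservice/backend"))
}
```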

@schogges (Contributor), Feb 14, 2025

Sounds good, thank you 🙂

Contributor

Maybe I misunderstood the conversation here, but would it be the case that if we were using these in a URL anywhere we would URL encode them first anyway? i.e. any non-URL safe chars would be %ified?

@johncowen (Contributor), Feb 14, 2025

(P.S. but one small benefit of not having non-URL safe characters is that it keeps the identifier "pretty" i.e. more recognisable, so a benefit but not super important I would say)

Actually, I guess if one of the primary use cases is for people to type these to get things (rather than usage within the GUI), we don't want them having to URL-encode things manually.

kinda swung back and forth on opinion here, sorry for the noise! 😅 please ignore me!

@schogges (Contributor), Feb 14, 2025

Still good point @johncowen, but as you said I also think generally it'd be better to not use any non-URL safe chars.

Also depending on what people/tools are using, there might be a difference on which chars are encoded (i.e. encodeURI vs encodeURIComponent):

encodeURI("http://localhost:1234?filter[foo&bar]=baz") // -> 'http://localhost:1234?filter%5Bfoo&bar%5D=baz'

encodeURIComponent("filter[foo&bar]=baz") // -> 'filter%5Bfoo%26bar%5D%3Dbaz'

encodeURI

encodeURI() escapes all characters except:

A–Z a–z 0–9 - _ . ! ~ * ' ( )

; / ? : @ & = + $ , #

The characters on the second line are characters that may be part of the URI syntax, and are only escaped by encodeURIComponent(). Both encodeURI() and encodeURIComponent() do not encode the characters -.!~*'(), known as "unreserved marks", which do not have a reserved purpose but are allowed in a URI "as is". (See RFC2396)

encodeURIComponent

encodeURIComponent() uses the same encoding algorithm as described in encodeURI(). It escapes all characters except:

A–Z a–z 0–9 - _ . ! ~ * ' ( )

Compared to encodeURI(), encodeURIComponent() escapes a larger set of characters. Use encodeURIComponent() on user-entered fields from forms POST'd to the server — this will encode & symbols that may inadvertently be generated during data entry for character references or other characters that require encoding/decoding. For example, if a user writes Jack & Jill, without encodeURIComponent(), the ampersand could be interpreted on the server as the start of a new field and jeopardize the integrity of the data.

Contributor

Yeah gotcha, couple of follow ups:

I'd say people should always use encodeURIComponent, otherwise they are "holding it wrong" and should change their implementation. I suppose this is my point about expecting people to always do this correctly: there will always be cases where some folks are "holding it wrong". To be fair, all "values" used in a URL should go through encodeURIComponent (or a non-JS equivalent) anyway, whether they are expected to be safe or not. So there's also something to be said for not worrying about URL safety: if someone isn't using a correct encoder, they should be. But I'm thinking more about informal "I just want to curl the thing to get a response in my terminal" type of usage. There's a benefit in not forcing people to encode the thing, as they would have to if, for example, we chose to use /.

So if we want URL safe, if only for reasons of "don't make it hard for people to just curl the thing", then according to the MADR that leaves us with:

delimiter = "_" / "~"

And it sounds like we've landed on `_`, which is URL safe, which is super duper. It's probably a good idea to note that I have seen instances of people using these in hostnames even though a `_` shouldn't be used in hostnames.
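As a rough illustration of the "just curl the thing" argument, this Go sketch compares how an identifier would have to be written inside a single URL path segment with the `_` delimiter and with a hypothetical `/` delimiter. It relies only on `url.PathEscape` from the standard library; the function name `EscapedSegment` and the `/`-delimited identifier are made up for the comparison.

```go
package main

import (
	"fmt"
	"net/url"
)

// EscapedSegment returns id as it would have to appear inside a single URL
// path segment. The function name is hypothetical.
func EscapedSegment(id string) string {
	return url.PathEscape(id)
}

func main() {
	for _, id := range []string{
		"kri_default___meshservice_backend", // "_" delimiter: can be typed as-is
		"kri/default///meshservice/backend", // hypothetical "/" delimiter: every "/" becomes %2F
	} {
		fmt.Printf("%s -> %s\n", id, EscapedSegment(id))
	}
}
```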

@lobkovilya I'm not sure if we validate things like mesh names and zone names to not have _, I might be misremembering but do I remember that at least at one point this was possible? Is it definitely not possible to have a mesh/zone name with a _ in it now?

Just a little side note that's just occurred to me: I'm kinda glad we still have this last "safe character" available, `~`, which in a past life has been super useful to have as a usable/meaningful character (i.e. similar to ~/johncowen), which kinda means "expand ~ to a common string we know about". You never know, we might hit a thing at some point where we need the same "trick".

@johncowen (Contributor)

Maybe not part of the MADR and might just be an informal example, but!

> The goal is to improve Inspect API and introduce an identifier as part of the URL path, i.e. `:5681/_rules/<identifier>`. See #12713 (comment)

When we eventually come to define the endpoint, should we include the fact that we are specifically requesting via a kri, i.e. `:5681/_rules/kri/<identifier>` or `:5681/kri/_rules/<identifier>` or `:5681/_rules/<identifier>/kri`, basically including kri in another segment? That way, if we ever add another way/identifier to request things, we can distinguish via normal URL routing rather than having a single route check for the type of identifier.

Super edge case, but who knows if we are ever gonna change the way we define identifiers. But maybe I'm overthinking it 🤷
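A minimal sketch of the first option, `:5681/_rules/kri/<identifier>`, using only Go's standard `net/http` router. The handler body, the response text and the helper names are placeholders invented for illustration, not the Inspect API implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const kriPrefix = "/_rules/kri/" // one of the path layouts floated above

// newMux registers a dedicated route for kri-based lookups. Another identifier
// scheme could later get its own prefix instead of sharing this handler.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc(kriPrefix, func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimPrefix(r.URL.Path, kriPrefix)
		if id == "" {
			http.Error(w, "missing identifier", http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "rules for %s", id) // placeholder response
	})
	return mux
}

// get sends an in-memory GET request to the mux and returns status and body.
func get(path string) (int, string) {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(get("/_rules/kri/kri_default___meshservice_backend"))
	code, _ := get("/_rules/other/some-id") // unknown identifier scheme: no route
	fmt.Println(code)
}
```

Because the identifier scheme is part of the route, a request for an unregistered scheme is rejected by the router itself rather than by identifier-sniffing logic inside one shared handler.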

6 participants