Allow setting URL encoding/decoding to RFC 3986 #11513

Draft: wants to merge 8 commits into base: 4.8.x


graemerocher (Contributor) commented Jan 16, 2025

Introduce a new UrlEncodingKind enum that can be set in the client configuration (for encoding) and in the router configuration (for decoding), so that URL encoding/decoding can follow RFC 3986.

Micronaut 5 should likely default to RFC-3986 for everything except form decoding.

Fixes #11434
Fixes #10564

This resolves almost everything, but there appear to be issues with the parameters() method of Netty's QueryStringDecoder, which seems to be designed for decoding application/x-www-form-urlencoded rather than RFC 3986.
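
To illustrate the difference at stake, here is a minimal sketch that uses only the JDK (it is not Netty code and not the code in this PR). The class and method names are hypothetical, and `decodeRfc3986` is a hand-written percent-decoder that assumes UTF-8 input. The point is only that form decoding turns `+` into a space, while plain RFC 3986 percent-decoding leaves `+` untouched.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Micronaut or Netty.
final class QueryDecodingSketch {

    // application/x-www-form-urlencoded decoding: '+' becomes a space.
    public static String decodeForm(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // RFC 3986 percent-decoding: only %XX escapes are decoded, '+' stays '+'.
    public static String decodeRfc3986(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                // Copy the literal character through as UTF-8 bytes.
                int cp = s.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeForm("a+b%20c"));    // a b c
        System.out.println(decodeRfc3986("a+b%20c")); // a+b c
    }
}
```

Under these assumptions, a client that sends a literal `+` in a query value only round-trips correctly when the server side uses the second style of decoding.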

@graemerocher graemerocher added the type: enhancement New feature or request label Jan 16, 2025
@graemerocher (Contributor, Author)

Still working on this PR; I am still trying to address decoding.

def result = client.toBlocking().retrieve(req)

expect:"resolved values should use RFC-3986 decoding"
// TODO: investigate why Netty QueryStringDecoder.parameters() doesn't respect RFC-3986
@graemerocher (Contributor, Author)

According to netty/netty@270e9d6 the decoding of + into space was supposed to be fixed, but it doesn't seem to be fixed for the parameters() method of QueryStringDecoder. We need to investigate if this is a bug in Netty or by design.

@graemerocher graemerocher changed the title Allow setting URL encoding to RFC 3986 Allow setting URL encoding/decoding to RFC 3986 Jan 17, 2025
@graemerocher graemerocher requested a review from yawkat January 17, 2025 15:49
Quality Gate failed

Failed conditions
68.8% Coverage on New Code (required ≥ 70%)
1 New Critical Issues (required ≤ 0)


yawkat (Member) commented Jan 20, 2025

The thing is, the HTML spec says to "Let query be the result of running the application/x-www-form-urlencoded serializer with pairs and encoding". So if we stop encoding SP as '+', we are technically violating the HTML spec. It depends on which spec you find authoritative. On the parsing side, from what I can see, the URL spec also does not actually specify how to parse the query. It defines percent-decoding, but the query part is actually left alone during URI parsing, to be parsed as application/x-www-form-urlencoded later on. In my opinion:

jamesdh commented Jan 27, 2025

@yawkat agreed, the HTML spec and RFC 3986 are confusing to reconcile, but HTML is a content specification that operates at a layer above the URI/URL specification. At the very least there needs to be some mechanism for adapting to either the HTML spec's interpretation or RFC 3986. WHATWG's URL spec even states:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

So I think the widespread confusion about this has led to some unfortunate implementations in tooling, and I have run into a surprising number of them recently while integrating with third-party APIs. I think the only pragmatic solution is to make the interpretation of the rules configurable on a per-client basis.
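
As a rough sketch of what such a per-client switch could look like on the encoding side, using only the JDK: the `Mode` enum and `encode` method below are hypothetical names for illustration and are not the `UrlEncodingKind` API from this PR. The RFC 3986 branch assumes UTF-8 and percent-encodes everything outside the unreserved set (ALPHA, DIGIT, `-`, `.`, `_`, `~`).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; names are assumptions, not Micronaut API.
final class QueryEncodingSketch {

    enum Mode { FORM, RFC_3986 }

    private static final String HEX = "0123456789ABCDEF";

    // Encode a single query value according to the selected mode.
    public static String encode(String value, Mode mode) {
        if (mode == Mode.FORM) {
            // application/x-www-form-urlencoded: space becomes '+'.
            return URLEncoder.encode(value, StandardCharsets.UTF_8);
        }
        // RFC 3986: percent-encode every byte that is not an unreserved character.
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF;
            if (isUnreserved(u)) {
                sb.append((char) u);
            } else {
                sb.append('%').append(HEX.charAt(u >> 4)).append(HEX.charAt(u & 0xF));
            }
        }
        return sb.toString();
    }

    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    public static void main(String[] args) {
        System.out.println(encode("a b+c", Mode.FORM));     // a+b%2Bc
        System.out.println(encode("a b+c", Mode.RFC_3986)); // a%20b%2Bc
    }
}
```

With a switch like this, a client talking to a strict RFC 3986 API would send `%20` for a space, while a client submitting HTML-form-style queries would keep sending `+`.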

Labels
type: enhancement New feature or request
4 participants