-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Percent encoding spec and : and / #39
Comments
With reference to:
It is my opinion that the testsuite in the .NET, Python, and Java implementations as well as in the spec repo are incorrect and violate the spec. Whereas the testsuite in the Rust implementation adheres to the spec (as written). Spec states:
Therefore, the canonicalized purl should be changed to:
We need clarification on this. |
With reference to:
It is my opinion that the testsuite in the .NET, Python, and Java implementations as well as in the spec repo are incorrect and violate the spec. Whereas the testsuite in the Rust implementation adheres to the spec (as written). Spec states:
Therefore, the canonicalized purl should be changed to:
We need clarification on this. |
Based on my understanding of the spec, there are a total of three test cases that violate the spec as its currently written. The Rust implementation has went ahead and changed its test cases, which I'm in agreement with. But we need clarification as I've run into a situation with the Java implementation which requires additional encoding, but the additional encoding breaks the above test cases. However, adopting the testcase changes made by the Rust impl makes the Java impl happy as well. |
…ing decoding to version, qualifier values, and subpath when parsing a purl string. Added missing encoding to name, version, qualifier value when canonicalizing. These corrections break the testsuites. Corrected testsuites (now varies from spec testsuites). package-url/purl-spec#39 . Added additional validation as part of package-url/purl-spec#46
@pombredanne this ticket should be resolved prior to 1.0 #50 |
While I also stumbled hard about this and also would welcome a clearer wording, I see no contradiction here and would consider current testcases correct. First statement summarizes where the #', '?', '@' and ':' characters MUST NOT be encoded, but it makes no definite statement about where they MUST be encoded. This is further clarified in the following statement which clearly says that it is "unambiguous unencoded everywhere". To keep purls readable for humans, I think encoding shall be avoided where possible. Think about Debian versions - something like pkg:deb/debian/file@1%3A5.30-1%2Bdeb9u2?arch=amd64 looks quite unreadable to me. |
let's update the tests to match the spec and ... @gernot-h very good point. In fact on common packages such as scoped npms that have a namespace that has an |
@pombredanne Great, anything I can do to help sorting this out? Prepare some pull requests? (I will be on vacation for the next 1.5 weeks, but be interested to help afterwards) Regarding the confusion about the two statements regarding ":", what about rephrasing the first statement to: From: the '#', '?', '@' and ':' characters must NOT be encoded when used as separators. They may need to be encoded elsewhere To: the '#', '?', '@' and ':' characters must NOT be encoded when used as separators. Some of them need to be encoded elsewhere as specified in the rules below. |
The WHATWG URL spec has a great section on percent encoding that defines a set of character classes and their exact encoding rules. It would be great if the PURL spec defined its percent encoding rules in relation to those. For example, "the PURL path percent-encode set is the WHATWG URL path percent-encode set and U+0040 (@)". |
My interpretation of the rules when putting together a PR (#245) for a vcpkg purl type, was that any |
This comment is about qualifiers only. The PURL spec is fairly clear (IMO) that colons should not be encoded in version numbers, but could be more explicit. This ticket was originally about qualifiers and I've found a potentially serious problem with the way qualifiers are encoded and decoded. Neither PURL nor the WHATWG or RFC URL specs say that However, PURL qualifiers look suspiciously like x-www-form-urlencoded parameters, and people may be tempted to use the x-www-form-urlencoded rules for encoding and decoding qualifiers. The WHATWG x-www-form-url-encoded spec does not require them to be encoded when parsing, but says that If the set of encoded characters were the only difference, it would be fine if some implementations were using x-www-form-urlencoded and some were not, but there is one critical difference with x-www-form-urlencoded:
Besides needing to be more explicit about what characters need to be encoded in what parts of the PURL, to avoid PURLs that have different meanings to different implementations, the spec needs to clearly specify whether the qualifiers are x-www-form-urlencoded or if they are a simpler format that is like x-www-form-urlencoded but not. |
* Add MLflow and Hugging Face model registries * clean up * Address PR comments * Ensure repository_url is always present
FYI
|
It's inconsistent in packageurl-js 2.0.0. Plus signs parse as plus signs, except for in qualifier values where they parse as spaces. |
|
The colon separator between scheme and type must not be encoded. This is also using, a new, and not yet adopted wording for MUST according to its definition in RFC2119 as clarified in RFC8174 Suggested-by: @gernot-h Reference: #39 (comment) Signed-off-by: Philippe Ombredanne <[email protected]>
I posted a draft PR to address the colon encoding used in between the scheme and type... #361 |
See package-url#39 Supercedes package-url#52 Signed-off-by: Jeremy Long <[email protected]>
Reference: #376 Reference: #39 (comment) Signed-off-by: John M. Horan <[email protected]>
Reference: #376 Reference: #39 (comment) Signed-off-by: John M. Horan <[email protected]>
The notes in the specification about percent encoding of ":" are a bit confusing:
It seems like these 2 contradict either other. The former indicates that ":" may need to be encoded elsewhere. The later indicates that ":" is "unambiguous unencoded everywhere".
Similarly the qualifier component documentation says:
... and does not mention anything about "/". But the test-suite-data.json references canonical_purl representations like
repository_url=repo.spring.io/release
.And the percent-encoding docs state:
My interpretation of this boils down to... "/" is never encoded, but the language in the parts of the specification are unclear. The name, namespace and subpath parts are clear wrt to "/", which leaves the qualifier and version bits as vague as to if "/" is supposed to be percent-encoded or not.
The text was updated successfully, but these errors were encountered: