-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use "inherit left, wildcard right" behavior in base URL application and constructor string parsing. #198
Conversation
@sisidovski @domenic FYI |
Updated commit message in the opening comment (hopefully it's reasonably clear). I assume Gecko and WebKit implementation bugs are not desired since they don't yet (to my knowledge) implement URLPattern; is that right? I can file Deno and urlpattern-polyfill ones shortly before merging if appropriate. Ditto for MDN. For implementer interest, it sounded like Anne was on board with this direction; what level of assent is required to add WebKit to the implementer interest list? |
LGTM! You might want to add something about the compatibility implications, and how we've measured them to be low.
Right, we can treat them as being under the umbrella "implement URLPattern" bugs.
What I would do is check the box for WebKit, linking to #179 (comment) , but also tag @annevk to confirm, and give a few days before merging in case he has objections or wants to take a closer look. |
Alright, will do unless @annevk objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normatively LGTM, just editorial suggestions
I didn't do an in-depth review, but happy to agree on behalf of WebKit with the behavior we're aiming for here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with more nits, not all of which I'm sure about. @sisidovski did you want to take a look?
spec.bs
Outdated
* protocol, hostname, port, pathname, search, hash | ||
* protocol, hostname, port, username, password | ||
|
||
Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) | |
Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL when parsing a URL from the string supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added "when parsing a URL". You can actually supply a dictionary rather than a string if you want, and this behavior also applies there. e.g. urlPattern.test({pathname: "/world"}, "https://user:[email protected]/")
is a thing you can do.
@@ -510,17 +601,14 @@ A [=constructor string parser=] has an associated <dfn export for="constructor s | |||
<p>First, the URLPattern constructor string parser operates on [=tokens=] generated using the "`lenient`" [=tokenize policy=]. In constrast, [=basic URL parser=] operates on code points. Operating on [=tokens=] allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like "`:hmm`" in "`https://a.c:hmm.example.com:8080`" without getting confused with the port number. | |||
<p>Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like [=basic URL parser=] does. Instead we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string. | |||
<p>Finally, the URLPattern constructor string parser does not handle some parts of the [=basic URL parser=] state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the {{URLPatternInit}} constructor. | |||
<p>In the constructor string algorithm, the pathname, search, and hash are wildcarded if earlier components are specified but later ones are not. For example, "`https://example.com/foo`" matches any search and any hash. Similarly, "`https://example.com`" matches any URL on that origin. This is analogous to the notion of a more specific component in the notes about [=process a URLPatternInit=] (e.g., a search is more specific than a pathname), but the constructor syntax only has a few cases where it is possible to specify a more specific component without also specifying the less specific components. | |||
<p>The username and password components are always wildcard unless they are explicitly specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a mention the exception of port
here as other components are explicitly mentioned? It also makes sense not having it since we have a note in L723 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a mention here.
1. If |init|["{{URLPatternInit/port}}"] is not null then set |result|["{{URLPatternInit/port}}"] to the result of [=process port for init=] given |init|["{{URLPatternInit/port}}"], |result|["{{URLPatternInit/protocol}}"], and |type|. | ||
1. If |init|["{{URLPatternInit/pathname}}"] is not null: | ||
1. If |init|["{{URLPatternInit/protocol}}"] does not [=map/exist=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/scheme=] and |type|. | ||
1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, |type| will be pattern
only when constructing, right? match() and exec() also process URLPatternInit, but |type| is "url" in that case so I believe this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
Apologize for the delay, LGTM with a few questions. |
Chromium CL has also landed, and should ship in M121. 🤞 |
This CL changes the behavior of the router rule registration in the ServiceWorker Static Routing API, especially when the |urlPattern| condition accepts the URLPatternInit object. Before this CL, the URLPatternInit input is accepted as it is, that means any unspecified fields are resulted in the wildcards (*). This behavior is inconsistent with the case when |urlPattern| accepts a string. When a string is passed, missing fields are complemented by baseURL, which the SW script URL is internally used when adding the new router rule. So some missing components will use the values inherited from the baseURL in that case. After this CL, the URLPatternInit input also internally uses the SW script URL as a baseURL if not provided. It typically happens when the input is an object that complies with the URLPatterInit interface. On the other hand, it wouldn't affect the case when URLPatternInit is the output of `new URLPattern()`. Thank to the recent change on the URLPattern[1], the constructed components in URLPatternInit will inherit or wildcarded. baseURL won't override components if those are not empty. This behavior change is based on the discussion in whatwg/urlpattern#182. [1] whatwg/urlpattern#198 Bug: 1371756 Change-Id: I5cce80fde05cf18237c8b6412b00e017ff5aad5b
This CL changes the behavior of the router rule registration in the ServiceWorker Static Routing API, especially when the |urlPattern| condition accepts the URLPatternInit object. Before this CL, the URLPatternInit input is accepted as it is, that means any unspecified fields are resulted in the wildcards (*). This behavior is inconsistent with the case when |urlPattern| accepts a string. When a string is passed, missing fields are complemented by baseURL, which the SW script URL is internally used when adding the new router rule. So some missing components will use the values inherited from the baseURL in that case. After this CL, the URLPatternInit input also internally uses the SW script URL as a baseURL if not provided. It typically happens when the input is an object that complies with the URLPatterInit interface. On the other hand, it wouldn't affect the case when URLPatternInit is the output of `new URLPattern()`. Thank to the recent change on the URLPattern[1], the constructed components in URLPatternInit will inherit or wildcarded. baseURL won't override components if those are not empty. This behavior change is based on the discussion in whatwg/urlpattern#182. [1] whatwg/urlpattern#198 Bug: 1371756 Change-Id: I5cce80fde05cf18237c8b6412b00e017ff5aad5b
The following changes apply to patterns which are constructed using a base URL, the constructor string syntax, or both -- but not any pattern which explicitly specifies components separately without a base URL.
For example:
"https://example.com/*"
also matches with any username, password, search, and hash. Previously this would be written"https://*:*@example.com/*\\?*#*"
.new URLPattern({ pathname: "/login" }, "https://example.com/?q=hello")
accepts any query string and hash, not only"?q=hello"
and""
."https://*:*"
or{protocol: "https"}
means "any HTTPS URL, on any port", and"https://*"
means "any HTTPS URL on the default port (443)". These have the same meaning whether or not a base URL is provided, since specifying the protocol prohibits inheritance of other components.This makes patterns more expansive than before, in cases where wildcards are likely to be desirable.
The logic of inheriting components from a base URL dictionary is also similarly changed in a way that may make it not match where it did before, but more consistently with the above and with how relative URL strings are resolved. For example,
new URLPattern("https://example.com/foo?q=1#hello").test({pathname: "/foo", hash: "hello", baseURL: "https://example.com/bar?q=1"})
previously returnedtrue
but will now befalse
, since the search component is not inherited when the pathname is specified. This is analogous to hownew URL("/foo#hello", "https://example.com/bar?q=1")
works. The reverse is also possible; in both cases this is quite niche.Fixes #179.
(See WHATWG Working Mode: Changes for more details.)
Preview | Diff