Feeds to Pocket currently doesn't support relative URLs in entries. When a relative URL is encountered, it is added to the feed's `processed_entries`, but it is not sent to Pocket.
We use `Url::parse`, which only handles absolute URLs. `Url::join` can resolve a URL relative to a given absolute URL.
There are feeds in the wild that have relative URLs in their entries. The one that affects me right now is Sonic Retro's Atom feed: it uses scheme-relative URLs (i.e. URLs that begin with `//sonicretro.org`). (It has been doing so since sometime in March 2018, but I only noticed it now, 8 months later. Oops!)
Atom specifies that the `href` attribute of a `link` element must be an IRI reference, which can be either an absolute or a relative IRI. The RSS "specification" states in the first paragraph under Comments that links must begin with a scheme, meaning they must be absolute URLs.
Relative URLs are supposed to be resolved by taking the `xml:base` attribute into account. This attribute can be applied to any XML element. Both `rss` and `atom_syndication` expose namespaced elements through an `extensions` method, but namespaced attributes seem to be ignored, and `xml:base` is not preserved specifically either. `xml:base` attributes can themselves contain relative URLs, and parsing their values can fail. Ideally, we would ignore errors in `xml:base` attributes when resolving a URL that is already absolute.
One thing to consider is whether we should store the original (possibly relative) URL or the fully resolved absolute URL in `processed_entries`. Using the absolute URL means that if a feed switches from absolute URLs to relative URLs, the URLs won't be sent to Pocket again. Pocket expects absolute URLs anyway, so we would essentially be storing the URL we sent.