Handle relative URLs in entries #4

Open
FraGag opened this issue Nov 21, 2018 · 0 comments

Feeds to Pocket currently doesn't support relative URLs in entries. When a relative URL is encountered, it is added to the feed's processed_entries, but it is not sent to Pocket.

We use Url::parse, which only handles absolute URLs. Url::join can resolve a URL relative to a given absolute URL.

There are feeds in the wild that have relative URLs in their entries. The one that affects me right now is Sonic Retro's Atom feed: it uses scheme-relative URLs (i.e. URLs that begin with //sonicretro.org). (It has been doing so since sometime in March 2018, but I only noticed it now, 8 months later. Oops!)

Atom specifies that the href attribute of a link element must be an IRI reference. An IRI reference can be an absolute or a relative IRI. The RSS "specification" mentions in the first paragraph under Comments that links must begin with a scheme, meaning they must be absolute URLs.

Relative URLs are supposed to be resolved by taking the xml:base attribute into account. This attribute can be applied on any XML element. Both rss and atom_syndication expose an extensions method that contains namespaced elements, but namespaced attributes seem to be ignored, and xml:base is not preserved specifically either. xml:base attributes can also contain relative URLs, and parsing the values in xml:base attributes can fail. Ideally, we would ignore errors in xml:base attributes when resolving a URL that is already absolute.

One thing to consider is whether we should use the original URL (possibly relative) or the fully-resolved absolute URL in processed_entries. Using the absolute URL means that if a feed switches from absolute URLs to relative URLs, then the URLs won't be sent again to Pocket. Pocket expects absolute URLs anyway, so we would be storing the URL we sent, essentially.
