Report redirects as broken links #25
I encountered an interesting case with redirects: … What do you think: is the problem on that page, so that we should use another page, or should xrefcheck be able to handle such cases? If the latter, would it be enough if I allowed specifying an exclusions list?
There is another possible solution to this which may be handy in general, namely allowing local error suppression. For instance, prefixing a link with
I would suggest treating 302, 304, and 307 as ok.
302 and 307 are temporary redirects, so receiving one most likely means that the URL is valid and only temporarily redirected; future requests should still use the same URL. 304 is only used for caching, so it is equivalent to 2xx for our purposes.
What about the GitLab case, where accessing a non-existent file returns "302 Found" and redirects to the repository root? I guess I should then allow adding an exception for it in the config 🤔
As I said, I am for just accepting 302, 304, and 307 as valid unconditionally.
Ah okay, I see your point. @gromakovsky what do you think?
I think this is what should be done. I am not sure what the best syntax for that is, but I believe you will figure it out :)
While investigating #218, we have seen that xrefcheck does not consider redirects to be valid links. The HTTP client is configured to follow redirects automatically, so:
If we configure the client not to follow redirects, then any redirect link is considered invalid. This changes the perspective we can take on configuration:
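To make "configuring the client not to follow redirects" concrete, here is a minimal sketch in Python using only the standard library's `urllib`. It is only an illustration: xrefcheck itself is written in Haskell and uses its own HTTP stack, and the names `NoRedirect` and `fetch_status` are hypothetical. With automatic following disabled, the checker sees the raw 3xx status instead of the status of the final target, and has to decide for itself what that status means.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler (not part of xrefcheck) that never follows redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not build a follow-up request", so urllib
        # falls through to its default error handler and raises HTTPError.
        return None


# build_opener uses our subclass in place of the default HTTPRedirectHandler.
_opener = urllib.request.build_opener(NoRedirect)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of `url` exactly as the server sent it."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Every non-2xx response ends up here, including unfollowed 3xx ones.
        err.close()
        return err.code
```

For a URL that answers with a 301, `fetch_status` returns `301` rather than the status of the page the redirect points to.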
As @gromakovsky pointed out, if xrefcheck reports an error due to a redirect from
Let me first share my thoughts on permanent redirects, such as those with the 301 status code; I want to discuss 302 after that. I personally like the logic of "if …". However, I'm thinking about a couple of edge cases where this does not seem perfect:
How much sense does this motivation make? If it sounds reasonable, then we probably want the user to have some control over how redirects are treated, e.g. letting them say: "Expect a redirect here, I know what I'm doing". But I agree that the second option from the comment above should be the default behaviour when getting a 301.
Yeah, this is a concern for another issue. Currently, we treat authentication errors as valid links, because one of the main goals, use in CI, requires that we avoid false detection of invalid links at all costs. But I hope that one day we will also let users handle such links in a smarter way.
It seems that a way of doing this in the configuration style already used in the project would be a new list in the configuration file, like … I am thinking about a couple of possible default behaviours:
The first one is simpler and will lead to fewer false positives. The second one will keep failing for the GitLab case, because GitLab responds with a 503 after the redirect.
Yep, looks like it. Let's add these two ways to configure things.
Now on the temporary redirect status codes (302 and others). I tried to read up on them and understand a few of their use cases (yeah, this area is quite new to me). Those include (from this article):
Given this diversity of scenarios, and of adequate ways to react to them, I see no option other than to leave this to the user. Sometimes the user wants to treat 302 as invalid (… Also, whether to follow redirects seems to differ depending on the site! Sometimes it is reasonable to treat 302 as valid, because the site knows what it is doing and the suggested new link is valid. Sometimes it makes sense to follow redirects and check the target, because a 302 produced by one site can lead to another site, and the link can easily become outdated.
Given all this, and assuming that my interpretation of the situation is correct, I propose the following:
Plus one note: if we allow "follow redirect", it would be best not to follow the entire chain automatically, but rather to pick the immediate target and re-run the check on it recursively. That makes sure all the rules (exclusions and redirect treatment) are applied to every link in the chain, which sounds good.
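The hop-by-hop approach can be sketched in Python as follows. This is a design sketch under stated assumptions, not xrefcheck's implementation: `fetch` is a hypothetical callback that performs a single request without following redirects and returns the status plus the `Location` header, and `decide` is a hypothetical hook standing in for the configured rules (exclusions and redirect treatment), receiving the redirect target and status and returning `"valid"`, `"invalid"` or `"follow"`. Because every hop goes back through `decide`, the rules apply to each link in the chain; checking the visited chain catches cycles, and a hop limit catches runaway chains.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical callback: one request, redirects NOT followed.
# Returns (status_code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]
# Hypothetical rule hook: (redirect target, status) -> "valid" | "invalid" | "follow".
Decide = Callable[[str, int], str]


def check_link(url: str, fetch: Fetch, decide: Decide,
               max_hops: int = 10) -> Tuple[str, List[str]]:
    """Check `url`, re-running the rules on every redirect target.

    Returns the verdict and the chain of URLs that were visited.
    """
    chain = [url]
    while True:
        status, location = fetch(url)
        if 200 <= status < 300:
            return "valid", chain
        if not 300 <= status < 400:
            # Simplification: every other status counts as broken here.
            return "invalid", chain
        if location is None:
            return "invalid", chain              # broken chain: no target given
        target = urljoin(url, location)          # Location may be relative
        outcome = decide(target, status)         # rules see every hop
        if outcome != "follow":
            return outcome, chain
        if target in chain:
            return "invalid", chain + [target]   # redirect cycle
        if len(chain) > max_hops:
            return "invalid", chain              # chain is too long
        chain.append(target)
        url = target
```

Keeping the visited `chain` around also provides exactly the data needed to print the whole redirect chain in an error report.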
How does that sound? If we go with this plan, it turns out to be a relatively large issue, but I would like us to try it unless we hit some severe difficulties implementing it.
I will try to sum up the conclusions here to check that everything is clear. We can refine or correct them if I missed something:
I am not sure how to introduce the 'treat temporary redirects as invalid' configuration. Perhaps with a global parameter that is false by default? This default behaviour will solve #218 and, if the configuration does not work in some case and produces a false positive,
Looks so 👍 Only a couple of corrections:

On the exact codes: Wikipedia says that 303 should be treated similarly to 302; is that incorrect? Also, on 304 (we haven't discussed it yet), could you elaborate on your suggestion?

On how to make this all configurable: I would suggest going with a config like

```yaml
externalRefsRedirects:
  - to: "*"  # quoted, since a bare * would start a YAML alias
    outcome: valid
  - to: gitlab.com/*  # pseudo-item, in reality should follow the format of `ignoreExternalRefsTo`
    outcome: invalid
  - to: xxx
    on: permanent
    outcome: follow
  - to: yyy
    on: 304
    outcome: follow
```

where … Probably the annotations should be made a bit more configurable too, e.g. I suggest inserting … But I would actually leave these annotations to a separate pull request. This is all already a very big amount of work, and I'd like it to be split into two or more pieces if possible. It's fine to create several PRs for the same #25 ticket in such cases.
Sorry, I took it from a comment above in this thread. For the defaults, I agree with treating 301 and 308 as invalid (those we consider permanent redirects), and 302, 303 and 307 as valid (those we consider temporary redirects). I like your proposal; it will allow configuring the redirect behaviour more precisely beyond the defaults. In case of overlapping rules in the
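The proposed defaults amount to a small lookup table. Here is a Python sketch with hypothetical function names; the status names come from the standard `http.HTTPStatus` enum. Codes outside both groups (such as 304, which the thread leaves for a separate discussion) deliberately get no default here.

```python
from http import HTTPStatus
from typing import Optional

# Defaults proposed in this thread (illustrative sketch, not xrefcheck code).
PERMANENT = {s.value for s in (HTTPStatus.MOVED_PERMANENTLY,    # 301
                               HTTPStatus.PERMANENT_REDIRECT)}  # 308
TEMPORARY = {s.value for s in (HTTPStatus.FOUND,                # 302
                               HTTPStatus.SEE_OTHER,            # 303
                               HTTPStatus.TEMPORARY_REDIRECT)}  # 307


def redirect_kind(status: int) -> Optional[str]:
    """Classify a status code as a permanent or temporary redirect."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return None  # e.g. 300 or 304: not covered by these defaults


def default_outcome(status: int) -> Optional[str]:
    """Permanent redirects are reported, temporary ones are accepted."""
    kind = redirect_kind(status)
    if kind == "permanent":
        return "invalid"
    if kind == "temporary":
        return "valid"
    return None
```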
Ah I see, fair. I think we will have to discuss 304 separately; to me it seems like a completely different thing compared to temporary and permanent redirects.
I was thinking of picking the first matching rule myself, but somehow provided an example that illustrates the opposite 😄 I guess I did so because it feels natural to append new entries to the end of the list: concrete patterns tend to be added later, while the default rules usually go at the very beginning. So picking the last matching rule may make sense. I find this a relatively weak argument, though. What do you think?
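To make the first-match versus last-match question concrete, here is a Python sketch of rule selection. It is an assumption-laden model, not the project's config schema: the hypothetical `RedirectRule` mirrors the `to`/`on`/`outcome` keys of the proposed YAML, and `to` is matched as a shell-style glob with `fnmatch` purely for illustration (the real format was meant to follow `ignoreExternalRefsTo`). With a config where the catch-all `"*"` rule comes first, only last-match lets the more specific GitLab rule take effect.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence, Union

PERMANENT_CODES = {301, 308}
TEMPORARY_CODES = {302, 303, 307}


@dataclass(frozen=True)
class RedirectRule:
    """Hypothetical model of one `externalRefsRedirects` entry."""
    to: str = "*"                     # glob on the redirect target (illustration only)
    on: Union[str, int, None] = None  # "permanent", "temporary", a code, or None = any
    outcome: str = "valid"            # "valid", "invalid" or "follow"

    def matches(self, target: str, status: int) -> bool:
        if not fnmatchcase(target, self.to):
            return False
        if self.on is None:
            return True
        if self.on == "permanent":
            return status in PERMANENT_CODES
        if self.on == "temporary":
            return status in TEMPORARY_CODES
        return status == self.on


def pick_outcome(rules: Sequence[RedirectRule], target: str, status: int,
                 strategy: str = "first") -> Optional[str]:
    """Return the outcome of the first (or last) matching rule, or None."""
    if strategy not in ("first", "last"):
        raise ValueError(f"unknown strategy: {strategy}")
    ordered = rules if strategy == "first" else list(reversed(rules))
    for rule in ordered:
        if rule.matches(target, status):
            return rule.outcome
    return None
```

With first-match, a leading catch-all silently shadows every later entry, so users would have to put specific patterns above the defaults; last-match lets them append specific patterns instead.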
…nagement [#218] Change redirects default behaviour
I have closed #218, which is included in the 0.3.0 milestone, with a first PR that changes the default behaviour for redirect links. Should we also add this issue to 0.3.0 to cover the remaining work?
Good point, please include it.
Problem: We previously changed the default behaviour of Xrefcheck when following link redirects, but did not provide a way to configure it. Solution: We are adding a new field in the configuration file to allow writing a list of redirect rules that will be applied to links that match them.
Tests for errors related to redirect chains: broken chains and cycles.
Proposal for showing redirect chains in the verify errors output.
Removing some repetitive code from tests.
Redirect configuration explanation added in the FAQ section.
…rules [#25] Redirect links with configuration rules
Problem: After recent work on Xrefcheck redirect behavior, it remained to discuss how to handle 304 redirects. Solution: With the current default configuration, 304 redirects are considered valid, which seems appropriate given that a 304 response usually means you previously received a successful response for the same request and the resource does not need to be retransmitted. We add a test case for this default config and update the FAQ section.
Note: a real-life use case where a permanent redirect was intentionally desired. That measure is kind of about achieving the same goal as xrefcheck, but some users may still prefer using such proxies.
Clarification and motivation
As far as I understand, `xrefcheck` currently treats redirects as valid links, but usually they indicate that something is wrong. If you have a link `x` in your md file and it redirects to `y`, it will usually be the case that either:
- you should replace `x` with `y` and eliminate an unnecessary redirection step, or
- `y` is some page that says "`x` does not exist, but you can enjoy `z` right here" (and does not make `xrefcheck` fail).

Acceptance criteria
By default, 3xx codes should be reported as broken links (I don't know these codes well enough; maybe some specific codes are ok, and you are welcome to treat them as ok if you find that better).
Optional: make it possible (add a flag or something) to say that 3xx codes are valid.
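These acceptance criteria boil down to a single check. A minimal Python sketch of the issue's initial idea (the discussion above later refined it into per-code defaults and rules); the `redirects_are_valid` flag name is hypothetical and only stands in for the "flag or something" mentioned above, and non-3xx statuses are simplified to "2xx is fine, anything else is broken".

```python
def is_broken(status: int, redirects_are_valid: bool = False) -> bool:
    """Initial proposal from this issue: 3xx is broken unless opted out.

    `redirects_are_valid` is a hypothetical flag name.
    """
    if 200 <= status < 300:
        return False
    if 300 <= status < 400:
        return not redirects_are_valid
    return True  # simplification: any other status is broken
```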
Note that there is another (private) issue about ignoring specific broken links.