Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check only hostname for URL sources #191

Merged
merged 1 commit into from
Mar 1, 2024

Conversation

pedromml
Copy link

@pedromml pedromml commented Mar 1, 2024

This PR aims to solve OpenHistoricalMap/issues#643
It uses the URL() constructor to check if the source is a URL and only uses the hostname on the regex checks.
New tests were also added to cover having a URL in the source string.

@pedromml
Copy link
Author

pedromml commented Mar 1, 2024

This fix relies on the URL() constructor recognizing the entire source string as an URL, so if the user types in "I found this on https://abc.com/xyz-google", the check is going to default to comparing the entire string to the regexes. I tried separating the source string into words and checking each of them, but that would create a new problem: If a user typed in "Google drive", which should be ignored, the word "Google" would receive a warning. Should something be done abut these situations?

Copy link
Member

@1ec5 1ec5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this issue. The tests look good, and I can’t think of any other edge cases that would pose a problem here.

const sourceURL = new URL(source);
source = sourceURL.hostname;
} catch (e) {
// Source string is not a url
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this fallthrough will preserve the original passed-in value of source. 👍

@1ec5
Copy link
Member

1ec5 commented Mar 1, 2024

This fix relies on the URL() constructor recognizing the entire source string as an URL, so if the user types in "I found this on https://abc.com/xyz-google", the check is going to default to comparing the entire string to the regexes. I tried separating the source string into words and checking each of them, but that would create a new problem: If a user typed in "Google drive", which should be ignored, the word "Google" would receive a warning. Should something be done abut these situations?

You have a point that the rule will only catch either problematic URLs in isolation or phrases that aren’t URLs, but not URLs embedded in other text. I suspect this will be unlikely for the particular sources we’re flagging. The goal is to head off inadvertent mistakes or casual, well-meaning usage of problematic sources, but there are plenty of ways to intentionally get around the warning, such as not entering a source at all.

@1ec5
Copy link
Member

1ec5 commented Mar 1, 2024

For future reference: you’ll generally want to avoid posting a pull request from a branch with the same name as the target branch. In this case, you’d want to create a new branch specific to the bug fix. PRing staging into staging will give you a harder time keeping the branch up-to-date as this upstream repository changes. Once this PR lands, you can delete your branch and fetch it again.

@1ec5 1ec5 added the bug label Mar 1, 2024
@1ec5 1ec5 merged commit fac20bd into OpenHistoricalMap:staging Mar 1, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incompatible source validator warning in iD should only match hostname in URL
2 participants