Check that GTFS schedule URL ends in ".zip" #1278

Open
owades opened this issue Oct 18, 2022 · 4 comments
Labels: GTFS Best Practices, help wanted, new rule, status: Needs discussion

Comments

owades commented Oct 18, 2022

Describe the problem

The GTFS Best Practices include the following best practice:

Datasets should be published at a public, permanent URL, including the zip file name. (e.g., www.agency.org/gtfs/gtfs.zip)...

We don't have a check that confirms that the GTFS feed URL contains the file name.

Describe the new validation rule

If the GTFS schedule URL does not end in ".zip", trigger a "missing_zip_file_name" notice.
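A minimal sketch of the proposed rule in Python (not the validator's actual code). It assumes only the URL's path is checked, so a query string after "gtfs.zip" does not trigger the notice, and that ".zip" is matched case-insensitively. Both choices are assumptions, not part of the proposal.

```python
from urllib.parse import urlparse


def missing_zip_file_name(feed_url: str) -> bool:
    """Return True if the proposed "missing_zip_file_name" notice should fire.

    Assumptions (not in the proposal): only the URL path is inspected, so a
    query string or fragment is ignored, and ".zip" is compared case-insensitively.
    """
    path = urlparse(feed_url).path
    return not path.lower().endswith(".zip")
```

For the City of Commerce URL listed in this issue, `missing_zip_file_name("https://citycommbus.com/gtfs")` returns `True`, while a URL ending in `/gtfs/gtfs.zip` returns `False`.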

Sample GTFS datasets

GTFS schedule URL without zip file name:
City of Commerce, CA, USA: https://citycommbus.com/gtfs

Severity

Warning

Additional context

No response

owades added the new rule and status: Needs triage labels Oct 18, 2022
isabelle-dr (Contributor) commented

Thank you for opening this, interesting!

One could argue that an URL called https://citycommbus.com/gtfs, giving a gtfs.zip file when downloaded "contains the zip file name", just without the ".zip" extension.
If we want to validate this Best Practice, maybe we could go further and check whether the name of the downloaded file is included in the URL.
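One possible reading of this suggestion, sketched in Python: the last segment of the URL path must equal the downloaded file's name, with or without its extension. The function name and this matching rule are hypothetical; how the downloaded name is obtained (for example from a response header) is left out.

```python
import posixpath
from urllib.parse import unquote, urlparse


def url_contains_file_name(feed_url: str, downloaded_name: str) -> bool:
    """Hypothetical check: does the URL's last path segment match the file name?

    Accepts either the full name ("gtfs.zip") or the name without its
    extension ("gtfs"). This matching rule is an assumption.
    """
    # Decode percent-escapes and ignore a trailing slash before taking the last segment.
    path = unquote(urlparse(feed_url).path).rstrip("/")
    last_segment = posixpath.basename(path)
    stem, _ext = posixpath.splitext(downloaded_name)
    return last_segment in (downloaded_name, stem)
```

Under this reading, `https://citycommbus.com/gtfs` serving `gtfs.zip` would pass, whereas a URL such as `https://example.org/download?id=1` would not.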

Is not having the ".zip" causing problems? Is it breaking something?

isabelle-dr added the GTFS Best Practices, status: Needs discussion, and help wanted labels and removed the status: Needs triage label Nov 3, 2022
isabelle-dr (Contributor) commented

I labeled "help wanted" because I'd like to hear what others think of this validation rule.

owades (Author) commented Nov 3, 2022

Thanks @isabelle-dr, I appreciate your input and I am also interested in what others think. My goal here is to capture the guideline accurately, and if I'm misunderstanding what the guideline means we can close out this request.

derhuerst commented Nov 5, 2022

One could argue that an URL called https://citycommbus.com/gtfs, giving a gtfs.zip file when downloaded "contains the zip file name", just without the ".zip" extension.

I agree that this is a possible interpretation of the spec.

Also, I assume the intention behind this is that, when people download the GTFS dataset using their browser on a platform where file extensions determine a file's behaviour, the file should be named *.zip. This is the case if the server sends a Content-Disposition: attachment; filename="….zip" header, even if the URL's path doesn't end with .zip.
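A sketch of how the Content-Disposition filename could be read and checked, assuming the header value has already been obtained from the server. It uses the standard library's `email.message.Message.get_filename()` to parse the header parameters; the helper names are hypothetical.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() unquotes the parameter and returns None if it is absent.
    return msg.get_filename()


def served_as_zip(header_value: Optional[str]) -> bool:
    """True if the server names the download *.zip via Content-Disposition."""
    if header_value is None:
        return False
    name = filename_from_content_disposition(header_value)
    return name is not None and name.lower().endswith(".zip")
```

With this check, `served_as_zip('attachment; filename="gtfs.zip"')` is `True` even if the URL path itself has no extension, which matches the interpretation above.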

The cost of checking the Content-Disposition header is, of course, a lot of complexity: the GTFS Validator would have to make an HTTP request, and people would (understandably) ask for support for Basic Auth, specific custom headers (e.g. auth keys), etc.
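To illustrate that extra surface, here is a sketch of building (not sending) such a request with Python's `urllib.request`, with optional Basic Auth and arbitrary extra headers. The function name, the `X-API-Key` header, and the URL in the usage note are hypothetical. Using HEAD assumes the server answers HEAD requests; some servers only respond properly to GET.

```python
import base64
from typing import Mapping, Optional, Tuple
from urllib.request import Request


def build_feed_request(
    feed_url: str,
    basic_auth: Optional[Tuple[str, str]] = None,
    extra_headers: Optional[Mapping[str, str]] = None,
) -> Request:
    """Build (but do not send) a HEAD request for a feed URL.

    HEAD returns response headers such as Content-Disposition without the
    body, assuming the server supports it.
    """
    headers = dict(extra_headers or {})
    if basic_auth is not None:
        user, password = basic_auth
        # Basic Auth: base64-encode "user:password" into the Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return Request(feed_url, headers=headers, method="HEAD")
```

For example, `build_feed_request("https://example.org/gtfs", basic_auth=("user", "pass"), extra_headers={"X-API-Key": "abc"})` yields a HEAD request carrying both headers. Passing it to `urllib.request.urlopen()` would send it, and the response's `headers.get("Content-Disposition")` would return the header value if the server sets one.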
