Allow non-ASCII but printable characters in ID field #918
Comments
Thank you for opening this issue.
A warning doesn't make a dataset invalid, but I can see that it produces noisy output if using non-ASCII characters in id fields is part of your process.
@isabelle-dr Thank you for your comment. Some popular GTFS generation tools in Japan use non-ASCII characters in some fields besides service_id.
Tool 1. Sono Sujiya
Tool: http://www.sinjidai.com/sujiya/
The number of warnings is 68,028! (screenshot from Google Transit)
Tool 2. Mieruka Format
Tool: https://www.rosenzu.com/net/mieru/fm/
Tool 3. Nishizawa Tool
Tool: https://home.csis.u-tokyo.ac.jp/~nishizawa/gtfs/
Explanation: Each of these tools has a share of about 20% in Japan. These IDs are natural composite keys.
Hello, thank you for this precise answer. I now better understand how the id fields are built in Japan (as natural composite keys). Having 60,828 warnings isn't a nice experience, and it must be hard to tell which of them point to a "real" problem with the datasets. After talking with the specification team, here is what I can add: the GTFS community decided to recommend using only printable ASCII characters in id fields in order to prevent parsing issues. If you are interested in using a different system for the id fields, and perhaps changing the way GTFS operators check the data, I’ll be happy to help. Also, some GTFS providers modified the official GTFS schema in order to add information about the service:
Again, if you’re interested in adding this to your GTFS schema, I’ll be happy to help. The last thing I could recommend is opening a discussion with the GTFS community and proposing a change to the specification. The process is described here. Let me know if I can be of any other assistance.
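One plausible illustration of the "parsing issues" the printable-ASCII recommendation guards against is a consumer that decodes a feed with the wrong character encoding. GTFS files are required to be UTF-8; the sketch below assumes a tool that mistakenly reads those bytes as Latin-1. The function name is hypothetical. A printable-ASCII ID reads back unchanged either way, while a Japanese ID does not.

```python
def survives_misdecoding(service_id: str) -> bool:
    """Hypothetical check: does the ID read back identically when its UTF-8
    bytes are decoded correctly (UTF-8) and incorrectly (Latin-1)?"""
    raw = service_id.encode("utf-8")
    # Latin-1 maps every byte to one character, so decoding never fails,
    # but each multi-byte UTF-8 sequence turns into several wrong characters.
    return raw.decode("utf-8") == raw.decode("latin-1")


print(survives_misdecoding("WEEKDAY"))  # True
print(survives_misdecoding("平日"))      # False
```

This only shows one failure mode; a consumer that handles UTF-8 correctly reads both IDs without trouble.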
Feature request
Please allow non-ASCII but printable characters in ID fields.
We, Japanese GTFS engineers, often use local characters in IDs so that non-technical staff at a transit agency can check the GTFS data more easily. In particular, we use Japanese terms for day types in service_id, such as "平日" (weekdays) and "土休日" (Saturdays, Sundays and holidays). As I understand it, an ID is only used by machines as an internal identifier, so non-ASCII characters should work as well.
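To make the distinction in this request concrete, here is a minimal sketch. It assumes "printable ASCII" means U+0020 through U+007E and uses Python's `str.isprintable()` as one possible definition of "printable" for other characters; `classify_id` and its labels are hypothetical, not part of any validator.

```python
def classify_id(value: str) -> str:
    """Hypothetical classifier for GTFS ID values (not a validator API)."""
    # Printable ASCII: U+0020 (space) through U+007E (~).
    if all(0x20 <= ord(ch) <= 0x7E for ch in value):
        return "PRINTABLE_ASCII"
    # str.isprintable() is False for Unicode "Other" characters (control,
    # format, ...) and "Separator" characters, except the ASCII space.
    if value.isprintable():
        return "PRINTABLE_NON_ASCII"
    return "NON_PRINTABLE"


print(classify_id("WEEKDAY"))   # PRINTABLE_ASCII
print(classify_id("平日"))       # PRINTABLE_NON_ASCII
print(classify_id("平日\t"))     # NON_PRINTABLE (contains a tab)
```

Under this reading, the request is to stop warning on the middle category while still flagging the last one.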
Is your feature request related to a problem? Please describe.
#712 is discussing this issue.
A discussion about the GTFS specification is also related to this issue.
Proposed solution
Remove this rule, or change its severity from "WARNING" to "INFO".
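A minimal sketch of the second option, assuming that IDs containing control or other non-printable characters should still produce a WARNING (the request only asks to relax the rule for printable characters). `Severity` and `id_notice_severity` are hypothetical names, not the validator's actual notice API.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    INFO = "INFO"
    WARNING = "WARNING"


def id_notice_severity(value: str) -> Optional[Severity]:
    """Hypothetical rule: no notice (None) for printable ASCII, INFO for
    other printable characters, WARNING for anything non-printable."""
    if all(0x20 <= ord(ch) <= 0x7E for ch in value):
        return None
    if value.isprintable():
        return Severity.INFO  # e.g. service_id "平日"
    return Severity.WARNING   # e.g. an ID containing a control character
```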
Describe alternatives you've considered
One possible alternative is to enable this rule conditionally, based on the given country code.
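The country-code alternative could look roughly like this. The set of relaxed countries, and the idea of passing the feed's country code into the check, are assumptions for illustration; "JP" is simply Japan's ISO 3166-1 alpha-2 code. All names here are hypothetical.

```python
from typing import Optional

# Hypothetical: countries where printable non-ASCII IDs are accepted practice.
RELAXED_COUNTRIES = frozenset({"JP"})


def is_printable_ascii(value: str) -> bool:
    # Printable ASCII: U+0020 (space) through U+007E (~).
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def should_warn_non_ascii_id(value: str, country_code: Optional[str]) -> bool:
    """Hypothetical check: warn on non-ASCII IDs unless the country code is
    in RELAXED_COUNTRIES and every character is printable."""
    if is_printable_ascii(value):
        return False
    relaxed = country_code is not None and country_code.upper() in RELAXED_COUNTRIES
    if relaxed and value.isprintable():
        return False
    return True


print(should_warn_non_ascii_id("平日", "JP"))  # False
print(should_warn_non_ascii_id("平日", "US"))  # True
```

Non-printable characters still trigger the warning in every country, so the relaxation stays narrow.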