-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: 1840 validator accepts replacement character in stop name field #1870
feat: 1840 validator accepts replacement character in stop name field #1870
Conversation
@emmambd Please advise how to properly name the notice and its description. |
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit a19abf6 📊 Notices ComparisonNew Errors (2 out of 1588 datasets, ~0%) ✅Details of new errors due to code change, which is less than the provided threshold of 1%.
Dropped Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1588 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
|
I'm thinking it's worth making this more generic by covering other similar text based fields that could also be affected, such as stop description, agency name, or route names. |
@davidgamez We can create a new annotation, @NoReplacementChar, to cover the replacement char check for all related fields, and have a validator automatically generated. Should we still merge this PR, or delete the branch and open a new issue for the generic check? How would you like to proceed? @emmambd |
Thanks for chiming in @smsm1-ito! You make a great point. @qcdyx Let's delete this branch and create a generic notice instead. |
Let's use the same issue. |
This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: |
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit f4b39df 📊 Notices ComparisonNew Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1588 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
|
Hey @emmambd and @davidgamez, any suggestions for the name of this annotation? @NoInvalidCharacters? @ShouldNotContainInvalidCharacters? |
This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: |
…er-in-stop_name-field
…er-in-stop_name-field
This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: |
1 similar comment
This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: |
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit b7dc7f1 📊 Notices ComparisonNew Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (4 out of 1588 datasets, ~0%) ✅Details of new errors due to code change, which is less than the provided threshold of 1%.
Dropped Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1588 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
|
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit 6ebf6df 📊 Notices ComparisonNew Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (4 out of 1588 datasets, ~0%) ✅Details of new errors due to code change, which is less than the provided threshold of 1%.
Dropped Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1588 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
|
@emmambd @tzujenchanmbd May I have a list of fields' names and descriptions that require @NoInvalidCharacters decoration please? |
core/src/main/java/org/mobilitydata/gtfsvalidator/notice/NoInvalidCharactersNotice.java
Outdated
Show resolved
Hide resolved
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit 59ece2b 📊 Notices ComparisonNew Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Errors (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (4 out of 1588 datasets, ~0%) ✅Details of new errors due to code change, which is less than the provided threshold of 1%.
Dropped Warnings (0 out of 1588 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1588 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
📜 Memory ConsumptionList of 25 datasets(memory has increased).
List of 25 datasets(memory has decreased).
List of 25 datasets(no reference available).
|
@qcdyx @emmambd here are the text fields: The ones in bold are the fields likely to have weird characters because the rest are usually numbers and basic latin letters (for codes and fare product names)and some are not likely public facing. Where do you think it's worth expanding to? |
Question: @emmambd Why do we need to annotate fields. Shouldn't this rule be applied to all fields? |
…er-in-stop_name-field
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit 1c9887e 📊 Notices ComparisonNew Errors (0 out of 1596 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Errors (0 out of 1596 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (4 out of 1596 datasets, ~0%) ✅Details of new errors due to code change, which is less than the provided threshold of 1%.
Dropped Warnings (0 out of 1596 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1596 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
📜 Memory ConsumptionList of 25 datasets(memory has increased).
List of 25 datasets(memory has decreased).
List of 25 datasets(no reference available).
|
This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: |
…er-in-stop_name-field
@davidgamez @emmambd @skalexch I've applied @NoInvalidCharacters to all String fields except ids, ready for a second review. |
We can address this more generically by adding the notice trigger while parsing the strings. This will make the notice future-proof and remove the need for the annotation. See link. |
📝 Acceptance Test Report📋 Summary✅ The rule acceptance has passed for commit 3032e01 📊 Notices ComparisonNew Errors (0 out of 1601 datasets, ~0%) ✅No changes were detected due to the code change. Dropped Errors (0 out of 1601 datasets, ~0%) ✅No changes were detected due to the code change. New Warnings (5 out of 1601 datasets, ~0%) ✅Details of new errors due to code change, which is less than the provided threshold of 1%.
Dropped Warnings (0 out of 1601 datasets, ~0%) ✅No changes were detected due to the code change. 🛡️ Corruption Check0 out of 1601 sources (~0 %) are corrupted. ⏱️ Performance Assessment📈 Validation TimeAssess the performance in terms of seconds taken for the validation process.
📜 Memory ConsumptionList of 25 datasets(memory has increased).
List of 25 datasets(memory has decreased).
List of 25 datasets(no reference available).
|
@davidgamez The annotation approach makes the code cleaner and easier to maintain compared to manually checking row.asString(). Checking the entire row manually for invalid characters can consume more time compared to using the @NoInvalidCharacters annotation on individual string fields. Manual checking involves iterating through each character of the string and performing validation logic, which can be computationally expensive if done for every field in every row. |
The asString method doesn't check the entire row. It only parses the columns marked as String. If the intention is to check all string fields, this is the natural place. Otherwise, we will need to mark all string fields with the annotation, making the annotation redundant to the definition of the field type. I'm unsure what you meant by "manual check"; the exact implementation proposed by this PR(String.contains) can be placed in the asString method. |
@davidgamez As discussed in standup, we close this PR and I'll work on a new branch for the asString() approach. |
Summary:
Closes #1840
Expected behavior:
If stop name contains a replacement character, show a notice.
Please make sure these boxes are checked before submitting your pull request - thanks!
gradle test
to make sure you didn't break anything