Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quid solved #428

Merged
merged 9 commits into from
Oct 19, 2024
Merged

Quid solved #428

merged 9 commits into from
Oct 19, 2024

Conversation

Collins-Webdev
Copy link
Contributor

Contributor checklist


Description

Related issue

  • #ISSUE_NUMBER

* Overview
This PR addresses issue scribe-org#423 by implementing error handling for missing QID values in the `language_metadata.json` file. The changes focus on enhancing the robustness of the `cli_utils.py` module, particularly in scenarios where language entries lack a QID.

** Changes
1. Modified the `language_to_qid` dictionary creation process in `cli_utils.py`:
   - Implemented a try-except block to catch potential KeyErrors when accessing QID values.
   - Added a warning message for languages with missing QIDs.

2. Updated the `validate_language_and_data_type` function:
   - Enhanced error handling to accommodate languages without QIDs.
   - Improved the validation process to prevent crashes due to missing QID data.

3. Refactored related code sections for consistency and maintainability.

* Technical Details
- Utilized the `dict.get()` method with a default value of `None` to safely access potentially missing QID keys.
- Implemented a logging mechanism to warn about missing QIDs without halting execution.
- Adjusted the validation logic to gracefully handle languages with missing QIDs, allowing the CLI to continue functioning for valid entries.

** Testing
- Conducted thorough testing by removing QIDs from various language entries in `language_metadata.json`.
- Verified that the CLI continues to function correctly for languages with valid QIDs.
- Confirmed that appropriate warnings are logged for languages with missing QIDs.
- Tested edge cases, including scenarios with multiple missing QIDs and mixed valid/invalid entries.

** Impact
These changes significantly improve the resilience of the Scribe-Data CLI, ensuring it can operate effectively even when faced with incomplete language metadata. This enhancement aligns with our goal of creating a more robust and user-friendly tool.

** Next Steps
- Consider implementing a more comprehensive logging system for better traceability of warnings and errors.
- Explore the possibility of adding unit tests specifically for QID error handling scenarios.
- Evaluate the need for a data validation step during the metadata file loading process to preemptively identify and report missing or malformed entries.
Copy link

github-actions bot commented Oct 18, 2024

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@andrewtavis andrewtavis added the hacktoberfest-accepted Accepted as a part of Hacktoberfest label Oct 18, 2024
@andrewtavis andrewtavis self-requested a review October 18, 2024 19:20
Comment on lines 60 to 68
for lang in language_metadata["languages"]:
lang_lower = lang["language"].lower()
qid = lang.get("qid")

if qid is None:
print(f"Warning: 'qid' missing for language {lang['language']}")
else:
language_map[lang_lower] = lang
language_to_qid[lang_lower] = qid
Copy link
Contributor

@catreedle catreedle Oct 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on this, @Collins-Webdev! 😊

I believe we don't need to change the previous method of retrieving languages using language_metadata.items(), as the latest version of language_metadata.json no longer includes the languages key, trying to access language_metadata["languages"] will cause an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should check for sub_languages first before attempting to retrieve the language qid. Languages like Norwegian and Chinese, which have sub-languages, naturally don't have their own qid.

- Remove assumption of 'languages' key in language_metadata
- Handle sub-languages correctly
- Improve warning messages for missing qids
Collins-Webdev and others added 2 commits October 19, 2024 12:36
- Remove assumption of 'languages' key in language_metadata
- Handle sub-languages correctly
- Improve warning messages for missing qids
@Collins-Webdev
Copy link
Contributor Author

Hello @catreedle 👋🏼,
Thank you for your feedback 🙏🏽.
I've made the following changes based on your comments:

  1. Removed the assumption of a 'languages' key in the language_metadata dictionary. The code now iterates directly over the language_metadata items.
  2. Added handling for sub-languages. The code now checks for the presence of 'sub_languages' before processing the main language qid.
  3. Improved the warning messages to differentiate between missing qids for main languages and sub-languages.

Please review the changes and let me know if any further modifications are needed.

@@ -27,8 +27,6 @@

from scribe_data.utils import DEFAULT_JSON_EXPORT_DIR

# MARK: CLI Variables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't remove marks from the code in the future @Collins-Webdev as these are meant to make the files more manageable to navigate.

@andrewtavis
Copy link
Member

50d4c30 further added back in some single quotes that were removed and were causing the tests to fail :)

Copy link
Member

@andrewtavis andrewtavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes here, @Collins-Webdev! And also appreciate the review, @catreedle! Made mine much easier 😊 Would be great to get continued support on checking PRs!

@andrewtavis andrewtavis merged commit 8321dc3 into scribe-org:main Oct 19, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted Accepted as a part of Hacktoberfest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants