Add support for converting from Hijri calendar to undate and undate interval #107

Merged: 40 commits, Dec 20, 2024

@rlskoeser (Member) commented Nov 21, 2024

  • lark parser for Hijri dates - supports year, year month, and year month day
  • lark transformer that turns the lark parse tree into an undate object
  • undate converter that connects the lark parser and transformer into the undate converter class
  • unit tests for all, borrowing examples from Princeton Geniza Project (PGP) and ISMI datasets
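
The three levels of precision listed above (year, month and year, day month year) can be illustrated without lark. What follows is only a minimal sketch that uses Python's `re` module as a stand-in for the PR's lark grammar and transformer; the function name `parse_hijri_sketch`, the abbreviated month table, and the returned dict are assumptions made for illustration and are not the PR's API.

```python
import re
import unicodedata


def _nfc(text):
    """Normalize to NFC so composed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical, abbreviated month table; the real grammar covers all twelve
# months and several spelling variants.
MONTHS = {
    _nfc(name): number
    for name, number in [("Muḥarram", 1), ("Jumādā I", 5), ("Rajab", 7)]
}

_MONTH = "|".join(re.escape(name) for name in MONTHS)
# Optional day, optional month, required year of three or more digits.
_DATE = re.compile(
    rf"(?:(?:(?P<day>\d{{1,2}}) )?(?P<month>{_MONTH}) )?(?P<year>\d{{3,}})"
)


def parse_hijri_sketch(text):
    """Return a dict with the year and, when present, the month and day."""
    match = _DATE.fullmatch(_nfc(text.strip()))
    if match is None:
        raise ValueError(f"could not parse {text!r} as a Hijri date")
    parsed = {"year": int(match["year"])}
    if match["month"]:
        parsed["month"] = MONTHS[match["month"]]
    if match["day"]:
        parsed["day"] = int(match["day"])
    return parsed
```

With this sketch, "441" yields only a year, "Rajab 495" a year and month, and "7 Jumādā I 1243" all three fields; anything else raises `ValueError`.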

related to #10

improvements after initial review

  • improve documentation (with example usage) and include in sphinx docs
  • add a calendar property to undate (default gregorian)
  • initialize undate object with numeric year, month, day and add optional calendar (default gregorian)
  • use calendar to convert earliest/latest to gregorian for comparison; confirm date precision is set as expected
  • remove 3-digit year constraint in parser
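
As a rough illustration of the calendar-aware design described in this list, the sketch below stores the original year, month, and day together with a calendar name and only converts to Gregorian when the earliest and latest dates are needed for comparison. Everything here is an assumption for illustration: `SketchDate` is not the Undate class, and the conversion uses tabular (arithmetic) Islamic calendar rules written out by hand as a stand-in for the `convertdate` library that the PR depends on. The Gregorian values in the usage note are the ones asserted in the PR's own converter tests.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Ordinal offset of the tabular Islamic epoch relative to datetime.date
# ordinals (an assumption of this sketch, not taken from the PR).
_EPOCH_OFFSET = 227014


def hijri_to_gregorian(year, month, day):
    """Convert a tabular Hijri date to a (proleptic) Gregorian date."""
    ordinal = (
        day
        + (59 * (month - 1) + 1) // 2  # days in preceding months (30/29 alternating)
        + (year - 1) * 354  # common-year days in preceding years
        + (3 + 11 * year) // 30  # leap days in preceding years
        + _EPOCH_OFFSET
    )
    return date.fromordinal(ordinal)


_CONVERTERS = {"gregorian": date, "hijri": hijri_to_gregorian}


@dataclass(frozen=True)
class SketchDate:
    """Hypothetical date with optional month/day and a calendar name."""

    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    calendar: str = "gregorian"

    def _convert(self, year, month, day):
        return _CONVERTERS[self.calendar](year, month, day)

    @property
    def earliest(self):
        return self._convert(self.year, self.month or 1, self.day or 1)

    @property
    def latest(self):
        if self.day is not None:
            return self.earliest
        # First day of the following month or year, minus one day.
        if self.month is None or self.month == 12:
            following = self._convert(self.year + 1, 1, 1)
        else:
            following = self._convert(self.year, self.month + 1, 1)
        return following - timedelta(days=1)
```

Under these assumptions, `SketchDate(495, 7, calendar="hijri")` spans 1102-04-28 to 1102-05-27 and `SketchDate(441, calendar="hijri")` spans 1049-06-11 to 1050-05-31, so dates from different calendars can be compared through `earliest` and `latest`.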

Questions:

  • How reasonable is it to limit support to 3+ digit Hijri years? If we can make this assumption the parsing is much faster and more efficient, AND we can support both the year month day format in your ISMI data as well as the day month year in PGP data. ... but it might be an unreasonable limitation.
  • We don't yet have a notion of a precision or size for undate intervals, but this code suggests that it would be a useful option, for when the length of the interval is known
  • I'd like to write a notebook to test this logic against PGP dates, but prefer to do that in a subsequent PR - does that make sense, or would a demo notebook make it easier to review this code?
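
The trade-off behind the first question can be shown with a small experiment. This is a sketch using Python's `re` rather than the PR's lark grammar, with a single hypothetical month and a made-up helper name `orderings`: when years may have one or two digits, a string such as "12 Rajab 10" fits both the day month year and the year month day orders, while a three-digit minimum leaves at most one reading.

```python
import re

MONTH = r"(?P<month>Rajab)"  # a single month keeps the sketch short
DAY = r"(?P<day>[1-9]|[12][0-9]|30)"


def orderings(text, year_pattern):
    """Return the field orders that can match text for a given year pattern."""
    year = rf"(?P<year>{year_pattern})"
    candidates = {
        "day month year": rf"{DAY} {MONTH} {year}",
        "year month day": rf"{year} {MONTH} {DAY}",
    }
    return [
        name
        for name, pattern in candidates.items()
        if re.fullmatch(pattern, text)
    ]
```

For example, `orderings("12 Rajab 10", r"\d+")` returns both orders, whereas `orderings("12 Rajab 495", r"\d{3,}")` returns only "day month year" and `orderings("12 Rajab 10", r"\d{3,}")` returns an empty list.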

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced HijriDateConverter for converting Hijri calendar dates to Gregorian dates.
    • Added grammar rules for parsing Hijri dates, supporting various formats.
    • Implemented HijriDateTransformer for transforming parsed Hijri dates into Undate or UndateInterval objects.
    • Added HebrewDateConverter for converting Hebrew calendar dates to Gregorian dates.
    • Introduced GregorianDateConverter for handling Gregorian calendar date conversions.
    • Added support for multiple calendar types in the Undate class, including a new Calendar class.
    • Enhanced dependency management for project libraries.
    • Updated documentation to clarify library functionality and support for various calendars.
  • Bug Fixes

    • Updated EDTF date parsing rules to include the 20th day of the month.
  • Tests

    • Added comprehensive test suites for HijriDateConverter, hijri_parser, HijriDateTransformer, and HebrewDateConverter to ensure functionality and error handling.

coderabbitai bot (Contributor) commented Nov 21, 2024

Walkthrough

The pull request adds the HijriDateConverter class and related components for converting Hijri calendar dates to Gregorian dates. The pyproject.toml file is updated to add the convertdate dependency and to change the lark dependency to lark[interegular]. New grammars for parsing Hijri and Hebrew dates are introduced, along with tests that validate the new functionality. Hebrew calendar support is added as well, extending the library's date conversion capabilities.

Changes

File Path | Change Summary
pyproject.toml | Updated dependencies: added convertdate, changed lark to lark[interegular].
src/undate/converters/calendars/__init__.py | Added HijriDateConverter, GregorianDateConverter, and HebrewDateConverter to __all__.
src/undate/converters/calendars/hijri/__init__.py | Imported HijriDateConverter and updated __all__ to include it.
src/undate/converters/calendars/hijri/converter.py | Introduced HijriDateConverter class with methods for converting Hijri dates.
src/undate/converters/calendars/hijri/hijri.lark | Added grammar rules for parsing Hijri dates, including definitions for years, months, and days.
src/undate/converters/calendars/hijri/parser.py | Introduced parser.py to initialize hijri_parser using hijri.lark.
src/undate/converters/calendars/hijri/transformer.py | Added HijriDateTransformer class for transforming Hijri date parse trees.
src/undate/converters/calendars/gregorian.py | Introduced GregorianDateConverter class for Gregorian date handling.
src/undate/converters/calendars/hebrew/__init__.py | Imported HebrewDateConverter and updated __all__ to include it.
src/undate/converters/calendars/hebrew/converter.py | Introduced HebrewDateConverter class for Hebrew date conversions.
src/undate/converters/calendars/hebrew/hebrew.lark | Added grammar rules for parsing Hebrew dates.
src/undate/converters/calendars/hebrew/parser.py | Introduced parser.py to initialize hebrew_parser using hebrew.lark.
src/undate/converters/edtf/edtf.lark | Updated day rule regex to include the 20th day of the month.
src/undate/converters/edtf/parser.py | Changed import from os.path to pathlib, removed commented-out test cases.
src/undate/undate.py | Updated comments and type annotations in UndateInterval class.
tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py | Added tests for HijriDateConverter functionality.
tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py | Introduced tests for hijri_parser to validate various Hijri date formats.
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py | Added tests for HijriDateTransformer class and its conversion logic.
tests/test_converters/test_calendars/test_hebrew/test_hebrew_converter.py | Introduced tests for HebrewDateConverter functionality.
tests/test_converters/test_calendars/test_hebrew/test_hebrew_parser.py | Introduced tests for hebrew_parser to validate various Hebrew date formats.
tests/test_converters/test_calendars/test_hebrew/test_hebrew_transformer.py | Added tests for HebrewDateTransformer class and its conversion logic.

Suggested reviewers

  • ColeDCrawford
  • robcast
  • jdamerow

🐰 In the garden where dates bloom bright,
A converter hops with all its might.
From Hijri to Gregorian, it leaps with glee,
Transforming dates for you and me!
With tests in place, it’s sure to shine,
A joyful change, oh how divine! 🌼

@rlskoeser requested a review from robcast November 21, 2024 22:05

@rlskoeser (Member, Author) commented:

@coderabbitai review

coderabbitai bot (Contributor) commented Nov 21, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (18)
tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (3)

7-12: Consider improving test method name and documentation.

  1. The method name test_parse_ has a trailing underscore which is unusual. Consider renaming it to something more descriptive like test_parse_hijri_dates or test_parse_valid_dates.
  2. Add a docstring to explain the test's purpose and the significance of the test dates (if any).

Example improvement:

-    def test_parse_(self):
+    def test_parse_hijri_dates(self):
+        """Test parsing of valid Hijri dates in different formats.
+        
+        Tests conversion of:
+        - Specific day (7 Jumādā I 1243)
+        - Month only (Rajab 495)
+        - Year only (441)
+        """

14-18: Add context about test data selection.

Consider adding a comment explaining why these specific dates were chosen for testing. Are they edge cases or examples from real data (e.g., from Princeton Geniza Project or ISMI datasets mentioned in PR objectives)?


20-24: Consider adding more year parsing test cases.

Given the PR's question about limiting support to Hijri years with three or more digits, consider adding test cases for:

  1. Two-digit years (to verify they're rejected if that's the intended behavior)
  2. Four-digit years
  3. Edge cases around year boundaries
tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (3)

5-5: Enhance the test scope documentation.

The current comment could be more descriptive about what constitutes a "valid date" and what aspects of parsing are being tested.

Consider expanding the comment to:

-# for now, just test that valid dates can be parsed
+# For now, we only verify that the parser can handle valid Hijri dates without errors.
+# TODO: Add tests for:
+# - Invalid date validation
+# - Edge cases (e.g., leap years)
+# - Error messages for malformed inputs

7-46: Consider adding more comprehensive test cases.

While the current test cases cover various format variations well, consider adding:

  1. Boundary cases:
    • Minimum/maximum valid years
    • Edge cases for different months (e.g., months with varying days)
  2. Invalid inputs:
    • Invalid day numbers (e.g., 31st in a 30-day month)
    • Invalid month names
    • Malformed strings

Here's a suggested addition to the test cases:

# Add to testcases list:
    # boundary cases
    "1 Muḥarram 1",  # minimum year
    "29 Dhū al-Ḥijjah 9999",  # maximum year
    
    # more complex formats
    "early Ramaḍān 903",  # with period qualifier
    "mid-Shawwāl 1042",  # with period qualifier and hyphen

1-51: Consider adding test fixtures and helper functions.

To improve maintainability and reduce duplication as more test cases are added, consider:

  1. Creating fixtures for common test data
  2. Adding helper functions for result validation
  3. Organizing test cases into classes by format type

Example structure:

@pytest.fixture
def valid_hijri_dates():
    return {
        "year_only": ("521", {"year": 521}),
        "month_year": ("Muḥarram 900", {"year": 900, "month": 1}),
        # ... more test cases with expected results
    }

class TestHijriParser:
    def test_year_format(self, valid_hijri_dates):
        date_string, expected = valid_hijri_dates["year_only"]
        result = hijri_parser.parse(date_string)
        self._assert_parsed_result(result, expected)

    @staticmethod
    def _assert_parsed_result(result, expected):
        for key, value in expected.items():
            assert getattr(result, key) == value
src/undate/converters/calendars/hijri/hijri.lark (3)

10-12: Consider adding upper bound for year validation.

The current pattern /\d{3,}/ allows unlimited digits, which could accept invalid years. Consider:

-year: /\d{3,}/
+year: /\d{3,4}/  // Assuming max 4 digits for reasonable Hijri years

Also, consider adding a comment documenting the expected year range for the Hijri calendar.


28-28: Optimize day pattern for better readability.

The current day pattern can be simplified:

-day: /[1-9]/ | /[12][0-9]/ | /30/
+day: /([1-9]|[12][0-9]|30)/
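
A quick way to convince oneself that the simplified pattern accepts the same strings is to compare both forms with Python's `re`. This is only a sketch: it checks which strings each form accepts and says nothing about how lark's lexer treats one terminal versus three; the names `ORIGINAL`, `SIMPLIFIED`, and `accepted` are made up for illustration.

```python
import re

ORIGINAL = [r"[1-9]", r"[12][0-9]", r"30"]  # three alternative terminals
SIMPLIFIED = [r"([1-9]|[12][0-9]|30)"]  # single terminal from the suggestion

# Candidate strings: 0..99, both plain and zero-padded.
CANDIDATES = {str(n) for n in range(100)} | {f"{n:02d}" for n in range(100)}


def accepted(patterns):
    """Return the candidates fully matched by at least one pattern."""
    return {
        text
        for text in CANDIDATES
        if any(re.fullmatch(pattern, text) for pattern in patterns)
    }
```

Both `accepted(ORIGINAL)` and `accepted(SIMPLIFIED)` come out as exactly the strings "1" through "30", with zero-padded forms such as "01" rejected by both.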

33-51: Enhance month patterns for consistency and robustness.

Several improvements could be made to the month patterns:

  1. Use consistent pattern style (regex vs quotes):
-month_7: "Rajab"
+month_7: /Rajab/
  2. Consider adding case-insensitive flag for better flexibility:
-month_1: /al-Mu[ḥh]arram/ | /Mu[ḥh]arram/
+month_1: /(?i)al-mu[ḥh]arram/ | /(?i)mu[ḥh]arram/
  3. Add comments for each variant to document the source/reason for inclusion.
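
The effect of case-insensitive month matching can be tried out in plain Python before touching the grammar. The sketch below uses `re.IGNORECASE` instead of lark syntax and additionally normalizes input to NFC so that composed and decomposed forms of "ḥ" behave the same; that normalization step, the combined pattern, and the name `is_muharram` are assumptions of this sketch and are not part of the review suggestion.

```python
import re
import unicodedata

# \u1e25 is "ḥ" (h with dot below), written as an escape so the pattern does
# not depend on how this file's text happens to be normalized.
MUHARRAM = re.compile(r"(?:al-)?mu[\u1e25h]arram", re.IGNORECASE)


def is_muharram(text):
    """Match Muḥarram with or without the article, diacritic, or capitals."""
    normalized = unicodedata.normalize("NFC", text)
    return MUHARRAM.fullmatch(normalized) is not None
```

Under these assumptions "al-Muḥarram", "Muharram", and "MUHARRAM" all match, while other month names such as "Rajab" do not.
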
src/undate/converters/edtf/edtf.lark (1)

Line range hint 1-100: Clarify the relationship between EDTF and Hijri date parsing

Given that this PR aims to add Hijri calendar support, please clarify:

  1. Will the EDTF grammar serve as a base for Hijri date parsing?
  2. How will the parsing logic handle the differences between EDTF and Hijri date formats?

Consider adding documentation comments at the top of this file to explain its role in the broader date parsing architecture.

src/undate/converters/calendars/hijri/transformer.py (4)

7-9: Enhance class docstring with more details.

Consider expanding the docstring to include:

  • Input format expectations
  • Return value details
  • Examples of usage
 class HijriDateTransformer(Transformer):
-    """Transform a Hijri date parse tree and return an Undate or
-    UndateInterval."""
+    """Transform a Hijri date parse tree into Gregorian dates.
+
+    Transforms parsed Hijri dates into either a single Undate (for complete dates)
+    or an UndateInterval (for partial dates).
+
+    Args:
+        items: Parse tree containing year, month, and/or day nodes
+
+    Returns:
+        Undate: For complete dates with year, month, and day
+        UndateInterval: For partial dates missing month or day
+
+    Example:
+        >>> transformer = HijriDateTransformer()
+        >>> tree = parser.parse("450 AH")
+        >>> result = transformer.transform(tree)
+    """

28-29: Consider implementing the TODO for date precision.

The TODO raises a valid point about precision for partial dates. This could be important for academic or historical research purposes.

Would you like me to help design and implement the precision feature for UndateInterval? This could include:

  • Adding a precision enum (YEAR, MONTH, DAY)
  • Extending UndateInterval to store precision
  • Updating the transformation logic to set appropriate precision
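
One possible shape for the precision idea in this list is sketched below. It is not undate's implementation: `Precision` is a hypothetical stand-in for the library's DatePrecision, and `IntervalSketch` is a made-up class that simply carries an optional precision next to its Gregorian endpoints.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Precision(IntEnum):
    """Hypothetical stand-in for undate's DatePrecision."""

    YEAR = 1
    MONTH = 2
    DAY = 3


@dataclass(frozen=True)
class IntervalSketch:
    """Interval with Gregorian endpoints and an optional known precision."""

    earliest: date
    latest: date
    # None when the interval is not a single known calendar unit.
    precision: Optional[Precision] = None

    @property
    def days(self):
        """Inclusive length of the interval in days."""
        return (self.latest - self.earliest).days + 1
```

Using the Gregorian endpoints from the PR's tests, the interval for Rajab 495 would be `IntervalSketch(date(1102, 4, 28), date(1102, 5, 27), Precision.MONTH)` with a length of 30 days, so a caller could tell a whole Hijri month apart from an arbitrary 30-day span.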

32-35: Remove commented-out code.

The commented-out year method adds unnecessary noise. If it's not needed, remove it.

-    # this does nothing
-    # def year(self, items):
-    #     return Tree(data="year", children=[items[0]])

45-47: Add more comprehensive calendar constants.

Consider adding more calendar-specific constants for better maintainability.

 MIN_MONTH, MIN_DAY = 1, 1
 MAX_MONTH = 12
+MAX_DAY = 30  # Maximum days in Islamic calendar month
+MIN_YEAR = 1  # Minimum valid Hijri year
+MAX_YEAR = 9999  # Maximum supported year
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1)

7-42: Consider adding more edge cases to the test suite.

The current test cases are well-structured and include real-world examples with proper documentation. However, consider adding these edge cases:

  • Invalid month names
  • Out-of-range day values
  • Edge cases around Islamic calendar year boundaries
  • Different variations of month name spellings
pyproject.toml (1)

12-12: Consider adding version constraints for dependencies.

While the project is in Pre-Alpha, it's good practice to specify version constraints to ensure reproducible builds and prevent compatibility issues:

-dependencies = ["lark[interegular]", "numpy", "convertdate"]
+dependencies = [
+    "lark[interegular]>=1.1.5",
+    "numpy>=1.24.0",
+    "convertdate>=2.4.0"
+]
src/undate/undate.py (1)

450-452: Enhance the TODO comment with implementation details

The TODO suggestion about adding precision/length field using DatePrecision is valuable, especially for Hijri date conversions. Consider enhancing the comment to specify:

  1. Whether the precision should be stored per interval or inherited from constituent Undates
  2. How precision would affect interval comparisons and duration calculations
  3. Examples of precision usage in Hijri date conversion scenarios

Here's a suggested enhancement for the TODO comment:

-    # TODO: let's think about adding an optional precision / length /size field
-    # using DatePrecision
+    # TODO: Consider adding an optional precision field using DatePrecision to:
+    # 1. Track precision of the interval (independent or derived from Undate precision)
+    # 2. Support precise interval comparisons and duration calculations
+    # 3. Handle varying precision levels in calendar conversions (e.g., Hijri dates)
src/undate/converters/calendars/hijri/converter.py (1)

47-48: Consider implementing reverse conversion functionality

Supporting conversion from an Undate or UndateInterval back to a Hijri date could enhance the utility of the HijriDateConverter. This feature would allow users to display dates in the Hijri calendar format when needed.
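
To give a sense of what the reverse direction might involve, here is a minimal sketch that maps a Gregorian date back to year, month, and day in the tabular (arithmetic) Islamic calendar. It is hand-written arithmetic under that assumption, not the PR's code and not a call into `convertdate`; the function names and the epoch offset constant are illustrative.

```python
from datetime import date

# Ordinal offset of the tabular Islamic epoch relative to datetime.date
# ordinals (an assumption of this sketch).
_EPOCH_OFFSET = 227014


def _hijri_ordinal(year, month, day):
    """datetime.date ordinal of a tabular Hijri date."""
    return (
        day
        + (59 * (month - 1) + 1) // 2  # days in preceding months
        + (year - 1) * 354
        + (3 + 11 * year) // 30  # leap days in preceding years
        + _EPOCH_OFFSET
    )


def gregorian_to_hijri(gregorian):
    """Return (year, month, day) in the tabular Hijri calendar."""
    ordinal = gregorian.toordinal()
    if ordinal < _hijri_ordinal(1, 1, 1):
        raise ValueError("date is before the start of the Hijri calendar")
    # Estimate the year from the mean year length (10631 days per 30 years);
    # the estimate is never too high, so it only needs to be stepped forward.
    year = max(1, (ordinal - _EPOCH_OFFSET) * 30 // 10631)
    while _hijri_ordinal(year + 1, 1, 1) <= ordinal:
        year += 1
    month = 1
    while month < 12 and _hijri_ordinal(year, month + 1, 1) <= ordinal:
        month += 1
    day = ordinal - _hijri_ordinal(year, month, 1) + 1
    return year, month, day
```

With the dates from the PR's tests, `gregorian_to_hijri(date(1827, 11, 26))` gives `(1243, 5, 7)`. Formatting the result with month names would be a separate step.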

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c83ed49 and 778c67b.

📒 Files selected for processing (13)
  • pyproject.toml (1 hunks)
  • src/undate/converters/calendars/__init__.py (1 hunks)
  • src/undate/converters/calendars/hijri/__init__.py (1 hunks)
  • src/undate/converters/calendars/hijri/converter.py (1 hunks)
  • src/undate/converters/calendars/hijri/hijri.lark (1 hunks)
  • src/undate/converters/calendars/hijri/parser.py (1 hunks)
  • src/undate/converters/calendars/hijri/transformer.py (1 hunks)
  • src/undate/converters/edtf/edtf.lark (1 hunks)
  • src/undate/converters/edtf/parser.py (1 hunks)
  • src/undate/undate.py (1 hunks)
  • tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (1 hunks)
  • tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (1 hunks)
  • tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • src/undate/converters/calendars/__init__.py
  • src/undate/converters/calendars/hijri/parser.py
  • src/undate/converters/edtf/parser.py
🧰 Additional context used
🪛 Ruff (0.7.0)
src/undate/converters/calendars/hijri/converter.py

45-45: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
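
The pattern Ruff is asking for looks roughly like the following. This is a generic sketch rather than the converter's actual code: `ParseFailure` and `_low_level_parse` are hypothetical stand-ins for the parser library's exception and parse call.

```python
class ParseFailure(Exception):
    """Hypothetical stand-in for the parser library's own exception."""


def _low_level_parse(text):
    """Hypothetical parser: accepts digit strings only."""
    if not text.isdigit():
        raise ParseFailure(f"unexpected input {text!r}")
    return int(text)


def parse_year(text):
    """Re-raise parser failures as ValueError, keeping the original cause."""
    try:
        return _low_level_parse(text)
    except ParseFailure as err:
        # "from err" records the parser error as __cause__, so the traceback
        # shows it as the direct cause rather than as an error that occurred
        # while handling another exception.
        raise ValueError(f"could not parse {text!r}") from err
```

Callers still only need to catch `ValueError`, and the underlying parser error remains available on the exception's `__cause__` attribute for debugging.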

🔇 Additional comments (10)
src/undate/converters/calendars/hijri/__init__.py (2)

3-3: LGTM! Clean public API declaration.

The __all__ declaration explicitly defines the public API, following Python best practices.


1-1: LGTM! Verify the import path.

The import statement follows Python package conventions and aligns with the PR objective.

Let's verify the existence and location of the imported module:

✅ Verification successful

Import path verified and correctly structured

The import statement is valid as:

  • The HijriDateConverter class exists in src/undate/converters/calendars/hijri/converter.py
  • The import path follows the correct package structure
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and location of the HijriDateConverter module

# Check if the converter module exists
fd --type f "converter.py" --exec grep -l "class HijriDateConverter"

# Verify the import path structure
fd --type d "undate/converters/calendars/hijri"

Length of output: 169

tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (1)

1-6: LGTM! Clean imports and class structure.

The imports are appropriate and the test class follows proper naming conventions.

src/undate/converters/calendars/hijri/hijri.lark (2)

1-2: LGTM: Standard whitespace handling configuration.

The whitespace configuration is correctly set up using Lark's common whitespace import.


1-53: Verify grammar integration and test coverage.

Please ensure that:

  1. All date formats are covered in the test suite
  2. Edge cases are handled (e.g., invalid dates, boundary years)
  3. Integration with the converter handles all grammar productions correctly
✅ Verification successful

Let me gather more specific information about the test coverage and integration.


Let me search for the converter implementation one more time with a different approach.


Let me check the converter implementation to complete the verification.


Grammar integration and test coverage are comprehensive

The verification shows:

  1. Extensive test coverage in test_hijri_parser.py covering all date formats:

    • Single year (e.g., "521")
    • Month-year with variants (e.g., "al-Muḥarram 900", "Safar 581")
    • Day-month-year (e.g., "7 Jumādā I 1243")
    • Year-month and year-month-day formats
  2. Proper integration with converter system:

    • HijriDateConverter properly implements parsing through the grammar
    • Transformer correctly handles all date formats and precision levels
    • Error handling for invalid inputs is implemented
    • Calendar-specific labeling is maintained
  3. Edge cases are handled:

    • Different month name variants (with/without accents)
    • Valid day ranges (1-30)
    • Three-digit year assumption is properly enforced
    • Invalid input detection via UnexpectedCharacters exception
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test files and patterns
echo "Searching for test files..."
fd -e py test_ -x grep -l "hijri" {}

# Look for test patterns in the codebase
echo "Checking test patterns..."
rg "test.*hijri" -A 5

# Verify converter integration
echo "Checking converter integration..."
ast-grep --pattern 'class $CONVERTER {
  $$$
  parse($INPUT) {
    $$$
  }
  $$$
}'

Length of output: 498


Script:

#!/bin/bash
# Check test file contents for coverage of different date formats
echo "Checking test coverage..."
cat ./tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py

# Check transformer implementation
echo "Checking transformer..."
cat ./tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py

# Look for the actual converter implementation
echo "Looking for converter class..."
fd -e py -x grep -l "HijriConverter" {}

Length of output: 3628


Script:

#!/bin/bash
# Look for converter implementation using ripgrep
echo "Searching for converter implementation..."
rg -l "class.*HijriConverter"

# Check imports to find converter location
echo "Checking imports..."
rg "from.*hijri.*converter.*import"

# Look for any file containing 'hijri' in the converter directory
echo "Checking converter directory..."
fd -e py . src/undate/converters/calendars/hijri/

Length of output: 663


Script:

#!/bin/bash
# Check the converter implementation
echo "Checking converter implementation..."
cat src/undate/converters/calendars/hijri/converter.py

# Check the parser implementation
echo "Checking parser implementation..."
cat src/undate/converters/calendars/hijri/parser.py

Length of output: 2421

tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (3)

1-6: LGTM! Well-organized imports.

The imports are logically grouped and include all necessary components for testing the Hijri date transformer.


53-54: Consider adding precision validation for UndateInterval.

The precision check is only performed for Undate objects, but UndateInterval instances should also maintain precision information.

Let's verify if UndateInterval supports precision:


1-56: Verify test coverage for the Hijri transformer implementation.

The test suite provides good coverage of the basic functionality, but let's verify the coverage metrics.

✅ Verification successful

Test coverage for the Hijri transformer is comprehensive

The test suite provides excellent coverage of the HijriDateTransformer implementation:

  • Tests cover all public methods (hijri_date and month)
  • Includes test cases for all three precision levels (day, month, year)
  • Verifies both single dates (Undate) and date intervals (UndateInterval)
  • Contains real-world examples from Princeton Geniza Project and ISMI data
  • Validates correct date conversions against external reference (muqawwim.com)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any untested public methods in the transformer
ast-grep --pattern 'class HijriDateTransformer {
  $$$
  def $method($$$) {
    $$$
  }
  $$$
}'

Length of output: 98


Script:

#!/bin/bash
# Let's first locate the transformer implementation file
fd "transformer.py" --type f

# Then examine its content and the HijriDateTransformer class
rg -A 5 "class HijriDateTransformer" 

# Also check test coverage using test file patterns
fd "test.*transformer.py" --type f

Length of output: 951


Script:

#!/bin/bash
# Let's examine the full implementation of HijriDateTransformer
cat src/undate/converters/calendars/hijri/transformer.py

# And check all test cases in the test file
cat tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py

Length of output: 4241

pyproject.toml (1)

12-12: LGTM! Dependencies align with PR objectives.

The updated dependencies support the new Hijri calendar conversion functionality:

  • lark[interegular] provides enhanced parsing capabilities for Hijri date formats
  • convertdate enables calendar conversion operations
src/undate/undate.py (1)

445-447: LGTM: Type annotations are clear and consistent

The explicit type annotations for class attributes using Union[Type, None] are well-defined and maintain compatibility with Python versions pre-3.10. They align well with the class usage and will improve type checking for the new Hijri date conversion feature.

Comment on lines +26 to +32
    def test_parse_error(self):
        # a string we can't parse should raise an error
        with pytest.raises(ValueError):
            HijriDateConverter().parse("January 2, 1991")
        # empty string should also error
        with pytest.raises(ValueError):
            HijriDateConverter().parse("")
🛠️ Refactor suggestion

Enhance error handling test coverage.

Consider adding more error test cases:

  1. Malformed Hijri dates (e.g., invalid month names)
  2. Out-of-range values (e.g., invalid day numbers)
  3. Incomplete dates (e.g., month without year)
  4. Mixed calendar formats

Example additions:

def test_parse_error_cases(self):
    """Test various error cases for Hijri date parsing."""
    invalid_cases = [
        "31 Jumādā I 1243",  # Invalid day (no Hijri month has 31 days)
        "Foo 495",           # Invalid month
        "Rajab",             # Incomplete date
        "1243/5/7",         # Wrong format
    ]
    for invalid_date in invalid_cases:
        with pytest.raises(ValueError):
            HijriDateConverter().parse(invalid_date)

Comment on lines 1 to 32
import pytest
from undate.converters.calendars import HijriDateConverter
from undate.undate import Undate, UndateInterval


class TestHijriDateConverter:
    def test_parse_(self):
        # day
        date_str = "7 Jumādā I 1243"
        date = HijriDateConverter().parse(date_str)
        assert date == Undate(1827, 11, 26)
        assert date.label == f"{date_str} {HijriDateConverter.calendar_name}"

        # month
        date_str = "Rajab 495"
        date = HijriDateConverter().parse(date_str)
        assert date == UndateInterval(Undate(1102, 4, 28), Undate(1102, 5, 27))
        assert date.label == f"{date_str} {HijriDateConverter.calendar_name}"

        # year
        date_str = "441"
        date = HijriDateConverter().parse(date_str)
        assert date == UndateInterval(Undate(1049, 6, 11), Undate(1050, 5, 31))
        assert date.label == f"{date_str} {HijriDateConverter.calendar_name}"

    def test_parse_error(self):
        # a string we can't parse should raise an error
        with pytest.raises(ValueError):
            HijriDateConverter().parse("January 2, 1991")
        # empty string should also error
        with pytest.raises(ValueError):
            HijriDateConverter().parse("")
💡 Codebase verification

Test coverage needs expansion for ISMI dataset formats

The current test cases don't fully cover the date formats found in the ISMI dataset. Specifically:

  • Missing test for Muharram month format (e.g., "6. Muḥarram 888")
  • No tests for dates with diacritical marks (e.g., "Ḏu al-Ḥijjaẗ")
  • No tests for year ranges in Islamic calendar
  • No tests handling the ±2 days ambiguity mentioned in ISMI data

The test file should be expanded to include these real-world use cases from the ISMI dataset to ensure robust handling of actual manuscript dates.

🔗 Analysis chain

Verify test coverage against real-world data.

The PR objectives mention examples from Princeton Geniza Project (PGP) and ISMI datasets. Let's verify if the current test cases cover the date formats found in these datasets.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Hijri date patterns in the codebase to find more test cases
# Look for files that might contain example dates from PGP and ISMI datasets

# Search for potential test data files
fd -e json -e csv -e txt -e md . | grep -i -E 'test.*data|pgp|ismi|hijri|dates'

# Search for Hijri date patterns in these files
rg -i "hijri|jumada|rajab|ramadan" 

Length of output: 5539


Script:

#!/bin/bash
# Let's check the ISMI sample data to understand the date formats
cat examples/use-cases/ismi/data/ismi-om4-date-samples.json

# Also check the ISMI README for any documentation about date formats
cat examples/use-cases/ismi/README.md

Length of output: 5394

src/undate/converters/calendars/hijri/hijri.lark (outdated, resolved)
src/undate/converters/edtf/edtf.lark (outdated, resolved)
src/undate/converters/calendars/hijri/transformer.py (outdated, resolved)
src/undate/converters/calendars/hijri/transformer.py (outdated, resolved)
src/undate/converters/calendars/hijri/converter.py (outdated, resolved)
rlskoeser and others added 7 commits November 21, 2024 17:08
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@robcast
Contributor

robcast commented Nov 22, 2024

Should a month or year in another calendar be represented in a single Undate or should it always be a UndateInterval?

I would find it more elegant if non-Gregorian months or years could also be represented as a single Undate. Does that make other Undate functions too complex? The default constructor assumes Gregorian and it would have to be changed.

Does this change what precision=year means? Is it too weird to have an Undate "year" based on the Gregorian calendar where the Gregorian text representation does not align on a year? Maybe we should have a "non-gregorian" flag for cases when you really care about this.

@rlskoeser
Member Author

> Should a month or year in another calendar be represented in a single Undate or should it always be a UndateInterval?
>
> I would find it more elegant if non-Gregorian months or years could also be represented as a single Undate. Does that make other Undate functions too complex? The default constructor assumes Gregorian and it would have to be changed.
>
> Does this change what precision=year means? Is it too weird to have an Undate "year" based on the Gregorian calendar where the Gregorian text representation does not align on a year? Maybe we should have a "non-gregorian" flag for cases when you really care about this.

@robcast these are fantastic questions and I like where you are going with it. I had been intending to add a calendar property to Undate so that we could be explicit about what calendar the date is in. Can you help me sketch out what it would look like and how it would operate? I agree that this would be much more elegant and powerful.

Here's what I'm thinking based on what you're proposing: when parsing a Hijri date, instead of converting to Gregorian before creating Undate objects, we initialize the Undate with numeric year, month, day values and a calendar. This allows us to keep them as a single Undate instead of an UndateInterval, and also preserves the date precision logic. There are lots of benefits to this approach. If we do that, then would we calculate the earliest/latest dates in the Hijri calendar or Gregorian? Or have options for both? Or maybe the comparison logic needs to be calendar aware and we only do the conversion when it matters or is explicitly requested? I'm leaning towards this last option, but let me know what you think!

@rlskoeser
Member Author

rlskoeser commented Nov 22, 2024

decisions from discussion with @robcast :

  • revise this code to test out the new approach
    • add a calendar property to undate (default gregorian)
    • initialize undate object with numeric year, month, day and add optional calendar (default gregorian)
    • when initializing with hijri calendar, convert to gregorian for earliest/latest to compare
    • confirm date precision is set as expected
  • 3-digit year is not a reasonable constraint; choose a format (day month year) and document that constraint
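
To make the decisions above concrete, here is a rough, self-contained sketch of the idea: a date object that keeps its original year, month, and day plus a calendar name, and only converts to Gregorian when computing the earliest and latest possible days. Everything in it is an assumption for illustration: `SketchUndate` and `hijri_to_gregorian` are hypothetical names, not the undate API, and the conversion uses the tabular (civil) Islamic calendar arithmetic, which may differ from whatever conversion library the PR actually relies on.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# 1 Muharram AH 1 in the proleptic Gregorian calendar (tabular/civil epoch).
HIJRI_EPOCH_ORDINAL = date(622, 7, 19).toordinal()


def hijri_to_gregorian(year: int, month: int = 1, day: int = 1) -> date:
    """Convert a tabular (civil) Hijri date to a proleptic Gregorian date.

    Assumption: 30-year cycle with 11 leap years; not an observational calendar.
    """
    days_before_year = (year - 1) * 354 + (3 + 11 * year) // 30
    days_before_month = (59 * (month - 1) + 1) // 2  # ceil(29.5 * (month - 1))
    return date.fromordinal(
        HIJRI_EPOCH_ORDINAL + days_before_year + days_before_month + day - 1
    )


@dataclass
class SketchUndate:
    """Hypothetical stand-in for an Undate with a calendar property."""

    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    calendar: str = "gregorian"  # default calendar, as decided above

    def _convert(self, year: int, month: int, day: int) -> date:
        if self.calendar == "hijri":
            return hijri_to_gregorian(year, month, day)
        return date(year, month, day)

    @property
    def precision(self) -> str:
        if self.day is not None:
            return "day"
        return "month" if self.month is not None else "year"

    @property
    def earliest(self) -> date:
        return self._convert(self.year, self.month or 1, self.day or 1)

    @property
    def latest(self) -> date:
        if self.precision == "day":
            return self.earliest
        # start of the next month or year in the *source* calendar, minus a day
        if self.precision == "month" and self.month < 12:
            next_start = self._convert(self.year, self.month + 1, 1)
        else:
            next_start = self._convert(self.year + 1, 1, 1)
        return next_start - timedelta(days=1)


rajab = SketchUndate(495, 7, calendar="hijri")
print(rajab.precision, rajab.earliest, rajab.latest)
# month 1102-04-28 1102-05-27
```

The precision stays "month" even though the Gregorian span crosses a month boundary, and dates from different calendars can be compared or sorted through their Gregorian `earliest` and `latest` values.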

@rlskoeser rlskoeser force-pushed the feature/convert-hijri branch from f5429cd to e91b7ba Compare November 26, 2024 22:27
@rlskoeser
Member Author

@robcast I've updated this PR based on the approach we discussed. I'm really happy with how it's working and how powerful this is going to be. I went ahead and made a minimal Gregorian calendar class for consistency with the pieces of information that could be calendar specific.

To see if this will scale, I took a first pass at implementing a calendar converter for the Hebrew Anno Mundi based on the Hijri implementation and with reference to my previous geniza project code. PR for that is here, if you want to see how it builds on this: #108

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (33)
tests/test_undate.py (2)

123-133: Consider adding more edge cases to test_calendar.

The test method covers the basic scenarios well. Consider adding these additional test cases:

  • Testing calendar property with partially known dates
  • Testing calendar inheritance in UndateInterval
  • Testing calendar conversion between different calendar systems
def test_calendar(self):
    # existing tests...
    
    # partially known dates
    assert Undate("19XX", calendar="HIJRI").calendar == Calendar.HIJRI
    
    # calendar inheritance in intervals
    interval = UndateInterval(
        Undate(848, calendar=Calendar.HIJRI),
        Undate(850, calendar=Calendar.HIJRI)
    )
    assert interval.calendar == Calendar.HIJRI
    
    # calendar conversion
    hijri_date = Undate(848, calendar=Calendar.HIJRI)
    gregorian_date = hijri_date.to_calendar(Calendar.GREGORIAN)
    assert gregorian_date.calendar == Calendar.GREGORIAN

571-577: Add error case testing for calendar converter retrieval.

While the happy path is well tested, consider adding error cases:

def test_calendar_get_converter():
    # existing tests...
    
    # Test error cases
    with pytest.raises(ValueError):
        Calendar.get_converter(None)
    
    with pytest.raises(ValueError):
        Calendar.get_converter("invalid_calendar")
src/undate/converters/calendars/hebrew/converter.py (4)

14-14: Typo in class docstring

There is a missing space between "Mundi" and "calendar" in the class docstring. It should be "Converter for Hebrew Anno Mundi calendar."

-    Converter for Hebrew Anno Mundicalendar.
+    Converter for Hebrew Anno Mundi calendar.

29-29: Typo in docstring

The word "convertion" should be corrected to "conversion" in the docstring.

-      additional methods for ``max_month``, ``max_day``, and convertion ``to_gregorian``
+      additional methods for ``max_month``, ``max_day``, and conversion ``to_gregorian``

31-31: Formatting issue in docstring

The inline code markup for __all__ is incorrect. It should be enclosed within double backticks.

-    - Import your calendar in ``undate/converters/calendars/__init__.py`` and include in `__all__``
+    - Import your calendar in ``undate/converters/calendars/__init__.py`` and include in ``__all__``

78-78: Typo in comment

The word "abitrary" should be corrected to "arbitrary" in the comment.

-    # i.e., generate a Hebrew date from an abitrary undate or undate interval?
+    # i.e., generate a Hebrew date from an arbitrary undate or undate interval?
src/undate/converters/base.py (5)

6-6: Typo in docstring

The word "functionaly" should be corrected to "functionality" in the docstring.

-    provides additional functionaly needed for calendar conversion.
+    provides additional functionality needed for calendar conversion.

29-29: Typo in docstring

The word "convertion" should be corrected to "conversion" in the docstring.

-      additional methods for ``max_month``, ``max_day``, and convertion ``to_gregorian``
+      additional methods for ``max_month``, ``max_day``, and conversion ``to_gregorian``

31-31: Formatting issue in docstring

The inline code markup for __all__ is incorrect. It should be enclosed within double backticks.

-    - Import your calendar in ``undate/converters/calendars/__init__.py`` and include in `__all__``
+    - Import your calendar in ``undate/converters/calendars/__init__.py`` and include in ``__all__``

119-120: Typo in docstring

Correct the phrases "available converters classes" to "available converter classes" and "calendar convert subclasses" to "calendar converter subclasses."

-            List of available converters classes. Includes calendar convert
-            subclasses.
+            List of available converter classes. Includes calendar converter subclasses.

126-127: Ensure robust removal of BaseCalendarConverter

To prevent a ValueError if BaseCalendarConverter is not in cls.__subclasses__(), check if it exists before attempting to remove it.

            subclasses = cls.__subclasses__()
-            subclasses.remove(BaseCalendarConverter)
+            if BaseCalendarConverter in subclasses:
+                subclasses.remove(BaseCalendarConverter)
src/undate/undate.py (2)

287-295: Simplify nested if statements

Combine nested if statements into a single condition to improve readability.

-            if looks_equal:
-                # if any part of either date that is known is _partially_ known,
-                # then these dates are not equal
-                if any(
-                    [self.is_partially_known(p) for p in self.initial_values.keys()]
-                ) or any(
-                    [other.is_partially_known(p) for p in other.initial_values.keys()]
-                ):
-                    return False
+            if looks_equal and (
+                any(self.is_partially_known(p) for p in self.initial_values) or
+                any(other.is_partially_known(p) for p in other.initial_values)
+            ):
+                return False
🧰 Tools
🪛 Ruff (0.8.0)

287-294: Use a single if statement instead of nested if statements

(SIM102)


291-291: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


293-293: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


291-293: Simplify dictionary iteration

When iterating over a dictionary's keys, you can omit the .keys() method.

-                    [self.is_partially_known(p) for p in self.initial_values.keys()]
+                    [self.is_partially_known(p) for p in self.initial_values]
-                    [other.is_partially_known(p) for p in other.initial_values.keys()]
+                    [other.is_partially_known(p) for p in other.initial_values]
🧰 Tools
🪛 Ruff (0.8.0)

291-291: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


293-293: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/undate/converters/calendars/hebrew/parser.py (1)

8-8: Correct grammatical error in comment

The comment has a grammatical error. It should read "can't be used due to ambiguity between years and dates."

-    # NOTE: LALR parser is faster but can't be used to ambiguity between years and dates
+    # NOTE: LALR parser is faster but can't be used due to ambiguity between years and dates
src/undate/converters/calendars/hebrew/hebrew.lark (1)

8-12: Document planned enhancements.

The TODOs for date ranges and qualifiers are important features that should be tracked.

Would you like me to create GitHub issues to track these enhancements? I can help draft detailed specifications for:

  1. Date range support
  2. Qualifier support (e.g., "first decade of", "first third of", seasons)
src/undate/converters/calendars/hijri/transformer.py (1)

6-9: Enhance HijriUndate documentation.

The docstring should provide more details about the class's purpose and usage.

Consider expanding the docstring:

-    """Undate convience subclass; sets default calendar to Hijri."""
+    """Undate convenience subclass that defaults to the Hijri calendar.
+
+    This class simplifies the creation of Hijri dates by automatically setting
+    the calendar type to Hijri. It inherits all functionality from Undate.
+
+    Example:
+        date = HijriUndate(year=1445, month=7, day=15)
+    """
src/undate/converters/calendars/hebrew/transformer.py (3)

7-7: Fix typo in class docstring

"convience" should be "convenience"

-    """Undate convience subclass; sets default calendar to Hebrew."""
+    """Undate convenience subclass; sets default calendar to Hebrew."""

25-27: Fix incorrect calendar reference in comment

The comment refers to "islamic" calendar instead of Hebrew calendar.

-        # initialize and return an undate with islamic year, month, day and
-        # islamic calendar
+        # initialize and return an undate with Hebrew year, month, day and
+        # Hebrew calendar

16-27: Consider adding input validation for date parts

The method should validate that the year, month, and day values are within valid ranges for the Hebrew calendar before creating the Undate object.

     def hebrew_date(self, items):
         parts = {}
         for child in items:
             if child.data in ["year", "month", "day"]:
                 # in each case we expect one integer value;
                 # anonymous tokens convert to their value and cast as int
                 value = int(child.children[0])
+                # Validate ranges
+                if child.data == "month" and not (1 <= value <= 13):  # Hebrew calendar can have 13 months
+                    raise ValueError(f"Invalid month value: {value}")
+                elif child.data == "day" and not (1 <= value <= 30):  # Max days in Hebrew month
+                    raise ValueError(f"Invalid day value: {value}")
                 parts[str(child.data)] = value
 
         return HebrewUndate(**parts)
tests/test_converters/test_calendars/test_hebrew/test_hebrew_parser.py (1)

9-36: Consider adding more edge cases to test cases

The test coverage is good but could be enhanced with additional edge cases:

  • Boundary cases for days (e.g., testing the last day of each month)
  • Leap year cases
  • Mixed Hebrew/English numerals
 testcases = [
     # existing cases...
+    # Boundary cases
+    "30 Tishri 5783",  # Last day of Tishri
+    "29 Heshvan 5783", # Last day of Heshvan
+    # Leap year
+    "30 Adar I 5784",  # Last day of Adar I in leap year
+    # Mixed numerals
+    "כ״ו Tammuz 4816", # Hebrew day with English month
 ]
src/undate/converters/calendars/gregorian.py (2)

8-9: Fix typo in class docstring

"Calendar onverter" should be "Calendar converter"

-    Calendar onverter class for Gregorian calendar.
+    Calendar converter class for Gregorian calendar.

26-42: Consider caching monthrange results for performance

The max_day method calls monthrange frequently. Consider caching results for common year/month combinations to improve performance.

+    from functools import lru_cache
+
+    @lru_cache(maxsize=128)
+    def _get_monthrange(self, year: int, month: int) -> tuple[int, int]:
+        return monthrange(year, month)

     def max_day(self, year: int, month: int) -> int:
         if month:
             year = year or self.NON_LEAP_YEAR
-            _, max_day = monthrange(year, month)
+            _, max_day = self._get_monthrange(year, month)
         else:
             max_day = 31
         return max_day
tests/test_converters/test_calendars/test_hebrew/test_hebrew_transformer.py (2)

15-29: Enhance test coverage with additional test cases

While the current test cases from Princeton Geniza Project are valuable:

  1. Consider adding edge cases (e.g., leap years, boundary dates)
  2. Add negative test cases for invalid dates
  3. The TODO comment about date ranges should be tracked

Would you like me to:

  1. Generate additional test cases for edge cases?
  2. Create a GitHub issue to track the date ranges support?

32-43: Strengthen test assertions for transformed dates

The current assertions verify equality and precision but could be enhanced:

  1. Add assertions for individual components (year, month, day)
  2. Verify the calendar property for all transformed dates

Example enhancement:

 def test_transform(date_string, expected, expected_precision):
     transformer = HebrewDateTransformer(visit_tokens=True)
     parsetree = hebrew_parser.parse(date_string)
     transformed_date = transformer.transform(parsetree)
     assert transformed_date == expected
+    # Verify individual components
+    if hasattr(transformed_date, 'year'):
+        assert transformed_date.year == expected.year
+    # Always verify calendar
+    assert transformed_date.calendar == Calendar.HEBREW
     if isinstance(transformed_date, Undate):
         assert transformed_date.precision == expected_precision
tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1)

38-49: Enhance test assertions for transformed dates.

The test function could be more robust with additional assertions:

 def test_transform(date_string, expected, expected_precision):
     transformer = HijriDateTransformer(visit_tokens=True)
     parsetree = hijri_parser.parse(date_string)
     transformed_date = transformer.transform(parsetree)
     assert transformed_date == expected
     if isinstance(transformed_date, Undate):
         assert transformed_date.precision == expected_precision
+        # Verify calendar type
+        assert transformed_date.calendar == Calendar.HIJRI
+        # Verify valid month range
+        if transformed_date.month is not None:
+            assert 1 <= int(transformed_date.month) <= 12
src/undate/converters/calendars/hijri/converter.py (1)

45-67: Document supported date formats and enhance error handling.

The parse method would benefit from better documentation and validation.

 def parse(self, value: str) -> Union[Undate, UndateInterval]:
     """
     Parse a Hijri date string and return an :class:`~undate.undate.Undate` or
     :class:`~undate.undate.UndateInterval`.
     The Hijri date string is preserved in the undate label.
+
+    Supported formats:
+    - Day Month Year: "7 Jumādā I 1243"
+    - Month Year: "Jumādā I 1243"
+    - Year only: "1243"
+
+    Args:
+        value: A string containing a Hijri date
+
+    Returns:
+        An Undate or UndateInterval object
+
+    Raises:
+        ValueError: If the string is empty or cannot be parsed
     """
src/undate/converters/iso8601.py (1)

80-106: Optimize date formatting logic.

The current implementation has repetitive try-except blocks and could be more efficient.

-            if undate.is_known(date_portion):
-                if date_portion == "year" and undate.year:
-                    try:
-                        date_parts.append("%04d" % int(undate.year))
-                    except ValueError:
-                        date_parts.append(undate.year)
-                elif date_portion == "month" and undate.month:
-                    try:
-                        date_parts.append("%02d" % int(undate.month))
-                    except ValueError:
-                        date_parts.append(undate.month)
-                elif date_portion == "day" and undate.day:
-                    try:
-                        date_parts.append("%02d" % int(undate.day))
-                    except ValueError:
-                        date_parts.append(undate.day)
+            def format_number(value: str, width: int) -> str:
+                try:
+                    return f"%0{width}d" % int(value)
+                except ValueError:
+                    return value
+
+            if undate.is_known(date_portion):
+                value = getattr(undate, date_portion)
+                if value:
+                    width = 4 if date_portion == "year" else 2
+                    date_parts.append(format_number(value, width))
tests/test_converters/test_calendars/test_hebrew/test_hebrew_converter.py (6)

10-34: Consider using pytest.mark.parametrize for better test organization.

The test cases are well-structured, but could be more maintainable using pytest's parametrize feature. This would make it easier to add more test cases and reduce code duplication.

Here's how you could refactor it:

@pytest.mark.parametrize("date_str, expected_date, expected_precision", [
    ("26 Tammuz 4816", HebrewUndate(4816, 4, 26), DatePrecision.DAY),  # July 17, 1056
    ("Ṭevet 5362", HebrewUndate(5362, 10), DatePrecision.MONTH),  # Teveth = month 10
    ("4932", HebrewUndate(4932), DatePrecision.YEAR),
])
def test_parse(self, date_str, expected_date, expected_precision):
    date = HebrewDateConverter().parse(date_str)
    assert date == expected_date
    assert date.calendar == Calendar.HEBREW
    assert date.precision == expected_precision
    assert date.label == f"{date_str} {HebrewDateConverter.calendar_name}"

36-61: Add test cases for leap years in both calendars.

While the current test cases cover basic scenarios well, consider adding test cases for:

  1. Hebrew leap years (שנה מעוברת)
  2. Gregorian leap years
  3. Years where both calendars have leap years

This would ensure robust handling of date conversions across calendar systems.

Would you like me to help generate test cases for these scenarios?


63-77: Add Hebrew calendar-specific error cases.

Consider adding test cases for Hebrew calendar-specific invalid inputs:

  • Invalid month names
  • Out-of-range days for specific months
  • Invalid year formats (e.g., negative years)
  • Hebrew numerals in wrong format

102-103: Fix typo in comment and clarify variable naming.

There's a typo in the comment "monthin" and the variable name last_month is used for two different purposes (semantic last month vs. maximum month number).

-        # of last semantic monthin the year
+        # of last semantic month in the year
         last_month = converter.max_month(year=1243)  # Consider renaming to max_month_num

139-140: Fix typo in comment and consider separating comparison tests.

There's a typo "c ontained" in the comment.

-        # so it falls within or is c ontained by July 1056
+        # so it falls within or is contained by July 1056

144-155: Consider separating sorting test into its own method.

The sorting test at the end of test_compare_across_calendars tests different functionality and would be clearer as a separate test method.

def test_mixed_calendar_sorting(self):
    sorted_dates = sorted([
        HebrewUndate(4816, 4, 26),  # 1056-07-17
        HebrewUndate(5416),  # 1655
        HebrewUndate(500),  # -3261
        Undate(1995),
        Undate(33),
        Undate(1350),
    ])
    expected_gregorian_years = [-3261, 33, 1056, 1350, 1655, 1995]
    assert [d.earliest.year for d in sorted_dates] == expected_gregorian_years
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 778c67b and 333e740.

📒 Files selected for processing (26)
  • .github/workflows/unit_tests.yml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/undate/converters/base.py (3 hunks)
  • src/undate/converters/calendars/__init__.py (1 hunks)
  • src/undate/converters/calendars/gregorian.py (1 hunks)
  • src/undate/converters/calendars/hebrew/__init__.py (1 hunks)
  • src/undate/converters/calendars/hebrew/converter.py (1 hunks)
  • src/undate/converters/calendars/hebrew/hebrew.lark (1 hunks)
  • src/undate/converters/calendars/hebrew/parser.py (1 hunks)
  • src/undate/converters/calendars/hebrew/transformer.py (1 hunks)
  • src/undate/converters/calendars/hijri/converter.py (1 hunks)
  • src/undate/converters/calendars/hijri/hijri.lark (1 hunks)
  • src/undate/converters/calendars/hijri/parser.py (1 hunks)
  • src/undate/converters/calendars/hijri/transformer.py (1 hunks)
  • src/undate/converters/edtf/edtf.lark (1 hunks)
  • src/undate/converters/iso8601.py (1 hunks)
  • src/undate/undate.py (11 hunks)
  • tests/test_converters/edtf/test_edtf_parser.py (2 hunks)
  • tests/test_converters/test_base.py (2 hunks)
  • tests/test_converters/test_calendars/test_hebrew/test_hebrew_converter.py (1 hunks)
  • tests/test_converters/test_calendars/test_hebrew/test_hebrew_parser.py (1 hunks)
  • tests/test_converters/test_calendars/test_hebrew/test_hebrew_transformer.py (1 hunks)
  • tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py (1 hunks)
  • tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (1 hunks)
  • tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (1 hunks)
  • tests/test_undate.py (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/undate/converters/calendars/hebrew/__init__.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/undate/converters/calendars/__init__.py
  • src/undate/converters/calendars/hijri/parser.py
  • pyproject.toml
  • src/undate/converters/edtf/edtf.lark
  • tests/test_converters/test_calendars/test_hijri/test_hijri_converter.py
  • src/undate/converters/calendars/hijri/hijri.lark
🧰 Additional context used
🪛 Ruff (0.8.0)
tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py

75-75: pytest.raises(Exception) should be considered evil

(B017)

src/undate/undate.py

192-192: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


287-294: Use a single if statement instead of nested if statements

(SIM102)


291-291: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


293-293: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (16)
tests/test_undate.py (2)

5-8: LGTM: Import changes are appropriate.

The new imports for BaseCalendarConverter and Calendar are correctly added to support the calendar functionality.


30-35: LGTM: Repr test updates correctly verify calendar information.

The test cases appropriately verify that the string representation includes the calendar type for both Gregorian and Hijri calendars.

src/undate/converters/calendars/hebrew/converter.py (1)

24-75: Implementation looks good

The implementation of the HebrewDateConverter class and its methods appears correct and follows best practices.

tests/test_converters/edtf/test_edtf_parser.py (2)

11-11: Addition of valid test case

The test case "1901-02-20" is a valuable addition to the test suite, enhancing coverage for valid date formats.


49-49: Addition of invalid test cases

Adding "1984-00" and "1984-01-00" to error_cases strengthens the parser's error handling for invalid dates.

src/undate/converters/calendars/hebrew/hebrew.lark (1)

49-49: Improve month pattern robustness.

The month patterns for Ṭeveth/Teveth and Adar variations could be more robust:

  1. The Ṭ/T pattern might miss other Unicode variants
  2. Adar patterns might not catch all valid representations

Let's check existing usage patterns:

Consider:

  1. Using Unicode character classes for Ṭ/T
  2. Adding more comprehensive Adar patterns if needed based on actual usage

Also applies to: 52-54

✅ Verification successful

Let me check the grammar rules for Adar in the Hebrew calendar parser to ensure we have complete coverage.


Let me try a case-insensitive search for the grammar file content.


Current month patterns are sufficient based on test coverage

Based on the grammar file and test cases:

  • The /[ṬT]eveth?/ pattern correctly matches both "Tevet" and "Teveth" variants seen in tests
  • Adar patterns /Adar( I)?/ and /Adar (II|Bet)/ successfully handle all test cases including:
    • "Adar"
    • "Adar I"
    • "Adar II"
    • "Adar Bet"

The existing patterns are robust enough to handle all documented use cases and test scenarios. No additional Unicode variants or representations are evidenced in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for month patterns in test files and documentation
rg -i "adar|tevet" tests/

Length of output: 1063


Script:

#!/bin/bash
# Check the grammar rules for Adar months
rg -A 2 "adar" src/undate/converters/calendars/hebrew/hebrew.lark

Length of output: 65


Script:

#!/bin/bash
# View the complete grammar file to analyze month patterns
cat src/undate/converters/calendars/hebrew/hebrew.lark

Length of output: 1301

src/undate/converters/calendars/hijri/transformer.py (2)

16-27: ⚠️ Potential issue

Add input validation for Hijri date components.

The hijri_date method needs validation for the input values.

This issue was previously identified. The validation is still needed to ensure values are within valid Hijri calendar ranges.
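
As an illustration of the kind of range check being requested, a minimal sketch might look like the following. The helper name `validate_hijri_parts` is hypothetical and not part of the PR; the bounds rely only on the Hijri calendar having 12 months of 29 or 30 days, so this is a coarse check that does not verify the exact length of a particular month.

```python
from typing import Optional


def validate_hijri_parts(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> None:
    """Raise ValueError if a known date part is outside the coarse Hijri range.

    Hypothetical helper: parts that are None (unknown) are skipped.
    """
    if year is not None and year < 1:
        raise ValueError(f"Invalid Hijri year: {year}")
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"Invalid Hijri month: {month}")
    # Hijri months have 29 or 30 days, so 30 is the upper bound for any month
    if day is not None and not 1 <= day <= 30:
        raise ValueError(f"Invalid Hijri day: {day}")


validate_hijri_parts(year=1243, month=5, day=7)  # passes silently
```

A stricter version could look up the maximum day for the specific year and month through the calendar converter rather than using a fixed upper bound.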


29-33: Decide on the commented year method implementation.

The commented-out year method needs a decision - either implement it or remove the comments.

Let's check if other calendar implementations have this method:

Either implement the method if needed for consistency with other calendar transformers or remove the commented code.

✅ Verification successful

Let me check other calendar transformers specifically to get a complete picture.


Let me check the implementation in the Hebrew calendar transformer to make a complete comparison.


Remove the commented year method implementation

The commented year method should be removed because:

  • Both Hebrew and Hijri calendar transformers have the same commented-out implementation
  • The EDTF transformer has a special implementation to handle negative years
  • The comment accurately explains that the year translation is not needed for these calendar transformers as it would be a no-op
  • Keeping commented code adds unnecessary noise to the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for year method implementations in other calendar transformers
ast-grep --pattern 'def year(self, items):
  $$$
'

Length of output: 746


Script:

#!/bin/bash
# Search for all calendar transformer files
fd transformer.py src/undate/converters/calendars/

Length of output: 159


Script:

#!/bin/bash
# Check the Hebrew calendar transformer implementation
cat src/undate/converters/calendars/hebrew/transformer.py

Length of output: 1444

tests/test_converters/test_base.py (1)

21-21: Verify the implementation of BaseDateConverter.subclasses()

The change from __subclasses__() to subclasses() suggests better encapsulation, but we should verify the implementation.

Also applies to: 63-63

✅ Verification successful

The change appears to be a wrapper around __subclasses__() with additional logic

The subclasses() method is properly implemented in src/undate/converters/base.py as a classmethod that:

  1. Ensures converters are imported via import_converters()
  2. Gets all direct subclasses using Python's built-in __subclasses__()
  3. Removes BaseCalendarConverter from the results

The change from __subclasses__() to subclasses() is valid and provides better encapsulation of the subclass retrieval logic.
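
The pattern described here can be sketched roughly as below. This is an assumption-laden illustration rather than the code in `src/undate/converters/base.py`: the class bodies are empty stand-ins, the converter import step is omitted, and `ISO8601Converter` and `HijriConverter` are placeholder names. It uses `set.discard` so that a missing intermediate base class does not raise, which also addresses the robustness concern raised about `list.remove`.

```python
class BaseDateConverter:
    """Hypothetical stand-in for the converter base class."""

    @classmethod
    def subclasses(cls) -> set:
        # direct subclasses of the base converter
        found = set(cls.__subclasses__())
        # drop the intermediate calendar base; discard() is a no-op if absent
        found.discard(BaseCalendarConverter)
        # include concrete calendar converters, which sit one level deeper
        found.update(BaseCalendarConverter.__subclasses__())
        return found


class BaseCalendarConverter(BaseDateConverter):
    """Hypothetical intermediate base for calendar converters."""


class ISO8601Converter(BaseDateConverter):
    """Placeholder format converter."""


class HijriConverter(BaseCalendarConverter):
    """Placeholder calendar converter."""


print(sorted(c.__name__ for c in BaseDateConverter.subclasses()))
# ['HijriConverter', 'ISO8601Converter']
```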

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of BaseDateConverter.subclasses()
# Look for the definition of subclasses() method
ast-grep --pattern 'class BaseDateConverter:
  $$$
  def subclasses($$$):
    $$$'

Length of output: 84


Script:

#!/bin/bash
# Let's try a broader search for the BaseDateConverter class and its methods
rg -A 10 "class BaseDateConverter" 

Length of output: 797


Script:

#!/bin/bash
# Let's also search for any method named subclasses
rg "def subclasses"

Length of output: 111


Script:

#!/bin/bash
# Search for files containing BaseDateConverter to understand the context
fd BaseDateConverter

Length of output: 22


Script:

#!/bin/bash
# Let's get the implementation of the subclasses method
rg -A 5 "def subclasses" src/undate/converters/base.py

Length of output: 280


Script:

#!/bin/bash
# Let's get more context of the implementation
rg -B 2 -A 10 "def subclasses" src/undate/converters/base.py

Length of output: 500

tests/test_converters/test_calendars/test_hebrew/test_hebrew_transformer.py (1)

11-12: LGTM! Basic calendar association test is clear and focused.

The test verifies the fundamental requirement that HebrewUndate correctly associates with the Hebrew calendar.

tests/test_converters/test_calendars/test_hijri/test_hijri_parser.py (2)

7-47: LGTM! Comprehensive test cases covering various formats

The test cases thoroughly cover:

  • Different month name variants
  • Dates with/without diacritics
  • Various numeric formats

50-52: Strengthen test assertions for parsed results

The current test only verifies that parsing succeeds. Consider validating the parsed components.

tests/test_converters/test_calendars/test_hijri/test_hijri_transformer.py (2)

11-12: LGTM! Good test for calendar property.

The test effectively verifies that HijriUndate correctly sets the calendar property to HIJRI.


15-35: 🛠️ Refactor suggestion

Add error test cases to complement the happy path.

While the test cases are well-documented with real-world examples, error cases are missing.

Add test cases for:

  • Invalid month names
  • Out-of-range days (e.g., 31st day in a 29-day month)
  • Malformed date strings
  • Invalid year formats
error_testcases = [
    ("0 Muḥarram 1243", ValueError, "Invalid day"),
    ("32 Muḥarram 1243", ValueError, "Invalid day"),
    ("7 InvalidMonth 1243", ValueError, "Invalid month"),
    ("abc", ValueError, "Invalid date format"),
]
tests/test_converters/test_calendars/test_hebrew/test_hebrew_converter.py (2)

1-9: LGTM! Clean imports and class structure.

The imports are well-organized and the class structure follows Python conventions.


1-155: Verify test coverage for edge cases.

The test suite is well-structured and comprehensive. To ensure complete coverage, consider running coverage analysis to identify any untested code paths in the Hebrew date converter implementation.

Comment on lines +192 to +193
raise ValueError(f"Calendar `{calendar}` is not supported")
self.calendar = calendar
Contributor

🛠️ Refactor suggestion

Chain exceptions for better traceback

When re-raising an exception with additional context, chain it using from err to preserve the original traceback.

-                    except KeyError:
-                        raise ValueError(f"Calendar `{calendar}` is not supported")
+                    except KeyError as err:
+                        raise ValueError(f"Calendar `{calendar}` is not supported") from err

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.0)

192-192: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
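
A small, self-contained sketch of the chaining pattern suggested above. The `CALENDARS` table and `resolve_calendar` function are hypothetical and exist only for illustration; the point is that `from err` stores the original `KeyError` on `__cause__`.

```python
# Hypothetical lookup table, for illustration only.
CALENDARS = {"gregorian": "Gregorian", "hijri": "Hijri"}


def resolve_calendar(name: str) -> str:
    try:
        return CALENDARS[name.lower()]
    except KeyError as err:
        # "from err" records the KeyError as __cause__, so the traceback
        # shows both the failed lookup and the ValueError raised here.
        raise ValueError(f"Calendar `{name}` is not supported") from err


try:
    resolve_calendar("julian")
except ValueError as exc:
    cause = exc.__cause__
```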

Comment on lines +11 to +12
branches:
- "**"
Contributor

@coderabbitai coderabbitai bot Dec 6, 2024

💡 Codebase verification

Testing against Python 3.13 alpha may cause unreliable CI results

The workflow is currently testing against Python 3.13, which is in alpha stage. The project has minimal version constraints (only for strenum < 3.11) but testing against pre-release Python versions can lead to:

  • False failures due to pre-release bugs
  • Unstable dependency behaviors with alpha Python releases
  • Increased CI maintenance overhead

Consider:

  • Moving Python 3.13 to an optional test matrix that's allowed to fail
  • Waiting until Python 3.13 reaches RC status before including it in the main test matrix

Example modification:

continue-on-error: ${{ matrix.experimental }}
strategy:
  matrix:
    python: ["3.9", "3.10", "3.11", "3.12"]
    experimental: [false]
    include:
      - python: "3.13"
        experimental: true
🔗 Analysis chain

Consider the stability implications of Python 3.13.

Python 3.13 is currently in alpha/beta stage. While testing against future Python versions is good practice, including it in the test matrix this early might lead to false failures due to pre-release issues.

Let's check the package compatibility:

Consider adding Python 3.13 to an optional test matrix or waiting until it reaches release candidate status.

Also applies to: 24-24

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if dependencies specify Python version constraints
rg "python_version" pyproject.toml

Length of output: 131


Script:

#!/bin/bash
# Let's check the Python versions specified in the test workflow
cat .github/workflows/unit_tests.yml

Length of output: 1373

Collaborator

I guess this is a GenAI artifact since its training cutoff date is prior to October 2024? Python 3.13 is not in alpha

Contributor

@ColeDCrawford, you're absolutely right—Python 3.13 is not in alpha as of December 2024. Including it in the CI test matrix is appropriate. Thank you for pointing out the oversight.


✏️ Learnings added
Learnt from: ColeDCrawford
PR: dh-tech/undate-python#107
File: .github/workflows/unit_tests.yml:11-12
Timestamp: 2024-12-06T16:47:25.988Z
Learning: As of December 2024, Python 3.13 is stable and should be included in CI test matrices without concerns about pre-release instability.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +31 to +32
// months have 29 or 30 days; we do not expect leading zeroes
day: /[1-9]/ | /[12][0-9]/ | /30/
Contributor

🛠️ Refactor suggestion

Enhance day pattern validation.

The current day pattern accepts any day from 1 to 30, so it allows some invalid dates (e.g., "30" for months with only 29 days).

Consider implementing month-specific day validation in the transformer:

-day: /[1-9]/ | /[12][0-9]/ | /30/
+day: /[1-9]/ | /[12][0-9]/ | /30/  // Validation in transformer

Committable suggestion skipped: line range outside the PR's diff.
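
One way the transformer-side check could look, as a hedged sketch. The PR itself delegates month lengths to `islamic.month_length` from convertdate; to stay self-contained, this example instead assumes the tabular (arithmetic) Islamic calendar rule, and the function names `is_leap_year`, `month_length`, and `validate_day` are hypothetical.

```python
def is_leap_year(year: int) -> bool:
    # Assumed tabular rule: 11 leap years in each 30-year cycle.
    return (11 * year + 14) % 30 < 11


def month_length(year: int, month: int) -> int:
    # Odd months have 30 days and even months 29; the twelfth month
    # gains a 30th day in leap years.
    if month % 2 == 1 or (month == 12 and is_leap_year(year)):
        return 30
    return 29


def validate_day(year: int, month: int, day: int) -> int:
    # Reject days the grammar accepts but the given month does not have.
    max_day = month_length(year, month)
    if not 1 <= day <= max_day:
        raise ValueError(
            f"Invalid day {day}: month {month} of {year} has {max_day} days"
        )
    return day
```

Keeping the grammar permissive (1 to 30) and validating after parsing keeps the parser simple, at the cost of reporting some errors later than parse time.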

Comment on lines 44 to 49
def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
    """Convert a Hijri date, specified by year, month, and day,
    to the Gregorian equivalent date. Returns a tuple of year, month, day.
    """

    return (year, month, day)
Contributor

@coderabbitai coderabbitai bot Dec 6, 2024

⚠️ Potential issue

Fix incorrect docstring and implementation of to_gregorian

The method's docstring suggests it converts from Hijri to Gregorian, but the implementation simply returns the input unchanged. Either:

  1. Remove the Hijri reference if this is intended as a pass-through method, or
  2. Implement actual Hijri to Gregorian conversion if that's the intention
-    def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
-        """Convert a Hijri date, specified by year, month, and day,
-        to the Gregorian equivalent date. Returns a tuple of year, month, day.
-        """
+    def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
+        """Return the input date as-is since it's already in Gregorian format.
+        This method exists to provide a consistent interface with other calendar converters.
+        """

Committable suggestion skipped: line range outside the PR's diff.

Collaborator

This seems to be a correct catch

Contributor

@ColeDCrawford, glad to help. To address this issue, you can update the to_gregorian method's docstring to accurately reflect its functionality since it returns the input date unchanged. Here's the corrected docstring:

def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
    """Return the input date as-is since it's already in Gregorian format.
    This method provides a consistent interface with other calendar converters.
    """
    return (year, month, day)

Alternatively, if the method is intended to perform a conversion from another calendar to the Gregorian calendar, you'll need to implement the appropriate conversion logic within the method.

Let me know if you need any further assistance.

Comment on lines +73 to +76
@pytest.mark.parametrize("date_string", error_cases)
def test_should_error(date_string):
    with pytest.raises(Exception):
        hijri_parser.parse(date_string)
Contributor

🛠️ Refactor suggestion

Use specific exceptions instead of generic Exception

The generic Exception catch is too broad and was flagged by static analysis (B017).

 @pytest.mark.parametrize("date_string", error_cases)
 def test_should_error(date_string):
-    with pytest.raises(Exception):
+    with pytest.raises((ValueError, lark.exceptions.LarkError)):
         hijri_parser.parse(date_string)
🧰 Tools
🪛 Ruff (0.8.0)

75-75: pytest.raises(Exception) should be considered evil

(B017)

Comment on lines +27 to +29
def max_day(self, year: int, month: int) -> int:
    """maximum numeric day for the specified year and month in this calendar"""
    return islamic.month_length(year, month)
Contributor

🛠️ Refactor suggestion

Add input validation for year and month parameters.

The max_day method should validate its inputs before calling islamic.month_length.

 def max_day(self, year: int, month: int) -> int:
+    if not isinstance(year, int) or year < 1:
+        raise ValueError(f"Invalid year: {year}")
+    if not isinstance(month, int) or month < 1 or month > 12:
+        raise ValueError(f"Invalid month: {month}")
     return islamic.month_length(year, month)

Comment on lines +39 to +43
def to_gregorian(self, year: int, month: int, day: int) -> tuple[int, int, int]:
    """Convert a Hijri date, specified by year, month, and day,
    to the Gregorian equivalent date. Returns a tuple of year, month, day.
    """
    return islamic.to_gregorian(year, month, day)
Contributor

🛠️ Refactor suggestion

Add error handling for conversion failures.

The to_gregorian method should handle potential errors from the convertdate library.

 def to_gregorian(self, year: int, month: int, day: int) -> tuple[int, int, int]:
+    try:
+        if not all(isinstance(x, int) for x in (year, month, day)):
+            raise ValueError("Year, month, and day must be integers")
+        if month < 1 or month > 12:
+            raise ValueError(f"Invalid month: {month}")
+        max_days = self.max_day(year, month)
+        if day < 1 or day > max_days:
+            raise ValueError(f"Invalid day: {day} for month {month}")
         return islamic.to_gregorian(year, month, day)
+    except Exception as e:
+        raise ValueError(f"Failed to convert date: {year}-{month}-{day}") from e

Committable suggestion skipped: line range outside the PR's diff.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/undate/converters/calendars/gregorian.py (2)

8-8: Fix typo in class docstring

There's a typo in the class docstring: "onverter" should be "converter".

-    Calendar onverter class for Gregorian calendar.
+    Calendar converter class for Gregorian calendar.

11-13: Consider consolidating redundant name attributes

The class defines both name and calendar_name with identical values. Consider using just one attribute to avoid redundancy.

-    #: converter name: Gregorian
-    name: str = "Gregorian"
-    calendar_name: str = "Gregorian"
+    #: calendar name
+    calendar_name: str = "Gregorian"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 333e740 and b7ae594.

📒 Files selected for processing (1)
  • src/undate/converters/calendars/gregorian.py (1 hunks)
🔇 Additional comments (2)
src/undate/converters/calendars/gregorian.py (2)

18-43: LGTM! Well-implemented calendar methods

The calendar methods are well-implemented with proper handling of edge cases:

  • Uses a non-leap year constant when year is unknown
  • Properly calculates maximum days using Python's monthrange
  • Handles unknown year/month scenarios appropriately

44-50: LGTM! Clear implementation and documentation

The to_gregorian method is now well-documented, clearly explaining its pass-through nature and purpose in the converter interface. This addresses the concerns raised in previous reviews.

codecov bot commented Dec 7, 2024

Codecov Report

Attention: Patch coverage is 94.06393% with 13 lines in your changes missing coverage. Please review.

Project coverage is 96.50%. Comparing base (3531bb9) to head (759d0c7).
Report is 29 commits behind head on develop.

Files with missing lines Patch % Lines
src/undate/converters/iso8601.py 60.00% 6 Missing ⚠️
src/undate/converters/base.py 80.95% 4 Missing ⚠️
src/undate/undate.py 95.91% 2 Missing ⚠️
src/undate/converters/calendars/gregorian.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #107      +/-   ##
===========================================
- Coverage    97.72%   96.50%   -1.22%     
===========================================
  Files           10       20      +10     
  Lines          483      658     +175     
===========================================
+ Hits           472      635     +163     
- Misses          11       23      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
src/undate/converters/calendars/gregorian.py (2)

7-9: Enhance class documentation

Consider expanding the class documentation to describe its role in the calendar conversion system and its relationship with other converters.

 """
-    Calendar converter class for Gregorian calendar.
+    Calendar converter class for Gregorian calendar that provides calendar-specific
+    functionality and serves as the base calendar for date comparisons across
+    different calendar systems. All other calendar converters convert their dates
+    to Gregorian format for standardized comparison.
 """

11-14: Consider consolidating redundant name attributes

The class has both name and calendar_name with identical values. Consider using a single attribute unless there's a specific need for this distinction.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b7ae594 and 759d0c7.

📒 Files selected for processing (1)
  • src/undate/converters/calendars/gregorian.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/undate/converters/calendars/gregorian.py

[warning] 41-41: src/undate/converters/calendars/gregorian.py#L41
Added line #L41 was not covered by tests

🔇 Additional comments (1)
src/undate/converters/calendars/gregorian.py (1)

45-51: LGTM! The implementation and documentation are correct

The method correctly implements the identity function for Gregorian dates, and the docstring properly explains its purpose in the calendar conversion system.

Comment on lines +27 to +44
def max_day(self, year: int, month: int) -> int:
    """maximum numeric day for the specified year and month in this calendar"""
    # if month is known, use that to calculate
    if month:
        # if year is known, use it; otherwise use a known non-leap year
        # (only matters for February)
        year = year or self.NON_LEAP_YEAR

        # Use monthrange from python builtin calendar module.
        # returns first day of the month and number of days in the month
        # for the specified year and month.
        _, max_day = monthrange(year, month)
    else:
        # if year and month are unknown, return maximum possible
        max_day = 31

    return max_day

Contributor

💡 Codebase verification

Missing test coverage for Gregorian calendar implementation

The codebase lacks dedicated test files for the Gregorian calendar implementation. While other calendars (Hebrew, Hijri) have comprehensive test suites, there are no tests specifically covering the Gregorian calendar's max_day method and its edge cases.

  • Create test file at tests/test_converters/test_calendars/test_gregorian/test_gregorian_converter.py
  • Add test cases for:
    • Regular month/year combinations
    • February in leap and non-leap years
    • Edge case where month is unknown (should return 31)
    • Edge case where year is unknown but month is known
    • Edge case where both year and month are unknown
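
The cases listed above could be exercised roughly as follows. This is a sketch under stated assumptions: `max_day` here is a standalone function mirroring the method quoted in this comment rather than the project's class, `NON_LEAP_YEAR = 2022` is an assumed value, and unknown values are passed as `None`.

```python
from calendar import monthrange
from typing import Optional

NON_LEAP_YEAR = 2022  # assumed value; any non-leap year would do


def max_day(year: Optional[int], month: Optional[int]) -> int:
    """Standalone version of the quoted method, for illustration."""
    if month:
        # Fall back to a non-leap year when the year is unknown
        # (only matters for February).
        year = year or NON_LEAP_YEAR
        _, last_day = monthrange(year, month)
        return last_day
    # Month unknown: return the maximum possible day.
    return 31


def test_max_day():
    assert max_day(2024, 1) == 31      # regular month/year
    assert max_day(2024, 2) == 29      # February, leap year
    assert max_day(2023, 2) == 28      # February, non-leap year
    assert max_day(None, 2) == 28      # year unknown, month known
    assert max_day(2024, None) == 31   # month unknown
    assert max_day(None, None) == 31   # both unknown


test_max_day()
```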
🔗 Analysis chain

Add test coverage for max_day edge cases

The fallback case where both year and month are unknown (line 41) is not covered by tests.

Let's verify the test coverage:

Would you like me to help generate test cases for this scenario?

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tests of max_day method
rg -A 5 "def test.*max_day" tests/

Length of output: 36


Script:

#!/bin/bash
# Let's first check if the tests directory exists and find test files
fd "test.*\.py$" tests/

# Then let's search for any test files related to gregorian calendar
fd ".*gregorian.*\.py$" tests/

# Finally, let's look for any test class or function containing 'max_day' or 'maxday' case-insensitive
rg -i "max.?day" tests/

Length of output: 1040


Script:

#!/bin/bash
# Let's check if there are any test files specifically for gregorian calendar
fd "test_gregorian.*\.py$" tests/

# Let's also check the test_base.py as it might contain base calendar tests
cat tests/test_converters/test_base.py

# Let's also look for any calendar test base classes that might contain shared tests
rg -l "class.*TestCase" tests/test_converters/

Length of output: 2391

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 41-41: src/undate/converters/calendars/gregorian.py#L41
Added line #L41 was not covered by tests

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
README.md (4)

143-143: Consider being more explicit about supported calendars

For better clarity, consider listing the specific supported calendars in this line.

-Currently available converters are "ISO8601" and "EDTF" and supported calendars.
+Currently available converters are "ISO8601" and "EDTF", with support for Gregorian, Hijri Islamic, and Anno Mundi Hebrew calendars.

161-163: Enhance calendar documentation with limitations and references

The calendar section provides good overview but could be more comprehensive.

Consider adding:

  1. Any limitations or constraints of calendar support (e.g., supported date ranges)
  2. A direct link to convertdate documentation for each calendar
  3. Information about how precision is handled differently in each calendar
-All `Undate` objects are calendar aware, and date converters include support for parsing and working with dates from other calendars. The Gregorian calendar is used by default; currently `undate` supports the Hijri Islamic calendar and the Anno Mundi Hebrew calendar based on calendar convertion logic implemented in the [convertdate](https://convertdate.readthedocs.io/en/latest/)package. 
+All `Undate` objects are calendar aware, and date converters include support for parsing and working with dates from other calendars. The Gregorian calendar is used by default; currently `undate` supports:
+
+- Hijri Islamic calendar ([convertdate docs](https://convertdate.readthedocs.io/en/latest/modules.html#islamic))
+- Anno Mundi Hebrew calendar ([convertdate docs](https://convertdate.readthedocs.io/en/latest/modules.html#hebrew))
+
+Calendar conversion is implemented using the [convertdate](https://convertdate.readthedocs.io/en/latest/) package. Each calendar supports different date ranges and has specific handling for precision.

165-182: Consider adding more comprehensive calendar examples

The examples effectively demonstrate basic calendar functionality.

Consider adding examples for:

  1. Date intervals across different calendars
  2. Error cases (e.g., dates outside supported ranges)
  3. Converting between calendars
# Example additions:
>>> # Date intervals across calendars
>>> interval = UndateInterval(rajab495, y2k)
>>> interval
<UndateInterval 'Rajab 495 Hijrī/2001' (0495-07/2001)>

>>> # Error handling
>>> try:
...     Undate.parse("Invalid Hijri", "Hijri")
... except ValueError as e:
...     print(f"Error: {e}")

>>> # Converting between calendars
>>> rajab495.to_calendar("Gregorian")
<Undate '1102-04' (Gregorian)>

184-184: Use consistent horizontal rule style

For consistency with other horizontal rules in the document, use dashes instead of asterisks.

-* * * 
+---
🧰 Tools
🪛 Markdownlint (0.37.0)

184-184: Expected: ---; Actual: * * *
Horizontal rule style

(MD035, hr-style)

docs/undate/converters.rst (1)

Line range hint 11-32: Consider adding format descriptions

While the structure is good, consider adding brief descriptions for each format (ISO8601 and EDTF) before their respective module references. This would help users understand the purpose and use cases of each format before diving into the API details.

Example addition:

 ISO8601
 ^^^^^^^
+
+ISO8601 is an international standard for date and time representation,
+providing a widely used format for date exchanges.
+

 Extended Date-Time Format (EDTF)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+EDTF extends ISO8601 to support additional date scenarios including
+uncertain dates, approximate dates, and date ranges.
+
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 759d0c7 and d9fd4ba.

📒 Files selected for processing (2)
  • README.md (2 hunks)
  • docs/undate/converters.rst (2 hunks)
🧰 Additional context used
🪛 Markdownlint (0.37.0)
README.md

184-184: Expected: ---; Actual: * * *
Horizontal rule style

(MD035, hr-style)

🔇 Additional comments (2)
docs/undate/converters.rst (2)

4-6: LGTM: Well-structured overview section

The overview section is properly formatted and provides a logical organization for the documentation.


39-52: Verify module paths for calendar converters

Let's ensure all the referenced module paths exist in the codebase.

✅ Verification successful

Let me try a different approach to verify the module paths.


Let me check one final thing to verify the Gregorian calendar module specifically.


Documentation module paths are correct

The module paths in the documentation match the actual codebase structure:

  • undate.converters.calendars.gregorian is correct (maps to gregorian.py)
  • undate.converters.calendars.hijri.converter is correct (maps to hijri/converter.py)
  • undate.converters.calendars.hebrew.converter is correct (maps to hebrew/converter.py)

All modules exist but are located under the src/ directory, which is the correct Python package structure. The documentation correctly references the import paths rather than the file system paths.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all referenced module paths exist

# Check for the existence of the calendar converter modules
for module in "gregorian" "hijri/converter" "hebrew/converter"; do
  if ! fd -t f "^.*\.py$" "undate/converters/calendars/$module"; then
    echo "Warning: Module path not found: undate/converters/calendars/$module"
  fi
done

Length of output: 1245


Script:

#!/bin/bash
# Let's first check if the base package directory exists and list its structure
fd -t d "undate"

# Then list all Python files recursively under any found undate directory
fd -t f ".*\.py$" -d 4 undate/

# As a backup, let's also check with ripgrep for any imports of these modules
rg -l "from.*undate\.converters\.calendars\.(gregorian|hijri|hebrew)"

Length of output: 925


Script:

#!/bin/bash
# Check for Gregorian calendar module
fd -t f ".*\.py$" src/undate/converters/calendars/

# Also search for any Gregorian-related imports or definitions
rg -l "gregorian" src/undate/converters/calendars/

Length of output: 792

Comment on lines +33 to +52
Calendars
---------

Gregorian
^^^^^^^^^

.. automodule:: undate.converters.calendars.gregorian
   :members:

Hijri (Islamic calendar)
^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: undate.converters.calendars.hijri.converter
   :members:

Anno Mundi (Hebrew calendar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: undate.converters.calendars.hebrew.converter
   :members:
Contributor

🛠️ Refactor suggestion

Enhance calendar documentation based on PR discussions

Based on the PR objectives and discussions about calendar representation, consider adding documentation that addresses:

  1. How different calendars handle precision (year, month, day)
  2. The relationship between calendar types and Undate/UndateInterval
  3. How calendar conversions work between different systems

Example addition:

 Calendars
 ---------
+
+Calendar support in undate handles different dating systems while preserving
+precision and maintaining consistency in date representations. Each calendar
+type is represented as a property of the Undate class, allowing for:
+
+- Explicit calendar type specification
+- Precision preservation (year, month, day)
+- Consistent date comparisons across calendar systems
+

 Gregorian
 ^^^^^^^^^

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/test_converters/test_base.py (1)

67-76: Add class docstring to explain the purpose

The test implementation looks good and provides thorough coverage of the base calendar converter's abstract methods. Consider adding a class docstring to explain its purpose and relationship to the calendar conversion functionality.

Add this docstring:

 class TestBaseCalendarConverter:
+    """Test cases for BaseCalendarConverter abstract class.
+    
+    Ensures that all abstract methods raise NotImplementedError when not implemented
+    by subclasses. These methods must be implemented by specific calendar converters
+    like HijriDateConverter.
+    """
     def test_not_implemented(self):
src/undate/converters/edtf/transformer.py (1)

62-64: Consider reducing code duplication with month_unspecified.

Both month_unspecified and day_unspecified share identical implementation. Consider extracting this logic into a private helper method.

+    def _unspecified_component(self, items, component_type):
+        # combine multiple parts into a single string
+        value = "".join(self.get_values(items))
+        return Tree(data=component_type, children=[value])
+
     def month_unspecified(self, items):
-        # combine multiple parts into a single string
-        value = "".join(self.get_values(items))
-        return Tree(data="month", children=[value])
+        return self._unspecified_component(items, "month")

     def day_unspecified(self, items):
-        # combine multiple parts into a single string
-        value = "".join(self.get_values(items))
-        return Tree(data="day", children=[value])
+        return self._unspecified_component(items, "day")
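To check that the suggested refactor preserves behavior, it can be run in isolation. This is a sketch under assumptions: `Tree` here is a minimal stand-in for `lark.Tree`, and `get_values` is assumed to return the string form of each parsed item, which the diff does not show.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Minimal stand-in for lark.Tree, just enough for this sketch."""

    data: str
    children: list = field(default_factory=list)


class TransformerSketch:
    """Hypothetical transformer showing only the two refactored methods."""

    def get_values(self, items):
        # assumed behavior: stringify each parsed item
        return [str(item) for item in items]

    def _unspecified_component(self, items, component_type):
        # combine multiple parts into a single string
        value = "".join(self.get_values(items))
        return Tree(data=component_type, children=[value])

    def month_unspecified(self, items):
        return self._unspecified_component(items, "month")

    def day_unspecified(self, items):
        return self._unspecified_component(items, "day")
```

Both public methods keep their names, so a grammar that dispatches on rule names would not need to change.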
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9fd4ba and 4372b23.

📒 Files selected for processing (2)
  • src/undate/converters/edtf/transformer.py (1 hunks)
  • tests/test_converters/test_base.py (3 hunks)
🔇 Additional comments (4)
tests/test_converters/test_base.py (2)

4-4: LGTM!

The import of BaseCalendarConverter is appropriately placed and necessary for the new test class.


21-21: Verify the base class method change impact

The change from __subclasses__() to subclasses() suggests a refactor in the base class.

Let's verify the implementation and usage:

Also applies to: 63-63
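The review does not show what `subclasses()` does. One common reason to add such a method is that the built-in `__subclasses__()` returns direct subclasses only, so an intermediate class such as a shared calendar base would hide its children. A minimal sketch of a recursive version, with hypothetical class names:

```python
class BaseConverterSketch:
    """Hypothetical base class; not the undate implementation."""

    @classmethod
    def subclasses(cls) -> set[type]:
        """Collect direct and indirect subclasses."""
        found = set()
        for sub in cls.__subclasses__():
            found.add(sub)
            # recurse so grandchildren are included
            found.update(sub.subclasses())
        return found


class CalendarConverterSketch(BaseConverterSketch):
    """Intermediate base class for calendar converters."""


class HijriConverterSketch(CalendarConverterSketch):
    """Concrete converter two levels below the base."""
```

With these classes, `BaseConverterSketch.__subclasses__()` lists only `CalendarConverterSketch`, while `BaseConverterSketch.subclasses()` also includes `HijriConverterSketch`.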

src/undate/converters/edtf/transformer.py (2)

57-59: LGTM! Clear and accurate documentation.

The added comment clearly explains the method's purpose of combining multiple parts into a single string.


69-69: Verify default transformation behavior for years.

Since the explicit year method was removed in favor of default transformation, please verify that:

  1. Negative years are correctly handled
  2. The behavior matches the EDTF specification

Let's check for test cases that verify year handling:

✅ Verification successful

Based on the search results, I can see comprehensive test coverage for year handling in the EDTF transformer, including negative years. The tests in test_edtf_transformer.py and the example notebook edtf-support.ipynb demonstrate that:

  1. Negative years are correctly handled:
    • Test case "-1985" verifies basic negative year support
    • Test case "Y-170000002" verifies extended negative year support
  2. The behavior matches the EDTF specification:
    • Level 0 supports basic year format
    • Level 1 adds support for negative years and extended year format with 'Y' prefix
    • All test cases pass with the default transformation

Default transformation correctly handles year values including negative years.

The removal of the explicit year method in favor of default transformation is verified to be working correctly through extensive test coverage.
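For reference, the two year forms cited above can be illustrated with a simplified parser. This is a sketch only: it uses a regular expression rather than the project's lark grammar, and the function name `parse_edtf_year` is hypothetical.

```python
import re

# Four-digit years, optionally negative, or a "Y" prefix followed by an
# integer of five or more digits. A simplified pattern for illustration.
_YEAR = re.compile(r"Y(?P<long>-?\d{5,})|(?P<short>-?\d{4})")


def parse_edtf_year(text: str) -> int:
    """Convert an EDTF-style year string to an integer."""
    match = _YEAR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported year: {text!r}")
    return int(match.group("long") or match.group("short"))
```

Under this pattern, `parse_edtf_year("-1985")` and `parse_edtf_year("Y-170000002")` both yield negative integers, while a two-digit string such as `"85"` is rejected.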

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for test cases that verify year transformation behavior

# Search for year-related test cases
rg -A 5 "def test.*year" test/
# Search for negative year test cases specifically
rg -A 5 "[-]?\d{1,4}[-]" test/

Length of output: 157


Script:

#!/bin/bash
# Let's first locate the test files using fd
fd -e py "test" .

# Then search for year-related test cases in all Python files
rg -A 5 "def test.*year" .

# Search for test assertions or examples with negative years
rg -A 5 "[-]\d{1,4}" .

# Look for EDTF date parsing tests
rg -A 5 "edtf.*test" .

Length of output: 83843

@rlskoeser rlskoeser merged commit a33e43b into develop Dec 20, 2024
12 checks passed
@rlskoeser rlskoeser deleted the feature/convert-hijri branch December 20, 2024 20:50

3 participants