Merge pull request #107 from dh-tech/feature/convert-hijri
Add support for converting from Hijri calendar to undate and undate interval
rlskoeser authored Dec 20, 2024
2 parents 57f8f66 + 4372b23 commit a33e43b
Showing 31 changed files with 1,265 additions and 118 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/unit_tests.yml
@@ -8,6 +8,8 @@ on:
      - 'undate/**'
      - 'tests/**'
  pull_request:
    branches:
      - "**"

env:
  # python version used to calculate and submit code coverage
29 changes: 28 additions & 1 deletion README.md
@@ -140,7 +140,7 @@ An `UndateInterval` is a date range between two `Undate` objects. Intervals can
```

You can initialize `Undate` or `UndateInterval` objects by parsing a date string with a specific converter, and you can also output an `Undate` object in those formats.
-Available converters are "ISO8601" and "EDTF" (but only)
+Currently available converters are "ISO8601" and "EDTF", plus the supported calendars.

```python
>>> from undate import Undate
@@ -156,6 +156,33 @@ Available converters are "ISO8601" and "EDTF" (but only)
<UndateInterval 1800/1900>
```

### Calendars

All `Undate` objects are calendar aware, and date converters include support for parsing and working with dates from other calendars. The Gregorian calendar is used by default; currently `undate` supports the Hijri Islamic calendar and the Anno Mundi Hebrew calendar, based on calendar conversion logic implemented in the [convertdate](https://convertdate.readthedocs.io/en/latest/) package.

Dates are stored with the year, month, day and appropriate precision for the original calendar; internally, earliest and latest dates are calculated in the Gregorian / Proleptic Gregorian calendar for standardized comparison across dates from different calendars.

```python
>>> from undate import Undate
>>> tammuz4816 = Undate.parse("26 Tammuz 4816", "Hebrew")
>>> tammuz4816
<Undate '26 Tammuz 4816 Anno Mundi' 4816-04-26 (Hebrew)>
>>> rajab495 = Undate.parse("Rajab 495", "Hijri")
>>> rajab495
<Undate 'Rajab 495 Hijrī' 0495-07 (Hijri)>
>>> y2k = Undate.parse("2001", "EDTF")
>>> y2k
<Undate 2001 (Gregorian)>
>>> [str(d.earliest) for d in [rajab495, tammuz4816, y2k]]
['1102-04-28', '1056-07-17', '2001-01-01']
>>> [str(d.precision) for d in [rajab495, tammuz4816, y2k]]
['MONTH', 'DAY', 'YEAR']
>>> sorted([rajab495, tammuz4816, y2k])
[<Undate '26 Tammuz 4816 Anno Mundi' 4816-04-26 (Hebrew)>, <Undate 'Rajab 495 Hijrī' 0495-07 (Hijri)>, <Undate 2001 (Gregorian)>]
```
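
The cross-calendar comparison described above can be sketched without `undate` itself. This is only an illustration under stated assumptions, not the library's implementation: each date is reduced to its earliest Gregorian equivalent (values copied from the example output above), and the `earliest_gregorian` dictionary is a hypothetical stand-in for what `undate` calculates internally.

```python
from datetime import date

# Earliest Gregorian equivalents copied from the README example output above.
# This dictionary is a hypothetical stand-in for undate's internal values.
earliest_gregorian = {
    "Rajab 495 Hijrī": date(1102, 4, 28),
    "26 Tammuz 4816 Anno Mundi": date(1056, 7, 17),
    "2001": date(2001, 1, 1),
}

# Sorting on the Gregorian date gives one ordering across all calendars.
ordered_labels = sorted(earliest_gregorian, key=earliest_gregorian.get)
print(ordered_labels)
```

Because every date is compared on the same Gregorian timeline, the Hebrew date sorts first, matching the `sorted()` result shown above.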

* * *

For more examples, refer to the [example notebooks](https://github.com/dh-tech/undate-python/tree/main/examples/notebooks/) included in this repository.

## Documentation
35 changes: 29 additions & 6 deletions docs/undate/converters.rst
@@ -1,19 +1,25 @@
Converters
==========

Overview
--------

.. automodule:: undate.converters.base
   :members:
   :undoc-members:

Formats
--------

ISO8601
-------
^^^^^^^

.. automodule:: undate.converters.iso8601
   :members:
   :undoc-members:

Extended Date-Time Format (EDTF)
--------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: undate.converters.edtf.converter
   :members:
@@ -23,8 +29,25 @@ Extended Date-Time Format (EDTF)
   :members:
   :undoc-members:

.. transformer is more of an internal, probably doesn't make sense to include
.. .. automodule:: undate.converters.edtf.transformer
.. :members:
.. :undoc-members:

Calendars
---------

Gregorian
^^^^^^^^^

.. automodule:: undate.converters.calendars.gregorian
   :members:

Hijri (Islamic calendar)
^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: undate.converters.calendars.hijri.converter
   :members:

Anno Mundi (Hebrew calendar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: undate.converters.calendars.hebrew.converter
   :members:

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -9,7 +9,7 @@ readme = "README.md"
license = { text = "Apache-2" }
requires-python = ">= 3.9"
dynamic = ["version"]
-dependencies = ["lark", "numpy"]
+dependencies = ["lark[interegular]", "numpy", "convertdate", "strenum; python_version < '3.11'"]
authors = [
{ name = "Rebecca Sutton Koeser" },
{ name = "Cole Crawford" },
79 changes: 74 additions & 5 deletions src/undate/converters/base.py
@@ -1,10 +1,11 @@
 """
-:class:`undate.converters.BaseDateConverter` provides a base class for
+:class:`~undate.converters.BaseDateConverter` provides a base class for
 implementing date converters, which can provide support for
-parsing and generating dates in different formats and also converting
-dates between different calendars.
+parsing and generating dates in different formats.
+The converter subclass :class:`undate.converters.BaseCalendarConverter`
+provides additional functionality needed for calendar conversion.
-To add support for a new date format or calendar conversion:
+To add support for a new date converter:
 - Create a new file under ``undate/converters/``
 - For converters with sufficient complexity, you may want to create a submodule;
@@ -18,6 +19,26 @@
The new subclass should be loaded automatically and included in the converters
returned by :meth:`BaseDateConverter.available_converters`
To add support for a new calendar converter:
- Create a new file under ``undate/converters/calendars/``
- For converters with sufficient complexity, you may want to create a submodule;
see ``undate.converters.calendars.hijri`` for an example.
- Extend ``BaseCalendarConverter`` and implement ``parse`` and ``to_string``
formatter methods as desired/appropriate for your converter as well as the
additional methods for ``max_month``, ``max_day``, and conversion ``to_gregorian``
calendar.
- Import your calendar in ``undate/converters/calendars/__init__.py`` and include in ``__all__``
- Add unit tests for the new calendar logic under ``tests/test_converters/calendars/``
- Add the new calendar to the ``Calendar`` enum of supported calendars in
``undate/undate.py`` and confirm that the `get_converter` method loads your
calendar converter correctly (an existing unit test should cover this).
- Consider creating a notebook to demonstrate the use of the calendar
converter.
Calendar converter subclasses are also automatically loaded and included
in the list of available converters.
-------------------
"""

@@ -90,6 +111,54 @@ def available_converters(cls) -> Dict[str, Type["BaseDateConverter"]]:
         """
         Dictionary of available converters keyed on name.
         """
+        return {c.name: c for c in cls.subclasses()}  # type: ignore
+
+    @classmethod
+    def subclasses(cls) -> list[Type["BaseDateConverter"]]:
+        """
+        List of available converter classes. Includes calendar converter
+        subclasses.
+        """
         # ensure undate converters are imported
         cls.import_converters()
-        return {c.name: c for c in cls.__subclasses__()}  # type: ignore
+
+        # find all direct subclasses, excluding base calendar converter
+        subclasses = cls.__subclasses__()
+        subclasses.remove(BaseCalendarConverter)
+        # add all subclasses of calendar converter base class
+        subclasses.extend(BaseCalendarConverter.__subclasses__())
+        return subclasses


+class BaseCalendarConverter(BaseDateConverter):
+    """Base class for calendar converters, with additional methods required
+    for calendars."""
+
+    #: Converter name. Subclasses must define a unique name.
+    name: str = "Base Calendar Converter"
+
+    def min_month(self) -> int:
+        """Smallest numeric month for this calendar."""
+        raise NotImplementedError
+
+    def max_month(self, year: int) -> int:
+        """Maximum numeric month for this calendar"""
+        raise NotImplementedError
+
+    def first_month(self) -> int:
+        """first month in this calendar; by default, returns :meth:`min_month`."""
+        return self.min_month()
+
+    def last_month(self, year: int) -> int:
+        """last month in this calendar; by default, returns :meth:`max_month`."""
+        return self.max_month(year)
+
+    def max_day(self, year: int, month: int) -> int:
+        """maximum numeric day for the specified year and month in this calendar"""
+        raise NotImplementedError
+
+    def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
+        """Convert a date for this calendar specified by numeric year, month, and day,
+        into the Gregorian equivalent date. Should return a tuple of year, month, day.
+        """
+        raise NotImplementedError
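
Reviewer note: the reason `subclasses()` removes the calendar base class and then extends the list is that `__subclasses__()` only returns direct children. A minimal standalone sketch of that pattern follows; the class names (`BaseConverter`, `BaseCalendar`, `Iso`, `Gregorian`, `Hijri`) are hypothetical stand-ins rather than the real `undate` classes, and the import step is omitted.

```python
class BaseConverter:
    """Hypothetical stand-in for the base date converter."""

    name = "Base"

    @classmethod
    def subclasses(cls):
        # __subclasses__() returns a new list of direct children only,
        # so it contains BaseCalendar but not the calendars that extend it.
        found = cls.__subclasses__()
        # Drop the intermediate base class; it is not a usable converter.
        found.remove(BaseCalendar)
        # Add the converters one level down, under the calendar base class.
        found.extend(BaseCalendar.__subclasses__())
        return found

    @classmethod
    def available_converters(cls):
        # Key each discovered converter class on its declared name.
        return {c.name: c for c in cls.subclasses()}


class BaseCalendar(BaseConverter):
    """Hypothetical stand-in for the calendar converter base class."""

    name = "Base Calendar"


class Iso(BaseConverter):
    name = "ISO8601"


class Gregorian(BaseCalendar):
    name = "Gregorian"


class Hijri(BaseCalendar):
    name = "Hijri"


converters = BaseConverter.available_converters()
print(sorted(converters))
```

Note that this approach only reaches one level below the calendar base class; a converter that subclasses an existing calendar converter would need a recursive walk.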
5 changes: 5 additions & 0 deletions src/undate/converters/calendars/__init__.py
@@ -0,0 +1,5 @@
from undate.converters.calendars.gregorian import GregorianDateConverter
from undate.converters.calendars.hijri import HijriDateConverter
from undate.converters.calendars.hebrew import HebrewDateConverter

__all__ = ["HijriDateConverter", "GregorianDateConverter", "HebrewDateConverter"]
51 changes: 51 additions & 0 deletions src/undate/converters/calendars/gregorian.py
@@ -0,0 +1,51 @@
from calendar import monthrange

from undate.converters.base import BaseCalendarConverter


class GregorianDateConverter(BaseCalendarConverter):
    """
    Calendar converter class for Gregorian calendar.
    """

    #: converter name: Gregorian
    name: str = "Gregorian"
    #: calendar
    calendar_name: str = "Gregorian"

    #: known non-leap year
    NON_LEAP_YEAR: int = 2022

    def min_month(self) -> int:
        """First month for the Gregorian calendar."""
        return 1

    def max_month(self, year: int) -> int:
        """maximum numeric month for the specified year in the Gregorian calendar"""
        return 12

    def max_day(self, year: int, month: int) -> int:
        """maximum numeric day for the specified year and month in this calendar"""
        # if month is known, use that to calculate
        if month:
            # if year is known, use it; otherwise use a known non-leap year
            # (only matters for February)
            year = year or self.NON_LEAP_YEAR

            # Use monthrange from python builtin calendar module.
            # returns first day of the month and number of days in the month
            # for the specified year and month.
            _, max_day = monthrange(year, month)
        else:
            # if year and month are unknown, return maximum possible
            max_day = 31

        return max_day

    def to_gregorian(self, year, month, day) -> tuple[int, int, int]:
        """Convert to Gregorian date. This returns the specified year, month,
        and day unchanged, but is provided for consistency since all calendar
        converters need to support conversion to Gregorian calendar for
        a common point of comparison.
        """
        return (year, month, day)
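
Reviewer note: the fallback behavior of `max_day` in the converter above can be checked in isolation. The sketch below is a standalone function rather than the converter method itself; the function signature with `None` defaults is an assumption for illustration, while the `monthrange` call and the 2022 fallback year come from the diff.

```python
from calendar import monthrange

# Same known non-leap fallback year as the converter above.
NON_LEAP_YEAR = 2022


def max_day(year=None, month=None):
    """Standalone sketch of the max_day logic; not the converter method."""
    if month:
        # monthrange returns (weekday of the first day, days in the month)
        _, days = monthrange(year or NON_LEAP_YEAR, month)
        return days
    # Month unknown: fall back to the largest possible day value.
    return 31


print(max_day(2024, 2))     # February in a leap year
print(max_day(None, 2))     # February with an unknown year
print(max_day(2024, None))  # unknown month
```

With an unknown year, February resolves to 28 days because the non-leap fallback year is used; with an unknown month the function returns 31.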
3 changes: 3 additions & 0 deletions src/undate/converters/calendars/hebrew/__init__.py
@@ -0,0 +1,3 @@
from undate.converters.calendars.hebrew.converter import HebrewDateConverter

__all__ = ["HebrewDateConverter"]
78 changes: 78 additions & 0 deletions src/undate/converters/calendars/hebrew/converter.py
@@ -0,0 +1,78 @@
from typing import Union

from convertdate import hebrew # type: ignore
from lark.exceptions import UnexpectedCharacters

from undate.converters.base import BaseCalendarConverter
from undate.converters.calendars.hebrew.parser import hebrew_parser
from undate.converters.calendars.hebrew.transformer import HebrewDateTransformer
from undate.undate import Undate, UndateInterval


class HebrewDateConverter(BaseCalendarConverter):
    """
    Converter for Hebrew Anno Mundi calendar.
    Support for parsing Anno Mundi dates and converting to Undate and UndateInterval
    objects in the Gregorian calendar.
    """

    #: converter name: Hebrew
    name: str = "Hebrew"
    calendar_name: str = "Anno Mundi"

    def __init__(self):
        self.transformer = HebrewDateTransformer()

    def min_month(self) -> int:
        """Smallest numeric month for this calendar."""
        return 1

    def max_month(self, year: int) -> int:
        """Maximum numeric month for this calendar. In Hebrew calendar, this is 12 or 13
        depending on whether it is a leap year."""
        return hebrew.year_months(year)

    def first_month(self) -> int:
        """First month in this calendar. The Hebrew civil year starts in Tishri."""
        return hebrew.TISHRI

    def last_month(self, year: int) -> int:
        """Last month in this calendar. Hebrew civil year starts in Tishri,
        Elul is the month before Tishri."""
        return hebrew.ELUL

    def max_day(self, year: int, month: int) -> int:
        """maximum numeric day for the specified year and month in this calendar"""
        # NOTE: unreleased v2.4.1 of convertdate standardizes month_days to month_length
        return hebrew.month_days(year, month)

    def to_gregorian(self, year: int, month: int, day: int) -> tuple[int, int, int]:
        """Convert a Hebrew date, specified by year, month, and day,
        to the Gregorian equivalent date. Returns a tuple of year, month, day.
        """
        return hebrew.to_gregorian(year, month, day)

    def parse(self, value: str) -> Union[Undate, UndateInterval]:
        """
        Parse a Hebrew date string and return an :class:`~undate.undate.Undate` or
        :class:`~undate.undate.UndateInterval`.
        The Hebrew date string is preserved in the undate label.
        """
        if not value:
            raise ValueError("Parsing empty string is not supported")

        # parse the input string, then transform to undate object
        try:
            # parse the string with our Hebrew date parser
            parsetree = hebrew_parser.parse(value)
            # transform the parse tree into an undate or undate interval
            undate_obj = self.transformer.transform(parsetree)
            # set the original date as a label, with the calendar name
            undate_obj.label = f"{value} {self.calendar_name}"
            return undate_obj
        except UnexpectedCharacters as err:
            raise ValueError(f"Could not parse '{value}' as a Hebrew date") from err

    # do we need to support conversion the other direction?
    # i.e., generate a Hebrew date from an arbitrary undate or undate interval?