Add DatetimeOrdinal #818

Open
michaelrussell4 opened this issue Oct 16, 2024 · 4 comments

Comments

@michaelrussell4

It'd be nice to have the ability to convert datetime columns to ordinal values. DatetimeFeatures is nice when one wants to extract the year, month, day, etc., but sometimes it's desirable to have the date simply as an ordinal value.

Here's an example I've used in code before:

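import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class DatetimeOrdinal(BaseEstimator, TransformerMixin):
    """Convert datetime columns to proleptic Gregorian ordinals and back."""

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        # Map each Timestamp to its day-level ordinal (time of day is discarded); missing values are skipped.
        return X.apply(lambda x: x.map(pd.Timestamp.toordinal, na_action='ignore'))

    def inverse_transform(self, X):
        # Map each ordinal back to a Timestamp at midnight; missing values are skipped.
        return X.apply(lambda x: x.map(pd.Timestamp.fromordinal, na_action='ignore'))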
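
To illustrate what the round trip does, here is a minimal sketch that applies the same two mappings directly to a small DataFrame. The column name `signup` and the sample dates are made up for illustration, and the `int(...)` cast is an extra precaution that is not part of the snippet above.

```python
import pandas as pd

# Hypothetical sample data; the column name "signup" is illustrative only.
df = pd.DataFrame({
    "signup": [
        pd.Timestamp("2024-01-01"),
        pd.Timestamp("2024-01-02 15:30"),
        pd.Timestamp("2024-03-01"),
    ]
})

# Forward: each Timestamp becomes its proleptic Gregorian day number.
ordinals = df.apply(lambda col: col.map(lambda ts: ts.toordinal()))

# Backward: each day number becomes a Timestamp at midnight.
restored = ordinals.apply(
    lambda col: col.map(lambda n: pd.Timestamp.fromordinal(int(n)))
)

# Gaps between consecutive ordinals equal gaps in calendar days.
gaps = ordinals["signup"].diff().dropna().tolist()
```

Note that the 15:30 time of day on the second row does not survive the round trip, since an ordinal only counts whole days.
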
@solegalli
Collaborator

Thank you @michaelrussell4

I wasn't aware of this functionality.

When would it be useful to map dates to ordinal numbers? Do I understand correctly that the cardinality of the variable will still be high after this representation?

@michaelrussell4
Author

Here are some reasons why this method can be useful:

  • Preserves Temporal Distance: Retains the actual time intervals between dates, useful for models that benefit from continuity.
  • Simplified Feature Space: Reduces complexity by converting datetime into a single numerical value.
  • Effective for Linear Trends: Ideal for models that assume linear relationships over time, such as regression.
  • Captures Long-Term Trends: Better suited for datasets where long-term changes are more relevant than short-term cycles.

The high cardinality of the resulting variable does have to be accounted for, probably best with min-max normalisation (British spelling just for you 😉).
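
As a rough sketch of that normalisation step, the snippet below rescales a few ordinals to the range [0, 1] using only the standard library. The helper `min_max_scale` and the sample dates are hypothetical; in a real pipeline the minimum and maximum would presumably be learned from the training data only.

```python
from datetime import date


def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; hypothetical helper for illustration."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Three made-up dates spaced ten days apart.
dates = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 21)]
ordinals = [d.toordinal() for d in dates]

# Scaling is linear, so equal gaps in days stay equal after scaling.
scaled = min_max_scale(ordinals)
```

The scaled values keep the relative spacing of the original dates, which is the "temporal distance" property mentioned in the list above.
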

This technique isn't as popular as the DatetimeFeatures extraction you currently have, but I think it'd be worth considering adding.

@solegalli
Collaborator

Thank you @michaelrussell4

We'll keep it on the radar :)

If you want to give it a go, you are welcome!

@michaelrussell4
Author

I surely will if I get a chance. Thanks for your work!
