Update _get_is_datetime constraint logic to use the metadata #1732
Codecov Report
@@                      Coverage Diff                      @@
##    issue-1692-inequality-error    #1732    +/-   ##
===============================================================
+ Coverage    97.07%    97.14%    +0.07%
===============================================================
  Files           48        48
  Lines         4506      4522      +16
===============================================================
+ Hits          4374      4393      +19
+ Misses         132       129       -3
Part of the issue is to use the provided datetime format if applicable. I think the code in #1692 would still fail right now. Either way, can we add an integration test that checks that the datetime format is indeed being used to convert weird datetime columns during the constraints?
@amontanez24 I already ran that specific integration test in the other PR, but I can add an additional test for the datetime format as well.
I thought part of the problem was that we weren't converting the datetime properly because we didn't use the format, but I guess I remembered it wrong. Idk if that's still a potential problem with oddly formed datetime strings, though.
@amontanez24 Issue 1692 is not really related to the datetime_format. It was misdiagnosed as such in the beginning; the true issue was misidentifying the datetime column's sdtype. To assuage your concerns about the datetime_format, we already verify that it is valid:

pandas_datetime_format = datetime_format.replace('%-', '%')
datetime_column = pd.to_datetime(
    column,
    errors='coerce',
    format=pandas_datetime_format
)
valid = pd.isna(column) | ~pd.isna(datetime_column)
return set(column[~valid])

If
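For anyone who wants to try the validation snippet above in isolation, here is a minimal runnable sketch. The function name `find_invalid_datetimes` and the sample values are hypothetical and not taken from the codebase; only the four statements inside the function mirror the snippet quoted in this thread.

```python
import pandas as pd


def find_invalid_datetimes(column, datetime_format):
    """Return the set of non-null values that do not match datetime_format.

    Hypothetical wrapper around the snippet quoted in the thread.
    """
    # Drop the '-' from directives such as '%-d' before handing the
    # format to pandas, exactly as the quoted snippet does.
    pandas_datetime_format = datetime_format.replace('%-', '%')
    datetime_column = pd.to_datetime(
        column,
        errors='coerce',  # unparseable values become NaT instead of raising
        format=pandas_datetime_format,
    )
    # A value is valid if it was missing to begin with or parsed successfully.
    valid = pd.isna(column) | ~pd.isna(datetime_column)
    return set(column[~valid])


# Illustrative data: one good value, one impossible date, one non-date, one null.
column = pd.Series(['2023-01-15', '2023-02-30', 'not a date', None])
invalid = find_invalid_datetimes(column, '%Y-%m-%d')
print(invalid)
```

Missing values are deliberately not reported as invalid, so only strings that fail to parse under the given format end up in the returned set.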
LGTM!
I see. No, I'm not concerned about the format being valid. I'm saying that converting a string column with pd.to_datetime() sometimes fails if you don't provide the format. So if we make that call at all in the constraints, I think we should pass the format there.
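A small sketch of why passing the format matters. The sample strings and the '%d/%m/%Y' format are illustrative assumptions, not data from issue #1692. This case does not raise; it shows the related risk that, without a format, pandas picks its own interpretation of an ambiguous string.

```python
import pandas as pd

# Day-first dates stored as strings (illustrative values only).
column = pd.Series(['01/02/2023', '03/04/2023'])

# Without a format, pandas defaults to a month-first reading.
without_format = pd.to_datetime(column)

# With the format from the metadata, the intended day-first reading is used.
with_format = pd.to_datetime(column, format='%d/%m/%Y')

print(without_format.iloc[0])  # 2023-01-02 00:00:00
print(with_format.iloc[0])     # 2023-02-01 00:00:00
```

Both calls succeed, but they disagree on every row, which is the kind of silent mismatch an inequality constraint between two datetime columns could run into.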
I think this looks good!
LGTM! 👍🏻
CU-86aytb6w0, Resolve #1692