Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update _get_is_datetime constraint logic to use the metadata #1732

Merged
merged 4 commits into from
Jan 11, 2024

Conversation

fealho
Copy link
Member

@fealho fealho commented Jan 5, 2024

CU-86aytb6w0, Resolve #1692

@sdv-team
Copy link
Contributor

sdv-team commented Jan 5, 2024

@codecov-commenter
Copy link

codecov-commenter commented Jan 5, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (f25037b) 97.07% compared to head (48ac779) 97.14%.

❗ Current head 48ac779 differs from pull request most recent head f360d98. Consider uploading reports for the commit f360d98 to get more accurate results

Files Patch % Lines
sdv/constraints/tabular.py 95.45% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                       Coverage Diff                       @@
##           issue-1692-inequality-error    #1732      +/-   ##
===============================================================
+ Coverage                        97.07%   97.14%   +0.07%     
===============================================================
  Files                               48       48              
  Lines                             4506     4522      +16     
===============================================================
+ Hits                              4374     4393      +19     
+ Misses                             132      129       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fealho fealho changed the base branch from main to issue-1692-inequality-error January 5, 2024 23:14
@fealho fealho marked this pull request as ready for review January 5, 2024 23:16
@fealho fealho requested a review from a team as a code owner January 5, 2024 23:16
@fealho fealho requested review from frances-h, amontanez24 and pvk-developer and removed request for a team and frances-h January 5, 2024 23:16
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the issue is to use the provided datetime format if applicable. I think the code in #1692 would still fail right now. Either way, can we add an integration test that checks that the datetime format is indeed being used to convert weird datetime columns during the constraints?

@fealho
Copy link
Member Author

fealho commented Jan 9, 2024

Part of the issue is to use the provided datetime format if applicable. I think the code in #1692 would still fail right now. Either way, can we add an integration test that checks that the datetime format is indeed being used to convert weird datetime columns during the constraints?

@amontanez24 I already tested that specific integration test in the other PR, but I can add an additional test for the datetime format as well

@amontanez24
Copy link
Contributor

Part of the issue is to use the provided datetime format if applicable. I think the code in #1692 would still fail right now. Either way, can we add an integration test that checks that the datetime format is indeed being used to convert weird datetime columns during the constraints?

@amontanez24 I already tested that specific integration test in the other PR, but I can add an additional test for the datetime format as well

I thought part of the problem was that we weren't converting the datetime properly because we didn't use the format but I guess I remembered it wrong. Idk if that's still a potential problem that could happen with oddly formed datetime strings though

@fealho
Copy link
Member Author

fealho commented Jan 9, 2024

Part of the issue is to use the provided datetime format if applicable. I think the code in #1692 would still fail right now. Either way, can we add an integration test that checks that the datetime format is indeed being used to convert weird datetime columns during the constraints?

@amontanez24 I already tested that specific integration test in the other PR, but I can add an additional test for the datetime format as well

I thought part of the problem was that we weren't converting the datetime properly because we didn't use the format but I guess I remembered it wrong. Idk if that's still a potential problem that could happen with oddly formed datetime strings though

@amontanez24 The issue 1692 is not really related to the datetime_format, it was just misdiagnosed as such in the beginning, where the true issue was misidentifying the datetime column sdtype.

To assuage your concerns about the datatime_format, we already verify it is valid. fit calls validate, which in turn calls validate_datetime_format:

    pandas_datetime_format = datetime_format.replace('%-', '%')
    datetime_column = pd.to_datetime(
        column,
        errors='coerce',
        format=pandas_datetime_format
    )
    valid = pd.isna(column) | ~pd.isna(datetime_column)

    return set(column[~valid])

If datetime_format doesn't match the column, an error will be raised during the validation step.

@fealho fealho requested a review from amontanez24 January 9, 2024 14:04
Copy link
Member

@pvk-developer pvk-developer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@amontanez24
Copy link
Contributor

Part of the issue is to use the provided datetime format if applicable. I think the code in #1692 would still fail right now. Either way, can we add an integration test that checks that the datetime format is indeed being used to convert weird datetime columns during the constraints?

@amontanez24 I already tested that specific integration test in the other PR, but I can add an additional test for the datetime format as well

I thought part of the problem was that we weren't converting the datetime properly because we didn't use the format but I guess I remembered it wrong. Idk if that's still a potential problem that could happen with oddly formed datetime strings though

@amontanez24 The issue 1692 is not really related to the datetime_format, it was just misdiagnosed as such in the beginning, where the true issue was misidentifying the datetime column sdtype.

To assuage your concerns about the datatime_format, we already verify it is valid. fit calls validate, which in turn calls validate_datetime_format:

    pandas_datetime_format = datetime_format.replace('%-', '%')
    datetime_column = pd.to_datetime(
        column,
        errors='coerce',
        format=pandas_datetime_format
    )
    valid = pd.isna(column) | ~pd.isna(datetime_column)

    return set(column[~valid])

If datetime_format doesn't match the column, an error will be raised during the validation step.

I see. No, I'm not concerned about the format being valid. I'm saying that sometimes when converting a string column to datetime with the pd.to_datetime() function, it fails if you don't provide the format. So if we make that call at all in the constraints, I think we should pass the format there

@fealho fealho requested a review from R-Palazzo January 10, 2024 18:13
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good!

@fealho fealho requested review from pvk-developer and removed request for R-Palazzo January 10, 2024 20:37
Copy link
Member

@pvk-developer pvk-developer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍🏻

@fealho fealho merged commit 69a9638 into issue-1692-inequality-error Jan 11, 2024
37 checks passed
@fealho fealho deleted the issue-1692-update-constraints branch January 11, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

InvalidDataError for Inequality constraint (even though data is valid)
5 participants