Categorical reverse transform may crash with ValueError for certain dtypes (int64) #755

Merged
merged 11 commits into from
Jan 22, 2024

Conversation

R-Palazzo
Contributor

CU-86ayz7xvf
Resolve #747
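The issue's own reproduction is not included in this thread. As a minimal sketch of the kind of failure the title describes, assuming the crash comes from casting reverse-transformed values that contain NaN back to `int64` (the variable names below are illustrative, not taken from the library):

```python
import numpy as np
import pandas as pd

# Values that should go back to an integer column, but one entry is missing.
values = pd.Series([1.0, 2.0, np.nan, 4.0])

try:
    values.astype('int64')
    cast_failed = False
except ValueError as error:
    # pandas cannot represent NaN in a plain int64 column, so the cast raises.
    cast_failed = True
    message = str(error)

# Casting to float instead keeps the missing value intact.
as_float = values.astype(float)
```

The float fallback changes the column dtype, which is why the PR pairs it with a warning.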

@R-Palazzo R-Palazzo requested a review from a team as a code owner January 17, 2024 16:42

@R-Palazzo R-Palazzo removed the request for review from a team January 17, 2024 16:43
Member

@fealho fealho left a comment

Could you also copy/paste the example from the issue?

@@ -50,6 +50,7 @@ def __init__(self, order_by=None):
)

self.order_by = order_by
self._is_integer = False
Member

Initialize with None.

@@ -333,6 +342,7 @@ def __init__(self, add_noise=False):
)
super().__init__()
self.add_noise = add_noise
self._is_integer = False
Member

Same

@@ -711,6 +723,7 @@ def __init__(self, add_noise=False, order_by=None):
)

self.order_by = order_by
self._is_integer = False
Member

Same

@@ -516,6 +527,7 @@ def _reverse_transform(self, data):
Returns:
    pandas.Series
"""
check_nan_in_transform(data, self._is_integer)
Member

Assign it to something.

Contributor Author

The FrequencyEncoder doesn't need the output of check_nan_in_transform because it converts to float somewhere else. The function is only called to raise the warning when we are in the situation described in the issue.
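
A warn-only check of this kind could look roughly like the sketch below. The helper name, its signature, and the warning text are hypothetical and are not the library's actual `check_nan_in_transform`; the sketch only shows the idea of calling a function for its warning and ignoring any return value.

```python
import warnings

import numpy as np
import pandas as pd


def warn_if_nan_in_integer_column(data, is_integer):
    """Hypothetical helper: warn when data fitted as integers now contains NaN."""
    if is_integer and pd.isna(data).any():
        # Illustrative wording only, not the message used by the library.
        warnings.warn(
            'There are null values in the transformed data; '
            'the reverse transformed column will be float.',
            UserWarning,
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Integer column with a NaN: warns.
    warn_if_nan_in_integer_column(pd.Series([1.0, np.nan, 3.0]), is_integer=True)
    # Integer column without NaN: silent.
    warn_if_nan_in_integer_column(pd.Series([1.0, 2.0, 3.0]), is_integer=True)
    # Non-integer column with a NaN: silent.
    warn_if_nan_in_integer_column(pd.Series([1.0, np.nan, 3.0]), is_integer=False)
```

Only the first call emits a warning, so a caller that handles the dtype elsewhere can invoke the helper without using its result.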

@R-Palazzo
Contributor Author

The example from the issue now:
Screenshot 2024-01-17 at 17 26 42

@R-Palazzo R-Palazzo requested a review from fealho January 17, 2024 17:28
Comment on lines 200 to 203
if convert_to_float:
    return result.astype(float)

return result.astype(self.dtype)
Contributor

I think instead of storing whether the type needs to be converted to float, we can just wrap the conversion in a try. If it fails, we check whether the dtype is int and, if so, convert to float. Otherwise we just raise the error. Then the check_nan_in_transform function can be used only to warn.
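
The suggestion above might be sketched as follows. This is an illustration under assumptions, not the code that was merged: `cast_or_fall_back_to_float` is a hypothetical standalone function, whereas the PR works on `result` and `self.dtype` inside the transformer.

```python
import numpy as np
import pandas as pd


def cast_or_fall_back_to_float(result, dtype):
    """Hypothetical helper: try the fitted dtype, fall back to float for ints."""
    try:
        return result.astype(dtype)
    except ValueError:
        # Only integer dtypes get the float fallback; anything else re-raises.
        if pd.api.types.is_integer_dtype(dtype):
            return result.astype(float)
        raise


# NaN cannot be stored as int64, so this comes back as float.
with_nan = cast_or_fall_back_to_float(pd.Series([1.0, np.nan, 3.0]), 'int64')
# No NaN present, so the original integer dtype is restored.
without_nan = cast_or_fall_back_to_float(pd.Series([1.0, 2.0, 3.0]), 'int64')
```

Data that cannot be converted at all, such as strings, still raises, because the float fallback fails as well and its error propagates to the caller.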

Contributor Author

Thanks for the advice, I made the change

@R-Palazzo R-Palazzo requested a review from amontanez24 January 19, 2024 15:22
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (de011e2) 100.00% compared to head (779fc5e) 100.00%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #755   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         1959      1991   +32     
=========================================
+ Hits          1959      1991   +32     


Contributor

@amontanez24 amontanez24 left a comment

LGTM!

"""
# Setup
data_int_with_nan = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
data_not_convetible = pd.Series(['a', 'b', 'c', 'd', 'e'])
Contributor

typo: convetible -> convertible

@R-Palazzo R-Palazzo merged commit deb0f74 into main Jan 22, 2024
45 checks passed
@R-Palazzo R-Palazzo deleted the issue-747-categorical-nans branch January 22, 2024 17:27
Successfully merging this pull request may close these issues.

Categorical reverse transform may crash with ValueError for certain dtypes (int64)
5 participants