-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Categorical reverse transform may crash with ValueError
for certain dtypes (int64)
#755
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also copy/paste the example from the issue?
rdt/transformers/categorical.py
Outdated
@@ -50,6 +50,7 @@ def __init__(self, order_by=None): | |||
) | |||
|
|||
self.order_by = order_by | |||
self._is_integer = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize with None.
rdt/transformers/categorical.py
Outdated
@@ -333,6 +342,7 @@ def __init__(self, add_noise=False): | |||
) | |||
super().__init__() | |||
self.add_noise = add_noise | |||
self._is_integer = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
rdt/transformers/categorical.py
Outdated
@@ -711,6 +723,7 @@ def __init__(self, add_noise=False, order_by=None): | |||
) | |||
|
|||
self.order_by = order_by | |||
self._is_integer = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
rdt/transformers/categorical.py
Outdated
@@ -516,6 +527,7 @@ def _reverse_transform(self, data): | |||
Returns: | |||
pandas.Series | |||
""" | |||
check_nan_in_transform(data, self._is_integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assign it to something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FrequencyEncoder doesn't need the output of check_nan_in_transform
because it converts to float somewhere else. The function is just call to raise the warning if we are in the situation of the issue
rdt/transformers/categorical.py
Outdated
if convert_to_float: | ||
return result.astype(float) | ||
|
||
return result.astype(self.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of storing if the type needs to be converted to float we can just wrap the conversion in a try. If it fails, then we can see if the dtype is int and convert to float. Otherwise we just error. Then the check_nan_in_transform
function can just be used to warn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the advice, I made the change
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #755 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 18 18
Lines 1959 1991 +32
=========================================
+ Hits 1959 1991 +32 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
""" | ||
# Setup | ||
data_int_with_nan = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0]) | ||
data_not_convetible = pd.Series(['a', 'b', 'c', 'd', 'e']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: convetible -> convertible
CU-86ayz7xvf
Resolve #747