SMOTE With Non-Continuous Data #154

Closed
jtsmith2 opened this issue Sep 21, 2016 · 2 comments

@jtsmith2

jtsmith2 commented Sep 21, 2016

I have a dataset with a feature that takes only integer values (a performance rating, for example). SMOTE generates each synthetic point along the line segment between two existing points, so if those two points differ on this feature, the synthetic point gets a non-integer value. A decision tree quickly learns that non-integer values are a good predictor of the minority class. For example, I end up with a split on performance < 4 followed by a split on performance > 3, leading to a leaf that contains only minority-class data (all of it generated by SMOTE).

For example, here is a leaf made up entirely of SMOTE data, where Performance Rating is 3 < x < 4:
[Screenshot: smote-nc]
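To make the effect concrete, here is a minimal sketch of the interpolation step, assuming the usual SMOTE rule of picking a uniformly random position on the segment between a sample and one of its neighbors. The function name `smote_interpolate` and the toy values are illustrative only, not imbalanced-learn code.

```python
import numpy as np

def smote_interpolate(x, neighbor, rng):
    """Return one synthetic point on the segment between x and neighbor."""
    gap = rng.uniform(0.0, 1.0)  # random position along the segment
    return x + gap * (neighbor - x)

rng = np.random.default_rng(0)

# Two minority samples; column 0 is an integer rating (3 and 4).
a = np.array([3.0, 10.0])
b = np.array([4.0, 20.0])

# Every synthetic rating lands strictly between 3 and 4.
synthetic = np.array([smote_interpolate(a, b, rng) for _ in range(5)])
print(synthetic[:, 0])
```

Since no original sample has a rating between 3 and 4, a tree can isolate the synthetic points with two splits on that one feature.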

In the original SMOTE paper, the authors proposed SMOTE-NC (Nominal Continuous) for datasets that mix nominal and continuous features. It gives each nominal feature of a synthetic sample the value that occurs most often among the k nearest neighbors. Is there any interest in implementing this feature? I suppose the user would need to pass an index list of the non-continuous features, in addition to what is currently passed to the algorithm.
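A rough sketch of how such a step might look, assuming the nominal features take the value that occurs most often among the k nearest neighbors, as I read the SMOTE-NC description in the paper. `majority_value` and `synthesize_mixed` are hypothetical helpers, and the neighbor search (including the paper's adjusted distance for nominal features) is left out: the neighbors are assumed to be given.

```python
import numpy as np

def majority_value(neighbor_values):
    """Return the most frequent value; ties go to the smallest value."""
    values, counts = np.unique(neighbor_values, return_counts=True)
    return values[np.argmax(counts)]

def synthesize_mixed(x, neighbors, nominal_idxs, rng):
    """Hypothetical helper: build one synthetic sample from x and its neighbors.

    Continuous features are interpolated toward one randomly chosen
    neighbor; nominal features are set by majority vote over all neighbors.
    """
    x = np.asarray(x, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    chosen = neighbors[rng.integers(len(neighbors))]
    new = x + rng.uniform(0.0, 1.0) * (chosen - x)
    for j in nominal_idxs:
        new[j] = majority_value(neighbors[:, j])
    return new

rng = np.random.default_rng(0)
x = [3.0, 1.0]                                   # column 0 is the rating
neighbors = [[4.0, 1.5], [4.0, 2.0], [3.0, 2.5]]
new_sample = synthesize_mixed(x, neighbors, nominal_idxs=[0], rng=rng)
```

Here two of the three neighbors have a rating of 4, so the synthetic sample gets a rating of exactly 4 and only the second feature is interpolated.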

@dvro
Member

dvro commented Sep 21, 2016

@jtsmith2 I don't know if we ought to include the SMOTE-NC algorithm, considering that we also don't support SMOTE for categorical data. So, at least for now, we should keep it simple. For your specific problem, I'd use the following workaround:

import numpy as np

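# Regular SMOTE processing goes here,
# resulting in the resampled X, y ...

# Column indices of the integer-valued features (example values)
nc_feat_idxs = [0, 1, 5]
# Round the interpolated values back to the nearest integer
X[:, nc_feat_idxs] = np.round(X[:, nc_feat_idxs])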

Let me know how it works!

[]'s
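For anyone who wants to try the rounding workaround on its own, here is a small self-contained sketch. `round_integer_features` is a hypothetical helper name, and the toy array stands in for the output of a SMOTE run; nothing here is part of the library.

```python
import numpy as np

def round_integer_features(X, idxs):
    """Return a copy of X with the given columns rounded to the nearest integer."""
    X_out = np.array(X, dtype=float)  # copy so the caller's array is untouched
    X_out[:, idxs] = np.round(X_out[:, idxs])
    return X_out

# Toy stand-in for resampled data: column 0 is the integer rating.
X_res = np.array([[3.4, 0.12],
                  [3.6, 0.57],
                  [4.0, 0.90]])

X_fixed = round_integer_features(X_res, [0])
print(X_fixed[:, 0])
```

Note that `np.round` rounds exact halves to the nearest even value (2.5 becomes 2.0), which may or may not matter for a given feature. Because SMOTE interpolates between existing samples, the rounded values stay within the range of the original integers.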

@glemaitre
Member

glemaitre commented Oct 25, 2016

Moved this to new methods, #105.
