
Class imbalance #21

Open
chengsoonong opened this issue Nov 15, 2016 · 4 comments

Comments

@chengsoonong
Owner

No description provided.

@MatthewJA
Collaborator

Do you mean how to handle it? The simplest way is to tack a keyword argument onto the Predictor, à la sklearn. Hypothetically, a class balancer could be part of a pipeline, but in my opinion this is a problem for the Predictor to deal with.
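For concreteness, a minimal sketch of what that could look like, assuming a hypothetical `Predictor` wrapper around an sklearn estimator (the `class_weight` keyword and the `LogisticRegression` backend are illustrative, not this repo's actual API):

```python
from sklearn.linear_model import LogisticRegression

class Predictor:
    """Hypothetical wrapper that forwards an imbalance option to sklearn."""

    def __init__(self, class_weight=None):
        # class_weight=None ignores imbalance; class_weight='balanced'
        # reweights classes inversely to their observed frequencies.
        self._model = LogisticRegression(class_weight=class_weight)

    def fit(self, X, y):
        self._model.fit(X, y)
        return self

    def predict(self, X):
        return self._model.predict(X)
```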

@chengsoonong
Owner Author

One option would be to have a knob between 0 and 1:

- 0 = ignore class imbalance and train as usual. All predictors should support this.
- 1 = use class weights inversely proportional to the observed class proportions (`balanced` in sklearn). Only some predictors may support this.
- in between = interpolate between the two extremes (see the sketch below).
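
A sketch of how such a knob could be turned into per-class weights, assuming a hypothetical helper named `interpolated_class_weights` and an estimator that accepts a `class_weight` dict (as many sklearn classifiers do):

```python
import numpy as np

def interpolated_class_weights(y, knob):
    """Interpolate between uniform weights (knob=0) and sklearn's
    'balanced' weights (knob=1) for the labels in y."""
    classes, counts = np.unique(y, return_counts=True)
    # sklearn's 'balanced' heuristic: n_samples / (n_classes * count_c)
    balanced = len(y) / (len(classes) * counts)
    weights = (1.0 - knob) * 1.0 + knob * balanced
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Example with a rare class: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
print(interpolated_class_weights(y, 0.0))  # {0: 1.0, 1: 1.0}    -- ignore imbalance
print(interpolated_class_weights(y, 1.0))  # {0: ~0.56, 1: 5.0}  -- fully balanced
print(interpolated_class_weights(y, 0.5))  # {0: ~0.78, 1: 3.0}  -- hedged halfway
```

The returned dict could then be passed straight to an estimator's `class_weight` argument, so the knob stays a single keyword on the Predictor.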

@MatthewJA
Collaborator

Sounds good to me.

For my own curiosity: In what situation would a value of, say, 0.5 be useful?

@chengsoonong
Owner Author

If the astronomer wants to take care of class imbalance (say, they are interested in rare classes), but does not trust that the class proportions observed in the current labelled set are the true class proportions.

This kind of reasoning is typical in machine learning. We assume we would know how to adjust if we knew the true population value. But we don't actually know it, so we estimate it from data. And since we don't fully trust the estimate, we hedge.
