-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support fill_value for SimpleImputer with string data #1123
Support fill_value for SimpleImputer with string data #1123
Conversation
cd6b54d
to
e1716a3
Compare
@xadupre Sorry for the direct mention, but looking at the git history, you might be the best person to ask. Is there anything I can do to help with review on this pull request? |
Sorry for the delay, I was off. Your PR looks good. |
No worries. I will try to clean up the CodeQL reported issues in the PR tomorrow. The CI failures look to be related to other unchanged areas of code, but I am also willing to dive into those too as I have time this week. |
@xadupre I went ahead and re-ran the CI jobs on my own fork. It looks like the latest merge from main that I pulled into this branch addressed the remaining test flakiness. I believe the workflows should succeed now for this branch. Thank you for fixing those tests / dependencies in the other PRs you did! |
d4f5318
to
634a06e
Compare
Signed-off-by: Brandon Bodnar <[email protected]>
634a06e
to
e457593
Compare
Adds support for setting the
fill_value
of a SimpleImputer using the constant strategy on string data. This is done by moving the existing guard statement that limits string values offill_value
to be only the value "nan".Based on some testing, likely this guard statement can be removed completely due to sci-kit learn doing it's own casting checks when fitting data to the SimpleImputer (see https://github.com/scikit-learn/scikit-learn/blob/5600faab452b0adb1f4090219012aa3ae6208c66/sklearn/impute/_base.py#L375-L398). I moved the statement in this PR, in case there was some edge case I missed while doing my own checks and there is some numeric SimpleImputer that allows
fill_value=="nan"
. But let me know if you rather I remove it outright.Addresses #1122