Learning when drop_zeros=False #1595
Comments
Hey there @ColdTeapot273K! Would you say this is a bug? The idea here is that there is no need to keep track of values when What am I missing? Maybe it would be preferable if
@MaxHalford sorry for the delay. I get the idea, but let me argue that it's a hack.
Learning implies some "learnt state", as per the principle of least astonishment. Workflows for debugging, integration with other libraries, serialization, etc. of learnable transformers/estimators rely on this behaviour. Personally, I have had quite a few cases where it was at least handy (and sometimes necessary) to inspect such state to verify the behaviour, especially when productionising pipelines, converting to other frameworks, developing extensions, etc.
The current implementation is a logic shortcut that saves some space (and maybe time) complexity, at the cost of "correctness": it makes the learning logic inconsistent and disables workflows that depend on inspecting a learnt state.
Proposal: the previous implementation (which had the learnt state) was just fine and should be the default. The current shortcut can be re-implemented by advanced users who want to optimize library code for a specific use case.
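To make the "learnt state" argument concrete, here is a minimal sketch. It is a toy stand-in and not river's actual code: the class `ToyOneHotEncoder`, the flag `skip_learning`, and the helper `learnt_state` are all hypothetical names invented for illustration. It only assumes an encoder that records the values it has seen per feature, plus a shortcut that returns from `learn_one` before recording anything.

```python
from collections import defaultdict


class ToyOneHotEncoder:
    """Toy stand-in for illustration only; NOT river's implementation."""

    def __init__(self, skip_learning=False):
        # Hypothetical flag modelling the shortcut discussed in this thread.
        self.skip_learning = skip_learning
        # Learnt state: feature name -> set of values seen so far.
        self.values = defaultdict(set)

    def learn_one(self, x):
        if self.skip_learning:
            return  # shortcut: nothing is recorded
        for name, value in x.items():
            self.values[name].add(value)

    def transform_one(self, x):
        # Emit only the active categories, so the output does not
        # depend on the learnt state in this toy version.
        return {f"{name}_{value}": 1 for name, value in x.items()}


def learnt_state(encoder):
    """Return the learnt state in a deterministic, printable form."""
    return {name: sorted(seen) for name, seen in encoder.values.items()}


stream = [{"city": "Paris"}, {"city": "Oslo"}]

stateful = ToyOneHotEncoder(skip_learning=False)
shortcut = ToyOneHotEncoder(skip_learning=True)
for x in stream:
    stateful.learn_one(x)
    shortcut.learn_one(x)

print(learnt_state(stateful))  # {'city': ['Oslo', 'Paris']}
print(learnt_state(shortcut))  # {}

# Both encoders produce the same transform output for this input...
print(stateful.transform_one({"city": "Paris"}))  # {'city_Paris': 1}
print(shortcut.transform_one({"city": "Paris"}))  # {'city_Paris': 1}
```

In this sketch the transformed output is identical either way, which is presumably why the shortcut looks harmless, but only the stateful variant leaves anything to inspect, serialize, or convert afterwards.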
@MaxHalford I can make a PR with a fix if that makes your life easier 😅
Versions
river version: recent main (d606d7b)
Python version: Python 3.11.8
Operating system: Fedora Linux
Describe the bug
These 4 interesting lines effectively stop OneHotEncoder from learning when drop_zeros=False:
Steps/code to reproduce
Setup:
Actual result:
Expected result: