Consider opening up min_impurity_decrease in survival tree for early stopping? #144

Open
agnesbao opened this issue Oct 6, 2020 · 3 comments

agnesbao commented Oct 6, 2020

Hello there,
Is my understanding correct that proxy_impurity_improvement here is in fact the log-rank z statistic? If so, would it be possible to expose min_impurity_decrease as a parameter, as in sklearn, so that it can serve as an early-stopping criterion instead of being fixed at 0?

0.0, # min_impurity_decrease

Thanks,
Agnes

sebp (Owner) commented Oct 6, 2020 via email

agnesbao (Author) commented Oct 7, 2020

But the log-rank criterion doesn't really use the concept of "node impurity", which is why the impurities are all set to inf and not used in the algorithm, right? For a survival tree, min_impurity_decrease is effectively min_logrank_zscore, and min_impurity_split is deprecated in sklearn anyway. I've tried overriding min_impurity_decrease with different thresholds, and it behaves as expected in the current implementation, unless I'm missing something.
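
To make the idea of thresholding on the log-rank statistic concrete, here is a minimal NumPy sketch. It is not scikit-survival's code: `logrank_statistic` and `min_logrank_stat` are hypothetical names, and the sketch uses the chi-square form (the squared z-score) of the two-sample log-rank statistic. Which form the library's criterion actually returns is not confirmed in this thread.

```python
import numpy as np


def logrank_statistic(time, event, in_left):
    """Two-sample log-rank statistic (chi-square form) for a candidate split.

    Hypothetical helper for illustration; not part of scikit-survival.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_left = np.asarray(in_left, dtype=bool)

    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                    # at risk in both groups
        n_left = (at_risk & in_left).sum()   # at risk in the left child
        died = (time == t) & event
        d = died.sum()                       # events in both groups
        d_left = (died & in_left).sum()      # events in the left child
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            frac = n_left / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


# Toy data: all early events fall into the left child.
time = [1, 2, 3, 4, 5, 6]
event = [True] * 6
in_left = [True, True, True, False, False, False]
stat = logrank_statistic(time, event, in_left)

# Hypothetical threshold: roughly the 95% quantile of chi-square with 1 df.
min_logrank_stat = 3.84
should_split = stat >= min_logrank_stat
```

With a threshold of 0 every candidate split passes this check, which corresponds to the hard-coded behaviour the issue asks to relax.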

sebp (Owner) commented Jan 4, 2021

Currently, proxy_impurity_improvement and impurity_improvement just return the log-rank test statistic. min_impurity_decrease is used in sklearn's tree builder, where it is compared against the improvement value, which is the return value of the criterion's impurity_improvement function and is set by sklearn's node splitter. Therefore, I would conclude that min_impurity_decrease would essentially be an upper bound on the log-rank test statistic. This isn't useful. However, if impurity_improvement returned the difference between the log-rank test statistic before and after the split has been applied, then min_impurity_decrease could be used to stop growing the tree once the change in the log-rank test statistic becomes small.
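
The two readings of the improvement value can be sketched in a few lines. This is an illustration under assumptions, not sklearn's or scikit-survival's code: the comparison in `becomes_leaf` mirrors, to the best of my understanding, the check in sklearn's depth-first tree builder, and `improvement_as_difference`, `stat_before_split` and `stat_after_split` are hypothetical names for the variant proposed above.

```python
import numpy as np

# Small tolerance, as sklearn's tree builder is understood to use (assumption).
EPSILON = np.finfo(np.float64).eps


def becomes_leaf(improvement, min_impurity_decrease):
    """Return True if the node would not be split for this improvement value."""
    return bool(improvement + EPSILON < min_impurity_decrease)


def improvement_as_difference(stat_before_split, stat_after_split):
    """Hypothetical variant: report the change in the log-rank statistic."""
    return stat_after_split - stat_before_split


# Current behaviour: the improvement is the log-rank statistic itself.
leaf_now = becomes_leaf(improvement=5.0, min_impurity_decrease=1.0)

# Proposed behaviour: the improvement is the change in the statistic,
# so a small change stops growth even if the statistic itself is large.
change = improvement_as_difference(stat_before_split=4.8, stat_after_split=5.0)
leaf_proposed = becomes_leaf(improvement=change, min_impurity_decrease=1.0)
```

In this toy case the same candidate split is accepted under the current reading and rejected under the proposed one, because only the change of about 0.2 is compared with the threshold.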

If you have code already, it would be great if you could create a PR with your changes.
