Terminal Node Constraint for Random Survival Forests #159

Open
mastervii opened this issue Dec 29, 2020 · 4 comments

Comments

@mastervii

Is there any parameter/variable which states the minimum number of "uncensored samples" required to be at a leaf node? I think it's called "a minimum of d0 > 0 unique deaths" in the original paper.

@sebp
Owner

sebp commented Jan 4, 2021

Currently, d_0 is essentially fixed at 1 (see this part of the code).

I think it would make sense to make it configurable by adding it as a hyper-parameter to SurvivalTree.

@mastervii
Author

mastervii commented Jan 5, 2021

I have tested it and there were a few nodes with only censored samples. I think denom = 0 implies that the current node contains only censored samples (i.e., rs_total.n_events is 0 at every time point), so no further split will take place. The node therefore becomes a leaf containing only censored samples, which breaks the constraint d_0 > 0.
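To illustrate the reasoning above, here is a minimal sketch of the textbook two-sample log-rank statistic in plain Python. It is not the library's Cython criterion; the function name `logrank_parts` and its arguments are made up for this example. It only shows that when a node holds no events, both the numerator and the variance term (the denominator) stay at zero for every candidate split.

```python
def logrank_parts(times, events, in_left):
    """Return (numerator, variance) of the two-sample log-rank statistic.

    times   : observed times of the samples in the node
    events  : True if the sample is uncensored (event observed)
    in_left : True if the sample would go to the left child
    """
    numer = 0.0
    var = 0.0
    for t in sorted(set(times)):
        # Samples still at risk at time t, overall and in the left child.
        at_risk = sum(1 for ti in times if ti >= t)
        at_risk_left = sum(1 for ti, l in zip(times, in_left) if l and ti >= t)
        # Events occurring exactly at time t, overall and in the left child.
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d_left = sum(
            1 for ti, e, l in zip(times, events, in_left) if e and l and ti == t
        )
        if d == 0:
            continue  # a time point without events contributes nothing
        numer += d_left - at_risk_left * d / at_risk
        if at_risk > 1:
            frac = at_risk_left / at_risk
            var += frac * (1.0 - frac) * (at_risk - d) / (at_risk - 1) * d
    return numer, var


# A node in which every sample is censored: no split can be scored.
print(logrank_parts([1, 2, 3, 4], [False] * 4, [True, True, False, False]))
# → (0.0, 0.0)
```
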

@sebp
Owner

sebp commented Jan 5, 2021

That's a very good point. In this case, it seems to be more complicated, because in sklearn's code the criterion does not determine whether a split is valid or not, but the TreeBuilder and Splitter do. One could misuse criterion.weighted_n_left, criterion.weighted_n_right and criterion.weighted_n_node_samples to only refer to uncensored samples, which would make the min_weight_fraction_leaf parameter similar to d_0.
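As a rough sketch of that idea: sklearn's splitter skips a candidate split when the weighted sample count of either child falls below `min_weight_leaf`, which is derived from `min_weight_fraction_leaf` times a total weight. If the weighted counts referred only to uncensored samples, the same comparison would act like d_0. The helper names below are hypothetical, and which total weight the fraction gets multiplied by is an assumption that depends on how the counts are wired up.

```python
def min_uncensored_weight(min_weight_fraction_leaf, total_weight):
    """Threshold playing the role of d_0 (total_weight is an assumption)."""
    return min_weight_fraction_leaf * total_weight


def split_allowed(uncensored_left, uncensored_right, min_weight_leaf):
    """Mimic the splitter check: reject if either child is below the threshold."""
    return (
        uncensored_left >= min_weight_leaf
        and uncensored_right >= min_weight_leaf
    )


threshold = min_uncensored_weight(0.25, 8)  # 2.0 uncensored samples per leaf
print(split_allowed(2, 5, threshold))  # True
print(split_allowed(1, 6, threshold))  # False
```

Note that a threshold of 0.0 rejects nothing, so a child with zero uncensored samples would still pass; the fraction has to be strictly positive for this to enforce d_0 > 0.
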

@mastervii
Author

Good idea. So I could create another variable, say weighted_n_node_uncensored_samples, to keep track of the uncensored samples, together with weighted_uncensored_left and weighted_uncensored_right, and update them in update(). Then, if weighted_uncensored_left == 0.0 or weighted_uncensored_right == 0.0, proxy_impurity_improvement() would return -INFINITY.
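A minimal pure-Python sketch of this bookkeeping might look as follows. It is only an illustration, not the actual Cython criterion: the class name `UncensoredTracker` is hypothetical, samples are assumed to be already sorted by the feature being split on, `update()` only moves the split position forward, and the real improvement value is passed in from outside.

```python
import math


class UncensoredTracker:
    """Track weighted uncensored counts on each side of a split position."""

    def __init__(self, events, weights=None):
        self.events = list(events)  # True = uncensored, in feature-sorted order
        self.weights = list(weights) if weights is not None else [1.0] * len(self.events)
        self.weighted_n_node_uncensored_samples = sum(
            w for e, w in zip(self.events, self.weights) if e
        )
        self.weighted_uncensored_left = 0.0
        self.weighted_uncensored_right = self.weighted_n_node_uncensored_samples
        self.pos = 0

    def update(self, new_pos):
        """Move samples [pos, new_pos) from the right child to the left child."""
        for i in range(self.pos, new_pos):
            if self.events[i]:
                self.weighted_uncensored_left += self.weights[i]
        self.weighted_uncensored_right = (
            self.weighted_n_node_uncensored_samples - self.weighted_uncensored_left
        )
        self.pos = new_pos

    def proxy_impurity_improvement(self, raw_improvement):
        """Veto splits that leave a child without any uncensored sample."""
        if self.weighted_uncensored_left == 0.0 or self.weighted_uncensored_right == 0.0:
            return -math.inf
        return raw_improvement


tracker = UncensoredTracker([False, True, False, True])
tracker.update(1)  # left child holds one censored sample only
print(tracker.proxy_impurity_improvement(3.5))  # -inf
tracker.update(2)  # left child now holds one uncensored sample
print(tracker.proxy_impurity_improvement(3.5))  # 3.5
```

Replacing the `== 0.0` comparisons with `< d_0` would turn the same check into a configurable minimum number of events per leaf.
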

@sebp sebp reopened this Jan 27, 2021