
Feature Request: add timeout parameter to the .fit() method #10684

Open
fingoldo opened this issue Aug 8, 2024 · 6 comments

Comments

@fingoldo

fingoldo commented Aug 8, 2024

Adding a timeout parameter to the .fit() method, which would force the library to return the best solution found so far once the given number of seconds since the start of training has elapsed, would let users satisfy training SLAs when they have only a limited time budget to finish a model's training. It would also make fair comparison of different hyperparameter configurations possible.

Reaching the timeout should have the same effect as reaching the maximum number of iterations, perhaps with an additional warning and/or an attribute set so that the reason the training job finished is clear to the end user.

@RAMitchell
Member

Can you achieve this with a custom callback?

@fingoldo
Author

fingoldo commented Aug 8, 2024

I did not realize it could be used to solve this problem. If, while using early stopping, I return True from my custom callback to stop training, will the best iteration be set correctly by xgboost, or will there be some loss of training progress?

@jameslamb
Contributor

> Can you achieve this with a custom callback?

Just to connect these 2 conversations... that is what I suggested in the feature request opened in LightGBM at the same time: microsoft/LightGBM#6596 (comment)

@fingoldo
Author

Right, it seemed very natural to me to use a direct timeout instead of (or along with) n_estimators, and ideally I would like a universal parameter for this (similar to n_estimators) across the major gradient boosting libraries. In most cases, I'd say the exact maximum number of trees is not important to the user; it's the maximum time spent that actually matters. And some hyperparameter combinations can lead to vastly different runtimes even with the same n_estimators. A timeout parameter would solve this problem.

@fingoldo
Author

Last but not least, imagine that aliens have attacked the Earth and we only have one minute to compute the trajectories of their missiles with ML. If this feature request is approved, the responsible person just sets timeout=60, we intervene, and we survive.

