Replicate result of cv splits with the solution of a fit #791
Comments
I'm not sure what you are trying to do. Could you please provide more context? Are you trying to call AutoML 5 times, once per fold, and compare the result with AutoML running on all folds?
Yes. An equivalent of:
Using the solution code from best_tpot_pipeline.py:
Here, I expect final_metric to be equal to the value reported by AutoML (I understand that my example uses TPOT, sorry).
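A minimal sketch of the per-fold check described above, in plain Python so it runs without TPOT or mljar-supervised installed. It assumes cv_indices is a list of (train_idx, valid_idx) pairs. The NearestCentroid model, the roc_auc helper (a stand-in for sklearn's roc_auc_score), and the toy data are hypothetical placeholders for the real pipeline. The key detail is that make_model() builds a new, unfitted model inside the loop, so no fold's validation rows ever reach training.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """AUC = chance a random positive outscores a random negative (ties count 0.5).
    Plain-Python stand-in for sklearn.metrics.roc_auc_score."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("each validation fold needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class NearestCentroid:
    """Hypothetical 1-D model standing in for the real pipeline."""

    def fit(self, X, y):
        self.m0 = mean(x for x, t in zip(X, y) if t == 0)
        self.m1 = mean(x for x, t in zip(X, y) if t == 1)
        return self

    def predict_score(self, X):
        # Higher score = closer to the class-1 centroid than to the class-0 one.
        return [abs(x - self.m0) - abs(x - self.m1) for x in X]


def manual_cv(make_model, X, y, cv_indices):
    """Refit a brand-new model on each fold's training rows, score its validation rows."""
    fold_aucs = []
    for train_idx, valid_idx in cv_indices:
        model = make_model()  # fresh and unfitted: nothing carries over between folds
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = model.predict_score([X[i] for i in valid_idx])
        fold_aucs.append(roc_auc([y[i] for i in valid_idx], scores))
    return fold_aucs, mean(fold_aucs)


# Toy data: 10 rows, 5 interleaved folds of 2 validation rows each.
X = list(range(10))
y = [0] * 5 + [1] * 5
cv_indices = [
    ([i for i in X if i % 5 != k], [i for i in X if i % 5 == k]) for k in range(5)
]
fold_aucs, final_metric = manual_cv(NearestCentroid, X, y, cv_indices)
print(fold_aucs, final_metric)
```

With a real pipeline, make_model would be whatever rebuilds the pipeline from scratch (for example, re-running the steps from best_tpot_pipeline.py), not an object that has already been fitted.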
Hi @bsaldivaremc2, it should be possible to extract each step from AutoML and run the training for each fold manually. We don't have a feature that automatically extracts the pipeline steps for a selected model, so you need to debug the selected model to get each step of the pipeline.
Hi, thanks for the reply. Just to be sure I am communicating my doubt correctly, this is what I want to do:
When I run this, I always get a mean_auc higher than the one found during fit, as if the model already knew all folds. I suspect that in the loop at the end I keep fitting the same model on new data progressively, so it ends up learning all folds. Could you please point out whether I am doing this wrong?
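The suspicion about the loop can be checked with a toy model. This sketch assumes a model whose fit() adds rows to what it already knows instead of replacing them (similar in spirit to warm-start or incremental training). AccumulatingNN, cv_accuracy, and the alternating-label data are hypothetical, and accuracy is used instead of AUC to keep it short. Reusing one instance across folds lets later folds memorise rows that are validation rows in their own fold.

```python
from statistics import mean


class AccumulatingNN:
    """Hypothetical 1-nearest-neighbour model whose fit() ADDS rows instead of
    replacing them, mimicking warm-start / incremental training."""

    def __init__(self):
        self.memory = []  # (x, label) pairs from every fit() call so far

    def fit(self, X, y):
        self.memory.extend(zip(X, y))
        return self

    def predict(self, X):
        # Label of the closest remembered point (distance 0 if it was memorised).
        return [min(self.memory, key=lambda m: abs(m[0] - x))[1] for x in X]


def cv_accuracy(X, y, cv_indices, reuse_model):
    shared = AccumulatingNN()
    accs = []
    for train_idx, valid_idx in cv_indices:
        model = shared if reuse_model else AccumulatingNN()
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in valid_idx])
        accs.append(sum(p == y[i] for p, i in zip(preds, valid_idx)) / len(valid_idx))
    return accs


X = list(range(20))
y = [i % 2 for i in X]  # alternating labels: each row's neighbours have the other class
cv_indices = [
    ([i for i in X if i % 5 != k], [i for i in X if i % 5 == k]) for k in range(5)
]

fresh = cv_accuracy(X, y, cv_indices, reuse_model=False)
reused = cv_accuracy(X, y, cv_indices, reuse_model=True)
print(fresh, mean(fresh))    # each fold only knows its own training rows
print(reused, mean(reused))  # later folds already memorised their validation rows
```

On this deliberately adversarial data, fresh models score 0.0 on every fold, while the reused instance scores 1.0 on every fold after the first, because those rows were already in the first fold's training set.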
@bsaldivaremc2 when you run AutoML for the second time, you have a smaller dataset... Do you want to have a quick chat on Google Meet? Send me an email at [email protected]
Best regards.
I have read and tested the suggestions in these previous issues:
Fit best model on new data in Optuna mode
Saving mljar automl model for future use
I want to manually replicate the results found with the normal fit (Compete, Optuna, or another mode).
For instance, I have already run fit in Optuna mode with some custom cv_indices, and the results were stored.
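For reference, a sketch of how five custom folds might be built and sanity-checked before comparing scores. It assumes the custom validation format is a list of (train_idx, valid_idx) pairs of row positions, the same shape sklearn splitters produce; make_cv_indices and check_cv are hypothetical helpers. Each training set holds roughly 4/5 of the rows, which is why a per-fold model is trained on a smaller dataset than a fit on all rows.

```python
def make_cv_indices(n_rows, n_folds=5):
    """Split row positions 0..n_rows-1 into n_folds contiguous (train_idx, valid_idx) pairs."""
    # The first n_rows % n_folds folds get one extra row.
    fold_sizes = [n_rows // n_folds + (1 if f < n_rows % n_folds else 0) for f in range(n_folds)]
    cv_indices, start = [], 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if not (start <= i < start + size)]
        cv_indices.append((train_idx, valid_idx))
        start += size
    return cv_indices


def check_cv(cv_indices, n_rows):
    """Each row should be validated exactly once and never trained on in its own fold."""
    seen = []
    for train_idx, valid_idx in cv_indices:
        assert not set(train_idx) & set(valid_idx), "train/valid overlap"
        assert sorted(train_idx + valid_idx) == list(range(n_rows)), "rows missing"
        seen.extend(valid_idx)
    assert sorted(seen) == list(range(n_rows)), "rows not validated exactly once"


cv_indices = make_cv_indices(12, 5)
check_cv(cv_indices, 12)
print([len(v) for _, v in cv_indices])  # validation fold sizes
```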
cv_indices has 5 elements, so I want to do:
This print should be similar to the results reported during the fit:
However, the result I get is higher, as if the model had already seen the validation data (as if it were already part of the "training set").
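One likely source of the optimistic score is scoring each validation fold with the final fitted model instead of the fold's own model. This sketch assumes, as mljar-supervised appears to do with k-fold validation, that the stored model averages the predictions of its per-fold learners. fit_1nn, roc_auc, and the toy data are hypothetical. For a row in fold k, four of the five learners trained on that row, so the averaged prediction leaks; only the out-of-fold score reproduces what cross-validation measured.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """Plain-Python AUC (ties count 0.5); stand-in for sklearn.metrics.roc_auc_score."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_1nn(X, y):
    """Hypothetical fold learner: 1-nearest neighbour over its own training rows."""
    memory = list(zip(X, y))
    return lambda x: min(memory, key=lambda m: abs(m[0] - x))[1]


X = list(range(20))
y = [i % 2 for i in X]  # alternating labels: each row's neighbours have the other class
cv_indices = [
    ([i for i in X if i % 5 != k], [i for i in X if i % 5 == k]) for k in range(5)
]
# One learner per fold, as k-fold training would produce.
learners = [fit_1nn([X[i] for i in tr], [y[i] for i in tr]) for tr, _ in cv_indices]

oof_auc, ensemble_auc = [], []
for k, (_, valid_idx) in enumerate(cv_indices):
    y_valid = [y[i] for i in valid_idx]
    # Out-of-fold: only learner k, which never trained on these rows.
    oof_auc.append(roc_auc(y_valid, [learners[k](X[i]) for i in valid_idx]))
    # Final model: average of all learners; 4 of the 5 trained on these rows.
    averaged = [mean(f(X[i]) for f in learners) for i in valid_idx]
    ensemble_auc.append(roc_auc(y_valid, averaged))

print("out-of-fold:", oof_auc, mean(oof_auc))
print("final model on the same folds:", ensemble_auc, mean(ensemble_auc))
```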
I appreciate your help.