automatic detection of multioutput datasets #1001
base: development
Neat! Thank you for this PR. I think it should work for detecting multi-output datasets. However, I checked this link, and I think not all regressors in our default config support multi-output regression. We need a workaround for this issue.
There are several ways to go about this:
Thank you for your ideas here. I prefer the 3rd one, but I think a better solution based on it is to automatically use
While working on it, I think I spotted a bug: Line 215 in aea42a5
This should use dep_import_str as the key and not import_str, right? It seems this also changes some tests.
If I run the failing tests manually, they pass. It has to be some unsuitable combination caused by the random seed. I'm not sure how to debug this.
Yes, that is a bug. Hmm, I thought I fixed it a while ago, but the fix might not have been merged into the master/dev branch.
@@ -501,6 +500,35 @@ def _fit_init(self):
        self._last_optimized_pareto_front_n_gens = 0
        self._setup_config(self.config_dict)

        if multi_output_target:
Modifying _config_dict may not work when the user supplies a customized configuration instead of the default one. So I think a practical way is to modify the _compile_to_sklearn function (here). If multi_output_target is True, then sklearn_pipeline=MultiOutputClassifier(estimator=sklearn_pipeline) or sklearn_pipeline=MultiOutputRegressor(estimator=sklearn_pipeline). I think it may be a more general solution for multi-output datasets.
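A minimal sketch of the wrapping idea suggested in this comment, using only scikit-learn. The helper name wrap_for_multi_output and the toy pipeline are assumptions made for illustration; this is not TPOT's _compile_to_sklearn, and SVR merely stands in for any estimator that accepts only a single-output target.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def wrap_for_multi_output(sklearn_pipeline, multi_output_target):
    """Hypothetical helper: wrap the pipeline only for multi-output targets."""
    if multi_output_target:
        # One clone of the whole pipeline is fitted per target column.
        return MultiOutputRegressor(estimator=sklearn_pipeline)
    return sklearn_pipeline


rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20, 2)  # two target columns

# SVR on its own cannot be fitted on a 2-D target, so the wrapper is needed.
pipeline = make_pipeline(StandardScaler(), SVR())
model = wrap_for_multi_output(pipeline, multi_output_target=True)
model.fit(X, y)
predictions = model.predict(X)
```

For classification targets, MultiOutputClassifier from the same module would be used in the same way.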
This seems better. To be honest, I didn't find a good place to put my code and therefore only settled on the _fit_init function.
Okay, I looked into it, but the code would be a mess. Several functions would have to take multi_output_target as a new argument (most of them in export_utils.py), since they don't have access to the data or the TPOT object.
IMO _fit_init seems to be the least intrusive place to include the checks.
Thank you for looking into it. You are right. I think TPOT's exported code should also include MultiOutputRegressor/MultiOutputClassifier, which would require changing a lot of code in TPOT. I will look into it when I get some time next week.
any updates?
@jhmenke I am sorry for overlooking this. I have not had a chance to look into this issue lately due to my busy schedule. I agree that TPOT needs some major changes to include MultiOutputRegressor/MultiOutputClassifier. I may get some time in March to add those changes. You are welcome to push any changes in the meantime.
Could we use my PR as a temporary fix until you have time to thoroughly refactor the code? I can prepare an update with the current development branch.
Sorry for the delay. I think we can use it as a temporary solution in a minor release.
Okay, I merged the current master/development into this. It should be good to go as an interim solution then.
Fix a bug in the PR
Sorry, my mistake! I changed it back.
I think we cannot add this support for now since it needs more changes.
if model in single_output_classifiers:
    if 'sklearn.multioutput.MultiOutputClassifier' not in self._config_dict.keys():
        self._config_dict['sklearn.multioutput.MultiOutputClassifier'] = {"estimator": {}}
    self._config_dict['sklearn.multioutput.MultiOutputClassifier']['estimator'][model] = self._config_dict[model]
There is only one sklearn.multioutput.MultiOutputClassifier entry in self._config_dict, and its estimator keeps being updated until the last model in single_output_classifiers, so the remaining models should be removed.
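To make this review comment concrete, here is a toy reproduction of the loop on a plain dictionary. The config entries, hyperparameter values, and the contents of single_output_classifiers are illustrative assumptions, not TPOT's default configuration; the sketch only shows what the dictionary looks like after the loop and says nothing about how TPOT interprets several keys under estimator.

```python
# Toy stand-in for a TPOT-style config dict; all entries are illustrative only.
config_dict = {
    'sklearn.svm.LinearSVC': {'C': [0.1, 1.0]},
    'sklearn.linear_model.LogisticRegression': {'C': [1.0, 10.0]},
    'sklearn.tree.DecisionTreeClassifier': {'max_depth': [2, 4]},
}
# Assumed list of classifiers that only accept a single-output target.
single_output_classifiers = [
    'sklearn.svm.LinearSVC',
    'sklearn.linear_model.LogisticRegression',
]
wrapper = 'sklearn.multioutput.MultiOutputClassifier'

# Same logic as the diff, applied to the toy dict. Iterate over a copy of the
# keys because the loop adds the wrapper key while running.
for model in list(config_dict):
    if model in single_output_classifiers:
        if wrapper not in config_dict:
            config_dict[wrapper] = {'estimator': {}}
        config_dict[wrapper]['estimator'][model] = config_dict[model]

# A dict can hold the wrapper key only once, so every single-output model
# ends up under that one shared 'estimator' entry.
wrapper_entries = [key for key in config_dict if key == wrapper]
nested = sorted(config_dict[wrapper]['estimator'])
# The single-output models are also still present at the top level.
still_top_level = [m for m in single_output_classifiers if m in config_dict]
```

After the loop there is exactly one wrapper entry, and the original single-output entries have not been removed from the top level of the dictionary.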
Is this the closest PR to multi-output support? Any plans to finish it?
Sorry, no time for this right now. Feel free to work with my branch. I think the last todo is cloning the MultiOutputClassifier for every estimator that only supports single outputs, as per weixuanfu's review.
Would love to see multioutput regression/classification working with tpot! Keep going! Haha
What does this PR do?
Slight variation of #903 depending on the shape of the target array.
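A shape-based check of the kind described here could look like the following sketch. The function name is_multi_output_target is hypothetical, and the rule (a 2-D target with more than one column) is an assumption about what "depending on the shape of the target array" means, not the PR's actual code.

```python
import numpy as np


def is_multi_output_target(target):
    """Hypothetical check: True if the target has more than one output column."""
    target = np.asarray(target)
    # A 1-D array or a single-column 2-D array counts as single-output.
    return target.ndim > 1 and target.shape[1] > 1


single = is_multi_output_target(np.zeros(10))
column = is_multi_output_target(np.zeros((10, 1)))
multi = is_multi_output_target(np.zeros((10, 3)))
```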
What are the relevant issues?
#971