Fully machine-readable datasets metadata #10
Can you please provide an example of a dataset that lacks some of the information you need, and tell us what information you would add? Where possible, we have tried to follow the API of sklearn.datasets, with some additions where the repository provided more information.
For example, this one: https://github.com/daviddiazvico/scikit-datasets/blob/master/skdatasets/ucr.py
It is described in more detail in openml/OpenML#876.
Moved from #9
I mean that I should be able to just call an API to get the datasets on a topic, and their descriptions should contain enough information for domain-specific feature engineering.
Then one call, without any manual tuning, should do the automatic feature engineering and ML setup, another call should tune the hyperparameters, and another should train a model. (I have a framework that does these things, relying on a machine-readable specification of features; it is very unfinished, but for now it is enough for my needs.)
In other words, if I have developed ML software for some task, I may want to test it and existing software against as many datasets as possible where this kind of software is applicable, in order to compare them. To do this I need as many datasets as possible with metadata about their columns. Detecting whether a model can be applied to a dataset is simple: if the dataset contains a column of the type learned by that model, the model can be applied to it.
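To make the applicability rule concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: the `Column` and `DatasetMeta` classes, the `kind` tags, and the toy dataset names are made up for illustration and are not part of scikit-datasets, sklearn.datasets, or OpenML.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Column:
    """Machine-readable description of one column (hypothetical schema)."""
    name: str
    kind: str  # assumed semantic type tag, e.g. "numeric", "categorical", "time_series"


@dataclass(frozen=True)
class DatasetMeta:
    """Metadata for one dataset: its name and its described columns."""
    name: str
    columns: Tuple[Column, ...]


def applicable_datasets(required_kind: str, catalog: List[DatasetMeta]) -> List[str]:
    """Return names of datasets that contain at least one column of the given kind."""
    return [
        meta.name
        for meta in catalog
        if any(col.kind == required_kind for col in meta.columns)
    ]


# Toy catalog with invented entries, only to show the lookup.
catalog = [
    DatasetMeta("toy_ecg", (Column("signal", "time_series"), Column("label", "categorical"))),
    DatasetMeta("toy_tabular", (Column("age", "numeric"), Column("label", "categorical"))),
]

# A model that learns time series columns would only be matched to "toy_ecg".
print(applicable_datasets("time_series", catalog))
```

With metadata like this attached to every dataset, selecting all datasets a given model can run on becomes a single filter instead of manual inspection.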