Add benchmark/dataset for classical ML algorithms #188

Open
ksangeek opened this issue Mar 9, 2019 · 3 comments
Labels
Backlog An issue to be discussed in a future Working Group, but not the immediate next one.

Comments

@ksangeek

ksangeek commented Mar 9, 2019

I don't see any datasets in MLPerf that can be solved with classical machine learning algorithms (e.g., linear or logistic regression, decision trees, random forests).
Some examples of datasets I could reference here are:

  1. https://www.kaggle.com/c/criteo-display-ad-challenge/data for binary classification.
  2. https://www.kaggle.com/c/house-prices-advanced-regression-techniques for regression.

These would be useful in real-world scenarios where the interpretability of predictions is of utmost importance. Generalized linear models are widely used in practice for this very reason!
I did not find any reference stating that MLPerf is only for deep learning problems, so I think this kind of benchmark/dataset should be added to broaden the reach of this benchmark suite.
Thanks!

@psyhtest

psyhtest commented Mar 9, 2019

I totally agree that ML != DL, but do you have any data on how widely these models are used in production?

@ksangeek
Author

ksangeek commented Mar 9, 2019

Well, I think they target different problem spaces (though the two sometimes overlap). I can't say much with confidence about actual usage in production, but the 2018 Kaggle survey still shows data science practitioners giving sizable importance to sklearn, random forests, and xgboost. There are also promising new players like snapML and cuML that continue to invest in the classical machine learning space.

@TheKanter
Contributor

TheKanter commented Mar 9, 2019 via email

@petermattson petermattson added the Backlog An issue to be discussed in a future Working Group, but not the immediate next one. label May 22, 2020
4 participants