A Simple but Complete Example? #89

Open
dakami opened this issue Aug 10, 2016 · 4 comments

Comments

@dakami

dakami commented Aug 10, 2016

Hi! I really like your "Machine Learning for humans" focus. Any chance you might provide an example that:

a) Reads from a CSV file, with the column labels (header) as the first row
b) Builds a classification or a regression model
c) Reports whether the model works (split the CSV, do folding, etc)
d) Saves the model to disk
e) Applies that model to another CSV file with an identical schema, adding a column for predictions or replacing the existing column?
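For reference, the five steps can be sketched with pandas and scikit-learn alone. This is a minimal sketch, not REP code: the target column name `label`, the file names, and the synthetic data are hypothetical stand-ins, and scikit-learn's `cross_val_score` plus a held-out split take the place of REP's folding tools.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

LABEL = "label"  # hypothetical name of the target column


def make_demo_csv(path, n, seed):
    """Write a synthetic CSV whose first row holds the column names."""
    rng = np.random.RandomState(seed)
    df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
    df[LABEL] = (df["x1"] + df["x2"] > 0).astype(int)
    df.to_csv(path, index=False)


workdir = tempfile.mkdtemp()
train_path = os.path.join(workdir, "train.csv")
new_path = os.path.join(workdir, "new.csv")
model_path = os.path.join(workdir, "model.pkl")
make_demo_csv(train_path, 400, seed=0)  # stand-ins for the user's real files
make_demo_csv(new_path, 100, seed=1)

# a) Read the CSV; pandas takes column names from the header row by default.
data = pd.read_csv(train_path)
features = [c for c in data.columns if c != LABEL]
X, y = data[features], data[LABEL]

# b) + c) Keep a held-out test split, cross-validate on the rest, then fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model = LogisticRegression()
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"CV ROC AUC {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}, test ROC AUC {test_auc:.3f}")

# d) Save the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# e) Reload it and add a prediction column to another CSV with the same schema.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)
new_data = pd.read_csv(new_path)
new_data["prediction"] = loaded.predict(new_data[features])
new_data.to_csv(os.path.join(workdir, "new_with_predictions.csv"), index=False)
```

To replace an existing column instead of adding one, assign the predictions to that column's name.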

@jonas-eschle

jonas-eschle commented Aug 10, 2016

I think this may be too off-topic. It does not look very difficult, and it is mostly not focused on what the REP repo provides. Anyway, if you want to do it, why don't you just create it? I would really appreciate it if the developers worked on implementations rather than on simple examples ;)
a) Use pandas.read_csv()
b) I think there are enough examples in this repo to find out how that works:
http://nbviewer.jupyter.org/github/yandex/rep/blob/master/howto/01-howto-Classifiers.ipynb
c) Well... using the FoldingClassifier and the ClassificationReport does this. How-to:
http://nbviewer.jupyter.org/github/yandex/rep/blob/master/howto/04-howto-folding.ipynb
d) There is an example for the new CacheClassifier in the docs. To save it reliably, use pickle.
e) Same as c) for the classification part; have a look at pandas for how to add columns.
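Point c) relies on REP's FoldingClassifier. As we understand it, folding trains one copy of the base estimator per fold and lets each copy predict only on the fold it did not see, so every training event gets a prediction from a model that never saw it. The sketch below reproduces that idea with plain scikit-learn; the helper names `out_of_fold_proba` and `predict_new` are hypothetical, and averaging the fold models for new data is an assumption, not a description of REP's exact voting rule.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold


def out_of_fold_proba(estimator, X, y, n_folds=2, random_state=0):
    """Train one clone per fold; each clone scores only the fold it did not see.

    Returns the out-of-fold class-1 probabilities and the fitted fold models.
    """
    oof = np.zeros(len(y))
    models = []
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    for train_idx, test_idx in folds.split(X):
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        oof[test_idx] = model.predict_proba(X[test_idx])[:, 1]
        models.append(model)
    return oof, models


def predict_new(models, X_new):
    """For unseen data, average the class-1 probabilities of all fold models."""
    return np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)


X, y = make_classification(n_samples=600, n_features=5, random_state=0)
oof, models = out_of_fold_proba(LogisticRegression(), X, y, n_folds=3)
print("out-of-fold ROC AUC:", roc_auc_score(y, oof))
```

scikit-learn's `cross_val_predict(estimator, X, y, method="predict_proba")` covers the out-of-fold part in one call; the manual loop keeps the fitted fold models around so they can be applied to a new file as in e).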

What do you think?

Good luck with the challenge you are working on right now ;D

@dakami
Author

dakami commented Aug 11, 2016

Hmm. I'm curious, what is the topic, and what is it that REP provides?

I've been creating a significant amount of this code. Pickle works in some projects, sometimes, in some contexts. Other times there actually isn't a way to serialize the model at all, which is sort of funny.
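One way to check whether pickle actually works for a given model is to round-trip it and compare predictions. A minimal sketch; `survives_pickle` and `HandleHoldingModel` are hypothetical, and the lock merely stands in for the kind of native handle that makes some wrapped models unpicklable.

```python
import pickle
import threading

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def survives_pickle(model, X):
    """Return True if `model` round-trips through pickle with identical predictions."""
    try:
        restored = pickle.loads(pickle.dumps(model))
    except (pickle.PicklingError, TypeError, AttributeError) as err:
        print(f"cannot pickle {type(model).__name__}: {err}")
        return False
    return np.array_equal(model.predict_proba(X), restored.predict_proba(X))


class HandleHoldingModel:
    """Hypothetical wrapper holding a native resource (a lock here), as some
    bindings to external libraries do; pickle refuses such attributes."""

    def __init__(self, model):
        self.model = model
        self._handle = threading.Lock()

    def predict_proba(self, X):
        return self.model.predict_proba(X)


X, y = make_classification(n_samples=200, random_state=0)
plain = LogisticRegression().fit(X, y)
print(survives_pickle(plain, X))                      # True
print(survives_pickle(HandleHoldingModel(plain), X))  # prints the error, then False
```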

@arogozhnikov
Contributor

arogozhnikov commented Aug 11, 2016

@dakami

It's hard to provide a minimal complete example, as many people approach this differently.
Physicists are interested in .root files (and never use CSV); others use HDF5.

I'll try to fit something like a minimal pipeline into the next rewrite of the example notebooks, but no guarantees so far.

As for pickle – REP contains wrappers for several libraries, the wrappers follow the same scikit-learn-like interface, and we make sure (among other things) that pickle works for them.
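A sketch of what a pickle-safe, scikit-learn-style wrapper can look like. This is not REP's implementation: `PicklableWrapper` is hypothetical, it simply delegates to `LogisticRegression`, and the lock again stands in for a non-picklable backend resource that is dropped in `__getstate__` and rebuilt in `__setstate__`.

```python
import pickle
import threading

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


class PicklableWrapper(ClassifierMixin, BaseEstimator):
    """Hypothetical scikit-learn-style wrapper around a backend model."""

    def __init__(self, C=1.0):
        self.C = C  # constructor arguments only, as scikit-learn expects

    def fit(self, X, y):
        self.backend_ = LogisticRegression(C=self.C).fit(X, y)
        self.classes_ = self.backend_.classes_
        self._lock = threading.Lock()  # stand-in for a native handle
        return self

    def predict_proba(self, X):
        with self._lock:
            return self.backend_.predict_proba(X)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

    def __getstate__(self):
        # Save everything except the non-picklable handle.
        state = self.__dict__.copy()
        state.pop("_lock", None)
        return state

    def __setstate__(self, state):
        # Restore attributes and recreate the handle after unpickling.
        self.__dict__.update(state)
        self._lock = threading.Lock()


X, y = make_classification(n_samples=200, random_state=0)
clf = PicklableWrapper(C=0.5).fit(X, y)
restored = pickle.loads(pickle.dumps(clf))
print(np.array_equal(clf.predict(X), restored.predict(X)))  # True
```

Because the class derives from `BaseEstimator`, it also works with `sklearn.base.clone` and with meta-estimators that rely on `get_params`.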

Additionally, there are meta-estimators to compose models and simplify the training process, plus some other goodies (check out the documentation for details).

@anaderi
Contributor

anaderi commented Aug 14, 2016

@dakami, I'd really appreciate it if you could add a how-to, similar to this howto example, with all the steps you've mentioned.
If you get stuck, let us know.


4 participants