-
Notifications
You must be signed in to change notification settings - Fork 516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates and bugfixes to the workshop materials #75
Conversation
Otherwise the charts don’t show up correctly for some users.
Per the scikit-learn 0.18 release
`pydot.graph_from_dot_data` returns a list, so we need to get the first (and only) Dot item in that list.
Wow, thanks a lot for all the bug fixes. Also great that you added the missing exercises! That will come in handy for whoever needs to prepare the SciPy 2017 workshop :). We will just have to bit careful how we merge since I just saw that it replaced SciPy with "Webstep" at various |
"Then, we built a vocabulary of all tokens (lowercased words) that appear in our whole dataset. This is usually a very large vocabulary.\n", | ||
"Finally, looking at our single sample, we could how often each word in the vocabulary appears.\n", | ||
"We represent our string by a vector, where each entry is how often a given word in the vocabular appears in the string.\n", | ||
"Then, we build a vocabulary of all tokens (lowercase words) that appear in our whole dataset. This is usually a very large vocabulary.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with "lowercase". That would imply discarding capitalized words. Instead, what we do is lower-case all words.
@@ -29,6 +29,13 @@ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# SciPy 2016 Scikit-learn Tutorial" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added a missing title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Points are classified in a one-vs-rest fashion (aka one-vs-all), where we assign a test point to the class whose model has the highest confidence (in the SVM case, smallest distance to the separating hyperplane) for the test point." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highest distance to the separating hyperplane.
@@ -18,6 +18,13 @@ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"# SciPy 2016 Scikit-learn Tutorial" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you add that to some but not all notebooks? Or was it already there in the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was already in there for most of them. I just made it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I think we didn't cover all of the notebooks, and thus some of them may have had missing titles ... (or we simply forgot)
"matplotlib 1.5.1\n" | ||
] | ||
"collapsed": false, | ||
"nbpresent": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used nbpresent. What does this metadata do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does nothing. I believe I was checking out the nbpresent functionality but didn't actually use it. Still, the ipynb added this metadata. Not a big deal, I don't think.
# This is a weird way to get the indices but it works | ||
train_idx = None | ||
test_idx = None | ||
for train_idx, test_idx in sss: | ||
for train_idx, test_idx in sss.split(numeric_data, numeric_labels): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what's happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the hacky code that was already there. I just updated the API calls to sklearn 0.18.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but that wouldn't work on sklearn 0.17 anymore, right? We would have to add sklearn 0.18 to the requirements and check_env.ipynb then (in bold font) since it could confuse people otherwise. I think it would generally be a good idea to add travis tests for the notebooks to check if they all execute without error using the packages listed in the requirements. E.g., sth like jupyter nbconvert --to notebook --execute mynotebook.ipynb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm yeah but hacky code is no good ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an issue to fix it in the future.
Made a couple fixes per your suggestions. |
we should definitely get travis up for this. I just moved jobs yesterday and I'm pretty busy :-/ |
Ok, I'm merging this... thanks! |
Added an issue about Travis CI. |
Hi all, this PR contains several updates and bug fixes for the SciPy sklearn workshop. Sorry for the huge PR, but this was the easiest way to merge my fixes and updates to the workshop materials.
List of changes:
pandas
as a requirement for the workshop (it's used in some examples)model_selection
%matplotlib notebook
instead of%matplotlib inline
, which causes issues for some usersmax_depth
> 0