Updates and bugfixes to the workshop materials #75

rhiever · 2016-10-30T17:28:41Z

Hi all, this PR contains several updates and bug fixes for the SciPy sklearn workshop. Sorry for the huge PR, but this was the easiest way to merge my fixes and updates to the workshop materials.

List of changes:

Fixed issues: solutions to exercises #33, Add solution to 12 Case Study - SMS Spam Detection #45, Add solution to 17 In Depth - Linear Models #50, Add solution to 18 In Depth - Support Vector Machines #51, Add solution to 19 In Depth - Trees and Forests #52, Add solution to 20 Feature Selection #53, Add solution to 21 Unsupervised learning - Hierarchical and density-based clustering algorithms #54, Add solution to 22 Unsupervised learning - Non-linear dimensionality reduction #55
Added pandas as a requirement for the workshop (it's used in some examples)
Added an aside about scatterplot matrices when discussing the visualization of data. This is a very useful technique to know about for EDA.
Updated notebook code to be compatible with sklearn v0.18, most notably accommodating the move of many cross-validation modules to model_selection
Fixed notebook 17 where it was importing a helper plotting method from the wrong directory
Fixed notebook 17 where it was using %matplotlib notebook instead of %matplotlib inline, which causes issues for some users
Fixed a known bug in notebook 19 where the widget for the decision tree was not displaying the generated tree when the user set the max_depth > 0
Fixed notebook 20 exercise to properly generate the XOR data (+noise features)
Various typo and grammar fixes
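
For readers unfamiliar with the `model_selection` move mentioned in the list above, here is a minimal sketch of what the updated imports can look like. The dataset, the classifier, and the `ImportError` fallback are illustrative assumptions and are not taken from the workshop notebooks.

```python
# sklearn >= 0.18 exposes the cross-validation helpers in model_selection;
# earlier releases kept them in cross_validation.
try:
    from sklearn.model_selection import cross_val_score, train_test_split
except ImportError:  # assumption: an sklearn < 0.18 install
    from sklearn.cross_validation import cross_val_score, train_test_split

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Illustrative data only; the notebooks use their own datasets.
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X_train, y_train, cv=5)  # one score per fold
print(scores.mean())
```

The fallback import is only useful if the materials need to run on both old and new installs; pinning sklearn 0.18 in the requirements would make it unnecessary.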

rasbt · 2016-11-01T01:11:36Z

Wow, thanks a lot for all the bug fixes. Also great that you added the missing exercises! That will come in handy for whoever needs to prepare the SciPy 2017 workshop :). We will just have to be a bit careful how we merge, since I just saw that it replaced SciPy with "Webstep" at various points.

amueller · 2016-11-02T14:21:01Z

notebooks/11 Text Feature Extraction.ipynb

-    "Then, we built a vocabulary of all tokens (lowercased words) that appear in our whole dataset. This is usually a very large vocabulary.\n",
-    "Finally, looking at our single sample, we could how often each word in the vocabulary appears.\n",
-    "We represent our string by a vector, where each entry is how often a given word in the vocabular appears in the string.\n",
+    "Then, we build a vocabulary of all tokens (lowercase words) that appear in our whole dataset. This is usually a very large vocabulary.\n",


I disagree with "lowercase". That would imply discarding capitalized words. Instead, what we do is lower-case all words.

amueller · 2016-11-02T14:23:09Z

notebooks/17 In Depth - Linear Models.ipynb

@@ -29,6 +29,13 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
+    "# SciPy 2016 Scikit-learn Tutorial"


what's this doing here?

Just added a missing title.

amueller · 2016-11-02T14:24:25Z

notebooks/17 In Depth - Linear Models.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Points are classified in a one-vs-rest fashion (aka one-vs-all), where we assign a test point to the class whose model has the highest confidence (in the SVM case, smallest distance to the separating hyperplane) for the test point."


Highest distance to the separating hyperplane.
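
To illustrate the correction, here is a small sketch of one-vs-rest prediction with a linear SVM. It assumes `LinearSVC` on the iris data, neither of which is taken from the notebook. `decision_function` returns one signed score per class, proportional to the signed distance from that class's hyperplane, and the predicted class is the one with the largest score.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

# Illustrative data only; not the dataset used in notebook 17.
iris = load_iris()
X, y = iris.data, iris.target

# LinearSVC trains one-vs-rest models for multiclass problems by default.
clf = LinearSVC(random_state=0, max_iter=10000).fit(X, y)

# One column of signed scores per class, shape (n_samples, n_classes).
scores = clf.decision_function(X)

# Assign each point to the class whose model gives the highest score.
manual_pred = clf.classes_[np.argmax(scores, axis=1)]

print(np.array_equal(manual_pred, clf.predict(X)))
```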

amueller · 2016-11-02T14:26:09Z

notebooks/20 Feature Selection.ipynb

@@ -18,6 +18,13 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
+    "# SciPy 2016 Scikit-learn Tutorial"


did you add that to some but not all notebooks? Or was it already there in the rest?

It was already in there for most of them. I just made it consistent.

thanks, I think we didn't cover all of the notebooks, and thus some of them may have had missing titles ... (or we simply forgot)

amueller · 2016-11-02T14:26:53Z

notebooks/21 Unsupervised learning - Hierarchical and density-based clustering algorithms.ipynb

-      "matplotlib 1.5.1\n"
-     ]
+    "collapsed": false,
+    "nbpresent": {


what's this?

I haven't used nbpresent. What does this metadata do?

It does nothing. I believe I was checking out the nbpresent functionality but didn't actually use it. Still, the ipynb added this metadata. Not a big deal, I don't think.

amueller · 2016-11-02T14:30:25Z

notebooks/helpers.py

    # This is a weird way to get the indices but it works
    train_idx = None
    test_idx = None
-    for train_idx, test_idx in sss:
+    for train_idx, test_idx in sss.split(numeric_data, numeric_labels):


I don't understand what's happening here.

That's the hacky code that was already there. I just updated the API calls to sklearn 0.18.
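
For context, a minimal sketch of the sklearn 0.18 API that the diff switches to. The arrays here are made-up stand-ins for `numeric_data` and `numeric_labels`, and taking the single split with `next()` is one possible alternative to the loop, not what `helpers.py` does.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Stand-in data: 20 samples, two balanced classes (hypothetical values).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# In 0.18 the splitter is configured without the labels;
# the data and labels are passed to split() instead.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)

# split() returns a generator of (train_idx, test_idx) pairs;
# with n_splits=1 we only need the first one.
train_idx, test_idx = next(sss.split(X, y))

print(len(train_idx), len(test_idx))
```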

Hm, but that wouldn't work on sklearn 0.17 anymore, right? We would have to add sklearn 0.18 to the requirements and check_env.ipynb then (in bold font) since it could confuse people otherwise. I think it would generally be a good idea to add Travis tests for the notebooks to check whether they all execute without error using the packages listed in the requirements, e.g. something like `jupyter nbconvert --to notebook --execute mynotebook.ipynb`
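
A rough sketch of how such a check could be scripted, assuming `jupyter` is on the PATH. The helper names `nbconvert_command` and `notebook_runs` are hypothetical; only the command itself comes from the comment above.

```python
import subprocess


def nbconvert_command(notebook_path):
    """Build the nbconvert command that executes a notebook top to bottom."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", notebook_path]


def notebook_runs(notebook_path):
    """Return True if the notebook executes without error."""
    # nbconvert exits with a non-zero status when a cell raises.
    result = subprocess.run(nbconvert_command(notebook_path))
    return result.returncode == 0


# Example usage (not run here, since it needs a real notebook):
# ok = notebook_runs("mynotebook.ipynb")
print(" ".join(nbconvert_command("mynotebook.ipynb")))
```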

hm yeah but hacky code is no good ;-)

Added an issue to fix it in the future.

rhiever · 2016-11-02T18:00:07Z

Made a couple fixes per your suggestions.

amueller · 2016-11-02T18:24:57Z

we should definitely get travis up for this. I just moved jobs yesterday and I'm pretty busy :-/

amueller · 2016-11-02T18:26:21Z

Ok, I'm merging this... thanks!

rhiever · 2016-11-02T19:09:31Z

Added an issue about Travis CI.

rhiever added 30 commits October 27, 2016 11:32

Update headers

7140614

Update headers

fffd757

Update README.md

ab7ad81

Remove out-of-core learning section

3c5a643

CountVectorizer import fix

b571454

%matplotlib notebook --> inline

92ef21d

Otherwise the charts don’t show up correctly for some users.

Change all references from cross_validation to model_selection

d2dd526

Per the scikit-learn 0.18 release

Bug fix for decision tree interactive plot

9200785

`pydot.graph_from_dot_data` returns a list, so we need to get the first (and only) Dot item in that list.
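
A small sketch of the idea behind this fix. The helper `first_graph` is hypothetical and is written so it also tolerates a bare graph object, on the assumption that older pydot versions returned one; the actual notebook change may simply index the list.

```python
def first_graph(result):
    """Return the single graph from the value returned by graph_from_dot_data.

    Accepts either a list of graphs or a bare graph object
    (the latter is an assumption about older pydot releases).
    """
    if isinstance(result, (list, tuple)):
        return result[0]
    return result


# Example usage with pydot installed (not run here):
# import pydot
# graph = first_graph(pydot.graph_from_dot_data(dot_text))
# graph.write_png("tree.png")

# Demonstration with stand-in values instead of real Dot objects.
print(first_graph(["dot-object"]))
print(first_graph("dot-object"))
```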

Explain what the colors mean in the interactive decision tree plot

6ce5fc7

Update slides

863a7d5

Add AutoML slides

8f9f586

Add sklearn cheat sheet

c77eebe

Add AutoML notebook

6b9312a

Update README.md

911ab64

Notebook cleanup

0f26532

Clean up material for presentation

8530ffb

Notebook cleanup

b10db57

Notebook cleanup

df9413a

Notebook cleanup

123e8aa

Notebook cleanup

cee0109

Update requirements

a0864f6

Add exercise solutions for notebook 17

8d08609

Add notebook 18 exercise solution

0dd8e76

Add notebook 19 exercise solution

1ffbebc

Add notebook 20 exercise solution

c628751

Add notebook 21 exercise solution

fd50f14

Add notebook 22 exercise solutions

98b971c

Update README.md

72a2356

Update README.md

6a83824

Update slides

60070e0

rhiever added 7 commits October 30, 2016 17:56

Typo fix in notebook 6 solution

0548865

Add exercise solutions for notebook 12

e1aa894

Put out-of-core learning notebook back

c9013c4

Remove TPOT

ad4d051

Roll back README

b443f84

Rollback notebook titles to SciPy

bbdb278

Rollback TODO file

cac7a18

rhiever changed the title ~~Merge back~~ Updates and bugfixes to the workshop materials Oct 30, 2016

amueller reviewed Nov 2, 2016

rhiever added 2 commits November 2, 2016 13:57

Update 11 Text Feature Extraction.ipynb

3f1ed91

Update 17 In Depth - Linear Models.ipynb

ca6a232

amueller merged commit 4f3a830 into amueller:master Nov 2, 2016

rhiever deleted the merge-back branch November 2, 2016 19:08

rhiever mentioned this pull request Nov 2, 2016

Set up Travis CI to automatically test all code #82

Open
