Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates and bugfixes to the workshop materials #75

Merged
merged 39 commits into from
Nov 2, 2016

Conversation

rhiever
Copy link
Contributor

@rhiever rhiever commented Oct 30, 2016

Hi all, this PR contains several updates and bug fixes for the SciPy sklearn workshop. Sorry for the huge PR, but this was the easiest way to merge my fixes and updates to the workshop materials.

List of changes:

@rhiever rhiever changed the title Merge back Updates and bugfixes to the workshop materials Oct 30, 2016
@rasbt
Copy link
Collaborator

rasbt commented Nov 1, 2016

Wow, thanks a lot for all the bug fixes. Also great that you added the missing exercises! That will come in handy for whoever needs to prepare the SciPy 2017 workshop :). We will just have to bit careful how we merge since I just saw that it replaced SciPy with "Webstep" at various
verious points.

"Then, we built a vocabulary of all tokens (lowercased words) that appear in our whole dataset. This is usually a very large vocabulary.\n",
"Finally, looking at our single sample, we could how often each word in the vocabulary appears.\n",
"We represent our string by a vector, where each entry is how often a given word in the vocabular appears in the string.\n",
"Then, we build a vocabulary of all tokens (lowercase words) that appear in our whole dataset. This is usually a very large vocabulary.\n",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with "lowercase". That would imply discarding capitalized words. Instead, what we do is lower-case all words.

@@ -29,6 +29,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# SciPy 2016 Scikit-learn Tutorial"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this doing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a missing title.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

"cell_type": "markdown",
"metadata": {},
"source": [
"Points are classified in a one-vs-rest fashion (aka one-vs-all), where we assign a test point to the class whose model has the highest confidence (in the SVM case, smallest distance to the separating hyperplane) for the test point."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highest distance to the separating hyperplane.

@@ -18,6 +18,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# SciPy 2016 Scikit-learn Tutorial"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you add that to some but not all notebooks? Or was it already there in the rest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already in there for most of them. I just made it consistent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I think we didn't cover all of the notebooks, and thus some of them may have had missing titles ... (or we simply forgot)

"matplotlib 1.5.1\n"
]
"collapsed": false,
"nbpresent": {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used nbpresent. What does this metadata do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does nothing. I believe I was checking out the nbpresent functionality but didn't actually use it. Still, the ipynb added this metadata. Not a big deal, I don't think.

# This is a weird way to get the indices but it works
train_idx = None
test_idx = None
for train_idx, test_idx in sss:
for train_idx, test_idx in sss.split(numeric_data, numeric_labels):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what's happening here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the hacky code that was already there. I just updated the API calls to sklearn 0.18.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but that wouldn't work on sklearn 0.17 anymore, right? We would have to add sklearn 0.18 to the requirements and check_env.ipynb then (in bold font) since it could confuse people otherwise. I think it would generally be a good idea to add travis tests for the notebooks to check if they all execute without error using the packages listed in the requirements. E.g., sth like jupyter nbconvert --to notebook --execute mynotebook.ipynb

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm yeah but hacky code is no good ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an issue to fix it in the future.

@rhiever
Copy link
Contributor Author

rhiever commented Nov 2, 2016

Made a couple fixes per your suggestions.

@amueller
Copy link
Owner

amueller commented Nov 2, 2016

we should definitely get travis up for this. I just moved jobs yesterday and I'm pretty busy :-/

@amueller
Copy link
Owner

amueller commented Nov 2, 2016

Ok, I'm merging this... thanks!

@amueller amueller merged commit 4f3a830 into amueller:master Nov 2, 2016
@rhiever rhiever deleted the merge-back branch November 2, 2016 19:08
@rhiever
Copy link
Contributor Author

rhiever commented Nov 2, 2016

Added an issue about Travis CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants