Issues for nltk_guidelines: NLTK Methods - End #2

Open
rachelrakov opened this issue May 23, 2018 · 0 comments

  • In section "NLTK Methods with the NLTK Corpus", when importing matplotlib, mention what it is - define it.

  • In section "Searching for Words":

    • Typo: "What about Sense and Sensibility?" - add question mark
    • Define "phatic function", as compared to a semantic one; may not be known by all
  • In section "Positioning Words":

    • Define what a pipeline is
  • In section "Types vs Tokens":

    • Typo: "we would need a count of the 'lol's." is missing an end parenthesis
    • You bring in list comprehensions here with the code text1_tokens= [t.lower() for t in text1 if t.isalpha()] - I am not certain if participants will have ever seen a list comprehension before (not sure if Patrick covers it in the Python workshop) - take a minute to explain them.
    • Might be worth mentioning that .isalpha() ignores punctuation - that is not necessarily intuitive
    • Restore non-code after the above example back to words in markdown format - for a while it's all in code format, which makes it very difficult to read
  • In section "Cleaning the Corpus":

    • Might be worth explaining how len(set(text1_tokens)) shows how many unique words there are - unless you have predefined set, which I did not see.
    • It is important, if you are billing list comprehensions as a computationally faster way of completing what can be done in lists, to define them and mention that they are computationally faster as early on as you can.
    • Typo: "but just for illustration, we can try it out with a instead (Porter is the most common)." Need a word between a and instead.
    • I am not certain that Patrick will have taught dictionaries in his lesson - check with him before you mention dictionaries in trying to describe Frequency Distributions.
    • I am also not certain if Patrick will have taught pass during his workshop - might need to explain what that does as you are going through the code.
  • In section "Make Your Own Corpus":

    • Once again, make the text that isn't code here back into a markdown format, for ease of reading (might just be a matter of making the cell a markdown cell, if you wrote this in jupyter notebook)
  • In section "Part of Speech Tagging":

    • Typo - "I'll make an epty dictionary to hold my results" - should be "empty"
    • Once again, you may need to explain the syntax of creating a dictionary here, if that is not covered in the Python tutorial, as well as what a dictionary is and how it works. Check with Patrick.
    • Fix the formatting at the end - once again, it's all in code format, making it hard to read. Everything that isn't code should be in markdown cells.
  • Are you going to add what is at the end of your jupyter notebook for this session into your nltk_guidelines markdown? There is a discrepancy there; not sure if that is deliberate or not.
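
For the points above about list comprehensions, .isalpha(), set, dictionaries, and pass, a short aside along these lines might help participants. This is only a sketch: the token list is a made-up stand-in for text1 (so it runs without NLTK), and the names tokens_loop, unique_words, counts, and skipped are hypothetical, not taken from the lesson.

```python
# Made-up stand-in for the lesson's `text1`; like an NLTK Text,
# it can be looped over as a sequence of token strings.
text1 = ["Call", "me", "Ishmael", ".", "Some", "years", "ago", ",",
         "never", "mind", "how", "long", "me", "2"]

# The long way: an explicit loop that builds a list.
tokens_loop = []
for t in text1:
    if t.isalpha():            # keep tokens made only of letters
        tokens_loop.append(t.lower())

# The same thing as a list comprehension (the form used in the lesson).
text1_tokens = [t.lower() for t in text1 if t.isalpha()]

# .isalpha() is False for punctuation and digits, and also for any
# token that mixes letters with other characters.
print(".".isalpha())       # False
print("2".isalpha())       # False
print("don't".isalpha())   # False (the apostrophe is not a letter)

# set() drops duplicates, so len(set(...)) counts unique words (types),
# while len(...) counts every word (tokens).
unique_words = set(text1_tokens)
print(len(text1_tokens), len(unique_words))

# A dictionary maps keys to values; here, each word to its count.
counts = {}                    # {} creates an empty dictionary
for t in text1_tokens:
    if t in counts:
        counts[t] += 1
    else:
        counts[t] = 1

# `pass` does nothing; it fills a spot where Python requires a statement.
skipped = 0
for t in text1:
    if t.isalpha():
        pass                   # nothing to do for real words
    else:
        skipped += 1           # count punctuation and numbers instead
```

Showing the loop and the comprehension side by side makes it clear they produce the same list, which may be enough even if Patrick's workshop does not cover comprehensions.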
