This is the repo for the group learning about Natural Language Processing.
This project aims to capture the major trends of each decade as they appear in New York Times article titles. We pull article titles from the NYT Archive API and run text analysis on them. The goal is to find the words that are important for a given decade relative to all other decades. The important concepts from each decade are then visualized as word clouds. The project is built in Python with the nltk and wordcloud libraries.
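
One way to score "important for a given decade relative to all other decades" is TF-IDF with each decade treated as a single document. The sketch below is a minimal pure-Python version under that assumption, not the project's actual code; the function names and the sample titles are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


def tfidf_by_decade(titles_by_decade):
    """Score words per decade, treating each decade's titles as one document.

    Returns {decade: {word: tf-idf score}}.
    """
    counts = {
        decade: Counter(tok for title in titles for tok in tokenize(title))
        for decade, titles in titles_by_decade.items()
    }
    n_docs = len(counts)

    # Document frequency: in how many decades does each word appear?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for decade, word_counts in counts.items():
        total = sum(word_counts.values()) or 1
        scores[decade] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
    return scores


def top_words(scores, decade, k=5):
    """Return the k highest-scoring words for a decade (ties broken alphabetically)."""
    ranked = sorted(scores[decade].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Made-up titles, for illustration only.
sample = {
    1960: ["Moon Mission Planned", "Senate Debates Budget"],
    1990: ["Internet Boom Continues", "Senate Debates Budget"],
}
scores = tfidf_by_decade(sample)
top_1960 = top_words(scores, 1960, k=3)
```

Words that appear in every decade get a score of zero, which is the behavior we want here. The per-decade score dictionary (with zero-score words filtered out) should be usable as input to `WordCloud.generate_from_frequencies` from the wordcloud library.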
- recession vs. expansion periods: which words are significant for each across all 150 years?
- recessions through the years: what makes each 50-year period of recessions unique?
- expansions through the years: what makes each 50-year period of expansions unique?
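
The questions above need each title to be tagged with an economic phase and a 50-year period before any scoring happens. Here is a rough sketch of that grouping step. The origin year, the recession ranges, and the sample titles are illustrative placeholders, not a vetted recession chronology or real data.

```python
from collections import defaultdict


def period_label(year, span, origin):
    """Return a label like '1900-1949' for the span-year bucket containing year."""
    start = origin + ((year - origin) // span) * span
    return f"{start}-{start + span - 1}"


def economic_phase(year, recession_ranges):
    """Return 'recession' if year is inside any inclusive (start, end) range."""
    for start, end in recession_ranges:
        if start <= year <= end:
            return "recession"
    return "expansion"


def group_titles(year_title_pairs, span, origin, recession_ranges):
    """Group titles by (phase, period label)."""
    groups = defaultdict(list)
    for year, title in year_title_pairs:
        key = (economic_phase(year, recession_ranges),
               period_label(year, span, origin))
        groups[key].append(title)
    return dict(groups)


# Placeholder ranges and made-up titles, for illustration only.
PLACEHOLDER_RECESSIONS = [(1930, 1933), (2008, 2009)]
sample = [(1931, "Banks Close"), (1955, "Suburbs Grow"), (2008, "Markets Fall")]
groups = group_titles(sample, span=50, origin=1850,
                      recession_ranges=PLACEHOLDER_RECESSIONS)
```

Each group can then be treated as one document when scoring words, so "recessions through the years" becomes a comparison among the recession groups only.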
- introduction - research questions
- explain data (NYT Archive API)
- explain methods (Python API, TF-IDF, word clouds)
- show results 1
- show results 2
Use this to install libraries (binaries) as wheel files. http://www.lfd.uci.edu/~gohlke/pythonlibs/
We use the Archive API, one of the NYT developer APIs. https://developer.nytimes.com/archive_api.json
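
As a starting point, here is a small sketch of pulling titles for one month using only the standard library. The URL pattern and the `response` → `docs` → `headline` → `main` layout are assumptions based on how the Archive API has been documented; check the current docs before relying on them. The helper names are hypothetical, and you need your own API key.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify against the current Archive API docs.
ARCHIVE_URL = "https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?api-key={key}"


def archive_url(year, month, api_key):
    """Build the request URL for one month of the archive."""
    return ARCHIVE_URL.format(year=year, month=month, key=api_key)


def extract_titles(payload):
    """Collect the main headline of every document in a parsed response."""
    docs = payload.get("response", {}).get("docs", [])
    titles = []
    for doc in docs:
        headline = doc.get("headline") or {}
        main = headline.get("main")
        if main:  # skip documents with no usable headline
            titles.append(main)
    return titles


def fetch_titles(year, month, api_key):
    """Download one month of the archive and return its article titles."""
    with urllib.request.urlopen(archive_url(year, month, api_key)) as resp:
        return extract_titles(json.load(resp))
```

Calling `fetch_titles(1929, 10, "YOUR_KEY")` would request a single month. A full decade takes 120 such calls, so it is worth saving each month's titles to disk instead of re-downloading them.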
We can start learning NLP by going through this tutorial. http://www.nltk.org/book/
gensim is a Python library for topic modeling and word embeddings; this page covers its word2vec model. https://radimrehurek.com/gensim/models/word2vec.html