Repository of NLP code from research and courses (Text Pre-Processing, Semantic Similarity Scores, Vector Paraphrase, Word2Vec, etc.).
I am currently enrolled in Computational Programming and Linguistics, part of Georgetown's Master's of Data Science and Analytics program, for Spring 2022. The course is designed and taught by Professor Elizabeth Merkhofer. Topics include:
- Authorship attribution
- Semantic text similarity
- Paraphrase identification using Logistic Regression
- Feature extraction
- Usage of TF-IDF vectorizers, stemmers, BLEU, WER, NIST, LCS, and cosine similarity
- Various machine learning models (SVM, logistic regression)
- Neural vector space models
- Named Entity Recognition
- Conditional Random Fields
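
As a rough illustration of the kind of similarity features listed above, here is a minimal sketch of cosine similarity and LCS computed over whitespace tokens. It is not code from the course or from this repository: the function names (`tokenize`, `cosine_similarity`, `lcs_length`, `pair_features`) are hypothetical, and the tokenizer is deliberately naive.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between the raw term-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty string has a zero vector; define its similarity as 0.0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def pair_features(s1, s2):
    """Hypothetical feature dict for a sentence pair."""
    t1, t2 = tokenize(s1), tokenize(s2)
    longest = max(len(t1), len(t2), 1)  # avoid division by zero
    return {
        "cosine": cosine_similarity(s1, s2),
        "lcs_ratio": lcs_length(t1, t2) / longest,
    }


if __name__ == "__main__":
    print(pair_features("The cat sat on the mat", "The dog sat on the mat"))
```

Features like these could then be fed to a classifier such as logistic regression for paraphrase identification; swapping the raw counts for TF-IDF weights would be a natural next step.
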
I am also a Research Assistant for Professor Nita Rudra in Georgetown's Government Department, using NLP and deep learning models to parse SEC labor documents and classify different types of labor risks across varying economies. Related code will be added here as permitted.