youonf / text_summarization_document_classification

Models are built to perform text summarization and document classification.

NLP - Text Summarization and Document Classification

This is a project for Cohort 3 of the Xccelerate Data Science Bootcamp.

Project work for Natural Language Processing

Find a document, such as a blog post, news article or research paper, and summarize it automatically using a machine-based approach.

Two versions of text summarization are built:
- Version 1: Count the words in the paragraph. Each word's score is its number of occurrences.
- Version 2: Use TF-IDF scores for summarization.
Due to time constraints, the machine learning part has not been developed yet. See the notebook for more details.
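Version 1 of the summarizer can be sketched roughly as below. This is an illustration under stated assumptions, not the notebook's code. It uses a regex tokenizer and a tiny hand-written stopword list in place of nltk. It also assumes that a sentence's score is the sum of the occurrence counts of its words and that the highest-scoring sentences, kept in their original order, form the summary. The name `summarize_by_frequency` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list standing in for nltk's stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "it", "on", "for"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize_by_frequency(text: str, n_sentences: int = 2) -> str:
    sentences = split_sentences(text)
    # Score of each word = its number of occurrences in the whole text.
    word_scores = Counter(tokenize(text))
    # Assumed rule: a sentence's score is the sum of its words' scores.
    sentence_scores = [sum(word_scores[w] for w in tokenize(s)) for s in sentences]
    # Take the top-scoring sentences, then restore their original order.
    top = sorted(range(len(sentences)), key=lambda i: sentence_scores[i], reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))


text = "Cats are popular pets. Cats sleep a lot. Dogs bark. Cats and dogs can live together."
print(summarize_by_frequency(text))  # Cats are popular pets. Cats and dogs can live together.
```

Because the scores are summed, longer sentences are favoured. Dividing each sentence's score by its length is a common alternative.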

Skills: BeautifulSoup, nltk, TF-IDF (computed manually, without a TF-IDF vectorizer)
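A minimal sketch of Version 2 follows, computing TF-IDF by hand rather than with a vectorizer. It rests on several assumptions:
- Each sentence is treated as a "document".
- TF is a word's count divided by the sentence length.
- IDF is log(N / df), where N is the number of sentences and df is the number of sentences containing the word.
- A sentence's score is the sum of the TF-IDF weights of its distinct words.

These choices, the regex tokenizer, the stopword list and the function names are all assumptions, not taken from the notebook.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list standing in for nltk's stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "it", "on", "for"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_sentence_scores(sentences: list[str]) -> list[float]:
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each word appears.
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # TF = count / sentence length; IDF = log(N / df). Empty sentences score 0.
        scores.append(sum((c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()))
    return scores


def summarize_by_tfidf(text: str, n_sentences: int = 2) -> str:
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    # Take the top-scoring sentences, then restore their original order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))


text = "Cats are popular pets. Cats sleep a lot. Dogs bark. Cats and dogs can live together."
print(summarize_by_tfidf(text))  # Dogs bark. Cats and dogs can live together.
```

With this IDF, a word that appears in every sentence gets weight 0, so very common words do not dominate the summary. Because TF is divided by sentence length, sentence length matters less than it does with raw word counts.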
Find a list of blog posts from any website and identify a theme. Use what is learned to test whether new articles can also be classified into themes.

Two models are built:
- Version 1: K-means clustering of the TF-IDF matrix of the news articles
- Version 2: Supervised learning for news article categorization
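Version 1 of the document classification clusters TF-IDF vectors with K-means and uses the elbow method and the silhouette coefficient to choose k. That pipeline can be sketched without any libraries, as below. This is not the notebook's implementation, which per the skills list relies on a TF-IDF vectorizer. The smoothed IDF formula, L2 normalisation, random initialisation, helper names and toy documents are all assumptions for illustration.

```python
import math
import random
import re
from collections import Counter


def tfidf_matrix(docs: list[str]) -> list[list[float]]:
    # Dense TF-IDF rows: raw counts times a smoothed IDF, then L2-normalised.
    tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n = len(docs)
    df = Counter(w for t in tokens for w in set(t))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for t in tokens:
        tf = Counter(t)
        row = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    # Lloyd's algorithm with k randomly chosen rows as initial centroids.
    rng = random.Random(seed)
    centroids = [list(x) for x in rng.sample(X, k)]
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in X]
        # Update step: centroid = mean of its rows (kept as-is if the cluster is empty).
        new = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    # Inertia: within-cluster sum of squared distances, used for the elbow method.
    inertia = sum(sq_dist(x, centroids[lab]) for x, lab in zip(X, labels))
    return labels, inertia


def silhouette(X: list[list[float]], labels: list[int]) -> float:
    # Mean of s(i) = (b - a) / max(a, b), where a = mean distance to the own cluster
    # and b = mean distance to the nearest other cluster. Needs at least 2 clusters.
    scores = []
    for i, x in enumerate(X):
        same = [math.dist(x, y) for j, y in enumerate(X) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(x, y) for y, lab in zip(X, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


docs = [
    "stocks market shares trading",
    "market stocks investors trading",
    "football match goal team",
    "team football players goal",
]
X = tfidf_matrix(docs)
for k in (1, 2, 3):
    labels, inertia = kmeans(X, k)
    sil = silhouette(X, labels) if len(set(labels)) > 1 else float("nan")
    print(k, round(inertia, 3), round(sil, 3))
```

For the elbow method, plot inertia against k and pick the k where adding clusters stops reducing inertia sharply. A silhouette closer to 1 indicates tighter, better-separated clusters.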
Skills: nltk, K-means, elbow method, silhouette coefficient, TfidfVectorizer, RandomForestClassifier, XGBoost, LinearSVC, KNN
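Version 2 trains supervised classifiers on TF-IDF features, and the skills list names RandomForestClassifier, XGBoost, LinearSVC and KNN. As a dependency-free illustration of the simplest of these, the sketch below implements a k-nearest-neighbours classifier over the cosine similarity of TF-IDF vectors. It is not the notebook's code. The tokenizer, the smoothed IDF formula, the helper names (`fit_idf`, `vectorize`, `knn_predict`) and the toy training sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(train_docs) -> dict[str, float]:
    # Smoothed IDF learned from the training set only.
    n = len(train_docs)
    df = Counter(w for d in train_docs for w in set(tokenize(d)))
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def vectorize(doc: str, idf: dict[str, float]) -> dict[str, float]:
    # Sparse, L2-normalised TF-IDF vector; words unseen in training are dropped.
    tf = Counter(w for w in tokenize(doc) if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-length, so the dot product is the cosine similarity.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def knn_predict(doc, train_vecs, train_labels, idf, k=3):
    # Majority vote among the k most similar training documents.
    q = vectorize(doc, idf)
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: cosine(q, p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    ("stocks fell as investors sold shares", "business"),
    ("the market rallied on strong earnings", "business"),
    ("bank shares and bond yields rose", "business"),
    ("the team won the match with a late goal", "sport"),
    ("players trained before the football final", "sport"),
    ("the coach praised the team after the match", "sport"),
]
docs, labels = zip(*train)
idf = fit_idf(docs)
vecs = [vectorize(d, idf) for d in docs]
print(knn_predict("investors bought shares as market prices rose", vecs, labels, idf))
print(knn_predict("fans cheered as the team scored a goal", vecs, labels, idf))
```

Fitting the IDF on the training set only and reusing it for new articles mirrors the usual fit/transform split, so test documents never influence the weights.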
