Models are built to perform text summarization and document classification.

youonf/text_summarization_document_classification

NLP - Text Summarization and Document Classification

made-with-python MIT license

This is project work from the Xccelerate Data Science Bootcamp, Cohort 3.

Project work for Natural Language Processing

  1. Find a document and auto-summarize it. It can be a blog post, news article, or research paper. Use a machine-based approach for this.

    Two versions of text summarization are built:

    • Version 1: Count the occurrences of each word in the text. Each word's score is its number of occurrences.
    • Version 2: Use TF-IDF scores for summarization.

    Due to time constraints, the machine learning part is still to be developed. See the notebook for more details.

    Skills: BeautifulSoup, nltk, TF-IDF (without using tfidf vectorizer)
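
The two summarization versions above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes each sentence is scored from its word scores and the top-scoring sentences form the summary, it treats each sentence as a "document" for TF-IDF, and it uses regex-based splitting as a stand-in for nltk tokenization. The function names (`frequency_scores`, `tfidf_scores`, `summarize`) are hypothetical, and stop words are not removed.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., ! or ? followed by whitespace
    # (an assumed stand-in for nltk's sentence tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens only.
    return re.findall(r"[a-z']+", sentence.lower())


def frequency_scores(sentences):
    # Version 1: a word's score is its occurrence count in the whole text;
    # a sentence's score is the sum of its word scores (assumption).
    counts = Counter(w for s in sentences for w in tokenize(s))
    return [sum(counts[w] for w in tokenize(s)) for s in sentences]


def tfidf_scores(sentences):
    # Version 2: hand-rolled TF-IDF, treating each sentence as a document.
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    scores = []
    for d in docs:
        if not d:
            scores.append(0.0)
            continue
        tf = Counter(d)
        total = sum((tf[w] / len(d)) * math.log(n / df[w]) for w in tf)
        scores.append(total / len(tf))  # average over distinct words
    return scores


def summarize(text, scorer, k=2):
    # Keep the k best-scoring sentences, in their original order.
    sentences = split_sentences(text)
    scores = scorer(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

Calling `summarize(article_text, frequency_scores)` or `summarize(article_text, tfidf_scores)` switches between the two scoring schemes. The frequency version tends to favor long sentences full of common words, while the TF-IDF version favors sentences with words that are rare across the text.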

  2. Find a list of blogs from any website and find a theme. Use this learning to test whether new articles can be classified into themes as well.

    Two models are built:

    • Version 1: K-means clustering on the TF-IDF matrix of the news articles
    • Version 2: Supervised learning for news articles categorization

    Skills: nltk, K-means, elbow method, Silhouette Coefficient, tfidf_vectorizer, randomforestclassifier, xgboost, LinearSVC, KNN
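
The two modelling approaches for the second task could look something like the sketch below, using scikit-learn. The toy corpus, the labels, and the helper name `best_k` are invented for illustration and do not come from the notebook; the actual project may use different preprocessing, features, and hyperparameters. Only the Silhouette Coefficient and LinearSVC are shown. The elbow method would inspect `KMeans.inertia_` over the same candidate values of k.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for scraped news articles.
docs = [
    "stock market shares fall as investors sell",
    "investors buy shares as stock market rallies",
    "bank raises interest rates, stock market reacts",
    "football team wins the league match",
    "striker scores twice in football match",
    "team loses the cup match after penalty",
]
labels = ["business"] * 3 + ["sport"] * 3

# Version 1: unsupervised clustering of the TF-IDF matrix.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)


def best_k(X, candidates):
    # Fit K-means for each candidate k and keep the k with the
    # highest mean Silhouette Coefficient.
    scores = {}
    for k in candidates:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        scores[k] = silhouette_score(X, km.labels_)
    return max(scores, key=scores.get), scores


k, scores = best_k(X, range(2, 5))

# Version 2: supervised categorization with a TF-IDF + LinearSVC pipeline.
clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
clf.fit(docs, labels)
predicted = clf.predict(["shares fall on the stock market"])
```

Wrapping the vectorizer and classifier in one pipeline means new articles are transformed with the vocabulary learned during training. `LinearSVC` could be swapped for any of the other listed classifiers without changing the rest of the sketch.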
