Naive Bayes Classifier for article headlines which annotates a headline as either 'clickbait' or 'news' (non-clickbait)

annabfenske/clickbait_classifier

Anna Fenske (af2570)
NLP Final Project

CLICKBAIT CLASSIFIER

Files:
	features.py: feature extractor
	
	train.py: collect featuresets of training data, write them to training_data.json
	
	test.py: train classifier and test on test_clickbait.json and test_news.json
		TO RUN: python test.py [news | clickbait | all]  (choose one)
	
	testing_data\test_clickbait.json: corpus of BuzzFeed headlines (not in the training corpus) for testing
		results\output_clickbait.txt: output from test.py on test_clickbait.json
	
	testing_data\test_news.json: corpus of New York Times headlines (not in the training corpus) for testing
		results\output_news.txt: output from test.py on test_news.json
	
	testing_data\test_all.json: test data from both test_clickbait.json and test_news.json
		results\output_all.txt: output from test.py on test_all.json
	
	training_data.json: headlines from all sources with their feature sets and annotations, used to train the classifier
	NOTE: training_data.json is not included in this repository because the file is too large. This should not be an issue, since test.py runs train.py, which writes the file.
	
	training_data: directory holding annotated training data separated by source
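
Example (illustrative only):
	The pipeline described above (extract a feature set per headline, train a Naive Bayes classifier on annotated feature sets, then label unseen headlines) can be sketched roughly as follows. This is a minimal standalone sketch, not the code in features.py, train.py, or test.py. The three features, the NaiveBayes class, and the toy headlines are all assumptions made up for illustration; the real feature extractor and training corpus may differ.

```python
import math
from collections import Counter, defaultdict


def extract_features(headline):
    """Hypothetical boolean features; the real features.py may use others."""
    words = headline.lower().split()
    return {
        "starts_with_digit": headline[:1].isdigit(),
        "has_second_person": any(w in ("you", "your") for w in words),
        "ends_with_question": headline.rstrip().endswith("?"),
    }


class NaiveBayes:
    """Tiny Naive Bayes over boolean features with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        # label -> (feature name, feature value) -> count
        self.feature_counts = defaultdict(Counter)

    def train(self, labeled_featuresets):
        for features, label in labeled_featuresets:
            self.label_counts[label] += 1
            for item in features.items():
                self.feature_counts[label][item] += 1

    def classify(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus sum of log likelihoods
            score = math.log(count / total)
            for item in features.items():
                # each feature is boolean (2 possible values), hence 2 * alpha
                numerator = self.feature_counts[label][item] + self.alpha
                denominator = count + 2 * self.alpha
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy annotated headlines, invented for this sketch (not from the corpora).
toy_training = [
    ("17 Things You Need To Know Before Friday", "clickbait"),
    ("Can You Guess Which Dog Is Older?", "clickbait"),
    ("10 Photos That Will Make Your Day", "clickbait"),
    ("Senate Passes Budget Bill After Long Debate", "news"),
    ("Storm Closes Schools Across the Region", "news"),
    ("City Council Approves New Transit Plan", "news"),
]

classifier = NaiveBayes()
classifier.train([(extract_features(h), label) for h, label in toy_training])

for headline in ("21 Reasons You Should Visit Lisbon",
                 "Parliament Votes to Extend Trade Agreement"):
    print(headline, "->", classifier.classify(extract_features(headline)))
```

	Scores are summed in log space so that multiplying many small probabilities does not underflow, and the smoothing keeps a feature value never seen with a label from zeroing out that label's score.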
