GitHub - annabfenske/clickbait_classifier: Naive Bayes Classifier for article headlines which annotates a headline as either 'clickbait' or 'news' (non-clickbait)

Anna Fenske (af2570)
NLP Final Project

CLICKBAIT CLASSIFIER

Files:
	features.py: feature extractor
	
	train.py: collect featuresets of training data, write them to training_data.json
	
	test.py: train classifier and test on test_clickbait.json and test_news.json
		TO RUN: python test.py [news OR clickbait OR all]
	
	testing_data\test_clickbait.json: corpus of headlines from BuzzFeed (not in training corpus) for testing
		results\output_clickbait.txt: output from test.py on test_clickbait.json
	
	testing_data\test_news.json: corpus of headlines from the New York Times (not in training corpus) for testing
		results\output_news.txt: output from test.py on test_news.json
	
	testing_data\test_all.json: test data from both test_clickbait.json and test_news.json
		results\output_all.txt: output from test.py on test_all.json
	
	training_data.json: headlines from all sources, with their feature sets and annotations, for training the classifier
	NOTE: training_data.json is not included in this repository because the file is too large. This should not be an issue, since test.py runs train.py, which regenerates it.
	
	training_data: directory holding annotated training data separated by source
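
Example (illustrative sketch only):
	The pipeline described above (extract features from a headline, train a Naive Bayes classifier on annotated feature sets, then label new headlines) might look roughly like the code below. This is not the code in features.py, train.py, or test.py. The feature names, the NaiveBayes class, the add-one smoothing, and the toy headlines are all assumptions made up for this example, and it uses only the Python standard library.

```python
import math
from collections import Counter, defaultdict


def extract_features(headline):
    """Hypothetical feature extractor; the real features.py may differ entirely."""
    words = headline.split()
    return {
        "starts_with_number": bool(words) and words[0].isdigit(),
        "contains_you": any(
            w.lower().strip(".,!?'\"") in ("you", "your") for w in words
        ),
        "ends_with_question": headline.rstrip().endswith("?"),
        "first_word": words[0].lower() if words else "",
    }


class NaiveBayes:
    """Minimal categorical Naive Bayes with add-one smoothing (an assumption)."""

    def __init__(self):
        self.label_counts = Counter()
        # (label, feature name) -> Counter of observed feature values
        self.feature_counts = defaultdict(Counter)
        # feature name -> set of every value seen during training
        self.feature_values = defaultdict(set)

    def train(self, featuresets):
        for features, label in featuresets:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)

    def classify(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # Work in log space: log P(label) + sum of log P(value | label)
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.feature_counts[(label, name)][value]
                # +1 reserves probability mass for values never seen in training
                n_values = len(self.feature_values[name]) + 1
                score += math.log((seen + 1) / (count + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy headlines invented for this example; they are not from the repository's corpora.
toy_training = [
    ("10 Things You Never Knew About Cats", "clickbait"),
    ("Can You Guess Which Dog Is Yours?", "clickbait"),
    ("21 Photos That Will Make You Smile", "clickbait"),
    ("Senate Passes Budget Bill After Long Debate", "news"),
    ("Stocks Fall as Oil Prices Climb", "news"),
    ("Mayor Announces New Transit Plan", "news"),
]

classifier = NaiveBayes()
classifier.train([(extract_features(h), label) for h, label in toy_training])

for headline in ("15 Reasons You Should Visit Paris", "Governor Signs Education Bill"):
    print(headline, "->", classifier.classify(extract_features(headline)))
```

	With only six toy headlines this shows the mechanics and nothing more. Real accuracy depends on the features chosen and on the size of the training corpus.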