-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.txt
26 lines (17 loc) · 1.16 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Anna Fenske (af2570)
NLP Final Project
CLICKBAIT CLASSIFIER
Files:
features.py: feature extractor
train.py: collect featuresets of training data, write them to training_data.json
test.py: train classifier and test on test_clickbait.json and test_news.json
TO RUN: python test.py [news OR clickbait OR all]
testing_data\test_clickbait.json: corpus of headlines from Buzzfeed (not in training corpus) for testing
results\output_clickbait.txt: output from test.py on test_clickbait.json
testing_data\test_news.json: corpus of headlines from New York times (not in training corpus) for testing
results\output_news.txt: output from test.py on test_news.json
testing_data\test_all.json: test data from both test_clickbait.json and test_news.json
results\output_all.txt: output from test.py on test_all.json
training_data.json: Headlines from all sources and their feature sets and annotations for training classifier
NOTE: training_data.json not included in this repository (file size too large). This should not be an issue though since test.py runs train.py.
training_data: directory holding annotated training data separated by source