cyberbullying

CS229 Final Project

Please view in raw format

All files are provided, so simulator.py is ready to run as-is.

See step 4 for how to use the simulator.

If you wish to recreate the experiment from scratch, do the following:

0. Delete all non-.py files from the main directory (files only, not folders)
1. Edit parse_data.py. You will find 2 hyperparameters:
- w = boolean selecting word n-grams or character n-grams
- n = number of units (words or characters) in each dictionary slot, i.e. the n-gram size
- defaults: w = 1 and n = 1
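
To make the two hyperparameters concrete, here is a minimal sketch of word versus character n-gram extraction. The function name extract_ngrams is hypothetical and is not taken from parse_data.py. It also assumes that a true value of w means word n-grams, which the description above does not confirm.

```python
def extract_ngrams(text, w=True, n=1):
    """Return the list of n-grams in text.

    w: True for word n-grams, False for character n-grams (assumed meaning).
    n: number of words or characters per n-gram.
    """
    # Split into words or into single characters, depending on w.
    units = text.split() if w else list(text)
    joiner = " " if w else ""
    # Slide a window of size n over the units.
    return [joiner.join(units[i:i + n]) for i in range(len(units) - n + 1)]


print(extract_ngrams("you are mean", w=True, n=2))   # ['you are', 'are mean']
print(extract_ngrams("mean", w=False, n=2))          # ['me', 'ea', 'an']
```

With the defaults (w = 1, n = 1) and the assumption above, each dictionary slot would hold a single word.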

1.5 Run parse_data.py

-Requires a folder titled "Myspace" in the same folder as parse_data
-Requires a folder titled "Bully output" inside "Myspace", containing files like the sample one provided
-Requires XML files of conversations inside "Myspace"
-Outputs as pickle dumps:
	-using_files.txt - which XML conversation files were used
	-feature_list.txt - feature dictionary
	-labels.txt - labels aligned with using_files.txt
	-feature_matrix.txt - sentence vectors, encoded using feature_list.txt
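
Because the outputs are pickle dumps despite the .txt extension, opening them in a text editor will not show anything readable. A minimal sketch for inspecting them from Python follows. The helper name load_pickle is hypothetical, and the internal structure of each file is not documented here, so inspect what you get back before relying on it.

```python
import pickle


def load_pickle(path):
    """Load one pickle dump, e.g. "labels.txt" (a pickle despite the .txt name)."""
    with open(path, "rb") as handle:  # pickles must be opened in binary mode
        return pickle.load(handle)


# Hypothetical usage, run from the main directory after parse_data.py:
# files = load_pickle("using_files.txt")
# labels = load_pickle("labels.txt")
# print(len(files), len(labels))  # labels are described as aligned with using_files
```

If the dumps were written by Python 2 and you read them with Python 3, pickle.load may need the extra argument encoding="latin1".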

2. Run shuffle_results.py

-Outputs as pickle dumps:
	-training_labels.txt / test_labels.txt - shuffled 80-20 division of labels.txt
	-training_matrix.txt / test_matrix.txt - the same shuffled 80-20 division of feature_matrix.txt
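
The key point of this step is that labels and feature rows are shuffled with the same permutation, so they stay aligned after the split. A minimal sketch of that idea, using plain lists and a hypothetical function name (shuffle_results.py itself may be written differently):

```python
import random


def shuffled_split(rows, labels, train_fraction=0.8, seed=0):
    """Shuffle rows and labels with one shared permutation, then split them."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    # One index permutation is applied to both sequences to keep them aligned.
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_fraction)
    train_idx, test_idx = order[:cut], order[cut:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )
```

The seed parameter is an addition for reproducibility and is an assumption, not something the script is known to expose.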

3. Run svm.py

-Trains the SVM using the training data (which is itself divided 80-20 into train/dev)
-Outputs error on the dev set
-Outputs error on the test set
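
The evaluation flow described above (hold out 20% of the training data as a dev set, then report dev and test error) can be sketched as follows. This is not the code in svm.py. MajorityClassifier is a placeholder, NOT an SVM, used only so the sketch runs without extra libraries; all names here are hypothetical.

```python
from collections import Counter


def error_rate(predictions, labels):
    """Fraction of predictions that differ from the true labels."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


class MajorityClassifier:
    """Placeholder model, NOT an SVM: always predicts the most common training label."""

    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.label for _ in rows]


def train_and_report(model, train_rows, train_labels, test_rows, test_labels):
    """Hold out the last 20% of the training data as a dev set, then score both sets."""
    cut = int(len(train_rows) * 0.8)
    model.fit(train_rows[:cut], train_labels[:cut])
    dev_error = error_rate(model.predict(train_rows[cut:]), train_labels[cut:])
    test_error = error_rate(model.predict(test_rows), test_labels)
    return dev_error, test_error
```

Any object with fit and predict methods can be passed as model, for example an SVM class from a machine-learning library, if that is what you use.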

4. Run simulator.py

-Do NOT run from IDLE (Windows); just double-click the file
-The simulator is extremely basic: type text and press Enter
-If bullying is detected, it will say so
-Type "s bully" (no quotes) to tell the SVM the previous statement was bullying
-Type "s not bully" (no quotes) to tell the SVM the previous statement was not bullying
-The SVM will NOT update until you type "Done"
-Outputs:
	-master_convo.npz - sparse matrix representation of ALL data, direct SVM input
	-model.pkl - current SVM model
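
The command protocol above can be sketched as a small loop: corrections are queued, and the model is only retrained on "Done". This is an illustration, not the code in simulator.py. The callbacks predict and retrain, the function name run_session, and the exact output messages are all hypothetical.

```python
def run_session(lines, predict, retrain):
    """Process typed lines; return the list of messages the simulator would print.

    predict(text) -> bool and retrain(examples) are hypothetical callbacks.
    """
    messages, pending, last_text = [], [], None
    for line in lines:
        command = line.strip()
        if command == "Done":
            retrain(pending)  # the model only updates here
            messages.append("updated with %d corrections" % len(pending))
            pending = []
        elif command in ("s bully", "s not bully"):
            # Queue a corrected label for the previous statement, if there was one.
            if last_text is not None:
                pending.append((last_text, command == "s bully"))
        else:
            last_text = command
            if predict(command):
                messages.append("bullying detected")
    return messages
```

Queuing corrections and retraining once keeps the session responsive, since retraining after every single correction could be slow.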

5. Re-run simulator.py to test the updated model

Enjoy!
