
A program that analyzes Reddit comments on politics by parsing the text into a clean format and using it to train a Spark classifier, in order to study data points and trends on Reddit.


furncyn/big-data-traning

			CS143 Project 2B

(Extra credit) Task 10 part 5: We compute the percentage of positive and negative comments for each month.
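As a rough illustration of that monthly calculation, here is a minimal plain-Python sketch. It is not the code in `reddit_model.py`, which runs on Spark. The function name `monthly_sentiment_percentages` and the row layout `(created_utc, pos, neg)` are assumptions made for this example, with `pos` and `neg` taken to be 0/1 labels.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_sentiment_percentages(rows):
    """Return {"YYYY-MM": (percent_positive, percent_negative)}.

    rows is an iterable of (created_utc, pos, neg) tuples, where created_utc
    is a Unix timestamp and pos/neg are 0 or 1 labels (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # month -> [count, pos_sum, neg_sum]
    for created_utc, pos, neg in rows:
        month = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")
        bucket = totals[month]
        bucket[0] += 1
        bucket[1] += pos
        bucket[2] += neg
    # Convert the per-month sums into percentages of that month's comments.
    return {
        month: (100.0 * p / n, 100.0 * g / n)
        for month, (n, p, g) in sorted(totals.items())
    }
```

The two percentages are computed independently, so they do not have to add up to 100 if a comment can carry both labels or neither.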

To run the script:
Usage: `spark-submit reddit_model.py` or run in the `pyspark` shell
Additional: The program requires Java 8, not Java 11
 - To change the Java version on Linux, run `sudo update-alternatives --config java`
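To confirm which Java version is active before calling `spark-submit`, a small check like the following may help. It is a sketch that is not part of this repository, and the helper names `java_major_version` and `installed_java_major_version` are made up for this example. It assumes `java` is on the `PATH` and prints the usual `version "..."` banner.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x, for example "1.8.0_292".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major_version():
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

If `installed_java_major_version()` returns anything other than 8, switch versions with the `update-alternatives` command above.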

To run analysis.R, you may need to move and rename the .csv files first. If the script still cannot find the .csv files after you have moved them, edit the file references in the R files to use full paths.
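If the .csv files come from Spark's DataFrame CSV writer, each output is a directory holding a `part-*.csv` file rather than a single file. This README does not say how the files are produced, so treat that as an assumption. Under it, the move-and-rename step could be scripted roughly as below. The helper name `collect_spark_csv` and any paths passed to it are hypothetical.

```python
import glob
import os
import shutil


def collect_spark_csv(output_dir, target_path):
    """Copy the single part-*.csv file in a Spark output directory to target_path.

    Returns the absolute path of the copy, which can be pasted into the
    R script if a relative reference does not work.
    """
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*.csv")))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file in {output_dir}, found {len(part_files)}"
        )
    shutil.copyfile(part_files[0], target_path)
    return os.path.abspath(target_path)
```

The helper refuses to guess when the directory holds several part files, since concatenating them blindly could duplicate header rows.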
