furncyn/big-data-traning: A program that analyzes Reddit comments on politics by parsing the text into a clean format and using it to train a Spark classifier to study data points and trends on Reddit.

Files:
test_proj1
.gitignore
analysis.R
cleantext.py
readme.txt
reddit_model.py
team.txt

			CS143 Project 2B

(Extra credit) Task 10 part 5: We compute the percentage of positive and negative comments for each month.
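
As a rough illustration of the monthly calculation, here is a minimal plain-Python sketch. It assumes each classified comment has been reduced to a (created_utc, pos, neg) tuple with 0/1 labels. That layout and the helper name `monthly_sentiment` are hypothetical and are not taken from reddit_model.py, which would presumably perform the equivalent aggregation in Spark.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_sentiment(comments):
    """Return {"YYYY-MM": (pct_positive, pct_negative)} for labeled comments.

    `comments` is an iterable of (created_utc, pos, neg) tuples, where
    created_utc is a Unix timestamp and pos/neg are 0 or 1 labels.
    This input layout is an assumption made for the example.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # month -> [count, positives, negatives]
    for created_utc, pos, neg in comments:
        # Bucket each comment by the UTC month it was posted in.
        month = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")
        bucket = totals[month]
        bucket[0] += 1
        bucket[1] += pos
        bucket[2] += neg
    # Convert the raw counts into percentages of all comments in that month.
    return {
        month: (100.0 * p / n, 100.0 * q / n)
        for month, (n, p, q) in sorted(totals.items())
    }
```

A comment can be neither positive nor negative under this layout, so the two percentages for a month need not add up to 100.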

To run the script:
Usage: `spark-submit reddit_model.py`, or run it in the `pyspark` shell
Additional: The program requires Java 8, not Java 11
 - To change the Java version on Linux, run `sudo update-alternatives --config java`
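
To confirm which Java version is active before launching Spark, a small check along these lines may help. It is only a sketch: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and it relies on the usual `java -version` banner, in which Java 8 reports itself as "1.8.x" and later releases as "11.x".

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Java 8 and earlier report versions like "1.8.0_292"; Java 9 and later
    report versions like "11.0.11". Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy "1.x" scheme: the second component is the major version.
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and return its major version, or None if Java is missing."""
    try:
        # `java -version` writes its banner to stderr, not stdout.
        proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return parse_java_major(proc.stderr)
```

If `installed_java_major()` returns anything other than 8, switch versions with the `update-alternatives` command above before running the script.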

To run analysis.R, you may need to move and rename the .csv files. If the script still cannot read the .csv files after you have moved them, edit the file references in the R code to use full paths.
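
The move-and-rename step could be scripted roughly as follows. This sketch assumes the .csv data was written by Spark's CSV writer, which produces a directory containing `part-*.csv` files rather than a single file. Whether reddit_model.py writes its output this way is an assumption, and the helper name `collect_spark_csv` is hypothetical.

```python
import glob
import os
import shutil


def collect_spark_csv(output_dir, target_path):
    """Copy the single part file from a Spark CSV output directory to target_path.

    Returns the absolute path of the copy, which can be pasted into the R code
    if a relative reference does not work.
    Raises ValueError unless exactly one part file is present.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*.csv")))
    if len(parts) != 1:
        raise ValueError(f"expected one part file in {output_dir}, found {len(parts)}")
    shutil.copyfile(parts[0], target_path)
    return os.path.abspath(target_path)
```

The function copies rather than moves the file, so the original Spark output stays in place if the analysis needs to be rerun.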
