Matrix Methods in Hadoop

David F. Gleich, Computer Science, Purdue University

These codes accompany my presentation on Matrix Methods in Hadoop at the BIGDATA Techcon in Boston, MA in April 2013. I suspect they'll be used in other presentations as well.

Overview

The goal in these slides is to demonstrate how to implement simple matrix computations in Hadoop using Yelp's mrjob system.

Sparse matrix-vector products
Matrix-matrix products
A recommender system for epinions data

Getting started

Get mrjob working. Nothing here requires an actual MapReduce cluster, but feel free to use one if you wish! I set up a virtualenv for this and use pip.
```
 mkdir envs
 virtualenv envs/mrjob
 source envs/mrjob/bin/activate
 pip install mrjob
```
Get the datasets for the recommender system
```
 make getdata
```

Run some examples

Sparse matrix-vector products

```
 python codes/smatvec.py samples/smat_10_5_A.txt samples/vec_5.txt

 # Compare the output to a non-MR computation
 python codes/test_smatvec.py samples/smat_10_5_A.txt samples/vec_5.txt
```

Sparse matrix-matrix products

```
 python codes/matmat.py samples/smat_10_5_A.txt samples/smat_5_5.txt

 # Compare the output to a non-MR computation
 python codes/test_smatmat.py samples/smat_10_5_A.txt samples/smat_5_5.txt
```

Run the recommender system

Warning: this actually takes a while. I'm not sure where the bottleneck is, but

```
 python recsys/recsys.py data/rating.txt.gz data/user_ratings.txt.gz
```
