Unconventional MatrixMarket format #25

Open
fenekku opened this issue May 20, 2014 · 2 comments

@fenekku

fenekku commented May 20, 2014

Several output files produced by the collaborative filtering algorithms in toolkits/collaborative_filtering present themselves as MatrixMarket files, either through a .mm extension or a %%MatrixMarket matrix array real general header, but do not appear to follow the MatrixMarket format as defined by NIST.

For example, running ./toolkits/collaborative_filtering/rating --training=smallnetflix_mm --num_ratings=5 --quiet=1 --algorithm=als produces two files:

  • smallnetflix_mm.ids
  • smallnetflix_mm.ratings

Their headers look like this (only one is shown here):

$ head -n 10 smallnetflix_mm.ids
%%MatrixMarket matrix array real general 
%This file contains item ids matching the ratings. In each row i, num_ratings top item ids for user i. (First column: user id, next columns, top K ratings). Note: 0 item id means there are no more items to recommend for this user.
95526 6 
1 1243 424 2641 2109 1557
2 2641 1548 1227 548 76 
3 1243 2548 1227 2641 76 
4 1449 2641 2109 3172 1227 
5 1449 1227 2298 735 1382 
6 2109 2669 1227 3112 2583
7 3516 2016 2647 1548 1243 

The 'array' keyword in the header tells a parser to expect a dense matrix written one value per line in column-major order, yet these files contain one whole row per line. Other files with the same problem include those ending in _U.mm or _V.mm.
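For comparison, here is a minimal sketch of what a conforming 'array' file would contain, based on my reading of the NIST description: the header line, a size line, then one value per line, column by column. The function name `to_mm_array` is hypothetical and only for illustration; it is not part of the toolkit or of SciPy.

```python
def to_mm_array(rows):
    """Serialize a dense matrix (a list of equal-length rows) as
    MatrixMarket 'array real general' text."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    lines = [
        "%%MatrixMarket matrix array real general",
        f"{n_rows} {n_cols}",
    ]
    # Array format: one entry per line, walking down each column in turn.
    for j in range(n_cols):
        for i in range(n_rows):
            lines.append(str(float(rows[i][j])))
    return "\n".join(lines) + "\n"


example = to_mm_array([[1, 2], [3, 4]])
print(example)
```

A 2x2 matrix therefore takes six lines in total: the header, the size line, and four value lines.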

This problem is especially apparent when reading these files with mmread from scipy.io (part of SciPy, a third-party Python library): the format is rejected as invalid and the file cannot be read. The --R_output_format option does not change this for me.
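As a possible workaround until the output is fixed, a small hand-written reader can parse the row-per-line layout directly. This is only a sketch under my own assumptions: `read_row_oriented_mm` is a hypothetical helper, and the sample text is trimmed from the output above with the size line changed to 3 6 so that it matches the three rows kept.

```python
def read_row_oriented_mm(text):
    """Parse the row-per-line layout shown in this issue.

    Returns (shape, rows), where shape comes from the size line and
    rows is a list of lists of floats.
    """
    shape = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the %%MatrixMarket header and % comments.
        if not line or line.startswith("%"):
            continue
        fields = line.split()
        if shape is None:
            # The first non-comment line holds the matrix dimensions.
            shape = (int(fields[0]), int(fields[1]))
            continue
        rows.append([float(x) for x in fields])
    return shape, rows


# Sample trimmed from the issue; size line adjusted to the rows kept here.
sample = """%%MatrixMarket matrix array real general
%This file contains item ids matching the ratings.
3 6
1 1243 424 2641 2109 1557
2 2641 1548 1227 548 76
3 1243 2548 1227 2641 76
"""

shape, rows = read_row_oriented_mm(sample)
print(shape, len(rows))
```

The reader ignores the 'array' keyword entirely, so it should not be used on files that really do follow the NIST layout.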

I might be missing something here though. Thanks for the tool :).

@zachmayer

There's a similar issue with inputs, particularly for the gensgd program.

@meteotester

More about this unconventional MatrixMarket format:
#9


3 participants