A number of output files produced by the collaborative filtering algorithms in toolkits/collaborative_filtering advertise themselves as MatrixMarket files, through a .mm extension or a %%MatrixMarket matrix array real general header, but do not seem to follow the MatrixMarket format as defined by NIST.
For example, running ./toolkits/collaborative_filtering/rating --training=smallnetflix_mm --num_ratings=5 --quiet=1 --algorithm=als produces two files:
smallnetflix_mm.ids
smallnetflix_mm.ratings
Their header is (only one is shown here):
$ head -n 10 smallnetflix_mm.ids
%%MatrixMarket matrix array real general
%This file contains item ids matching the ratings. In each row i, num_ratings top item ids for user i. (First column: user id, next columns, top K ratings). Note: 0 item id means there are no more items to recommend for this user.
95526 6
1 1243 424 2641 2109 1557
2 2641 1548 1227 548 76
3 1243 2548 1227 2641 76
4 1449 2641 2109 3172 1227
5 1449 1227 2298 735 1382
6 2109 2669 1227 3112 2583
7 3516 2016 2647 1548 1243
'array' here tells the parser to expect a dense matrix written one value per line, in column-major order, yet these files store one matrix row per line. Other files with the same problem include those ending in _U.mm or _V.mm.
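For comparison, here is a minimal sketch of how row data like the above would be laid out if it followed the array format: a size line, then one value per line in column-major order. `to_array_format` is a hypothetical helper written for this illustration; it is not part of the toolkit or of scipy.

```python
def to_array_format(rows):
    """Serialize a list of equal-length rows as MatrixMarket 'array' text.

    Hypothetical helper for illustration only. Values are written one per
    line, walking down each column in turn (column-major order).
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    lines = [
        "%%MatrixMarket matrix array real general",
        f"{n_rows} {n_cols}",
    ]
    for j in range(n_cols):          # outer loop over columns
        for i in range(n_rows):      # inner loop over rows
            lines.append(repr(float(rows[i][j])))
    return "\n".join(lines) + "\n"


# A 2x2 matrix [[1, 2], [3, 4]] is emitted as 1, 3, 2, 4.
text = to_array_format([[1, 2], [3, 4]])
```

So a 2x2 matrix takes four value lines after the size line, rather than two lines of two values each.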
This problem is especially apparent when reading these files with mmread from scipy.io (the usual third-party way of reading MatrixMarket files in Python): the format is treated as invalid and the file cannot be read. (The --R_output_format option does not change any of this for me.)
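As a possible workaround, the files can be parsed by hand. The sketch below assumes the layout shown in the sample above: comment lines starting with `%`, one size line, then one whitespace-separated matrix row per line. `read_row_oriented_mm` is a hypothetical name, and the sample is shortened to two rows with the size line adjusted to match.

```python
import io


def read_row_oriented_mm(stream):
    """Read a file with a MatrixMarket-style header but one row per line.

    Hypothetical helper; assumes '%' comment lines, then a 'rows cols'
    size line, then one whitespace-separated row per line.
    """
    shape = None
    rows = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("%"):
            continue                      # skip header and comment lines
        fields = line.split()
        if shape is None:
            shape = (int(fields[0]), int(fields[1]))
            continue
        rows.append([float(x) for x in fields])
    return shape, rows


# Shortened sample: only two data rows, so the size line says "2 6".
sample = io.StringIO(
    "%%MatrixMarket matrix array real general\n"
    "%This file contains item ids matching the ratings.\n"
    "2 6\n"
    "1 1243 424 2641 2109 1557\n"
    "2 2641 1548 1227 548 76\n"
)
shape, rows = read_row_oriented_mm(sample)
```

For a real file, an open file object can be passed in place of the `io.StringIO` sample, e.g. `with open("smallnetflix_mm.ids") as f: shape, rows = read_row_oriented_mm(f)`.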
I might be missing something here though. Thanks for the tool :).