Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add reference output #39

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

add reference output #39

wants to merge 10 commits into from

Conversation

willow-ahrens
Copy link
Collaborator

@willow-ahrens willow-ahrens commented Oct 30, 2023

This PR adds reference output for binsparse matrices and vectors @BenBrock @ivirshup. fixes #38

@github-actions
Copy link

github-actions bot commented Oct 30, 2023

Automated Review URLs

@BenBrock
Copy link
Contributor

Hi @willow-ahrens, just took a look and ran into two issues with my parser:

  1. My parser reads all the binsparse matrices as having shape [0, 0], which I don't think is correct. Not sure if this is a parsing issue or an error in the files.
  2. There's no "version" key in the files, which my parser expects. Maybe we can add a "version" key equal to 0.1?

@willow-ahrens
Copy link
Collaborator Author

I think it should be fixed now!

@BenBrock
Copy link
Contributor

I'm still not seeing a version in the HDF5, which is causing my parser to fail. Here's what I see in hdf5dump for the binsparse JSON in /examples/reference/b1_ss_COO/b1_ss_COO.bsp.h5:

   ATTRIBUTE "binsparse" {
      DATATYPE  H5T_STRING {
         STRSIZE 328;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "{
               "binsparse": {
                   "format": "COOR",
                   "fill": true,
                   "shape": [
                       7,
                       7
                   ],
                   "data_types": {
                       "indices_0": "int64",
                       "indices_1": "int64",
                       "values": "float64",
                       "fill_value": "float64"
                   },
                   "attrs": {}
               }
           }
           "
      }
   }

Could we add a version tag with value 1.0? We should also perhaps discuss how version checking should work at some point. (e.g. should a non-tensor parser try to accept everything in [1.0, 2.0) and fail upon encountering something unparsable, or instead only accept something in [1.0, highest_known_nontensor_version]?)

@willow-ahrens
Copy link
Collaborator Author

willow-ahrens commented Apr 23, 2024

oops! I fixed Finch but I didn't regenerate the reference files. I have checked there is a version, it looks like it's currently set to 0.1. Should we set it to 1.0?

Regarding tensor vs. matrix formats, I think that tensors should be v1.0, but tensor parsers should normalize to the matrix format tags whenever possible. If it helps, we could define a key that tells us if there is tensor stuff.

Whether or not there are tensor formats feels orthogonal to 1.0 vs 2.0. We may one day want to upgrade matrix version from 1.0 to 2.0, but that shouldn't have to mean the upgraded version needs to support tensors.

@BenBrock
Copy link
Contributor

I can see the version there now, but it's stored as a string, not a number ("1.0" vs. 1.0), which creates some problems for my parser. Could you update again?

That makes sense about adding a key for the tensor extensions. I added a note to the agenda for our next meeting about that.

@willow-ahrens
Copy link
Collaborator Author

Are we sure we want to store versions as numbers? you mean a floating point number?

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Apr 23, 2024 via email

@DrTimothyAldenDavis
Copy link
Member

DrTimothyAldenDavis commented Apr 23, 2024 via email

@BenBrock
Copy link
Contributor

Okay, so it sounds like we're going with a string version, where the string must satisfy the regex ^\d+.\d+$. [illustrative examples]

e.g., there's a major and minor version number only.

@BenBrock
Copy link
Contributor

@willow-ahrens I can read the binsparse files successfully now, but I have a couple of issues with the Matrix Market files. Would it be possible to just use the raw files from SuiteSparse Matrix Collection? My parser doesn't work on the tensor object type you use in these files, which seems to swap the row and column index.

@willow-ahrens
Copy link
Collaborator Author

I'll try to fix my matrixmarket today

@willow-ahrens
Copy link
Collaborator Author

@BenBrock I investigated this, and updated the matrixmarket writer to use matrix when applicable. Regarding row and column swapping, these mtx files match the index ordering used in the original files, so I don't think that's the issue?

@BenBrock
Copy link
Contributor

BenBrock commented May 16, 2024

Thanks, I can read the Matrix Market files, although I do still have a little problem with the rows/columns, which appear to me to be swapped in the Binsparse files.

Here's b1_ss.mtx as an example:

%%MatrixMarket matrix coordinate real general
7 7 15
5 1 -0.03599942
6 1 -0.0176371
7 1 -0.007721779
1 2 1.0
2 2 -1.0
1 3 1.0
3 3 -1.0
1 4 1.0
4 4 -1.0
2 5 0.45
5 5 1.0
3 6 0.1
6 6 1.0
*4 7 0.45* <- Let's look at this stored value as an example.
7 7 1.0

There's a stored value at (4, 7) one-indexed. That's (3, 6) zero-indexed, so that's what we should see in Binsparse. If I h5dump the included b1_ss_COO.bsp.h5 to compare, I see the following indices_0 (row indices) and indices_1 (column indices):

   DATASET "indices_0" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
      DATA {
      (0): 1, 2, 3, 1, 4, 2, 5, 3, *6*, 0, 4, 0, 5, 0, 6
      }
   }
   DATASET "indices_1" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
      DATA {
      (0): 0, 0, 0, 1, 1, 2, 2, 3, *3*, 4, 4, 5, 5, 6, 6
      }
   }

I've highlighted our indices with *, which are stored at indices_0[8] and indices_1[8]. For COOR, indices_0 holds row indices and indices_1 holds column indices, which means the Binsparse file stored (6, 3), not (3, 6) as it should be.

Not sure exactly where things are going wrong---maybe you're defaulting to COOC for COO instead of COOR?

@willow-ahrens
Copy link
Collaborator Author

willow-ahrens commented May 16, 2024

Ah, I see. I think I caught it, it was on my side, the writer for COO was mixing up the numbering of the levels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add reference output
3 participants