CSHARP-5202: BSON Binary Vector Subtype Support
BorisDog committed Feb 4, 2025
1 parent ed795aa commit 166824a
Showing 25 changed files with 2,187 additions and 129 deletions.
58 changes: 58 additions & 0 deletions specifications/bson-binary-vector/tests/README.md
@@ -0,0 +1,58 @@
# Testing Binary subtype 9: Vector

The JSON files in this directory tree are platform-independent tests that drivers can use to prove their conformance to
the specification.

These tests focus on the roundtrip of the list of numbers as input/output, along with their data type and byte padding.

Additional tests exist in `bson-corpus/tests/binary.json`, but they do not sufficiently test the end-to-end process of
converting a Vector to BSON. For this reason, drivers must create a bespoke test runner for the vector subtype.

## Format

The test data corpus consists of a JSON file for each data type (dtype). Each file contains a number of test cases,
under the top-level key "tests". Each test case pertains to a single vector. The keys provide the specification of the
vector. Valid cases also include the Canonical BSON format of a document {test_key: binary}. The "test_key" is common,
and specified at the top level.
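
As a rough illustration of the layout behind `canonical_bson`, the sketch below splits one of these single-field
documents into its parts. It assumes the standard BSON framing (little-endian int32 lengths, element type 0x05 for
binary) and that the first two payload bytes are the dtype and the padding, as the test data in this directory suggests.
`parse_vector_document` is a hypothetical helper name, not part of any driver.

```python
import struct


def parse_vector_document(canonical_bson_hex):
    """Split a one-field BSON document holding a subtype 9 binary into its parts."""
    raw = bytes.fromhex(canonical_bson_hex)
    (doc_len,) = struct.unpack_from("<i", raw, 0)      # total document length
    if doc_len != len(raw) or raw[-1] != 0:
        raise ValueError("malformed BSON document")
    if raw[4] != 0x05:                                 # 0x05 marks a binary element
        raise ValueError("expected a binary element")
    key_end = raw.index(b"\x00", 5)                    # the key is a NUL-terminated string
    key = raw[5:key_end].decode("utf-8")
    (bin_len,) = struct.unpack_from("<i", raw, key_end + 1)
    subtype = raw[key_end + 5]
    payload = raw[key_end + 6:key_end + 6 + bin_len]
    if subtype != 0x09 or len(payload) < 2:
        raise ValueError("expected a subtype 9 binary with dtype and padding bytes")
    # Assumed payload layout: dtype byte, padding byte, then the vector data.
    return {"key": key, "dtype": payload[0], "padding": payload[1], "data": payload[2:]}


parts = parse_vector_document("1600000005766563746F7200040000000903007F0700")
print(parts)
```

For the "Simple Vector INT8" case used here, the key is `vector`, the dtype byte is 0x03, the padding is 0, and the
data bytes are 0x7F and 0x07.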

#### Top level keys

Each JSON file contains three top-level keys.

- `description`: human-readable description of what is in the file
- `test_key`: name used for key when encoding/decoding a BSON document containing the single BSON Binary for the test
case. Applies to *every* case.
- `tests`: array of test case objects, each of which has the following keys. Valid cases will also contain additional
binary and json encoding values.

#### Keys of individual test cases

- `description`: string describing the test.
- `valid`: boolean indicating if the vector, dtype, and padding should be considered a valid input.
- `vector`: list of numbers
- `dtype_hex`: string defining the data type in hex (e.g. "0x10", "0x27")
- `dtype_alias`: (optional) string naming the data type (dtype), perhaps as an Enum member.
- `padding`: (optional) integer for byte padding. Defaults to 0.
- `canonical_bson`: (required if valid is true) an (uppercase) big-endian hex representation of a BSON byte string.
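
A minimal sketch of how a bespoke runner might read one of these files and apply the defaults listed above. The
function name `load_cases` and the normalized field names are hypothetical. Treating the strings `"inf"` and `"-inf"`
as floats is an assumption based on the FLOAT32 test file in this directory, not something this README states.

```python
import json


def load_cases(text):
    """Yield normalized test cases from the JSON text of one test file."""
    suite = json.loads(text)
    for case in suite["tests"]:
        yield {
            "description": case["description"],
            "test_key": suite["test_key"],              # shared by every case in the file
            "valid": case["valid"],
            # Assumption: string entries such as "inf" and "-inf" stand for float values.
            "vector": [float(v) if isinstance(v, str) else v for v in case["vector"]],
            "dtype": int(case["dtype_hex"], 16),        # "0x27" -> 39
            "padding": case.get("padding", 0),          # optional, defaults to 0
            "canonical_bson": case.get("canonical_bson"),  # absent for invalid cases
        }


sample = """
{"description": "sample", "test_key": "vector", "tests": [
  {"description": "Infinity Vector FLOAT32", "valid": true,
   "vector": ["-inf", 0.0, "inf"], "dtype_hex": "0x27"}
]}
"""
for case in load_cases(sample):
    print(case["description"], hex(case["dtype"]), case["padding"], case["vector"])
```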

## Required tests

#### To prove correct in a valid case (`valid: true`), one MUST

- encode a document from the numeric values, dtype, and padding, along with the "test_key", and assert this matches the
canonical_bson string.
- decode the canonical_bson into its binary form, and then assert that the numeric values, dtype, and padding all match
those provided in the JSON.

Note: For floating-point number types, exact numerical matches may not be possible. Drivers that natively support the
floating-point type being tested (e.g., float32 vector values in a driver that natively supports float32) MUST assert
that the input float array is the same after encoding and decoding.
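
The two checks above can be sketched as follows. This is an illustration under stated assumptions, not a driver
implementation: `encode_document`, `decode_document`, and `as_float32` are hypothetical helpers, the payload layout
(dtype byte, padding byte, data) is inferred from the test data, and inputs are assumed to be already validated.
Python has no native float32, so expected values are rounded through float32 before comparison.

```python
import struct

INT8, PACKED_BIT, FLOAT32 = 0x03, 0x10, 0x27  # dtype bytes used by the test files


def encode_document(key, vector, dtype, padding=0):
    """Build the BSON bytes of {key: Binary(subtype 9)} for an already validated vector."""
    if dtype == FLOAT32:
        body = b"".join(struct.pack("<f", v) for v in vector)  # little-endian float32
    elif dtype == INT8:
        body = b"".join(struct.pack("<b", v) for v in vector)  # signed bytes
    elif dtype == PACKED_BIT:
        body = bytes(vector)                                   # unsigned bytes
    else:
        raise ValueError(f"unsupported dtype 0x{dtype:02X}")
    payload = bytes([dtype, padding]) + body
    element = (b"\x05" + key.encode("utf-8") + b"\x00"
               + struct.pack("<i", len(payload)) + b"\x09" + payload)
    return struct.pack("<i", len(element) + 5) + element + b"\x00"


def decode_document(key, raw):
    """Return (vector, dtype, padding) from a document shaped like encode_document's output."""
    start = 4 + 1 + len(key.encode("utf-8")) + 1 + 4 + 1  # skip the framing before the payload
    dtype, padding, body = raw[start], raw[start + 1], raw[start + 2:-1]
    if dtype == FLOAT32:
        vector = [v for (v,) in struct.iter_unpack("<f", body)]
    elif dtype == INT8:
        vector = [v for (v,) in struct.iter_unpack("<b", body)]
    elif dtype == PACKED_BIT:
        vector = list(body)
    else:
        raise ValueError(f"unsupported dtype 0x{dtype:02X}")
    return vector, dtype, padding


def as_float32(values):
    """Round Python floats to float32 precision so they can be compared after decoding."""
    return [struct.unpack("<f", struct.pack("<f", v))[0] for v in values]


canonical = "1C00000005766563746F72000A0000000927006666FF426666F6C000"
encoded = encode_document("vector", [127.7, -7.7], FLOAT32, 0)
assert encoded.hex().upper() == canonical                        # encode check
vector, dtype, padding = decode_document("vector", bytes.fromhex(canonical))
assert (dtype, padding) == (FLOAT32, 0)                          # decode check
assert vector == as_float32([127.7, -7.7])                       # compare at float32 precision
```

Comparing against `as_float32(...)` rather than the raw JSON numbers is one way to handle values such as 127.7 that
float32 cannot represent exactly.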

#### To prove correct in an invalid case (`valid: false`), one MUST

- raise an exception when attempting to encode a document from the numeric values, dtype, and padding.
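
One possible shape for the checks that lead to such an exception is sketched below. The rules are inferred from the
invalid cases in this directory's JSON files (value ranges, non-integer inputs, padding limits) and may not be
exhaustive. `validate_vector` and `rejects` are hypothetical names.

```python
INT8, PACKED_BIT, FLOAT32 = 0x03, 0x10, 0x27  # dtype bytes used by the test files


def validate_vector(vector, dtype, padding=0):
    """Raise ValueError if the vector, dtype, and padding cannot be encoded."""
    if dtype not in (INT8, PACKED_BIT, FLOAT32):
        raise ValueError(f"unknown dtype 0x{dtype:02X}")
    if dtype != PACKED_BIT and padding != 0:
        raise ValueError("padding is only meaningful for PACKED_BIT")
    if dtype == PACKED_BIT:
        if not 0 <= padding <= 7:
            raise ValueError("padding must be between 0 and 7")
        if padding and not vector:
            raise ValueError("padding given without any vector data")
    if dtype in (INT8, PACKED_BIT):
        low, high = (-128, 127) if dtype == INT8 else (0, 255)
        for value in vector:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{value!r} is not an integer")
            if not low <= value <= high:
                raise ValueError(f"{value} is outside [{low}, {high}]")


def rejects(vector, dtype, padding=0):
    """Return True when validate_vector raises, which is what an invalid case expects."""
    try:
        validate_vector(vector, dtype, padding)
    except ValueError:
        return True
    return False


print(rejects([128], INT8))              # overflow for INT8
print(rejects([127, 7], PACKED_BIT, 3))  # valid: padding is allowed for PACKED_BIT
```

A runner would call the validation (or the encoder that wraps it) for every case with `valid: false` and fail the
test if no exception is raised.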

## FAQ

- What MongoDB Server version does this apply to?
  - Files in the "specifications" repository have no version scheme. They are not tied to a MongoDB server version.
51 changes: 51 additions & 0 deletions specifications/bson-binary-vector/tests/float32.json
@@ -0,0 +1,51 @@
{
"description": "Tests of Binary subtype 9, Vectors, with dtype FLOAT32",
"test_key": "vector",
"tests": [
{
"description": "Simple Vector FLOAT32",
"valid": true,
"vector": [127.0, 7.0],
"dtype_hex": "0x27",
"dtype_alias": "FLOAT32",
"padding": 0,
"canonical_bson": "1C00000005766563746F72000A0000000927000000FE420000E04000"
},
{
"description": "Vector with decimals and negative value FLOAT32",
"valid": true,
"vector": [127.7, -7.7],
"dtype_hex": "0x27",
"dtype_alias": "FLOAT32",
"padding": 0,
"canonical_bson": "1C00000005766563746F72000A0000000927006666FF426666F6C000"
},
{
"description": "Empty Vector FLOAT32",
"valid": true,
"vector": [],
"dtype_hex": "0x27",
"dtype_alias": "FLOAT32",
"padding": 0,
"canonical_bson": "1400000005766563746F72000200000009270000"
},
{
"description": "Infinity Vector FLOAT32",
"valid": true,
"vector": ["-inf", 0.0, "inf"],
"dtype_hex": "0x27",
"dtype_alias": "FLOAT32",
"padding": 0,
"canonical_bson": "2000000005766563746F72000E000000092700000080FF000000000000807F00"
},
{
"description": "FLOAT32 with padding",
"valid": false,
"vector": [127.0, 7.0],
"dtype_hex": "0x27",
"dtype_alias": "FLOAT32",
"padding": 3
}
]
}

57 changes: 57 additions & 0 deletions specifications/bson-binary-vector/tests/int8.json
@@ -0,0 +1,57 @@
{
"description": "Tests of Binary subtype 9, Vectors, with dtype INT8",
"test_key": "vector",
"tests": [
{
"description": "Simple Vector INT8",
"valid": true,
"vector": [127, 7],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
"padding": 0,
"canonical_bson": "1600000005766563746F7200040000000903007F0700"
},
{
"description": "Empty Vector INT8",
"valid": true,
"vector": [],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
"padding": 0,
"canonical_bson": "1400000005766563746F72000200000009030000"
},
{
"description": "Overflow Vector INT8",
"valid": false,
"vector": [128],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
"padding": 0
},
{
"description": "Underflow Vector INT8",
"valid": false,
"vector": [-129],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
"padding": 0
},
{
"description": "INT8 with padding",
"valid": false,
"vector": [127, 7],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
"padding": 3
},
{
"description": "INT8 with float inputs",
"valid": false,
"vector": [127.77, 7.77],
"dtype_hex": "0x03",
"dtype_alias": "INT8",
"padding": 0
}
]
}

98 changes: 98 additions & 0 deletions specifications/bson-binary-vector/tests/packed_bit.json
@@ -0,0 +1,98 @@
{
"description": "Tests of Binary subtype 9, Vectors, with dtype PACKED_BIT",
"test_key": "vector",
"tests": [
{
"description": "Padding specified with no vector data PACKED_BIT",
"valid": false,
"vector": [],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 1
},
{
"description": "Simple Vector PACKED_BIT",
"valid": true,
"vector": [127, 7],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 0,
"canonical_bson": "1600000005766563746F7200040000000910007F0700"
},
{
"description": "Empty Vector PACKED_BIT",
"valid": true,
"vector": [],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 0,
"canonical_bson": "1400000005766563746F72000200000009100000"
},
{
"description": "PACKED_BIT with padding",
"valid": true,
"vector": [127, 7],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 3,
"canonical_bson": "1600000005766563746F7200040000000910037F0700"
},
{
"description": "Overflow Vector PACKED_BIT",
"valid": false,
"vector": [256],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 0
},
{
"description": "Underflow Vector PACKED_BIT",
"valid": false,
"vector": [-1],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 0
},
{
"description": "Vector with float values PACKED_BIT",
"valid": false,
"vector": [127.5],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 0
},
{
"description": "Padding specified with no vector data PACKED_BIT",
"valid": false,
"vector": [],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 1
},
{
"description": "Exceeding maximum padding PACKED_BIT",
"valid": false,
"vector": [1],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 8
},
{
"description": "Negative padding PACKED_BIT",
"valid": false,
"vector": [1],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": -1
},
{
"description": "Vector with float values PACKED_BIT",
"valid": false,
"vector": [127.5],
"dtype_hex": "0x10",
"dtype_alias": "PACKED_BIT",
"padding": 0
}
]
}

30 changes: 30 additions & 0 deletions specifications/bson-corpus/tests/binary.json
@@ -74,6 +74,36 @@
"description": "$type query operator (conflicts with legacy $binary form with $type field)",
"canonical_bson": "180000000378001000000010247479706500020000000000",
"canonical_extjson": "{\"x\" : { \"$type\" : {\"$numberInt\": \"2\"}}}"
},
{
"description": "subtype 0x09 Vector FLOAT32",
"canonical_bson": "170000000578000A0000000927000000FE420000E04000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"JwAAAP5CAADgQA==\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector INT8",
"canonical_bson": "11000000057800040000000903007F0700",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"AwB/Bw==\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector PACKED_BIT",
"canonical_bson": "11000000057800040000000910007F0700",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"EAB/Bw==\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector (Zero-length) FLOAT32",
"canonical_bson": "0F0000000578000200000009270000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"JwA=\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector (Zero-length) INT8",
"canonical_bson": "0F0000000578000200000009030000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"AwA=\", \"subType\": \"09\"}}}"
},
{
"description": "subtype 0x09 Vector (Zero-length) PACKED_BIT",
"canonical_bson": "0F0000000578000200000009100000",
"canonical_extjson": "{\"x\": {\"$binary\": {\"base64\": \"EAA=\", \"subType\": \"09\"}}}"
}
],
"decodeErrors": [
4 changes: 4 additions & 0 deletions src/MongoDB.Bson/ObjectModel/BsonBinarySubType.cs
@@ -60,6 +60,10 @@ public enum BsonBinarySubType
/// </summary>
Sensitive = 0x08,
/// <summary>
/// Vector data.
/// </summary>
Vector = 0x09,
/// <summary>
/// User defined binary data.
/// </summary>
UserDefined = 0x80