
Operator set wave 3 #805

Draft: wants to merge 7 commits into main
Conversation

@fdwr (Collaborator) commented on Jan 16, 2025:

Adds the following operators, per #375:

Some TODOs remain:

  • finishing algorithm steps
  • examples for the gather/scatter operators
  • data type tables
partial interface MLGraphBuilder
{
    ...
    MLOperand cumulativeSum(MLOperand input, unsigned long axis, optional MLCumulativeSumOptions options = {});
    MLOperand sign(MLOperand input, optional MLOperatorOptions options = {});
    MLOperand tile(MLOperand input, sequence<unsigned long> repetitions, optional MLOperatorOptions options = {});

    // Extends the family beyond the existing gather.
    MLOperand gatherElements(MLOperand input, MLOperand indices, optional MLGatherOptions options = {});
    MLOperand scatterElements(MLOperand input, MLOperand indices, MLOperand updates, optional MLScatterOptions options = {});
    MLOperand gatherND(MLOperand input, MLOperand indices, optional MLOperatorOptions options = {});
    MLOperand scatterND(MLOperand input, MLOperand indices, MLOperand updates, optional MLOperatorOptions options = {});

    MLOperand dequantizeLinear(MLOperand input, MLOperand scale, MLOperand zeroPoint, optional MLOperatorOptions options = {});
    MLOperand quantizeLinear(MLOperand input, MLOperand scale, MLOperand zeroPoint, optional MLOperatorOptions options = {});

    MLOperand logicalAnd(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
    MLOperand logicalOr(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
    MLOperand logicalXor(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
    MLOperand notEqual(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});

    MLOperand reverse(MLOperand input, optional MLReverseOptions options = {});
    MLOperand slice(
        MLOperand input,
        sequence<[EnforceRange] unsigned long> starts,
        sequence<[EnforceRange] unsigned long> sizes,
        optional MLSliceOptions options = {} // Now includes strides
    );
    ...
}
dictionary MLCumulativeSumOptions : MLOperatorOptions
{
    boolean exclusive = false; // Exclusive prefix sum (exclude the current element) rather than the default inclusive prefix sum. https://en.wikipedia.org/wiki/Prefix_sum
    boolean reversed = false; // Reverse the summation direction
};

// Already exists for `gather`. Reuse for `gatherElements` too.
dictionary MLGatherOptions : MLOperatorOptions
{
    unsigned long axis = 0;
};

dictionary MLScatterOptions : MLOperatorOptions
{
    unsigned long axis = 0;
};

dictionary MLReverseOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> axes;
};

dictionary MLSliceOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> strides;
};
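
For concreteness, here is a plain-JavaScript sketch of how the proposed methods might be invoked, assuming `builder` is an MLGraphBuilder and `input`, `indices`, `updates`, `scale`, and `zeroPoint` are MLOperands created elsewhere (illustrative only; shapes and option values are made up):

// Hypothetical usage of the proposed wave-3 builder methods.
const cumsum = builder.cumulativeSum(input, /*axis*/ 0, {exclusive: true});
const signs = builder.sign(input);
const tiled = builder.tile(input, [2, 1]); // repeat twice along axis 0
const gathered = builder.gatherElements(input, indices, {axis: 1});
const scattered = builder.scatterElements(input, indices, updates, {axis: 1});
const dequant = builder.dequantizeLinear(input, scale, zeroPoint);
const sliced = builder.slice(input, [0, 0], [2, 2], {strides: [1, 2]});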


@inexorabletash (Member) commented:
Another "TODO" - the new ops need "constraints" tables

@fdwr changed the title from "Opset wave3" to "Operator set wave 3" on Jan 16, 2025
@inexorabletash (Member) commented:
Note from discussion w/ @a-sully - CoreML has restrictions on the dequantize op that we'll need to think about.

  • scale and bias must be constant
  • the input must be int8 or uint8 (note: CoreML has a different operator for (u)int4, but it requires everything - including the input - to be constant)
  • scale and bias must be either scalars or 1D
  • scale and bias must be the same rank (so, both scalars or both 1D)
  • scale must be the same data type as the output (note: the existing WPTs appear to assert the scale is always float32 (and the description of that test appears to have a typo))
  • scale must be positive

Re-emphasizing that dequantizing (u)int4 in CoreML is extremely limited (input must be const). @mwyrzykowski - any thoughts about how we can handle the proposed ops efficiently?
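
For reference, the per-element math is simple in the case CoreML supports most broadly (scalar scale and zeroPoint); a plain-JavaScript sketch of the semantics, not of any particular lowering:

// Illustrative dequantizeLinear semantics with scalar scale/zeroPoint:
// output = (input - zeroPoint) * scale, element-wise.
function dequantizeLinear(input /* Int8Array */, scale, zeroPoint) {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; ++i) {
    output[i] = (input[i] - zeroPoint) * scale;
  }
  return output;
}

dequantizeLinear(new Int8Array([2, 4, 6]), 0.5, 2); // Float32Array [0, 1, 2]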

@inexorabletash (Member) left a review comment:
Initial pass.

<dl dfn-type=dict-member dfn-for=MLCumulativeSumOptions>
: <dfn>exclusive</dfn>
::
Whether to include or exclude the current value in the output, meaning inclusive presum addition (see https://en.wikipedia.org/wiki/Prefix_sum) or exclusive post-sum addition. Given input *[1,2,3,4]*, inclusive addition would yield an output of *[1,3,6,10]* whereas exclusive would yield *[0,1,3,6]*. The default is inclusive.
A Member commented:

Inconsistent between "presum" vs. "post-sum"

A Member replied:

How about "inclusive prefix sum addition" and "exclusive prefix sum addition" (or w/o the "addition" part)?

Suggested change:
- Whether to include or exclude the current value in the output, meaning inclusive presum addition (see https://en.wikipedia.org/wiki/Prefix_sum) or exclusive post-sum addition. Given input *[1,2,3,4]*, inclusive addition would yield an output of *[1,3,6,10]* whereas exclusive would yield *[0,1,3,6]*. The default is inclusive.
+ Whether to include or exclude the current value in the output, meaning inclusive prefix sum addition or exclusive prefix sum addition [[Prefix-sum]]. Given input *[1,2,3,4]*, inclusive addition would yield an output of *[1,3,6,10]* whereas exclusive would yield *[0,1,3,6]*. The default is inclusive.

Here's the biblio entry to be added at the end of the <pre class="biblio"> block:

+  },
+  "Prefix-Sum": {
+    "href": "https://en.wikipedia.org/wiki/Prefix_sum",
+    "title": "Prefix Sum",
+    "authors": ["The Wikipedia community"],
+    "date": "January 2025"
   }
 }
 </pre>

The Wikipedia article seems quite good. I believe this operation has been known for centuries(?), so hard to hunt down the canonical reference :-)

The reason why it is good to keep the spec references in their own section is to help assess their stability, licensing etc. during certain transitions. Especially important for normative references: https://www.w3.org/guide/process/tilt/normative-references.html


: <dfn>reversed</dfn>
::
Whether to reverse the summation direction along the active axis to instead start from the high coordinate to low coordinate. Given input *[1,2,3,4]*, inclusive forward addition would yield an output of *[1,3,6,10]* whereas backward summation would yield *[10,9,7,4]*. The default is exclusive.
A Member commented:

"The default is exclusive" - should be "The default is forward." ?

Also, inconsistent phrasing "inclusive forward addition" vs. "backward summation"
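
For concreteness, a plain-JavaScript sketch (illustrative only, not spec text) reproducing the examples from the two descriptions above:

// Illustrative cumulativeSum semantics on a 1-D array.
function cumulativeSum(input, {exclusive = false, reversed = false} = {}) {
  const n = input.length;
  const output = new Array(n);
  let sum = 0;
  for (let k = 0; k < n; ++k) {
    const i = reversed ? n - 1 - k : k; // walk high-to-low when reversed
    if (exclusive) { output[i] = sum; sum += input[i]; }
    else { sum += input[i]; output[i] = sum; }
  }
  return output;
}

cumulativeSum([1, 2, 3, 4]);                    // [1, 3, 6, 10] (inclusive)
cumulativeSum([1, 2, 3, 4], {exclusive: true}); // [0, 1, 3, 6]
cumulativeSum([1, 2, 3, 4], {reversed: true});  // [10, 9, 7, 4]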

- *greater*: Compare if the values of the first input tensor is greater, element-wise.
- *greaterOrEqual*: Compare if the values of the first input tensor is greater or equal, element-wise.
- *lesser*: Compare if the values of the first input tensor is lesser, element-wise.
- *lesserOrEqual*: Compare if the values of the first input tensor is lesser or equal, element-wise.
- *logicalNot*: Invert the values of the input tensor to values 0 or 1, element-wise. Specifically, when the input value is non-zero, invert it to 0. Conversely, for a zero input value, invert it to 1.
- *logicalAnd*: Compute the logical *and* operator, element-wise, treating any non-zero value as true and returning elements of 0 or 1.
A Member commented:

For consistency, how about "of the two input tensors" instead of "operator" ?

MLBinarySupportLimits greater;
MLBinarySupportLimits greaterOrEqual;
MLBinarySupportLimits lesser;
MLBinarySupportLimits lesserOrEqual;
MLLogicalNotSupportLimits logicalNot;
MLLogicalNotSupportLimits logicalAnd;
MLLogicalNotSupportLimits logicalOr;
MLLogicalNotSupportLimits logicalXor;
};
</script>

A Member commented:

Need to add the new ops to the <div dfn-for... just below this line.

This ensures the arguments in the IDL get linked up appropriately.

@@ -3482,6 +3630,13 @@ partial dictionary MLOpSupportLimits {
1. Return |output|.
</div>

<div algorithm>
The <dfn method for=MLGraphBuilder>sign(|input|, |options|)</dfn> method steps are:
1. Let |output| be the result of running the [=MLGraphBuilder/element-wise-unary-op | create element-wise unary operation=] given "sign", |input|, signed types « {{MLOperandDataType/"float32"}}, {{MLOperandDataType/"float16"}}, {{MLOperandDataType/"int32"}}, {{MLOperandDataType/"int8"}} », and |options|.
A Member commented:

Nit: We don't include flavor text like "signed types" before the list of supported types elsewhere.

index.bs Outdated
### dequantizeLinear ### {#api-mlgraphbuilder-dequantizelinear}
Dequantizes an integer tensor to floating point space using the scale and zero-point bias, where `output = (input - zeroPoint) * scale`.

The operation will be [=broadcast=] according to [[!numpy-broadcasting-rule]]. The input tensors must be [=bidirectionally broadcastable=]. The [=MLOperand/rank=] of the output tensor is the maximum [=MLOperand/rank=] of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors, and each dimension must be blockwise compatible with the output (e.g. given an input shape [12], scales of the following shapes are blockwise compatible {[1], [3], [4], [6], [12]} as they are all multiples of the input dimensions, but a shape of [5] would not be).
A Member commented:

Maybe introduce "blockwise compatible" as a definition, to avoid duplicating it between dequantizeLinear() and quantizeLinear()?
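
If a definition is added, one possible shape for it, sketched as a plain-JavaScript predicate (the name is made up, not from the spec): after right-aligning the shapes, every scale dimension must evenly divide the corresponding input dimension.

// Illustrative "blockwise compatible" check: input [12] accepts scale shapes
// [1], [3], [4], [6], or [12], but rejects [5].
function isBlockwiseCompatible(inputShape, scaleShape) {
  const offset = inputShape.length - scaleShape.length;
  if (offset < 0) return false;
  return scaleShape.every((size, i) => inputShape[offset + i] % size === 0);
}

isBlockwiseCompatible([12], [4]); // true
isBlockwiseCompatible([12], [5]); // false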

index.bs Outdated
1. [=list/For each=] |size| of |shapeInput|:
1. If |dimCount| is less than or equal to |axis| then [=iteration/continue=].
1. Set |shapeOutput|[|rankOutput| + |dimCount| - |axis| - 1] to |size|.
1. Increment |dimCount| by one.
A Member commented:

This dimCount isn't used after it is calculated?

(Pre-existing issue!)

index.bs Outdated
@@ -7878,6 +8674,45 @@ partial dictionary MLOpSupportLimits {
</details>
</div>

### tile ### {#api-mlgraphbuilder-tile}
Repeat a tensor the number of times along each dimension.
A Member commented:

Wording "a number" rather than "the number" ?

index.bs Outdated
<div dfn-for="MLGraphBuilder/tile(input, options)" dfn-type=argument>
**Arguments:**
- <dfn>input</dfn>: an {{MLOperand}}. The input N-D tensor.
- <dfn>repetitions</dfn>: A count per each dimension of how many times to repeat that dimension. The repetitions count must match the input rank, using 1's for any axis that should retain the same size.
A Member commented:

Wording: "per each" seems redundant. Either "count per dimension" or "count for each dimension" ?

@@ -8537,6 +9372,8 @@ Operations present in other neural network inference APIs can often be emulated

<p class="note">{{Float16Array}} is at <a href="https://tc39.es/process-document/">ECMA Stage 3</a> signaling its design is finished. Implementers wanting to enable this type ahead native implementations can emulate the type by passing raw bits via {{Uint16Array}}. <a href="https://github.com/webmachinelearning/webnn/issues/373">[Issue webnn#373]</a></p>

<p class="note">There is no Uint4Array/Int4Array class. Nybbles are stored in byte arrays of {{Uint8Array}} with the lower nybble in the lower bits, meaning tensor element 0 would be found in byte 0 bits 0-3, and tensor element 1 in byte 0 bits 4-7 (and so on, with tensor element 5 in byte 2 bits 4-7). Odd tensor element counts are rounded up to whole bytes, and last nybble is ignored, meaning a 5 element tensor uses 3 bytes.</a></p>
A Member commented:

I think "nibble" is the preferred spelling.

Also, the examples may be a bit confusing, as the first example mentions tensor element 5 (so, a 6-or-more element tensor) which would pack that element's bits into the upper nibble, whereas the second example mentions a 5 element tensor which would pack the last element's bits into the lower nibble. Maybe intentionally pick a number other than 5 for the first or second example?
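
To make the packing rule concrete, a plain-JavaScript sketch (illustrative only):

// Illustrative uint4 packing as described: element 2k goes in the low nibble
// of byte k, element 2k+1 in the high nibble; odd element counts round up to
// a whole byte and the final high nibble is ignored.
function packUint4(values) {
  const bytes = new Uint8Array(Math.ceil(values.length / 2));
  values.forEach((v, i) => { bytes[i >> 1] |= (v & 0xF) << ((i & 1) * 4); });
  return bytes;
}

function unpackUint4(bytes, count) {
  return Array.from({length: count}, (_, i) => (bytes[i >> 1] >> ((i & 1) * 4)) & 0xF);
}

packUint4([1, 2, 3, 4, 5]); // Uint8Array [0x21, 0x43, 0x05] (5 elements use 3 bytes)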

A Member commented:

The definitions of byte length and validate buffer with descriptor should get updated. Probably more places too.

A Member commented:

Add to MLOperandDataType enum too.

A Member commented:

Also the definition of the cast() operator and the cast algorithm - even just to call out that uint4 is not permitted.

<tr>
<td>{{scale}}</td>
<td>{{MLOperandDataType/"float32"}}, {{MLOperandDataType/"float16"}}</td>
<td>0 to {{input}}'s [=MLOperand/rank=]</td>
@fdwr (Collaborator, PR author) commented:

To reduce testing complexity somewhat, we could require that scale and zeroPoint be explicitly reshaped to the same rank as the input ahead of time? That would also nicely resolve oddities in how callers express axes ahead of time 🤔.

@inexorabletash (Member) replied on Jan 17, 2025:

Reviewing other ops, requiring scale and zeroPoint to be the same rank as the input would be consistent. Easy enough for frameworks to inject a reshape().

If someone was manually coding against the API then supporting scalars would be convenient, but I don't think we should prioritize that.
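
A sketch of that framework-side fixup (hypothetical helper; builder.reshape() is the existing WebNN method, and the operand's shape is assumed to be known to the framework):

// Illustrative: right-align a lower-rank scale/zeroPoint with the input by
// reshaping it to the input's rank, padding leading dimensions with 1s.
function expandToRank(builder, operand, operandShape, targetRank) {
  const padded = new Array(targetRank - operandShape.length).fill(1).concat(operandShape);
  return builder.reshape(operand, padded);
}

// e.g. a 1-D scale of shape [3] against a rank-3 input becomes shape [1, 1, 3].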

<tr>
<td>{{indices}}</td>
<td>{{MLOperandDataType/"int32"}}, {{MLOperandDataType/"uint32"}}, {{MLOperandDataType/"int64"}}</td>
<td>&gt; 1</td>
@fdwr (Collaborator, PR author) commented on Jan 17, 2025:

I don't see other occurrences of > 1. Should I use 2 to N instead?

A Member replied:

"2 to N" I guess?

While "> 1" is compact and readable, making it "2 to N" is probably what we should do, since N is a defined term and we may allow implementations to define a maximum N.

Please take a peek at the definition of "allowed ranks" and reword if you think it's needed. It already mentions ranges, so might be okay.
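
Relatedly, since the PR's TODO list still calls for gather/scatter examples, here is a minimal illustrative sketch of gatherElements semantics in the 1-D case (not the spec's eventual example): output[i] = input[indices[i]].

// Illustrative gatherElements along axis 0 for 1-D arrays.
function gatherElements1d(input, indices) {
  return indices.map((index) => input[index]);
}

gatherElements1d([10, 20, 30, 40], [3, 0, 2]); // [40, 10, 30]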

@fdwr (Collaborator, PR author) commented on Jan 17, 2025:

Another "TODO" - the new ops need "constraints" tables

Added data type tables.

Initial pass.

Thanks - will address more tomorrow after the weekend.

@inexorabletash (Member) commented on Jan 17, 2025:

Add "Resolves #779" to the summary, so that issue will get linked to this PR and auto-closed when this merges?

... and "Resolves #773"
... and "Resolves #772"
... and "Resolves #467"
... and "Resolves #93" (I think?)

Maybe #767 too
