
Operator set wave 3 #805

Draft: wants to merge 7 commits into main
Conversation

@fdwr (Collaborator) commented on Jan 16, 2025:

Adds the following operators, per #375:

Some TODOs remain:

  • finishing algorithm steps
  • examples for the gather/scatter operators
  • data type tables
partial interface MLGraphBuilder
{
    ...
    MLOperand cumulativeSum(MLOperand input, unsigned long axis, optional MLCumulativeSumOptions options = {});
    MLOperand sign(MLOperand input, optional MLOperatorOptions options = {});
    MLOperand tile(MLOperand input, sequence<unsigned long> repetitions, optional MLOperatorOptions options = {});

    // Extends the family beyond the existing gather.
    MLOperand gatherElements(MLOperand input, MLOperand indices, optional MLGatherOptions options = {});
    MLOperand scatterElements(MLOperand input, MLOperand indices, MLOperand updates, optional MLScatterOptions options = {});
    MLOperand gatherND(MLOperand input, MLOperand indices, optional MLOperatorOptions options = {});
    MLOperand scatterND(MLOperand input, MLOperand indices, MLOperand updates, optional MLOperatorOptions options = {});

    MLOperand dequantizeLinear(MLOperand input, MLOperand scale, MLOperand zeroPoint, optional MLOperatorOptions options = {});
    MLOperand quantizeLinear(MLOperand input, MLOperand scale, MLOperand zeroPoint, optional MLOperatorOptions options = {});

    MLOperand logicalAnd(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
    MLOperand logicalOr(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
    MLOperand logicalXor(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
    MLOperand notEqual(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});

    MLOperand reverse(MLOperand input, optional MLReverseOptions options = {});
    MLOperand slice(
        MLOperand input,
        sequence<[EnforceRange] unsigned long> starts,
        sequence<[EnforceRange] unsigned long> sizes,
        optional MLSliceOptions options = {} // Now includes strides
    );
    ...
}
dictionary MLCumulativeSumOptions : MLOperatorOptions
{
    boolean exclusive = false; // Exclusive prefix sum (exclude the current element) rather than the default inclusive prefix sum. https://en.wikipedia.org/wiki/Prefix_sum
    boolean reversed = false; // Reverse the summation direction
};

// Already exists for `gather`. Reuse for `gatherElements` too.
dictionary MLGatherOptions : MLOperatorOptions
{
    unsigned long axis = 0;
};

dictionary MLScatterOptions : MLOperatorOptions
{
    unsigned long axis = 0;
};

dictionary MLReverseOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> axes;
};

dictionary MLSliceOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> strides;
};
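
For concreteness, here is a plain-JavaScript sketch of how the proposed methods might be invoked, assuming `builder` is an MLGraphBuilder and `input`, `indices`, `updates`, `scale`, and `zeroPoint` are MLOperands created elsewhere (illustrative only; shapes and option values are made up):

// Hypothetical usage of the proposed wave-3 builder methods.
const cumsum = builder.cumulativeSum(input, /*axis*/ 0, {exclusive: true});
const signs = builder.sign(input);
const tiled = builder.tile(input, [2, 1]); // repeat twice along axis 0
const gathered = builder.gatherElements(input, indices, {axis: 1});
const scattered = builder.scatterElements(input, indices, updates, {axis: 1});
const dequant = builder.dequantizeLinear(input, scale, zeroPoint);
const sliced = builder.slice(input, [0, 0], [2, 2], {strides: [1, 2]});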


@inexorabletash (Member) commented:
Another "TODO" - the new ops need "constraints" tables

@fdwr changed the title from "Opset wave3" to "Operator set wave 3" on Jan 16, 2025
@inexorabletash (Member) commented:
Note from discussion w/ @a-sully - CoreML has restrictions on the dequantize op that we'll need to think about.

  • scale and bias must be constant
  • the input must be int8 or uint8 (note: CoreML has a different operator for (u)int4, but it requires everything - including the input - to be constant)
  • scale and bias must be either scalars or 1D
  • scale and bias must be the same rank (so, both scalars or both 1D)
  • scale must be the same data type as the output (note: the existing WPTs appear to assert the scale is always float32 (and the description of that test appears to have a typo))
  • scale must be positive

Re-emphasizing that dequantizing (u)int4 in CoreML is extremely limited (input must be const). @mwyrzykowski - any thoughts about how we can handle the proposed ops efficiently?
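
For reference, the per-element math is simple in the case CoreML supports most broadly (scalar scale and zeroPoint); a plain-JavaScript sketch of the semantics, not of any particular lowering:

// Illustrative dequantizeLinear semantics with scalar scale/zeroPoint:
// output = (input - zeroPoint) * scale, element-wise.
function dequantizeLinear(input /* Int8Array */, scale, zeroPoint) {
  const output = new Float32Array(input.length);
  for (let i = 0; i < input.length; ++i) {
    output[i] = (input[i] - zeroPoint) * scale;
  }
  return output;
}

dequantizeLinear(new Int8Array([2, 4, 6]), 0.5, 2); // Float32Array [0, 1, 2]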

@inexorabletash (Member) left a review comment:
Initial pass.

<dl dfn-type=dict-member dfn-for=MLCumulativeSumOptions>
: <dfn>exclusive</dfn>
::
Whether to include or exclude the current value in the output, meaning inclusive presum addition (see https://en.wikipedia.org/wiki/Prefix_sum) or exclusive post-sum addition. Given input *[1,2,3,4]*, inclusive addition would yield an output of *[1,3,6,10]* whereas exclusive would yield *[0,1,3,6]*. The default is inclusive.
A Member commented:

Inconsistent between "presum" vs. "post-sum"

A Member replied:

How about "inclusive prefix sum addition" and "exclusive prefix sum addition" (or w/o the "addition" part)?

Suggested change:
- Whether to include or exclude the current value in the output, meaning inclusive presum addition (see https://en.wikipedia.org/wiki/Prefix_sum) or exclusive post-sum addition. Given input *[1,2,3,4]*, inclusive addition would yield an output of *[1,3,6,10]* whereas exclusive would yield *[0,1,3,6]*. The default is inclusive.
+ Whether to include or exclude the current value in the output, meaning inclusive prefix sum addition or exclusive prefix sum addition [[Prefix-sum]]. Given input *[1,2,3,4]*, inclusive addition would yield an output of *[1,3,6,10]* whereas exclusive would yield *[0,1,3,6]*. The default is inclusive.

Here's the biblio entry to be added at the end of the <pre class="biblio"> block:

+  },
+  "Prefix-Sum": {
+    "href": "https://en.wikipedia.org/wiki/Prefix_sum",
+    "title": "Prefix Sum",
+    "authors": ["The Wikipedia community"],
+    "date": "January 2025"
   }
 }
 </pre>

The Wikipedia article seems quite good. I believe this operation has been known for centuries(?), so hard to hunt down the canonical reference :-)

The reason why it is good to keep the spec references in their own section is to help assess their stability, licensing etc. during certain transitions. Especially important for normative references: https://www.w3.org/guide/process/tilt/normative-references.html


: <dfn>reversed</dfn>
::
Whether to reverse the summation direction along the active axis to instead start from the high coordinate to low coordinate. Given input *[1,2,3,4]*, inclusive forward addition would yield an output of *[1,3,6,10]* whereas backward summation would yield *[10,9,7,4]*. The default is exclusive.
A Member commented:

"The default is exclusive" - should be "The default is forward." ?

Also, inconsistent phrasing "inclusive forward addition" vs. "backward summation"
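
For concreteness, a plain-JavaScript sketch (illustrative only, not spec text) reproducing the examples from the two descriptions above:

// Illustrative cumulativeSum semantics on a 1-D array.
function cumulativeSum(input, {exclusive = false, reversed = false} = {}) {
  const n = input.length;
  const output = new Array(n);
  let sum = 0;
  for (let k = 0; k < n; ++k) {
    const i = reversed ? n - 1 - k : k; // walk high-to-low when reversed
    if (exclusive) { output[i] = sum; sum += input[i]; }
    else { sum += input[i]; output[i] = sum; }
  }
  return output;
}

cumulativeSum([1, 2, 3, 4]);                    // [1, 3, 6, 10] (inclusive)
cumulativeSum([1, 2, 3, 4], {exclusive: true}); // [0, 1, 3, 6]
cumulativeSum([1, 2, 3, 4], {reversed: true});  // [10, 9, 7, 4]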

- *greater*: Compare if the values of the first input tensor is greater, element-wise.
- *greaterOrEqual*: Compare if the values of the first input tensor is greater or equal, element-wise.
- *lesser*: Compare if the values of the first input tensor is lesser, element-wise.
- *lesserOrEqual*: Compare if the values of the first input tensor is lesser or equal, element-wise.
- *logicalNot*: Invert the values of the input tensor to values 0 or 1, element-wise. Specifically, when the input value is non-zero, invert it to 0. Conversely, for a zero input value, invert it to 1.
- *logicalAnd*: Compute the logical *and* operator, element-wise, treating any non-zero value as true and returning elements of 0 or 1.
A Member commented:

For consistency, how about "of the two input tensors" instead of "operator" ?

MLBinarySupportLimits greater;
MLBinarySupportLimits greaterOrEqual;
MLBinarySupportLimits lesser;
MLBinarySupportLimits lesserOrEqual;
MLLogicalNotSupportLimits logicalNot;
MLLogicalNotSupportLimits logicalAnd;
MLLogicalNotSupportLimits logicalOr;
MLLogicalNotSupportLimits logicalXor;
};
</script>

A Member commented:

Need to add the new ops to the <div dfn-for... just below this line.

This ensures the arguments in the IDL get linked up appropriately.

@@ -3482,6 +3630,13 @@ partial dictionary MLOpSupportLimits {
1. Return |output|.
</div>

<div algorithm>
The <dfn method for=MLGraphBuilder>sign(|input|, |options|)</dfn> method steps are:
1. Let |output| be the result of running the [=MLGraphBuilder/element-wise-unary-op | create element-wise unary operation=] given "sign", |input|, signed types « {{MLOperandDataType/"float32"}}, {{MLOperandDataType/"float16"}}, {{MLOperandDataType/"int32"}}, {{MLOperandDataType/"int8"}} », and |options|.
A Member commented:

Nit: We don't include flavor text like "signed types" before the list of supported types elsewhere.

index.bs Outdated
### dequantizeLinear ### {#api-mlgraphbuilder-dequantizelinear}
Dequantizes an integer tensor to floating point space using the scale and zero-point bias, where `output = (input - zeroPoint) * scale`.

The operation will be [=broadcast=] according to [[!numpy-broadcasting-rule]]. The input tensors must be [=bidirectionally broadcastable=]. The [=MLOperand/rank=] of the output tensor is the maximum [=MLOperand/rank=] of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors, and each dimension must be blockwise compatible with the output (e.g. given an input shape [12], scales of the following shapes are blockwise compatible {[1], [3], [4], [6], [12]} as they are all multiples of the input dimensions, but a shape of [5] would not be).
A Member commented:

Maybe introduce "blockwise compatible" as a definition, to avoid duplicating it between dequantizeLinear() and quantizeLinear()?
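
If a definition is added, one possible shape for it, sketched as a plain-JavaScript predicate (the name is made up, not from the spec): after right-aligning the shapes, every scale dimension must evenly divide the corresponding input dimension.

// Illustrative "blockwise compatible" check: input [12] accepts scale shapes
// [1], [3], [4], [6], or [12], but rejects [5].
function isBlockwiseCompatible(inputShape, scaleShape) {
  const offset = inputShape.length - scaleShape.length;
  if (offset < 0) return false;
  return scaleShape.every((size, i) => inputShape[offset + i] % size === 0);
}

isBlockwiseCompatible([12], [4]); // true
isBlockwiseCompatible([12], [5]); // false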

index.bs Outdated
1. [=list/For each=] |size| of |shapeInput|:
1. If |dimCount| is less than or equal to |axis| then [=iteration/continue=].
1. Set |shapeOutput|[|rankOutput| + |dimCount| - |axis| - 1] to |size|.
1. Increment |dimCount| by one.
A Member commented:

This dimCount isn't used after it is calculated?

(Pre-existing issue!)

index.bs Outdated
@@ -7878,6 +8674,45 @@ partial dictionary MLOpSupportLimits {
</details>
</div>

### tile ### {#api-mlgraphbuilder-tile}
Repeat a tensor the number of times along each dimension.
A Member commented:

Wording "a number" rather than "the number" ?

index.bs Outdated
<div dfn-for="MLGraphBuilder/tile(input, options)" dfn-type=argument>
**Arguments:**
- <dfn>input</dfn>: an {{MLOperand}}. The input N-D tensor.
- <dfn>repetitions</dfn>: A count per each dimension of how many times to repeat that dimension. The repetitions count must match the input rank, using 1's for any axis that should retain the same size.
A Member commented:

Wording: "per each" seems redundant. Either "count per dimension" or "count for each dimension" ?

@@ -8537,6 +9372,8 @@ Operations present in other neural network inference APIs can often be emulated

<p class="note">{{Float16Array}} is at <a href="https://tc39.es/process-document/">ECMA Stage 3</a> signaling its design is finished. Implementers wanting to enable this type ahead native implementations can emulate the type by passing raw bits via {{Uint16Array}}. <a href="https://github.com/webmachinelearning/webnn/issues/373">[Issue webnn#373]</a></p>

<p class="note">There is no Uint4Array/Int4Array class. Nybbles are stored in byte arrays of {{Uint8Array}} with the lower nybble in the lower bits, meaning tensor element 0 would be found in byte 0 bits 0-3, and tensor element 1 in byte 0 bits 4-7 (and so on, with tensor element 5 in byte 2 bits 4-7). Odd tensor element counts are rounded up to whole bytes, and last nybble is ignored, meaning a 5 element tensor uses 3 bytes.</a></p>
A Member commented:

I think "nibble" is the preferred spelling.

Also, the examples may be a bit confusing, as the first example mentions tensor element 5 (so, a 6-or-more element tensor) which would pack that element's bits into the upper nibble, whereas the second example mentions a 5 element tensor which would pack the last element's bits into the lower nibble. Maybe intentionally pick a number other than 5 for the first or second example?
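
To make the packing rule concrete, a plain-JavaScript sketch (illustrative only):

// Illustrative uint4 packing as described: element 2k goes in the low nibble
// of byte k, element 2k+1 in the high nibble; odd element counts round up to
// a whole byte and the final high nibble is ignored.
function packUint4(values) {
  const bytes = new Uint8Array(Math.ceil(values.length / 2));
  values.forEach((v, i) => { bytes[i >> 1] |= (v & 0xF) << ((i & 1) * 4); });
  return bytes;
}

function unpackUint4(bytes, count) {
  return Array.from({length: count}, (_, i) => (bytes[i >> 1] >> ((i & 1) * 4)) & 0xF);
}

packUint4([1, 2, 3, 4, 5]); // Uint8Array [0x21, 0x43, 0x05] (5 elements use 3 bytes)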

A Member commented:

The definitions of byte length and validate buffer with descriptor should get updated. Probably more places too.

A Member commented:

Add to MLOperandDataType enum too.

A Member commented:

Also the definition of the cast() operator and the cast algorithm - even just to call out that uint4 is not permitted.

<tr>
<td>{{scale}}</td>
<td>{{MLOperandDataType/"float32"}}, {{MLOperandDataType/"float16"}}</td>
<td>0 to {{input}}'s [=MLOperand/rank=]</td>
@fdwr (Collaborator, PR author) commented:

To reduce testing complexity somewhat, we could require that scale and zeroPoint be explicitly reshaped to the same rank as the input ahead of time? That would also nicely resolve oddities in how callers express axes ahead of time 🤔.

@inexorabletash (Member) replied on Jan 17, 2025:

Reviewing other ops, requiring scale and zeroPoint to be the same rank as the input would be consistent. Easy enough for frameworks to inject a reshape().

If someone was manually coding against the API then supporting scalars would be convenient, but I don't think we should prioritize that.
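
A sketch of that framework-side fixup (hypothetical helper; builder.reshape() is the existing WebNN method, and the operand's shape is assumed to be known to the framework):

// Illustrative: right-align a lower-rank scale/zeroPoint with the input by
// reshaping it to the input's rank, padding leading dimensions with 1s.
function expandToRank(builder, operand, operandShape, targetRank) {
  const padded = new Array(targetRank - operandShape.length).fill(1).concat(operandShape);
  return builder.reshape(operand, padded);
}

// e.g. a 1-D scale of shape [3] against a rank-3 input becomes shape [1, 1, 3].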

<tr>
<td>{{indices}}</td>
<td>{{MLOperandDataType/"int32"}}, {{MLOperandDataType/"uint32"}}, {{MLOperandDataType/"int64"}}</td>
<td>&gt; 1</td>
@fdwr (Collaborator, PR author) commented on Jan 17, 2025:

I don't see other occurrences of > 1. Should I use 2 to N instead?

A Member replied:

"2 to N" I guess?

While "> 1" is compact and readable, making it "2 to N" is probably what we should do, since N is a defined term and we may allow implementations to define a maximum N.

Please take a peek at the definition of "allowed ranks" and reword if you think it's needed. It already mentions ranges, so might be okay.
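
Relatedly, since the PR's TODO list still calls for gather/scatter examples, here is a minimal illustrative sketch of gatherElements semantics in the 1-D case (not the spec's eventual example): output[i] = input[indices[i]].

// Illustrative gatherElements along axis 0 for 1-D arrays.
function gatherElements1d(input, indices) {
  return indices.map((index) => input[index]);
}

gatherElements1d([10, 20, 30, 40], [3, 0, 2]); // [40, 10, 30]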

@fdwr (Collaborator, PR author) commented on Jan 17, 2025:

Another "TODO" - the new ops need "constraints" tables

Added data type tables.

Initial pass.

Thanks - will address more tomorrow after the weekend.

@inexorabletash (Member) commented on Jan 17, 2025:

Add "Resolves #779" to the summary, so that issue will get linked to this PR and auto-closed when this merges?

... and "Resolves #773"
... and "Resolves #772"
... and "Resolves #467"
... and "Resolves #93" (I think?)

Maybe #767 too
