Skip to content

Commit

Permalink
[spec/interpreter/tests] Prep merge of function ref proposal (WebAsse…
Browse files Browse the repository at this point in the history
  • Loading branch information
rossberg authored Oct 14, 2023
1 parent c15f46c commit 8f93acc
Show file tree
Hide file tree
Showing 101 changed files with 7,196 additions and 2,519 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci-interpreter.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ jobs:
- name: Setup OCaml
uses: ocaml/setup-ocaml@v2
with:
ocaml-compiler: 4.12.x
ocaml-compiler: 4.14.x
- name: Setup OCaml tools
run: opam install --yes ocamlfind.1.9.5 js_of_ocaml.4.0.0 js_of_ocaml-ppx.4.0.0
- name: Setup Node.js
Expand Down
197 changes: 162 additions & 35 deletions document/core/appendix/algorithm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,64 +21,139 @@ The algorithm is expressed in typed pseudo code whose semantics is intended to b
Data Structures
~~~~~~~~~~~~~~~

Types are representable as an enumeration.
Types
.....

Value types are representable as a set of enumerations.

.. code-block:: pseudo
type val_type = I32 | I64 | F32 | F64 | V128 | Funcref | Externref
type num_type = I32 | I64 | F32 | F64
type vec_type = V128
type heap_type = Def(idx : nat) | Func | Extern | Bot
type ref_type = Ref(heap : heap_type, null : bool)
type val_type = num_type | vec_type | ref_type | Bot
func is_num(t : val_type | Unknown) : bool =
return t = I32 || t = I64 || t = F32 || t = F64 || t = Unknown
func is_num(t : val_type) : bool =
return t = I32 || t = I64 || t = F32 || t = F64 || t = Bot
func is_vec(t : val_type | Unknown) : bool =
return t = V128 || t = Unknown
func is_vec(t : val_type) : bool =
return t = V128 || t = Bot
func is_ref(t : val_type | Unknown) : bool =
return t = Funcref || t = Externref || t = Unknown
func is_ref(t : val_type) : bool =
return not (is_num t || is_vec t) || t = Bot
The algorithm uses two separate stacks: the *value stack* and the *control stack*.
The former tracks the :ref:`types <syntax-valtype>` of operand values on the :ref:`stack <stack>`,
the latter surrounding :ref:`structured control instructions <syntax-instr-control>` and their associated :ref:`blocks <syntax-instr-control>`.
Equivalence and subtyping checks can be defined on these types.

.. code-block:: pseudo
type val_stack = stack(val_type | Unknown)
func eq_def(n1, n2) =
... // check that both type definitions are equivalent (TODO)
func matches_null(null1 : bool, null2 : bool) : bool =
return null1 = null2 || null2
func matches_heap(t1 : heap_type, t2 : heap_type) : bool =
switch (t1, t2)
case (Def(n1), Def(n2))
return eq_def(n1, n2)
case (Def(_), Func)
return true
case (Bot, _)
return true
case (_, _)
return t1 = t2
func matches_ref(t1 : ref_type, t2 : ref_type) : bool =
return matches_heap(t1.heap, t2.heap) && matches_null(t1.null, t2.null)
func matches(t1 : val_type, t2 : val_type) : bool =
switch (t1, t2)
case (Ref(_), Ref(_))
return matches_ref(t1, t2)
case (Bot, _)
return true
case (_, _)
return t1 = t2
Context
.......

Validation requires a :ref:`context <context>` for checking uses of :ref:`indices <syntax-index>`.
For the purpose of presenting the algorithm, it is maintained in a set of global variables:

.. code-block:: pseudo
var locals : array(val_type)
var locals_init : array(bool)
var globals : array(global_type)
var funcs : array(func_type)
var tables : array(table_type)
var mems : array(mem_type)
This assumes suitable representations for the various :ref:`types <syntax-type>` besides :code:`val_type`, which are omitted here.

For locals, there is an additional array recording the initialization status of each local.

Stacks
......

The algorithm uses three separate stacks: the *value stack*, the *control stack*, and the *initialization stack*.
The value stack tracks the :ref:`types <syntax-valtype>` of operand values on the :ref:`stack <stack>`.
The control stack tracks surrounding :ref:`structured control instructions <syntax-instr-control>` and their associated :ref:`blocks <syntax-instr-control>`.
The initialization stack records all :ref:`locals <syntax-local>` that have been initialized since the beginning of the function.

.. code-block:: pseudo
type val_stack = stack(val_type)
type init_stack = stack(u32)
type ctrl_stack = stack(ctrl_frame)
type ctrl_frame = {
opcode : opcode
start_types : list(val_type)
end_types : list(val_type)
height : nat
val_height : nat
init_height : nat
unreachable : bool
}
For each value, the value stack records its :ref:`value type <syntax-valtype>`, or :code:`Unknown` when the type is not known.

For each entered block, the control stack records a *control frame* with the originating opcode, the types on the top of the operand stack at the start and end of the block (used to check its result as well as branches), the height of the operand stack at the start of the block (used to check that operands do not underflow the current block), and a flag recording whether the remainder of the block is unreachable (used to handle :ref:`stack-polymorphic <polymorphism>` typing after branches).
For each entered block, the control stack records a *control frame* with the originating opcode, the types on the top of the operand stack at the start and end of the block (used to check its result as well as branches), the height of the operand stack at the start of the block (used to check that operands do not underflow the current block), the height of the initialization stack at the start of the block (used to reset initialization status at the end of the block), and a flag recording whether the remainder of the block is unreachable (used to handle :ref:`stack-polymorphic <polymorphism>` typing after branches).

For the purpose of presenting the algorithm, the operand and control stacks are simply maintained as global variables:
For the purpose of presenting the algorithm, these stacks are simply maintained as global variables:

.. code-block:: pseudo
var vals : val_stack
var inits : init_stack
var ctrls : ctrl_stack
However, these variables are not manipulated directly by the main checking function, but through a set of auxiliary functions:

.. code-block:: pseudo
func push_val(type : val_type | Unknown) =
func push_val(type : val_type) =
vals.push(type)
func pop_val() : val_type | Unknown =
if (vals.size() = ctrls[0].height && ctrls[0].unreachable) return Unknown
func pop_val() : val_type =
if (vals.size() = ctrls[0].height && ctrls[0].unreachable) return Bot
error_if(vals.size() = ctrls[0].height)
return vals.pop()
func pop_val(expect : val_type | Unknown) : val_type | Unknown =
func pop_val(expect : val_type) : val_type =
let actual = pop_val()
error_if(not matches(actual, expect))
return actual
func pop_num() : num_type | Bot =
let actual = pop_val()
error_if(actual =/= expect && actual =/= Unknown && expect =/= Unknown)
error_if(not is_num(actual))
return actual
func pop_ref() : ref_type =
let actual = pop_val()
error_if(not is_ref(actual))
if (actual = Bot) return Ref(Bot, false)
return actual
func push_vals(types : list(val_type)) = foreach (t in types) push_val(t)
Expand All @@ -92,10 +167,10 @@ Pushing an operand value simply pushes the respective type to the value stack.
Popping an operand value checks that the value stack does not underflow the current block and then removes one type.
But first, a special case is handled where the block contains no known values, but has been marked as unreachable.
That can occur after an unconditional branch, when the stack is typed :ref:`polymorphically <polymorphism>`.
In that case, an unknown type is returned.
In that case, the :code:`Bot` type is returned, because that is a *principal* choice trivially satisfying all use constraints.

A second function for popping an operand value takes an expected type, which the actual operand type is checked against.
The types may differ in case one of them is Unknown.
The types may differ by subtyping, including the case where the actual type is :code:`Bot`, and thereby matches unconditionally.
The function returns the actual type popped from the stack.

Finally, there are accumulative functions for pushing or popping multiple operand types.
Expand All @@ -105,25 +180,49 @@ Finally, there are accumulative functions for pushing or popping multiple operan
so that, e.g., :code:`ctrls[0]` accesses the element pushed last.


The initialization stack and the initialization status of locals is manipulated through the following functions:

.. code-block:: pseudo
func get_local(idx : u32) =
error_if(not locals_init[idx])
func set_local(idx : u32) =
if (not locals_init[idx])
inits.push(idx)
locals_init[idx] := true
func reset_locals(height : nat) =
while (inits.size() > height)
locals_init[inits.pop()] := false
Getting a local verifies that it is known to be initialized.
When a local is set that was not set already,
then its initialization status is updated and the change is recorded in the initialization stack.
Thus, the initialization status of all locals can be reset to a previous state by denoting a specific height in the initialization stack.

The size of the initialization stack is bounded by the number of (non-defaultable) locals in a function, so can be preallocated by an algorithm.

The control stack is likewise manipulated through auxiliary functions:

.. code-block:: pseudo
func push_ctrl(opcode : opcode, in : list(val_type), out : list(val_type)) =
let frame = ctrl_frame(opcode, in, out, vals.size(), false)
let frame = ctrl_frame(opcode, in, out, vals.size(), inits.size(), false)
ctrls.push(frame)
push_vals(in)
func pop_ctrl() : ctrl_frame =
error_if(ctrls.is_empty())
let frame = ctrls[0]
pop_vals(frame.end_types)
error_if(vals.size() =/= frame.height)
error_if(vals.size() =/= frame.val_height)
reset_locals(frame.init_height)
ctrls.pop()
return frame
func label_types(frame : ctrl_frame) : list(val_types) =
return (if frame.opcode == loop then frame.start_types else frame.end_types)
return (if (frame.opcode = loop) frame.start_types else frame.end_types)
func unreachable() =
vals.resize(ctrls[0].height)
Expand All @@ -135,6 +234,7 @@ It allocates a new frame record recording them along with the current height of
Popping a frame first checks that the control stack is not empty.
It then verifies that the operand stack contains the right types of values expected at the end of the exited block and pops them off the operand stack.
Afterwards, it checks that the stack has shrunk back to its initial height.
Finally, it undoes all changes to the initialization status of locals that happend inside the block.

The type of the :ref:`label <syntax-label>` associated with a control frame is either that of the stack at the start or the end of the frame, determined by the opcode that it originates from.

Expand All @@ -146,7 +246,8 @@ it is an invariant of the validation algorithm that there always is at least one
.. note::
Even with the unreachable flag set, consecutive operands are still pushed to and popped from the operand stack.
That is necessary to detect invalid :ref:`examples <polymorphism>` like :math:`(\UNREACHABLE~(\I32.\CONST)~\I64.\ADD)`.
However, a polymorphic stack cannot underflow, but instead generates :code:`Unknown` types as needed.
However, a polymorphic stack cannot underflow, but instead generates :code:`Bot` types as needed.



.. index:: opcode
Expand All @@ -157,10 +258,6 @@ Validation of Opcode Sequences
The following function shows the validation of a number of representative instructions that manipulate the stack.
Other instructions are checked in a similar manner.

.. note::
Various instructions not shown here will additionally require the presence of a validation :ref:`context <context>` for checking uses of :ref:`indices <syntax-index>`.
That is an easy addition and therefore omitted from this presentation.

.. code-block:: pseudo
func validate(opcode) =
Expand All @@ -177,16 +274,32 @@ Other instructions are checked in a similar manner.
pop_val(I32)
let t1 = pop_val()
let t2 = pop_val()
error_if(not ((is_num(t1) && is_num(t2)) || (is_vec(t1) && is_vec(t2))))
error_if(t1 =/= t2 && t1 =/= Unknown && t2 =/= Unknown)
push_val(if (t1 = Unknown) t2 else t1)
error_if(not (is_num(t1) && is_num(t2) || is_vec(t1) && is_vec(t2)))
error_if(t1 =/= t2 && t1 =/= Bot && t2 =/= Bot)
push_val(if (t1 = Bot) t2 else t1)
case (select t)
pop_val(I32)
pop_val(t)
pop_val(t)
push_val(t)
case (ref.is_null)
pop_ref()
push_val(I32)
case (ref.as_non_null)
let rt = pop_ref()
push_val(Ref(rt.heap, false))
case (local.get x)
get_local(x)
push_val(locals[x])
case (local.set x)
pop_val(locals[x])
set_local(x)
case (unreachable)
unreachable()
Expand Down Expand Up @@ -234,6 +347,20 @@ Other instructions are checked in a similar manner.
pop_vals(label_types(ctrls[m]))
unreachable()
case (br_on_null n)
error_if(ctrls.size() < n)
let rt = pop_ref()
pop_vals(label_types(ctrls[n]))
push_vals(label_types(ctrls[n]))
push_val(Ref(rt.heap, false))
case (call_ref)
let rt = pop_ref()
if (rt.heap =/= Bot)
error_if(not is_def(rt.heap))
let ft = funcs[rt.heap.idx]
pop_vals(ft.params)
push_vals(ft.results)
.. note::
It is an invariant under the current WebAssembly instruction set that an operand of :code:`Unknown` type is never duplicated on the stack.
Expand Down
29 changes: 28 additions & 1 deletion document/core/appendix/changes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ Added |FUNCREF| and |EXTERNREF| as new value types and respective instructions [

* New :ref:`reference instructions <syntax-instr-ref>`: |REFNULL|, |REFFUNC|, |REFISNULL|

* Enrich :ref:`parametric instruction <syntax-instr-parametric>`: |SELECT| with optional type immediate
* Extended :ref:`parametric instruction <syntax-instr-parametric>`: |SELECT| with optional type immediate

* New :ref:`declarative <syntax-elemmode>` form of :ref:`element segment <syntax-elem>`

Expand Down Expand Up @@ -170,5 +170,32 @@ Added instructions to perform tail calls [#proposal-tailcall]_.
* New :ref:`control instructions <syntax-instr-control>`: :math:`RETURNCALL` and :math:`RETURNCALLINDIRECT`


.. index:: reference, reference type, heap type, value type, local, local type, instruction, instruction type, table, function, function type, matching, subtyping

Typeful References
..................

Added more precise types for references [#proposal-typedref]_.

* New generalised form of :ref:`reference types <syntax-reftype>`: :math:`(\REF~\NULL^?~\heaptype)`

* New class of :ref:`heap types <syntax-heaptype>`: |FUNC|, |EXTERN|, :math:`\typeidx`

* Basic :ref:`subtyping <match>` on :ref:`reference <match-reftype>` and :ref:`value <match-valtype>` types

* New :ref:`reference instructions <syntax-instr-ref>`: |REFASNONNULL|, |BRONNULL|, |BRONNONNULL|

* New :ref:`control instruction <syntax-instr-control>`: |CALLREF|

* Refined typing of :ref:`reference instruction <syntax-instr-ref>` |REFFUNC| with more precise result type

* Refined typing of :ref:`local instructions <valid-instr-variable>` and :ref:`instruction sequences <valid-instr-seq>` to track the :ref:`initialization status <syntax-init>` of :ref:`locals <syntax-local>` with non-:ref:`defaultable <valid-defaultable>` type

* Extended :ref:`table definitions <syntax-table>` with optional initializer expression


.. [#proposal-tailcall]
https://github.com/WebAssembly/spec/tree/main/proposals/tail-call/
.. [#proposal-typedref]
https://github.com/WebAssembly/spec/tree/main/proposals/function-references/
Loading

0 comments on commit 8f93acc

Please sign in to comment.