Skip to content

Commit

Permalink
Convert binary format chapter
Browse files Browse the repository at this point in the history
  • Loading branch information
rossberg committed Jun 5, 2024
1 parent edbe4a7 commit f11d7aa
Show file tree
Hide file tree
Showing 31 changed files with 1,334 additions and 1,358 deletions.
62 changes: 23 additions & 39 deletions document/core/binary/conventions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,24 +39,24 @@ Grammar

The following conventions are adopted in defining grammar rules for the binary format.
They mirror the conventions used for :ref:`abstract syntax <grammar>`.
In order to distinguish symbols of the binary syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former.
In order to distinguish symbols of the binary syntax from symbols of the abstract syntax, ${grammar-case: Btypewriter} font is adopted for the former.

* Terminal symbols are :ref:`bytes <syntax-byte>` expressed in hexadecimal notation: :math:`\hex{0F}`.
* Terminal symbols are :ref:`bytes <syntax-byte>` expressed in hexadecimal notation: ${grammar-case: 0x0F}.

* Nonterminal symbols are written in typewriter font: :math:`\B{valtype}, \B{instr}`.
* Nonterminal symbols are written in typewriter font: ${grammar-case: Bvaltype}, ${grammar-case: Binstr}.

* :math:`B^n` is a sequence of :math:`n\geq 0` iterations of :math:`B`.
* ${grammar-case: $(B)^n} is a sequence of ${:n>=0} iterations of ${:B}.

* :math:`B^\ast` is a possibly empty sequence of iterations of :math:`B`.
(This is a shorthand for :math:`B^n` used where :math:`n` is not relevant.)
* ${grammar-case: $(B)*} is a possibly empty sequence of iterations of ${:B}.
(This is a shorthand for ${:B^n} used where ${:n} is not relevant.)

* :math:`B^?` is an optional occurrence of :math:`B`.
(This is a shorthand for :math:`B^n` where :math:`n \leq 1`.)
* ${grammar-case: $(B)?} is an optional occurrence of ${:B}.
(This is a shorthand for ${:B^n} where ${:n<=1}.)

* :math:`x{:}B` denotes the same language as the nonterminal :math:`B`, but also binds the variable :math:`x` to the attribute synthesized for :math:`B`.
A pattern may also be used instead of a variable, e.g., :math:`7{:}B`.
* ${grammar-case: x:$(B)} denotes the same language as the nonterminal ${:B}, but also binds the variable ${:x} to the attribute synthesized for ${:B}.
A pattern may also be used instead of a variable, e.g., ${grammar-case: 7:$(B)}.

* Productions are written :math:`\B{sym} ::= B_1 \Rightarrow A_1 ~|~ \dots ~|~ B_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\B{sym}` in the given case, usually from attribute variables bound in :math:`B_i`.
* Productions are written ${grammar: Bsym}, where each ${:A_i} is the attribute that is synthesized for ${grammar-case: Bsym} in the given case, usually from attribute variables bound in ${:B_i}.

* Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.

Expand All @@ -66,31 +66,19 @@ In order to distinguish symbols of the binary syntax from symbols of the abstrac
.. note::
For example, the :ref:`binary grammar <binary-numtype>` for :ref:`number types <syntax-numtype>` is given as follows:

.. math::
\begin{array}{llcll@{\qquad\qquad}l}
\production{number types} & \Bnumtype &::=&
\hex{7F} &\Rightarrow& \I32 \\ &&|&
\hex{7E} &\Rightarrow& \I64 \\ &&|&
\hex{7D} &\Rightarrow& \F32 \\ &&|&
\hex{7C} &\Rightarrow& \F64 \\
\end{array}
Consequently, the byte :math:`\hex{7F}` encodes the type |I32|,
:math:`\hex{7E}` encodes the type |I64|, and so forth.
$${grammar: Bnumtype}

Consequently, the byte ${grammar-case: 0x7F} encodes the type ${numtype:I32},
${grammar-case: 0x7E} encodes the type ${numtype: I64}, and so forth.
No other byte value is allowed as the encoding of a number type.

The :ref:`binary grammar <binary-limits>` for :ref:`limits <syntax-limits>` is defined as follows:

.. math::
\begin{array}{llclll}
\production{limits} & \Blimits &::=&
\hex{00}~~n{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|&
\hex{01}~~n{:}\Bu32~~m{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\
\end{array}
$${grammar: Blimits}

That is, a limits pair is encoded as either the byte :math:`\hex{00}` followed by the encoding of a |U32| value,
or the byte :math:`\hex{01}` followed by two such encodings.
The variables :math:`n` and :math:`m` name the attributes of the respective |Bu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` those decode into.
That is, a limits pair is encoded as either the byte ${:0x00} followed by the encoding of a ${:u32} value,
or the byte ${grammar-case: 0x01} followed by two such encodings.
The variables ${:n} and ${:m} name the attributes of the respective ${grammar-case: Bu32} nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` those decode into.
The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.


Expand All @@ -101,9 +89,9 @@ Auxiliary Notation

When dealing with binary encodings the following notation is also used:

* :math:`\epsilon` denotes the empty byte sequence.
* ${grammar-case: eps} denotes the empty byte sequence.

* :math:`||B||` is the length of the byte sequence generated from the production :math:`B` in a derivation.
* ${:||B||} is the length of the byte sequence generated from the production ${grammar-case: B} in a derivation.


.. index:: list
Expand All @@ -113,10 +101,6 @@ When dealing with binary encodings the following notation is also used:
Lists
~~~~~

:ref:`Lists <syntax-list>` are encoded with their |Bu32| length followed by the encoding of their element sequence.
:ref:`Lists <syntax-list>` are encoded with their ${grammar-case:Bu32} length followed by the encoding of their element sequence.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{list} & \Blist(\B{B}) &::=&
n{:}\Bu32~~(x{:}\B{B})^n &\Rightarrow& x^n \\
\end{array}
$${grammar: Blist}
2 changes: 1 addition & 1 deletion document/core/binary/instructions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ Control Instructions

:ref:`Control instructions <syntax-instr-control>` have varying encodings. For structured instructions, the instruction sequences forming nested blocks are terminated with explicit opcodes for |END| and |ELSE|.

:ref:`Block types <syntax-blocktype>` are encoded in special compressed form, by either the byte :math:`\hex{40}` indicating the empty type, as a single :ref:`value type <binary-valtype>`, or as a :ref:`type index <binary-typeidx>` encoded as a positive :ref:`signed integer <binary-sint>`.
:ref:`Block types <syntax-blocktype>` are encoded in special compressed form, by either the byte ${:0x40} indicating the empty type, as a single :ref:`value type <binary-valtype>`, or as a :ref:`type index <binary-typeidx>` encoded as a positive :ref:`signed integer <binary-sint>`.

.. _binary-blocktype:
.. _binary-castflags:
Expand Down
177 changes: 24 additions & 153 deletions document/core/binary/types.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Types

.. note::
In some places, possible types include both type constructors or types denoted by :ref:`type indices <syntax-typeidx>`.
Thus, the binary format for type constructors corresponds to the encodings of small negative :math:`\xref{binary/values}{binary-sint}{\sN}` values, such that they can unambiguously occur in the same place as (positive) type indices.
Thus, the binary format for type constructors corresponds to the encodings of small negative ${:sN(N)} values, such that they can unambiguously occur in the same place as (positive) type indices.


.. index:: number type
Expand All @@ -19,14 +19,7 @@ Number Types

:ref:`Number types <syntax-numtype>` are encoded by a single byte.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{number type} & \Bnumtype &::=&
\hex{7F} &\Rightarrow& \I32 \\ &&|&
\hex{7E} &\Rightarrow& \I64 \\ &&|&
\hex{7D} &\Rightarrow& \F32 \\ &&|&
\hex{7C} &\Rightarrow& \F64 \\
\end{array}
$${grammar: Bnumtype}


.. index:: vector type
Expand All @@ -38,11 +31,7 @@ Vector Types

:ref:`Vector types <syntax-vectype>` are also encoded by a single byte.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{vector type} & \Bvectype &::=&
\hex{7B} &\Rightarrow& \V128 \\
\end{array}
$${grammar: Bvectype}


.. index:: heap type
Expand All @@ -55,23 +44,10 @@ Heap Types

:ref:`Heap types <syntax-reftype>` are encoded as either a single byte, or as a :ref:`type index <binary-typeidx>` encoded as a positive :ref:`signed integer <binary-sint>`.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{abstract heap type} & \Babsheaptype &::=&
\hex{73} &\Rightarrow& \NOFUNC \\ &&|&
\hex{72} &\Rightarrow& \NOEXTERN \\ &&|&
\hex{71} &\Rightarrow& \NONE \\ &&|&
\hex{70} &\Rightarrow& \FUNC \\ &&|&
\hex{6F} &\Rightarrow& \EXTERN \\ &&|&
\hex{6E} &\Rightarrow& \ANY \\ &&|&
\hex{6D} &\Rightarrow& \EQT \\ &&|&
\hex{6C} &\Rightarrow& \I31 \\ &&|&
\hex{6B} &\Rightarrow& \STRUCT \\ &&|&
\hex{6A} &\Rightarrow& \ARRAY \\
\production{heap type} & \Bheaptype &::=&
\X{ht}{:}\Babsheaptype &\Rightarrow& \X{ht} \\ &&|&
x{:}\Bs33 &\Rightarrow& x & (\iff x \geq 0) \\
\end{array}
$${grammar: Babsheaptype Bheaptype}

.. note::
The heap type ${heaptype: BOT} cannot occur in a module.


.. index:: reference type
Expand All @@ -83,13 +59,7 @@ Reference Types

:ref:`Reference types <syntax-reftype>` are either encoded by a single byte followed by a :ref:`heap type <binary-heaptype>`, or, as a short form, directly as an :ref:`abstract heap type <binary-absheaptype>`.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{reference type} & \Breftype &::=&
\hex{64}~~\X{ht}{:}\Bheaptype &\Rightarrow& \REF~\X{ht} \\ &&|&
\hex{63}~~\X{ht}{:}\Bheaptype &\Rightarrow& \REF~\NULL~\X{ht} \\ &&|&
\X{ht}{:}\Babsheaptype &\Rightarrow& \REF~\NULL~\X{ht} \\
\end{array}
$${grammar: Breftype}


.. index:: value type, number type, reference type
Expand All @@ -101,19 +71,13 @@ Value Types

:ref:`Value types <syntax-valtype>` are encoded with their respective encoding as a :ref:`number type <binary-numtype>`, :ref:`vector type <binary-vectype>`, or :ref:`reference type <binary-reftype>`.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{value type} & \Bvaltype &::=&
t{:}\Bnumtype &\Rightarrow& t \\ &&|&
t{:}\Bvectype &\Rightarrow& t \\ &&|&
t{:}\Breftype &\Rightarrow& t \\
\end{array}
$${grammar: Bvaltype}

.. note::
The type :math:`\BOT` cannot occur in a module.
The value type ${valtype: BOT} cannot occur in a module.

Value types can occur in contexts where :ref:`type indices <syntax-typeidx>` are also allowed, such as in the case of :ref:`block types <binary-blocktype>`.
Thus, the binary format for types corresponds to the |SignedLEB128|_ :ref:`encoding <binary-sint>` of small negative :math:`\sN` values, so that they can coexist with (positive) type indices in the future.
Thus, the binary format for types corresponds to the |SignedLEB128|_ :ref:`encoding <binary-sint>` of small negative ${:sN(N)} values, so that they can coexist with (positive) type indices in the future.


.. index:: result type, value type
Expand All @@ -125,92 +89,33 @@ Result Types

:ref:`Result types <syntax-resulttype>` are encoded by the respective :ref:`lists <binary-list>` of :ref:`value types <binary-valtype>`.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{result type} & \Bresulttype &::=&
t^\ast{:\,}\Blist(\Bvaltype) &\Rightarrow& [t^\ast] \\
\end{array}
.. index:: function type, value type, result type
pair: binary format; function type
.. _binary-functype:

Function Types
~~~~~~~~~~~~~~

:ref:`Function types <syntax-functype>` are encoded by the respective :ref:`lists <binary-list>` of parameter and result types.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{function type} & \Bfunctype &::=&
\X{rt}_1{:\,}\Bresulttype~~\X{rt}_2{:\,}\Bresulttype
&\Rightarrow& \X{rt}_1 \to \X{rt}_2 \\
\end{array}
$${grammar: Bresulttype}


.. index:: aggregate type, value type, structure type, array type, field type, storage type, packed type, mutability
.. index:: composite type, aggregate type, structure type, array type, function type, result type, value type, field type, storage type, packed type, mutability
pair: binary format; composite type
pair: binary format; aggregate type
pair: binary format; function type
pair: binary format; structure type
pair: binary format; array type
pair: binary format; field type
pair: binary format; storage type
pair: binary format; packed type
.. _binary-comptype:
.. _binary-aggrtype:
.. _binary-functype:
.. _binary-structtype:
.. _binary-arraytype:
.. _binary-fieldtype:
.. _binary-storagetype:
.. _binary-packtype:

Aggregate Types
~~~~~~~~~~~~~~~

:ref:`Aggregate types <syntax-aggrtype>` are encoded with their respective :ref:`field types <syntax-fieldtype>`.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{array type} & \Barraytype &::=&
\X{ft}{:\,}\Bfieldtype
&\Rightarrow& \X{ft} \\
\production{structure type} & \Bstructtype &::=&
\X{ft}^\ast{:\,}\Blist(\Bfieldtype)
&\Rightarrow& \X{ft}^\ast \\
\production{field type} & \Bfieldtype &::=&
\X{st}{:}\Bstoragetype~~m{:}\Bmut
&\Rightarrow& m~\X{st} \\
\production{storage type} & \Bstoragetype &::=&
t{:}\Bvaltype
&\Rightarrow& t \\ &&|&
t{:}\Bpacktype
&\Rightarrow& t \\
\production{packed type} & \Bpacktype &::=&
\hex{78}
&\Rightarrow& \I8 \\ &&|&
\hex{77}
&\Rightarrow& \I16 \\
\end{array}
.. index:: composite type, structure type, array type, function type
pair: binary format; composite type
.. _binary-comptype:

Composite Types
~~~~~~~~~~~~~~~

:ref:`Composite types <syntax-comptype>` are encoded by a distinct byte followed by a type encoding of the respective form.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{composite type} & \Bcomptype &::=&
\hex{5E}~~\X{at}{:}\Barraytype
&\Rightarrow& \TARRAY~\X{at} \\ &&|&
\hex{5F}~~\X{st}{:}\Bstructtype
&\Rightarrow& \TSTRUCT~\X{st} \\ &&|&
\hex{60}~~\X{ft}{:}\Bfunctype
&\Rightarrow& \TFUNC~\X{ft} \\
\end{array}
$${grammar: Bmut Bcomptype Bfieldtype Bstoragetype Bpacktype}


.. index:: recursive type, sub type, composite type
Expand All @@ -222,24 +127,10 @@ Composite Types
Recursive Types
~~~~~~~~~~~~~~~

:ref:`Recursive types <syntax-rectype>` are encoded by the byte :math:`\hex{4E}` followed by a :ref:`list <binary-list>` of :ref:`sub types <syntax-subtype>`.
:ref:`Recursive types <syntax-rectype>` are encoded by the byte ${:0x4E} followed by a :ref:`list <binary-list>` of :ref:`sub types <syntax-subtype>`.
Additional shorthands are recognized for unary recursions and sub types without super types.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{recursive type} & \Brectype &::=&
\hex{4E}~~\X{st}^\ast{:\,}\Blist(\Bsubtype)
&\Rightarrow& \TREC~\X{st}^\ast \\ &&|&
\X{st}{:}\Bsubtype
&\Rightarrow& \TREC~\X{st} \\
\production{sub type} & \Bsubtype &::=&
\hex{50}~~x^\ast{:\,}\Blist(\Btypeidx)~~\X{ct}{:}\Bcomptype
&\Rightarrow& \TSUB~x^\ast~\X{ct} \\ &&|&
\hex{4F}~~x^\ast{:\,}\Blist(\Btypeidx)~~\X{ct}{:}\Bcomptype
&\Rightarrow& \TSUB~\TFINAL~x^\ast~\X{ct} \\ &&|&
\X{ct}{:}\Bcomptype
&\Rightarrow& \TSUB~\TFINAL~\epsilon~\X{ct} \\
\end{array}
$${grammar: Brectype Bsubtype}


.. index:: limits
Expand All @@ -251,12 +142,7 @@ Limits

:ref:`Limits <syntax-limits>` are encoded with a preceding flag indicating whether a maximum is present.

.. math::
\begin{array}{llclll}
\production{limits} & \Blimits &::=&
\hex{00}~~n{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|&
\hex{01}~~n{:}\Bu32~~m{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\
\end{array}
$${grammar: Blimits}


.. index:: memory type, limits, page size
Expand All @@ -268,11 +154,7 @@ Memory Types

:ref:`Memory types <syntax-memtype>` are encoded with their :ref:`limits <binary-limits>`.

.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{memory type} & \Bmemtype &::=&
\X{lim}{:}\Blimits &\Rightarrow& \X{lim} \\
\end{array}
$${grammar: Bmemtype}


.. index:: table type, reference type, limits
Expand All @@ -284,11 +166,7 @@ Table Types

:ref:`Table types <syntax-tabletype>` are encoded with their :ref:`limits <binary-limits>` and the encoding of their element :ref:`reference type <syntax-reftype>`.

.. math::
\begin{array}{llclll}
\production{table type} & \Btabletype &::=&
\X{et}{:}\Breftype~~\X{lim}{:}\Blimits &\Rightarrow& \X{lim}~\X{et} \\
\end{array}
$${grammar: Btabletype}


.. index:: global type, mutability, value type
Expand All @@ -302,11 +180,4 @@ Global Types

:ref:`Global types <syntax-globaltype>` are encoded by their :ref:`value type <binary-valtype>` and a flag for their :ref:`mutability <syntax-mut>`.

.. math::
\begin{array}{llclll}
\production{global type} & \Bglobaltype &::=&
t{:}\Bvaltype~~m{:}\Bmut &\Rightarrow& m~t \\
\production{mutability} & \Bmut &::=&
\hex{00} &\Rightarrow& \MCONST \\ &&|&
\hex{01} &\Rightarrow& \MVAR \\
\end{array}
$${grammar: Bglobaltype}
Loading

0 comments on commit f11d7aa

Please sign in to comment.