DRAFT MessageFormat 2.0 Formatting

Introduction

This document defines the behaviour of a MessageFormat 2.0 implementation when formatting a message for display in a user interface, or for some later processing.

To start, we presume that a message has either been parsed from its syntax or created from a data model description. If the resulting message is not well-formed, a Syntax Error is emitted. If the resulting message is well-formed but is not valid, a Data Model Error is emitted.

The formatting of a message is defined by the following operations:

  • Pattern Selection determines which of a message's patterns is formatted. For a message with no selectors, this is simple as there is only one pattern. With selectors, this will depend on their resolution.

  • Formatting takes the resolved values of the text and placeholder parts of the selected pattern, and produces the formatted result for the message. Depending on the implementation, this result could be a single concatenated string, an array of objects, an attributed string, or some other locally appropriate data type.

  • Expression and Markup Resolution determines the value of an expression or markup, with reference to the current formatting context. This can include multiple steps, such as looking up the value of a variable and calling formatting functions. The form of the resolved value is implementation defined and the value might not be evaluated or formatted yet. However, it needs to be "formattable", i.e. it contains everything required by the eventual formatting.

    The resolution of text is rather straightforward, and is detailed under literal resolution.

Implementations are not required to expose the expression resolution and pattern selection operations to their users, or even use them in their internal processing, as long as the final formatting result is made available to users and the observable behavior of the formatting matches that described here.

Attributes MUST NOT have any effect on the formatted output of a message, nor be made available to function handlers.

Important

This specification does not require either eager or lazy expression resolution of message parts; do not construe any requirement in this document as requiring either.

Implementations are not required to evaluate all parts of a message when parsing, processing, or formatting. In particular, an implementation MAY choose not to evaluate or resolve the value of a given expression until it is actually used by a selection or formatting process. However, when an expression is resolved, it MUST behave as if all preceding declarations affecting variables referenced by that expression have already been evaluated in the order in which the relevant declarations appear in the message. An implementation MUST ensure that every expression in a message is evaluated at most once.

Note

Implementations with lazy evaluation MUST NOT use a call-by-name evaluation strategy. Instead, they must evaluate expressions at most once ("call-by-need"). This is to prevent expressions from having different values when used in different parts of a given message. Function handlers are not necessarily pure: they can access external mutable state such as the current system clock time. Thus, evaluating the same expression more than once could yield different results. That behavior violates this specification.
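As a non-normative illustration, call-by-need could be sketched as a small memoizing wrapper. The name `lazyOnce` and the counter standing in for an impure function handler are assumptions made for this example only, and a resolver that throws is not handled here.

```typescript
// Hypothetical helper: wraps an expression resolver so that it runs at most once.
function lazyOnce<T>(resolve: () => T): () => T {
  let done = false;
  let cached: T | undefined;
  return () => {
    if (!done) {
      cached = resolve(); // first use: evaluate and remember the result
      done = true;
    }
    return cached as T; // later uses: return the remembered result
  };
}

// Stand-in for an impure function handler: each real call yields a new value.
let calls = 0;
const now = lazyOnce(() => ++calls);

const first = now();
const second = now();
// Both uses observe the same value, and the resolver ran only once.
```

Under call-by-name, the second use would re-run the resolver and could observe a different value.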

Important

Implementations and users SHOULD NOT create function handlers that mutate external program state, particularly since such a function handler can present a remote execution hazard.

Formatting Context

A message's formatting context represents the data and procedures that are required for the message's expression resolution, pattern selection and formatting.

At a minimum, it includes:

  • Information on the current locale, potentially including a fallback chain of locales. This will be passed on to formatting functions.

  • Information on the base directionality of the message and its text tokens. This will be used by strategies for bidirectional isolation, and can be used to set the base direction of the message upon display.

  • An input mapping of string identifiers to values, defining variable values that are available during variable resolution. This is often determined by a user-provided argument of a formatting function call.

  • The function registry, providing the function handlers of the functions referred to by message functions.

  • Optionally, a fallback string to use for the message if it is not valid.

Implementations MAY include additional fields in their formatting context.
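For illustration only, the minimum contents listed above might be modelled as follows. Every type and field name here is hypothetical; this specification does not prescribe any particular shape for the formatting context.

```typescript
// All names below are illustrative assumptions, not part of the specification.
type Direction = "ltr" | "rtl" | "auto";

type FunctionHandler = (
  ctx: { locales: string[]; dir?: Direction },
  options: Map<string, unknown>,
  operand?: unknown
) => unknown;

interface FormattingContext {
  locales: string[]; // current locale, optionally followed by a fallback chain
  dir: Direction; // base directionality of the message
  input: Map<string, unknown>; // input mapping: variable name -> value
  functions: Map<string, FunctionHandler>; // function registry
  fallbackString?: string; // optional fallback for a message that is not valid
}

const context: FormattingContext = {
  locales: ["en-GB", "en"],
  dir: "ltr",
  input: new Map<string, unknown>([["count", 3]]),
  functions: new Map<string, FunctionHandler>(),
};
```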

Resolved Values

A resolved value is the result of resolving a text, literal, variable, expression, or markup. The resolved value is determined using the formatting context. The form of the resolved value is implementation-defined.

In a declaration, the resolved value of an expression is bound to a variable, which makes it available for use in later expressions and markup options.

For example, in

.input {$a :number minimumFractionDigits=3}
.local $b = {$a :integer notation=compact}
.match $a
0 {{The value is zero.}}
* {{In compact form, the value {$a} is rendered as {$b}.}}

the resolved value bound to $a is used as the operand of the :integer function when resolving the value of the variable $b, as a selector in the .match statement, as well as for formatting the placeholder {$a}.

In an input-declaration, the variable operand of the variable-expression identifies not only the name of the external input value, but also the variable to which the resolved value of the variable-expression is bound.

In a pattern, the resolved value of an expression or markup is used in its formatting.

The form that resolved values take is implementation-dependent, and different implementations MAY choose to perform different levels of resolution.

While this specification does not require it, a resolved value could be implemented by requiring each function handler to return a value matching the following interface:

interface MessageValue {
  formatToString(): string
  formatToX(): X // where X is an implementation-defined type
  getValue(): unknown
  resolvedOptions(): { [key: string]: MessageValue }
  selectKeys(keys: string[]): string[]
}

With this approach:

  • An expression could be used as a placeholder if calling the formatToString() or formatToX() method of its resolved value did not emit an error.
  • A variable could be used as a selector if calling the selectKeys(keys) method of its resolved value did not emit an error.
  • Using a variable, the resolved value of an expression could be used as an operand or option value if calling the getValue() method of its resolved value did not emit an error. In this use case, the resolvedOptions() method could also provide a set of option values that could be taken into account by the called function.

Extensions of the base MessageValue interface could be provided for different data types, such as numbers or strings, for which the unknown return type of getValue() and the generic MessageValue type used in resolvedOptions() could be narrowed appropriately. An implementation could also allow MessageValue values to be passed in as input variables, or automatically wrap each variable as a MessageValue to provide a uniform interface for custom functions.

Expression and Markup Resolution

Expressions are used in declarations and patterns. Markup is only used in patterns.

Depending on the presence or absence of a variable or literal operand and a function, the resolved value of the expression is determined as follows:

If the expression contains a function, its resolved value is defined by function resolution.

Else, if the expression consists of a variable, its resolved value is defined by variable resolution. An implementation MAY perform additional processing when resolving the value of an expression that consists only of a variable.

For example, it could apply function resolution using a function and a set of options chosen based on the value or type of the variable. So, given a message like this:

Today is {$date}

If the value passed in the variable were a date object, such as a JavaScript Date or a Java java.util.Date or java.time.Temporal, the implementation could interpret the placeholder {$date} as if the pattern included the function :datetime with some set of default options.

Else, the expression consists of a literal. Its resolved value is defined by literal resolution.

Note

This means that a literal value with no function is always treated as a string. To represent values that are not strings as a literal, a function needs to be provided:

.local $aNumber = {1234 :number}
.local $aDate = {|2023-08-30| :datetime}
.local $aFoo = {|some foo| :foo}
{{You have {42 :number}}}

Literal Resolution

The resolved value of a text or a literal contains the character sequence of the text or literal after any character escape has been converted to the escaped character.

When a literal is used as an operand or on the right-hand side of an option, the formatting function MUST treat its resolved value the same whether its value was originally a quoted literal or an unquoted literal.

For example, the option foo=42 and the option foo=|42| are treated as identical.

For example, in a JavaScript formatter the resolved value of a text or a literal could have the following implementation:

class MessageLiteral implements MessageValue {
  // formatToX() is omitted here for brevity.
  formatToString: () => string;
  getValue: () => string;
  constructor(value: string) {
    this.formatToString = () => value;
    this.getValue = () => value;
  }
  resolvedOptions = () => ({});
  selectKeys(_keys: string[]): string[] {
    throw Error("Selection on unannotated literals is not supported");
  }
}

Variable Resolution

To resolve the value of a variable, its name is used to identify either a local variable or an input variable. If a declaration exists for the variable, its resolved value is used. Otherwise, the variable is an implicit reference to an input value, and its value is looked up from the formatting context input mapping.

The resolution of a variable fails if no value is identified for its name. If this happens, an Unresolved Variable error is emitted. If a variable would resolve to a fallback value, this MUST also be considered a failure.
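A minimal sketch of this lookup order, assuming a hypothetical `Resolved` type in which a fallback value is a distinct variant. Callers are expected to treat a returned fallback as a failure; how errors are reported is an implementation choice and is shown here as a simple list of strings.

```typescript
// Hypothetical shapes used only for this sketch.
type Resolved =
  | { kind: "value"; value: unknown }
  | { kind: "fallback"; source: string };

function resolveVariable(
  varName: string,
  locals: Map<string, Resolved>, // resolved values of earlier declarations
  input: Map<string, unknown>, // input mapping from the formatting context
  errors: string[]
): Resolved {
  // A declaration takes precedence over the input mapping.
  const declared = locals.get(varName);
  if (declared !== undefined) {
    // If this is already a fallback value, the caller treats it as a failure.
    return declared;
  }
  // Otherwise the variable is an implicit reference to an input value.
  if (input.has(varName)) {
    return { kind: "value", value: input.get(varName) };
  }
  // No value identified for the name: emit an error and fail.
  errors.push(`Unresolved Variable: $${varName}`);
  return { kind: "fallback", source: `$${varName}` };
}

const errors: string[] = [];
const input = new Map<string, unknown>([["count", 3]]);
const found = resolveVariable("count", new Map(), input, errors);
const missing = resolveVariable("user", new Map(), input, errors);
```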

Function Resolution

To resolve an expression with a function, the following steps are taken:

  1. If the expression includes an operand, resolve its value. If this fails, use a fallback value for the expression.

  2. Resolve the identifier of the function and, based on the starting sigil, find the appropriate function handler to call. If the implementation cannot find the function handler, or if the identifier includes a namespace that the implementation does not support, emit an Unknown Function error and use a fallback value for the expression.

    Implementations are not required to implement namespaces or installable function registries.

  3. Perform option resolution.

  4. Determine the function context for calling the function handler.

    The function context contains the context necessary for the function handler to resolve the expression. This includes:

    • The current locale, potentially including a fallback chain of locales.
    • The base directionality of the expression. By default, this is undefined or empty.

    If the resolved mapping of options includes any u: options supported by the implementation, process them as specified. Such u: options MAY be removed from the resolved mapping of options.

  5. Call the function handler with the following arguments:

    • The function context.
    • The resolved mapping of options.
    • If the expression includes an operand, its resolved value.

    The form that resolved operand and option values take is implementation-defined.

    An implementation MAY pass additional arguments to the function handler, as long as reasonable precautions are taken to keep the function interface simple and minimal, and avoid introducing potential security vulnerabilities.

  6. If the call succeeds, resolve the value of the expression as the result of that function call.

    If the call fails or does not return a valid value, emit the appropriate Message Function Error for the failure.

    Implementations MAY provide a mechanism for the function handler to provide additional detail about internal failures. Specifically, if the cause of the failure was that the datatype, value, or format of the operand did not match that expected by the function, the function SHOULD cause a Bad Operand error to be emitted.

    In all failure cases, use the fallback value for the expression as its resolved value.
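The steps above could be sketched roughly as follows. This is a simplified, non-normative illustration: the `Resolved` and `Handler` types, the `fallbackSource` parameter, and the error strings are all assumptions of this example, and the handling of u: options and of base directionality is left out.

```typescript
// Hypothetical shapes used only for this sketch.
type Resolved =
  | { kind: "value"; value: unknown }
  | { kind: "fallback"; source: string };

type Handler = (
  ctx: { locales: string[] },
  options: Map<string, unknown>,
  operand?: unknown
) => unknown;

function resolveFunction(
  functionName: string,
  operand: Resolved | undefined,
  fallbackSource: string, // precomputed fallback string for this expression
  options: Map<string, unknown>, // result of option resolution (step 3)
  registry: Map<string, Handler>,
  locales: string[],
  errors: string[]
): Resolved {
  // Step 1: a failed operand makes the whole expression resolve to a fallback.
  let operandValue: unknown = undefined;
  if (operand !== undefined) {
    if (operand.kind === "fallback") return operand;
    operandValue = operand.value;
  }
  // Step 2: find the function handler.
  const handler = registry.get(functionName);
  if (handler === undefined) {
    errors.push(`Unknown Function: :${functionName}`);
    return { kind: "fallback", source: fallbackSource };
  }
  // Steps 4 to 6: build the function context, call the handler, use its result.
  try {
    return { kind: "value", value: handler({ locales }, options, operandValue) };
  } catch {
    errors.push(`Message Function Error: :${functionName}`);
    return { kind: "fallback", source: fallbackSource };
  }
}

const registry = new Map<string, Handler>([
  ["upper", (_ctx, _options, operand) => String(operand).toUpperCase()],
]);
const errors: string[] = [];
const ok = resolveFunction(
  "upper", { kind: "value", value: "hi" }, "$x", new Map(), registry, ["en"], errors
);
const unknownFn = resolveFunction(
  "nope", undefined, ":nope", new Map(), registry, ["en"], errors
);
```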

Function Handler

A function handler is an implementation-defined process such as a function or method which accepts a set of arguments and returns a resolved value. A function handler is required to resolve a function.

An implementation MAY define its own functions and their handlers. An implementation MAY allow custom functions to be defined by users.

Implementations that provide a means for defining custom functions MUST provide a means for function handlers to return resolved values that contain enough information to be used as operands or option values in subsequent expressions.

The resolved value returned by a function handler MAY be different from the value of the operand of the function. It MAY be an implementation-specified type. It is not required to be the same type as the operand.

A function handler MAY include resolved options in its resolved value. The resolved options MAY be different from the options of the function.

A function handler SHOULD emit a Bad Operand error for operands whose resolved value or type is not supported.

Function handler access to the formatting context MUST be minimal and read-only, and execution time SHOULD be limited.

Implementation-defined functions SHOULD use an implementation-defined namespace.

Option Resolution

Option resolution is the process of computing the options for a given expression. Option resolution results in a mapping of string identifiers to values. The order of options MUST NOT be significant.

For example, the following message treats both placeholders identically:

{$x :function option1=foo option2=bar} {$x :function option2=bar option1=foo}

For each option:

  • Resolve the identifier of the option.
  • If the option's right-hand side successfully resolves to a value, bind the identifier of the option to the resolved value in the mapping.
  • Otherwise, bind the identifier of the option to an unresolved value in the mapping. Implementations MAY later remove this value before calling the function. (Note that an Unresolved Variable error will have been emitted.)

Errors MAY be emitted during option resolution, but it always resolves to some mapping of string identifiers to values. This mapping can be empty.
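A minimal sketch of option resolution, assuming a hypothetical `OptionSource` type for the right-hand side of an option and a hypothetical `UNRESOLVED` marker for values that failed to resolve. Variables are looked up directly in the input mapping here, which ignores local declarations for the sake of brevity.

```typescript
// Hypothetical shapes used only for this sketch.
type OptionSource =
  | { type: "literal"; value: string }
  | { type: "variable"; name: string };

// Hypothetical marker for an option whose right-hand side did not resolve.
const UNRESOLVED = Symbol("unresolved");

function resolveOptions(
  options: Array<[string, OptionSource]>,
  input: Map<string, unknown>,
  errors: string[]
): Map<string, unknown> {
  // The result is a mapping, so the source order of options carries no meaning.
  const resolved = new Map<string, unknown>();
  for (const [id, rhs] of options) {
    if (rhs.type === "literal") {
      resolved.set(id, rhs.value);
    } else if (input.has(rhs.name)) {
      resolved.set(id, input.get(rhs.name));
    } else {
      errors.push(`Unresolved Variable: $${rhs.name}`);
      resolved.set(id, UNRESOLVED); // MAY be removed before calling the function
    }
  }
  return resolved;
}

const errors: string[] = [];
const input = new Map<string, unknown>();
const a = resolveOptions(
  [["option1", { type: "literal", value: "foo" }], ["option2", { type: "literal", value: "bar" }]],
  input, errors
);
const b = resolveOptions(
  [["option2", { type: "literal", value: "bar" }], ["option1", { type: "literal", value: "foo" }]],
  input, errors
);
// a and b hold the same bindings regardless of the order in the source.
```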

Markup Resolution

Unlike functions, the resolution of markup is not customizable.

The resolved value of markup includes the following fields:

  • The type of the markup: open, standalone, or close
  • The identifier of the markup
  • The resolved option values after option resolution.

If the resolved mapping of options includes any u: options supported by the implementation, process them as specified. Such u: options MAY be removed from the resolved mapping of options.

The resolution of markup MUST always succeed.
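As a non-normative sketch, the resolved value of markup could be a plain record with the three fields listed above. The `ResolvedMarkup` name and shape are hypothetical, and this example assumes an implementation that processes u: options elsewhere and chooses to drop them from the mapping.

```typescript
// Hypothetical shape used only for this sketch.
interface ResolvedMarkup {
  kind: "open" | "standalone" | "close";
  id: string;
  options: Map<string, unknown>;
}

function resolveMarkup(
  kind: ResolvedMarkup["kind"],
  id: string,
  options: Map<string, unknown> // result of option resolution
): ResolvedMarkup {
  const kept = new Map<string, unknown>();
  options.forEach((value, optionId) => {
    // Assumption: u: options are handled elsewhere and removed here.
    if (optionId.indexOf("u:") !== 0) kept.set(optionId, value);
  });
  // No step here can fail, in line with markup resolution always succeeding.
  return { kind, id, options: kept };
}

const markup = resolveMarkup(
  "open",
  "link",
  new Map<string, unknown>([["u:dir", "ltr"], ["href", "/home"]])
);
```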

Fallback Resolution

A fallback value is the resolved value for an expression that fails to resolve.

An expression fails to resolve when:

  • A variable used as an operand (with or without a function) fails to resolve.
    • Note that this does not include a variable used as an option value.
  • A function fails to resolve.

The fallback value depends on the contents of the expression:

  • expression with a literal operand (either quoted or unquoted): U+007C VERTICAL LINE | followed by the value of the literal with escaping applied to U+005C REVERSE SOLIDUS \ and U+007C VERTICAL LINE |, and then by U+007C VERTICAL LINE |.

    Examples: In a context where :func fails to resolve, {42 :func} resolves to the fallback value |42| and {|C:\\| :func} resolves to the fallback value |C:\\|.

  • expression with variable operand referring to a local declaration (with or without a function): the value to which it resolves (which may already be a fallback value)

    Examples: In a context where :func fails to resolve, the pattern's expression in .local $var={|val|} {{{$var :func}}} resolves to the fallback value |val| and the message formats to {|val|}. In a context where :now fails to resolve but :datetime does not, the pattern's expression in

    .local $t = {:now format=iso8601}
    .local $pretty_t = {$t :datetime}
    {{{$pretty_t}}}
    

    (transitively) resolves to the fallback value :now and the message formats to {:now}.

  • expression with variable operand not referring to a local declaration (with or without a function): U+0024 DOLLAR SIGN $ followed by the name of the variable

    Examples: In a context where $var fails to resolve, {$var} and {$var :number} both resolve to the fallback value $var. In a context where :func fails to resolve, the pattern's expression in .input $arg {{{$arg :func}}} resolves to the fallback value $arg and the message formats to {$arg}.

  • function expression with no operand: U+003A COLON : followed by the function identifier

    Examples: In a context where :func fails to resolve, {:func} resolves to the fallback value :func. In a context where :ns:func fails to resolve, {:ns:func} resolves to the fallback value :ns:func.

  • Otherwise: the U+FFFD REPLACEMENT CHARACTER �

    This is not currently used by any expression, but may apply in future revisions.

Option identifiers and values are not included in the fallback value.

Pattern selection is not supported for fallback values.

For example, in a JavaScript formatter the fallback value could have the following implementation, where source is one of the above-defined strings:

class MessageFallback implements MessageValue {
  // formatToX() is omitted here for brevity.
  formatToString: () => string;
  getValue: () => undefined;
  constructor(source: string) {
    this.formatToString = () => `{${source}}`;
    this.getValue = () => undefined;
  }
  resolvedOptions = () => ({});
  selectKeys(_keys: string[]): string[] {
    throw Error("Selection on fallback values is not supported");
  }
}
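The source string itself could be computed with a helper along these lines. This is a sketch under stated assumptions: the `Operand` type and the `fallbackSource` name are hypothetical, and the case of a variable that refers to a local declaration (which reuses the value it resolves to) is not covered.

```typescript
// Hypothetical shape used only for this sketch.
type Operand =
  | { type: "literal"; value: string }
  | { type: "variable"; name: string }; // assumed not to be a local declaration

function fallbackSource(operand: Operand | undefined, functionId?: string): string {
  if (operand !== undefined) {
    if (operand.type === "literal") {
      // Escape U+005C REVERSE SOLIDUS and U+007C VERTICAL LINE, then quote.
      const escaped = operand.value.replace(/[\\|]/g, "\\$&");
      return `|${escaped}|`;
    }
    // Variable operand: U+0024 DOLLAR SIGN followed by the variable name.
    return `$${operand.name}`;
  }
  // No operand: U+003A COLON followed by the function identifier.
  if (functionId !== undefined) return `:${functionId}`;
  // Otherwise: U+FFFD REPLACEMENT CHARACTER.
  return "\uFFFD";
}

// Option identifiers and values never contribute to the result.
const fromLiteral = fallbackSource({ type: "literal", value: "42" }, "func");
const fromVariable = fallbackSource({ type: "variable", name: "var" }, "number");
const fromFunction = fallbackSource(undefined, "ns:func");
```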

Pattern Selection

If the message being formatted is not well-formed and valid, the result of pattern selection is a pattern consisting of a single fallback value using the message's fallback string defined in the formatting context or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER �.

If the message being formatted does not contain a matcher, the result of pattern selection is its pattern value.

When a message contains a matcher with one or more selectors, the implementation needs to determine which variant will be used to provide the pattern for the formatting operation. This is done by ordering and filtering the available variant statements according to their key values and selecting the first one.

Note

At least one variant is required to have all of its keys consist of the fallback value *. Some selectors might be implemented in a way that the key value * cannot be selected in a valid message. In other cases, this key value might be unreachable only in certain locales. This could result in the need in some locales to create one or more variants that do not make sense grammatically for that language.

For example, in the pl (Polish) locale, this message cannot reach the * variant:

.input {$num :integer}
.match $num
0    {{ }}
one  {{ }}
few  {{ }}
many {{ }}
*    {{Only used by fractions in Polish.}}

In the Tech Preview, feedback from users and implementers is desired about whether to relax the requirement that such a "fallback variant" appear in every message, versus the potential for a message to fail at runtime because no matching variant is available.

The number of keys in each variant MUST equal the number of selectors.

Each key corresponds to a selector by its position in the variant.

For example, in this message:

.input {$one :number}
.input {$two :number}
.input {$three :number}
.match $one $two $three
1 2 3 {{ ... }}

The first key 1 corresponds to the first selector ($one), the second key 2 to the second selector ($two), and the third key 3 to the third selector ($three).

To determine which variant best matches a given set of inputs, each selector is used in turn to order and filter the list of variants.

Each variant with a key that does not match its corresponding selector is omitted from the list of variants. The remaining variants are sorted according to the selector's key-ordering preference. Earlier selectors in the matcher's list of selectors have a higher priority than later ones.

When all of the selectors have been processed, the earliest-sorted variant in the remaining list of variants is selected.

This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here.

Resolve Selectors

First, resolve the values of each selector:

  1. Let res be a new empty list of resolved values that support selection.
  2. For each selector sel, in source order,
    1. Let rv be the resolved value of sel.
    2. If selection is supported for rv:
      1. Append rv as the last element of the list res.
    3. Else:
      1. Let nomatch be a resolved value for which selection always fails.
      2. Append nomatch as the last element of the list res.
      3. Emit a Bad Selector error.

The form of the resolved values is determined by each implementation, along with the manner of determining their support for selection.

Resolve Preferences

Next, using res, resolve the preferential order for all message keys:

  1. Let pref be a new empty list of lists of strings.
  2. For each index i in res:
    1. Let keys be a new empty list of strings.
    2. For each variant var of the message:
      1. Let key be the var key at position i.
      2. If key is not the catch-all key '*':
        1. Assert that key is a literal.
        2. Let ks be the resolved value of key in Unicode Normalization Form C.
        3. Append ks as the last element of the list keys.
    3. Let rv be the resolved value at index i of res.
    4. Let matches be the result of calling the method MatchSelectorKeys(rv, keys)
    5. Append matches as the last element of the list pref.

The method MatchSelectorKeys is determined by the implementation. It takes as arguments a resolved selector value rv and a list of string keys keys, and returns a list of string keys in preferential order. The returned list MUST contain only unique elements of the input list keys. The returned list MAY be empty. The most-preferred key is first, with each successive key appearing in order by decreasing preference.

The resolved value of each key MUST be in Unicode Normalization Form C ("NFC"), even if the literal for the key is not.

If calling MatchSelectorKeys encounters any error, a Bad Selector error is emitted and an empty list is returned.
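The preference-resolution steps could be sketched as follows. The `Key`, `Variant`, and `Selector` shapes are hypothetical, and `stringSelector` is a stand-in for a selector that matches keys by plain string comparison; it is not the definition of any particular function.

```typescript
// Hypothetical shapes used only for this sketch.
type Key = { type: "*" } | { type: "literal"; value: string };
type Variant = { keys: Key[]; pattern: string };
// A selector's resolved value, reduced to its key-matching method.
type Selector = { selectKeys(keys: string[]): string[] };

function resolvePreferences(
  res: Selector[],
  variants: Variant[],
  errors: string[]
): string[][] {
  const pref: string[][] = [];
  res.forEach((rv, i) => {
    // Collect the literal keys at position i, normalized to NFC.
    const keys: string[] = [];
    for (const variant of variants) {
      const key = variant.keys[i];
      if (key.type === "literal") keys.push(key.value.normalize("NFC"));
    }
    // MatchSelectorKeys: any error yields a Bad Selector error and an empty list.
    let matches: string[];
    try {
      matches = rv.selectKeys(keys);
    } catch {
      errors.push("Bad Selector");
      matches = [];
    }
    pref.push(matches);
  });
  return pref;
}

// Stand-in selector: a key matches only if it equals the value exactly.
const stringSelector = (value: string): Selector => ({
  selectKeys: (keys) => (keys.indexOf(value) !== -1 ? [value] : []),
});

const lit = (value: string): Key => ({ type: "literal", value });
const star: Key = { type: "*" };
const variants: Variant[] = [
  { keys: [lit("bar"), lit("bar")], pattern: "All bar" },
  { keys: [lit("foo"), lit("foo")], pattern: "All foo" },
  { keys: [star, star], pattern: "Otherwise" },
];
const errors: string[] = [];
const pref = resolvePreferences(
  [stringSelector("foo"), stringSelector("bar")], variants, errors
);
```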

Filter Variants

Then, using the preferential key orders pref, filter the list of variants to the ones that match with some preference:

  1. Let vars be a new empty list of variants.
  2. For each variant var of the message:
    1. For each index i in pref:
      1. Let key be the var key at position i.
      2. If key is the catch-all key '*':
        1. Continue the inner loop on pref.
      3. Assert that key is a literal.
      4. Let ks be the resolved value of key.
      5. Let matches be the list of strings at index i of pref.
      6. If matches includes ks:
        1. Continue the inner loop on pref.
      7. Else:
        1. Continue the outer loop on message variants.
    2. Append var as the last element of the list vars.
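In a language with array helpers, the nested loops above reduce to a filter over the variants. This is a sketch with hypothetical `Key` and `Variant` shapes; `pref` is assumed to be the list of match lists produced by the previous step.

```typescript
// Hypothetical shapes used only for this sketch.
type Key = { type: "*" } | { type: "literal"; value: string };
type Variant = { keys: Key[]; pattern: string };

function filterVariants(variants: Variant[], pref: string[][]): Variant[] {
  // Keep a variant only if, for every selector position, its key is either
  // the catch-all key or appears in the list of matching keys.
  return variants.filter((variant) =>
    pref.every((matches, i) => {
      const key = variant.keys[i];
      return key.type === "*" || matches.indexOf(key.value.normalize("NFC")) !== -1;
    })
  );
}

const lit = (value: string): Key => ({ type: "literal", value });
const star: Key = { type: "*" };
const variants: Variant[] = [
  { keys: [lit("bar"), lit("bar")], pattern: "All bar" },
  { keys: [lit("foo"), lit("foo")], pattern: "All foo" },
  { keys: [star, star], pattern: "Otherwise" },
];
// Match lists for selectors that resolved to 'foo' and 'bar' respectively.
const vars = filterVariants(variants, [["foo"], ["bar"]]);
```

The result preserves the source order of the remaining variants, which the sorting step relies on.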

Sort Variants

Finally, sort the list of variants vars and select the pattern:

  1. Let sortable be a new empty list of (integer, variant) tuples.
  2. For each variant var of vars:
    1. Let tuple be a new tuple (-1, var).
    2. Append tuple as the last element of the list sortable.
  3. Let len be the integer count of items in pref.
  4. Let i be len - 1.
  5. While i >= 0:
    1. Let matches be the list of strings at index i of pref.
    2. Let minpref be the integer count of items in matches.
    3. For each tuple tuple of sortable:
      1. Let matchpref be an integer with the value minpref.
      2. Let key be the tuple variant key at position i.
      3. If key is not the catch-all key '*':
        1. Assert that key is a literal.
        2. Let ks be the resolved value of key.
        3. Let matchpref be the integer position of ks in matches.
      4. Set the tuple integer value as matchpref.
    4. Set sortable to be the result of calling the method SortVariants(sortable).
    5. Set i to be i - 1.
  6. Let var be the variant element of the first element of sortable.
  7. Select the pattern of var.

SortVariants is a method whose single argument is a list of (integer, variant) tuples. It returns a list of (integer, variant) tuples. Any implementation of SortVariants is acceptable as long as it satisfies the following requirements:

  1. Let sortable be an arbitrary list of (integer, variant) tuples.
  2. Let sorted be SortVariants(sortable).
  3. sorted is the result of sorting sortable using the following comparator:
    1. (i1, v1) <= (i2, v2) if and only if i1 <= i2.
  4. The sort is stable (pairs of tuples from sortable that are equal in their first element have the same relative order in sorted).
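A sketch of the sorting step, using hypothetical `Key` and `Variant` shapes. It relies on `Array.prototype.sort`, which is required to be stable from ECMAScript 2019 onward, and it assumes that `vars` is not empty, which holds for a valid message because the variant whose keys are all `*` always survives filtering.

```typescript
// Hypothetical shapes used only for this sketch.
type Key = { type: "*" } | { type: "literal"; value: string };
type Variant = { keys: Key[]; pattern: string };

function sortAndSelect(vars: Variant[], pref: string[][]): Variant {
  let sortable = vars.map((variant) => ({ score: -1, variant }));
  // Process selectors from last to first, so that the first selector
  // ends up with the highest priority after the final stable sort.
  for (let i = pref.length - 1; i >= 0; i--) {
    const matches = pref[i];
    for (const tuple of sortable) {
      const key = tuple.variant.keys[i];
      // The catch-all key sorts after every matching literal key.
      tuple.score =
        key.type === "*" ? matches.length : matches.indexOf(key.value.normalize("NFC"));
    }
    // SortVariants: a stable sort by ascending score.
    sortable = sortable.slice().sort((a, b) => a.score - b.score);
  }
  return sortable[0].variant;
}

const lit = (value: string): Key => ({ type: "literal", value });
const star: Key = { type: "*" };
const vars: Variant[] = [
  { keys: [star, lit("bar")], pattern: "Any and bar" },
  { keys: [lit("foo"), star], pattern: "Foo and any" },
  { keys: [lit("foo"), lit("bar")], pattern: "Foo and bar" },
  { keys: [star, star], pattern: "Otherwise" },
];
// Match lists for selectors that resolved to 'foo' and 'bar' respectively.
const selected = sortAndSelect(vars, [["foo"], ["bar"]]);
// selected.pattern is "Foo and bar"
```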

Examples

This section is non-normative.

Example 1

Presuming a minimal implementation which only supports the :string function, which matches keys by using string comparison, and a formatting context in which the variable reference $foo resolves to the string 'foo' and the variable reference $bar resolves to the string 'bar', pattern selection proceeds as follows for this message:

.input {$foo :string}
.input {$bar :string}
.match $foo $bar
bar bar {{All bar}}
foo foo {{All foo}}
* * {{Otherwise}}

  1. For the first selector:
    The value of the selector is resolved to be 'foo'.
    The available keys « 'bar', 'foo' » are compared to 'foo',
    resulting in a list « 'foo' » of matching keys.

  2. For the second selector:
    The value of the selector is resolved to be 'bar'.
    The available keys « 'bar', 'foo' » are compared to 'bar',
    resulting in a list « 'bar' » of matching keys.

  3. Creating the list vars of variants matching all keys:
    The first variant bar bar is discarded as its first key does not match the first selector.
    The second variant foo foo is discarded as its second key does not match the second selector.
    The catch-all keys of the third variant * * always match, and this is added to vars,
    resulting in a list « * * » of variants.

  4. As the list vars only has one entry, it does not need to be sorted.
    The pattern Otherwise of the third variant is selected.

Example 2

Alternatively, with the same implementation and formatting context as in Example 1, pattern selection would proceed as follows for this message:

.input {$foo :string}
.input {$bar :string}
.match $foo $bar
* bar {{Any and bar}}
foo * {{Foo and any}}
foo bar {{Foo and bar}}
* * {{Otherwise}}

  1. For the first selector:
    The value of the selector is resolved to be 'foo'.
    The available keys « 'foo' » are compared to 'foo',
    resulting in a list « 'foo' » of matching keys.

  2. For the second selector:
    The value of the selector is resolved to be 'bar'.
    The available keys « 'bar' » are compared to 'bar',
    resulting in a list « 'bar' » of matching keys.

  3. Creating the list vars of variants matching all keys:
    The keys of all variants either match each selector exactly, or via the catch-all key,
    resulting in a list « * bar, foo *, foo bar, * * » of variants.

  4. Sorting the variants:
    The list sortable is first set with the variants in their source order and scores determined by the second selector:
    « ( 0, * bar ), ( 1, foo * ), ( 0, foo bar ), ( 1, * * ) »
    This is then sorted as:
    « ( 0, * bar ), ( 0, foo bar ), ( 1, foo * ), ( 1, * * ) ».
    To sort according to the first selector, the scores are updated to:
    « ( 1, * bar ), ( 0, foo bar ), ( 0, foo * ), ( 1, * * ) ».
    This is then sorted as:
    « ( 0, foo bar ), ( 0, foo * ), ( 1, * bar ), ( 1, * * ) ».

  5. The pattern Foo and bar of the most preferred foo bar variant is selected.

Example 3

A more-complex example is the matching found in selection APIs such as ICU's PluralFormat. Suppose that this API is represented here by the function :number. This :number function can match a given numeric value to a specific number literal and also to a plural category (zero, one, two, few, many, other) according to locale rules defined in CLDR.

Given a variable reference $count whose value resolves to the number 1 and an en (English) locale, the pattern selection proceeds as follows for this message:

.input {$count :number}
.match $count
one {{Category match for {$count}}}
1   {{Exact match for {$count}}}
*   {{Other match for {$count}}}

  1. For the selector:
    The value of the selector is resolved to an implementation-defined value that is capable of performing English plural category selection on the value 1.
    The available keys « 'one', '1' » are passed to the implementation's MatchSelectorKeys method,
    resulting in a list « '1', 'one' » of matching keys.

  2. Creating the list vars of variants matching all keys:
    The keys of all variants are included in the list of matching keys, or use the catch-all key,
    resulting in a list « one, 1, * » of variants.

  3. Sorting the variants:
    The list sortable is first set with the variants in their source order and scores determined by the selector key order:
    « ( 1, one ), ( 0, 1 ), ( 2, * ) »
    This is then sorted as:
    « ( 0, 1 ), ( 1, one ), ( 2, * ) »

  4. The pattern Exact match for {$count} of the most preferred 1 variant is selected.
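The key ordering used in this example could come from a MatchSelectorKeys implementation along the following lines. This is a deliberately simplified sketch: `matchNumberKeys` is a hypothetical name, exact matching is reduced to comparing the key with `String(value)`, and plural categories come from the standard `Intl.PluralRules` API with its default (cardinal) type.

```typescript
// Hypothetical, simplified key matching for a numeric selector.
function matchNumberKeys(value: number, locale: string, keys: string[]): string[] {
  const exact = String(value);
  const category: string = new Intl.PluralRules(locale).select(value);
  const matches: string[] = [];
  // An exact numeric key is preferred over a plural category key.
  if (keys.indexOf(exact) !== -1) matches.push(exact);
  if (keys.indexOf(category) !== -1) matches.push(category);
  return matches;
}

const order = matchNumberKeys(1, "en", ["one", "1"]);
// order is ["1", "one"]: the exact match first, then the plural category
```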

Formatting

After pattern selection, each text and placeholder part of the selected pattern is resolved and formatted.

Resolved values cannot always be formatted by a given implementation. When such an error occurs during formatting, an appropriate Message Function Error is emitted and a fallback value is used for the placeholder with the error.

Implementations MAY represent the result of formatting using the most appropriate data type or structure. Some examples of these include:

  • A single string concatenated from the parts of the resolved pattern.
  • A string with associated attributes for portions of its text.
  • A flat sequence of objects corresponding to each resolved value.
  • A hierarchical structure of objects that group spans of resolved values, such as sequences delimited by markup-open and markup-close placeholders.

Implementations SHOULD provide formatting result types that match user needs, including situations that require further processing of formatted messages. Implementations SHOULD encourage users to consider a formatted localised string as an opaque data structure, suitable only for presentation.

When formatting to a string, the default representation of all markup MUST be an empty string. Implementations MAY offer functionality for customizing this, such as by emitting XML-ish tags for each markup.

Examples

This section is non-normative.

  1. An implementation might choose to return an interstitial object so that the caller can "decorate" portions of the formatted value. In ICU4J, the NumberFormatter class returns a FormattedNumber object, so a pattern such as This is my number {42 :number} might return the character sequence This is my number followed by a FormattedNumber object representing the value 42 in the current locale.

  2. A formatter in a web browser could format a message as a DOM fragment rather than as a representation of its HTML source.

Formatting Fallback Values

If the resolved pattern includes any fallback values and the formatting result is a concatenated string or a sequence of strings, the string representation of each fallback value MUST be the concatenation of a U+007B LEFT CURLY BRACKET {, the fallback value as a string, and a U+007D RIGHT CURLY BRACKET }.

For example, a message that is not well-formed would format to a string as {�}, unless a fallback string is defined in the formatting context, in which case that string would be used instead.
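The string representation of a fallback value can be sketched as a one-line helper. The function name `fallback_to_string` is hypothetical, and `$name` is used only as an illustrative fallback value.

```python
def fallback_to_string(fallback_value: str) -> str:
    # U+007B LEFT CURLY BRACKET + fallback value + U+007D RIGHT CURLY BRACKET
    return "\u007B" + fallback_value + "\u007D"


# A message that is not well-formed falls back to U+FFFD REPLACEMENT CHARACTER.
not_well_formed = fallback_to_string("\uFFFD")

# Illustrative fallback value for a placeholder that could not be formatted.
failed_placeholder = fallback_to_string("$name")
```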

Handling Bidirectional Text

Messages contain text. Any text can be bidirectional text. That is, the text can consist of a mixture of left-to-right and right-to-left spans of text. The display of bidirectional text is defined by the Unicode Bidirectional Algorithm [UAX9].

The directionality of the formatted message as a whole is provided by the formatting context.

Note

Keep in mind the difference between the formatted output of a message, which is the topic of this section, and the syntax of a message prior to formatting. The processing of a message depends on the logical sequence of Unicode code points, not on the presentation of the message. Affordances to allow users appropriate control over the appearance of the message's syntax have been provided.

When a message is formatted, placeholders are replaced with their formatted representation. Applying the Unicode Bidirectional Algorithm to the text of a formatted message (including its formatted parts) can result in unexpected or undesirable spillover effects. Applying bidi isolation to each affected formatted value helps avoid this spillover in a formatted message.

Note that both the message and, separately, each placeholder need to have direction metadata for this to work. If an implementation supports formatting to something other than a string (such as a sequence of parts), the directionality of each formatted placeholder needs to be available to the caller.

If a formatted expression itself contains spans with differing directionality, its formatter SHOULD perform any necessary processing, such as inserting controls or isolating such parts, to ensure that the formatted value displays correctly in a plain text context.

For example, an implementation could provide a :currency formatting function which inserts strongly directional characters, such as U+200F RIGHT-TO-LEFT MARK (RLM), U+200E LEFT-TO-RIGHT MARK (LRM), or U+061C ARABIC LETTER MARK (ALM), to coerce proper display of the sign and currency symbol next to a formatted number. An example of this is formatting the value -1234.56 as the currency AED in the ar-AE locale. The formatted value appears like this:

‎-1,234.56 د.إ.‏

The code point sequence for this string, as produced by the ICU4J NumberFormat function, includes U+200F U+200E at the start and U+200F at the end of the string. If it did not do this, the same string would appear like this instead:

[Image: the same string rendered without the directional marks]

A bidirectional isolation strategy is functionality in the formatter's processing of a message that produces bidirectional output text that is ready for display.

The Default Bidi Strategy is a bidirectional isolation strategy that uses isolating Unicode control characters around each placeholder's formatted value. It is primarily intended for use in plain-text strings, where markup or other mechanisms are not available. Implementations MUST provide the Default Bidi Strategy as one of the bidirectional isolation strategies.

Implementations MAY provide other bidirectional isolation strategies.

Implementations MAY supply a bidirectional isolation strategy that performs no processing.

The Default Bidi Strategy is defined as follows:

  1. Let msgdir be the directionality of the whole message, one of « 'LTR', 'RTL', 'unknown' ». These correspond to the message having left-to-right directionality, right-to-left directionality, and to the message's directionality not being known.
  2. For each expression exp in pattern:
    1. Let fmt be the formatted string representation of the resolved value of exp.
    2. Let dir be the directionality of fmt, one of « 'LTR', 'RTL', 'unknown' », with the same meanings as for msgdir.
    3. If dir is 'LTR':
      1. If msgdir is 'LTR', leave fmt unchanged in the formatted output.
      2. Else, in the formatted output, prefix fmt with U+2066 LEFT-TO-RIGHT ISOLATE and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
    4. Else, if dir is 'RTL':
      1. In the formatted output, prefix fmt with U+2067 RIGHT-TO-LEFT ISOLATE and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
    5. Else:
      1. In the formatted output, prefix fmt with U+2068 FIRST STRONG ISOLATE and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
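The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the representation of pattern parts (plain strings for text, `(fmt, direction)` tuples for formatted expressions) are assumptions made for the sketch.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fmt: str, direction: str, msgdir: str) -> str:
    """Apply step 2 of the Default Bidi Strategy to one formatted value.

    direction and msgdir are each one of 'LTR', 'RTL', or 'unknown'.
    """
    if direction == "LTR":
        if msgdir == "LTR":
            return fmt  # no isolation needed
        return LRI + fmt + PDI
    if direction == "RTL":
        return RLI + fmt + PDI
    return FSI + fmt + PDI  # direction is not known


def format_with_default_bidi(parts, msgdir: str) -> str:
    """Concatenate parts, isolating each formatted expression.

    parts is a sequence whose items are either str (literal text) or
    (fmt, direction) tuples (hypothetical representation of expressions).
    """
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)
        else:
            fmt, direction = part
            pieces.append(isolate(fmt, direction, msgdir))
    return "".join(pieces)
```

For instance, `format_with_default_bidi(["Hello ", ("world", "unknown")], "LTR")` wraps `world` in U+2068 and U+2069, while an 'LTR' value in an 'LTR' message is emitted unchanged.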