Unicode MessageFormat 2.0 #1042

Open
1 task done
aphillips opened this issue Jan 21, 2025 · 3 comments

Comments

@aphillips
Contributor

Hello, TAG!

I'm requesting a TAG review of Unicode's MessageFormat 2.0.

Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. MessageFormat 2 defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formatting) and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that adds to the core capabilities. Its data model provides a means of representing existing syntaxes, thus enabling gradual adoption by users of older formatting systems. The goal is to allow developers and translators to create natural-sounding, grammatically correct user interfaces that can appear in any language and support the needs of diverse cultures.

See also the blog post, which includes links to implementations.

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Previous early design review, if any: N/A
  • Relevant time constraints or deadlines: 12 February 2025 is v47 Beta. 26 February is v47 public Beta. Before these dates would be ideal, but we recognize that this might not be possible.
  • The group where the work on this specification is currently being done: Unicode MessageFormat Working Group ([repo](https://github.com/unicode-org/message-format-wg))
  • The group where standardization of this work is intended to be done (if different from the current group):
  • Major unresolved issues with or opposition to this specification: None
  • This work is being funded by: Unicode

You should also know that...

This specification was reviewed somewhat informally at TPAC 2023 (very many significant changes have occurred since).

@aphillips
Contributor Author

@jyasskin @torgo Checking in on progress. The next version of LDML (which contains MessageFormat) will release shortly. This version will stabilize MF2 (think of it like REC status). Let me know if there is any way we can help.

v47 pending (replaces the 46.1 link above): https://www.unicode.org/reports/tr35/dev/tr35-messageFormat.html

@jyasskin
Contributor

jyasskin commented Mar 6, 2025

I'll post my review now, in case any of the minor notes can help while finalizing the spec. The soonest we can check whether there's TAG consensus on this is the week of March 17, which might be too late.


I appreciate the clarity in the stability guarantees and the design goals.

I think the programming model is that, for each message in the code, the code author writes one MessageFormat 2.0 string in their language, which seeds a translation database. Then the translator for a given locale writes a whole new MessageFormat 2.0 string representing the translation for that locale. At runtime, the code pulls that translated MF2 string out of the database and formats it with the appropriate dynamic values. This model is implied by documentation like https://messageformat.dev/docs/translators/, but AFAICT it's not explicitly stated anywhere, and it probably should be.
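
A minimal Python sketch of the programming model described above, under loud assumptions: `TRANSLATIONS`, `lookup`, and `format_simple` are hypothetical names, the "translation database" is an in-memory dict, and `format_simple` only substitutes bare `{$name}` placeholders. It is not a conforming MF2 formatter (no functions, selectors, markup, or escaping); it only stands in for whatever real MF2 implementation would be used.

```python
import re

# Hypothetical translation database: (locale, message id) -> MF2 source string.
# The "en" entry is the code author's source string; each translator adds a whole new string.
TRANSLATIONS: dict[tuple[str, str], str] = {
    ("en", "greeting"): "Hello, {$name}!",
    ("fr", "greeting"): "Bonjour, {$name} !",
}

# Matches only the simplest MF2 placeholder form, such as {$name}.
_VARIABLE = re.compile(r"\{\$([A-Za-z_][A-Za-z0-9_]*)\}")


def lookup(locale: str, message_id: str, source_locale: str = "en") -> str:
    """Return the locale's MF2 string, falling back to the author's source string."""
    key = (locale, message_id)
    if key in TRANSLATIONS:
        return TRANSLATIONS[key]
    return TRANSLATIONS[(source_locale, message_id)]


def format_simple(message: str, args: dict[str, object]) -> str:
    """Toy formatter: replaces {$var} with str(args[var]); unknown variables stay as {$var}."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(args[name]) if name in args else match.group(0)

    return _VARIABLE.sub(substitute, message)


# At runtime: pull the locale's translated string from the database, then format it.
print(format_simple(lookup("fr", "greeting"), {"name": "Ada"}))  # Bonjour, Ada !
```

The French entry is a complete translated string rather than a patch on the English one, which matches the "whole new MessageFormat 2.0 string" step of the model.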

The specification overall looks solid and well-considered. I think we're more likely to have feedback for the JavaScript integration in https://github.com/tc39/proposal-intl-messageformat, although I don't see any immediate problems with that either.

Some minor notes on the spec:

https://www.unicode.org/reports/tr35/dev/tr35-messageFormat.html#function uses "function" to refer to what I'd call a "function call" in other specs. This is a little inconsistent with https://www.unicode.org/reports/tr35/dev/tr35-messageFormat.html#default-functions, which uses "function" the way I'd expect. "Function handler" also seems to have the meaning I'd associate with "function", but it seems to only be used with user-defined functions. This isn't a big deal, but maybe a clarifying note would fit somewhere.

"A message with markup that should not be copied:" uses "@can-copy" to mark the text that should not be copied?

I'm concerned about how loose the "resolved value" interface is. Without care, specifications using resolved values will depend on attributes that aren't guaranteed to exist. Perhaps the extra detail in the Intl proposal will insulate the Web from this ambiguity.

In option resolution, "If rv is a fallback value: If supported, emit a Bad Option error." doesn't say what to do if the BadOption error isn't supported.

"The resolution of markup MUST always succeed.", but it calls option resolution which can fail?

@aphillips
Contributor Author

> I'll post my review now, in case any of the minor notes can help while finalizing the spec. The soonest we can check whether there's TAG consensus on this is the week of March 17, which might be too late.

Thank you very much for this early review! A few (personal) comments:

> Some minor notes on the spec:
> ...

> "A message with markup that should not be copied:" uses "@can-copy" to mark the text that should not be copied?

I think that's an example. Good point tho'.

> In option resolution, "If rv is a fallback value: If supported, emit a Bad Option error." doesn't say what to do if the BadOption error isn't supported.

That appears to be an indentation problem. The step "Set res[id] to be rv" should be executed even if rv is a fallback value.

> "The resolution of markup MUST always succeed.", but it calls option resolution which can fail?

Note that:

> The result of option resolution MUST be a (possibly empty) mapping of string identifiers to values; that is, errors MAY be emitted, but such errors MUST NOT be fatal.

So option resolution cannot fail (but it may emit diagnostic errors).
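
A minimal Python sketch tying the two replies above together, under stated assumptions: option values arrive already resolved, and `Fallback`, `Context`, `bad_option_supported`, `resolve_options`, `Markup`, and `resolve_markup` are hypothetical names, not taken from the spec or any implementation. It places "Set res[id] to be rv" outside the fallback branch, as the indentation fix describes, and records errors instead of raising them, so markup resolution always produces a result.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fallback:
    """Hypothetical stand-in for an MF2 fallback value (e.g. an unresolved $variable)."""
    source: str


@dataclass
class Context:
    """Hypothetical formatting context that collects non-fatal errors."""
    bad_option_supported: bool = True  # models the spec's "if supported"
    errors: list[str] = field(default_factory=list)


def resolve_options(options: dict[str, object], ctx: Context) -> dict[str, object]:
    """Option resolution with "Set res[id] to be rv" outside the fallback branch.

    Simplification: option values arrive already resolved, keyed by identifier.
    """
    res: dict[str, object] = {}
    for option_id, rv in options.items():
        if isinstance(rv, Fallback) and ctx.bad_option_supported:
            # Diagnostic only: emitting the error does not stop resolution.
            ctx.errors.append(f"Bad Option: {option_id}")
        res[option_id] = rv  # runs whether or not rv is a fallback value
    return res


@dataclass
class Markup:
    """Hypothetical resolved markup part, e.g. for {#link href=$url}."""
    kind: str  # "open", "standalone", or "close"
    name: str
    options: dict[str, object]


def resolve_markup(kind: str, name: str, options: dict[str, object], ctx: Context) -> Markup:
    """Never raises: resolve_options always returns a mapping, so a Markup is always built."""
    return Markup(kind, name, resolve_options(options, ctx))


ctx = Context()
part = resolve_markup("open", "link", {"href": Fallback("$url")}, ctx)
print(part.options)  # {'href': Fallback(source='$url')}
print(ctx.errors)  # ['Bad Option: href']
```

Whether a particular implementation keeps or drops fallback values in the resulting mapping is a separate question; the sketch follows the reading given in the reply.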
