l10n data: *.lol
The Mozilla l20n project aims to supersede l10n formats like Gettext, mostly by allowing C-style macros in l10n resources. An implementation is in progress: https://github.com/zbraniecki/l20n
The main issue comes from the 'LOL' format, which is specific to the l20n project. Here’s a proposal to make the 'LOL' format less weird and more extensible.
The 'LOL' format is a mix of:
- C-style comments and logical expressions
- JSON-like values: string | array | list
- DTD-like entities (ugh!)
Entities:
<identifier[index]? content? attributes?>
- identifier = key string + optional [index] list
- content = value (generally: HTML element content)
- attributes = « key: value » sequence
where:
- each index is a C-style logical expression
- each value is a JSON object: string | array | list
Macros:
<identifier(params) {expression}>
Comments:
/* C-style comment */
Identifiers, values, expressions and comments make perfect sense, but our project needs a nestable format for entities, which the 'LOL' one is not.
By design, 'LOL' entities are not nestable. This gets in the developer’s way when dealing with more complex structures than just HTML elements.
For our project, we have to group entities by language and by media queries; this would be straightforward with a JSON/YAML-like variant (see the lang.json and lang.yaml files in this directory), but it is impossible with the current 'LOL' syntax.
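To illustrate the kind of grouping meant here, a minimal Python sketch follows, with a nested dictionary standing in for an ID: value resource. The language codes, the media query string, the sample strings and the `lookup` helper are all hypothetical; none of them are taken from lang.json or lang.yaml.

```python
# Hypothetical nested resource: language -> media query -> entity ID -> value.
RESOURCE = {
    "en": {
        "all": {"hello": "hello, world!"},
        "(max-width: 480px)": {"hello": "hi!"},
    },
    "fr": {
        "all": {"hello": "bonjour, monde !"},
    },
}


def lookup(resource, lang, media, entity_id):
    """Return the value for entity_id, preferring the media-specific group
    and falling back to the language's "all" group; None if nothing matches."""
    groups = resource.get(lang, {})
    for group in (media, "all"):
        value = groups.get(group, {}).get(entity_id)
        if value is not None:
            return value
    return None
```

Because every level is just a key mapped to a value, adding another grouping dimension only means adding another level of nesting.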
More generally, there’s no reason why an l10n format should be specific to HTML or XML, or why it should limit which data structures can be translated: a template engine or UI toolkit library might benefit from nested entities.
'LOL' entities are written like this: <ID content? attributes?>. Grouping / nesting entities would be straightforward if entities used the ID: value form, which is common to web-related languages (CSS, JSON, YAML…).
In order to use ID: value instead of <ID content? attributes?> for l10n entities, we have to express the content and attributes as a single object. As an example:
<test
"hello, world!"
style: "color: red;"
title: "failed">
can be written in a JSON-like form:
test: {
&textContent: "hello, world!",
.style: "color: red;",
.title: "failed"
}
…which is what we do in our *.intl format proposal: attributes are prefixed with '.' (like in 'LOL' expressions), properties are prefixed with '&'.
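The mapping from the content + attributes form to a single object can be sketched in a few lines of Python. This is only an illustration of the prefix convention described above; the `to_single_object` helper and its parameters are hypothetical and are not part of l20n or of the *.intl proposal.

```python
def to_single_object(content=None, attributes=None, content_property="textContent"):
    """Merge an entity's content and attributes into one mapping.

    Attributes get a "." prefix; the content is stored under a
    "&"-prefixed property name (textContent by default).
    """
    merged = {}
    if content is not None:
        merged["&" + content_property] = content
    for name, value in (attributes or {}).items():
        merged["." + name] = value
    return merged


# The <test ...> entity from the example, rebuilt as a single object.
entity = to_single_object(
    "hello, world!",
    {"style": "color: red;", "title": "failed"},
)
```

Since the result is a plain key/value mapping, it can itself be stored as the value of another ID, which is what makes nesting possible.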
In 'LOL' entities, the value is assigned to the element’s innerHTML property, which raises performance and security issues. The more generic '&' prefix is more verbose but more precise, as it can be used to specify whether the value should be applied to innerHTML or textContent (the latter being much safer).
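As a rough sketch of how a consumer might honour the two prefixes, the Python below applies an entity object to a stand-in element. `FakeElement`, `ALLOWED_PROPERTIES` and `apply_entity` are hypothetical names used only for illustration; a real implementation would run against the DOM in JavaScript.

```python
class FakeElement:
    """Stand-in for a DOM element (assumption: no real DOM is available here)."""

    def __init__(self):
        self.attributes = {}
        self.textContent = ""
        self.innerHTML = ""

    def setAttribute(self, name, value):
        self.attributes[name] = value


# Properties an entity is allowed to set through the "&" prefix.
ALLOWED_PROPERTIES = {"textContent", "innerHTML"}


def apply_entity(element, entity):
    """Apply a {prefix + name: value} entity to an element.

    "." keys become attributes, "&" keys set an allowed property;
    anything else is rejected.
    """
    for key, value in entity.items():
        prefix, name = key[:1], key[1:]
        if prefix == ".":
            element.setAttribute(name, value)
        elif prefix == "&" and name in ALLOWED_PROPERTIES:
            setattr(element, name, value)
        else:
            raise ValueError(f"unsupported key: {key!r}")
    return element
```

With `&textContent`, markup in the translated string is stored as literal text and innerHTML is never touched, so the translator has to opt in explicitly to HTML injection.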
The main point of this ID: value approach is that it is more generic and allows entities to be grouped and nested, as required by our project. It also allows a much more consistent and conventional syntax.
The l20n project brings a generic solution (expressions) to handle complex grammatical rules, which is great. However, it relies on a specific and unconventional syntax, which feels wrong and unnecessary. As an example, here’s what the same data looks like…
There’s a point where a high WTF-factor becomes a technical issue. We believe we’re beyond that point.