Runtime vs Tooling ASTs #1067
Replies: 5 comments 4 replies
-
Let's try to look at the needs of users of ICU4X. The value proposition of ICU4X is that we have fast, low-memory, and small-code modules. Clients who just care about good i18n don't care how we achieve that goal. These users aren't looking to introspect the ASTs or Patterns. In fact, many of these users will be over FFI, where it is unlikely that we expose ASTs or Patterns at all. The users who might care about introspecting the ASTs and Patterns (who you describe as "tooling") are very specific libraries that are trying to extend ICU4X functionality in some way. I imagine Fluent might be an example of that. The line I would draw is that we should focus primarily on the clients who come to us because of our value proposition and just want the good i18n. We should support the "tooling" customers only to the extent that doing so doesn't reduce the value to the primary target audience.
-
I don't think of Fluent per se. I see a general ast/parser/serializer package as a general enabler of technological advancement by the community. A straw-man use case for me is not Fluent, but rather a linter that wants to read date-time patterns, modify them, and save back the result, or some CAT augmentation that wants to do syntax coloring of the plural rules. I agree that the challenge is that I'm not coming with a particular use case, but rather with a claim that if we were to provide the value, then we'd enable tools to be created. That's a challenging claim to make, since you can say that there are none that we know of, and I may say that it's because such a bundle does not exist. My thinking now is going in the direction of writing the code we need (runtime) but shaping the crates with space for the tooling part to be added if/when the need arises. That would mostly mean using a module structure that denotes rules::runtime::parser/ast/resolver and leaves space for rules::tooling::parser/ast/serializer to be written later. Would you be OK with using such a convention?
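A minimal sketch of what that module convention could look like, assuming hypothetical module and type names rather than the actual ICU4X layout:

```rust
// Hypothetical layout sketch; module and type names are illustrative only.
#![allow(dead_code)]

pub mod rules {
    /// Always present: the lean pieces needed at runtime.
    pub mod runtime {
        pub mod ast {
            /// Compact AST holding only what the resolver needs.
            pub struct Rule {}
        }
        pub mod parser {
            // Parses rule strings straight into runtime::ast.
        }
        pub mod resolver {
            // Applies a runtime::ast::Rule to an operand.
        }
    }

    /// Reserved space: richer pieces for linters, CAT tools, etc.,
    /// to be filled in if/when the need arises.
    pub mod tooling {
        pub mod ast {}        // full, spec-shaped AST (e.g. with samples)
        pub mod parser {}     // lossless parser producing the tooling AST
        pub mod serializer {} // writes a tooling AST back to rule syntax
    }
}
```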
-
To expand on my thinking: I imagine an architecture for such scenarios where data that may be useful for tooling gets both a tooling-facing and a runtime representation, while esoteric data that we do not expect anyone beyond us to care about gets only the runtime one. For PluralRules, where we already have a reference/tooling parser/ast/serializer, we'd keep the full parser for the reference AST, which produces data not needed at runtime (like Samples), together with a serializer. As a result, the "reference" module would be useful for general PluralRules tooling, while the runtime module would basically only have the AST and Resolver. Thanks to data-driven tests it would be relatively easy to make sure that both sides work: one side is a round-trip test, the other is a resolver test, and then AST<->AST conversion can be tested the same way. I think this would greatly benefit our system, making ICU4X not just a reference runtime implementation but also creating space for ICU4X to serve as a backbone for any i18n data tooling. How does it sound to the group?
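One hedged sketch of how the two AST shapes could differ, with made-up field names (the real plural-rule syntax is richer than a single condition string):

```rust
// Illustrative only: a spec-shaped "reference" rule versus a trimmed "runtime" rule.
pub mod reference {
    /// Keeps everything tooling may care about, including sample data
    /// (@integer / @decimal) that the resolver never looks at.
    pub struct Rule {
        pub condition: String,
        pub integer_samples: Vec<String>,
        pub decimal_samples: Vec<String>,
    }
}

pub mod runtime {
    /// Keeps only what the resolver needs, in a form it can evaluate quickly.
    pub struct Rule {
        pub condition: String, // in practice a pre-lowered / packed encoding
    }
}
```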
-
Okay, I see. Here is how I would architect this.
In other words, the "reference" data structure becomes a Builder for the "runtime" data structure. I do not see value in allowing PluralRules to operate on the "reference" data structure directly. Clients who require the additional functionality can just convert it to "runtime" before handing it to PluralRules. Even though the conversion step requires some overhead, the end result may be an overall performance win, since the PluralRules select function will be at a performance maximum.
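A short sketch of that builder relationship, using hypothetical types: the rich reference structure is convertible into the lean runtime one, and only the runtime form is what a formatter like PluralRules would accept.

```rust
// Illustrative sketch of "reference acts as a Builder for runtime"; names are made up.
pub struct ReferenceRule {
    pub condition: String,
    pub samples: Vec<String>, // tooling-only data, never consulted at runtime
}

pub struct RuntimeRule {
    pub condition: String, // in practice this would be a packed/pre-lowered encoding
}

impl From<ReferenceRule> for RuntimeRule {
    fn from(rule: ReferenceRule) -> Self {
        // Drop tooling-only data; any pre-lowering/packing would happen here.
        RuntimeRule { condition: rule.condition }
    }
}

fn main() {
    let parsed = ReferenceRule {
        condition: "i = 1 and v = 0".to_string(),
        samples: vec!["@integer 1".to_string()],
    };
    // Clients with tooling needs convert once, then hand the lean form onward.
    let runtime: RuntimeRule = parsed.into();
    assert_eq!(runtime.condition, "i = 1 and v = 0");
}
```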
-
Dumping a thought from the ICU4X call today: there is an option to have a single parser producing two ASTs depending on a flag. My concern here is that the needs of the two ASTs may slowly diverge, and if we have it as a flag it may incentivize working around that by keeping a parser that is suboptimal for both needs as a compromise. A clean "two parsers" approach avoids that; if instead we have just one parser, runtime users need to "loop around" by parsing into the reference AST and then converting it into the runtime one. And if we have enough reason to have a dedicated runtime parser, that reason should justify writing it.
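In code terms the contrast is roughly the following, assuming hypothetical parse functions:

```rust
// Hypothetical shapes, only to contrast the two approaches; not real ICU4X APIs.
struct ReferenceAst(String);
struct RuntimeAst(String);

/// Full, spec-shaped parser (the one tooling needs anyway).
fn parse_reference(src: &str) -> ReferenceAst {
    ReferenceAst(src.to_string())
}

/// Lowering step from the reference AST into the runtime one.
fn lower(ast: ReferenceAst) -> RuntimeAst {
    RuntimeAst(ast.0)
}

/// A second, dedicated parser that produces the runtime AST directly.
fn parse_runtime(src: &str) -> RuntimeAst {
    RuntimeAst(src.to_string())
}

fn main() {
    let src = "i = 1 and v = 0";
    // Single-parser world: "loop around" through the reference AST.
    let looped = lower(parse_reference(src));
    // Two-parser world: go straight to the runtime AST.
    let direct = parse_runtime(src);
    assert_eq!(looped.0, direct.0);
}
```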
-
In my recent work in #615 and #519 I'm adapting parsed structures to the zero-copy model that is optimal for the needs of the Data Provider, and a model which gives us great performance and the lowest memory overhead.
That model comes with a tradeoff: we're moving away from the simple, canonical representation of the data structure as defined by the specification.
The value of a canonical representation, as I see it, is twofold:
I see the AST, in the Parser/AST/Resolver+Serializer pipeline, as a natural tension point between two target audiences of the AST.
That tension is not unique to us: every known PL parser/runtime struggles with it, and everywhere I go I see some compromise between the needs of those two groups, often shading in one or the other direction.
I see three paths forward from here:
A) Focus only on Runtime
We could forgo the tooling needs as a focus point, create hyper-optimized ZeroVec backed structures that squeeze bytes and bits everywhere possible based on reasonable judgement (example: "The modulo can probably be stored in a single byte, since all instances are 1 to a power of 10"), and let tooling authors handle our data model, or write their own tools.
B) Focus on Tooling
We could stick to the "canonical" representation and sacrifice some of the performance gains we'd get from zero copy.
C) Handle both
We can also take the high road and build support for two ASTs: one consumable by the Resolver, the other by the Serializer, with From/Into between them.
D) Another way?
My hunch is to go for (C), with the assumption that we're responsible for building a full foundational library, that both runtime and tooling ecosystem needs are best served by a unified ICU4X, and that DCE/features can ensure no one pays for what they don't use.
The cost is added maintenance overhead for the library.
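One hedged way to get the "don't pay for what you don't use" property would be an opt-in Cargo feature around the tooling half; the feature name below is invented for illustration:

```rust
// Sketch of feature-gating the tooling surface; "tooling" is a hypothetical feature name.

/// Always compiled: the lean runtime pieces every consumer needs.
pub mod runtime {
    pub mod ast {}
    pub mod resolver {}
}

/// Compiled only when a client opts in, e.g.
/// icu_plurals = { version = "...", features = ["tooling"] }
#[cfg(feature = "tooling")]
pub mod tooling {
    pub mod ast {}
    pub mod parser {}
    pub mod serializer {}
}
```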
Thoughts?
P.S. I suspect that these challenges are not unique to DateTimeFormat and PluralRules, and we should assume that the conclusion of this conversation will carry over to more cases.