Runtime vs Tooling ASTs #1067
Replies: 5 comments 4 replies
-
Let's try to look at the needs of users of ICU4X. The value proposition of ICU4X is that we have fast, low-memory, and small-code modules. Clients who just care about good i18n don't care how we achieve that goal. These users aren't looking to introspect the ASTs or Patterns. In fact, many of these users will be over FFI, where it is unlikely that we expose ASTs or Patterns at all. The users who might care about introspecting the ASTs and Patterns (who you describe as "tooling") are very specific libraries that are trying to extend ICU4X functionality in some way. I imagine Fluent might be an example of that. The line I would draw is that we should focus primarily on the clients who come to us because of our value proposition and just want the good i18n. We should support the "tooling" customers only to the extent that doing so doesn't reduce the value to the primary target audience.
-
I don't think of Fluent per se. I see a general ast/parser/serializer package as a general enabler of technological advancement by the community. A straw-man use case for me is not Fluent, but rather a linter that wants to read date-time patterns, modify them, and save back the result, or some CAT augmentation that wants to do syntax coloring of the plural rules. I agree that the challenge is that I'm not coming with a particular use case, but rather with a claim that if we were to provide the value, then we'd enable tools to be created. That's a challenging claim to make, since you can say that there are none that we know of, and I may say that it's because such a bundle does not exist. My thinking now is going in the direction of writing the code we need (runtime) but shaping the crates with space for the tooling part to be added if/when the need arises. That would mostly mean using a module structure that denotes rules::runtime::parser/ast/resolver and leaves space for rules::tooling::parser/ast/serializer to be written later. Would you be OK with using such a convention?
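A minimal sketch of what that module convention could look like, assuming hypothetical module and type names rather than the actual ICU4X layout:

```rust
// Hypothetical layout sketch; module and type names are illustrative only.
#![allow(dead_code)]

pub mod rules {
    /// Always present: the lean pieces needed at runtime.
    pub mod runtime {
        pub mod ast {
            /// Compact AST holding only what the resolver needs.
            pub struct Rule {}
        }
        pub mod parser {
            // Parses rule strings straight into runtime::ast.
        }
        pub mod resolver {
            // Applies a runtime::ast::Rule to an operand.
        }
    }

    /// Reserved space: richer pieces for linters, CAT tools, etc.,
    /// to be filled in if/when the need arises.
    pub mod tooling {
        pub mod ast {}        // full, spec-shaped AST (e.g. with samples)
        pub mod parser {}     // lossless parser producing the tooling AST
        pub mod serializer {} // writes a tooling AST back to rule syntax
    }
}
```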
-
To expand on my thinking: I imagine an architecture for such scenarios where data that may be useful for tooling gets both a tooling-facing and a runtime representation, while esoteric data that we do not expect anyone beyond us to care about gets only the runtime one. For PluralRules, where we already have a reference/tooling parser/ast/serializer, we'd keep the full parser for the reference AST, which produces data not needed at runtime (like Samples), together with a serializer. As a result, the "reference" module would be useful for general PluralRules tooling, while the runtime module would basically only have the AST and Resolver. Thanks to data-driven tests it would be relatively easy to make sure that both sides work: one side is a round-trip test, the other is a resolver test, and then AST<->AST conversion can be tested the same way. I think this would greatly benefit our system, making ICU4X not just a reference runtime implementation but also creating space for ICU4X to serve as a backbone for any i18n data tooling. How does it sound to the group?
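One hedged sketch of how the two AST shapes could differ, with made-up field names (the real plural-rule syntax is richer than a single condition string):

```rust
// Illustrative only: a spec-shaped "reference" rule versus a trimmed "runtime" rule.
pub mod reference {
    /// Keeps everything tooling may care about, including sample data
    /// (@integer / @decimal) that the resolver never looks at.
    pub struct Rule {
        pub condition: String,
        pub integer_samples: Vec<String>,
        pub decimal_samples: Vec<String>,
    }
}

pub mod runtime {
    /// Keeps only what the resolver needs, in a form it can evaluate quickly.
    pub struct Rule {
        pub condition: String, // in practice a pre-lowered / packed encoding
    }
}
```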
-
Okay, I see. Here is how I would architect this.
In other words, the "reference" data structure becomes a Builder for the "runtime" data structure. I do not see value in allowing PluralRules to operate on the "reference" data structure directly. Clients who require the additional functionality can just convert it to "runtime" before handing it to PluralRules. Even though the conversion step requires some overhead, the end result may be an overall performance win, since the PluralRules select function will be at a performance maximum.
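A short sketch of that builder relationship, using hypothetical types: the rich reference structure is convertible into the lean runtime one, and only the runtime form is what a formatter like PluralRules would accept.

```rust
// Illustrative sketch of "reference acts as a Builder for runtime"; names are made up.
pub struct ReferenceRule {
    pub condition: String,
    pub samples: Vec<String>, // tooling-only data, never consulted at runtime
}

pub struct RuntimeRule {
    pub condition: String, // in practice this would be a packed/pre-lowered encoding
}

impl From<ReferenceRule> for RuntimeRule {
    fn from(rule: ReferenceRule) -> Self {
        // Drop tooling-only data; any pre-lowering/packing would happen here.
        RuntimeRule { condition: rule.condition }
    }
}

fn main() {
    let parsed = ReferenceRule {
        condition: "i = 1 and v = 0".to_string(),
        samples: vec!["@integer 1".to_string()],
    };
    // Clients with tooling needs convert once, then hand the lean form onward.
    let runtime: RuntimeRule = parsed.into();
    assert_eq!(runtime.condition, "i = 1 and v = 0");
}
```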
-
Dumping a thought from the ICU4X call today: there is an option to have a single parser producing two ASTs depending on a flag. My concern here is that the needs of the two ASTs may slowly diverge, and if we have it as a flag it may incentivize working around that by keeping a parser that is suboptimal for both needs as a compromise. A clean "two parsers" approach avoids that; if instead we have just one parser, runtime users need to "loop around" by parsing into the reference AST and then converting it into the runtime one. And if we have enough reason to have a dedicated runtime parser, that reason should justify writing it.
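In code terms the contrast is roughly the following, assuming hypothetical parse functions:

```rust
// Hypothetical shapes, only to contrast the two approaches; not real ICU4X APIs.
struct ReferenceAst(String);
struct RuntimeAst(String);

/// Full, spec-shaped parser (the one tooling needs anyway).
fn parse_reference(src: &str) -> ReferenceAst {
    ReferenceAst(src.to_string())
}

/// Lowering step from the reference AST into the runtime one.
fn lower(ast: ReferenceAst) -> RuntimeAst {
    RuntimeAst(ast.0)
}

/// A second, dedicated parser that produces the runtime AST directly.
fn parse_runtime(src: &str) -> RuntimeAst {
    RuntimeAst(src.to_string())
}

fn main() {
    let src = "i = 1 and v = 0";
    // Single-parser world: "loop around" through the reference AST.
    let looped = lower(parse_reference(src));
    // Two-parser world: go straight to the runtime AST.
    let direct = parse_runtime(src);
    assert_eq!(looped.0, direct.0);
}
```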
-
In my recent work in #615 and #519 I'm adapting parsed structures to the zero-copy model that is optimal for the needs of the Data Provider, and a model which gives us great performance and the lowest memory overhead.
That model comes with a tradeoff: we're moving away from the simple, canonical representation of the data structure as defined by the specification.
The value of a canonical representation, as I see it, is twofold:
I see the AST, in the Parser/AST/Resolver+Serializer pipeline, as a natural tension point between two target audiences of the AST.
That tension is not unique to us: every known PL parser/runtime struggles with it, and everywhere I go I see some compromise between the needs of those two groups, often shading in one or the other direction.
I see three paths forward from here:
A) Focus only on Runtime
We could forgo the tooling needs as a focus point, create hyper-optimized ZeroVec backed structures that squeeze bytes and bits everywhere possible based on reasonable judgement (example: "The modulo can probably be stored in a single byte, since all instances are 1 to a power of 10"), and let tooling authors handle our data model, or write their own tools.
B) Focus on Tooling
We could stick to the "canonical" representation and sacrifice some of the performance gains we'd get from zero copy.
C) Handle both
We can also take the high road and build support for two ASTs: one consumable by the Resolver, the other by the Serializer, with From/Into between them.
D) Another way?
My hunch is to go for (C), with the assumption that we're responsible for building a full foundational library, that both runtime and tooling ecosystem needs are best served by a unified ICU4X, and that DCE/features can ensure no one pays for what they don't use.
The cost is added maintenance overhead for the library.
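One hedged way to get the "don't pay for what you don't use" property would be an opt-in Cargo feature around the tooling half; the feature name below is invented for illustration:

```rust
// Sketch of feature-gating the tooling surface; "tooling" is a hypothetical feature name.

/// Always compiled: the lean runtime pieces every consumer needs.
pub mod runtime {
    pub mod ast {}
    pub mod resolver {}
}

/// Compiled only when a client opts in, e.g.
/// icu_plurals = { version = "...", features = ["tooling"] }
#[cfg(feature = "tooling")]
pub mod tooling {
    pub mod ast {}
    pub mod parser {}
    pub mod serializer {}
}
```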
Thoughts?
P.S. I suspect that these challenges are not unique to DateTimeFormat and PluralRules, and we should assume that the conclusion of this conversation will carry over to more cases.