-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizing the size of WidgetState #706
Comments
Do you know how values are aligned in Rust? I looked it up and it's platform specific, but alignment could reduce any benefit from switching from f64 to f32 or f16. |
In general aggregates are aligned with their most-aligned field. So a (Also, just because a struct is N-aligned doesn't mean its fields are N-aligned. So for instance WidgetState is 8-aligned, but individual bool flags of WidgetState are 1-aligned.) |
Also Rust is able to do some optimizations, so for That said, I'd say benching and profiling first, before attempting potentially premature-optimizations (while I'm currently doing pretty much that in a different context^^). My guess is, that we have greater potential for optimization elsewhere for a long time. |
Right now WidgetState is a pretty chunky struct. VSCode with Rust Analyzer currently tells us:
Yeah. Three and a half widgets are enough to fill a kilobyte. Your usual L1 cache is about 64KB (from a quick Wikipedia check), which is 216 WidgetStates.
That's probably pretty bad. A lot of realistic applications have more than two hundred widgets, and we don't want them to overflow the L1 cache. (For reference, this github issue after two answers has 2467 html elements.)
What can we do to reduce the size? Some ideas:
Bitflags
There are 26 boolean flags.
They take 26 bytes, which we could get down to 4 bytes with bitflags.
However, bitflags are more annoying to use that booleans, and switching to bitflags would only be enough for a <10% size reduction.
Replace f64 values with f32s
We use a lot of f64 types:
size
: 2.origin
: 2window_origin
: 2paint_insets
: 4local_paint_rect
: 4baseline_offset
: 1ime_area
: 4clip_path
: 4translation
: 2That's a total of 25 f64 values, which is 200 bytes, two thirds of our size.
Realistically, virtually all of them could be replaced with f32s with almost no code change, saving us about 100 bytes.
Most of them deal with local coordinates where we expect that most of the range of f64s will be wasted. A widget is never gonna have billions of pixels of paint insets, for instance.
If we ever decide to switch to fractional coordinates, we could store a lot of them as
u16
, which would be yet more size savings.Optional fields
The fields
ime_area
andclip_path
are optional. Moreover, fields likepaint_insets
,baseline_offset
andtranslation
aren't optional, but their value is almost always zero.(
cursor
too, but the type itself is very lightweight.)We could store these fields in an
Option<Box<...>>
, and save memory for 90% of widgets.Struct-of-array
The most radical change would be to store WidgetStates in a struct-of-arrays pattern.
The problem with WidgetState right now is that a single instance occupies multiple cache lines. That means that accessing two widget states is always going to require touching multiple cache lines.
If we could instead store WidgetStates by field, the cruft would be a lot less of a problem: code iterating over flags would keep the flag array in L1 and wouldn't care that some massive amount of layout bytes is being kept in main memory somewhere.
Switching to struct-of-arrays would be a very ambitious change, probably requiring some radical improvements to the architecture of TreeArena. It would probably impact every single line of code currently using WidgetState.
Benchmark?
Before we do any of that, we probably want to check that Masonry's memory usage is actually bad, is actually bottlenecked on WidgetState, and that L1 misses have an actual impact on our performance. I'm not sure how we'd check that, suggestions are welcome.
I don't expect us to address these problems overnight. We should expect this to be a long-term issue.
The text was updated successfully, but these errors were encountered: