Replies: 20 comments
-
Hey @tido64, that sounds like a great idea! I'll be working more closely on the topic of benchmarking & tooling from April, but until then I think that starting this conversation is 🔥 I think we should try to come up with at least a baseline of what is "acceptable" (e.g. 200 ms for a full round trip from click to visual change), maybe on a range of different devices (e.g. Android+old, Android+new, iOS+old, iOS+new). The one benchmark I can see missing here is an "app bundling" benchmark, which could surface suboptimal native-side settings of the mobile app project (or improvements to them); at least, this was an area where I managed to find a few easy wins the last time I did some performance improvements for a production app I was working on -> https://gist.github.com/kelset/2eb61161a68d1ab35337d1eaa9a05e78
-
Totally forgot to mention, but @Kudo did a lot of work in the benchmarking space -> https://github.com/Kudo/react-native-js-benchmark
-
@kelset That's great to hear. Do you have some sort of plan for what you want to do? Regarding the baseline, I like the idea of having goals. I was thinking more of having a diff against `master`.

@Kudo How much work do you think it will take to bring this to iOS as well?
-
I'm interested in this problem space and can help out 👍
-
So the plan is still a bit hand-wavy, but I'll be working 6 weeks full time on an open source book about react-native performance (both how to measure it, finding the baselines, bottlenecks, etc., and how to act on those findings). So for your proposal, I think I'll be able to help you out with the baseline & benchmarking sides (at least). But I'll only start in April AFAIK.

Just so I understand better: by "diff" do you mean a report of the delta between commits? Like, "this commit has made the TTI 30 ms slower"?

By app bundling I mean the process that both iOS and Android need to perform to generate the .ipa/.apk. I think that by keeping an eye on this too we could make sure that whenever native-side configurations are touched, they don't lead to less optimal build processes (e.g. say a commit changes the compiler optimization level; by checking the build time we could detect whether it's actually improving the overall time or making it slower). Another side effect is that it would help detect easy wins in the native-side project configuration: say we never realised that we could use Android's APK split feature (can't recall its precise name at the moment, the one that only bundles for the specific architecture; I may be misremembering completely). By having a baseline (e.g. a fresh new Android app takes 2 minutes to generate the APK) we could detect that our config, because it was first created, say, 3 years ago, is 10x slower because we never updated that particular setting to take advantage of newer native-side tooling. A further side effect is that the app would then be built more efficiently for the device it runs on, leading to better "in-app" perf.
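For reference, the feature hinted at above is Android's ABI splits (newer tooling offers the same benefit via Android App Bundles). A rough `android/app/build.gradle` fragment; the exact settings depend on your Android Gradle plugin version:

```groovy
android {
    splits {
        abi {
            enable true            // build one APK per CPU architecture
            reset()
            include "armeabi-v7a", "arm64-v8a", "x86", "x86_64"
            universalApk false     // skip the fat APK that bundles every ABI
        }
    }
}
```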
-
Correct. The baseline should be the branch you're merging against, most commonly `master`.

I'm not sure if this makes sense. If we misconfigured React Native to be built with
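A minimal sketch of the kind of baseline diff described here, assuming benchmark results are simple name-to-milliseconds mappings (the benchmark names and the 5% threshold are made up for illustration):

```python
def diff_report(baseline, current, threshold_pct=5.0):
    """Compare benchmark results (name -> milliseconds) against a baseline.

    Returns a list of (name, baseline_ms, current_ms, delta_pct, flag) rows,
    where flag is 'regression', 'improvement', or 'ok'.
    """
    rows = []
    for name, base_ms in baseline.items():
        cur_ms = current.get(name)
        if cur_ms is None:
            continue  # benchmark removed or renamed; skip rather than guess
        delta_pct = (cur_ms - base_ms) / base_ms * 100.0
        if delta_pct >= threshold_pct:
            flag = "regression"
        elif delta_pct <= -threshold_pct:
            flag = "improvement"
        else:
            flag = "ok"
        rows.append((name, base_ms, cur_ms, delta_pct, flag))
    return rows

# Example: "this commit has made the TTI 30 ms slower"
rows = diff_report({"TTI": 300.0}, {"TTI": 330.0})  # flags TTI as a regression (+10%)
```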
-
I see, makes sense! OK then, let's leave app bundling out of this... there's plenty already anyway ;)
-
I've added a table of tasks and estimates. Let me know if I missed anything. Also feel free to assign yourself to any of them 😉
-
Thanks for adding the table! The measurement we get asked for a lot internally is what the "working set" of React Native is. All the people who ask for it are ex-Microsoft :-). We aren't fully sure how we're supposed to measure it, and there's no one better than you to help us get that set up! :-) I'd like to make sure that's part of this, as it will help us get long-awaited answers for the VPs asking for it.
-
BTW, we have a couple of synthetic performance tests in the new Fabric infra, and we (I) plan to add a couple more covering the most performance-critical aspects (hot paths) of the UI part of the framework (prop parsing, diffing, layout, node cloning, and so on). We use Google Benchmark for that. The challenging (unexplored) part is running them continuously on CI, and running them on ARM CPUs to get actual results for the architectures we care about (x86 is usually fast enough).
-
@shergin That's awesome! How much work would it be to get them pushed to the open source repo? I think it's a good starting point regardless. We can figure out how to get non-x86 results after we have them running continuously. I've added a task for it.

@TheSavior Updated the table. I think we have a team internally that has solved this. I can check with them.
-
react-native-js-benchmark is currently focused on JS engine comparison, but I think it could be changed to support CI regression monitoring. For iOS support, the most difficult part is running on real devices, including the developer certificates and code signing.
-
From the MS side, we're going to look into funding CI using AppCenter, which has all the real devices we would need for iOS and Android.
-
Please let me check whether my understanding is correct. I did similar work that used the RN CLI to generate a clean project and then patched it to test my V8 integration.
-
@Kudo, I think it makes sense to just report the numbers in the first iteration. We can add diffing in the next iteration. @sweggersen had some ideas about how to do this.
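Reporting just the raw numbers in a first iteration could look like this rough sketch (the artifact names and values are made up):

```python
def format_report(measurements):
    """Render benchmark numbers (name -> value-with-unit string) as a
    Markdown table suitable for posting as a PR comment."""
    lines = ["| Benchmark | Value |", "| --- | --- |"]
    for name, value in measurements.items():
        lines.append(f"| {name} | {value} |")
    return "\n".join(lines)

comment = format_report({"RNTester.apk size": "24.3 MB", "TTI": "312 ms"})
```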
-
Got it, I'll start sketching out some work accordingly.
-
This is a really good idea! Looks like the nightly builds are already taken, but if you (@sweggersen) need any help from the CI/releases perspective, please let me know!

My main question is how we're planning on broadcasting regressions that happen on master. PRs are one thing, but I think the additional benefit is being able to track and double-check the diffs that are synced from Facebook every once in a while. I'm not sure there's a mechanism for that in place yet, but being able to notify early about CI failures (e.g. "this commit has broken CI on master" or "this commit has caused certain regressions") could prevent situations where we're stuck with a red CI for a long time and breakages just keep adding up.
-
What's the next step for the bot that is posting about the app sizes? I'm excited about the future plans, but as-is it's a bit... loud. Is it possible to get the bot to delete all of its previous comments in a PR when it posts? Check out this PR for a particular example where, since there are lots of changes, most of the comments are from the bot: facebook/react-native#27908. If it's going to be a while before the bot can start commenting with deltas, which are a bit more actionable, does it make sense to temporarily disable it? I'm not sure the absolute numbers being shared on PRs are that helpful currently.
-
@TheSavior: We've just merged some improvements to the bot. It should now update its posts instead. Let me know if it's still too loud, and we'll figure something out.
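One common way to make a bot update its post in place is to tag the comment with a hidden HTML marker and upsert against it. A sketch of just the selection logic, with the actual GitHub API calls left as comments (the marker string is an assumption, not what the real bot uses):

```python
MARKER = "<!-- size-bot -->"  # hypothetical hidden marker identifying the bot's comment

def upsert_action(existing_comments, new_body):
    """Decide whether to create a new comment or update the bot's old one.

    existing_comments: list of {"id": int, "body": str} dicts, as returned by
    GET /repos/{owner}/{repo}/issues/{number}/comments.
    Returns ("update", comment_id, body) or ("create", None, body).
    """
    body = f"{MARKER}\n{new_body}"
    for c in existing_comments:
        if MARKER in c["body"]:
            # PATCH /repos/{owner}/{repo}/issues/comments/{id} with this body
            return ("update", c["id"], body)
    # POST /repos/{owner}/{repo}/issues/{number}/comments with this body
    return ("create", None, body)
```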
-
I was busy for the past few weeks, sorry for not meeting my original plan.

Thanks to @grabbou for the awesome plugin support in the CLI.
-
Introduction

We are currently blind to any regressions that don't directly translate to pixels on the screen. Performance regressions, in particular, aren't always straightforward to measure, as different platforms and devices may yield wildly different results depending on how you do the measurements.

The Core of It

A suite of benchmarks and tools should be run on all PRs, and on every commit to `master`, to catch regressions (or improvements) compared to a baseline.

What we should measure:

… a `react-native init` app or RNTester.

How do we report this?

On every merge (to `master`) there should be a benchmark run to keep track of performance over time. The report should be publicly available, similar to the Firefox Performance Dashboard.

Suggested Timeline

Discussion points

cc @sweggersen, @danrydholm
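The PR-plus-master trigger described above could be wired up roughly as follows. This is only an illustration using GitHub Actions syntax; the `benchmark` scripts are hypothetical, and the actual setup might equally live in CircleCI or another CI system:

```yaml
name: benchmarks
on:
  pull_request:
  push:
    branches: [master]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: yarn install --frozen-lockfile
      - run: yarn benchmark --output results.json   # hypothetical benchmark runner
      - run: yarn benchmark:compare results.json    # hypothetical diff vs. baseline
```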