TF-Lite GPU benchmark results? #91
As far as I know, TFLite only provides GPU acceleration via the Android NN API, which is available from Android 8.1. Unfortunately, the latest phones we have only support Android 8.0. If someone has a newer phone, we can provide instructions on how to benchmark TFLite there (specifically, the MobileNets we are contributing to MLPerf).
Thanks for the clarification. For example, selecting ARM Mali-T830 in the GPU dropdown shows me benchmarks that are all on CPU and OpenCL (as far as I can see). Also, I have found this AI benchmark for Android smartphones:
Hi @mrgloom. If I am correct, we had time to add two scenarios with GPU: Caffe (OpenCL version) and ArmCL: https://github.com/ctuning/ck-crowd-scenarios/tree/master/experiment.scenario.mobile . Note that our OpenCL versions work exclusively on the GPU (I believe that we force it in the scenarios; @psyhtest, can you please confirm?), so if you see OpenCL, you can assume that this scenario ran on the GPU. I also guess that there is just a lack of data if you don't see many GPU points: this Android app was run by volunteers, but we are not advertising it much now. It was a proof-of-concept project, and we are now trying to build a more user-friendly way of adding scenarios on top of our low-level CK plugins.

However, maybe you can try to run it on your newer mobile and see if these GPU scenarios (Caffe OpenCL and ArmCL) are still working. You can get the Android app here: http://cknowledge.org/android-apps.html . Please tell us whether it works or not; I will be curious to see the results. Thank you very much for your feedback!
I have successfully run the app on a smartphone with Android 8.0.0. Here is the list with comments:
In my benchmarks, TFLite CPU is faster than ArmCL (for MobileNets v1 0.25 128), and Caffe CPU is faster than Caffe OpenCL (for SqueezeNet 1.1):
Now you can! Please take a look at our brand new dashboard functionality for the MobileNets implementations (which we are contributing to MLPerf Inference): http://cknowledge.org/dashboard

The default workflow "MobileNets (highlights)" currently shows MobileNets v1/v2 with TFLite 0.1.7 on Firefly RK3399 and Linaro HiKey960, as well as the best points for MobileNets v1 with Arm Compute Library v18.08 on HiKey960 (which can serve as a vendor submission example). By default, the

The workflow "MobileNets (all)" (select it from the dropdown menu) includes all ArmCL points, exploring the available options for the convolution method, data layout and kernel tuner choices. You can discern these options on the plot thanks to the

Have fun! ... and please let us know if you have any questions or suggestions.
The model itself is only ~2 MB, but we bundle the engine (i.e. the library and the client program) together with it. I suspect we include a debug build, as we had issues on Android:
The good news is that the same engine is reused across all ArmCL OpenCL MobileNets samples. This means that if you add any other such sample model, you will only need to download a few MB of extra weights. /cc @Chunosov
That's expected for very small models. There's simply not enough work to keep the GPU busy, and CPU caching works well. However, if you look at the MobileNets highlights, most GPU points (with dots) lie on the Pareto-optimal frontier: for any such point, to improve speed (move left), you need to lose accuracy (move down); similarly, to improve accuracy (move up), you need to lose speed (move right).
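To make the frontier idea concrete, here is a minimal sketch (not part of CK or the dashboard; the sample points are made up, not real measurements) that extracts the Pareto-optimal subset of (latency, accuracy) points as described above:

```python
def pareto_frontier(points):
    """Return the Pareto-optimal subset of (latency_ms, accuracy) points.

    A point is on the frontier if no other point is both at least as
    fast (lower or equal latency) and at least as accurate, while
    being a different point.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)  # sorted by latency, ascending

# Hypothetical (latency_ms, top-1 accuracy) points for illustration only:
points = [(20.0, 0.70), (35.0, 0.72), (30.0, 0.68), (50.0, 0.75)]
print(pareto_frontier(points))
# → [(20.0, 0.7), (35.0, 0.72), (50.0, 0.75)]
```

Here (30.0, 0.68) is dropped because (20.0, 0.70) is both faster and more accurate; every remaining point trades speed for accuracy exactly as the comment describes.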
Seems like

Also, it seems Google has benchmark results for a single phone (Pixel 1) for MobileNet variants and ShuffleNet.
While HiKey960 is a development board, it has the same chip (HiSilicon Kirin 960) that Huawei used in several of their popular phones (including the Mate 9 Pro and P10). I have results from a real Mate 10 Pro too.

The graph in that repo is from the original MobileNets v2 paper, but it's very crude: you can only guess which model is shown and estimate its performance (e.g. ±1 ms) and accuracy (e.g. ±1%). Besides, it's very hard to reproduce: it's taken us several weeks to understand how to load the weights, how to preprocess the inputs and how to interpret the outputs. But now anyone can run experiments across many platforms, under different conditions, try different datasets and so on. You would be very welcome to contribute your experimental data to the dashboard.
I've added TFLite results on Huawei Mate 10 Pro (HiSilicon Kirin 970) and Samsung Galaxy S8 US (Qualcomm Snapdragon 835). You may want to filter the results by
Note, however, that the Linux devices (HiKey960 and RK3399) had the CPU frequencies set to the maximum, while the Android devices (Mate 10 Pro and Galaxy S8 US) were non-rooted, so the CPU frequencies were managed automatically. |
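For anyone wanting to check how frequencies are being managed on their own Linux board, one common approach (not what CK does internally; just a sketch) is to read the cpufreq scaling governor from sysfs. The helper below returns an empty dict on systems or containers that do not expose cpufreq:

```python
import glob
import os

def read_cpu_governors():
    """Best-effort read of the cpufreq scaling governor for each CPU.

    Returns a dict like {'cpu0': 'performance', 'cpu1': 'schedutil', ...}.
    Returns {} where sysfs does not expose cpufreq (e.g. many containers).
    """
    governors = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(glob.glob(pattern)):
        # path looks like .../cpu0/cpufreq/scaling_governor
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as f:
                governors[cpu] = f.read().strip()
        except OSError:
            pass  # governor not readable for this CPU; skip it
    return governors

print(read_cpu_governors())
```

On the Linux boards above, "performance" pins the frequency to the maximum, while governors like "ondemand" or "schedutil" manage it automatically, which is one source of run-to-run variation.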
Looks good, but it would be great if anyone could share a link to the current 'view' of the dashboard. Also, is peak memory usage stored somewhere in the benchmark logs? Are

Update:
Thanks for your feedback! Yes, supporting links with settings is on our roadmap.
Not at the moment. Storing would be easy, but we need to know how to measure this reliably. Do you have any suggestions?
Of course, the links are provided in the MobileNets-v1 and MobileNets-v2
As I explained above, however, you then need to perform many manual steps (which CK does behind the scenes). Also note that the TFLite Model Benchmarking Tool uses random data, so it cannot be used to measure accuracy.
Also a question: are the TFLite models benchmarked in single-threaded mode?
In the default mode, which happens to be multithreaded. By the way, I think part of the variation in the results is due to thread migration between big and LITTLE cores. We are planning to set up thread affinity to reduce the variation.
What is
Sounds about right. Most high-end mobile chips have 4 big cores, so if the 4 threads get allocated to those, you should get good enough performance. As I mentioned, tuning the number of threads and how they are pinned to cores (thread affinity) is something we want to do in the future.
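For reference, pinning on Linux can be done at the process level with `os.sched_setaffinity`, which also constrains all threads the process spawns afterwards. A minimal sketch (which cores are "big" is platform-specific, so the single-core choice below is only a placeholder, not a recommendation):

```python
import os

def pin_to_cores(cores):
    """Pin the current process (and its future threads) to the given cores.

    Linux-only. On a big.LITTLE chip you would pass the indices of the
    big cores; core numbering varies by SoC, so check your platform.
    """
    os.sched_setaffinity(0, set(cores))       # 0 means "this process"
    return os.sched_getaffinity(0)            # verify what took effect

original = os.sched_getaffinity(0)
one_core = min(original)       # placeholder: any single allowed core
pinned = pin_to_cores({one_core})
os.sched_setaffinity(0, original)  # restore the original affinity mask
print(pinned)
```

With affinity fixed to the big cluster, thread migration between big and LITTLE cores is eliminated, which should reduce the run-to-run variation mentioned above.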
Are any TF-Lite GPU benchmark results for mobile phones available?