Auto e2e benchmarker. #372
base: main
Conversation
I like the idea of this, but I'm concerned it took over 35 min for Test Turbine Models. Maybe this belongs more in a nightly run than in every patch?
Ah, thanks for the suggestion, Ian. I think a good portion of the time is spent compiling the Stateless Llama. Let me try to make it reuse the vmfb when possible. If that doesn't work, I can move it into a nightly action. :)
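For context, a minimal sketch of the vmfb-reuse idea, assuming the export step can produce the compiled bytes; `get_vmfb`, `export_fn`, and `VMFB_PATH` are hypothetical names, not the exact API in this PR:

```python
import os

VMFB_PATH = "stateless_llama.vmfb"  # assumed cache location on the CI runner

def get_vmfb(export_fn, **export_kwargs):
    """Return compiled vmfb bytes, compiling only when no cached copy exists."""
    if os.path.exists(VMFB_PATH):
        # Reuse the cached artifact and skip the expensive compile step.
        with open(VMFB_PATH, "rb") as f:
            return f.read()
    vmfb = export_fn(compile_to="vmfb", **export_kwargs)
    with open(VMFB_PATH, "wb") as f:
        f.write(vmfb)
    return vmfb
```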
@saienduri, before you left your internship you were working on a benchmark following Ben's fancy double-vmfb approach. What happened to that?
Hey Dan, I think it's still there, but it uses benchmark-module, which is good for microbenchmarking, as opposed to this one, which tests perf on an actual workload plus end-to-end Python.
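To illustrate the distinction, a rough sketch of the end-to-end measurement this PR targets, timing the full Python path (prefill then token-by-token decode) rather than a single dispatch; `model.prefill` and `model.decode` are hypothetical stand-ins for the real runner, while the result keys mirror the ones asserted in the test below:

```python
import time

def benchmark_e2e(model, prompt_ids, num_decode_tokens=25):
    start = time.perf_counter()
    state = model.prefill(prompt_ids)        # full-prompt pass
    prefill_time = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(num_decode_tokens):       # autoregressive decode loop
        state = model.decode(state)
    decode_time = time.perf_counter() - start

    return {
        "decoded_tokens": num_decode_tokens,
        "num_iterations": 1,
        "prefill_speed(tok/s)": len(prompt_ids) / prefill_time,
        "decode_speed(tok/s)": num_decode_tokens / decode_time,
    }
```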
@IanNod I brought it down to 23 minutes. I think before this test it was ~18 minutes. What do you think?
Huh, it used to be ~10 mins. I wonder what brought it up to almost double that. I still feel this belongs more in a nightly run, but I'm fine with it for now, as we have a lot of ramping up on CI work to do.
hf_auth_token=None,
compile_to="vmfb",
external_weights="safetensors",
# external_weight_file="Llama-2-7b-chat-hf-function-calling-v2_f16_int4.safetensors",  # Do not export weights because this doesn't get quantized
Nit: remove commented code
assert benchmark_result[1]["decoded_tokens"] == 25
assert benchmark_result[1]["num_iterations"] == 1
assert benchmark_result[1]["decode_speed(tok/s)"] > 0
assert benchmark_result[1]["prefill_speed(tok/s)"] > 0
Doesn't really test for regressions, just that it ran, right?
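One way to turn these into actual regression checks, sketched here against a stored baseline; the baseline file name and tolerance are assumptions, not part of this PR:

```python
import json

REGRESSION_TOLERANCE = 0.9  # fail if speed drops below 90% of baseline

def check_regression(result, baseline_path="benchmark_baseline.json"):
    with open(baseline_path) as f:
        baseline = json.load(f)
    for key in ("decode_speed(tok/s)", "prefill_speed(tok/s)"):
        floor = REGRESSION_TOLERANCE * baseline[key]
        assert result[key] >= floor, (
            f"{key} regressed: {result[key]:.2f} < {floor:.2f}"
        )
```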
Modifications to SharkLLM plus an implementation of a benchmarking script to track the performance of SHARK-2.0 LLM models. Here is a sample output from the benchmarking script: https://gist.github.com/raikonenfnu/4120ddfdcb2964608c89d31079594d05
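For reference, the shape of one benchmark record as implied by the keys the test asserts on; the numeric values here are placeholders, not figures from the linked gist:

```python
sample_record = {
    "decoded_tokens": 25,          # tokens generated in the decode loop
    "num_iterations": 1,           # benchmark repetitions
    "prefill_speed(tok/s)": 42.0,  # placeholder value
    "decode_speed(tok/s)": 18.5,   # placeholder value
}
```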