CPU Performance Regression? (Older version much faster) #2099
Thank you for the report. Can you tell us your current OS and compiler? Could you also try a build without AVX-512? That could make your new run a bit more comparable to the old one.
It is Ubuntu 22.04.4, all running on the same machine in different folders, freshly compiled. Without AVX512 it is a bit better indeed, but still not the same; somewhere in the middle:

total time = 5086.27 ms
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
whisper_print_timings: load time = 56.47 ms
You may also want to try with a different beam-size.
It looks like the former default was beam-size = -1? bench doesn't support beam-size, so I am trying a real wav file instead, and it does improve the speed.

default branch (new, with AVX512):
AVX512=1, beam-size 5 (default): total time = 20678.01 ms
AVX512=1, beam-size 2: total time = 17052.18 ms
AVX512=1, beam-size -1: total time = 15465.98 ms
AVX512=0, beam-size 5 (default): total time = 19365.01 ms
AVX512=0, beam-size 2: total time = 15219.21 ms
AVX512=0, beam-size -1: total time = 13869.20 ms

Old version:
AVX512=0, beam-size 5: total time = 21862.52 ms
AVX512=0, beam-size 2: total time = 14704.33 ms
AVX512=0, beam-size -1 (default): total time = 12398.81 ms

Interesting results, especially the AVX issue. We'll play around with it a bit. Thanks for your help!

(Note: the beam-search default seems to have changed from -1 to 2 to 5 over time: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L4625 )
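To make the spread in the runs above easier to read, here is a throwaway analysis script (not part of whisper.cpp; the times are copied verbatim from the benchmark results above) that compares the new and old builds at identical settings (AVX512 disabled):

```python
# Total times (ms) copied from the benchmark runs above.
new_branch = {  # current default branch, AVX512 disabled
    5: 19365.01,
    2: 15219.21,
    -1: 13869.20,
}
old_version = {  # old build, AVX512 disabled
    5: 21862.52,
    2: 14704.33,
    -1: 12398.81,
}

# Ratio > 1.0 means the new build is slower at that beam size.
for beam in (5, 2, -1):
    ratio = new_branch[beam] / old_version[beam]
    print(f"beam-size {beam:2d}: new/old = {ratio:.2f}x")
```

Read this way, the new build is actually faster at beam-size 5 (~0.89x) and the regression only shows up at beam-size 2 and at greedy decoding (-1, ~1.12x), which matches the observation that the effective default changed rather than everything getting uniformly slower.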
I was referring to changes in the decoding defaults, so you may want to try the previous settings. For AVX-512 vs Ryzen, let me mention that Ubuntu 22.04 ships a relatively old compiler; results from a more recent one could be different.
We are experiencing similar behavior when comparing version 1.4.3 with the latest 1.5.5, and we are not using beam search (we use the greedy sampling strategy).
Could you do a git bisect to find the offending commit?
Ok, so my git bisect points to a specific commit. Any idea how this commit could influence the performance in such a bad way? BTW: the performance drop on our side is about ~40%.
I dropped this commit to test whether it would fix the performance problem, but it didn't. Is there anything else I can try based on this information?
@ggerganov, do you have any ideas what else @Linux13524 could try or tweak to restore whisper.cpp's 1.4.x performance in 1.5.x?
Hm, not sure. @Linux13524 Is this CPU-only, or using the CUDA / Metal backend?
We first noticed it while testing the new CUDA performance, but the git bisects above were done CPU-only.
Just a side note and follow-up to my earlier comment.
Any update on this? It seems we still have these performance issues in the latest version.
I compared an older version from Nov '23 with one from Apr '24, and the older version is much faster.
total time = 6225.76 ms
vs
total time = 3817.54 ms
Same CPU, same compiler and settings, same test:
CPU: AMD Ryzen 9 7950X3D 16-Core
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
whisper_print_timings: load time = 64.61 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 878.59 ms / 1 runs ( 878.59 ms per run)
whisper_print_timings: decode time = 935.20 ms / 256 runs ( 3.65 ms per run)
whisper_print_timings: batchd time = 544.69 ms / 320 runs ( 1.70 ms per run)
whisper_print_timings: prompt time = 3865.51 ms / 4096 runs ( 0.94 ms per run)
whisper_print_timings: total time = 6225.76 ms
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: model ctx = 140.66 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
whisper_init_state: compute buffer (conv) = 18.50 MB
whisper_init_state: compute buffer (encode) = 81.95 MB
whisper_init_state: compute buffer (cross) = 4.49 MB
whisper_init_state: compute buffer (decode) = 24.70 MB
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0
whisper_print_timings: load time = 83.24 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 693.48 ms / 1 runs ( 693.48 ms per run)
whisper_print_timings: decode time = 874.80 ms / 256 runs ( 3.42 ms per run)
whisper_print_timings: prompt time = 2249.08 ms / 16 runs ( 140.57 ms per run)
whisper_print_timings: total time = 3817.54 ms
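To see where the time goes, the per-stage timings from the two logs above can be compared directly (a throwaway script, not part of whisper.cpp; all numbers are copied from the logs, and note the new build reports an extra "batchd" stage that the old build does not have):

```python
# Stage timings (ms) copied from the two logs above:
# "new" is the Apr '24 build, "old" the Nov '23 build.
new = {"encode": 878.59, "decode": 935.20, "prompt": 3865.51}  # + batchd 544.69
old = {"encode": 693.48, "decode": 874.80, "prompt": 2249.08}

new_total, old_total = 6225.76, 3817.54
print(f"overall: {new_total / old_total:.2f}x slower")  # ~1.63x

# Per-stage slowdown for the stages both builds report.
for stage in old:
    print(f"{stage}: {new[stage] / old[stage]:.2f}x")
```

The breakdown suggests the regression is not uniform: decode is nearly unchanged (~1.07x), encode is moderately slower (~1.27x), while prompt processing is ~1.72x slower and dominates the total, which is consistent with the decoding-defaults and batching changes discussed earlier in the thread.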
See #89 (comment)