
[Question] In the output of the attention_with_fused_qkv function, some slices have abnormal accuracy #3093

Open · ifndefendif opened this issue on Jan 17, 2025 · 6 comments
Labels: question (Question about the usage)

Comments

ifndefendif commented Jan 17, 2025

❓ General Questions

Hello, I encountered an issue while deploying with mlc_llm in C++.
The model is Qwen2.5-0.5B.
The kv_cache is created using "create_tir_paged_kv_cache".
When performing prefill, I found that the result of "paged_kv_cache.attention_with_fused_qkv" did not meet expectations (the call pattern is sketched at the end of this comment).
The qkv input here is normal and the output has shape [b, s, hq * d], but the slice [b, s * 0.78 :, hq * d] is abnormal (prefill tests with different token lengths all follow this rule); the trailing part of the result shows a significant accuracy error.
What could be the reason?
Thanks~
(Screenshot of the output values attached.)
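For reference, this is roughly how the attention call is wired up in my model definition, following the usual MLC-LLM fused-QKV pattern. This is a paraphrased sketch rather than the exact library code; module names such as c_attn and o_proj and the exact attention_with_fused_qkv signature are assumptions and may differ across versions.

```python
# Paraphrased sketch of a fused-QKV attention block using PagedKVCache
# (names and the exact call signature are assumptions, not verbatim code).
from tvm.relax.frontend import nn
from tvm.relax.frontend.nn import op


class FusedQKVAttention(nn.Module):
    def __init__(self, hidden_size: int, num_q_heads: int, num_kv_heads: int, head_dim: int):
        self.num_q_heads = num_q_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = head_dim
        # Single projection producing Q, K and V concatenated along the head axis.
        self.c_attn = nn.Linear(hidden_size, (num_q_heads + 2 * num_kv_heads) * head_dim, bias=True)
        self.o_proj = nn.Linear(num_q_heads * head_dim, hidden_size, bias=False)

    def forward(self, hidden_states: nn.Tensor, paged_kv_cache, layer_id: int):
        b, s, _ = hidden_states.shape
        h_q, h_kv, d = self.num_q_heads, self.num_kv_heads, self.head_dim
        # Fused QKV projection, reshaped to [b, s, h_q + 2 * h_kv, d].
        qkv = op.reshape(self.c_attn(hidden_states), (b, s, h_q + 2 * h_kv, d))
        # Attention over the paged KV cache; depending on the version a
        # scaling-factor argument may also be required here.
        attn = paged_kv_cache.attention_with_fused_qkv(layer_id, qkv, h_q)
        # After this reshape the output has shape [b, s, h_q * d]; the abnormal
        # slice described above is roughly output[:, int(s * 0.78):, :].
        output = op.reshape(attn, (b, s, h_q * d))
        return self.o_proj(output)
```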

MasterJH5574 (Member) commented

Hi @ifndefendif, thank you so much for bringing this up! Just want to make sure I understand your description, do you mind elaborating a bit on this?

the output has shape [b, s, hq * d], but the slice [b, s * 0.78 :, hq * d] is abnormal

I'm wondering what the 0.78 factor here means.

ifndefendif (Author) commented

I'm wondering what the 0.78 factor here means.

It seems like there might be a proportional range issue. The dimensions of qkv are [batch_size, token_lens, hidden_state]. When there are 100 tokens, the accuracy is correct for the first 78 tokens, but problematic for the last 22. Similarly, with 200 tokens, the results are approximately correct for the first 156 tokens. This pattern suggests that the abnormal calculation accuracy could be related to the length of token_lens. I would like to understand what potential factors within the function itself might be influencing these calculation results.
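As a rough illustration, the breakpoint can be located by comparing the MLC output against a trusted reference computed for the same input. The helper below is hypothetical; mlc_out and ref_out are assumed to be numpy arrays of shape [b, s, hq * d], and the tolerance is arbitrary.

```python
import numpy as np


def first_bad_token(mlc_out: np.ndarray, ref_out: np.ndarray, atol: float = 1e-2) -> int:
    """Return the first token index whose max abs error exceeds atol, or -1 if none."""
    # Per-token maximum absolute error, reduced over the batch and hidden dims.
    err = np.abs(mlc_out.astype(np.float32) - ref_out.astype(np.float32)).max(axis=(0, 2))
    bad = np.nonzero(err > atol)[0]
    return int(bad[0]) if bad.size else -1
```

With 100 prefill tokens this comes out around index 78, and with 200 tokens around index 156, i.e. the breakpoint tracks roughly 0.78 * token_lens.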
Thanks~

MasterJH5574 (Member) commented

Thanks @ifndefendif! Did you observe this on CUDA or on other platforms? It would be very helpful if you can share the commands you used to run Qwen2.5-0.5B. It can be a TIR attention kernel bug in this case and we need to look into it.

MasterJH5574 (Member) commented

And just to make sure I'm understanding correctly--this observation is for prefill (not decode) right?

ifndefendif (Author) commented

Thanks @ifndefendif! Did you observe this on CUDA or on other platforms? It would be very helpful if you can share the commands you used to run Qwen2.5-0.5B. It can be a TIR attention kernel bug in this case and we need to look into it.

Thank you for your response. The hardware device is NVIDIA A10.

ifndefendif (Author) commented

And just to make sure I'm understanding correctly--this observation is for prefill (not decode) right?

Yes, the issue indeed occurs during the prefill stage. However, since the prefill results are already incorrect, I could not verify whether the decode results are correct.
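For what it's worth, one way to cross-check the prefill outputs is to run the same prompt through HuggingFace transformers and compare the per-layer hidden states against the MLC intermediates. A rough sketch (the checkpoint name, dtype, and prompt here are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"  # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda().eval()

inputs = tok("the same prefill prompt used above", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states[i] has shape [1, seq_len, hidden_size]; comparing each layer
# (and each token position) against the corresponding MLC tensor should show
# where the trailing ~22% of positions start to diverge.
```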
