
Refactor code in preparation for SGLang PR #5

Merged
merged 22 commits into deepauto/dev from deepauto/feat/refactor-code
Jan 28, 2025

Conversation

@daniel-geon-park (Collaborator) commented Jan 28, 2025

Motivation

We need to get our SGLang dev branch ready for a pull request to the upstream SGLang repo.

Modifications

For maximum flexibility and extensibility once our code is merged upstream, I moved most of the HiP Attention-internal logic from this repo into DeepAuto-AI/hip-attention (see the sketch after this list):

  • HiP Config (python/sglang/srt/layers/attention/hip_attention/hip_config.py)
  • HiP Mask refresh control mechanism (python/sglang/srt/managers/tp_worker_overlap_thread.py)
  • HiP paged attention interface (python/sglang/srt/layers/attention/hip_attention/hip_radix_attention.py)
  • Model-wide Offload cache (python/sglang/srt/mem_cache/hip_offload_kv_pool_mha.py)
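
For reference, here is a minimal sketch of the delegation pattern this split aims for. It is illustrative only: the import path, class layout, and keyword arguments are assumptions, and only the names forward_paged_hip, HiPRadixAttentionBackend, and hip_config come from this discussion.

```python
# Illustrative sketch only; the import path and argument names are assumptions.
from hip import forward_paged_hip  # assumed entry point in DeepAuto-AI/hip-attention


class HiPRadixAttentionBackend:
    """Thin SGLang-side backend that delegates HiP-internal logic to hip-attention."""

    def __init__(self, hip_config):
        # Parsed from the HiP config kept in this repo (hip_config.py).
        self.hip_config = hip_config

    def forward_extend(self, q, k_cache, v_cache, layer_id):
        # Mask refresh, paged attention, and offload-cache details all live in
        # hip-attention, so upstream SGLang only sees this single call.
        return forward_paged_hip(
            query=q,
            k_cache=k_cache,
            v_cache=v_cache,
            layer_id=layer_id,
            hip_config=self.hip_config,
        )
```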

Also, I removed the following files and instead integrated their functionality into the existing SGLang classes:

  • python/sglang/srt/layers/attention/hip_attention/hip_cuda_graph_runner.py
  • python/sglang/srt/mem_cache/hip_memory_pool.py
  • python/sglang/srt/model_executor/hip_model_runner.py

Checklist

@kbumsik left a comment

Moving some code to hip-attention is a brilliant idea 👍
This should make it easier for us to contribute to SGLang.

So DeepAuto-AI/hip-attention#25 should be merged as well?

@daniel-geon-park (Collaborator, Author) commented Jan 28, 2025

I think I need a review from Heejun before deciding to merge. Some changes might not be 100% correct.

@kbumsik commented Jan 28, 2025

Sure thing! Every member needs to acknowledge code changes in deepauto/dev.

@gmlwns2000 (Collaborator) commented Jan 28, 2025

It looks very GOOD!

However, I have a suggestion:

  • I think we should pass layer: RadixAttention and self: HiPRadixAttentionBackend to forward_paged_hip. We may need to access layer-wise and backend-wise variables (e.g., layer.id, self.hip_config, tp_rank, etc.) or set them in the forward function in the future. This does not mean I want to turn forward_paged_hip into a stateful function; it is only for extensibility and accessibility from the hip library (a rough signature sketch follows below).

What do you guys think about it? @daniel-geon-park @kbumsik
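
A rough sketch of what this suggestion might look like; the attribute accesses are examples of the intended usage, not a confirmed interface.

```python
# Hypothetical signature for the suggestion above: pass the whole layer and
# backend objects so hip-attention can reach layer-wise / backend-wise state.
def forward_paged_hip(query, k_cache, v_cache, layer, backend):
    # `layer` is sglang's RadixAttention module and `backend` is the
    # HiPRadixAttentionBackend; both accesses below are illustrative.
    layer_idx = layer.id             # layer-wise variable
    hip_config = backend.hip_config  # backend-wise variable
    ...
```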

@daniel-geon-park (Collaborator, Author) commented Jan 28, 2025

If we pass a non-stable RadixAttention-typed value into hip-attention, then our code will break whenever the sglang authors change the interface of the RadixAttention class (for example, by renaming tp_q_head_num to tp_q_heads).

If that happens, the sglang authors will have no way to fix this issue without modifying our code in the hip-attention repo.

So, I don't think it is wise to make a change where our hip-attention code depends on the internal interface of sglang.

Instead, we could future-proof it by explicitly passing to forward_paged_hip() any additional values we might need in the future (see the sketch below).
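
By contrast, a minimal sketch of the explicit-values approach described above; the parameter list is hypothetical and only meant to show that hip-attention would see plain values rather than sglang types.

```python
# Hypothetical alternative: forward_paged_hip never sees sglang types, only the
# plain values the caller copies out of RadixAttention / the backend.
def forward_paged_hip(
    query,
    k_cache,
    v_cache,
    layer_id: int,
    tp_q_head_num: int,  # copied from RadixAttention by the sglang-side caller
    tp_rank: int,
    hip_config=None,
):
    # If sglang renames tp_q_head_num, only the caller inside the sglang repo
    # changes; the hip-attention code and its interface stay untouched.
    ...
```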

@kbumsik commented Jan 28, 2025

I am with @daniel-geon-park.

We need to avoid a cyclic dependency.

@gmlwns2000 (Collaborator) commented Jan 28, 2025

Thanks for the comment! I understand it now. I agree with @daniel-geon-park.
Seems good to merge now, then!

@daniel-geon-park merged commit 3c53300 into deepauto/dev on Jan 28, 2025
1 check passed
@daniel-geon-park deleted the deepauto/feat/refactor-code branch on January 28, 2025 at 22:02
@daniel-geon-park (Collaborator, Author)

Thank you!
