Skip sampling softmax #801

Merged: 15 commits merged on Feb 10, 2025


@attafosu attafosu commented Feb 7, 2025

Fixes a bug in the previous PR #704 (add force_greedy_sample).

  • For certain sampling configurations (e.g., greedy or top_k=1), some sampler ops are unnecessary.
  • This PR skips three such ops in the sampler (scaling by temperature, softmax of the logits, and log-softmax of the logits) and works with the raw logits directly. Because each of these transforms preserves the ordering of the logits, the greedy argmax is unchanged.
  • The feature is enabled via an environment variable: export VLLM_FORCE_GREEDY_SAMPLE=1
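
A minimal plain-Python sketch of why the shortcut is safe. It is not the vLLM sampler's code: `argmax`, `log_softmax`, `force_greedy_enabled`, and `greedy_token` are hypothetical helpers, and the assumption that the flag is read as the string "1" from the environment is ours. For any temperature T > 0, dividing by T, softmax, and log-softmax are all strictly increasing, so the index of the largest raw logit is the token that greedy (or top_k=1) sampling would pick anyway.

```python
import math
import os


def argmax(values):
    # Index of the largest element (the first one on ties).
    return max(range(len(values)), key=values.__getitem__)


def log_softmax(logits, temperature=1.0):
    # Reference path: temperature scaling followed by a numerically
    # stable log-softmax (subtract the max before exponentiating).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def force_greedy_enabled():
    # Assumption (hypothetical): the flag is enabled when set to "1".
    return os.environ.get("VLLM_FORCE_GREEDY_SAMPLE", "0") == "1"


def greedy_token(logits, temperature=1.0):
    if force_greedy_enabled():
        # Fast path: skip temperature scaling, softmax, and log-softmax and
        # take the argmax of the raw logits. For T > 0 each skipped transform
        # is strictly increasing, so the argmax is preserved (up to
        # floating-point ties).
        return argmax(logits)
    # Reference path: compute log-probabilities first, then take the argmax.
    return argmax(log_softmax(logits, temperature))


# Usage: both paths pick the same token.
logits = [1.5, -0.3, 4.2, 4.1]
os.environ["VLLM_FORCE_GREEDY_SAMPLE"] = "1"
fast = greedy_token(logits, temperature=0.7)
os.environ["VLLM_FORCE_GREEDY_SAMPLE"] = "0"
slow = greedy_token(logits, temperature=0.7)
print(fast, slow)  # prints "2 2"
```

The fast path saves the per-step cost of the three skipped ops while producing the same greedy choice as the reference path.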

@jikunshang jikunshang left a comment


LGTM! Thanks for the great work!

@attafosu attafosu requested a review from szutenberg February 8, 2025 18:25
@attafosu (Author) commented Feb 9, 2025

@szutenberg Can you take a look so we can merge if it's okay?

@attafosu attafosu merged commit bdaf88f into HabanaAI:mlperf_features Feb 10, 2025
6 of 44 checks passed
attafosu added a commit that referenced this pull request Feb 10, 2025
Reverts #801

Run breaks with [Merge branch 'mszu/b3072' into skip_sampling_softmax](9aec93d)
4 participants