Expose n_ubatch context param #504

Merged · 1 commit · Sep 23, 2024

Conversation

brittlewis12 (Contributor) commented:

Follow-up to #284; closes #291.

Thanks again for your ongoing efforts on this project, @MarcusDunn! I'm more than happy to add or change anything at your request.

* `n_batch` sets the maximum number of tokens that `llama_decode` can accept
  in a single call (a single "batch").
* `n_ubatch` is lower level, corresponding to the hardware batch size used
  during decoding; it must be less than or equal to `n_batch`. (A usage sketch
  follows these notes.)
  - ggml-org/llama.cpp#6328 (comment)
  - https://github.com/ggerganov/llama.cpp/blob/557410b8f06380560155ac7fcb8316d71ddc9837/common/common.h#L58
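
For context, here is a minimal sketch of how the new parameter might be used from the `llama-cpp-2` crate. This is an assumption based on the PR title and the crate's builder-style `LlamaContextParams` API; the exact method names (notably `with_n_ubatch`, presumed to mirror `with_n_batch`) are not confirmed from the diff, and the model path is a placeholder.

```rust
use std::num::NonZeroU32;

use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend once per process.
    let backend = LlamaBackend::init()?;

    // Load a local GGUF model ("model.gguf" is a placeholder path).
    let model = LlamaModel::load_from_file(
        &backend,
        "model.gguf",
        &LlamaModelParams::default(),
    )?;

    // n_batch caps the tokens passed to a single llama_decode call, while
    // n_ubatch is the physical batch size the work is split into; keep
    // n_ubatch <= n_batch.
    let ctx_params = LlamaContextParams::default()
        .with_n_ctx(NonZeroU32::new(4096))
        .with_n_batch(2048)
        .with_n_ubatch(512); // assumed builder method added by this PR

    let _ctx = model.new_context(&backend, ctx_params)?;
    Ok(())
}
```

Keeping `n_ubatch` at or below `n_batch` mirrors the constraint documented in the `common.h` reference linked above.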
MarcusDunn merged commit b1420f3 into utilityai:main on Sep 23, 2024.
2 of 5 checks passed
brittlewis12 deleted the ubatch branch on September 25, 2024.