Expose n_ubatch context param #504

Merged · 1 commit · Sep 23, 2024

Conversation

brittlewis12 (Contributor) commented:

Follow-up to #284; closes #291.

Thanks again for your ongoing efforts on this project, @MarcusDunn! I'm more than happy to add or change anything at your request.

* `n_batch` sets the maximum number of tokens that `llama_decode` can accept
  in a single call (a single "batch").
* `n_ubatch` is lower level, corresponding to the hardware batch size used
  during decoding; it must be less than or equal to `n_batch`. (A usage sketch
  follows these notes.)
  - ggml-org/llama.cpp#6328 (comment)
  - https://github.com/ggerganov/llama.cpp/blob/557410b8f06380560155ac7fcb8316d71ddc9837/common/common.h#L58
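
For context, here is a minimal sketch of how the new parameter might be used from the `llama-cpp-2` crate. This is an assumption based on the PR title and the crate's builder-style `LlamaContextParams` API; the exact method names (notably `with_n_ubatch`, presumed to mirror `with_n_batch`) are not confirmed from the diff, and the model path is a placeholder.

```rust
use std::num::NonZeroU32;

use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend once per process.
    let backend = LlamaBackend::init()?;

    // Load a local GGUF model ("model.gguf" is a placeholder path).
    let model = LlamaModel::load_from_file(
        &backend,
        "model.gguf",
        &LlamaModelParams::default(),
    )?;

    // n_batch caps the tokens passed to a single llama_decode call, while
    // n_ubatch is the physical batch size the work is split into; keep
    // n_ubatch <= n_batch.
    let ctx_params = LlamaContextParams::default()
        .with_n_ctx(NonZeroU32::new(4096))
        .with_n_batch(2048)
        .with_n_ubatch(512); // assumed builder method added by this PR

    let _ctx = model.new_context(&backend, ctx_params)?;
    Ok(())
}
```

Keeping `n_ubatch` at or below `n_batch` mirrors the constraint documented in the `common.h` reference linked above.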
MarcusDunn merged commit b1420f3 into utilityai:main on Sep 23, 2024.
2 of 5 checks passed
brittlewis12 deleted the ubatch branch on September 25, 2024.