server : refactor slot input data, move tokenizer to HTTP thread (#10023)

* server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check

* fix incorrect if branch

* fix infinite generation loop

* bring back infill validation

* add infill test

* try fixing format_infill

* fix test

* remove redundant code

* rename completion to inference

* update docs

* use llama_tokens everywhere
ngxson authored Oct 24, 2024
1 parent 40f2555 commit 958367b
Showing 5 changed files with 468 additions and 348 deletions.
12 changes: 12 additions & 0 deletions examples/server/README.md
@@ -319,6 +319,18 @@ node index.js
- The prompt is a string or an array with the first element given as a string
- The model's `tokenizer.ggml.add_bos_token` metadata is `true`

The following input shapes and data types are allowed for `prompt`:

- Single string: `"string"`
- Single sequence of tokens: `[12, 34, 56]`
- Mixed tokens and strings: `[12, 34, "string", 56, 78]`

Multiple prompts are also supported. In this case, the completion result will be an array (see the example request after this list).

- Only strings: `["string1", "string2"]`
- Strings and sequences of tokens: `["string1", [12, 34, 56]]`
- Mixed types: `[[12, 34, "string", 56, 78], [12, 34, 56], "string"]`
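
For illustration, a minimal request using the mixed tokens-and-strings form might look like the sketch below. The host, port, and token IDs are placeholder assumptions; `n_predict` limits the number of generated tokens:

```sh
# Sketch of a completion request whose prompt mixes raw token IDs and strings
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": [12, 34, "continue this text:", 56, 78],
        "n_predict": 64
    }'
```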

`temperature`: Adjust the randomness of the generated text. Default: `0.8`

`dynatemp_range`: Dynamic temperature range. The final temperature will be in the range of `[temperature - dynatemp_range; temperature + dynatemp_range]`. Default: `0.0`, which is disabled.
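
As a worked example of the formula above, `temperature: 0.8` with `dynatemp_range: 0.2` keeps the final temperature within `[0.6; 1.0]`. A hedged request sketch combining both fields (host and port are assumptions):

```sh
# Sampled temperature varies within [0.8 - 0.2; 0.8 + 0.2] = [0.6; 1.0]
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "Building a website can be done in 10 simple steps:",
        "temperature": 0.8,
        "dynatemp_range": 0.2
    }'
```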
