docs: streaming documentation (#5980)
Co-authored-by: Scott Martens <[email protected]>
1 parent 4af0308, commit e51ddca
Showing 4 changed files with 156 additions and 6 deletions.
@@ -0,0 +1,38 @@
# Build a Streaming API for a Large Language Model
```{include} ../../README.md
:start-after: <!-- start llm-streaming-intro -->
:end-before: <!-- end llm-streaming-intro -->
```

## Service Schemas
```{include} ../../README.md
:start-after: <!-- start llm-streaming-schemas -->
:end-before: <!-- end llm-streaming-schemas -->
```

```{admonition} Note
:class: note
DocArray schemas are not limited to text fields, so a streaming service can return richer payloads. For instance, you can use
Tensor types to stream token logits back to the client efficiently and implement complex token sampling strategies on
the client side.
```
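
To make the note above concrete, here is a minimal sketch of client-side sampling from streamed logits. It uses only the Python standard library; `LogitsChunk`, `sample_token`, and `decode_stream` are hypothetical names chosen for illustration and are not part of Jina or DocArray. In a real service, the chunk would presumably be a DocArray document with a tensor field rather than a dataclass holding a list.

```python
import math
import random
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class LogitsChunk:
    """Hypothetical stand-in for one streamed response: logits for a single step."""

    logits: List[float]


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a token id from raw logits using temperature and optional top-k."""
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Rank token ids by logit, highest first, and keep at most top_k of them.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        candidates = candidates[: max(1, top_k)]

    # Softmax weights over the candidates, shifted by the max for stability.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    chooser = rng if rng is not None else random
    return chooser.choices(candidates, weights=weights, k=1)[0]


def decode_stream(chunks: Iterable[LogitsChunk], **sampling_kwargs) -> Iterator[int]:
    """Consume streamed chunks one by one and yield a sampled token id for each."""
    for chunk in chunks:
        yield sample_token(chunk.logits, **sampling_kwargs)
```

Because sampling happens on the client, the same stream of logits can be decoded greedily, with a different temperature, or with a custom strategy without changing the service.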

## Service initialization
```{include} ../../README.md
:start-after: <!-- start llm-streaming-init -->
:end-before: <!-- end llm-streaming-init -->
```

## Implement the streaming endpoint

```{include} ../../README.md
:start-after: <!-- start llm-streaming-endpoint -->
:end-before: <!-- end llm-streaming-endpoint -->
```

## Serve and send requests
```{include} ../../README.md
:start-after: <!-- start llm-streaming-serve -->
:end-before: <!-- end llm-streaming-serve -->
```