feat(blog): Add new post on llama-cpp-python and instructor library usage #434
Conversation
Looks good to me! Reviewed entire PR up to commit bf826c3.

Reviewed 121 lines of code across 1 file in 1 minute(s) and 3 second(s).

See details

- Skipped files: 0 (please contact us to request support for these files)
- Confidence threshold: 85%
- Drafted 0 additional comments.
- Workflow ID: wflow_b0ZkUnb5GvB5ybuj

Something look wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev
docs/blog/posts/llama-cpp-python.md (Outdated)
> Recently llama-cpp-python has made support structured outputs via JSON schema available. This is a time-saving alternative to extensive prompt engineering and can be used to obtain structured outputs.
>
> In this example we'll cover a more advanced use case of by using `JSON_SCHEMA` mode to stream out partial models. To learn more partial streaming check out [partial streaming](../../concepts/partial.md).
In this example we'll cover a more advanced use case of `JSON_SCHEMA` mode to stream out partial models. To learn more about partial streaming, check out [partial streaming](../../concepts/partial.md).
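
For context, a minimal sketch of the pattern the comment describes: patching llama-cpp-python's OpenAI-compatible `create` call with instructor in `JSON_SCHEMA` mode, then streaming a partial Pydantic model. The model path and `UserDetail` schema are placeholders, not taken from the PR:

```python
import llama_cpp
import instructor
from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


# Load a local GGUF model; the path is a placeholder.
llama = llama_cpp.Llama(
    model_path="path/to/model.gguf",
    chat_format="chatml",
    n_ctx=2048,
    verbose=False,
)

# Patch the OpenAI-compatible create call so instructor can enforce
# the Pydantic schema via llama.cpp's JSON-schema constrained sampling.
create = instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,
)

# Stream a partial model: fields fill in incrementally as tokens arrive.
for partial_user in create(
    response_model=instructor.Partial[UserDetail],
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
    stream=True,
):
    print(partial_user)
```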
docs/blog/posts/llama-cpp-python.md (Outdated)
> console.print(obj)
>
> 1. We use `LlamaPromptLookupDecoding` to obtain structured outputs using JSON schema via a mixture of constrained sampling and speculative decoding. 10 is good for GPU, 2 is good for CPU.
We use `LlamaPromptLookupDecoding` to speed up structured output generation using speculative decoding. The draft model generates candidate tokens during generation; 10 is good for GPU, 2 is good for CPU.
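
As a rough illustration of the passage under review, prompt lookup decoding is passed as a draft model when constructing the `Llama` instance. This is a sketch with a placeholder model path, not the PR's exact code:

```python
import llama_cpp
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# Prompt lookup decoding drafts candidate tokens from the prompt itself,
# which the main model then verifies in a single pass; combined with
# JSON-schema constrained sampling this speeds up structured output.
llama = llama_cpp.Llama(
    model_path="path/to/model.gguf",  # placeholder path
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),  # 10 for GPU, 2 for CPU
    n_gpu_layers=-1,
    logits_all=True,
    chat_format="chatml",
    n_ctx=2048,
    verbose=False,
)
```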
Summary:
This PR adds a new blog post discussing the use of llama-cpp-python for structured outputs and the enhancement of `create` calls with the instructor library, including a Python code example.

Key points:
- Adds /docs/blog/posts/llama-cpp-python.md
- Demonstrates llama-cpp-python for structured outputs
- Enhances `create` calls with the instructor library

Generated with ❤️ by ellipsis.dev