Merge branch 'main' into langsmith
jxnl committed Feb 18, 2024
2 parents 4354bd8 + 66a8285 commit 8637ba8
Showing 34 changed files with 2,836 additions and 162 deletions.
27 changes: 14 additions & 13 deletions README.md
@@ -1,27 +1,28 @@
# Welcome to Instructor - Your Gateway to Structured Outputs with OpenAI
# Instructor

_Pythonic Structured Outputs powered by LLM function calling and tool calling APIs. Designed for simplicity, transparency, and control._
_Structured outputs powered by LLMs. Designed for simplicity, transparency, and control._

---

[Star us on Github!](https://www.github.com/jxnl/instructor)

[![Twitter Follow](https://img.shields.io/twitter/follow/jxnlco?style=social)](https://twitter.com/jxnlco)
[![Downloads](https://img.shields.io/pypi/dm/instructor.svg)](https://pypi.python.org/pypi/instructor)
[![Documentation](https://img.shields.io/badge/docs-available-brightgreen)](https://jxnl.github.io/instructor)
[![Coverage Status](https://coveralls.io/repos/github/jxnl/instructor/badge.svg?branch=add-coveralls)](https://coveralls.io/github/jxnl/instructor?branch=add-coveralls)
[![Discord](https://img.shields.io/discord/1192334452110659664?label=discord)](https://discord.gg/CV8sPM5k5Y)
[![Downloads](https://img.shields.io/pypi/dm/instructor.svg)](https://pypi.python.org/pypi/instructor)

Instructor stands out for its simplicity, transparency, and user-centric design. We leverage Pydantic to do the heavy lifting, and we've built a simple, easy-to-use API on top of it by helping you manage [validation context](./concepts/reask_validation.md), retries with [Tenacity](./concepts/retrying.md), and streaming [Lists](./concepts/lists.md) and [Partial](./concepts/partial.md) responses.

Dive into the world of Python-based structured extraction, empowered by OpenAI's cutting-edge function calling API. Instructor stands out for its simplicity, transparency, and user-centric design. Whether you're a seasoned developer or just starting out, you'll find Instructor's approach intuitive and its results insightful.
Check us out in [Typescript](https://instructor-ai.github.io/instructor-js/) and [Elixir](https://github.com/thmsmlr/instructor_ex/).
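
As a quick taste, the patched client returns validated Pydantic objects directly. A minimal sketch (the `UserInfo` model and prompt are illustrative, not part of this diff):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


# Patching adds the response_model keyword to create()
client = instructor.patch(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
assert isinstance(user, UserInfo)  # a validated Pydantic object, not raw JSON
```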

## Ports to other languages
Instructor is not limited to the OpenAI API; we support many other backends via patching. Check out more on [patching](./concepts/patching.md), and see the sketch after the list below.

Check out ports to other languages below:
1. Wrap OpenAI's SDK
2. Wrap the create method

- [Typescript / Javascript](https://www.github.com/jxnl/instructor-js)
- [Elixir](https://github.com/thmsmlr/instructor_ex/)
Including but not limited to:

If you want to port Instructor to another language, please reach out to us on [Twitter](https://twitter.com/jxnlco) we'd love to help you get started!
- [Together](./blog/posts/together.md)
- [Ollama](./blog/posts/ollama.md)
- [AnyScale](./blog/posts/anyscale.md)
- [llama-cpp-python](./blog/posts/llama-cpp-python.md)
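
For example, any OpenAI-compatible endpoint can be patched the same way. A minimal sketch, assuming a local Ollama server on its default port (the model name and prompt are illustrative):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class Character(BaseModel):
    name: str
    age: int


# Ollama exposes an OpenAI-compatible endpoint, so we patch a redirected client
client = instructor.patch(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,
)

character = client.chat.completions.create(
    model="llama2",
    response_model=Character,
    messages=[{"role": "user", "content": "Tell me about Harry Potter."}],
)
print(character)
```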

## Get Started in Moments

7 changes: 2 additions & 5 deletions docs/concepts/patching.md
@@ -6,11 +6,7 @@ Instructor enhances client functionality with three new keywords for backwards compatibility:
- `max_retries`: Determines retry attempts for failed `chat.completions.create` validations.
- `validation_context`: Provides extra context to the validation process.

There are three methods for structured output:

1. **Function Calling**: The primary method. Use this for stability and testing.
2. **Tool Calling**: Useful in specific scenarios; lacks the reasking feature of OpenAI's tool calling API.
3. **JSON Mode**: Offers closer adherence to JSON but with more potential validation errors. Suitable for specific non-function calling clients.
The default mode is `instructor.Mode.TOOLS`, which is the most stable and the recommended mode for OpenAI clients. The other modes exist to support other clients and are not recommended for use with OpenAI.
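
A minimal sketch of these keywords and the mode in use (the `User` model and its validator are illustrative):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator


class User(BaseModel):
    name: str

    @field_validator("name")
    @classmethod
    def name_is_uppercase(cls, v: str) -> str:
        assert v.isupper(), "name must be uppercase"
        return v


client = instructor.patch(OpenAI(), mode=instructor.Mode.TOOLS)

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=User,
    max_retries=2,  # on validation failure, the error is fed back and the call is retried
    messages=[{"role": "user", "content": "Extract the user: jason is a boy"}],
)
assert user.name == "JASON"
```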

## Tool Calling

@@ -30,6 +26,7 @@ Parallel tool calling is also an option, but you must set `response_model` to be an `Iterable` of the models you want to extract:
```python
import instructor
from openai import OpenAI

client = instructor.patch(OpenAI(), mode=instructor.Mode.PARALLEL_TOOLS)
```
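
Once patched in `PARALLEL_TOOLS` mode, a call might look like this sketch (the `Weather` and `GoogleSearch` models are illustrative, and `client` is the patched client from above):

```python
from typing import Iterable, Union

from pydantic import BaseModel


class Weather(BaseModel):
    location: str
    units: str


class GoogleSearch(BaseModel):
    query: str


# The model may call both tools in parallel; we get back typed instances
results = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    response_model=Iterable[Union[Weather, GoogleSearch]],
    messages=[
        {
            "role": "user",
            "content": "What's the weather in Toronto and who won the World Cup?",
        }
    ],
)
for result in results:
    print(type(result).__name__, result)
```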

15 changes: 13 additions & 2 deletions docs/concepts/philosophy.md
@@ -4,9 +4,20 @@ The instructor values [simplicity](https://eugeneyan.com/writing/simplicity/) an

> “Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.” — Edsger Dijkstra
## The Bridge to Object-Oriented Programming
### Proof that it's simple

`instructor` acts as a bridge converting text-based LLM interactions into a familiar object-oriented format. Its integration with Pydantic provides type hints, runtime validation, and robust IDE support; love and supported by many in the Python ecosystem. By treating LLMs as callable functions returning typed objects, instructor makes [language models backwards compatible with code](https://www.youtube.com/watch?v=yj-wSRJwrrc), making them practical for everyday use while being complex enough for advanced applications.
1. Most users will only need to learn `response_model` and `patch` to get started.
2. No new prompting language to learn, no new abstractions to learn.

### Proof that it's transparent

1. We write very few prompts, and we don't try to hide them from you.
2. In the future we'll give you more configuration over the two prompts we do write: the reasking and JSON_MODE prompts.

### Proof that it's flexible

1. If you build a system with OpenAI directly, it is easy to incrementally adopt instructor.
2. Add `response_model`, and if you want to revert, just remove it, as in the sketch below.
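
A sketch of what that incremental adoption looks like (the `UserDetail` model is illustrative):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


client = instructor.patch(OpenAI())  # before adopting instructor: client = OpenAI()

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,  # delete this line to revert to the raw completion
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
```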

## The zen of `instructor`

113 changes: 113 additions & 0 deletions docs/examples/extract_slides.md
@@ -0,0 +1,113 @@
# Data extraction from slides

In this guide, we demonstrate how to extract data from slides.

!!! tip "Motivation"

    When we want to translate key information from slides into structured data, simply isolating the text and running extraction might not be enough. Sometimes the important data is in the images on the slides, so we should consider including them in our extraction pipeline.

## Defining the necessary Data Structures

Let's say we want to extract the competitors from various presentations and categorize them according to their respective industries.

Our data model has `Industry`, which holds a list of `Competitor`s for a specific industry, and `Competition`, which aggregates the competitors across all industries.

```python
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, List


class Competitor(BaseModel):
    name: str
    features: Optional[List[str]]


# Define models
class Industry(BaseModel):
    """
    Represents competitors from a specific industry extracted from an image using AI.
    """

    name: str = Field(description="The name of the industry")
    competitor_list: List[Competitor] = Field(
        description="A list of competitors for this industry"
    )


class Competition(BaseModel):
    """
    This class serves as a structured representation of
    competitors and their qualities.
    """

    industry_list: List[Industry] = Field(
        description="A list of industries and their competitors"
    )
```

## Competitors extraction

To extract competitors from slides, we define a function that reads images from URLs and extracts the relevant information from them.

```python
import instructor
from openai import OpenAI

# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI(), mode=instructor.Mode.MD_JSON)


def read_images(image_urls: List[str]) -> Competition:
    """
    Given a list of image URLs, identify the competitors in the images.
    """
    return client.chat.completions.create(
        model="gpt-4-vision-preview",
        response_model=Competition,
        max_tokens=2048,
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Identify competitors and generate key features for each competitor.",
                    },
                    *[
                        {"type": "image_url", "image_url": {"url": url}}
                        for url in image_urls
                    ],
                ],
            }
        ],
    )
```

## Execution

Finally, we will run the previous function with a few sample slides to see the data extractor in action.

As we can see, our model extracted the relevant information for each competitor regardless of how this information was formatted in the original presentations.

```python
url = [
    'https://miro.medium.com/v2/resize:fit:1276/0*h1Rsv-fZWzQUyOkt',
    'https://earlygame.vc/wp-content/uploads/2020/06/startup-pitch-deck-5.jpg',
]
model = read_images(url)
# printing the model shows the extracted industries and competitors
print(model)
```

```
industry_list=[

Industry(name='Accommodation and Hospitality', competitor_list=[Competitor(name='CouchSurfing', features=['Affordable', 'Online Transaction']), Competitor(name='Craigslist', features=['Affordable', 'Offline Transaction']), Competitor(name='BedandBreakfast.com', features=['Affordable', 'Offline Transaction']), Competitor(name='AirBed&Breakfast', features=['Affordable', 'Online Transaction']), Competitor(name='Hostels.com', features=['Affordable', 'Online Transaction']), Competitor(name='VRBO', features=['Expensive', 'Offline Transaction']), Competitor(name='Rentahome', features=['Expensive', 'Online Transaction']), Competitor(name='Orbitz', features=['Expensive', 'Online Transaction']), Competitor(name='Hotels.com', features=['Expensive', 'Online Transaction'])]),

Industry(name='Wine E-commerce', competitor_list=[Competitor(name='WineSimple', features=['Ecommerce Retailers', 'True Personalized Selections', 'Brand Name Wine', 'No Inventory Cost', 'Target Mass Market']), Competitor(name='NakedWines', features=['Ecommerce Retailers', 'Target Mass Market']), Competitor(name='Club W', features=['Ecommerce Retailers', 'Brand Name Wine', 'Target Mass Market']), Competitor(name='Tasting Room', features=['Ecommerce Retailers', 'True Personalized Selections', 'Brand Name Wine']), Competitor(name='Drync', features=['Ecommerce Retailers', 'True Personalized Selections', 'No Inventory Cost']), Competitor(name='Hello Vino', features=['Ecommerce Retailers', 'Brand Name Wine', 'Target Mass Market'])])

]
```
6 changes: 3 additions & 3 deletions docs/examples/ollama.md
@@ -1,12 +1,12 @@
# Structured Outputs with Ollama

Open-source LLMS are gaining popularity, and the release of Ollama's OpenAI compatibility later it has made it possible to obtain structured outputs using JSON schema.
Open-source LLMs are gaining popularity, and with the release of Ollama's OpenAI compatibility layer, it has become possible to obtain structured outputs using JSON schema.

By the end of this blog post, you will learn how to effectively utilize instructor with ollama. But before we proceed, let's first explore the concept of patching.
By the end of this blog post, you will learn how to effectively utilize instructor with Ollama. But before we proceed, let's first explore the concept of patching.

## Patching

Instructor's patch enhances a openai api it with the following features:
Instructor's patch enhances an OpenAI API client with the following features:

- `response_model` in `create` calls that returns a pydantic model
- `max_retries` in `create` calls that retries the call if it fails by using a backoff strategy
92 changes: 92 additions & 0 deletions docs/hub/index.md
@@ -0,0 +1,92 @@
# Instructor Hub

Welcome to instructor hub. The goal of this project is to provide a set of tutorials and examples to help you get started, and to let you pull in the code you need to get going with `instructor`.

Make sure you're using the latest version of `instructor` by running:

```bash
pip install -U instructor
```

## Contributing

We welcome contributions to the instructor hub. If you have a tutorial or example you'd like to add, please open a pull request in `docs/hub` and we'll review it.

1. The code must be in a single file
2. Make sure that it's referenced in the `mkdocs.yml`
3. Make sure that the code is unit tested.

### Using pytest_examples

Running the following command runs the tests and updates the examples, ensuring that the examples are always up to date, linted correctly, and working. Make sure to include an `if __name__ == "__main__":` block in your code and add some asserts to ensure that the code works; a minimal example file is sketched after the command below.

```bash
poetry run pytest tests/openai/docs/test_hub.py --update-examples
```
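
A hub example might be structured like this minimal sketch (the model, prompt, and asserts are illustrative of the convention, not a real hub entry):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the client so create() accepts response_model
client = instructor.patch(OpenAI())


class User(BaseModel):
    name: str
    age: int


def extract_user(text: str) -> User:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=User,
        messages=[{"role": "user", "content": text}],
    )


if __name__ == "__main__":
    # asserts let pytest_examples verify that the example actually works
    user = extract_user("Anna is 21 years old.")
    assert user.age == 21
    print(user)
```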

## CLI Usage

Instructor hub comes with a command line interface (CLI) that lets you view and interact with the tutorials and examples, and pull in the code you need to get started with the API.

### List Cookbooks

By running `instructor hub list` you can see all the available tutorials and examples. By clicking (doc) you can see the full tutorial on this website.

```bash
$ instructor hub list --sort
```

| hub_id | slug | title | n_downloads |
| ------ | ----------------------------- | ----------------------------- | ----------- |
| 2 | multiple_classification (doc) | Multiple Classification Model | 24 |
| 1 | single_classification (doc) | Single Classification Model | 2 |

### Searching for Cookbooks

You can search for a tutorial by running `instructor hub list -q <QUERY>`. This will return a list of tutorials that match the query.

```bash
$ instructor hub list -q multi
```

| hub_id | slug | title | n_downloads |
| ------ | ----------------------------- | ----------------------------- | ----------- |
| 2 | multiple_classification (doc) | Multiple Classification Model | 24 |

### Reading a Cookbook

To read a tutorial, you can run `instructor hub pull --id <hub_id> --page` to see the full tutorial in the terminal. You can use `j,k` to scroll up and down, and `q` to quit. You can also run it without `--page` to print the tutorial to the terminal.

```bash
$ instructor hub pull --id 2 --page
```

### Pulling in Code

You can pull in the code with `--py --output=<filename>` to save the code to a file, or you can also run it without `--output` to print the code to the terminal.

```bash
$ instructor hub pull --id 2 --py --output=run.py
$ instructor hub pull --id 2 --py > run.py
```

You can run the code instantly if you pipe (`|`) it to `python`:

```bash
$ instructor hub pull --id 2 --py | python
```

## Call for Contributions

We're looking for many more hub examples. If you have a tutorial or example you'd like to add, please open a pull request in `docs/hub` and we'll review it.

- [ ] Converting the cookbooks to the new format
- [ ] Validator examples
- [ ] Data extraction examples
- [ ] Streaming examples (Iterable and Partial)
- [ ] Batch Parsing examples
- [ ] Open examples: Together, Anyscale, Ollama, llama-cpp, etc.
- [ ] Query Expansion examples
- [ ] Batch Data Processing examples
- [ ] Batch Data Processing examples with Cache
51 changes: 51 additions & 0 deletions docs/hub/multiple_classification.md
@@ -0,0 +1,51 @@
For multi-label classification, we introduce a `Literal` label set and a different Pydantic model to handle multiple labels.

```python
import openai
import instructor

from typing import List, Literal
from pydantic import BaseModel, Field

# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(openai.OpenAI())

LABELS = Literal["ACCOUNT", "BILLING", "GENERAL_QUERY"]


class MultiClassPrediction(BaseModel):
    labels: List[LABELS] = Field(
        ...,
        description="Only select the labels that apply to the support ticket.",
    )


def multi_classify(data: str) -> MultiClassPrediction:
    return client.chat.completions.create(
        model="gpt-4-turbo-preview",  # gpt-3.5-turbo fails
        response_model=MultiClassPrediction,
        messages=[
            {
                "role": "system",
                "content": "You are a support agent at a tech company. Only select the labels that apply to the support ticket.",
            },
            {
                "role": "user",
                "content": f"Classify the following support ticket: {data}",
            },
        ],
    )  # type: ignore


if __name__ == "__main__":
    ticket = "My account is locked and I can't access my billing info."
    prediction = multi_classify(ticket)
    assert {"ACCOUNT", "BILLING"} == {label for label in prediction.labels}
    print("input:", ticket)
    #> input: My account is locked and I can't access my billing info.
    print("labels:", LABELS)
    #> labels: typing.Literal['ACCOUNT', 'BILLING', 'GENERAL_QUERY']
    print("prediction:", prediction)
    #> prediction: labels=['ACCOUNT', 'BILLING']
```
