Wrong concatenation of textwrap.dedent string when using custom system message #1128

JohanBekker opened this issue Oct 28, 2024 · 1 comment

JohanBekker (Contributor) commented Oct 28, 2024

  • This is actually a bug report.
  • I am not getting good LLM Results
  • I have tried asking for help in the community on discord or discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (please specify)

Describe the bug

When using a custom system message, the Instructor system message that gets concatenated after it, although wrapped in textwrap.dedent, ends up with inconsistent indentation:

You are a helpful assistant.


        As a genius expert, your task is to understand the content and provide
        the parsed objects in json that match the following json_schema:


        {
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "age": {
      "title": "Age",
      "type": "integer"
    }
  },
  "required": [
    "name",
    "age"
  ],
  "title": "UserInfo",
  "type": "object"
}

        Make sure to return an instance of the JSON, not the schema itself
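
This layout is exactly what textwrap.dedent produces when a multi-line string is interpolated into an indented template before dedenting: dedent only strips whitespace that is common to every non-blank line, and the schema's continuation lines start at column 0, so the common prefix is empty and nothing gets stripped. A minimal sketch of that failure mode (the template text here is illustrative, not necessarily instructor's exact source):

```python
import json
from textwrap import dedent

schema = json.dumps({"title": "UserInfo", "type": "object"}, indent=2)

# The f-string interpolates the schema BEFORE dedent() runs. The schema's
# continuation lines have no leading whitespace, so the longest common
# leading whitespace across all lines is "" and dedent() strips nothing:
# the template keeps its 8-space indent while the schema sits flush left.
broken = dedent(
    f"""
        As a genius expert, your task is to understand the content and provide
        the parsed objects in json that match the following json_schema:

        {schema}

        Make sure to return an instance of the JSON, not the schema itself
        """
)
print(broken)
```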

To Reproduce

import instructor
from dotenv import load_dotenv
from langfuse.openai import OpenAI  # Langfuse's drop-in replacement for the OpenAI client
from pydantic import BaseModel

load_dotenv()  # loads OPENAI_API_KEY (and Langfuse keys) from .env


class UserInfo(BaseModel):
    name: str
    age: int


llm = OpenAI()
client = instructor.from_openai(llm, mode=instructor.Mode.MD_JSON)

system_message = "You are a helpful assistant."
user_message = "John Doe is 30 years old."

user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=4000,
    temperature=0,
    response_model=UserInfo,
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ],
)

Expected behavior
I'm not sure how much this matters for model performance, but it would be nice if the prompts were all neatly formatted.
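
One straightforward fix would be to dedent the template first and interpolate the schema afterwards (or, equivalently, to textwrap.indent the schema up to the template's level). A sketch of the first option, assuming nothing about instructor's internals beyond the prompt text shown above:

```python
import json
from textwrap import dedent

from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


schema = json.dumps(UserInfo.model_json_schema(), indent=2)

# Dedent first, then substitute: the multi-line schema can no longer
# defeat dedent's common-prefix detection, so the whole prompt ends up
# flush left.
template = dedent(
    """\
    As a genius expert, your task is to understand the content and provide
    the parsed objects in json that match the following json_schema:

    {schema}

    Make sure to return an instance of the JSON, not the schema itself
    """
)
print(template.format(schema=schema))
```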

JohanBekker (Contributor, Author) commented Nov 1, 2024

It's actually not just with a custom system message; the indentation is wrong every time:

[
0: {
role: "system"
content: "
        As a genius expert, your task is to understand the content and provide
        the parsed objects in json that match the following json_schema:


        {
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "age": {
      "title": "Age",
      "type": "integer"
    }
  },
  "required": [
    "name",
    "age"
  ],
  "title": "UserInfo",
  "type": "object"
}

        Make sure to return an instance of the JSON, not the schema itself
"
}
1: {
role: "user"
content: "John Doe is 30 years old.

Return the correct JSON response within a ```json codeblock. not the JSON_SCHEMA"
}
]
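
For anyone who wants to inspect the exact outgoing payload without Langfuse, one quick (and admittedly unofficial) way is to monkeypatch the raw client's create() before handing the client to instructor. This is a debugging sketch, not a supported instructor or openai-python API:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


llm = OpenAI()

# Wrap the raw create() BEFORE instructor patches the client, so every
# outgoing message list is printed exactly as it will be sent.
_original_create = llm.chat.completions.create

def _logging_create(*args, **kwargs):
    for message in kwargs.get("messages", []):
        print(f"--- {message['role']} ---")
        print(message["content"])
    return _original_create(*args, **kwargs)

llm.chat.completions.create = _logging_create

client = instructor.from_openai(llm, mode=instructor.Mode.MD_JSON)

client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
```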
