
Commit

improve structured output
souzatharsis committed Nov 25, 2024
1 parent f607f76 commit 2c4f74c
Showing 17 changed files with 3,286 additions and 84 deletions.
Binary file modified tamingllms/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/evals.doctree
Binary file not shown.
Binary file modified tamingllms/_build/.doctrees/notebooks/structured_output.doctree
Binary file not shown.
909 changes: 909 additions & 0 deletions tamingllms/_build/html/_images/json.svg
63 changes: 59 additions & 4 deletions tamingllms/_build/html/_sources/notebooks/structured_output.ipynb
@@ -295,9 +295,20 @@
"\n",
"#### JSON Mode\n",
"\n",
"JSON mode is a feature provided by some LLM APIs, such as OpenAI's, that allows the model to generate output in JSON format. This is particularly useful when you need structured data as a result, such as when parsing the output programmatically or integrating it with other systems that require JSON input. As depicted in {numref}`json-mode`, JSON mode is implemented by instructing the LLM to use JSON as the response format and optionally defining a target schema.\n",
"\n",
"```{figure} ../_static/structured_output/json.svg\n",
"---\n",
"name: json-mode\n",
"alt: JSON Mode\n",
"scale: 50%\n",
"align: center\n",
"---\n",
"Conceptual overview of JSON mode.\n",
"```\n",
"\n",
"When using JSON mode with OpenAI's API, it is recommended to instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don't include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit. To help ensure you don't forget, the API will throw an error if the string \"JSON\" does not appear somewhere in the context.\n",
"\n",
"\n"
]
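The request shape described above can be sketched in plain Python (the payload fields follow OpenAI's chat completions API; the model name, messages, and the reply string are illustrative stand-ins rather than a live API call):

```python
import json

# Request payload for JSON mode: response_format asks for a JSON object,
# and the word "JSON" appears explicitly in the system message, as required.
payload = {
    "model": "gpt-4o-mini",  # illustrative model name
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "Extract the person's name, level, and skill. Reply in JSON."},
        {"role": "user", "content": "Alice is a level 5 Python developer."},
    ],
}

# A hand-written stand-in for the model's reply; with JSON mode enabled,
# the returned content parses cleanly with the standard json module.
reply_content = '{"name": "Alice", "level": 5, "skill": "Python"}'
data = json.loads(reply_content)
print(data["name"], data["skill"])
```

Note that omitting the word "JSON" from the messages would make the API reject the request, which is exactly the guardrail described above.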
@@ -587,7 +598,7 @@
"source": [
"### Outlines\n",
"\n",
"Outlines is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n",
"Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n",
"\n",
"* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n",
"* **Regex-based structured generation**: Guide the generation process using regular expressions.\n",
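The logit-biasing mechanism can be illustrated with a toy, pure-Python sketch (this is not the Outlines API, just the underlying idea of masking disallowed tokens before picking one):

```python
import math

def constrained_pick(vocab, logits, allowed):
    """Greedy-pick the most likely token after masking disallowed ones."""
    # Disallowed tokens get -inf, so they receive zero probability mass.
    masked = [l if t in allowed else float("-inf") for t, l in zip(vocab, logits)]
    exps = [math.exp(l) for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    return vocab[max(range(len(vocab)), key=probs.__getitem__)]

vocab = ["Python", "Java", "Rust", "COBOL"]
logits = [1.0, 2.0, 0.5, 9.0]          # unconstrained, the model would emit "COBOL"
allowed = {"Python", "Java", "Rust"}   # but only these options are valid
print(constrained_pick(vocab, logits, allowed))  # → Java
```

Outlines applies the same masking idea token by token during decoding, so the output can never leave the valid set.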
@@ -718,6 +729,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Discussion\n",
"\n",
"### Comparing Solutions\n",
"\n",
"* **Simplicity vs. Control**: One-shot prompts are simple but offer limited control. `LangChain` and Outlines provide greater control but may have a steeper learning curve, though it is quite manageable.\n",
@@ -733,7 +746,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Best Practices\n",
"### Best Practices\n",
"\n",
"\n",
"* **Clear Schema Definition**: Define the desired output structure clearly. This can be done in several ways including schemas, types, or Pydantic models as appropriate. This ensures the LLM knows exactly what format is expected.\n",
@@ -746,6 +759,38 @@
"\n"
]
},
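The "clear schema definition" practice above can be sketched with a Pydantic model (assuming Pydantic v2; the `Person` model and raw strings are illustrative):

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    level: int
    skill: str

raw = '{"name": "Alice", "level": 5, "skill": "Python"}'
person = Person.model_validate_json(raw)  # parses and type-checks in one step
print(person.level)

try:
    Person.model_validate_json('{"name": "Bob"}')  # missing required fields
except ValidationError as e:
    print("rejected with", len(e.errors()), "errors")
```

The same model can also be rendered to a JSON Schema (via `Person.model_json_schema()`) and embedded in the prompt, so the schema serves double duty as instruction and validator.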
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ongoing Debate on Structured Output from LLMs\n",
"\n",
"The use of structured output, like JSON or XML, for Large Language Models (LLMs) is a developing area. While structured output offers clear benefits in parsing, robustness, and integration, there is growing debate on whether it also potentially comes at the cost of reasoning abilities. \n",
"\n",
"Recent research {cite}`tam2024letspeakfreelystudy` suggests that imposing format restrictions on LLMs might impact their performance, particularly in reasoning-intensive tasks. Further evidence {cite}`aider2024codejson` suggests LLMs may produce lower quality code if they’re asked to return it as part of a structured JSON response:\n",
"\n",
"* **Potential performance degradation:** Enforcing structured output, especially through constrained decoding methods like JSON-mode, can negatively impact an LLM's reasoning abilities. This is particularly evident in tasks that require multi-step reasoning or complex thought processes.\n",
"\n",
"* **Overly restrictive schemas:** Imposing strict schemas can limit the expressiveness of LLM outputs and may hinder their ability to generate creative or nuanced responses. In certain cases, the strictness of the schema might outweigh the benefits of structured output.\n",
"\n",
"* **Increased complexity in prompt engineering:** Crafting prompts that effectively guide LLMs to generate structured outputs while maintaining performance can be challenging. It often requires careful consideration of the schema, the task instructions, and the desired level of detail in the response.\n",
"\n",
"On the other hand, those findings are not without criticism. The .txt team challenges the findings of the paper \"Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models\" {cite}`tam2024letspeakfreelystudy`. The paper claims that structured output formats, like JSON, negatively affect the performance of LLMs, particularly when it comes to reasoning. The rebuttal argues that **structured generation, when done correctly, actually *improves* performance**.\n",
"\n",
"The .txt team identifies several flaws in the methodology of \"Let Me Speak Freely?\" that they believe led to inaccurate conclusions:\n",
"\n",
"* The paper finds that structured output improves performance on classification tasks but doesn't reconcile this finding with its overall negative conclusion about structured output. \n",
"* The prompts used for unstructured generation were different from those used for structured generation, making the comparison uneven. \n",
"* The prompts used for structured generation, particularly in JSON-mode, didn't provide the LLM with sufficient information to properly complete the task. \n",
"* The paper conflates \"structured generation\" with \"JSON-mode\", when they are not the same thing. \n",
"\n",
"It is important to note that, as with any rebuttal, .txt presents a specific perspective on the issue of structured output from LLMs. While their findings suggest potential benefits to structured generation, further research and exploration are needed to comprehensively understand the nuances and trade-offs involved in using structured output for various LLM tasks and applications.\n",
"\n",
"In summary, the debate surrounding structured output highlights the ongoing challenges in balancing LLM capabilities with real-world application requirements. While structured outputs offer clear benefits in parsing, robustness, and integration, their potential impact on performance, particularly in reasoning tasks, is a topic of ongoing debate. \n",
"\n",
"The ideal approach likely involves a nuanced strategy that considers the specific task, the desired level of structure, and the available LLM capabilities. Further research and development efforts are needed to mitigate the potential drawbacks of structured output and unlock the full potential of LLMs for a wider range of applications. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -755,6 +800,16 @@
"Extracting structured output from LLMs is crucial for integrating them into real-world applications. By understanding the challenges and employing appropriate strategies and tools, developers can improve the reliability and usability of LLM-powered systems, unlocking their potential to automate complex tasks and generate valuable insights. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"```{bibliography}\n",
":filter: docname in docnames\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
109 changes: 109 additions & 0 deletions tamingllms/_build/html/_static/structured_output/json.d2
@@ -0,0 +1,109 @@
schema: {
shape: rectangle
style.stroke: "#E8DCCA"
style.fill: "#FFFAF0"
label: |md
```json
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"level": {
"type": "integer"
},
"skill": {
"type": "string",
"enum": [
"Python",
"Java",
"JavaScript",
"Rust"
]
}
},
"required": [
"name",
"level",
"skill"
]
}
```
|
}

llm: Large\nLanguage\nModels {
shape: rectangle
style.fill: "#E6EEF8"
style.stroke: "#B8D1F3"
}

outputs: {
shape: rectangle
style.stroke: "#E8DCCA"
style.fill: "#FFFAF0"

json1: {
label: |md
```json
{
"name": "Alice",
"level": 5,
"skill": "Python"
}
```
|
}

json2: {
label: |md
```json
{
"name": "Bob",
"level": 3,
"skill": "Java"
}
```
|
}

json3: {
label: |md
```json
{
"name": "Carol",
"level": 4,
"skill": "Rust"
}
```
|
}
}

# Labels
schema_label: JSON Schema {
style.font-size: 16
}

output_label: Generated JSONs {
style.font-size: 16
}

# Connections
schema -> llm: + {
style.stroke: "#8B4513"
style.stroke-width: 2
}

llm -> outputs: Constrained\nGeneration {
style.stroke: "#D35400"
style.stroke-width: 2
}

# Label connections
schema_label -> schema
output_label -> outputs

# Layout
direction: right
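The constraint the diagram depicts can be checked in plain Python (a stdlib-only sketch; in practice a JSON Schema validator library would perform this check):

```python
import json

# Mirrors the schema in the diagram: three required fields, skill from an enum.
ALLOWED_SKILLS = {"Python", "Java", "JavaScript", "Rust"}
REQUIRED = {"name": str, "level": int, "skill": str}

def conforms(raw: str) -> bool:
    """Check a generated JSON string against the diagram's schema."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not all(isinstance(obj.get(k), t) for k, t in REQUIRED.items()):
        return False
    return obj["skill"] in ALLOWED_SKILLS

print(conforms('{"name": "Carol", "level": 4, "skill": "Rust"}'))
print(conforms('{"name": "Dave", "skill": "COBOL"}'))
```

With constrained generation, every output on the right side of the diagram passes this check by construction rather than by post-hoc validation.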
