Replies: 3 comments
-
Pytest results:

```
============================================================== test session starts ==============================================================
platform linux -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
rootdir: /mnt/d/repos/GitHub/instructor/tests/openai/evals
plugins: asyncio-0.21.1, anyio-3.7.1
asyncio: mode=strict
collected 36 items
test_knowledge_graphs.py .s..s.FsF.sF.s..s..s.FsF.F....F..FFF [100%]
=================================================================== FAILURES ====================================================================
_______________________________________________ test_extract[gpt-3.5-turbo-data6-Mode.FUNCTIONS] ________________________________________________
model = 'gpt-3.5-turbo', data = ('Sarah knows Jason and is a student of his.', 3, 3), mode = <Mode.FUNCTIONS: 'function_call'>
client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
> assert (
len(response.edges) == expected_edges_number
), f"Expected {expected_edges_number} edges, got {len(response.edges)}"
E AssertionError: Expected 3 edges, got 2
E assert 2 == 3
E + where 2 = len([Edge(source=1, target=2, label='knows', color='gray'), Edge(source=1, target=2, label='is a student of', color='gray')])
E + where [Edge(source=1, target=2, label='knows', color='gray'), Edge(source=1, target=2, label='is a student of', color='gray')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Jason', color='blue'), Node(id=3, lab...dge(source=1, target=2, label='knows', color='gray'), Edge(source=1, target=2, label='is a student of', color='gray')]).edges
test_knowledge_graphs.py:68: AssertionError
_________________________________________________ test_extract[gpt-3.5-turbo-data8-Mode.TOOLS] __________________________________________________
model = 'gpt-3.5-turbo', data = ('Sarah knows Jason and is a student of his.', 3, 3), mode = <Mode.TOOLS: 'tool_call'>
client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
> assert (
len(response.edges) == expected_edges_number
), f"Expected {expected_edges_number} edges, got {len(response.edges)}"
E AssertionError: Expected 3 edges, got 2
E assert 2 == 3
E + where 2 = len([Edge(source=1, target=2, label='knows', color='black'), Edge(source=1, target=2, label='student of', color='black')])
E + where [Edge(source=1, target=2, label='knows', color='black'), Edge(source=1, target=2, label='student of', color='black')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Jason', color='blue'), Node(id=3, lab...=[Edge(source=1, target=2, label='knows', color='black'), Edge(source=1, target=2, label='student of', color='black')]).edges
test_knowledge_graphs.py:68: AssertionError
_________________________________________________ test_extract[gpt-3.5-turbo-data11-Mode.TOOLS] _________________________________________________
model = 'gpt-3.5-turbo', data = ('Sarah is a student at the University of Toronto. and UofT is in Canada.', 4, 3)
mode = <Mode.TOOLS: 'tool_call'>, client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 4 nodes, got 3
E assert 3 == 4
E + where 3 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='University of Toronto', color='green'), Node(id=3, label='Canada', color='yellow')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='University of Toronto', color='green'), Node(id=3, label='Canada', color='yellow')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='University of Toronto', color='green'...dge(source=1, target=2, label='is a student at', color='gray'), Edge(source=2, target=3, label='is in', color='gray')]).nodes
test_knowledge_graphs.py:65: AssertionError
___________________________________________________ test_extract[gpt-4-data21-Mode.FUNCTIONS] ___________________________________________________
model = 'gpt-4', data = ('Sarah is a student at the University of Toronto. and UofT is in Canada.', 4, 3)
mode = <Mode.FUNCTIONS: 'function_call'>, client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 4 nodes, got 5
E assert 5 == 4
E + where 5 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='UofT', color='red'), Node(id=5, label='Canada', color='yellow')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='UofT', color='red'), Node(id=5, label='Canada', color='yellow')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, ...rple'), Edge(source=2, target=3, label='at', color='purple'), Edge(source=4, target=5, label='is in', color='purple')]).nodes
test_knowledge_graphs.py:65: AssertionError
_____________________________________________________ test_extract[gpt-4-data23-Mode.TOOLS] _____________________________________________________
model = 'gpt-4', data = ('Sarah is a student at the University of Toronto. and UofT is in Canada.', 4, 3), mode = <Mode.TOOLS: 'tool_call'>
client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 4 nodes, got 5
E assert 5 == 4
E + where 5 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='student', color='green'), Node(id=3, label='University of Toronto', color='blue'), Node(id=4, label='UofT', color='blue'), Node(id=5, label='Canada', color='red')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='student', color='green'), Node(id=3, label='University of Toronto', color='blue'), Node(id=4, label='UofT', color='blue'), Node(id=5, label='Canada', color='red')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='student', color='green'), Node(id=3, ...'black'), Edge(source=4, target=5, label='is in', color='black'), Edge(source=3, target=4, label='is', color='black')]).nodes
test_knowledge_graphs.py:65: AssertionError
_______________________________________________ test_extract[gpt-4-1106-preview-data25-Mode.JSON] _______________________________________________
model = 'gpt-4-1106-preview', data = ('Jason knows a lot about quantum mechanics. He is a physicist. He is a professor', 4, 3)
mode = <Mode.JSON: 'json_mode'>, client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
> response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
test_knowledge_graphs.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../instructor/patch.py:268: in new_chatcompletion_sync
response = retry_sync(
../../../instructor/patch.py:214: in retry_sync
raise e
../../../instructor/patch.py:196: in retry_sync
return process_response(
../../../instructor/patch.py:129: in process_response
model = response_model.from_response(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'test_knowledge_graphs.KnowledgeGraph'>
completion = ChatCompletion(id='chatcmpl-8QbQDbpFoth1HgfNqFusEx4tLNYnO', choices=[Choice(finish_reason='stop', index=0, message=Cha...system_fingerprint='fp_a24b4d720c', usage=CompletionUsage(completion_tokens=256, prompt_tokens=776, total_tokens=1032))
validation_context = None, strict = None, mode = <Mode.JSON: 'json_mode'>, stream_multitask = False
@classmethod
def from_response(
cls,
completion,
validation_context=None,
strict: bool = None,
mode: Mode = Mode.FUNCTIONS,
stream_multitask: bool = False,
):
"""Execute the function from the response of an openai chat completion
Parameters:
completion (openai.ChatCompletion): The response from an openai chat completion
throw_error (bool): Whether to throw an error if the function call is not detected
validation_context (dict): The validation context to use for validating the response
strict (bool): Whether to use strict json parsing
mode (Mode): The openai completion mode
stream_multitask (bool): Whether to stream a multitask response
Returns:
cls (OpenAISchema): An instance of the class
"""
if stream_multitask:
return cls.from_streaming_response(completion, mode)
message = completion.choices[0].message
if mode == Mode.FUNCTIONS:
assert (
message.function_call.name == cls.openai_schema["name"]
), "Function name does not match"
return cls.model_validate_json(
message.function_call.arguments,
context=validation_context,
strict=strict,
)
elif mode == Mode.TOOLS:
assert (
len(message.tool_calls) == 1
), "Instructor does not support multiple tool calls, use List[Model] instead."
tool_call = message.tool_calls[0]
assert (
tool_call.function.name == cls.openai_schema["name"]
), "Tool name does not match"
return cls.model_validate_json(
tool_call.function.arguments,
context=validation_context,
strict=strict,
)
elif mode == Mode.JSON:
> return cls.model_validate_json(
message.content,
context=validation_context,
strict=strict,
)
E pydantic_core._pydantic_core.ValidationError: 4 validation errors for KnowledgeGraph
E nodes.0.label
E Field required [type=missing, input_value={'id': '1', 'type': 'Pers...Jason', 'color': 'Blue'}, input_type=dict]
E For further information visit https://errors.pydantic.dev/2.5/v/missing
E nodes.1.label
E Field required [type=missing, input_value={'id': '2', 'type': 'Disc...nics', 'color': 'Green'}, input_type=dict]
E For further information visit https://errors.pydantic.dev/2.5/v/missing
E nodes.2.label
E Field required [type=missing, input_value={'id': '3', 'type': 'Prof...sicist', 'color': 'Red'}, input_type=dict]
E For further information visit https://errors.pydantic.dev/2.5/v/missing
E nodes.3.label
E Field required [type=missing, input_value={'id': '4', 'type': 'Titl...sor', 'color': 'Purple'}, input_type=dict]
E For further information visit https://errors.pydantic.dev/2.5/v/missing
../../../instructor/function_calls.py:235: ValidationError
____________________________________________ test_extract[gpt-4-1106-preview-data30-Mode.FUNCTIONS] _____________________________________________
model = 'gpt-4-1106-preview', data = ('Sarah knows Jason and is a student of his.', 3, 3), mode = <Mode.FUNCTIONS: 'function_call'>
client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 3 nodes, got 2
E assert 2 == 3
E + where 2 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Jason', color='green')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Jason', color='green')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Jason', color='green')], edges=[Edge(source=1, target=2, label='knows', color='black'), Edge(source=1, target=2, label='is a student of', color='black')]).nodes
test_knowledge_graphs.py:65: AssertionError
____________________________________________ test_extract[gpt-4-1106-preview-data33-Mode.FUNCTIONS] _____________________________________________
model = 'gpt-4-1106-preview', data = ('Sarah is a student at the University of Toronto. and UofT is in Canada.', 4, 3)
mode = <Mode.FUNCTIONS: 'function_call'>, client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 4 nodes, got 5
E assert 5 == 4
E + where 5 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='UofT', color='red'), Node(id=5, label='Canada', color='orange')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='UofT', color='red'), Node(id=5, label='Canada', color='orange')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, ...), Edge(source=2, target=3, label='studies at', color='black'), Edge(source=4, target=5, label='is in', color='grey')]).nodes
test_knowledge_graphs.py:65: AssertionError
_______________________________________________ test_extract[gpt-4-1106-preview-data34-Mode.JSON] _______________________________________________
model = 'gpt-4-1106-preview', data = ('Sarah is a student at the University of Toronto. and UofT is in Canada.', 4, 3)
mode = <Mode.JSON: 'json_mode'>, client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 4 nodes, got 5
E assert 5 == 4
E + where 5 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='Canada', color='red'), Node(id=5, label='UofT', color='orange')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='Canada', color='red'), Node(id=5, label='UofT', color='orange')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, ...ge(source=3, target=4, label='located_in', color='black'), Edge(source=5, target=3, label='refers_to', color='black')]).nodes
test_knowledge_graphs.py:65: AssertionError
______________________________________________ test_extract[gpt-4-1106-preview-data35-Mode.TOOLS] _______________________________________________
model = 'gpt-4-1106-preview', data = ('Sarah is a student at the University of Toronto. and UofT is in Canada.', 4, 3)
mode = <Mode.TOOLS: 'tool_call'>, client = <openai.OpenAI object at 0x7f4b301ef550>
@pytest.mark.parametrize("model, data, mode", product(models, test_data, modes))
def test_extract(model, data, mode, client):
sample_data, expected_nodes_number, expected_edges_number = data
if (mode, model) in {
(Mode.JSON, "gpt-3.5-turbo"),
(Mode.JSON, "gpt-4"),
}:
pytest.skip(f"{mode} mode is not supported for {model}, skipping test")
# Setting up the client with the instructor patch
client = instructor.patch(client, mode=mode)
# Calling the extract function with the provided model, sample data, and mode
response = client.chat.completions.create(
model=model,
response_model=KnowledgeGraph,
messages=[
{
"role": "system",
"content": "You are a knowledge graph builder. You must extract nodes and edges from a given text. Try to reuse nodes as much as possible.",
},
{"role": "user",
"content": f"Describe the following text as a detailed knowledge graph: {sample_data}"
}
],
)
# Assertions
> assert (
len(response.nodes) == expected_nodes_number
), f"Expected {expected_nodes_number} nodes, got {len(response.nodes)}"
E AssertionError: Expected 4 nodes, got 5
E assert 5 == 4
E + where 5 = len([Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='UofT', color='red'), Node(id=5, label='Canada', color='orange')])
E + where [Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, label='University of Toronto', color='red'), Node(id=4, label='UofT', color='red'), Node(id=5, label='Canada', color='orange')] = KnowledgeGraph(nodes=[Node(id=1, label='Sarah', color='blue'), Node(id=2, label='Student', color='green'), Node(id=3, ...or='black'), Edge(source=2, target=3, label='at', color='black'), Edge(source=4, target=5, label='in', color='black')]).nodes
test_knowledge_graphs.py:65: AssertionError
=============================================================== warnings summary ================================================================
../../../../../../../../home/aastroza/anaconda3/envs/instructor/lib/python3.10/site-packages/pydantic_core/core_schema.py:3928
../../../../../../../../home/aastroza/anaconda3/envs/instructor/lib/python3.10/site-packages/pydantic_core/core_schema.py:3928
/home/aastroza/anaconda3/envs/instructor/lib/python3.10/site-packages/pydantic_core/core_schema.py:3928: DeprecationWarning: `FieldValidationInfo` is deprecated, use `ValidationInfo` instead.
warnings.warn(msg, DeprecationWarning, stacklevel=1)
test_knowledge_graphs.py: 27 warnings
/mnt/d/repos/GitHub/instructor/instructor/patch.py:263: UserWarning: max_retries is not supported when using tool calls
warnings.warn("max_retries is not supported when using tool calls")
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
```
-
Probably not. I think the first level of evals is just about verifying that these models don't hit validation errors and actually generate something sensible, maybe `len(edges) > 1` and `len(nodes) > 1`.
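Something like the sketch below, roughly (a suggestion only; the looser thresholds and the standalone helper name are assumptions, not what the eval currently asserts):

```python
import instructor
from openai import OpenAI

# KnowledgeGraph is the response model already defined in test_knowledge_graphs.py.
from test_knowledge_graphs import KnowledgeGraph


def check_extract_sanity(model, text, mode):
    """First-level check: the call validates and the graph is non-trivial."""
    client = instructor.patch(OpenAI(), mode=mode)
    response = client.chat.completions.create(
        model=model,
        response_model=KnowledgeGraph,
        messages=[
            {
                "role": "system",
                "content": "You are a knowledge graph builder. "
                "You must extract nodes and edges from a given text.",
            },
            {
                "role": "user",
                "content": f"Describe the following text as a detailed knowledge graph: {text}",
            },
        ],
    )
    # Don't pin exact node/edge counts; just require that something sensible came back.
    assert len(response.nodes) > 1, f"Expected more than 1 node, got {len(response.nodes)}"
    assert len(response.edges) > 1, f"Expected more than 1 edge, got {len(response.edges)}"
```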
-
If anyone reads this in the future: the JSON mode validation issue was a problem with nested pydantic structures. The system prompt was only using the schema of the top-level model and not including the definitions of the nested models it references, which led to hallucinated fields. It was resolved here: #249
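For anyone curious what that looks like concretely, here is a minimal sketch (the Node/Edge/KnowledgeGraph fields are inferred from the test output above; the actual fix lives in #249):

```python
import json

from pydantic import BaseModel


class Node(BaseModel):
    id: int
    label: str
    color: str


class Edge(BaseModel):
    source: int
    target: int
    label: str
    color: str


class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]


schema = KnowledgeGraph.model_json_schema()

# The top-level schema only *references* the nested models...
print(schema["properties"]["nodes"]["items"])  # {'$ref': '#/$defs/Node'}

# ...so a JSON-mode prompt built from the top level alone never tells the model
# that each node needs a "label" field. The nested definitions live under $defs
# and have to be included (or inlined) in the prompt as well.
print(json.dumps(schema["$defs"], indent=2))
```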
-
Hello! I'm Alonso, a data scientist who started contributing to the repository last week. I've never contributed to an open source project before, but I like Instructor so much that I decided to start contributing something week by week (small things for now, but who knows in the future).
A particular topic that interests me is extracting knowledge graphs from text. Of course, I've experimented with the published example material, but I still don't fully understand how it works. As with many of the more complex LLM tasks, the result depends heavily on the model and the prompt. So, since I saw a few days ago that some evals were added for the most popular tasks, I thought creating one on this subject would be a good way to learn.
Here's my current code:
My summarized pytest result:
(Detailed pytest results in comments)
My questions: