[Question]: How does Correctness Evaluator work? #8598

vishhvak · 2023-10-31T04:45:47Z

vishhvak
Oct 31, 2023

How does correctness evaluator work? How do I interpret the score? What is the range of the score?

@vishhvak · 2023-10-31T04:54:16Z

dosubot[bot]
bot Oct 31, 2023

🤖

Hello @vishhvak, it's good to see you again!

The Correctness Evaluator in the LlamaIndex framework is a class that evaluates the correctness of a question answering system. It requires a reference answer, a query string, and a response string. The evaluator outputs a score between 1 and 5, where 1 is the worst and 5 is the best, along with a reasoning for the score.

The scoring guidelines are as follows:

If the generated answer is not relevant to the user query, the score should be 1.
If the generated answer is relevant but contains mistakes, the score should be between 2 and 3.
If the generated answer is relevant and fully correct, the score should be between 4 and 5.

The evaluator also has a threshold score, which defaults to 4.0. If the score is greater than or equal to this threshold, the evaluation is considered passing.

The run_correctness_eval function is used to perform the evaluation. It takes in a query, a response, and a reference answer. If any of these are not provided, it raises a ValueError. The function then uses the LLM to generate an evaluation response, which includes the score and the reasoning. The score and reasoning are extracted from the response and returned as a dictionary.

Here is the relevant code:

from llama_index.llms import OpenAI

def run_correctness_eval(
    query_str: str,
    reference_answer: str,
    generated_answer: str,
    llm: OpenAI,
    threshold: float = 4.0,
) -> Dict:
    """Run correctness eval."""
    fmt_messages = eval_chat_template.format_messages(
        llm=llm,
        query=query_str,
        reference_answer=reference_answer,
        generated_answer=generated_answer,
    )
    chat_response = llm.chat(fmt_messages)
    raw_output = chat_response.message.content

    # Extract from response
    score_str, reasoning_str = raw_output.split("\n", 1)
    score = float(score_str)
    reasoning = reasoning_str.lstrip("\n")

    return {"passing": score >= threshold, "score": score, "reason": reasoning}

As for the eval_chat_template.format_messages function, I'm afraid I couldn't find specific information about it in the repository. It seems to be a function that formats the messages for the Correctness Evaluator, but without more context or information, I can't provide a more detailed explanation.

I hope this helps! If you have any more questions, feel free to ask.

Best,
Dosu

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

0 replies

nerdai · 2023-11-01T14:35:30Z

nerdai
Nov 1, 2023

Hi @vishhvak! The CorrectnessEvaluator uses an LLM to assign a score between 1-5 to a generated answer (from another LLM usually) against a user query — note we also pass in a reference answer to the LLM judge for it to make its grading. Higher is better. In terms of interpretation, you can gain some insight on it through the default prompt template that is being used: see here

0 replies

shaileshredkar · 2024-11-04T04:49:47Z

shaileshredkar
Nov 4, 2024

does correctness Evaluator work with ollama model other than openai ?

1 reply

nerdai Nov 4, 2024

Yea, you can just use our ollama llm integration and supply that to the CorrectnessEvaluator:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question]: How does Correctness Evaluator work? #8598

{{title}}

Replies: 3 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

[Question]: How does Correctness Evaluator work? #8598

vishhvak Oct 31, 2023

Replies: 3 comments · 1 reply

dosubot[bot] bot Oct 31, 2023

Sources

nerdai Nov 1, 2023

shaileshredkar Nov 4, 2024

nerdai Nov 4, 2024

vishhvak
Oct 31, 2023

Replies: 3 comments 1 reply

dosubot[bot]
bot Oct 31, 2023

nerdai
Nov 1, 2023

shaileshredkar
Nov 4, 2024