-
🤖 Hello @vishhvak, it's good to see you again! The Correctness Evaluator in the LlamaIndex framework is a class that evaluates the correctness of a question-answering system. It requires a reference answer, a query string, and a response string. The evaluator outputs a score between 1 and 5, where 1 is the worst and 5 is the best, along with the reasoning behind the score. The scoring guidelines are as follows:
The evaluator also has a threshold score, which defaults to 4.0. If the score is greater than or equal to this threshold, the evaluation is considered passing. Here is the relevant code:

```python
from typing import Dict

from llama_index.llms import OpenAI

# NOTE: `eval_chat_template` is a ChatPromptTemplate with {query},
# {reference_answer} and {generated_answer} placeholders. It is assumed
# to be defined elsewhere and is not shown in this snippet.


def run_correctness_eval(
    query_str: str,
    reference_answer: str,
    generated_answer: str,
    llm: OpenAI,
    threshold: float = 4.0,
) -> Dict:
    """Run correctness eval."""
    # Fill the judge prompt with the query and both answers.
    fmt_messages = eval_chat_template.format_messages(
        llm=llm,
        query=query_str,
        reference_answer=reference_answer,
        generated_answer=generated_answer,
    )
    chat_response = llm.chat(fmt_messages)
    raw_output = chat_response.message.content
    # Extract from response: first line is the score, the rest is the reasoning.
    # partition() avoids a ValueError when the model returns only a score line.
    score_str, _, reasoning_str = raw_output.partition("\n")
    score = float(score_str)
    reasoning = reasoning_str.lstrip("\n")
    return {"passing": score >= threshold, "score": score, "reason": reasoning}
```

I hope this helps! If you have any more questions, feel free to ask.
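To make the score interpretation concrete without calling an LLM, here is a minimal sketch of just the parsing and threshold step. It assumes, as `run_correctness_eval` does, that the judge puts the numeric score alone on the first line and the reasoning after it. `parse_correctness_output` is a hypothetical helper, the sample outputs are made up, and the 1 to 5 range check is an extra safeguard that the code above does not perform.

```python
from typing import Dict


def parse_correctness_output(raw_output: str, threshold: float = 4.0) -> Dict:
    """Hypothetical helper: parse '<score>\\n<reasoning>' judge output."""
    # The first line is expected to hold the numeric score (1 to 5).
    score_str, _, reasoning_str = raw_output.strip().partition("\n")
    score = float(score_str)
    # Extra safeguard (not in the original code): reject out-of-range scores.
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 1-5 range")
    # Passing means the score meets or exceeds the threshold (default 4.0).
    return {
        "passing": score >= threshold,
        "score": score,
        "reason": reasoning_str.strip(),
    }


good = parse_correctness_output("4.5\nThe answer matches the reference.")
weak = parse_correctness_output("2.0\nRelevant, but contains a factual mistake.")
print(good["passing"], good["score"])  # True 4.5
print(weak["passing"], weak["score"])  # False 2.0
```

Note that a score exactly equal to the threshold (e.g. 4.0) counts as passing, because the comparison is `>=`.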
-
Hi @vishhvak! The
-
Does the Correctness Evaluator work with an Ollama model instead of OpenAI?
-
How does the Correctness Evaluator work? How do I interpret the score, and what is its range?