
Update job chat prompt and add prompt evaluation for issue 97 #118

Merged 2 commits into main on Nov 14, 2024

Conversation

@hanna-paasivirta (Contributor) commented Nov 13, 2024

Short Description

This PR primarily adjusts the system prompt in the job_chat service to be less strict about external information. It also adds a notebook to evaluate the online (v1) and new (v2) prompts.

Fixes #97; partially addresses #108.

Implementation Details

To address issue #97, the prompt was edited to allow the assistant to provide information on external platforms and services.

To address issue #108, this PR also adds a notebook to generate a prompt test dataset targeting the issue in question. The notebook provides an initial case study of how we can track and evaluate the effects of changes to the LLM pipeline more thoroughly, rather than relying only on qualitative evaluation and spot checking.

The small, fully generated evaluation dataset is also added in this PR; it can be used as part of a routine test and expanded as we target different issue areas. The dataset shows the LLM outputs for the same set of questions using the online v1 and the candidate v2 prompts. The generated result column indicates whether the response successfully answered the question, and can be used to calculate a success score. The dataset shows that the new prompt improves the success rate on the external information issue from 20% to 60%.
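The success score described above can be computed with a short helper over the dataset's result column. This is a minimal sketch only: the column names (`result_v1`, `result_v2`) and the flag values are assumptions for illustration, not the notebook's actual schema.

```python
def success_rate(rows, result_column):
    """Fraction of rows whose result flag marks the answer as successful.

    `rows` is a list of dicts (e.g. from csv.DictReader); the flag values
    accepted here are an assumption about how the dataset encodes success.
    """
    flags = [row[result_column].strip().lower() in ("yes", "true", "1")
             for row in rows]
    return sum(flags) / len(flags) if flags else 0.0


# In-memory rows standing in for the evaluation CSV (hypothetical schema).
rows = [
    {"question": "q1", "result_v1": "no",  "result_v2": "yes"},
    {"question": "q2", "result_v1": "yes", "result_v2": "yes"},
]
print(success_rate(rows, "result_v1"))  # 0.5
print(success_rate(rows, "result_v2"))  # 1.0
```

A per-prompt rate computed this way is what the 20% vs. 60% comparison in the PR description corresponds to.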

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark (Collaborator) left a comment


This is fantastic Hanna, thank you.

As you say, we've got some way to go to make a generic test suite for this stuff. But it's an excellent way to validate these specific prompt improvements.

And it's not perfect, but I can see it is better! So we'll take it, thank you.

@josephjclark merged commit 1a1b5a3 into main on Nov 14, 2024
1 check passed
@josephjclark deleted the issue_97 branch on November 14, 2024 at 16:43
Successfully merging this pull request may close these issues.

job chat: claude issues