job chat: Add a prompt testing process #108

Open
hanna-paasivirta opened this issue Nov 12, 2024 · 1 comment

@hanna-paasivirta
Contributor

New prompts should be tested before release to evaluate their performance and minimise unexpected issues in production. This will likely involve accumulating generated test datasets, each targeting a different issue, and using LLM-based evaluation to check whether each test passed (true/false), with the pass/fail results aggregated into a score.
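
A minimal sketch of how the pass/fail scoring could look, assuming the score is simply the fraction of passing tests. All names here (`TestCase`, `score_prompt`, `generate`, `judge`) are hypothetical and not taken from the repo; `generate` stands in for the assistant call and `judge` for the LLM-based evaluator, both passed in as plain functions so the sketch runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TestCase:
    """One generated test: a question plus the criterion a good answer must meet."""
    question: str
    criterion: str


def score_prompt(
    cases: Sequence[TestCase],
    generate: Callable[[str], str],          # hypothetical: question -> assistant answer
    judge: Callable[[str, str, str], bool],  # hypothetical: (question, answer, criterion) -> pass?
) -> float:
    """Return the fraction of test cases the judge marks as passed."""
    if not cases:
        return 0.0
    passed = sum(
        1 for case in cases
        if judge(case.question, generate(case.question), case.criterion)
    )
    return passed / len(cases)


# Stand-ins for the real assistant and the LLM evaluator (assumptions for illustration).
def fake_generate(question: str) -> str:
    return "Use fn() to transform state." if "state" in question else "I don't know."


def fake_judge(question: str, answer: str, criterion: str) -> bool:
    return criterion in answer


example_cases = [
    TestCase("How do I change state?", "fn()"),
    TestCase("How do I call an API?", "http"),
]
example_score = score_prompt(example_cases, fake_generate, fake_judge)
```

In a real setup the judge would itself be an LLM call, so its verdicts would need spot-checking too.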

@hanna-paasivirta hanna-paasivirta self-assigned this Nov 12, 2024
@josephjclark
Collaborator

For the record I would be happy with a manual test process which goes something like this:

  • Run a set of test questions through the assistant
  • Save the responses to a file and check them into the repo
  • Make a change to the prompt
  • Re-run the test questions and MANUALLY review the diffs in the answers
  • Check in the updated answers if we're happy
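
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function names and the JSON snapshot format are hypothetical, and `ask` stands in for whatever actually calls the assistant.

```python
import difflib
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_questions(questions: Iterable[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Run each test question through the assistant (`ask` is a hypothetical stand-in)."""
    return {question: ask(question) for question in questions}


def _serialise(answers: Dict[str, str]) -> str:
    # Sorted keys and fixed indentation keep the file stable, so diffs stay readable.
    return json.dumps(answers, indent=2, sort_keys=True) + "\n"


def save_snapshot(path: str, answers: Dict[str, str]) -> None:
    """Write the answers to a file that can be checked into the repo."""
    Path(path).write_text(_serialise(answers), encoding="utf-8")


def diff_snapshot(path: str, answers: Dict[str, str]) -> str:
    """Return a unified diff between the checked-in answers and the new ones."""
    snapshot = Path(path)
    old = snapshot.read_text(encoding="utf-8") if snapshot.exists() else ""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            _serialise(answers).splitlines(keepends=True),
            fromfile="checked-in",
            tofile="new",
        )
    )
```

A reviewer would read the output of `diff_snapshot` after a prompt change and call `save_snapshot` only once the new answers look acceptable; an empty diff means nothing changed.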

We may also need to factor in drift on the LLM side itself. For example, as Anthropic updates its models, I don't know how tightly we can version-lock, so we may see some natural variance in the answers even when the prompt hasn't changed.
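
One way to make that drift easier to spot, sketched here as an assumption rather than a plan: store the model identifier and a hash of the prompt next to each saved set of answers, so a reviewer can tell whether a diff follows a prompt change, a model change, or neither. The function names and metadata keys are hypothetical.

```python
import hashlib
from typing import Dict, List


def snapshot_metadata(model_id: str, prompt: str) -> Dict[str, str]:
    """Describe the inputs that produced a set of answers (keys are hypothetical)."""
    return {
        "model": model_id,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }


def changed_inputs(old_meta: Dict[str, str], new_meta: Dict[str, str]) -> List[str]:
    """List which recorded inputs differ between two runs.

    An empty list alongside a non-empty answer diff points to variance on the
    LLM side rather than to anything we changed.
    """
    return [key for key in sorted(new_meta) if old_meta.get(key) != new_meta[key]]
```

This does not prevent drift; it only helps attribute a diff to its likely cause during manual review.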

2 participants