Evidence - FeedbackEvaluator Storage & Basic UI #12758

dandrabik · 2025-01-31T20:00:38Z

WHAT

Run the FeedbackEvaluator after trials complete, store the results in a DB table, roll them up, and show them in the UI.

WHY

We want to show this info in the UI of the research tool to make prompt engineering on the feedback prompt more efficient.

HOW

Create a new table, evidence_research_gen_ai_feedback_evaluations, and run the evaluations when GEvalScores are calculated. Show the results in the UI.
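
As a rough illustration of the roll-up step, here is a minimal plain-Ruby sketch of aggregating per-response evaluations into per-trial error counts. It is not the code in this PR: the `FeedbackEvaluation` struct, its `trial_id` and `error_types` fields, the `roll_up_error_counts` helper, and the error labels are all hypothetical stand-ins, since the real table columns are not listed in this description.

```ruby
# Hypothetical stand-in for a row in evidence_research_gen_ai_feedback_evaluations.
# The real columns are not shown in this PR description; these fields are assumptions.
FeedbackEvaluation = Struct.new(:trial_id, :error_types, keyword_init: true)

# Roll up per-response evaluations into error counts for a single trial.
# Returns a Hash mapping each error label to the number of times it was flagged.
def roll_up_error_counts(evaluations, trial_id:)
  evaluations
    .select { |evaluation| evaluation.trial_id == trial_id }
    .flat_map(&:error_types)
    .tally
end

# Example data with made-up error labels.
evaluations = [
  FeedbackEvaluation.new(trial_id: 1, error_types: ['too_verbose']),
  FeedbackEvaluation.new(trial_id: 1, error_types: ['too_verbose', 'multiple_questions']),
  FeedbackEvaluation.new(trial_id: 2, error_types: [])
]

counts = roll_up_error_counts(evaluations, trial_id: 1)
puts counts.inspect
```

In the Rails engine, the same aggregation would more likely be a query against the new table than an in-memory pass; the in-memory version is used here only to keep the sketch self-contained.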

Screenshots

### Notion Card Links
https://www.notion.so/quill/Feedback-Evaluation-Metrics-Storage-185d42e6f941807191e3ddc2da872cd7

What have you done to QA this feature?

I've run some tests locally in the console while connected to the staging DB. I also ran the metrics on staging, and they seemed to work properly. I viewed the UI locally to ensure the metrics display properly.

| PR Checklist | Your Answer |
| --- | --- |
| Have you added and/or updated tests? | Yes |
| Have you deployed to Staging? | Yes |
| Self-Review: Have you done an initial self-review of the code below on Github? | Yes |

emilia-friedberg

This looks good to me -- nice work cleanly following the existing paradigm on the frontend. It looks like you might be having the same issue with snapshot datetimes that @brendanshean had last week; maybe he can speak to how he resolved that.

brendanshean

Nice work on this and well done on integrating with the new UI.

services/QuillLMS/engines/evidence/app/models/evidence/research/gen_ai/trial.rb

...S/engines/evidence/spec/workers/evidence/research/gen_ai/run_trial_evaluation_worker_spec.rb

services/QuillLMS/engines/evidence/app/models/evidence/research/gen_ai/trial.rb

brendanshean

This looks great. I think this will provide some good insight into LLM feedback.

services/QuillLMS/engines/evidence/app/models/evidence/research/gen_ai/trial.rb

dandrabik added 30 commits January 8, 2025 14:53

wip

55bd451

Wip

c391c58

Working version of rule trial.

b16de96

Extract examples out of markdown and into code to make easier to edit.

58c4c2f

Add String extensions to count questions.

4d6ab0a

WIP for regex version of prompt checker.

5d2c9a6

Add some initial regex checks.

6c62072

wip

971d6a8

Prompt that flags a good number of examples.

a75efd8

More iteration.

071fbd3

Remove unused code for now.

96b6148

Extract concern from base class.

48109ad

Move more out of the base class.

611e922

Clean up unused files.

724178c

Add new data files.

c11e6fc

Update Verbose checker to be looser.

3b0db29

Add more datasets

9f24bcd

Spec wip.

804a6ec

Fix specs, update Scalpel code for more edge cases.

9959c1f

Add some basic tests.

64d12a1

Lint

9c6b146

Lint and small refactors.

cd41170

Rename Scalpel to better name since I’ve edited it a bunch already.

2d8be04

Lint.

0647b7e

Don’t modify String from an Engine (rethinking this is bad form).

26e5123

Lint.

9fbbe43

Fix script, delete unused file, lint.

fae954f

Code Cleanup.

574f4b5

Initial working version of Feedback Evaluation.

1d9e902

Update controller endpoints for frontend use.

79a72e3

Update snap tests.

c810451

emilia-friedberg approved these changes Feb 3, 2025

dandrabik added 2 commits February 3, 2025 09:31

Check out snap from develop.

e478781

Add feedback errors to old UI.

1c58625

brendanshean approved these changes Feb 3, 2025

services/QuillLMS/engines/evidence/app/models/evidence/research/gen_ai/trial.rb

...S/engines/evidence/spec/workers/evidence/research/gen_ai/run_trial_evaluation_worker_spec.rb

PR feedback.

741031a

dandrabik temporarily deployed to quill-lms-dan February 3, 2025 19:49 Inactive

anathomical approved these changes Feb 3, 2025

dandrabik added 3 commits February 3, 2025 14:57

Merge branch 'develop' into feedback_evaluator_storage

bd619c1

Use develops snapshot.

f0175fd

Fix jest test.

2c9d46a

dandrabik temporarily deployed to quill-lms-dan February 3, 2025 21:07 Inactive

Add another test, update set_evaluator_counts method.

d5f022b

dandrabik temporarily deployed to quill-lms-dan February 4, 2025 02:25 Inactive

Fix test stub with missing created_at attribute.

e3e1ac1

dandrabik temporarily deployed to quill-lms-dan February 4, 2025 15:38 Inactive

dandrabik added 3 commits February 4, 2025 15:32

Add eager loading to comparison page so it loads.

b9a13bc

Fix race condition in update_results with transaction.

727ac1f

Fixing up tests.

fba1fcc

dandrabik temporarily deployed to quill-lms-dan February 5, 2025 15:48 Inactive

Add save before with_lock.

7fb3efd

dandrabik temporarily deployed to quill-lms-dan February 5, 2025 16:55 Inactive

Merge branch 'develop' into feedback_evaluator_storage

f41cc08

dandrabik commented Feb 5, 2025

services/QuillLMS/engines/evidence/app/models/evidence/research/gen_ai/trial.rb

dandrabik requested review from brendanshean and anathomical February 5, 2025 17:09

brendanshean approved these changes Feb 5, 2025

services/QuillLMS/engines/evidence/app/models/evidence/research/gen_ai/trial.rb

dandrabik merged commit 1eca0a2 into develop Feb 5, 2025
19 checks passed

dandrabik deleted the feedback_evaluator_storage branch February 5, 2025 21:24

dandrabik added a commit that referenced this pull request Feb 5, 2025

Merge pull request #12775 from empirical-org/develop

ceaae79

Evidence - FeedbackEvaluator Storage & Basic UI (#12758)
