
Implement job_result_download for experiment service #632

Open: wants to merge 14 commits into main from 572_experiment_results

Conversation

@veekaybee (Member) commented Jan 15, 2025:

What's changing

We currently download results per job. We'd like to download all result data per experiment.

  1. We look in the jobs table for all jobs that match a specific experiment id (inference + eval) and
  2. return the results as JSON.


In order to make these changes, we need to change the jobs service, the experiment service, and the related API calls (a rough sketch of the new download route follows the change list below).


  • Changing jobs service
  • Changing experiment service
  • Changing experiment service API calls
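
As a rough sketch (not the PR's actual implementation) of the route shape exercised in the test steps below, assuming a FastAPI router; the handler body here is a placeholder, and the real version delegates to the experiment service discussed in the review:

from uuid import UUID

from fastapi import APIRouter

router = APIRouter(prefix="/api/v1/experiments_new")


@router.get("/{experiment_id}/result/download")
def download_experiment_result(experiment_id: UUID) -> dict:
    # Placeholder body: the real handler looks up all jobs (inference + eval)
    # recorded for this experiment and returns their merged results as JSON.
    return {"experiment_id": str(experiment_id), "results": {}}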

Related issue: see #572.

How to test it

Steps to test the changes:

  1. Upload the dialogsum dataset:
#!/bin/bash
if [ "$#" -gt 0 ]; then
    DATA_CSV_PATH="$1"
else
    DATA_CSV_PATH="$HOME/Downloads/dialogsum.csv"
fi

if [[ -z "${BACKEND_URL}" ]]; then
  BACKEND_URL=http://localhost:8000
fi

echo Connecting to $BACKEND_URL...

curl -s $BACKEND_URL/api/v1/datasets/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'dataset=@'"$DATA_CSV_PATH"';type=text/csv' \
  -F 'format=job' | jq
  2. Create an experiment:
if [[ -z "${BACKEND_URL}" ]]; then
  BACKEND_URL=http://localhost:8000
fi

DATASET_ID=$(curl -s $BACKEND_URL/api/v1/datasets/ | jq -r '.items |sort_by(.created_at) | reverse | .[0].id')

EVAL_NAME="test_experiment_mistral"
EVAL_DESC="Test experiment (inference + eval) with Mistral API"
EVAL_MODEL="hf://facebook/bart-large-cnn"
EVAL_DATASET="$DATASET_ID" # use the most recently uploaded dataset instead of a hardcoded id
EVAL_MAX_SAMPLES="10"

JSON_STRING=$(jq -n \
                --arg name "$EVAL_NAME" \
                --arg desc "$EVAL_DESC" \
                --arg model "$EVAL_MODEL" \
                --arg dataset_id "$EVAL_DATASET" \
                --arg max_samples "$EVAL_MAX_SAMPLES" \
                '{name: $name, description: $desc, model: $model, dataset: $dataset_id, max_samples: $max_samples}' )

echo Connecting to $BACKEND_URL...

curl -s $BACKEND_URL/api/v1/experiments_new/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "$JSON_STRING" | jq
  3. Run eval_lite:
curl -X 'POST' \
  'http://localhost:8000/api/v1/jobs/eval_lite/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "description": "xx",
  "model":"hf://facebook/bart-large-cnn",
  "model_url":"hf://facebook/bart-large-cnn",
  "dataset": "e19b878f-4eda-4439-841b-e010f2e9a16b",
  "max_samples": "-1"
}'
  4. Run the results download:
curl -X 'GET' \
  'http://localhost:8000/api/v1/experiments_new/{yourid}/result/download' \
  -H 'accept: application/json'

You should get two files

Additional notes for reviewers

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@github-actions bot added the backend and api (Changes which impact API/presentation layer) labels on Jan 15, 2025
@veekaybee force-pushed the 572_experiment_results branch from bbab0b5 to 9283c23 on January 15, 2025 at 12:40
@github-actions bot added the schemas (Changes to schemas, which may be public facing) label on Jan 16, 2025
@veekaybee force-pushed the 572_experiment_results branch from a725415 to 3ee4aac on January 16, 2025 at 16:46
@@ -73,6 +73,7 @@ class JobEvalLiteCreate(BaseModel):
    name: str
    description: str = ""
    model: str
    model_url: str | None = None
@veekaybee (Member Author) commented Jan 16, 2025:

Keeping consistent with the other eval job; I added it because the Pydantic validation wasn't working without it.

Contributor:

We can try to take a deeper look at that issue, if you feel it should be left out.

Member:

The evaluator job takes models in because it runs inference with those models. The eval_lite job has a model parameter just to record the model name, and it does not use model_url at all. What is the caller that passes this parameter to eval_lite? The ExperimentService passes this dictionary.

Contributor:

Are we sure it should stay consistent between inference and evaluation jobs? I understand the awkward situation while the evaluator combo is in the code, but we are phasing out evaluator and (by extension) the current /experiments at the same time, so I'm not sure all jobs need a model_url, no?

@veekaybee veekaybee requested a review from aittalam January 16, 2025 19:59
@veekaybee veekaybee marked this pull request as ready for review January 16, 2025 19:59
with s3.open(f"{settings.S3_BUCKET}/{result_key}", "r") as f:
    job_results = json.loads(f.read())

# we just merge the two dictionaries for now
Contributor:

What about a list of job responses? Merging the dicts means we and the user need to keep track of what info goes into which dict (which is not modelled currently), and some keys may be overwritten accidentally.
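
Purely as an illustration of this suggestion (not code from the PR), results could stay keyed by job instead of being merged; the helper and field names here are hypothetical:

from typing import Any


def collect_results_per_job(results_by_job: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical alternative: wrap each job's result dict with its job id so nothing is overwritten."""
    return [
        {"job_id": job_id, "results": results}
        for job_id, results in results_by_job.items()
    ]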

Member:

I can provide the rationale behind this.

This is by no means a general method we want to implement for composite jobs; it serves the purpose of merging results from inference + evaluation, whose outputs we know precisely. We will likely not need this anymore if/when we start pulling aggregated results from child runs in e.g. mlflow, and we would not need to do this if we did not want to be 100% compatible with the current API and the UI, which hits "experiments" when it needs information about jobs. All of this to say this is ad-hoc and temporary :-)

The reason why we merge these is that the UI expects from the API the outputs of an "experiment" (from the UI's point of view) with original samples, ground_truth, predictions, and metrics. We do not force evaluation to take the original samples as an input, though, so we need to get them from the inference job. The union of the jobs' outputs is our output. We don't mind about overwriting keys right now because, if the same key appears in the two dictionaries, the contents are the same too.
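
As a hedged illustration of the merge described above (not the PR's exact code): read each job's result file from S3 and take the union of the dictionaries, relying on the fact that any shared keys hold identical contents. The s3 handle is assumed to be the same fsspec-style filesystem used in the snippet above; the helper name is made up.

import json


def merge_experiment_results(s3, bucket: str, result_keys: list[str]) -> dict:
    """Union of the per-job result dicts (inference + eval) for one experiment."""
    merged: dict = {}
    for key in result_keys:
        with s3.open(f"{bucket}/{key}", "r") as f:
            # Overwriting on update() is acceptable here: when the same key
            # (e.g. ground_truth) appears in both results, the contents match.
            merged.update(json.loads(f.read()))
    return merged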

Contributor:

From what you're saying, and since this PR belongs to the whole experiments_new body of work and we expect changes to the API for that post-MVP, it may be worth discussing a different way of storing results.

WDYT?

Member:

Agree: if we are free from the need to be compatible with the current API, I think we can come up with something better than this. For instance, there's no need to move all this data around between jobs: we could limit this to the bare minimum inputs and outputs, and then let the UI pick whatever it needs to visualize at a given moment (e.g. it won't likely need a GB-large dataset if it just wants to show aggregated metrics).

Member Author:

Yep, my suggestion here was to change this piece of the UI.

@javiermtorres (Contributor) left a review:

LGTM. Some comments, but I'm pre-OK with your responses :)

        return self.session.query(JobResultRecord).where(JobResultRecord.job_id == job_id).first()

    def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list[JobRecord]:
        return self.session.query(JobRecord).order_by(desc(JobRecord.created_at)).limit(2).all()
Contributor:

I'd remove the order and the limit. Is there any reason to put these constraints here?

Contributor:

Why don't we use the experiment_id here, instead of limiting the result set to the first two results?

Member:

This is a mockup we set up while waiting for the grouping by experiment_id to be present. It allowed Vicki to get the inference + eval jobs from the latest call to experiments_new without having the job -> experiment matching in place yet. It will be changed once the method to get all jobs for a given experiment is available (which I think has not been merged yet, is that correct?).
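
For reference, a hedged sketch of what the non-mockup query could look like once jobs can be grouped by experiment; it assumes JobRecord gains an experiment_id column (or an equivalent association), which is not part of this PR:

from uuid import UUID

from sqlalchemy import desc


def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list["JobRecord"]:
    # Sketch only: filter on the assumed experiment_id column instead of taking
    # the two most recent jobs; JobRecord and self.session come from the
    # surrounding repository module.
    return (
        self.session.query(JobRecord)
        .where(JobRecord.experiment_id == experiment_id)
        .order_by(desc(JobRecord.created_at))
        .all()
    )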

"""Sets model URL based on protocol address"""
if request.model.startswith("oai://"):
if request.model_url.startswith("oai://"):
Contributor:

You now have to pass a model_url argument even if you don't need it (i.e., in the case of self-hosted models). Why don't we first check whether model_url is set and, if not, return None? Currently this breaks the "create new experiment" flow, since the second job that the experiment tries to create fails with:

backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 117, in _set_model_type
backend-1      |     if request.model_url.startswith("oai://"):
backend-1      |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1      | AttributeError: 'NoneType' object has no attribute 'startswith'
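
A minimal sketch of the None-safe check suggested here (the real _set_model_type does more than this, and the helper name is hypothetical):

def is_api_model(model_url: str | None) -> bool:
    # Only inspect the scheme when a URL was actually provided; self-hosted
    # models may omit model_url entirely.
    return model_url is not None and model_url.startswith("oai://")


# Usage sketch:
assert is_api_model(None) is False
assert is_api_model("oai://some-model") is True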

@aittalam (Member) commented Jan 17, 2025:

I think this was intended to be model, which always has a value (it is the URI we provide as input). The url, instead, is optional for non-API models (see ExperimentCreate).

@veekaybee (Member Author):

This is the stack trace I was getting without model_url on an eval_lite call:

backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/api/routes/jobs.py", line 95, in create_evaluation_lite_job
backend-1      |     job_response = service.create_job(job_create_request)
backend-1      |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 209, in create_job
backend-1      |     config_params = self._get_job_params(job_type, record, request)
backend-1      |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 154, in _get_job_params
backend-1      |     "model_uri": request.model,
backend-1      |                  ^^^^^^^^^^^^^
backend-1      |   File "/mzai/lumigator/python/mzai/backend/.venv/lib/python3.11/site-packages/pydantic/main.py", line 856, in __getattr__
backend-1      |     raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
backend-1      | AttributeError: 'JobEvalLiteCreate' object has no attribute 'model'

It's clearly not correct, because it does have that attribute:

class JobEvalLiteCreate(BaseModel):
    ...

so I need to dig a bit deeper here.

@aittalam (Member) commented Jan 17, 2025:

My hypothesis is that the model URI (e.g. mistral://...) might be passed as the model_url field instead of model. The URI identifies the model, while the URL provides e.g. an IP for locally hosted models (e.g. localhost for llamafiles). Hope this helps!
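
To make the distinction concrete, a hedged example of the two fields; the URI schemes and values below are illustrative, not taken from the repository:

# model: a URI that identifies which model to run (always set).
# model_url: where that model is actually served; only relevant for locally
# hosted models, e.g. a llamafile listening on localhost.
api_model_request = {
    "model": "mistral://open-mistral-7b",  # illustrative model URI
    "model_url": None,                     # the provider endpoint is implied by the URI
}
local_model_request = {
    "model": "llamafile://mistral-7b",        # illustrative model URI
    "model_url": "http://localhost:8080/v1",  # illustrative local serving address
}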

Contributor:

Could it be that the request is passed typed as a BaseModel? I don't know if Pydantic will intercept some of the method/attribute calls to enforce the constraints :-/


@@ -196,6 +200,48 @@ def create_experiment(

        return ExperimentResponse.model_validate(experiment_record)

    def _get_experiment_jobs(self, experiment_id: UUID):
        records = self._job_service.job_repo.get_jobs_by_experiment_id(experiment_id)
@dpoulopoulos (Contributor) commented Jan 17, 2025:

You've added the get_jobs_by_experiment_id method to the JobResultRepository class, but you're trying to invoke it on a JobRepository instance. This line doesn't work:

AttributeError: 'JobRepository' object has no attribute 'get_jobs_by_experiment_id'

I guess you'd want to move the get_jobs_by_experiment_id method into the JobRepository class.

Member Author:

I've opted to put this in the experiment repo, since it's logically related to experiments and we don't instantiate a jobs repo in the experiment service; let me know what you think.

        records = self._job_service.job_repo.get_jobs_by_experiment_id(experiment_id)
        return records

    def get_experiment_result_download(
Contributor:

Does having this here impact hitting the API through /experiments/{job_id} to get the results? Are any changes needed there?

Member Author:

I see these as two different routes in that case.



Labels: api (Changes which impact API/presentation layer), backend, schemas (Changes to schemas, which may be public facing)