
Implement job_result_download for experiment service #632

Open: wants to merge 14 commits into main from 572_experiment_results

Conversation

@veekaybee (Member) commented Jan 15, 2025:

What's changing

We currently download results per job. We'd like to download all result data per experiment.

  1. We look in the jobs table for all jobs that match a specific experiment id (inference + eval) and
  2. return the results as JSON.


In order to make these changes, we need to change the jobs service, the experiment service, and the related API calls (a rough sketch of the new download route follows the change list below).


  • Changing jobs service
  • Changing experiment service
  • Changing experiment service API calls
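
As a rough sketch (not the PR's actual implementation) of the route shape exercised in the test steps below, assuming a FastAPI router; the handler body here is a placeholder, and the real version delegates to the experiment service discussed in the review:

from uuid import UUID

from fastapi import APIRouter

router = APIRouter(prefix="/api/v1/experiments_new")


@router.get("/{experiment_id}/result/download")
def download_experiment_result(experiment_id: UUID) -> dict:
    # Placeholder body: the real handler looks up all jobs (inference + eval)
    # recorded for this experiment and returns their merged results as JSON.
    return {"experiment_id": str(experiment_id), "results": {}}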

Related issue: see #572.

How to test it

Steps to test the changes:

  1. Upload the dialogsum dataset:
#!/bin/bash
if [ "$#" -gt 0 ]; then
    DATA_CSV_PATH="$1"
else
    DATA_CSV_PATH="$HOME/Downloads/dialogsum.csv"
fi

if [[ -z "${BACKEND_URL}" ]]; then
  BACKEND_URL=http://localhost:8000
fi

echo Connecting to $BACKEND_URL...

curl -s $BACKEND_URL/api/v1/datasets/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'dataset=@'"$DATA_CSV_PATH"';type=text/csv' \
  -F 'format=job' | jq
  2. Create an experiment:
if [[ -z "${BACKEND_URL}" ]]; then
  BACKEND_URL=http://localhost:8000
fi

DATASET_ID=$(curl -s $BACKEND_URL/api/v1/datasets/ | jq -r '.items |sort_by(.created_at) | reverse | .[0].id')

EVAL_NAME="test_experiment_mistral"
EVAL_DESC="Test experiment (inference + eval) with Mistral API"
EVAL_MODEL="hf://facebook/bart-large-cnn"
EVAL_DATASET="$DATASET_ID" # use the most recently uploaded dataset instead of a hardcoded id
EVAL_MAX_SAMPLES="10"

JSON_STRING=$(jq -n \
                --arg name "$EVAL_NAME" \
                --arg desc "$EVAL_DESC" \
                --arg model "$EVAL_MODEL" \
                --arg dataset_id "$EVAL_DATASET" \
                --arg max_samples "$EVAL_MAX_SAMPLES" \
                '{name: $name, description: $desc, model: $model, dataset: $dataset_id, max_samples: $max_samples}' )

echo Connecting to $BACKEND_URL...

curl -s $BACKEND_URL/api/v1/experiments_new/ \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "$JSON_STRING" | jq
  3. Run eval_lite:
curl -X 'POST' \
  'http://localhost:8000/api/v1/jobs/eval_lite/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "description": "xx",
  "model":"hf://facebook/bart-large-cnn",
  "model_url":"hf://facebook/bart-large-cnn",
  "dataset": "e19b878f-4eda-4439-841b-e010f2e9a16b",
  "max_samples": "-1"
}'
  4. Run the results download:
curl -X 'GET' \
  'http://localhost:8000/api/v1/experiments_new/{yourid}/result/download' \
  -H 'accept: application/json'

You should get two files

Additional notes for reviewers

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@github-actions bot added the backend and api (Changes which impact API/presentation layer) labels on Jan 15, 2025
@veekaybee force-pushed the 572_experiment_results branch from bbab0b5 to 9283c23 on January 15, 2025 at 12:40
@github-actions bot added the schemas (Changes to schemas, which may be public facing) label on Jan 16, 2025
@veekaybee force-pushed the 572_experiment_results branch from a725415 to 3ee4aac on January 16, 2025 at 16:46
@@ -73,6 +73,7 @@ class JobEvalLiteCreate(BaseModel):
    name: str
    description: str = ""
    model: str
    model_url: str | None = None
@veekaybee (Member Author) commented Jan 16, 2025:

Keeping consistent with the other eval job; I added it because the Pydantic validation wasn't working without it.

Contributor:

We can try to take a deeper look at that issue, if you feel it should be left out.

Member:

The evaluator job takes models in because it runs inference with those models. The eval_lite job has a model parameter just to record the model name, and it does not use model_url at all. What is the caller that passes this parameter to eval_lite? The ExperimentService passes this dictionary.

Contributor:

Are we sure it should stay consistent between inference and evaluation jobs? I understand the awkward situation while the evaluator combo is in the code, but we are phasing out evaluator and (by extension) the current /experiments at the same time, so I'm not sure all jobs need a model_url, no?

@veekaybee veekaybee requested a review from aittalam January 16, 2025 19:59
@veekaybee veekaybee marked this pull request as ready for review January 16, 2025 19:59
with s3.open(f"{settings.S3_BUCKET}/{result_key}", "r") as f:
    job_results = json.loads(f.read())

# we just merge the two dictionaries for now
Contributor:

What about a list of job responses? Merging the dicts means we and the user need to keep track of what info goes into which dict (which is not modelled currently), and some keys may be overwritten accidentally.
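
Purely as an illustration of this suggestion (not code from the PR), results could stay keyed by job instead of being merged; the helper and field names here are hypothetical:

from typing import Any


def collect_results_per_job(results_by_job: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical alternative: wrap each job's result dict with its job id so nothing is overwritten."""
    return [
        {"job_id": job_id, "results": results}
        for job_id, results in results_by_job.items()
    ]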

Member:

I can provide the rationale behind this.

This is by no means a general method we want to implement for composite jobs; it serves the purpose of merging results from inference + evaluation, whose outputs we know precisely. We will likely not need this anymore if/when we start pulling aggregated results from child runs in e.g. mlflow, and we would not need to do this if we did not want to be 100% compatible with the current API and the UI, which hits "experiments" when it needs information about jobs. All of this to say this is ad-hoc and temporary :-)

The reason why we merge these is that the UI expects from the API the outputs of an "experiment" (from the UI's point of view) with original samples, ground_truth, predictions, and metrics. We do not force evaluation to take the original samples as an input, though, so we need to get them from the inference job. The union of the jobs' outputs is our output. We don't mind about overwriting keys right now because, if the same key appears in the two dictionaries, the contents are the same too.
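
As a hedged illustration of the merge described above (not the PR's exact code): read each job's result file from S3 and take the union of the dictionaries, relying on the fact that any shared keys hold identical contents. The s3 handle is assumed to be the same fsspec-style filesystem used in the snippet above; the helper name is made up.

import json


def merge_experiment_results(s3, bucket: str, result_keys: list[str]) -> dict:
    """Union of the per-job result dicts (inference + eval) for one experiment."""
    merged: dict = {}
    for key in result_keys:
        with s3.open(f"{bucket}/{key}", "r") as f:
            # Overwriting on update() is acceptable here: when the same key
            # (e.g. ground_truth) appears in both results, the contents match.
            merged.update(json.loads(f.read()))
    return merged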

Contributor:

From what you're saying, and since this PR belongs to the whole experiments_new body of work and we expect changes to the API for that post-MVP, it may be worth discussing a different way of storing results.

WDYT?

Member:

Agree: if we are free from the need to be compatible with the current API, I think we can come up with something better than this. For instance, there's no need to move all this data around between jobs: we could limit this to the bare minimum inputs and outputs, and then let the UI pick whatever it needs to visualize at a given moment (e.g. it won't likely need a GB-large dataset if it just wants to show aggregated metrics).

Member Author:

Yep, my suggestion here was to change this piece of the UI.

@javiermtorres (Contributor) left a review:

LGTM. Some comments, but I'm pre-OK with your responses :)

        return self.session.query(JobResultRecord).where(JobResultRecord.job_id == job_id).first()

    def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list[JobRecord]:
        return self.session.query(JobRecord).order_by(desc(JobRecord.created_at)).limit(2).all()
Contributor:

I'd remove the order and the limit. Is there any reason to put these constraints here?

Contributor:

Why don't we use the experiment_id here, instead of limiting the result set to the first two results?

Member:

This is a mockup we set up while waiting for the grouping by experiment_id to be present. It allowed Vicki to get the inference + eval jobs from the latest call to experiments_new without having the job -> experiment matching in place yet. It will be changed once the method to get all jobs for a given experiment is available (which I think has not been merged yet, is that correct?).
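
For reference, a hedged sketch of what the non-mockup query could look like once jobs can be grouped by experiment; it assumes JobRecord gains an experiment_id column (or an equivalent association), which is not part of this PR:

from uuid import UUID

from sqlalchemy import desc


def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list["JobRecord"]:
    # Sketch only: filter on the assumed experiment_id column instead of taking
    # the two most recent jobs; JobRecord and self.session come from the
    # surrounding repository module.
    return (
        self.session.query(JobRecord)
        .where(JobRecord.experiment_id == experiment_id)
        .order_by(desc(JobRecord.created_at))
        .all()
    )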

"""Sets model URL based on protocol address"""
if request.model.startswith("oai://"):
if request.model_url.startswith("oai://"):
Contributor:

You now have to pass a model_url argument even if you don't need it (i.e., in the case of self-hosted models). Why don't we first check whether model_url is set and, if not, return None? Currently this breaks the "create new experiment" flow, since the second job that the experiment tries to create fails with:

backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 117, in _set_model_type
backend-1      |     if request.model_url.startswith("oai://"):
backend-1      |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1      | AttributeError: 'NoneType' object has no attribute 'startswith'
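
A minimal sketch of the None-safe check suggested here (the real _set_model_type does more than this, and the helper name is hypothetical):

def is_api_model(model_url: str | None) -> bool:
    # Only inspect the scheme when a URL was actually provided; self-hosted
    # models may omit model_url entirely.
    return model_url is not None and model_url.startswith("oai://")


# Usage sketch:
assert is_api_model(None) is False
assert is_api_model("oai://some-model") is True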

@aittalam (Member) commented Jan 17, 2025:

I think this was intended to be model, which always has a value (it is the URI we provide as input). The url, instead, is optional for non-API models (see ExperimentCreate).

@veekaybee (Member Author):

This is the stack trace I was getting without model_url on an eval_lite call:

backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/api/routes/jobs.py", line 95, in create_evaluation_lite_job
backend-1      |     job_response = service.create_job(job_create_request)
backend-1      |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 209, in create_job
backend-1      |     config_params = self._get_job_params(job_type, record, request)
backend-1      |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1      |   File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 154, in _get_job_params
backend-1      |     "model_uri": request.model,
backend-1      |                  ^^^^^^^^^^^^^
backend-1      |   File "/mzai/lumigator/python/mzai/backend/.venv/lib/python3.11/site-packages/pydantic/main.py", line 856, in __getattr__
backend-1      |     raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
backend-1      | AttributeError: 'JobEvalLiteCreate' object has no attribute 'model'

It's clearly not correct, because it does have that attribute:

class JobEvalLiteCreate(BaseModel):
    ...

so I need to dig a bit deeper here.

@aittalam (Member) commented Jan 17, 2025:

My hypothesis is that the model URI (e.g. mistral://...) might be passed as the model_url field instead of model. The URI identifies the model, while the URL provides e.g. an IP for locally hosted models (e.g. localhost for llamafiles). Hope this helps!
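
To make the distinction concrete, a hedged example of the two fields; the URI schemes and values below are illustrative, not taken from the repository:

# model: a URI that identifies which model to run (always set).
# model_url: where that model is actually served; only relevant for locally
# hosted models, e.g. a llamafile listening on localhost.
api_model_request = {
    "model": "mistral://open-mistral-7b",  # illustrative model URI
    "model_url": None,                     # the provider endpoint is implied by the URI
}
local_model_request = {
    "model": "llamafile://mistral-7b",        # illustrative model URI
    "model_url": "http://localhost:8080/v1",  # illustrative local serving address
}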

Contributor:

Could it be that the request is passed typed as a BaseModel? I don't know if Pydantic will intercept some of the method/attribute calls to enforce the constraints :-/


@@ -196,6 +200,48 @@ def create_experiment(

        return ExperimentResponse.model_validate(experiment_record)

    def _get_experiment_jobs(self, experiment_id: UUID):
        records = self._job_service.job_repo.get_jobs_by_experiment_id(experiment_id)
@dpoulopoulos (Contributor) commented Jan 17, 2025:

You've added the get_jobs_by_experiment_id method to the JobResultRepository class, but you're trying to invoke it on a JobRepository instance. This line doesn't work:

AttributeError: 'JobRepository' object has no attribute 'get_jobs_by_experiment_id'

I guess you'd want to move the get_jobs_by_experiment_id method into the JobRepository class.

Member Author:

I've opted to put this in the experiment repo, since it's logically related to experiments and we don't instantiate a jobs repo in the experiment service; let me know what you think.

        records = self._job_service.job_repo.get_jobs_by_experiment_id(experiment_id)
        return records

    def get_experiment_result_download(
Contributor:

Does having this here impact hitting the API through /experiments/{job_id} to get the results? Are any changes needed there?

Member Author:

I see these as two different routes in that case.



Labels: api (Changes which impact API/presentation layer), backend, schemas (Changes to schemas, which may be public facing)