Implement job_result_download for experiment service #632
base: main
Conversation
Force-pushed bbab0b5 → 9283c23
Force-pushed a725415 → 3ee4aac
@@ -73,6 +73,7 @@ class JobEvalLiteCreate(BaseModel):
    name: str
    description: str = ""
    model: str
+   model_url: str | None = None
Keeping consistent with the other eval job: I added it because the Pydantic validation wasn't working without it.
We can take a deeper look at that issue if you feel it should be left out.
The evaluator job takes models in because it runs inference with those models. The eval_lite job has a `model` parameter just to record the model name, and it does not use `model_url` at all. What is the caller that passes this parameter to eval_lite? The ExperimentService passes this dictionary.
Are we sure it should stay consistent between inference and evaluation jobs? I understand the awkward situation while the `evaluator` combo is in the code, but we are phasing out `evaluator` and (by extension) the current `/experiments` at the same time, so I'm not sure all jobs need a `model_url`, no?
with s3.open(f"{settings.S3_BUCKET}/{result_key}", "r") as f:
    job_results = json.loads(f.read())

# we just merge the two dictionaries for now
What about a list of job responses? Merging the dicts means both we and the user need to keep track of what info goes into which dict (which is not modelled currently), and some keys may be overwritten accidentally.
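For illustration, a minimal sketch of that alternative (the list shape and key names below are hypothetical, not an existing schema):

# Hypothetical shape: keep each job's output separate instead of merging
# everything into one dict, so callers know which job produced which fields.
job_results = [
    {"job_type": "inference", "output": {"predictions": ["..."]}},
    {"job_type": "evaluation", "output": {"metrics": {"rouge1": 0.0}}},
]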
I can provide the rationale behind this.
This is by no means a general method we want to implement for composite jobs; it serves the purpose of merging results from inference+evaluation, whose outputs we know precisely. We will likely not need this anymore if/when we start pulling aggregated results from child runs in e.g. MLflow, and we would not need to do it at all if we did not want to be 100% compatible with the current API and the UI, which hits "experiments" when it needs information about jobs.
All of this to say: this is ad-hoc and temporary :-)
The reason we merge these is that the UI expects the API to return the outputs of an "experiment" (from the UI's point of view) with original samples, ground truth, predictions, and metrics. We do not force evaluation to take the original samples as an input, though, so we need to get them from the inference job. The union of the jobs' outputs is our output. We don't mind overwriting keys right now: if the same key appears in both dictionaries, its contents are identical.
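For reference, a sketch of the merge being described (the key names are illustrative, not the actual job output schema):

# The UI expects one payload with samples, ground truth, predictions, and metrics.
# Inference and evaluation each produce part of it, so the union is the result;
# overlapping keys are safe because their contents are identical in both files.
inference_results = {"examples": ["..."], "ground_truth": ["..."], "predictions": ["..."]}
evaluation_results = {"predictions": ["..."], "metrics": {"rouge1": 0.0}}
merged = {**inference_results, **evaluation_results}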
From what you're saying, and since this PR belongs to the whole experiments_new body of work where we expect API changes post-MVP, it may be worth discussing a different way of storing results.
WDYT?
Agree: if we are free from the need to be compatible with the current API, I think we can come up with something better than this.
For instance, there's no need to move all this data around between jobs: we could limit it to the bare minimum inputs and outputs, and then let the UI pick whatever it needs to visualize at a given moment (e.g. it won't likely need a GB-large dataset if it just wants to show aggregated metrics).
Yep, my suggestion here was to change this piece of the UI:
async function fetchResults(job_id) {
LGTM. Some comments but I'm pre-ok with your responses :)
    return self.session.query(JobResultRecord).where(JobResultRecord.job_id == job_id).first()

def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list[JobRecord]:
    return self.session.query(JobRecord).order_by(desc(JobRecord.created_at)).limit(2).all()
I'd remove the order and the limit. Is there any reason to keep these constraints?
Why don't we use the `experiment_id` here, instead of limiting the result set to the first two results?
This is a mockup we set up while waiting for the grouping by experiment_id to be in place. It allowed Vicki to get the inference+eval jobs from the latest call to experiments_new without job->experiment matching being available yet. It will be changed once the method to get all jobs for a given experiment is available (which I think has not been merged yet, is that correct?)
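For what it's worth, a sketch of the filtered query being asked for, assuming JobRecord eventually carries an experiment_id column (it does not in this PR yet; UUID and desc imported as in the existing repository module):

def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list[JobRecord]:
    # Hypothetical: filter on the (future) experiment_id foreign key instead of
    # taking the two most recent jobs overall.
    return (
        self.session.query(JobRecord)
        .where(JobRecord.experiment_id == experiment_id)
        .order_by(desc(JobRecord.created_at))
        .all()
    )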
"""Sets model URL based on protocol address""" | ||
if request.model.startswith("oai://"): | ||
if request.model_url.startswith("oai://"): |
You now have to pass a `model_url` argument even if you don't need it (i.e., in the case of self-hosted models). Why don't we first check whether `model_url` is set and, if not, return None? Currently, this breaks the "create new experiment" flow, since the second job that the experiment tries to create fails with:
backend-1 | File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 117, in _set_model_type
backend-1 | if request.model_url.startswith("oai://"):
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | AttributeError: 'NoneType' object has no attribute 'startswith'
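A sketch of the guard being suggested (whether the check really belongs on `model_url` or on `model` is discussed below):

def _set_model_type(self, request) -> str | None:
    """Sets model URL based on protocol address."""
    # Guard first: self-hosted models may legitimately come without a model_url.
    if request.model_url is None:
        return None
    if request.model_url.startswith("oai://"):
        ...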
I think this was intended to be `model`, which always has a value (it is the URI we provide as input). The URL, instead, is optional for non-API models (see ExperimentCreate).
This is the stack trace I was getting without `model_url` on an eval_lite call:
backend-1 | File "/mzai/lumigator/python/mzai/backend/backend/api/routes/jobs.py", line 95, in create_evaluation_lite_job
backend-1 | job_response = service.create_job(job_create_request)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 209, in create_job
backend-1 | config_params = self._get_job_params(job_type, record, request)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/mzai/lumigator/python/mzai/backend/backend/services/jobs.py", line 154, in _get_job_params
backend-1 | "model_uri": request.model,
backend-1 | ^^^^^^^^^^^^^
backend-1 | File "/mzai/lumigator/python/mzai/backend/.venv/lib/python3.11/site-packages/pydantic/main.py", line 856, in __getattr__
backend-1 | raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
backend-1 | AttributeError: 'JobEvalLiteCreate' object has no attribute 'model'
It's clearly not correct because it does have that attribute:
class JobEvalLiteCreate(BaseModel):
My hypothesis is that the model URI (e.g. `mistral://...`) might be passed in the `model_url` field instead of `model`. The URI identifies the model; the URL provides e.g. an IP for locally hosted models (e.g. localhost for llamafiles)... hope this helps!
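Concretely, the distinction being described (both values below are illustrative placeholders, not taken from the PR):

model = "mistral://..."                  # URI: identifies the model itself
model_url = "http://localhost:8080/v1"   # URL: hypothetical endpoint for a locally hosted model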
Could it be that the request is passed typed as a plain BaseModel? I don't know if Pydantic intercepts some of the method/attribute calls to enforce the constraints :-/
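One quick way to check that hypothesis: Pydantic only raises this from `__getattr__` when the field is not actually declared on the instance's runtime class, so printing the runtime type should settle it (debugging snippet, not part of the PR):

# If this prints a schema without a `model` field, the object reaching
# _get_job_params is not the JobEvalLiteCreate we think it is.
print(type(request).__name__, list(type(request).model_fields))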
@@ -196,6 +200,48 @@ def create_experiment(

        return ExperimentResponse.model_validate(experiment_record)

+    def _get_experiment_jobs(self, experiment_id: UUID):
+        records = self._job_service.job_repo.get_jobs_by_experiment_id(experiment_id)
You've added the `get_jobs_by_experiment_id` method in the `JobResultRepository` class and you're trying to invoke it using a `JobRepository` instance. This line doesn't work:
AttributeError: 'JobRepository' object has no attribute 'get_jobs_by_experiment_id'
I guess you'd want to move the `get_jobs_by_experiment_id` method to the `JobRepository` class.
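For clarity, the move being suggested would look roughly like this (the base class and session plumbing are assumed, and the query itself is left as discussed above):

# Hypothetical sketch: get_jobs_by_experiment_id living on JobRepository,
# next to the other JobRecord queries.
class JobRepository:
    def __init__(self, session):
        self.session = session

    def get_jobs_by_experiment_id(self, experiment_id: UUID) -> list[JobRecord]:
        return self.session.query(JobRecord).order_by(desc(JobRecord.created_at)).all()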
I've opted to put this in the experiment repo since it's logically related to experiments and we don't instantiate a jobs repo in the experiment service - let me know what you think
Co-authored-by: Davide Eynard <[email protected]> Signed-off-by: Vicki Boykis <[email protected]>
Co-authored-by: Davide Eynard <[email protected]> Signed-off-by: Vicki Boykis <[email protected]>
    records = self._job_service.job_repo.get_jobs_by_experiment_id(experiment_id)
    return records

def get_experiment_result_download(
Does having this here impact hitting the API through /experiments/{job_id} to get the results?
Are any changes needed there?
I see these as two different routes in that case
What's changing
We currently download results per job. We'd like to download all result data per experiment.
To make these changes, we need to change the jobs service, the experiments service, and the related API calls.
See #572
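For reviewers skimming the change, a rough usage sketch of the intended flow; the route and the response field names here are assumptions based on the hunks quoted in the conversation above, not the final API:

import requests

experiment_id = "..."  # hypothetical: an existing experiment's UUID
# Ask the experiment service for the merged (inference + evaluation) results,
# then fetch the file from the returned pre-signed URL.
resp = requests.get(f"http://localhost:8000/api/v1/experiments_new/{experiment_id}/result/download")
download_url = resp.json()["download_url"]
results = requests.get(download_url).json()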
How to test it
Steps to test the changes:
- Create an experiment using the dialogsum dataset.
- You should get two files.
Additional notes for reviewers
I already...