First attempt at a parametrized JobCreate #740
base: main
Conversation
Looking good so far! Made a code suggestion but looks like a very logical refactor. My only question would be whether you plan on addressing the custom logic of _get_job_params
in this PR. If you don't plan on addressing it here, can you make a separate issue to track elevating that out of the service layer?
The SDK needs to be updated. I have checked the backend unit and integration tests locally and they seem to work.
The SDK and notebook tests have been updated. @veekaybee @aittalam I've changed the code of the notebook slightly. One important difference is that I have removed the model param in the eval lite job. AFAICT, it's not needed there: the notebook takes it from the initial model spec in the notebook and not from the output of the summarization job. Since the output is a CSV, it didn't make sense to put the model there, but I'll check the results metadata.
My concern is that this PR is dropping support for the JobType.EVALUATION, which is needed to support the current frontend design. I may misunderstand the code. Other than that, only minor comments. Thanks for the work on this! (Let me know about JobType.EVALUATION and then I'll approve once that's worked out).
```python
inference_job_create_config_dict = job_create_request.job_config.dict()
inference_job_create_config_dict["model"] = "hf://facebook/bart-large-cnn"
inference_job_create_config_dict["output_field"] = "ground_truth"
inference_job_create_config_dict["store_to_dataset"] = True
inference_job_create_config_dict["job_type"] = JobType.INFERENCE

inference_job_create_request_dict = job_create_request.model_dump()
inference_job_create_request_dict["job_config"] = inference_job_create_config_dict

inference_job_create_request = JobCreate(**inference_job_create_request_dict)
```
This code confuses me slightly: the job_create_request comes in as a JobCreate, then the code dumps it out of that class, changes a handful of fields, and re-creates the same JobCreate class it came in on, only now with different params. Would it work instead to change the params on the job_create_request object directly? Something like `job_create_request.model = "hf://facebook/bart-large-cnn"`.
Or did you mean for line 82 to be `inference_job_create_request = JobInferenceCreate(**inference_job_create_request_dict)` instead of JobCreate?
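The suggestion above could look something like the following. This is a minimal sketch only: stdlib dataclasses stand in for the project's Pydantic models (the class and field names are taken from the snippet, but their shapes are assumptions; with Pydantic v2, `model_copy(update=...)` plays the role of `replace()`):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class JobConfig:
    model: str
    output_field: str = "predictions"
    store_to_dataset: bool = False

@dataclass(frozen=True)
class JobCreate:
    name: str
    job_config: JobConfig

request = JobCreate(name="annotate", job_config=JobConfig(model="hf://user/model"))

# Derive the inference request from the incoming one in a single step,
# instead of dump -> mutate dict -> re-validate into the same class.
inference_request = replace(
    request,
    job_config=replace(
        request.job_config,
        model="hf://facebook/bart-large-cnn",
        output_field="ground_truth",
        store_to_dataset=True,
    ),
)
```

Using immutable copies also keeps the original request untouched, which avoids surprising callers holding a reference to it.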
Now here comes a "scope creep" comment, since your changes don't introduce this error but it might be an easy one to fix. Could we have some sort of warning if the key was already set to something else? I'm especially thinking of the "model" parameter. Let's say I make a call to the backend and I want to use the "hf://facebook/bart-base-cnn" model instead of the bart-large-cnn model. This code as it stands would quietly overwrite my request settings. Might be nice to log some sort of warning that we are throwing away a request parameter.
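A hypothetical helper along these lines (the function name and dict-based config are illustrative assumptions, not the project's code):

```python
import logging

logger = logging.getLogger(__name__)

def override_param(config: dict, key: str, forced_value) -> None:
    """Force a request parameter to a backend-chosen value, but warn
    when the caller had already set a different value, so nothing is
    dropped silently."""
    current = config.get(key)
    if current is not None and current != forced_value:
        logger.warning(
            "Overriding request parameter %r: %r -> %r", key, current, forced_value
        )
    config[key] = forced_value

config = {"model": "hf://facebook/bart-base-cnn"}
override_param(config, "model", "hf://facebook/bart-large-cnn")
# config["model"] now holds the forced value, and a warning was logged
```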
> Could we have some sort of warning if the key was already set to something else?

The idea would be to prevent setting any param that would not be acceptable for an annotation job config. However, there are params like model that should perhaps be moved to the jobs config, since they don't deal with job lifecycle. Maybe, as an interim measure, we could log a warning until we settle on the schemas.
Conceptually, an annotation is a different kind of job, even though internally it is also mapped to an inference (for example, a model cannot be set in an annotation, since the annotation job is designed to use exclusively the BART model [please @ividal confirm]).
TLDR: an annotation job is a constrained inference job (one where the user is not choosing the model or its parameters). We should at least be transparent about the constraints (from API to UI).
This is more a product decision than a technical one: Lumigator at this point is supposed to be opinionated about the model used to generate annotations. So an annotation job request should not make the user believe they can set any model they like, since it would be quietly ignored (replaced currently by BART, tomorrow who knows).
That's why I'd worry about the API offering (as this PR introduces):

```json
{
  "name": "string",
  ...
  "job_config": {
    "job_type": "evaluate",  # ouch, minor gripe, but this is a default taken from swagger for an annotation job
    "model": "string",       # misleading: this would be promptly ignored by the backend
    "model_url": "string",   # same
    "system_prompt": "string",
    "skip_inference": false
  }
}
```

What do you think: is there any reasonable way to avoid the confusion mentioned above (especially related to the model)?
Maybe I'm not getting this, but the annotation does not allow a model to be set:
https://github.com/mozilla-ai/lumigator/pull/740/files#diff-851bf090de2849773e9c7486bebd855b586c8a24f6b20cdd2eb10d65139dd1c2R100-R103
If we are talking about the standalone EVALUATE job, then we should remove this one ASAP and port the current usage to INFERENCE+EVAL_LITE.
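One way to make the constraint explicit at the schema level, as discussed above, is to give annotation its own config type that simply does not expose model fields. This is a hypothetical sketch, not the project's actual classes (stdlib dataclasses stand in for Pydantic models, and `JobAnnotateConfig` / `to_inference_params` are invented names):

```python
from dataclasses import dataclass
from typing import Optional

# Fixed, opinionated model choice for annotation jobs
ANNOTATION_MODEL = "hf://facebook/bart-large-cnn"

@dataclass
class JobAnnotateConfig:
    # No `model` / `model_url` fields: the API cannot mislead users into
    # thinking they can pick a model for annotation.
    system_prompt: Optional[str] = None
    skip_inference: bool = False

def to_inference_params(cfg: JobAnnotateConfig) -> dict:
    # The backend pins the model when translating the annotation job
    # into the underlying constrained inference job.
    return {
        "model": ANNOTATION_MODEL,
        "system_prompt": cfg.system_prompt,
        "skip_inference": cfg.skip_inference,
    }
```

With a dedicated config type, the generated OpenAPI schema for annotation would not list `model` at all, so there is nothing to quietly ignore.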
```diff
 def create_job(
     self,
-    request: JobEvalCreate | JobEvalLiteCreate | JobInferenceCreate,
+    request: JobCreate,
```
Is this function dropping support for JobEval? From the logic a few lines lower it seems like only JobType.EVALUATION_LITE and JobType.INFERENCE are supported, which will break the frontend, which currently uses only EVALUATION and not EVALUATION_LITE + INFERENCE.
But the current jobs are handled via the experiments endpoint, AFAICT, not directly via the jobs endpoint, so the UI should not see a difference.
Thanks for this! One note, @javiermtorres: this PR should still be in sync with the UI, and keep an eye on how it interacts with /experiments.
What's changing
The JobCreate schema is changed to include a separate, job-specific job_config. The generated OpenAPI spec includes a `oneOf` constraint. The jobs and experiments services are changed accordingly.
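The kind of schema that yields a `oneOf` in the generated OpenAPI can be sketched with a Pydantic discriminated union (illustrative class and field names only, not the PR's exact schemas):

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field

class JobInferenceConfig(BaseModel):
    job_type: Literal["inference"]
    model: str

class JobEvalLiteConfig(BaseModel):
    job_type: Literal["eval_lite"]

class JobCreate(BaseModel):
    name: str
    # The discriminated union is what surfaces as a oneOf (plus a
    # discriminator mapping) in the generated OpenAPI spec.
    job_config: Annotated[
        Union[JobInferenceConfig, JobEvalLiteConfig],
        Field(discriminator="job_type"),
    ]
```

Validating a payload then dispatches to the right config class automatically based on `job_type`, so the service layer can branch on the parsed type instead of inspecting raw dicts.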
Closes #706
How to test it
Tests should run correctly.
Additional notes for reviewers
N/A
I already...
- updated the documentation (`/docs`)