First attempt at a parametrized JobCreate #740

javiermtorres · 2025-01-24T17:20:33Z

What's changing

The JobCreate schema now carries a separate, job-type-specific job_config. The generated OpenAPI spec expresses it as a oneOf constraint with a job_type discriminator:

      "JobCreate": {
        "properties": {
          "name": {
            "type": "string",
            "title": "Name"
          },
[...]
          "job_config": {
            "oneOf": [
              {
                "$ref": "#/components/schemas/JobEvalConfig"
              },
              {
                "$ref": "#/components/schemas/JobEvalLiteConfig"
              },
              {
                "$ref": "#/components/schemas/JobInferenceConfig"
              },
              {
                "$ref": "#/components/schemas/JobAnnotateConfig"
              }
            ],
            "title": "Job Config",
            "discriminator": {
              "propertyName": "job_type",
              "mapping": {
                "annotate": "#/components/schemas/JobAnnotateConfig",
                "eval_lite": "#/components/schemas/JobEvalLiteConfig",
                "evaluate": "#/components/schemas/JobEvalConfig",
                "inference": "#/components/schemas/JobInferenceConfig"
              }
            }
          }
        },

The jobs and experiments services are changed accordingly.
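
For readers unfamiliar with how such a oneOf/discriminator schema can come about, here is a minimal sketch assuming Pydantic v2. The class names mirror the schema names above, but the fields inside each config (`model`, `metrics`) are illustrative assumptions, not the PR's actual definitions, and only two of the four config types are shown.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, ValidationError


class JobInferenceConfig(BaseModel):
    # The Literal value is what the discriminator matches on.
    job_type: Literal["inference"]
    model: str  # assumed field, for illustration


class JobEvalLiteConfig(BaseModel):
    job_type: Literal["eval_lite"]
    metrics: list[str]  # hypothetical field, for illustration only


class JobCreate(BaseModel):
    name: str
    # Pydantic selects the config class by reading job_type first.
    job_config: Annotated[
        Union[JobInferenceConfig, JobEvalLiteConfig],
        Field(discriminator="job_type"),
    ]


request = JobCreate.model_validate(
    {
        "name": "demo",
        "job_config": {"job_type": "inference", "model": "hf://facebook/bart-large-cnn"},
    }
)
print(type(request.job_config).__name__)  # JobInferenceConfig

try:
    # An unknown job_type matches none of the union members.
    JobCreate.model_validate({"name": "demo", "job_config": {"job_type": "unknown"}})
except ValidationError as err:
    print("rejected:", err.error_count(), "error(s)")
```

With this shape, `JobCreate.model_json_schema()` should emit a `oneOf` list plus a `discriminator` on `job_type` for `job_config`, similar to the excerpt above.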

Closes #706

How to test it

Tests should run correctly.

Additional notes for reviewers

N/A

I already...

Tested the changes in a working environment to ensure they work as expected
Added some tests for any new functionality
Updated the documentation (both comments in code and product documentation under /docs)
Checked if a (backend) DB migration step was required and included it if required
- No DB migration needed

njbrake

Looking good so far! Made a code suggestion but looks like a very logical refactor. My only question would be whether you plan on addressing the custom logic of _get_job_params in this PR. If you don't plan on addressing it here, can you make a separate issue to track elevating that out of the service layer?

lumigator/python/mzai/backend/backend/api/routes/jobs.py

lumigator/python/mzai/backend/backend/services/jobs.py

javiermtorres · 2025-01-27T20:07:11Z

The SDK needs to be updated. I have checked the backend unit and integration tests locally and they seem to work.

javiermtorres · 2025-01-29T12:11:38Z

The SDK and notebook tests have been updated. @veekaybee @aittalam I've changed the code of the notebook slightly. One important difference is that I have removed the model param in the eval lite job. AFAICT, it's not needed there. The notebook takes it from the initial model spec in the notebook and not from the output of the summarization job. Since the output is a csv, it didn't make sense to put the model there, but I'll check the results metadata.

njbrake

My concern is that this PR is dropping support for the JobType.EVALUATION, which is needed to support the current frontend design. I may misunderstand the code. Other than that, only minor comments. Thanks for the work on this! (Let me know about JobType.EVALUATION and then I'll approve once that's worked out).

njbrake · 2025-02-04T19:59:35Z

lumigator/backend/backend/api/routes/jobs.py

+    inference_job_create_config_dict = job_create_request.job_config.dict()
+    inference_job_create_config_dict["model"] = "hf://facebook/bart-large-cnn"
+    inference_job_create_config_dict["output_field"] = "ground_truth"
+    inference_job_create_config_dict["store_to_dataset"] = True
+    inference_job_create_config_dict["job_type"] = JobType.INFERENCE
+
+    inference_job_create_request_dict = job_create_request.model_dump()
+    inference_job_create_request_dict["job_config"] = inference_job_create_config_dict
+
+    inference_job_create_request = JobCreate(**inference_job_create_request_dict)


This code confuses me slightly: the job_create_request comes in as a JobCreate, then the code dumps it out of that class, changes a handful of fields, and then loads it back into the same JobCreate class it came in as, only now with different params? Would it work instead to change the params on the job_create_request object? Something like `job_create_request.model = "hf://facebook/bart-large-cnn"`.

Or, did you mean for line 82 to be inference_job_create_request = JobInferenceCreate(**inference_job_create_request_dict) instead of JobCreate?

Now here comes a "scope creep" comment, since your changes don't introduce this error but it might be an easy one to fix. Could we have some sort of warning if the key was already set to something else? I'm especially thinking of the "model" parameter. Let's say I make a call to the backend and I want to use the "hf://facebook/bart-base-cnn" model instead of the bart-large-cnn model. This code as it stands now would quietly overwrite my request settings. Might be nice to log some sort of warning that we are throwing away a request parameter.
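
A minimal sketch of the warning suggested above, using only the standard library. The helper name `apply_forced_params`, the logger name, and the `FORCED_PARAMS` constant are hypothetical; the pinned values are copied from the diff, and the config is treated as a plain dict rather than the PR's actual model.

```python
import logging

logger = logging.getLogger("annotation_overrides")  # hypothetical logger name

# Values the annotation route pins, copied from the diff above.
FORCED_PARAMS = {
    "model": "hf://facebook/bart-large-cnn",
    "output_field": "ground_truth",
    "store_to_dataset": True,
}


def apply_forced_params(config: dict, forced: dict) -> tuple[dict, list[str]]:
    """Return a copy of config with forced values applied, plus the overridden keys."""
    merged = dict(config)
    overridden = []
    for key, value in forced.items():
        previous = config.get(key)
        # Warn only when the caller set a different value that is about to be discarded.
        if previous is not None and previous != value:
            logger.warning("Ignoring request value %r for %r; using %r", previous, key, value)
            overridden.append(key)
        merged[key] = value
    return merged, overridden


user_config = {"model": "hf://facebook/bart-base-cnn", "system_prompt": "Summarize."}
merged, overridden = apply_forced_params(user_config, FORCED_PARAMS)
print(merged["model"])  # hf://facebook/bart-large-cnn
print(overridden)       # ['model']
```

Returning the overridden keys alongside the merged config would also let the route surface them in the response, if that turns out to be preferable to a log line.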

Could we have some sort of warning if the key was already set to something else?

The idea would be to prevent setting any param that would not be acceptable for an Annotation job config. However, there are params like model that should perhaps be moved to the jobs config, since they don't deal with job lifecycle. Maybe as an interim measure we could log a warning until we settle on the schemas.

Conceptually, an annotation is a different kind of job, even though internally it is also mapped to an inference (for example, a model cannot be set in an annotation since the annotation job is designed to use exclusively the BART model [please @ividal confirm]).

TLDR: an annotation job is a constrained inference job (one where the user is not choosing model or parameters). We should at least be transparent about the constraints (from API to UI).

This is more a product decision than a technical one: Lumigator at this point is supposed to be opinionated about the model used to generate annotations. So an annotation job request should not make the user believe they can set any model they like, since it would be quietly ignored (replaced currently by BART, tomorrow who knows).

That's why I'd worry about the API offering (as this PR introduces):

    {
      "name": "string",
      ...
      "job_config": {
        "job_type": "evaluate",     # ouch, minor gripe, but this is a default taken from swagger for an annotation job
        "model": "string",          # misleading: this would be promptly ignored by the backend
        "model_url": "string",      # same
        "system_prompt": "string",
        "skip_inference": false
      }
    }

What do you think - is there any reasonable way to avoid the confusions mentioned above (especially related to the model)?
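
One possible way to make the constraint visible at the schema level, sketched here under the assumption of Pydantic v2: let the annotation config reject undeclared fields instead of silently dropping them. The `task` field is hypothetical, and whether the PR's JobAnnotateConfig is configured this way is not shown in this conversation.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError


class JobAnnotateConfig(BaseModel):
    # Reject any field not declared here instead of silently ignoring it.
    model_config = ConfigDict(extra="forbid")

    job_type: Literal["annotate"] = "annotate"
    task: str = "summarization"  # hypothetical field, for illustration only


# A request that sticks to the declared fields validates normally.
ok = JobAnnotateConfig.model_validate({"job_type": "annotate"})
print(ok.task)

try:
    # A caller trying to choose the model gets an explicit error.
    JobAnnotateConfig.model_validate(
        {"job_type": "annotate", "model": "hf://facebook/bart-base-cnn"}
    )
except ValidationError as err:
    print(err.errors()[0]["type"])  # extra_forbidden
```

The trade-off is that a 422 response is stricter than a warning: existing clients that send a model for annotation jobs would start failing rather than being quietly overridden.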

Maybe I'm not getting this, but the annotation does not allow a model to be set:
https://github.com/mozilla-ai/lumigator/pull/740/files#diff-851bf090de2849773e9c7486bebd855b586c8a24f6b20cdd2eb10d65139dd1c2R100-R103

If we are talking about the standalone EVALUATE job, then we should remove this one ASAP and port the current usage to INFERENCE+EVAL_LITE.

lumigator/backend/backend/services/jobs.py

njbrake · 2025-02-04T20:08:01Z

lumigator/backend/backend/services/jobs.py


    def create_job(
        self,
-        request: JobEvalCreate | JobEvalLiteCreate | JobInferenceCreate,
+        request: JobCreate,


Is this function dropping support for JobEval? From looking at the logic a few lines lower, it seems like only JobType.EVALUATION_LITE and JobType.INFERENCE are supported, which will break the frontend API, which currently uses only EVALUATION and not EVALUATION_LITE + INFERENCE.

But the current jobs are handled via the experiments endpoint, afaict, not directly via the jobs endpoint, so the UI should not see a difference.
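
Whichever way the EVALUATION question is settled, the supported set can be made explicit in the dispatch itself. The sketch below is a standalone illustration, not the service's real code: the enum values follow the discriminator mapping in the PR description, while `JobType.ANNOTATION`, the handler functions, and the `HANDLERS` table are assumptions.

```python
from enum import Enum


class JobType(str, Enum):
    INFERENCE = "inference"
    EVALUATION = "evaluate"
    EVALUATION_LITE = "eval_lite"
    ANNOTATION = "annotate"  # assumed member name


def _run_inference(config: dict) -> str:
    # Placeholder for the real submission logic.
    return "inference submitted"


def _run_eval_lite(config: dict) -> str:
    # Placeholder for the real submission logic.
    return "eval_lite submitted"


# Only the job types the service actually handles are registered here.
HANDLERS = {
    JobType.INFERENCE: _run_inference,
    JobType.EVALUATION_LITE: _run_eval_lite,
}


def create_job(job_config: dict) -> str:
    job_type = JobType(job_config["job_type"])
    handler = HANDLERS.get(job_type)
    if handler is None:
        # Fail loudly instead of silently dropping an unsupported type.
        raise NotImplementedError(f"Job type {job_type.value!r} is not handled by create_job")
    return handler(job_config)


print(create_job({"job_type": "inference"}))  # inference submitted
```

With a table like this, a request for `"evaluate"` raises immediately, so a caller still relying on the standalone evaluation job would see a clear error rather than undefined behaviour.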

lumigator/backend/backend/services/jobs.py

ividal · 2025-02-05T19:11:49Z

lumigator/backend/backend/api/routes/jobs.py

+    inference_job_create_config_dict = job_create_request.job_config.dict()
+    inference_job_create_config_dict["model"] = "hf://facebook/bart-large-cnn"
+    inference_job_create_config_dict["output_field"] = "ground_truth"
+    inference_job_create_config_dict["store_to_dataset"] = True
+    inference_job_create_config_dict["job_type"] = JobType.INFERENCE
+
+    inference_job_create_request_dict = job_create_request.model_dump()
+    inference_job_create_request_dict["job_config"] = inference_job_create_config_dict
+
+    inference_job_create_request = JobCreate(**inference_job_create_request_dict)


TLDR: an annotation job is a constrained inference job (one where the user is not choosing model or parameters). We should at least be transparent about the constraints (from API to UI).

This is more a product decision than a technical one: Lumigator at this point is supposed to be opinionated about the model used to generate annotations. So an annotation job request should not make the user believe they can set any model they like, since it would be quietly ignored (replaced currently by BART, tomorrow who knows).

That's why I'd worry about the API offering (as this PR introduces):

    {
      "name": "string",
      ...
      "job_config": {
        "job_type": "evaluate",     # ouch, minor gripe, but this is a default taken from swagger for an annotation job
        "model": "string",          # misleading: this would be promptly ignored by the backend
        "model_url": "string",      # same
        "system_prompt": "string",
        "skip_inference": false
      }
    }

What do you think - is there any reasonable way to avoid the confusions mentioned above (especially related to the model)?

ividal · 2025-02-05T19:20:28Z

Thanks for this! One note, @javiermtorres this PR should still be in sync with the UI and keep an eye on how it interacts with /experiments.

github-actions bot added the backend, api, and schemas labels Jan 24, 2025

javiermtorres force-pushed the javiermtorres/issue-706-organize-creation-records branch from cf6d9aa to d8ae072 Compare January 24, 2025 17:28

javiermtorres requested review from aittalam and njbrake January 24, 2025 17:28

njbrake reviewed Jan 24, 2025

github-actions bot added the sdk label Jan 28, 2025

javiermtorres force-pushed the javiermtorres/issue-706-organize-creation-records branch 2 times, most recently from 72585b0 to a17f040 Compare January 28, 2025 15:30

javiermtorres force-pushed the javiermtorres/issue-706-organize-creation-records branch 2 times, most recently from 0c8ef33 to 4ca7e65 Compare January 29, 2025 15:22

javiermtorres marked this pull request as ready for review January 29, 2025 15:55

javiermtorres force-pushed the javiermtorres/issue-706-organize-creation-records branch 2 times, most recently from 9159569 to e6f13f3 Compare January 31, 2025 11:43

njbrake mentioned this pull request Jan 31, 2025

Mlflow implementation of Tracking Interface #768

Merged

4 tasks

javiermtorres added 7 commits February 3, 2025 09:11

First attempt at a parametrized JobCreate

331e389

Replace templates with pydantic models

ea16039

Adapt SDK and SDK tests

e1a3f3b

Fix sdk unit tests

5e1e7bb

Fix notebook tests

741abfd

Fix tests

bd9ab8d

Fix job definition in workflows

aa1f92a

javiermtorres force-pushed the javiermtorres/issue-706-organize-creation-records branch from ab23ff5 to aa1f92a Compare February 3, 2025 08:51

javiermtorres added 2 commits February 3, 2025 12:30

Fix job unit test

9a1395c

Merge branch 'main' into javiermtorres/issue-706-organize-creation-records

293ae98

javiermtorres force-pushed the javiermtorres/issue-706-organize-creation-records branch from c9357b2 to 293ae98 Compare February 4, 2025 16:20

Merge branch 'main' into javiermtorres/issue-706-organize-creation-records

39666b2

Merge branch 'main' into javiermtorres/issue-706-organize-creation-records

2767c81

javiermtorres requested review from njbrake and veekaybee February 4, 2025 19:22

njbrake reviewed Feb 4, 2025

ividal requested changes Feb 5, 2025
