Merge LM-Eval dev branch #337

Merged 26 commits on Oct 22, 2024

Commits on Jul 24, 2024

  1. Add lm-eval-service controller (#258)

    * feat: Initial database support (#246)
    
    * Initial database support
    
    - Add status checking
    - Add better storage flags
    - Add spec.storage.format validation
    - Add DDL
    - Add HIBERNATE format to DB (test)
    - Update service image
    - Revert identifier to DATABASE
    - Update CR options (remove mandatory data)
    
    * Remove default DDL generation env var
    
    * Update service image to latest tag
    
    * Add migration awareness
    
    * Add updating pods for migration
    
    * Change JDBC url from mysql to mariadb
    
    * Fix TLS mount
    
    * Revert images
    
    * Remove redundant logic
    
    * Fix comments
    
    * feat: Add TLS certificate mount on ModelMesh (#255)
    
    * feat: Add TLS certificate mount on ModelMesh
    
    * Revert from http to https until kserve/modelmesh#147 is merged
    
    * Add lm-eval-service controller
    
    refactor the existing TrustyAIService controller and
    add LMEvalService controller
    
    Signed-off-by: Yihong Wang <[email protected]>
    
    ---------
    
    Signed-off-by: Yihong Wang <[email protected]>
    Co-authored-by: Rui Vieira <[email protected]>
    yhwang and ruivieira authored Jul 24, 2024
    7e1a712

Commits on Jul 26, 2024

  1. fix: Fix typo in operator's arguments (#261)

    Operator's arguments changed from `--eanble-services` to `--enable-services`.
    trustyai.opendatahub.io_lmevaljobs.yaml and zz_generated.deepcopy.go regenerated.
    ruivieira authored Jul 26, 2024
    5e853a1

Commits on Aug 5, 2024

  1. 2173aae
  2. sync: sync dev/lm-eval with main branch (#271)

    * feat: Initial database support (#246)
    
    * Initial database support
    
    - Add status checking
    - Add better storage flags
    - Add spec.storage.format validation
    - Add DDL
    - Add HIBERNATE format to DB (test)
    - Update service image
    - Revert identifier to DATABASE
    - Update CR options (remove mandatory data)
    
    * Remove default DDL generation env var
    
    * Update service image to latest tag
    
    * Add migration awareness
    
    * Add updating pods for migration
    
    * Change JDBC url from mysql to mariadb
    
    * Fix TLS mount
    
    * Revert images
    
    * Remove redundant logic
    
    * Fix comments
    
    * feat: Add TLS certificate mount on ModelMesh (#255)
    
    * feat: Add TLS certificate mount on ModelMesh
    
    * Revert from http to https until kserve/modelmesh#147 is merged
    
    * Pin oc version, ubi version (#263)
    
    * Restore checkout of trustyai-exp (#265)
    
    * Add operator installation robustness (#266)
    
    * fix: Skip InferenceService patching for KServe RawDeployment (#262)
    
    * feat: ConfigMap key to disable KServe Serverless configuration (#267)
    
    * feat: Add support for custom certificates in database connection (#259)
    
    * Add TLS endpoint for ModelMesh payload processors. (#268)
    
    Keep non-TLS endpoint for KServe Serverless (disabled by default)
    
    ---------
    
    Signed-off-by: Yihong Wang <[email protected]>
    Co-authored-by: Rui Vieira <[email protected]>
    Co-authored-by: Rob Geada <[email protected]>
    3 people authored Aug 5, 2024
    427d102

Commits on Aug 23, 2024

  1. Weekly sync up of dev/lm-eval branch (#278)

    * feat: Initial database support (#246)
    
    * Initial database support
    
    - Add status checking
    - Add better storage flags
    - Add spec.storage.format validation
    - Add DDL
    - Add HIBERNATE format to DB (test)
    - Update service image
    - Revert identifier to DATABASE
    - Update CR options (remove mandatory data)
    
    * Remove default DDL generation env var
    
    * Update service image to latest tag
    
    * Add migration awareness
    
    * Add updating pods for migration
    
    * Change JDBC url from mysql to mariadb
    
    * Fix TLS mount
    
    * Revert images
    
    * Remove redundant logic
    
    * Fix comments
    
    * feat: Add TLS certificate mount on ModelMesh (#255)
    
    * feat: Add TLS certificate mount on ModelMesh
    
    * Revert from http to https until kserve/modelmesh#147 is merged
    
    * Pin oc version, ubi version (#263)
    
    * Restore checkout of trustyai-exp (#265)
    
    * Add operator installation robustness (#266)
    
    * fix: Skip InferenceService patching for KServe RawDeployment (#262)
    
    * feat: ConfigMap key to disable KServe Serverless configuration (#267)
    
    * feat: Add support for custom certificates in database connection (#259)
    
    * Add TLS endpoint for ModelMesh payload processors. (#268)
    
    Keep non-TLS endpoint for KServe Serverless (disabled by default)
    
    * fix: Correct maxSurge and maxUnavailable (#275)
    
    * feat: Add support for custom DB names (#257)
    
    * feat: Add support for custom DB names
    
    * fix: Correct custom DB name
    
    ---------
    
    Signed-off-by: Yihong Wang <[email protected]>
    Co-authored-by: Rui Vieira <[email protected]>
    Co-authored-by: Rob Geada <[email protected]>
    3 people authored Aug 23, 2024
    342d1e2

Commits on Aug 27, 2024

  1. Driver updates job's status periodically (#280)

    The driver periodically updates the LMEvalJob.Status.Message field
    with the output from lm-eval. The driver captures messages that
    match a pattern like `Running text generation:  81%|`, so users
    can use this information to check the progress of the job.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Aug 27, 2024
    f6d37ea
  2. Add Dockerfile for LMES job image (#276)

    Add Dockerfile for LMES job image and the needed files
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Aug 27, 2024
    2767641
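
To illustrate the progress messages described in "Driver updates job's status periodically (#280)", here is a minimal Go sketch of how a line such as `Running text generation:  81%|` could be recognized and split into a stage label and a percentage. The regular expression and the `parseProgress` helper are assumptions made for illustration; the commit message does not show the pattern the driver actually uses.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// progressPattern matches progress lines such as
// "Running text generation:  81%|".
// Assumption: this pattern is illustrative, not the driver's own.
var progressPattern = regexp.MustCompile(`^(.+?):\s+(\d{1,3})%\|`)

// parseProgress (hypothetical helper) returns the stage label and the
// percentage found in a progress line. ok is false when the line does
// not look like a progress line.
func parseProgress(line string) (stage string, percent int, ok bool) {
	m := progressPattern.FindStringSubmatch(line)
	if m == nil {
		return "", 0, false
	}
	p, err := strconv.Atoi(m[2])
	if err != nil {
		return "", 0, false
	}
	return m[1], p, true
}

func main() {
	stage, pct, ok := parseProgress("Running text generation:  81%|")
	fmt.Println(stage, pct, ok) // prints: Running text generation 81 true
}
```

A status message built from the returned values could then be written to a field like LMEvalJob.Status.Message on each update interval.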

Commits on Aug 29, 2024

  1. feat: Add overlays (#283)

    * feat: Add overlays
    
    * Remove redundant lmes-tas overlay. Change job image name.
    ruivieira authored Aug 29, 2024
    f9c1284
  2. Add job image build (#284)

    ruivieira authored Aug 29, 2024
    df87ea2

Commits on Aug 30, 2024

  1. 0d2393d

Commits on Sep 12, 2024

  1. feat: support batch size (#290)

    Add batch size support to the LMEvalJob, which
    leverages the `--batch_size` option of `lm-evaluation-harness`.
    This only affects local models. The `--batch_size` option doesn't
    work for remote inference APIs.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Sep 12, 2024
    d2b9b2f
  2. Add the openai package into the lmes job image (#292)

    update the LMES job's Dockerfile to include the
    `openai` package.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Sep 12, 2024
    db7ae08
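
As a rough illustration of "feat: support batch size (#290)", the Go sketch below shows one way an optional batch size could be turned into the `--batch_size` argument of `lm-evaluation-harness`. The `jobOptions` struct and `buildArgs` function are hypothetical stand-ins, not the operator's actual types; only the `--model`, `--tasks`, and `--batch_size` flags come from the harness CLI.

```go
package main

import "fmt"

// jobOptions is a hypothetical, trimmed-down stand-in for the
// LMEvalJob fields relevant to this example.
type jobOptions struct {
	Model     string // e.g. "hf" for a local Hugging Face model
	Tasks     string // comma-separated task names
	BatchSize string // empty means "not set"
}

// buildArgs (hypothetical) assembles the harness arguments and appends
// --batch_size only when the user supplied a value.
func buildArgs(o jobOptions) []string {
	args := []string{"--model", o.Model, "--tasks", o.Tasks}
	if o.BatchSize != "" {
		args = append(args, "--batch_size", o.BatchSize)
	}
	return args
}

func main() {
	fmt.Println(buildArgs(jobOptions{Model: "hf", Tasks: "arc_easy", BatchSize: "8"}))
}
```

Leaving the flag out when no value is set keeps the harness default in effect, which is a design choice of this sketch rather than something the commit states.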

Commits on Sep 17, 2024

  1. fix: fix dependency error in the job image (#296)

    Split up the unitxt and openai dependencies to
    avoid the conflict.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Sep 17, 2024
    d9b5684

Commits on Sep 20, 2024

  1. feat: add device detection in lmes driver (#298)

    Added a new feature in LMES driver to detect the available
    devices by using the PyTorch API. This feature can be disabled
    by passing the `--detect-device false` option.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Sep 20, 2024
    a626cf8

Commits on Sep 24, 2024

  1. feat: support unitxt recipes (#301)

    Add new fields in the CRD to support unitxt recipes and
    leverage the driver to create corresponding yaml files
    of the unitxt recipes.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Sep 24, 2024
    159842f

Commits on Oct 9, 2024

  1. feat: support custom dataset (#309)

    Updated the CRD data struct to allow users to specify a custom Unitxt card in
    JSON format. The custom Unitxt card is equivalent to a custom dataset
    definition. Also restructured and updated the CRD to support Volumes,
    VolumeMounts, Env, Resources, Labels, and Annotations.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Oct 9, 2024
    b2bec12

Commits on Oct 13, 2024

  1. feat: new pulling mechanism for job statuses (#314)

    Update the driver to keep running even after the user program
    finishes. The driver provides two APIs:
    - GetStatus(): retrieve the job status
    - Shutdown(): properly tear down the driver
    
    On the controller side, the `pod/exec` resource is used
    to run the driver command, which invokes the driver APIs
    to retrieve the job status and shut down the driver
    when the job is done.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Oct 13, 2024
    ab6bc98

Commits on Oct 14, 2024

  1. 36c035a
  2. 1d3e882

Commits on Oct 18, 2024

  1. fe7c0bf

Commits on Oct 19, 2024

  1. Refactor some lmesreconcile methods (#323)

    * Refactor lmes reconcile options
    
    Signed-off-by: ted chang <[email protected]>
    
    * Update controllers/lmes/lmevaljob_controller.go
    
    Co-authored-by: Yihong Wang <[email protected]>
    
    * Update controllers/lmes/lmevaljob_controller.go
    
    Co-authored-by: Yihong Wang <[email protected]>
    Signed-off-by: ted chang <[email protected]>
    
    ---------
    
    Signed-off-by: ted chang <[email protected]>
    Co-authored-by: Yihong Wang <[email protected]>
    tedhtchang and yhwang authored Oct 19, 2024
    61744ff
  2. tidy: clean up lmes-job image (#333)

    remove BAM related packages and patch.
    
    Signed-off-by: Yihong Wang <[email protected]>
    yhwang authored Oct 19, 2024
    dc03620

Commits on Oct 21, 2024

  1. Enable job suspend for Kueue (#317)

    * Refactor lmes reconcile options
    
    Signed-off-by: ted chang <[email protected]>
    
    * Update controllers/lmes/lmevaljob_controller.go
    
    Co-authored-by: Yihong Wang <[email protected]>
    
    * Update controllers/lmes/lmevaljob_controller.go
    
    Co-authored-by: Yihong Wang <[email protected]>
    Signed-off-by: ted chang <[email protected]>
    
    * Enable job suspend for Kueue
    
    Signed-off-by: ted chang <[email protected]>
    
    ---------
    
    Signed-off-by: ted chang <[email protected]>
    Co-authored-by: Yihong Wang <[email protected]>
    tedhtchang and yhwang authored Oct 21, 2024
    b54e222
  2. faf468b
  3. sync: sync up dev/lm-eval branch with main branch (#336)

    * [CI] Run tests from trustyai-tests (#279)
    
    * Change Dockerfile to clone trustyai-tests
    
    * Add PYTEST_MARKERS env and remove TESTS_REGEX
    
    * RHOAIENG-12274: Update operator's overlays (#287)
    
    * Update operator's overlays
    
    * Update kustomization.yaml
    
    * Add devflag printout to GH Action comment (#289)
    
    * Add timeout loop to DSC install (#305)
    
    * RHOAIENG-13625: Add DBAvailable status to CR (#304)
    
    * Add DBAvailable status to CR
    
    * Remove probes
    
    * Add KServe destination rule for Inference Services in the ServiceMesh (#315)
    
    * Add DestinationRule creation for KServe serverless
    
    * Add permissions for destination rules
    
    * Add role for destination rules
    
    * Add missing role for creating destination rules
    
    * Fix spacing in DestinationRule template
    
    * Add check if DestinationRule CRD is present before creating it (#316)
    
    * Add check for DestinationRule CRD
    
    * Add API extensions to operator's scheme
    
    * Add permission for CRD resource
    
    * Fix operator metrics service target port (#320)
    
    * Add readiness probes (#312)
    
    * Enable KServe serverless in the rhoai overlay (#321)
    
    * Update overlay images (#331)
    
    * Add correct CA cert to JDBC (#324)
    
    * Add correct CA cert to JDBC
    
    * Add require SSL
    
    * Support for VirtualServices for InferenceLogger traffic (#332)
    
    * Generate KServe Inference Logger in conformance with DestinationRule and VirtualService
    
    * Add VirtualService creation for models in the mesh
    
    * Add permissions for VirtualServices
    
    * Update manifests for VirtualServices
    
    * Fix VirtualServiceName variable
    
    * fix yaml linter after the sync
    
    Signed-off-by: Yihong Wang <[email protected]>
    
    * tidy the go.mod and go.sum as well
    
    Signed-off-by: Yihong Wang <[email protected]>
    
    ---------
    
    Signed-off-by: Yihong Wang <[email protected]>
    Co-authored-by: Adolfo Aguirrezabal <[email protected]>
    Co-authored-by: Rui Vieira <[email protected]>
    Co-authored-by: Rob Geada <[email protected]>
    Co-authored-by: Rui Vieira <[email protected]>
    5 people authored Oct 21, 2024
    471738b

Commits on Oct 22, 2024

  1. 834829b