Merge LM-Eval dev branch #337

Merged 26 commits on Oct 22, 2024
Commits
7e1a712 Add lm-eval-service controller (#258) (yhwang, Jul 24, 2024)
5e853a1 fix: Fix typo in operator's arguments (#261) (ruivieira, Jul 26, 2024)
2173aae feat: Add LMES driver build to GHA (#272) (ruivieira, Aug 5, 2024)
427d102 sync: sync dev/lm-eval with main branch (#271) (yhwang, Aug 5, 2024)
342d1e2 Weekly sync up of dev/lm-eval branch (#278) (yhwang, Aug 23, 2024)
f6d37ea Driver updates job's status periodically (#280) (yhwang, Aug 27, 2024)
2767641 Add Dockerfile for LMES job image (#276) (yhwang, Aug 27, 2024)
f9c1284 feat: Add overlays (#283) (ruivieira, Aug 29, 2024)
df87ea2 Add job image build (#284) (ruivieira, Aug 29, 2024)
0d2393d Change job image use midstream lm-evaluation-harness (#285) (ruivieira, Aug 30, 2024)
d2b9b2f feat: support batch size (#290) (yhwang, Sep 12, 2024)
db7ae08 Add the `openai` package into the lmes job image (#292) (yhwang, Sep 12, 2024)
d9b5684 fix: fix dependency error in the job image (#296) (yhwang, Sep 17, 2024)
a626cf8 feat: add device detection in lmes driver (#298) (yhwang, Sep 20, 2024)
159842f feat: support unitxt recipes (#301) (yhwang, Sep 24, 2024)
b2bec12 feat: support custom dataset (#309) (yhwang, Oct 9, 2024)
ab6bc98 feat: new pulling mechanism for job statuses (#314) (yhwang, Oct 13, 2024)
36c035a Move operator's cmd/operator/main.go to cmd/main.go to keep operator-… (ruivieira, Oct 14, 2024)
1d3e882 Remove hardcoded job's user ID (#322) (ruivieira, Oct 14, 2024)
fe7c0bf Fix mkdir command in Job dockerfile (#330) (RobGeada, Oct 18, 2024)
61744ff Refactor some lmesreconcile methods (#323) (tedhtchang, Oct 19, 2024)
dc03620 tidy: clean up lmes-job image (#333) (yhwang, Oct 19, 2024)
b54e222 Enable job suspend for Kueue (#317) (tedhtchang, Oct 21, 2024)
faf468b Add overlay placeholders for main merge (#334) (ruivieira, Oct 21, 2024)
471738b sync: sync up dev/lm-eval branch with main branch (#336) (yhwang, Oct 21, 2024)
834829b Merge branch 'main' into dev/lm-eval (ruivieira, Oct 22, 2024)
26 changes: 23 additions & 3 deletions .github/workflows/build-and-push.yaml
@@ -56,6 +56,8 @@ jobs:
 echo "GITHUB.HEAD_REF: ${{ github.head_ref }}"
 echo "SHA: ${{ github.event.pull_request.head.sha }}"
 echo "MAIN IMAGE AT: ${{ vars.QUAY_RELEASE_REPO }}:latest"
+echo "LMES DRIVER IMAGE AT: ${{ vars.QUAY_RELEASE_LMES_DRIVER_REPO }}:latest"
+echo "LMES JOB IMAGE AT: ${{ vars.QUAY_RELEASE_LMES_JOB_REPO }}:latest"
 echo "CI IMAGE AT: quay.io/trustyai/trustyai-service-operator-ci:${{ github.event.pull_request.head.sha }}"
 #
 # Set environments depending on context
@@ -64,27 +66,41 @@ jobs:
 run: |
 echo "TAG=${{ github.event.pull_request.head.sha }}" >> $GITHUB_ENV
 echo "IMAGE_NAME=quay.io/trustyai/trustyai-service-operator-ci" >> $GITHUB_ENV
+echo "DRIVER_IMAGE_NAME=quay.io/trustyai/ta-lmes-driver-ci" >> $GITHUB_ENV
+echo "JOB_IMAGE_NAME=quay.io/trustyai/ta-lmes-job-ci" >> $GITHUB_ENV
 - name: Set main-branch environment
 if: env.BUILD_CONTEXT == 'main'
 run: |
 echo "TAG=latest" >> $GITHUB_ENV
 echo "IMAGE_NAME=${{ vars.QUAY_RELEASE_REPO }}" >> $GITHUB_ENV
+echo "DRIVER_IMAGE_NAME=${{ vars.QUAY_RELEASE_LMES_DRIVER_REPO }}" >> $GITHUB_ENV
+echo "JOB_IMAGE_NAME=${{ vars.QUAY_RELEASE_LMES_JOB_REPO }}" >> $GITHUB_ENV
 - name: Set tag environment
 if: env.BUILD_CONTEXT == 'tag'
 run: |
 echo "TAG=${{ github.ref_name }}" >> $GITHUB_ENV
 echo "IMAGE_NAME=${{ vars.QUAY_RELEASE_REPO }}" >> $GITHUB_ENV
+echo "DRIVER_IMAGE_NAME=${{ vars.QUAY_RELEASE_LMES_DRIVER_REPO }}" >> $GITHUB_ENV
+echo "JOB_IMAGE_NAME=${{ vars.QUAY_RELEASE_LMES_JOB_REPO }}" >> $GITHUB_ENV

-# Run docker commands
+# Run docker commands
 - name: Put expiry date on CI-tagged image
 if: env.BUILD_CONTEXT == 'ci'
 run: sed -i 's#summary="odh-trustyai-service-operator\"#summary="odh-trustyai-service-operator" \\ \n quay.expires-after=7d#' Dockerfile
 - name: Log in to Quay
 run: docker login -u ${{ secrets.QUAY_ROBOT_USERNAME }} -p ${{ secrets.QUAY_ROBOT_SECRET }} quay.io
-- name: Build image
+- name: Build main image
 run: docker build -t ${{ env.IMAGE_NAME }}:$TAG .
-- name: Push to Quay CI repo
+- name: Push main image to Quay
 run: docker push ${{ env.IMAGE_NAME }}:$TAG
+- name: Build LMES driver image
+run: docker build -f Dockerfile.driver -t ${{ env.DRIVER_IMAGE_NAME }}:$TAG .
+- name: Push LMES driver image to Quay
+run: docker push ${{ env.DRIVER_IMAGE_NAME }}:$TAG
+- name: Build LMES job image
+run: docker build -f Dockerfile.lmes-job -t ${{ env.JOB_IMAGE_NAME }}:$TAG .
+- name: Push LMES job image to Quay
+run: docker push ${{ env.JOB_IMAGE_NAME }}:$TAG

 # Create CI Manifests
 - name: Set up manifests for CI
@@ -127,6 +143,10 @@ jobs:

 📦 [PR image](https://quay.io/trustyai/trustyai-service-operator-ci:${{ github.event.pull_request.head.sha }}): `quay.io/trustyai/trustyai-service-operator-ci:${{ github.event.pull_request.head.sha }}`

+📦 [LMES driver image](https://quay.io/trustyai/ta-lmes-driver:${{ github.event.pull_request.head.sha }}): `quay.io/trustyai/ta-lmes-driver:${{ github.event.pull_request.head.sha }}`
+
+📦 [LMES job image](https://quay.io/trustyai/ta-lmes-job:${{ github.event.pull_request.head.sha }}): `quay.io/trustyai/ta-lmes-job:${{ github.event.pull_request.head.sha }}`
+
 🗂️ [CI manifests](https://github.com/trustyai-explainability/trustyai-service-operator-ci/tree/operator-${{ env.TAG }})

 ```
Expand Down
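
Reviewer note: the workflow above picks a tag and three image repositories from the build context (`ci`, `main`, or `tag`). The same decision table can be sketched in Go as below. This is only an illustration of the selection logic, not code from this PR. The names `imageSet`, `releaseRepos`, and `resolveImages` are hypothetical, the `releaseRepos` fields stand in for the `vars.QUAY_RELEASE_*` variables, and the sketch assumes the first environment step (whose `if:` condition is collapsed in the diff) applies to the `ci` context.

```go
package main

import "fmt"

// imageSet mirrors the values the workflow writes to $GITHUB_ENV:
// TAG, IMAGE_NAME, DRIVER_IMAGE_NAME and JOB_IMAGE_NAME.
type imageSet struct {
	Tag, Operator, Driver, Job string
}

// releaseRepos is a hypothetical stand-in for the repository variables
// QUAY_RELEASE_REPO, QUAY_RELEASE_LMES_DRIVER_REPO and QUAY_RELEASE_LMES_JOB_REPO.
type releaseRepos struct {
	Operator, Driver, Job string
}

// resolveImages returns the tag and repositories for a build context.
// Assumption: the "ci" branch corresponds to the first environment step.
func resolveImages(buildContext, sha, refName string, rel releaseRepos) (imageSet, error) {
	switch buildContext {
	case "ci":
		// CI builds are tagged with the PR head SHA and pushed to the -ci repos.
		return imageSet{
			Tag:      sha,
			Operator: "quay.io/trustyai/trustyai-service-operator-ci",
			Driver:   "quay.io/trustyai/ta-lmes-driver-ci",
			Job:      "quay.io/trustyai/ta-lmes-job-ci",
		}, nil
	case "main":
		// Main-branch builds publish "latest" to the release repos.
		return imageSet{Tag: "latest", Operator: rel.Operator, Driver: rel.Driver, Job: rel.Job}, nil
	case "tag":
		// Tag builds reuse the release repos with the Git ref name as the tag.
		return imageSet{Tag: refName, Operator: rel.Operator, Driver: rel.Driver, Job: rel.Job}, nil
	default:
		return imageSet{}, fmt.Errorf("unknown build context %q", buildContext)
	}
}

func main() {
	// Placeholder repositories, used only for this example.
	rel := releaseRepos{
		Operator: "quay.io/example/operator",
		Driver:   "quay.io/example/lmes-driver",
		Job:      "quay.io/example/lmes-job",
	}
	imgs, err := resolveImages("ci", "abc123", "", rel)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:%s\n", imgs.Driver, imgs.Tag)
}
```

Keeping the three contexts in one switch makes it easy to see that every context sets all four values, which is the property the new `DRIVER_IMAGE_NAME` and `JOB_IMAGE_NAME` lines preserve.
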
2 changes: 1 addition & 1 deletion .github/workflows/controller-tests.yaml
@@ -13,7 +13,7 @@ jobs:
 - name: Setup Go
 uses: actions/setup-go@v4
 with:
-go-version: '1.19.0'
+go-version: '1.21.12'

 - name: Download & install envtest binaries
 run: |
Expand Down
4 changes: 3 additions & 1 deletion .yamllint.yaml
@@ -6,4 +6,6 @@ rules:
 level: warning
 hyphens:
 max-spaces-after: 1
-level: warning
+level: warning
+indentation:
+indent-sequences: consistent
6 changes: 3 additions & 3 deletions Dockerfile
@@ -1,5 +1,5 @@
 # Build the manager binary
-FROM registry.access.redhat.com/ubi8/go-toolset:1.21 as builder
+FROM registry.access.redhat.com/ubi8/go-toolset:1.21 AS builder
 ARG TARGETOS
 ARG TARGETARCH

@@ -12,7 +12,7 @@ COPY go.sum go.sum
 RUN go mod download

 # Copy the go source
-COPY main.go main.go
+COPY cmd/ cmd/
 COPY api/ api/
 COPY controllers/ controllers/

@@ -22,7 +22,7 @@ COPY controllers/ controllers/
 # the docker BUILDPLATFORM arg will be linux/arm64 when for Apple x86 it will be linux/amd64. Therefore,
 # by leaving it empty we can ensure that the container and binary shipped on it will have the same platform.
 USER root
-RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager main.go
+RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go

 # Use distroless as minimal base image to package the manager binary
 # Refer to https://github.com/GoogleContainerTools/distroless for more details
Expand Down
25 changes: 25 additions & 0 deletions Dockerfile.driver
@@ -0,0 +1,25 @@
FROM registry.access.redhat.com/ubi8/go-toolset:1.21 AS builder

WORKDIR /go/src/github.com/trustyai-explainability/trustyai-service-operator
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
RUN go mod download
# Copy the go source
COPY cmd/ cmd/
COPY api/ api/
COPY controllers/ controllers/

RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -tags netgo -ldflags '-extldflags "-static"' -o /bin/driver ./cmd/lmes_driver/*.go

FROM registry.access.redhat.com/ubi8/ubi-minimal:latest

COPY --from=builder /bin/driver /bin/driver

USER 65532:65532

WORKDIR /bin

ENTRYPOINT [ "/bin/driver" ]
27 changes: 27 additions & 0 deletions Dockerfile.lmes-job
@@ -0,0 +1,27 @@
FROM registry.access.redhat.com/ubi9/python-311@sha256:fccda5088dd13d2a3f2659e4c904beb42fc164a0c909e765f01af31c58affae3

USER root
RUN sed -i.bak 's/include-system-site-packages = false/include-system-site-packages = true/' /opt/app-root/pyvenv.cfg

USER default
WORKDIR /opt/app-root/src
RUN mkdir /opt/app-root/src/hf_home && chmod g+rwx /opt/app-root/src/hf_home
RUN mkdir /opt/app-root/src/output && chmod g+rwx /opt/app-root/src/output
RUN mkdir /opt/app-root/src/my_tasks && chmod g+rwx /opt/app-root/src/my_tasks
RUN mkdir -p /opt/app-root/src/my_catalogs/cards && chmod -R g+rwx /opt/app-root/src/my_catalogs
RUN mkdir -p /opt/app-root/src/.cache
ENV PATH="/opt/app-root/bin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# Clone the Git repository, check out v0.4.4 and install the Python package
RUN git clone https://github.com/opendatahub-io/lm-evaluation-harness.git && \
cd lm-evaluation-harness && git checkout 543617fef9ba885e87f8db8930fbbff1d4e2ca49 && \
pip install --no-cache-dir --user -e .[api]

RUN python -c 'from lm_eval.tasks.unitxt import task; import os.path; print("class: !function " + task.__file__.replace("task.py", "task.Unitxt"))' > ./my_tasks/unitxt

ENV PYTHONPATH=/opt/app-root/src/.local/lib/python3.11/site-packages:/opt/app-root/src/lm-evaluation-harness:/opt/app-root/src:/opt/app-root/src/server
ENV HF_HOME=/opt/app-root/src/hf_home
ENV UNITXT_ARTIFACTORIES=/opt/app-root/src/my_catalogs

CMD ["/opt/app-root/bin/python"]

9 changes: 6 additions & 3 deletions Makefile
@@ -7,6 +7,9 @@ VERSION ?= 1.17.0

 BUILD_TOOL ?= podman

+# enable TrustyAIService by default for `make run`
+ENABLED_SERVICES ?= TAS
+
 # CHANNELS define the bundle channels used in the bundle.
 # Add a new line here if you would like to change its default config. (E.g CHANNELS = "candidate,fast,stable")
 # To re-generate a bundle for other specific channels without changing the standard setup, you can:
@@ -111,11 +114,11 @@ test: manifests generate fmt vet envtest ## Run tests.

 .PHONY: build
 build: manifests generate fmt vet ## Build manager binary.
-go build -o bin/manager main.go
+go build -o bin/manager cmd/main.go

 .PHONY: run
 run: manifests generate fmt vet ## Run a controller from your host.
-go run ./main.go
+go run ./cmd/main.go --enable-services $(ENABLED_SERVICES)

 # If you wish built the manager image targeting other platforms you can use the --platform flag.
 # (i.e. docker build --platform linux/arm64 ). However, you must enable docker buildKit for it.
@@ -182,7 +185,7 @@ ENVTEST ?= $(LOCALBIN)/setup-envtest

 ## Tool Versions
 KUSTOMIZE_VERSION ?= v3.8.7
-CONTROLLER_TOOLS_VERSION ?= v0.11.1
+CONTROLLER_TOOLS_VERSION ?= v0.16.3

 KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"
 .PHONY: kustomize
Expand Down
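
Reviewer note: the `run` target now passes `--enable-services $(ENABLED_SERVICES)` to `cmd/main.go`, with `TAS` as the default. The diff shown here does not include the flag handling itself, so the following is only a minimal sketch of how such a flag could be consumed with the standard `flag` package. It assumes the value is a comma-separated list. `parseServices` and the `LMES` service name are hypothetical and used purely for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseServices splits a comma-separated flag value into service names,
// trimming whitespace and dropping empty entries and duplicates.
// Assumption: the operator accepts a comma-separated list.
func parseServices(raw string) []string {
	var out []string
	seen := map[string]bool{}
	for _, part := range strings.Split(raw, ",") {
		name := strings.TrimSpace(part)
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
	}
	return out
}

func main() {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	// The default mirrors ENABLED_SERVICES ?= TAS from the Makefile.
	raw := fs.String("enable-services", "TAS", "comma-separated list of services to enable")

	// Equivalent to: go run ./cmd/main.go --enable-services TAS,LMES
	// ("LMES" is a hypothetical second service name).
	if err := fs.Parse([]string{"--enable-services", "TAS,LMES"}); err != nil {
		panic(err)
	}
	fmt.Println(parseServices(*raw))
}
```

With this shape, `make run ENABLED_SERVICES=TAS,LMES` would override the default without editing the Makefile, since the variable is assigned with `?=`.
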
17 changes: 15 additions & 2 deletions PROJECT
@@ -4,7 +4,7 @@
 # More info: https://book.kubebuilder.io/reference/project-config.html
 domain: opendatahub.io
 layout:
-- go.kubebuilder.io/v3
+- go.kubebuilder.io/v4
 plugins:
 manifests.sdk.operatorframework.io/v2: {}
 scorecard.sdk.operatorframework.io/v2: {}
@@ -18,6 +18,19 @@ resources:
 domain: opendatahub.io
 group: trustyai
 kind: TrustyAIService
-path: github.com/trustyai-explainability/trustyai-service-operator/api/v1alpha1
+path: github.com/trustyai-explainability/trustyai-service-operator/api/tas/v1alpha1
 version: v1alpha1
+- api:
+crdVersion: v1
+namespaced: true
+controller: true
+domain: opendatahub.io
+group: trustyai
+kind: LMEvalJob
+path: github.com/trustyai-explainability/trustyai-service-operator/api/lmes/v1alpha1
+version: v1alpha1
+webhooks:
+defaulting: true
+validation: true
+webhookVersion: v1
 version: "3"
43 changes: 43 additions & 0 deletions api/lmes/v1alpha1/groupversion_info.go
@@ -0,0 +1,43 @@
/*
Copyright 2024.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package v1alpha1 contains API Schema definitions for the trustyai.opendatahub.io v1alpha1 API group
// +kubebuilder:object:generate=true
// +groupName=trustyai.opendatahub.io
package v1alpha1

import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)

const (
GroupName = "trustyai.opendatahub.io"
Version = "v1alpha1"
KindName = "LMEvalJob"
FinalizerName = "trustyai.opendatahub.io/lmes-finalizer"
)

var (
// GroupVersion is group version used to register these objects
GroupVersion = schema.GroupVersion{Group: GroupName, Version: Version}

// SchemeBuilder is used to add go types to the GroupVersionKind scheme
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)
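
Reviewer note: the constants in `groupversion_info.go` determine the `apiVersion` and `kind` a user writes in an `LMEvalJob` manifest, and `FinalizerName` is the string the controller would place in `metadata.finalizers`. The sketch below illustrates both using only the standard library. `groupVersion` is a local stand-in for `schema.GroupVersion` from `k8s.io/apimachinery`, and `ensureFinalizer` is a hypothetical helper. A real controller would more likely rely on the helpers that ship with controller-runtime.

```go
package main

import "fmt"

// Constants copied from api/lmes/v1alpha1/groupversion_info.go.
const (
	GroupName     = "trustyai.opendatahub.io"
	Version       = "v1alpha1"
	KindName      = "LMEvalJob"
	FinalizerName = "trustyai.opendatahub.io/lmes-finalizer"
)

// groupVersion is a local stand-in for schema.GroupVersion so that the
// example runs without external modules.
type groupVersion struct {
	Group, Version string
}

// String renders the "group/version" form used in a manifest's apiVersion field.
func (gv groupVersion) String() string {
	return gv.Group + "/" + gv.Version
}

// ensureFinalizer returns the finalizer list with name appended when it is
// not already present (hypothetical helper, for illustration only).
func ensureFinalizer(finalizers []string, name string) []string {
	for _, f := range finalizers {
		if f == name {
			return finalizers
		}
	}
	return append(finalizers, name)
}

func main() {
	gv := groupVersion{Group: GroupName, Version: Version}
	fmt.Printf("apiVersion: %s\nkind: %s\n", gv, KindName)
	fmt.Println(ensureFinalizer(nil, FinalizerName))
}
```
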