fix: switch to numeric user instead of default #20

Merged (1 commit) on Feb 12, 2025

@jbusche commented Feb 8, 2025

Description

Closes #19.
This changes Dockerfile.lmes-job to match the Dockerfile in ODH (Open Data Hub): the non-numeric "default" user is replaced with a numeric UID:GID, so that LMEvalJob pods can run in the default namespace without failing.
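To check whether a given image is affected, a small POSIX shell helper can classify the image's USER value. This is a sketch that assumes the failure comes from the Kubernetes runAsNonRoot check, which can only verify a numeric UID (a named user such as "default" cannot be checked); the function name classify_image_user is hypothetical and only roughly mirrors the kubelet's behaviour.

```shell
#!/bin/sh
# Hypothetical helper: classify an image USER value ("user", "uid" or "uid:gid").
# Only the part before the first ':' (the user/UID) matters for the check.
classify_image_user() {
  _uid=${1%%:*}
  case $_uid in
    "")
      echo unset ;;        # no USER set: runs as root unless the pod overrides it
    *[!0-9]*)
      echo non-numeric ;;  # e.g. "default": cannot be verified as non-root
    *)
      if [ "$_uid" -eq 0 ]; then echo root; else echo numeric; fi ;;
  esac
}
```

Usage, assuming USER 65532:65532 is the last USER instruction in the image: `classify_image_user "$(docker image inspect --format '{{.Config.User}}' jbusche-lmes-pod-img:fixed2)"` should print `numeric`, while the previous image should print `non-numeric`.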

How Has This Been Tested?

I made the change, built the image, and deployed it on my OpenShift 4.17.9 cluster, where it works.

Specifically:

  1. I added USER 65532:65532 to the Dockerfile.lmes-job
  2. I built the new image with: docker build -f Dockerfile.lmes-job -t jbusche-lmes-pod-img:fixed2 .
  3. Pushed the image to quay.io
  4. Updated the trustyai-service-operator-config ConfigMap to point to my new image
  5. Deployed a sample online LMEvalJob in the default namespace:
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: "jim-default-lmeval-glue"
  namespace: default
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskRecipes:
    - card:
        name: "cards.wnli"
      template: "templates.classification.multi_class.relation.default"
  logSamples: true

With the new image, the job now runs in the default namespace:

2025-02-08:19:10:10,904 INFO     [evaluator.py:489] Running generate_until requests
Running generate_until requests:   1%|▏         | 1/71 [00:39<46:23, 39.77s/it]
Running generate_until requests:  13%|█▎        | 9/71 [01:16<07:40,  7.43s/it]
Running generate_until requests:  24%|██▍       | 17/71 [01:44<04:41,  5.20s/it]
Running generate_until requests:  35%|███▌      | 25/71 [02:13<03:27,  4.50s/it]
Running generate_until requests:  46%|████▋     | 33/71 [02:47<02:46,  4.37s/it]
Running generate_until requests:  58%|█████▊    | 41/71 [03:20<02:09,  4.31s/it]

Previously, the same job failed with a CreateContainerConfigError.

The pod has since completed:

oc get pods
NAME                        READY   STATUS      RESTARTS   AGE
jim-default-lmeval-glue     0/1     Completed   0          6m4s

This was tested on an OpenShift 4.17.9 FIPS cluster.
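For anyone reproducing the test, steps 2 to 5 can be scripted roughly as follows. This is a sketch, not the exact commands used in this PR: the registry path, the operator namespace (opendatahub), the lmes-pod-image ConfigMap key, and the lmevaljob.yaml file name are all assumptions. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of test steps 2-5. All names below are placeholders/assumptions.
set -eu

IMAGE="${IMAGE:-quay.io/example/lmes-pod-img:fixed2}"  # hypothetical registry path
OPERATOR_NS="${OPERATOR_NS:-opendatahub}"              # assumed operator namespace
DRY_RUN="${DRY_RUN:-1}"                                # 1 = print commands only

# Print the command (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 2: build the image from the patched Dockerfile.
run docker build -f Dockerfile.lmes-job -t "$IMAGE" .
# Step 3: push it to the registry.
run docker push "$IMAGE"
# Step 4: point the operator at the new job image (key name is an assumption).
run oc patch configmap trustyai-service-operator-config -n "$OPERATOR_NS" \
  --type merge -p "{\"data\":{\"lmes-pod-image\":\"$IMAGE\"}}"
# Step 5: create the LMEvalJob (hypothetical file holding the YAML above).
run oc apply -f lmevaljob.yaml
```

Run it once as-is to review the commands, then again with DRY_RUN=0 and IMAGE set to your own repository to actually execute them.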

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Signed-off-by: James Busche <[email protected]>
@ruivieira self-assigned this Feb 10, 2025
@ruivieira added the kind/bug (Something isn't working) label Feb 10, 2025
@ruivieira self-requested a review Feb 12, 2025 17:23
@ruivieira changed the title from "switch to numeric user instead of default" to "fix: switch to numeric user instead of default" Feb 12, 2025
@ruivieira merged commit 852c95b into opendatahub-io:release-0.4.6 Feb 12, 2025
0 of 3 checks passed