Add PVC as storage option for LM-Eval results #325

Open
ruivieira opened this issue Oct 17, 2024 · 2 comments
@ruivieira (Member)

For large, full results (not just summaries), storing the evaluation results in the CR's status is not practical.
An option should be available to persist full evaluation results to a PVC.

@ruivieira added the kind/enhancement and lm-eval labels Oct 17, 2024
@ruivieira ruivieira added this to the LM-Eval milestone Oct 17, 2024
@yhwang (Collaborator) commented Oct 22, 2024

Here are my thoughts on this feature; suggestions/comments are welcome:

  • provide an option to save the outputs, including stdout, stderr, results, and the sampling log, to external storage
  • for external storage, we can support different types, e.g. PVC, COS, etc. Let's add PVC support first.

For the sake of argument, the new option is named Outputs, and users can specify a full PersistentVolumeClaim data struct. The LM-Eval controller would create the PVC, mount it at /opt/app-root/src/output, and store all the outputs in the PVC. The data struct for Outputs could be:

type Outputs struct {
	// +optional
	PersistentVolumeClaim *corev1.PersistentVolumeClaim `json:"pvc,omitempty"`
}

Only PVC would be supported for now.
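
For illustration, an LMEvalJob using this new option might look like the example below. The exact serialized shape of the embedded corev1.PersistentVolumeClaim (the metadata/spec split, and the placeholder name, size, and access mode) is an assumption to show the idea, not a settled design:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskRecipes:
    - card:
        name: "cards.wnli"
      template: "templates.classification.multi_class.relation.default"
  logSamples: true
  outputs:
    pvc:
      metadata:
        name: evaljob-outputs   # placeholder name for the PVC the controller would create
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi        # placeholder size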

One thing worth mentioning: if users want to create a PVC themselves and then use it in an LM-Eval job, they can already do that. Here is an example, where the name of the pre-created PVC is mypvc:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskRecipes:
    - card:
        name: "cards.wnli"
      template: "templates.classification.multi_class.relation.default"
  logSamples: true
  pod:
    volumes:
      - name: pvc
        persistentVolumeClaim:
          claimName: mypvc
    container:
      volumeMounts:
      - mountPath: "/opt/app-root/src/output"
        name: pvc

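For completeness, mypvc in the example above is just a regular Kubernetes PVC created beforehand; something like the following would work (size and access mode are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
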
@yhwang (Collaborator) commented Oct 23, 2024

If we want to make the case of using an existing PVC easier, the Outputs data struct could be:

type Outputs struct {
	// Use an existing PVC to store the outputs
	// +optional
	PersistentVolumeClaimName *string `json:"pvcName,omitempty"`
	// Create a PVC and use it to store the outputs
	// +optional
	PersistentVolumeClaim *corev1.PersistentVolumeClaim `json:"pvc,omitempty"`
}

When pvcName is specified, the LM-Eval controller will create the corresponding volumes and volumeMounts for the job pod and store the outputs in the PVC. Basically, it turns the LMEvalJob example in the previous comment into this:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskRecipes:
    - card:
        name: "cards.wnli"
      template: "templates.classification.multi_class.relation.default"
  logSamples: true
  outputs:
    pvcName: mypvc

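To make the controller side concrete, here is a minimal Go sketch of how pvcName could be turned into the volume/volumeMount pair from the earlier example. The helper name buildOutputStorage and the hard-coded volume name are assumptions for illustration, not actual TrustyAI controller code:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// outputMountPath is where LM-Eval writes its outputs inside the job container.
const outputMountPath = "/opt/app-root/src/output"

// buildOutputStorage is a hypothetical helper: given outputs.pvcName, it
// returns the volume and volumeMount the controller would attach to the
// job pod.
func buildOutputStorage(pvcName string) (corev1.Volume, corev1.VolumeMount) {
	volume := corev1.Volume{
		Name: "outputs", // assumed volume name; the mount below must match it
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: pvcName,
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:      volume.Name,
		MountPath: outputMountPath,
	}
	return volume, mount
}

func main() {
	v, m := buildOutputStorage("mypvc")
	fmt.Printf("volume %q -> claim %q, mounted at %q\n",
		v.Name, v.PersistentVolumeClaim.ClaimName, m.MountPath)
}
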
@yhwang yhwang self-assigned this Oct 23, 2024