Update tutorial.md
add examples
yhwang authored Oct 11, 2024
1 parent 2183f4e commit 7897e74
Showing 1 changed file with 140 additions and 2 deletions.
142 changes: 140 additions & 2 deletions docs/tutorial.md
@@ -50,7 +50,6 @@ Here is an example of an LMEvalJob object:
 apiVersion: trustyai.opendatahub.io/v1alpha1
 kind: LMEvalJob
 metadata:
-  labels:
   name: evaljob-sample
 spec:
   model: hf
@@ -59,7 +58,8 @@ spec:
     value: google/flan-t5-base
   taskList:
     taskRecipes:
-    - card: "cards.wnli"
+    - card:
+        name: "cards.wnli"
       template: "templates.classification.multi_class.relation.default"
   logSamples: true
```
@@ -165,3 +165,141 @@ In this section, let's review each property in the LMEvalJob and its usage.
- `resources`: Specify the resources for the lm-eval container.
- `volumes`: Specify the volume information for the lm-eval and other containers. It uses the `Volume` data structure of Kubernetes.
- `sideCars`: A list of containers that run along with the lm-eval container. It uses the `Container` data structure of Kubernetes.

## Examples

### Environment Variables
If the LMEvalJob needs an access token to reach a model on HuggingFace, you can set `HF_TOKEN` as one of the environment variables
for the lm-eval container:
```
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
  - name: pretrained
    value: huggingfacespace/model
  taskList:
    taskNames:
    - unfair_tos
  logSamples: true
  pod:
    container:
      env:
      - name: HF_TOKEN
        value: "My HuggingFace token"
```

Alternatively, you can store the token in a secret and reference its key with the `secretKeyRef` syntax
(only the `env` part is shown):
```
env:
- name: HF_TOKEN
  valueFrom:
    secretKeyRef:
      name: my-secret
      key: hf-token
```
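
The secret referenced by `secretKeyRef` has to exist in the same namespace as the LMEvalJob. A minimal sketch of such a secret follows, assuming the name `my-secret` and the key `hf-token` used in the snippet above; the token value is a placeholder:
```
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
stringData:
  # Placeholder value; replace it with a real HuggingFace access token.
  hf-token: "My HuggingFace token"
```
Using `stringData` lets you write the token as plain text in the manifest; Kubernetes stores it base64-encoded under `data`.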

### Custom Unitxt Card

Pass a custom Unitxt Card in JSON format:
```
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  model: hf
  modelArgs:
  - name: pretrained
    value: google/flan-t5-base
  taskList:
    taskRecipes:
    - template: "templates.classification.multi_class.relation.default"
      card:
        custom: |
          {
            "__type__": "task_card",
            "loader": {
              "__type__": "load_hf",
              "path": "glue",
              "name": "wnli"
            },
            "preprocess_steps": [
              {
                "__type__": "split_random_mix",
                "mix": {
                  "train": "train[95%]",
                  "validation": "train[5%]",
                  "test": "validation"
                }
              },
              {
                "__type__": "rename",
                "field": "sentence1",
                "to_field": "text_a"
              },
              {
                "__type__": "rename",
                "field": "sentence2",
                "to_field": "text_b"
              },
              {
                "__type__": "map_instance_values",
                "mappers": {
                  "label": {
                    "0": "entailment",
                    "1": "not entailment"
                  }
                }
              },
              {
                "__type__": "set",
                "fields": {
                  "classes": [
                    "entailment",
                    "not entailment"
                  ]
                }
              },
              {
                "__type__": "set",
                "fields": {
                  "type_of_relation": "entailment"
                }
              },
              {
                "__type__": "set",
                "fields": {
                  "text_a_type": "premise"
                }
              },
              {
                "__type__": "set",
                "fields": {
                  "text_b_type": "hypothesis"
                }
              }
            ],
            "task": "tasks.classification.multi_class.relation",
            "templates": "templates.classification.multi_class.relation.all"
          }
  logSamples: true
```

The custom card above uses the HuggingFace dataset loader:
```
"loader": {
  "__type__": "load_hf",
  "path": "glue",
  "name": "wnli"
},
```
You can use other [loaders](https://www.unitxt.ai/en/latest/unitxt.loaders.html#module-unitxt.loaders)
and use `volumes` and `volumeMounts` to mount a dataset from persistent volumes. For example, if you
use [LoadCSV](https://www.unitxt.ai/en/latest/unitxt.loaders.html#unitxt.loaders.LoadCSV), you need to mount the
files into the container so that the dataset is accessible to the evaluation process.
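
As a rough sketch of that setup, the fragment below mounts a persistent volume claim into the lm-eval container. The claim name `my-dataset-pvc`, the volume name `dataset`, and the mount path `/data` are hypothetical, and placing `volumeMounts` under `pod.container` is an assumption based on the Kubernetes `Container` structure; check the property list above for the exact location:
```
  pod:
    container:
      volumeMounts:
      # Hypothetical mount path for the dataset files.
      - name: dataset
        mountPath: /data
    volumes:
    # Hypothetical claim; it must already exist in the job's namespace.
    - name: dataset
      persistentVolumeClaim:
        claimName: my-dataset-pvc
```
The loader in the custom card could then point at the mounted files. The `files` mapping and the file name `test.csv` are assumptions; verify them against the LoadCSV documentation:
```
"loader": {
  "__type__": "load_csv",
  "files": {
    "test": "/data/test.csv"
  }
},
```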
