From aca503338add63110792c2314f468f6643847131 Mon Sep 17 00:00:00 2001
From: Anh-Uong
Date: Mon, 10 Jun 2024 10:20:14 -0600
Subject: [PATCH 1/2] bloom model can't run with flash-attn

Signed-off-by: Anh-Uong
---
 examples/kfto-kueue-sft-trainer.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/examples/kfto-kueue-sft-trainer.yaml b/examples/kfto-kueue-sft-trainer.yaml
index 146e9d27..a8af4976 100644
--- a/examples/kfto-kueue-sft-trainer.yaml
+++ b/examples/kfto-kueue-sft-trainer.yaml
@@ -15,7 +15,8 @@ data:
       "gradient_accumulation_steps": 4,
       "learning_rate": 1e-05,
       "response_template": "\n### Label:",
-      "dataset_text_field": "output"
+      "dataset_text_field": "output",
+      "use_flash_attn": false
     }
 ---
 apiVersion: "kubeflow.org/v1"

From fe43108cfb2563abb8a4380dc570c9d0ecea483c Mon Sep 17 00:00:00 2001
From: Sukriti Sharma
Date: Mon, 10 Jun 2024 17:02:36 -0600
Subject: [PATCH 2/2] Update README.md for Lora modules (#174)

Signed-off-by: Sukriti Sharma
---
 README.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 86c4eccf..e29f4544 100644
--- a/README.md
+++ b/README.md
@@ -287,6 +287,11 @@ For example for LLaMA model the modules look like:
 You can specify attention or linear layers. With the CLI, you can specify layers with `--target_modules "q_proj" "v_proj" "k_proj" "o_proj"` or `--target_modules "all-linear"`.
 
+#### Recommended target modules per model architecture
+As per the [LoRA paper](https://arxiv.org/pdf/2106.09685), section 4.2, adapting only the query and value projection matrices achieves reasonable quality with efficient GPU utilization. Hence, when deciding which modules to target with LoRA, we recommend starting with the query and value matrices. You can also refer to the defaults specified by the PEFT library for popular model architectures in [TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING](https://github.com/huggingface/peft/blob/7b1c08d2b5e13d3c99b7d6ee83eab90e1216d4ba/src/peft/utils/constants.py#L70) as a good starting point.
+
+_________________________
+
 ### Prompt Tuning:
 
 Specify `peft_method` to `'pt'` . You can additionally pass any arguments from [PromptTuningConfig](https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/peft_config.py#L63).
@@ -446,4 +451,4 @@ The above runs several tasks with `hendrycksTest-*` being MMLU.
 
 [Prompt Tuning on Twitter Complaints](examples/prompt_tuning_twitter_complaints/README.md)
 
-A good simple example can be found [here](examples/kfto-kueue-sft-trainer.yaml) which launches a Kubernetes-native `PyTorchJob` using the [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/) with [Kueue](https://github.com/kubernetes-sigs/kueue) for the queue management of tuning jobs.
\ No newline at end of file
+A good simple example can be found [here](examples/kfto-kueue-sft-trainer.yaml) which launches a Kubernetes-native `PyTorchJob` using the [Kubeflow Training Operator](https://github.com/kubeflow/training-operator/) with [Kueue](https://github.com/kubernetes-sigs/kueue) for the queue management of tuning jobs.
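
For readers applying the recommendation added in the second patch, below is a minimal sketch of targeting only the query and value projection matrices with the Hugging Face PEFT library. The checkpoint name and the hyperparameter values (`r`, `lora_alpha`, `lora_dropout`) are illustrative assumptions, not values taken from these patches or from fms-hf-tuning defaults.

```python
# Illustrative sketch (not part of the patches above): configure LoRA to adapt
# only the query and value projections, per the LoRA paper's section 4.2
# recommendation. Hyperparameters and model checkpoint are example choices.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # assumed checkpoint

lora_config = LoraConfig(
    r=8,                                  # adapter rank (example value)
    lora_alpha=16,                        # scaling factor (example value)
    target_modules=["q_proj", "v_proj"],  # query/value projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows only the LoRA adapters are trainable
```

Passing `--target_modules "q_proj" "v_proj"` to the fms-hf-tuning CLI, as described in the README hunk above, achieves the same module selection.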