
LoNAS

Official implementation of LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models.

This repo contains the code for LoNAS, a method that leverages Neural Architecture Search (NAS) to explore a space of elastic low-rank adapters. LoNAS compresses large language models while maintaining, and in some cases improving, performance, making them easier to deploy in resource-constrained environments. Please refer to our paper for more details.
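For intuition, an elastic low-rank adapter can be pictured as a standard LoRA module whose active rank is adjustable at runtime; the NAS phase then searches over these sub-adapter widths. The PyTorch sketch below is conceptual only (the actual implementation builds on NNCF's BootstrapNAS, see below), and names such as ElasticLoRALinear are hypothetical:

import torch
import torch.nn as nn

class ElasticLoRALinear(nn.Module):
    # Conceptual sketch: a LoRA adapter whose active rank can be shrunk.
    # Slicing lora_A / lora_B to the first r rows/columns yields a smaller
    # sub-adapter; a width search over elastic adapters explores such slices.
    def __init__(self, base: nn.Linear, max_rank: int = 32, alpha: float = 64.0):
        super().__init__()
        self.base = base  # frozen pretrained linear layer
        self.lora_A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.alpha = alpha
        self.active_rank = max_rank  # set externally by the search controller

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        delta = (x @ self.lora_A[:r].T) @ self.lora_B[:, :r].T
        return self.base(x) + (self.alpha / r) * delta

layer = ElasticLoRALinear(nn.Linear(4096, 4096), max_rank=32, alpha=64.0)
layer.active_rank = 16  # activate a rank-16 sub-adapter
out = layer(torch.randn(2, 4096))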

Setup

Use the following steps to set up an environment for LoNAS from scratch:

conda create -n lonas -y python=3.10
conda activate lonas

# install pytorch
pip install torch==2.1.2

# install dependencies
bash install.sh

Note: Whitespace warnings emitted while install.sh applies the patch can be safely ignored.
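Optionally, run a quick sanity check of the environment. This assumes install.sh has installed the patched dependencies, including NNCF (adjust if your setup differs):

import torch
import transformers
import nncf  # installed by install.sh

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("nncf:", nncf.__version__)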

Quick Start

Training

Taking the unified commonsense reasoning training as an example, first download the 15K instruction-following commonsense reasoning training data from LLM-Adapters and place it under DATA_PATH.

Example command to train a super-adapter of LLaMA-7B using LoNAS:

CUDA_VISIBLE_DEVICES=${DEVICES} python run_commonsense.py \
    --dataset_path $DATA_PATH/commonsense_15k.json \
    --model_name_or_path yahma/llama-7b-hf \
    --do_train \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 6 \
    --warmup_steps 100 \
    --optim adamw_torch \
    --fp16 \
    --output_dir trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense \
    --logging_steps 20 \
    --save_strategy epoch \
    --save_total_limit 2 \
    --lora \
    --lora_r 32 \
    --lora_alpha 64 \
    --lora_dropout 0.1 \
    --target_modules q_proj,k_proj,v_proj,up_proj,gate_proj,down_proj \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json

The --nncf_config argument points to the NNCF configuration, which defines the search space for the elastic adapters and for the targeted modules of the base model (e.g., q_proj). The elastic modules are implemented on top of the BootstrapNAS feature of OpenVINO™ NNCF. Because we employ NNCF's stage LR scheduler, the learning rate schedule is specified in the NNCF configuration file rather than in the TrainingArguments. For instance:

"schedule": {
    "list_stage_descriptions": [
        {"train_dims": ["width"], "epochs": 6, "depth_indicator": 1, "width_indicator": 5, "init_lr": 3e-4, "epochs_lr": 6, "sample_rate": 1}
    ]
},

For more details on the stage scheduler, see BootstrapNAS.md. After training, the weights of the trained super-adapter are saved to the --output_dir directory.
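To confirm what was saved, the adapter checkpoint can be inspected directly. The filename below assumes a PEFT-style adapter_model.bin; check your --output_dir for the actual name:

import torch

# Inspect the saved super-adapter (filename is an assumption; adjust as needed).
state = torch.load(
    "trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense/adapter_model.bin",
    map_location="cpu",
)
for name, tensor in list(state.items())[:8]:
    print(name, tuple(tensor.shape))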

Evaluation

All evaluation datasets can be downloaded from LLM-Adapters. Place them in the datasets/ directory:

git clone https://github.com/AGI-Edgerunners/LLM-Adapters.git
mv LLM-Adapters/dataset/ datasets/ 

Example command to evaluate the trained super-adapter (heuristic subnetwork):

CUDA_VISIBLE_DEVICES=${DEVICES} python run_commonsense.py \
    --dataset_path None \
    --model_name_or_path yahma/llama-7b-hf \
    --lora \
    --lora_weights trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json \
    --do_test \
    --output_dir trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense/results

This command evaluates the performance of the heuristic subnetwork across eight commonsense reasoning tasks: BoolQ, PIQA, SIQA, HellaSwag, WinoG, Arc-e, Arc-c, and OBQA.

Search

To discover better subnetworks within the trained super-network, LoNAS applies search algorithms on top of it. We leverage OpenVINO™ NNCF, which supports several search algorithms; the search settings are configured in the search field of the NNCF config, for example:

"search": {
    "algorithm": "NSGA2",
    "batchnorm_adaptation": {
        "num_bn_adaptation_samples": 0
    },
    "num_evals": 200,
    "population": 5,
    "ref_acc": 0.45,
    "acc_delta": 0.01
}
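Here, NSGA2 refers to NSGA-II, a multi-objective genetic algorithm: it evolves a population of subnetwork configurations toward the Pareto front that trades accuracy against model cost. As a conceptual illustration (not NNCF code), the non-dominated candidates the search converges toward can be computed like this:

# Conceptual illustration of the Pareto front NSGA-II converges toward.
# Each candidate is (accuracy, cost); higher accuracy and lower cost are better.
def pareto_front(candidates):
    front = []
    for acc, cost in candidates:
        dominated = any(a >= acc and c <= cost and (a, c) != (acc, cost)
                        for a, c in candidates)
        if not dominated:
            front.append((acc, cost))
    return sorted(front)

subnets = [(0.66, 1.7), (0.65, 1.4), (0.64, 1.2), (0.63, 1.5), (0.62, 1.1)]
print(pareto_front(subnets))  # -> non-dominated accuracy/cost trade-offs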

Further details can be found in BootstrapNAS.md. The following example command runs the search on the trained super-adapter:

CUDA_VISIBLE_DEVICES=${DEVICES} python run_commonsense.py \
    --dataset_path $DATA_PATH/commonsense_15k.json \
    --model_name_or_path yahma/llama-7b-hf \
    --lora \
    --lora_weights trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense \
    --val_set_size 1000 \
    --nncf_config nncf_config/unified_commonsense/nncf_lonas_llama_7b.json \
    --do_search \
    --output_dir trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense/search

The argument --val_set_size 1000 means that 1k validation samples are used to evaluate each discovered subnetwork. After running this command, the results for the 200 evaluated subnetworks ("num_evals": 200 in the search field of the NNCF config) are written to the --output_dir folder, including search_progression.png and search_progression.csv. From these results, we can select the subnetwork configuration that best meets a given requirement.
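For example, one could filter search_progression.csv for the most accurate subnetwork within a compute budget. The column names below are assumptions; check the header of the CSV your run actually produces:

import csv

# Pick the most accurate discovered subnetwork under a FLOPs budget.
# NOTE: column names ("accuracy", "flops", "subnet_config") are assumptions.
with open("trained_super_adapter/unified_commonsense/lonas-llama-7b-commonsense/search/search_progression.csv") as f:
    rows = list(csv.DictReader(f))

budget = 1.5e12  # e.g., at most 1.5 TFLOPs
feasible = [r for r in rows if float(r["flops"]) <= budget]
best = max(feasible, key=lambda r: float(r["accuracy"]))
print(best["subnet_config"], best["accuracy"], best["flops"])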

Released Models

| Name | Tasks | Base Model |
|---|---|---|
| lonas-bert-base-glue | RTE, MRPC, STS-B, CoLA, SST-2, QNLI, QQP, MNLI | bert-base-uncased |
| lonas-llama-7b-commonsense | Commonsense Reasoning | yahma/llama-7b-hf |
| lonas-bloomz-7b-math | Math Reasoning | bigscience/bloomz-7b1 |

Reproduce Results

Please refer to running_commands for all commands related to reproducing the paper's results.

• GLUE benchmark

| Method | Trainable Parameter Ratio | GFLOPs | RTE | MRPC | STS-B | CoLA | SST-2 | QNLI | QQP | MNLI | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LoRA | 0.27% | 11.2 | 65.85 | 84.46 | 88.73 | 57.58 | 92.06 | 90.62 | 89.41 | 83.00 | 81.46 |
| LoNAS | 0.27% | 8.0 | 70.76 | 88.97 | 88.28 | 61.12 | 93.23 | 91.21 | 88.55 | 82.00 | 83.02 |

• Commonsense Reasoning

| Method | Total Params. | TFLOPs | BoolQ | PIQA | SIQA | HellaSwag | WinoG | Arc-e | Arc-c | OBQA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LoRA | 6.7B | 1.7 | 62.6 | 75.3 | 67.9 | 52.9 | 58.6 | 79.2 | 58.3 | 71.2 | 65.8 |
| LoNAS | 5.6B | 1.4 | 62.9 | 73.0 | 68.7 | 51.4 | 63.9 | 72.3 | 58.5 | 71.0 | 65.2 |

• Math Reasoning

| Method | Total Params. | TFLOPs | GSM8K | AQuA | MAWPS | SVAMP | Average |
|---|---|---|---|---|---|---|---|
| LoRA | 7.1B | 1.8 | 17.4 | 21.3 | 70.2 | 41.0 | 37.5 |
| LoNAS | 6.1B | 1.5 | 18.6 | 22.0 | 76.5 | 31.8 | 37.2 |

Citation

@inproceedings{munoz2024lonas,
  title={LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models},
  author={J. Pablo Muñoz and Jinjie Yuan and Yi Zheng and Nilesh Jain},
  booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
  year={2024},
  url={https://aclanthology.org/2024.lrec-main.940/}
}

Acknowledgement

This work benefits from the following repositories: