Releases: foundation-model-stack/fms-hf-tuning
v2.0.1
New major features:
- Support for LoRA for the following model architectures - llama3, llama3.1, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral, and allam
- Support for QLoRA for the following model architectures - llama3, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral
- Addition of a post-processing function to format tuned adapters as required by vLLM for inference. Refer to README on how to run as a script. When tuning with the built image, post-processing can be enabled using the flag `lora_post_process_for_vllm`. See build README for details on how to set this flag.
- Enablement of new flags for throughput improvements: `padding_free` to process multiple examples without adding padding tokens, `multipack` for multi-GPU training to balance the number of tokens processed on each device, and `fast_kernels` for optimized tuning with fused operations and Triton kernels. See README for details on how to set these flags and use cases.
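To make the motivation for `padding_free` concrete, here is a minimal sketch comparing how many tokens a batch occupies when every example is padded to the longest one versus when examples are processed without pad tokens. The function name and the example lengths are hypothetical and this is not the project's implementation; it only illustrates the arithmetic behind the saving.

```python
def token_counts(lengths):
    """Return (padded, padding_free) token counts for one batch."""
    # Conventional batching: every example is padded to the longest one.
    padded = max(lengths) * len(lengths)
    # Padding-free processing: only the real tokens are counted.
    padding_free = sum(lengths)
    return padded, padding_free


lengths = [12, 40, 7, 25]  # hypothetical tokenized example lengths
padded, packed = token_counts(lengths)
print(f"padded: {padded} tokens, padding-free: {packed} tokens")
```

The wider the spread of example lengths in a batch, the larger the share of tokens that would otherwise be padding.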
Dependency upgrades:
- Upgraded `transformers` to version 4.44.2, needed for tuning of all models
- Upgraded `accelerate` to version 0.33, needed for tuning of all models. Version 0.34.0 has a bug for FSDP.
API/interface changes:
- `train()` API now returns a tuple of trainer instance and additional metadata as a dict
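For callers, the practical effect is that the result of `train()` has to be unpacked into two values. The sketch below uses a stub `train()` and a `FakeTrainer` class, both hypothetical stand-ins that mimic only the return shape described above; the real function's arguments and the contents of the metadata dict are not specified in these notes.

```python
from typing import Any, Dict, Tuple


class FakeTrainer:
    """Hypothetical stand-in for the trainer instance."""


def train(*args: Any, **kwargs: Any) -> Tuple[FakeTrainer, Dict[str, Any]]:
    # Stub that reproduces only the documented return shape:
    # (trainer instance, additional metadata as a dict).
    return FakeTrainer(), {}


# Unpack both values instead of treating the result as a single object.
trainer, additional_metadata = train()
print(type(trainer).__name__, additional_metadata)
```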
Additional features and fixes:
- Support for resuming tuning from an existing checkpoint. Refer to README on how to use it as a flag. Flag `resume_training` defaults to `True`.
- Addition of a default pad token in the tokenizer when `EOS` and `PAD` tokens are equal, to improve training quality.
- JSON compatibility for input datasets. See docs for details on data formats.
- Fix to not resize the embedding layer by default; the embedding layer can still be resized as needed using the flag `embedding_size_multiple_of`.
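As a rough illustration of what a flag like `embedding_size_multiple_of` implies, the sketch below rounds a vocabulary size up to the next multiple of a given value. It assumes the flag means "round the embedding size up to a multiple"; the helper name and the numbers are hypothetical, and the README remains the reference for the actual behavior.

```python
import math


def round_up_to_multiple(vocab_size: int, multiple_of: int) -> int:
    """Smallest multiple of `multiple_of` that is >= `vocab_size`."""
    return multiple_of * math.ceil(vocab_size / multiple_of)


# Hypothetical vocabulary of 32003 tokens.
print(round_up_to_multiple(32003, 8))  # rounded up to a multiple of 8
print(round_up_to_multiple(32003, 1))  # a multiple of 1 leaves the size unchanged
```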
Full List of What's Changed
- fix: do not resize embedding layer by default by @kmehant in #310
- fix: logger is unbound error by @HarikrishnanBalagopal in #308
- feat: Enable JSON dataset compatibility by @willmj in #297
- doc: How to tune LoRA lm_head by @aluu317 in #305
- docs: Add findings from exploration into model tuning performance degradation by @willmj in #315
- fix: warnings about casing when building the Docker image by @HarikrishnanBalagopal in #318
- fix: need to pass skip_prepare_dataset for pretokenized dataset due to breaking change in HF SFTTrainer by @HarikrishnanBalagopal in #326
- feat: install fms-acceleration to enable qlora by @anhuong in #284
- feat: Migrating the trainer controller to python logger by @seshapad in #309
- fix: remove fire ported from Hari's PR #303 by @HarikrishnanBalagopal in #324
- dep: cap transformers version due to FSDP bug by @anhuong in #335
- deps: Add protobuf to support aLLaM models by @willmj in #336
- fix: add enable_aim build args in all stages needed by @anhuong in #337
- fix: remove lm_head post processing by @Abhishek-TAMU in #333
- doc: Add qLoRA README by @aluu317 in #322
- feat: Add deps to evaluate qLora tuned model by @aluu317 in #312
- feat: Add support for smoothly resuming training from a saved checkpoint by @Abhishek-TAMU in #300
- ci: add a github workflow to label pull requests based on their title by @HarikrishnanBalagopal in #298
- fix: Addition of default pad token in tokenizer when EOS and PAD token are equal by @Abhishek-TAMU in #343
- feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels by @achew010 in #280
- fix: cap transformers at v4.44 by @anhuong in #349
- fix: utilities to post process checkpoint for LoRA by @Ssukriti in #338
- feat: Add post processing logic to accelerate launch by @willmj in #351
- build: install additional fms-acceleration plugins by @anhuong in #350
- fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check by @Abhishek-TAMU in #352
- fix: check for wte.weight along with embed_tokens.weight by @willmj in #356
- release: merge set of changes for v2.0.0 by @Abhishek-TAMU in #357
- build(deps): unset hardcoded trl version to get latest updates by @anhuong in #358
Full Changelog: v1.2.2...v2.0.0
v2.0.0
v2.0.0-rc.2
What's Changed
Full Changelog: v2.0.0-rc.1...v2.0.0-rc.2
v2.0.0-rc.1
What's Changed
- fix: do not resize embedding layer by default by @kmehant in #310
- fix: logger is unbound error by @HarikrishnanBalagopal in #308
- feat: Enable JSON dataset compatibility by @willmj in #297
- doc: How to tune LoRA lm_head by @aluu317 in #305
- docs: Add findings from exploration into model tuning performance degradation by @willmj in #315
- fix: warnings about casing when building the Docker image by @HarikrishnanBalagopal in #318
- fix: need to pass skip_prepare_dataset for pretokenized dataset due to breaking change in HF SFTTrainer by @HarikrishnanBalagopal in #326
- feat: install fms-acceleration to enable qlora by @anhuong in #284
- feat: Migrating the trainer controller to python logger by @seshapad in #309
- fix: remove fire ported from Hari's PR #303 by @HarikrishnanBalagopal in #324
- dep: cap transformers version due to FSDP bug by @anhuong in #335
- deps: Add protobuf to support aLLaM models by @willmj in #336
- fix: add enable_aim build args in all stages needed by @anhuong in #337
- fix: remove lm_head post processing by @Abhishek-TAMU in #333
- doc: Add qLoRA README by @aluu317 in #322
- feat: Add deps to evaluate qLora tuned model by @aluu317 in #312
- feat: Add support for smoothly resuming training from a saved checkpoint by @Abhishek-TAMU in #300
- ci: add a github workflow to label pull requests based on their title by @HarikrishnanBalagopal in #298
- fix: Addition of default pad token in tokenizer when EOS and PAD token are equal by @Abhishek-TAMU in #343
- feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels by @achew010 in #280
- fix: cap transformers at v4.44 by @anhuong in #349
- fix: utilities to post process checkpoint for LoRA by @Ssukriti in #338
- feat: Add post processing logic to accelerate launch by @willmj in #351
- build: install additional fms-acceleration plugins by @anhuong in #350
- fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check by @Abhishek-TAMU in #352
Full Changelog: v1.2.1...v2.0.0-rc.1
v1.2.2
v1.2.1
v1.2.1-rc.1
What's Changed
Full Changelog: v1.2.0...v1.2.1-rc.1
v1.2.0
Dependency Updates
- Update packaging requirement from `>=23.2,<24` to `>=23.2,<25`
API/Interface Changes
- Add optional save_model_dir where final checkpoint is saved. See https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/README.md#saving-checkpoints-while-training
Full List of What's Changed
- Add config_utils tests by @aluu317 in #262
- bug: On save event added to callback by @seshapad in #256
- feat: All metric handling changes by @seshapad in #263
- feat: Configuration to set logging level for trigger log by @seshapad in #241
- Data custom collator by @Ssukriti in #260
- feat: per process state metric by @HarikrishnanBalagopal in #239
- feat: Add a dockerfile argument to enable aimstack by @dushyantbehl in #261
- Set default value of target_modules to be None in LoraConfig by @willmj in #269
- feat: Support pretokenized by @kmehant in #272
- Update packaging requirement from <24,>=23.2 to >=23.2,<25 by @dependabot in #212
- Enabling tests for prompt tuning by @Abhishek-TAMU in #278
- fix: do not add special tokens for custom tokenizer by @kmehant in #279
- fix: bug where the logger was not being used properly by @HarikrishnanBalagopal in #286
- Add functionality to free disk space from Github Actions by @willmj in #287
- Add unit test to verify target_modules defaults correctly by @willmj in #281
- docs: Add documentation on experiment tracking. by @dushyantbehl in #257
- Ensure additional metadata to trackers don't throw error in happy case. by @dushyantbehl in #290
- fix: multiple runid creation bug with distributed training by @dushyantbehl in #268
- feat: logging control operation by @seshapad in #264
- fix run evaluation to get base model path by @anhuong in #273
- Fix: Removal of transformers logger and addition of python native logger by @Abhishek-TAMU in #270
- feat: Added additional events such as on_step_begin, on_optimizer_step, on_substep_end by @seshapad in #293
- Always update setuptools to latest by @jbusche in #288
- Rename all fixtures with correct .jsonl extension by @willmj in #295
- feat: add save_model_dir flag where final checkpoint saved by @anhuong in #291
- feat: Example log controller yaml with training state by @seshapad in #296
Full Changelog: v1.1.0...v1.2.0
v1.2.0-rc.1
What's Changed
- Add config_utils tests by @aluu317 in #262
- bug: On save event added to callback by @seshapad in #256
- feat: All metric handling changes by @seshapad in #263
- feat: Configuration to set logging level for trigger log by @seshapad in #241
- deps: limit peft deps by @anhuong in #274
- Data custom collator by @Ssukriti in #260
- Revert "limit peft deps until investigate (#274)" by @anhuong in #275
- feat: per process state metric by @HarikrishnanBalagopal in #239
- feat: Add a dockerfile argument to enable aimstack by @dushyantbehl in #261
- Set default value of target_modules to be None in LoraConfig by @willmj in #269
- feat: Support pretokenized by @kmehant in #272
- Update packaging requirement from <24,>=23.2 to >=23.2,<25 by @dependabot in #212
- Enabling tests for prompt tuning by @Abhishek-TAMU in #278
- fix: do not add special tokens for custom tokenizer by @kmehant in #279
- fix: bug where the logger was not being used properly by @HarikrishnanBalagopal in #286
- Add functionality to free disk space from Github Actions by @willmj in #287
- Add unit test to verify target_modules defaults correctly by @willmj in #281
- docs: Add documentation on experiment tracking. by @dushyantbehl in #257
- Ensure additional metadata to trackers don't throw error in happy case. by @dushyantbehl in #290
- fix: multiple runid creation bug with distributed training by @dushyantbehl in #268
- feat: logging control operation by @seshapad in #264
- fix run evaluation to get base model path by @anhuong in #273
- Fix: Removal of transformers logger and addition of python native logger by @Abhishek-TAMU in #270
- FIX: Metrics file epoch indexing starting from 0 by @Abhishek-TAMU in #294
- feat: Added additional events such as on_step_begin, on_optimizer_step, on_substep_end by @seshapad in #293
- Always update setuptools to latest by @jbusche in #288
- Rename all fixtures with correct .jsonl extension by @willmj in #295
- feat: add save_model_dir flag where final checkpoint saved by @anhuong in #291
Full Changelog: v1.1.0-rc.1...v1.2.0-rc.1
v1.1.0
What's Changed
- fix: Added correct link in main readme for the trainer-controller readme by @seshapad in #254
- trainer controller doc updates by @alex-jw-brooks in #244
- docs: fix the instructions for running with LORA by @HarikrishnanBalagopal in #265
- refactor code to preprocess datasets by @Ssukriti in #259
- Replace shutil.copytree() to fix permission error by @olson-ibm in #251
- fix: logic for getting tracker config by @HarikrishnanBalagopal in #267
- fix: remove lm_head for granite with llama arch models by @anhuong in #258
Full Changelog: v1.0.0...v1.1.0