Releases · foundation-model-stack/fms-hf-tuning

01 Oct 16:07

Abhishek-TAMU

v2.0.1

9b8245e

v2.0.1 Latest

New major features:

Support for LoRA for the following model architectures: llama3, llama3.1, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral, and allam
Support for QLoRA for the following model architectures: llama3, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral
Addition of a post-processing function that formats tuned adapters as required by vLLM for inference. Refer to the README on how to run it as a script. When tuning with the image, post-processing can be enabled using the flag lora_post_process_for_vllm. See the build README for details on how to set this flag.
Enablement of new flags for throughput improvements: padding_free to process multiple examples without adding padding tokens, multipack for multi-GPU training to balance the number of tokens processed on each device, and fast_kernels for optimized tuning with fused operations and Triton kernels. See the README for details on how to set these flags and for use cases.
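
To illustrate the idea behind padding_free, here is a minimal conceptual sketch in plain Python. It is not the plugin's implementation: the helper names pad_batch and flatten_batch are hypothetical, and the token ids are made up. It only compares how many token positions a padded batch occupies with a flattened batch that records sequence boundaries instead.

```python
from typing import List, Tuple


def pad_batch(examples: List[List[int]], pad_id: int) -> List[List[int]]:
    """Conventional batching: pad every example to the longest length."""
    max_len = max(len(ex) for ex in examples)
    return [ex + [pad_id] * (max_len - len(ex)) for ex in examples]


def flatten_batch(examples: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Padding-free batching (conceptual): concatenate the examples and
    keep cumulative sequence lengths so boundaries can be recovered."""
    flat: List[int] = []
    boundaries = [0]
    for ex in examples:
        flat.extend(ex)
        boundaries.append(len(flat))
    return flat, boundaries


# Made-up token ids, for illustration only.
examples = [[5, 6, 7, 8], [9], [10, 11]]

padded = pad_batch(examples, pad_id=0)
flat, boundaries = flatten_batch(examples)

padded_tokens = sum(len(row) for row in padded)
print("token positions with padding:", padded_tokens)
print("token positions without padding:", len(flat))
print("sequence boundaries:", boundaries)
```

The gap between the two counts grows with the spread of example lengths in a batch, which is where a padding-free approach saves work.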

Dependency upgrades:

Upgraded transformers to version 4.44.2, which is needed for tuning all models.
Upgraded accelerate to version 0.33, which is needed for tuning all models. Version 0.34.0 has a bug affecting FSDP.

API/interface changes:

The train() API now returns a tuple of the trainer instance and additional metadata as a dict.
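
A minimal sketch of what this change means for callers. The real train() signature and the keys of the metadata dict are not given in these notes, so the example uses a hypothetical stand-in function, train_stub, and a placeholder StubTrainer class purely to show the unpacking pattern.

```python
from typing import Any, Dict, Tuple


class StubTrainer:
    """Hypothetical placeholder for the trainer instance."""


def train_stub() -> Tuple[StubTrainer, Dict[str, Any]]:
    """Hypothetical stand-in that mimics the new return shape:
    (trainer instance, additional metadata as a dict)."""
    return StubTrainer(), {"example_key": "example_value"}


# Unpack both elements of the returned tuple.
trainer, metadata = train_stub()

print(type(trainer).__name__)
print(metadata)
```

Code that binds the result to a single name receives the whole tuple, so existing call sites may need to unpack it as shown.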

Additional features and fixes

Support for resuming tuning from an existing checkpoint. Refer to the README on how to use it as a flag. The flag resume_training defaults to True.
Addition of a default pad token in the tokenizer when the EOS and PAD tokens are equal, to improve training quality.
JSON compatibility for input datasets. See the docs for details on data formats.
Fix to not resize the embedding layer by default. The embedding layer can still be resized as needed using the flag embedding_size_multiple_of.
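
As a rough illustration of what resizing to a multiple involves, the sketch below rounds a vocabulary size up to the next multiple of a given number. This is an assumption about the arithmetic the flag name embedding_size_multiple_of suggests, not a description of the library's code; the helper round_up_to_multiple and the sizes used are hypothetical.

```python
def round_up_to_multiple(vocab_size: int, multiple_of: int) -> int:
    """Round vocab_size up to the nearest multiple of multiple_of.

    Hypothetical helper: it assumes the flag pads the embedding size
    upward, which these release notes do not spell out.
    """
    if multiple_of < 1:
        raise ValueError("multiple_of must be a positive integer")
    # Ceiling division using integer arithmetic only.
    return -(-vocab_size // multiple_of) * multiple_of


# Made-up vocabulary sizes, for illustration only.
print(round_up_to_multiple(32003, 8))   # rounds up to the next multiple of 8
print(round_up_to_multiple(32000, 8))   # already a multiple, unchanged
print(round_up_to_multiple(32003, 1))   # a multiple of 1 leaves the size as-is
```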

Full List of What's Changed

fix: do not resize embedding layer by default by @kmehant in #310
fix: logger is unbound error by @HarikrishnanBalagopal in #308
feat: Enable JSON dataset compatibility by @willmj in #297
doc: How to tune LoRA lm_head by @aluu317 in #305
docs: Add findings from exploration into model tuning performance degradation by @willmj in #315
fix: warnings about casing when building the Docker image by @HarikrishnanBalagopal in #318
fix: need to pass skip_prepare_dataset for pretokenized dataset due to breaking change in HF SFTTrainer by @HarikrishnanBalagopal in #326
feat: install fms-acceleration to enable qlora by @anhuong in #284
feat: Migrating the trainer controller to python logger by @seshapad in #309
fix: remove fire ported from Hari's PR #303 by @HarikrishnanBalagopal in #324
dep: cap transformers version due to FSDP bug by @anhuong in #335
deps: Add protobuf to support aLLaM models by @willmj in #336
fix: add enable_aim build args in all stages needed by @anhuong in #337
fix: remove lm_head post processing by @Abhishek-TAMU in #333
doc: Add qLoRA README by @aluu317 in #322
feat: Add deps to evaluate qLora tuned model by @aluu317 in #312
feat: Add support for smoothly resuming training from a saved checkpoint by @Abhishek-TAMU in #300
ci: add a github workflow to label pull requests based on their title by @HarikrishnanBalagopal in #298
fix: Addition of default pad token in tokenizer when EOS and PAD token are equal by @Abhishek-TAMU in #343
feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels by @achew010 in #280
fix: cap transformers at v4.44 by @anhuong in #349
fix: utilities to post process checkpoint for LoRA by @Ssukriti in #338
feat: Add post processing logic to accelerate launch by @willmj in #351
build: install additional fms-acceleration plugins by @anhuong in #350
fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check by @Abhishek-TAMU in #352
fix: check for wte.weight along with embed_tokens.weight by @willmj in #356
release: merge set of changes for v2.0.0 by @Abhishek-TAMU in #357
build(deps): unset hardcoded trl version to get latest updates by @anhuong in #358

New Contributors

@achew010 made their first contribution in #280

Full Changelog: v1.2.2...v2.0.0

Contributors

aluu317, kmehant, and 7 other contributors

30 Sep 21:03

Abhishek-TAMU

v2.0.0

3b150ab

v2.0.0

This version has an outdated dependency; users should move to v2.0.1 instead.

27 Sep 23:08

Ssukriti

v2.0.0-rc.2

a37f074

v2.0.0-rc.2 Pre-release

What's Changed

fix: check for wte.weight along with embed_tokens.weight by @willmj in #356

Full Changelog: v2.0.0-rc.1...v2.0.0-rc.2

Contributors

willmj

27 Sep 17:17

Abhishek-TAMU

v2.0.0-rc.1

0c6a062

v2.0.0-rc.1 Pre-release

What's Changed

fix: do not resize embedding layer by default by @kmehant in #310
fix: logger is unbound error by @HarikrishnanBalagopal in #308
feat: Enable JSON dataset compatibility by @willmj in #297
doc: How to tune LoRA lm_head by @aluu317 in #305
docs: Add findings from exploration into model tuning performance degradation by @willmj in #315
fix: warnings about casing when building the Docker image by @HarikrishnanBalagopal in #318
fix: need to pass skip_prepare_dataset for pretokenized dataset due to breaking change in HF SFTTrainer by @HarikrishnanBalagopal in #326
feat: install fms-acceleration to enable qlora by @anhuong in #284
feat: Migrating the trainer controller to python logger by @seshapad in #309
fix: remove fire ported from Hari's PR #303 by @HarikrishnanBalagopal in #324
dep: cap transformers version due to FSDP bug by @anhuong in #335
deps: Add protobuf to support aLLaM models by @willmj in #336
fix: add enable_aim build args in all stages needed by @anhuong in #337
fix: remove lm_head post processing by @Abhishek-TAMU in #333
doc: Add qLoRA README by @aluu317 in #322
feat: Add deps to evaluate qLora tuned model by @aluu317 in #312
feat: Add support for smoothly resuming training from a saved checkpoint by @Abhishek-TAMU in #300
ci: add a github workflow to label pull requests based on their title by @HarikrishnanBalagopal in #298
fix: Addition of default pad token in tokenizer when EOS and PAD token are equal by @Abhishek-TAMU in #343
feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels by @achew010 in #280
fix: cap transformers at v4.44 by @anhuong in #349
fix: utilities to post process checkpoint for LoRA by @Ssukriti in #338
feat: Add post processing logic to accelerate launch by @willmj in #351
build: install additional fms-acceleration plugins by @anhuong in #350
fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check by @Abhishek-TAMU in #352

New Contributors

@achew010 made their first contribution in #280

Full Changelog: v1.2.1...v2.0.0-rc.1

Contributors

aluu317, kmehant, and 7 other contributors

03 Sep 21:48

willmj

v1.2.2

16543ee

v1.2.2

What's Changed

deps: Add protobuf to support ALLaM models by @willmj in #328
deps: set previous versions for accelerate and trl for patch release by @willmj in #329

Full Changelog: v1.2.1...v1.2.2

Contributors

willmj

19 Aug 17:20

willmj

v1.2.1

a6d093e

v1.2.1

What's Changed

fix: setting log level in save() by @anhuong in #304

Full Changelog: v1.2.0...v1.2.1

Contributors

anhuong

16 Aug 16:00

willmj

v1.2.1-rc.1

a6d093e

v1.2.1-rc.1 Pre-release

What's Changed

fix: setting log level in save() by @anhuong in #304

Full Changelog: v1.2.0...v1.2.1-rc.1

Contributors

anhuong

14 Aug 22:38

willmj

v1.2.0

2d1c17c

v1.2.0

Dependency Updates

Update packaging requirement from >=23.2,<24 to >=23.2,<25

API/Interface Changes

Add an optional save_model_dir flag that sets where the final checkpoint is saved. See https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/README.md#saving-checkpoints-while-training

Full List of What's Changed

Add config_utils tests by @aluu317 in #262
bug: On save event added to callback by @seshapad in #256
feat: All metric handling changes by @seshapad in #263
feat: Configuration to set logging level for trigger log by @seshapad in #241
Data custom collator by @Ssukriti in #260
feat: per process state metric by @HarikrishnanBalagopal in #239
feat: Add a dockerfile argument to enable aimstack by @dushyantbehl in #261
Set default value of target_modules to be None in LoraConfig by @willmj in #269
feat: Support pretokenized by @kmehant in #272
Update packaging requirement from <24,>=23.2 to >=23.2,<25 by @dependabot in #212
Enabling tests for prompt tuning by @Abhishek-TAMU in #278
fix: do not add special tokens for custom tokenizer by @kmehant in #279
fix: bug where the logger was not being used properly by @HarikrishnanBalagopal in #286
Add functionality to free disk space from Github Actions by @willmj in #287
Add unit test to verify target_modules defaults correctly by @willmj in #281
docs: Add documentation on experiment tracking. by @dushyantbehl in #257
Ensure additional metadata to trackers don't throw error in happy case. by @dushyantbehl in #290
fix: multiple runid creation bug with distributed training by @dushyantbehl in #268
feat: logging control operation by @seshapad in #264
fix run evaluation to get base model path by @anhuong in #273
Fix: Removal of transformers logger and addition of python native logger by @Abhishek-TAMU in #270
feat: Added additional events such as on_step_begin, on_optimizer_step, on_substep_end by @seshapad in #293
Always update setuptools to latest by @jbusche in #288
Rename all fixtures with correct .jsonl extension by @willmj in #295
feat: add save_model_dir flag where final checkpoint saved by @anhuong in #291
feat: Example log controller yaml with training state by @seshapad in #296

New Contributors

@aluu317 made their first contribution in #262
@willmj made their first contribution in #269

Full Changelog: v1.1.0...v1.2.0

Contributors

aluu317, dushyantbehl, and 9 other contributors

14 Aug 13:34

willmj

v1.2.0-rc.1

78909af

v1.2.0-rc.1 Pre-release

What's Changed

Add config_utils tests by @aluu317 in #262
bug: On save event added to callback by @seshapad in #256
feat: All metric handling changes by @seshapad in #263
feat: Configuration to set logging level for trigger log by @seshapad in #241
deps: limit peft deps by @anhuong in #274
Data custom collator by @Ssukriti in #260
Revert "limit peft deps until investigate (#274)" by @anhuong in #275
feat: per process state metric by @HarikrishnanBalagopal in #239
feat: Add a dockerfile argument to enable aimstack by @dushyantbehl in #261
Set default value of target_modules to be None in LoraConfig by @willmj in #269
feat: Support pretokenized by @kmehant in #272
Update packaging requirement from <24,>=23.2 to >=23.2,<25 by @dependabot in #212
Enabling tests for prompt tuning by @Abhishek-TAMU in #278
fix: do not add special tokens for custom tokenizer by @kmehant in #279
fix: bug where the logger was not being used properly by @HarikrishnanBalagopal in #286
Add functionality to free disk space from Github Actions by @willmj in #287
Add unit test to verify target_modules defaults correctly by @willmj in #281
docs: Add documentation on experiment tracking. by @dushyantbehl in #257
Ensure additional metadata to trackers don't throw error in happy case. by @dushyantbehl in #290
fix: multiple runid creation bug with distributed training by @dushyantbehl in #268
feat: logging control operation by @seshapad in #264
fix run evaluation to get base model path by @anhuong in #273
Fix: Removal of transformers logger and addition of python native logger by @Abhishek-TAMU in #270
FIX: Metrics file epoch indexing starting from 0 by @Abhishek-TAMU in #294
feat: Added additional events such as on_step_begin, on_optimizer_step, on_substep_end by @seshapad in #293
Always update setuptools to latest by @jbusche in #288
Rename all fixtures with correct .jsonl extension by @willmj in #295
feat: add save_model_dir flag where final checkpoint saved by @anhuong in #291

New Contributors

@aluu317 made their first contribution in #262
@willmj made their first contribution in #269

Full Changelog: v1.1.0-rc.1...v1.2.0-rc.1

Contributors

aluu317, dushyantbehl, and 9 other contributors

01 Aug 00:45

jbusche

v1.1.0

ab3b331

v1.1.0

What's Changed

fix: Added correct link in main readme for the trainer-controller readme by @seshapad in #254
trainer controller doc updates by @alex-jw-brooks in #244
docs: fix the instructions for running with LORA by @HarikrishnanBalagopal in #265
refactor code to preprocess datasets by @Ssukriti in #259
Replace shutil.copytree() to fix permission error by @olson-ibm in #251
fix: logic for getting tracker config by @HarikrishnanBalagopal in #267
fix: remove lm_head for granite with llama arch models by @anhuong in #258

Full Changelog: v1.0.0...v1.1.0

Contributors

alex-jw-brooks, Ssukriti, and 4 other contributors
