Add low cpu mem support for LLM instantiation #3460

arnavgarg1 · 2023-07-12T10:18:10Z

Reduces memory overhead when LLMs are loaded into CPU memory by enabling `low_cpu_mem_usage`, which loads the model using ~1x the model size in CPU memory. See https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained for more info.
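
To make the ~1x figure concrete, here is a rough back-of-the-envelope sketch. The helper name `estimate_peak_cpu_gib` is hypothetical and not part of this PR. It assumes the default loading path peaks at about 2x the model size (one copy for the initialized model, one for the loaded checkpoint), which is an approximation rather than a measured number.

```python
def estimate_peak_cpu_gib(num_params: int, bytes_per_param: int, low_cpu_mem_usage: bool) -> float:
    """Rough peak CPU memory (GiB) while loading a model.

    Assumption: the default path holds ~2 copies of the weights at peak,
    while low_cpu_mem_usage=True holds ~1 copy.
    """
    model_bytes = num_params * bytes_per_param
    copies = 1 if low_cpu_mem_usage else 2
    return copies * model_bytes / 2**30


# Example: a model with 2**30 parameters stored as float32 (4 bytes each).
default_peak = estimate_peak_cpu_gib(2**30, 4, low_cpu_mem_usage=False)
low_mem_peak = estimate_peak_cpu_gib(2**30, 4, low_cpu_mem_usage=True)
print(f"default: {default_peak:.1f} GiB, low_cpu_mem_usage: {low_mem_peak:.1f} GiB")
```

Real peak usage also depends on checkpoint sharding and other allocations, so treat this only as an order-of-magnitude estimate.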

This also enables half-precision loading by default, which seems like a reasonable default. It uses float16 rather than bf16 because float16 works regardless of hardware, whereas bf16 is only supported on some hardware.
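
A minimal sketch of what the two settings look like when passed to `from_pretrained`. This is not the diff in this PR: `build_load_kwargs` and `load_llm` are hypothetical helper names, and the use of `AutoModelForCausalLM` is an assumption made for illustration. `low_cpu_mem_usage` and `torch_dtype` are the actual `from_pretrained` arguments; `low_cpu_mem_usage=True` additionally requires the `accelerate` package to be installed.

```python
from typing import Any, Dict


def build_load_kwargs(half_precision: bool = True) -> Dict[str, Any]:
    """Collect the from_pretrained kwargs described above (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"low_cpu_mem_usage": True}
    if half_precision:
        # Stored as a dtype name here and resolved to a torch dtype at load time.
        kwargs["torch_dtype"] = "float16"
    return kwargs


def load_llm(model_name: str, half_precision: bool = True):
    """Load a causal LM with low CPU memory usage (illustrative only)."""
    # Imported lazily so the kwargs helper can be used without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    kwargs = build_load_kwargs(half_precision)
    if "torch_dtype" in kwargs:
        kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    return AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
```

Calling `load_llm("<model name>")` would then load the weights in float16 with the low-memory path enabled; passing `half_precision=False` keeps the checkpoint's default precision handling.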

github-actions · 2023-07-12T11:28:01Z

Unit Test Results

  4 files ±0   4 suites ±0 51m 41s ⏱️ - 4m 14s
34 tests ±0 29 ✔️ ±0   5 💤 ±0 0 ❌ ±0
68 runs ±0 58 ✔️ ±0 10 💤 ±0 0 ❌ ±0

Results for commit 78b9a9e. ± Comparison against base commit 60f1416.

♻️ This comment has been updated with latest results.

Add low cpu mem support for LLMs

78b9a9e

arnavgarg1 requested review from geoffreyangus, tgaddair, justinxzhao and jppgks and removed request for geoffreyangus July 12, 2023 10:18

arnavgarg1 changed the title ~~Add low cpu mem support for LLMs~~ Add low cpu mem support for LLM instantiation Jul 12, 2023

arnavgarg1 closed this Aug 4, 2023
