Add low cpu mem support for LLM instantiation #3460

Closed
arnavgarg1 wants to merge 1 commit

Conversation

arnavgarg1 (Contributor)

Reduces memory overhead when LLMs are loaded into CPU memory by enabling low_cpu_mem_usage, which loads the model using ~1x model size in CPU memory. See https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained for more info.

This also enables half-precision loading by default, which seems like a reasonable thing to do. It uses float16 since that always works, unlike bf16, which has hardware restrictions.
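As a rough sketch of the loading pattern this change describes (the checkpoint name below is illustrative, not part of this PR; both keyword arguments are the documented `from_pretrained` options referenced above):

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any causal LM on the Hugging Face Hub loads the same way.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",
    low_cpu_mem_usage=True,     # peak CPU RAM stays around 1x model size instead of ~2x
    torch_dtype=torch.float16,  # half precision; fp16 avoids bf16's hardware restrictions
)
```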

@arnavgarg1 arnavgarg1 requested review from geoffreyangus, tgaddair, justinxzhao and jppgks and removed request for geoffreyangus July 12, 2023 10:18
@arnavgarg1 arnavgarg1 changed the title Add low cpu mem support for LLMs Add low cpu mem support for LLM instantiation Jul 12, 2023
github-actions bot commented Jul 12, 2023

Unit Test Results

4 files ±0 · 4 suites ±0 · 51m 41s ⏱️ (−4m 14s)
34 tests ±0 · 29 ✔️ ±0 · 5 💤 ±0 · 0 ❌ ±0
68 runs ±0 · 58 ✔️ ±0 · 10 💤 ±0 · 0 ❌ ±0

Results for commit 78b9a9e. ± Comparison against base commit 60f1416.

♻️ This comment has been updated with latest results.

@arnavgarg1 arnavgarg1 closed this Aug 4, 2023