This repository has been archived by the owner on Oct 9, 2024. It is now read-only.

Are there fine-tuning and inference scripts available for int4 quantization in bloom-7b? Is it possible to limit the GPU memory usage to within 10GB? #94

Open
dizhenx opened this issue May 31, 2023 · 1 comment


dizhenx commented May 31, 2023

Where can I download bloom-7b?
I noticed that int8 quantization is available, but is there an option for int4 quantization?
What is the memory overhead of int4 and int8 when fine-tuning with LoRA or P-Tuning, and are any fine-tuning scripts available?
Additionally, are there inference scripts for int4 quantization? How much GPU memory is required for int4 and int8 inference, respectively?

@mayank31398 (Collaborator) commented:

This is not possible here.
But you might want to take a look at the QLoRA paper and repo: https://github.com/artidoro/qlora
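On the 10GB question, a rough back-of-envelope estimate (not from this thread; it assumes bloom-7b has roughly 7.1B parameters and counts only weight storage, ignoring activations, KV cache, and framework overhead) suggests int8 weights alone fit comfortably under 10GB, and int4 halves that:

```python
# Back-of-envelope GPU memory estimate for a ~7.1B-parameter model
# under int8 vs. int4 weight quantization. Weights only: real usage
# adds activations, KV cache, and framework overhead.

PARAMS = 7.1e9  # assumed approximate parameter count of bloom-7b

def weight_memory_gib(params: float, bits_per_param: int) -> float:
    """GiB needed just to hold the quantized weights."""
    return params * bits_per_param / 8 / 2**30

int8_gib = weight_memory_gib(PARAMS, 8)  # about 6.6 GiB
int4_gib = weight_memory_gib(PARAMS, 4)  # about 3.3 GiB
print(f"int8 weights: {int8_gib:.1f} GiB, int4 weights: {int4_gib:.1f} GiB")
```

These are weights-only lower bounds; actual inference memory is higher, and fine-tuning (even with LoRA-style adapters, as in QLoRA) adds optimizer state and gradients on top.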
