## Model Quantization

This module provides an interactive way to quantize your model.
To quantize a model, run:
`python -m grag.quantize.quantize`

After running the above command, the user will be prompted with the following:

- Path where the user wants to clone the [llama.cpp](https://github.com/ggerganov/llama.cpp) repo
- Whether the module should download the model from [HuggingFace](https://huggingface.co/models) or the user already has it downloaded locally
- For the former, the user will be prompted to provide the repo path from HuggingFace
- For the latter, the user will be instructed to copy the model and input the name of the model directory
- Finally, the user will be prompted to enter the quantization level (Q5_K_M or Q4_K_M is recommended). For more options, check [here](https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp#L19).
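
For context, here is a minimal sketch of the manual steps this interactive flow automates: cloning and building llama.cpp, downloading a model from HuggingFace, converting it to llama.cpp's format, and quantizing it. The repo ID, paths, and tool names below are illustrative assumptions rather than the module's actual implementation, and llama.cpp's script and binary names have varied across versions.

```python
# Hypothetical sketch of the steps the interactive quantizer automates.
# The repo ID, paths, and tool names here are illustrative assumptions.
import subprocess
from pathlib import Path

from huggingface_hub import snapshot_download

llama_cpp = Path.home() / "llama.cpp"   # assumed clone location
hf_repo = "meta-llama/Llama-2-7b-hf"    # assumed HuggingFace repo path
quant_type = "Q5_K_M"                   # one of the recommended levels

# 1. Clone and build llama.cpp, which provides the convert and quantize tools.
subprocess.run(
    ["git", "clone", "https://github.com/ggerganov/llama.cpp", str(llama_cpp)],
    check=True,
)
subprocess.run(["make"], cwd=llama_cpp, check=True)

# 2. Download the model from HuggingFace (skipped when it is already local).
model_dir = Path(snapshot_download(repo_id=hf_repo))

# 3. Convert the checkpoint to llama.cpp's GGUF format; the convert script
#    in the llama.cpp versions assumed here writes a .gguf file into the
#    model directory.
subprocess.run(["python", str(llama_cpp / "convert.py"), str(model_dir)], check=True)

# 4. Quantize the converted model to the chosen level.
converted = next(model_dir.glob("*.gguf"))
quantized = model_dir / f"model-{quant_type}.gguf"
subprocess.run(
    [str(llama_cpp / "quantize"), str(converted), str(quantized), quant_type],
    check=True,
)
print(f"Quantized model written to {quantized}")
```

The interactive module wraps these choices in prompts, so running `python -m grag.quantize.quantize` remains the supported path; the sketch only illustrates what each prompt corresponds to.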