
Update README.md
IlyasMoutawwakil authored Feb 22, 2024
1 parent 4a0e7d4 commit a87caf2
Showing 1 changed file with 6 additions and 5 deletions.
README.md: 11 changes (6 additions & 5 deletions)
@@ -1,6 +1,6 @@
# Py-TGI

-Py-TGI is a Python wrapper around [TGI](https://github.com/huggingface/text-generation-inference) to enable creating and running TGI servers in a similar style to vLLM.
+Py-TGI is a Python wrapper around [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference) that enables creating and running TGI instances through the awesome `docker-py`, in a similar style to the Transformers API.

## Installation

@@ -10,17 +10,18 @@ pip install py-tgi

## Usage

-Py-TGI is designed to be used in a similar way to vLLM. Here's an example of how to use it:
+Py-TGI is designed to be used in a similar way to the Transformers API. We use `docker-py` (instead of a dirty `subprocess` solution) so that the containers you run are linked to the main process and are stopped automatically when your code finishes or fails.
+Here's an example of how to use it:

```python
from py_tgi import TGI
from py_tgi.utils import is_nvidia_system, is_rocm_system

llm = TGI(
    model="TheBloke/Llama-2-7B-AWQ",  # awq model checkpoint
-    devices=["/dev/kfd", "/dev/dri"] if is_rocm_system() else None,  # custom devices (ROCm)
-    gpus="all" if is_nvidia_system() else None,  # all gpus (NVIDIA)
-    quantize="gptq",  # use exllama kernels (rocm compatible)
+    quantize="gptq",  # use exllama kernels (awq compatible)
+    devices=["/dev/kfd", "/dev/dri"] if is_rocm_system() else None,
+    gpus="all" if is_nvidia_system() else None,
)
output = llm.generate(["Hi, I'm a language model", "I'm fine, how are you?"])
print(output)
```
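
As context for the `docker-py` rationale in the new Usage paragraph: the auto-cleanup behavior it describes can be sketched with plain `docker-py`. This is a minimal illustration, not Py-TGI's actual implementation; the TGI image tag, port mapping, and `atexit` hook are assumptions made for the sketch.

```python
# Minimal sketch (not Py-TGI's actual code) of the docker-py pattern the
# commit message describes: a container tied to the main Python process,
# cleaned up automatically when the script finishes or fails.
import atexit

import docker

client = docker.from_env()

# Start a TGI server container in the background (detached),
# mapping its port 80 to localhost:8080 (illustrative values).
container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",
    detach=True,
    ports={"80/tcp": 8080},
)

# atexit handlers run on normal interpreter shutdown, including after an
# unhandled exception, so the container does not outlive the script.
atexit.register(container.remove, force=True)
```

This is the lifecycle guarantee the new paragraph claims for `docker-py`, in contrast to a detached `subprocess` that could leave the server running after a crash.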
