Skip to content

Commit

Permalink
address nits
Browse files Browse the repository at this point in the history
  • Loading branch information
jbkyang-nvi committed Oct 27, 2023
1 parent 0d27224 commit 94309a7
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions Popular_Models_Guide/Llama2/trtllm_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ TensorRT-LLM requires each model to be compiled for the configuration you need b
> located in the same llama examples folder.
>
> ```bash
> python3 /run.py --engine_dir=<path to your engine>/1-gpu/ --max_output_len 100 --tokenizer_dir <path to your llama repo>/Llama-2-7b-hf --input_text "How do I count to ten in French?"
> python3 run.py --engine_dir=<path to your engine>/1-gpu/ --max_output_len 100 --tokenizer_dir <path to your llama repo>/Llama-2-7b-hf --input_text "How do I count to ten in French?"
> ```

## Serving with Triton
Expand Down Expand Up @@ -125,9 +125,9 @@ To run our Llama2-7B model, you will need to:
```bash
tritonserver --model-repository=/opt/tritonserver/inflight_batcher_llm
```
Note if you built the engine with `--world-size X` where `X` is greater than 1, you will need to use the [launch_triton_server.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/scripts/launch_triton_server.py) script.
Note if you built the engine with `--world_size X` where `X` is greater than 1, you will need to use the [launch_triton_server.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/scripts/launch_triton_server.py) script.
```bash
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=4 --model_repo=/opt/tritonserver/inflight_batcher_llm
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=X --model_repo=/opt/tritonserver/inflight_batcher_llm
```

## Client
Expand Down

0 comments on commit 94309a7

Please sign in to comment.