Commit: Doc updates
fpetrini15 committed Aug 5, 2024
1 parent 210723c commit 52eef91
Showing 14 changed files with 37 additions and 26 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -179,7 +179,7 @@ configuration](docs/user_guide/model_configuration.md) for the model.
[Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
and
[Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
2 changes: 1 addition & 1 deletion deploy/gke-marketplace-app/README.md
@@ -172,7 +172,7 @@ The client example pushes ~650 QPS (queries per second) to Triton Server, and w
![Locust Client Chart](client.png)

Alternatively, users can opt to use
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
to profile and study the performance of Triton Inference Server. Here we also
provide a
[client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh)
2 changes: 1 addition & 1 deletion deploy/k8s-onprem/README.md
@@ -295,7 +295,7 @@ Image 'images/mug.jpg':
After you have confirmed that your Triton cluster is operational and can perform inference,
you can test the load balancing and autoscaling features by sending a heavy load of requests.
One option for doing this is using the
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
application.

You can apply a progressively increasing load with a command like:
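
The following sketch is illustrative only (the model name and endpoint are placeholders; `--concurrency-range start:end:step` is perf_analyzer's standard way to step up the offered load):

```bash
# Step request concurrency from 1 to 16 in increments of 2,
# measuring throughput and latency at each step
perf_analyzer -m densenet_onnx -u <cluster-address>:8000 --concurrency-range 1:16:2
```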
4 changes: 2 additions & 2 deletions docs/README.md
@@ -173,7 +173,7 @@ Understanding inference performance is key to better resource utilization. Use T
- [Performance Tuning Guide](user_guide/performance_tuning.md)
- [Optimization](user_guide/optimization.md)
- [Model Analyzer](user_guide/model_analyzer.md)
-- [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+- [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
- [Inference Request Tracing](user_guide/trace.md)
### Jetson and JetPack
Triton can be deployed on edge devices. Explore [resources](user_guide/jetson.md) and [examples](examples/jetson/README.md).
@@ -185,7 +185,7 @@ The following resources are recommended to explore the full suite of Triton Infe

- **Configuring Deployment**: Triton comes with three tools which can be used to configure deployment settings, measure performance and recommend optimizations.
- [Model Analyzer](https://github.com/triton-inference-server/model_analyzer): Model Analyzer is a CLI tool built to recommend deployment configurations for Triton Inference Server based on a user's Quality of Service requirements. It also generates detailed reports about model performance to summarize the benefits and trade-offs of different configurations.
-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md):
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md):
Perf Analyzer is a CLI application built to generate inference requests and
measure the latency of those requests and the throughput of the model being
served; a minimal invocation is sketched below.
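
A minimal sketch (the model name and endpoint are illustrative, not taken from this repository):

```bash
# Send concurrent inference requests over gRPC and report
# per-request latency and model throughput
perf_analyzer -m my_model -u localhost:8001 -i grpc --concurrency-range 1:4
```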
29 changes: 18 additions & 11 deletions docs/contents.md
@@ -119,17 +119,24 @@ client/src/grpc_generated/java/README
:maxdepth: 1
:caption: Performance Analyzer
-client/src/c++/perf_analyzer/README
-client/src/c++/perf_analyzer/docs/README
-client/src/c++/perf_analyzer/docs/install
-client/src/c++/perf_analyzer/docs/quick_start
-client/src/c++/perf_analyzer/docs/cli
-client/src/c++/perf_analyzer/docs/inference_load_modes
-client/src/c++/perf_analyzer/docs/input_data
-client/src/c++/perf_analyzer/docs/measurements_metrics
-client/src/c++/perf_analyzer/docs/benchmarking
-client/src/c++/perf_analyzer/genai-perf/README
-client/src/c++/perf_analyzer/genai-perf/examples/tutorial
+perf_analyzer/README
+perf_analyzer/docs/README
+perf_analyzer/docs/install
+perf_analyzer/docs/quick_start
+perf_analyzer/docs/cli
+perf_analyzer/docs/inference_load_modes
+perf_analyzer/docs/input_data
+perf_analyzer/docs/measurements_metrics
+perf_analyzer/docs/benchmarking
+perf_analyzer/genai-perf/README
+perf_analyzer/genai-perf/docs/compare
+perf_analyzer/genai-perf/docs/embeddings
+perf_analyzer/genai-perf/docs/files
+perf_analyzer/genai-perf/docs/lora
+perf_analyzer/genai-perf/docs/multi_modal
+perf_analyzer/genai-perf/docs/rankings
+perf_analyzer/genai-perf/docs/tutorial
+perf_analyzer/genai-perf/examples/tutorial
```

```{toctree}
4 changes: 2 additions & 2 deletions docs/examples/jetson/README.md
@@ -53,7 +53,7 @@ Inference Server as a shared library.
## Part 2. Analyzing model performance with perf_analyzer

To analyze model performance on Jetson, the
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
tool is used. The `perf_analyzer` is included in the release tar file or can be
compiled from source.

@@ -65,4 +65,4 @@ From this directory of the repository, execute the following to evaluate model p

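A sketch consistent with the surrounding text (the model name is a placeholder; `-f` writes the latency report to a CSV file):

```bash
# Benchmark over a range of concurrencies and save results to CSV
perf_analyzer -m my_model --concurrency-range 1:4 -f results.csv
```
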
In the example above we saved the results as a `.csv` file. To visualize these
results, follow the steps described
-[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[here](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
4 changes: 4 additions & 0 deletions docs/generate_docs.py
@@ -388,6 +388,10 @@ def main():
if "client" in repo_tags:
clone_from_github("client", repo_tags["client"], github_org)

# Usage generate_docs.py --repo-tag=perf_analyzer:main
if "perf_analyzer" in repo_tags:
clone_from_github("perf_analyzer", repo_tags["perf_analyzer"], github_org)

# Usage generate_docs.py --repo-tag=python_backend:main
if "python_backend" in repo_tags:
clone_from_github("python_backend", repo_tags["python_backend"], github_org)
2 changes: 1 addition & 1 deletion docs/user_guide/debugging_guide.md
@@ -59,7 +59,7 @@ Before proceeding, please see if the model configuration documentation [here](./
- [Custom_models](https://github.com/triton-inference-server/server/tree/main/qa/custom_models), [ensemble_models](https://github.com/triton-inference-server/server/tree/main/qa/ensemble_models), and [python_models](https://github.com/triton-inference-server/server/tree/main/qa/python_models) include examples of configs for their respective use cases.
- [L0_model_config](https://github.com/triton-inference-server/server/tree/main/qa/L0_model_config) tests many types of incomplete model configs.

-Note that if you are running into an issue with [perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/README.md) or [Model Analyzer](https://github.com/triton-inference-server/model_analyzer), try loading the model onto Triton directly. This checks whether the configuration is incorrect or the perf_analyzer or Model Analyzer options need to be updated.
+Note that if you are running into an issue with [perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md) or [Model Analyzer](https://github.com/triton-inference-server/model_analyzer), try loading the model onto Triton directly. This checks whether the configuration is incorrect or the perf_analyzer or Model Analyzer options need to be updated.

## Model Issues
**Step 1. Run Models Outside of Triton**
2 changes: 1 addition & 1 deletion docs/user_guide/faq.md
@@ -99,7 +99,7 @@ available through the [HTTP/REST, GRPC, and C
APIs](../customization_guide/inference_protocols.md).

A client application,
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md),
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md),
allows you to measure the performance of an individual model using a synthetic
load. The perf_analyzer application is designed to show you the tradeoff of
latency vs. throughput.
2 changes: 1 addition & 1 deletion docs/user_guide/jetson.md
@@ -201,7 +201,7 @@ tritonserver --model-repository=/path/to/model_repo --backend-directory=/path/to
```

**Note**:
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
is supported on Jetson, while the [model_analyzer](model_analyzer.md) is
currently not available for Jetson. To execute `perf_analyzer` for the C API, use
the CLI flag `--service-kind=triton_c_api`.
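
A sketch of such an invocation (the paths and model name are illustrative placeholders):

```bash
# Benchmark through Triton's in-process C API instead of a network endpoint
perf_analyzer -m my_model --service-kind=triton_c_api \
    --triton-server-directory=/opt/tritonserver \
    --model-repository=/path/to/model_repo
```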
2 changes: 1 addition & 1 deletion docs/user_guide/model_analyzer.md
@@ -30,7 +30,7 @@

The Triton [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
is a tool that uses
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
to send requests to your model while measuring GPU memory and compute
utilization. The Model Analyzer is specifically useful for characterizing the
GPU memory requirements for your model under different batching and model
2 changes: 1 addition & 1 deletion docs/user_guide/model_configuration.md
@@ -934,7 +934,7 @@ dynamic batcher configurations.
```

* Use the
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
to determine the latency and throughput provided by the default dynamic
batcher configuration.
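
A minimal sketch of such a measurement (the model name is a placeholder; `--concurrency-range` sweeps the offered load):

```bash
# Compare p95 latency and throughput across increasing concurrency
# to evaluate the default dynamic batcher settings
perf_analyzer -m my_model --concurrency-range 1:16:2 --percentile=95
```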

2 changes: 1 addition & 1 deletion docs/user_guide/optimization.md
@@ -44,7 +44,7 @@ single GPU.
Unless you already have a client application suitable for measuring
the performance of your model on Triton, you should familiarize
yourself with
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
The Performance Analyzer is an essential tool for optimizing your model's
performance.

4 changes: 2 additions & 2 deletions docs/user_guide/performance_tuning.md
@@ -73,7 +73,7 @@ For additional material, see the
verify that we can run inference requests and get a baseline performance
benchmark of your model.
Triton's
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
tool specifically fits this purpose. Here is a simplified output for
demonstration purposes:

@@ -103,7 +103,7 @@ For additional material, see the
There are many variables that can be tweaked just within your model
configuration (`config.pbtxt`) to obtain different results.
- As your model, config, or use case evolves,
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
is a great tool to quickly verify model functionality and performance.

3. How can I improve my model performance?
