Commit: Doc updates
fpetrini15 committed Aug 5, 2024
1 parent 210723c commit 52eef91
Showing 14 changed files with 37 additions and 26 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -179,7 +179,7 @@ configuration](docs/user_guide/model_configuration.md) for the model.
[Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
and
[Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
2 changes: 1 addition & 1 deletion deploy/gke-marketplace-app/README.md
@@ -172,7 +172,7 @@ The client example pushes ~650 QPS (queries per second) to Triton Server, and w
![Locust Client Chart](client.png)

Alternatively, users can opt to use
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
to profile and study the performance of Triton Inference Server. Here we also
provide a
[client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh)
2 changes: 1 addition & 1 deletion deploy/k8s-onprem/README.md
@@ -295,7 +295,7 @@ Image 'images/mug.jpg':
After you have confirmed that your Triton cluster is operational and can perform inference,
you can test the load balancing and autoscaling features by sending a heavy load of requests.
One option for doing this is using the
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
application.

You can apply a progressively increasing load with a command like:
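
The following sketch is illustrative only (the model name and endpoint are placeholders; `--concurrency-range start:end:step` is perf_analyzer's standard way to step up the offered load):

```bash
# Step request concurrency from 1 to 16 in increments of 2,
# measuring throughput and latency at each step
perf_analyzer -m densenet_onnx -u <cluster-address>:8000 --concurrency-range 1:16:2
```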
4 changes: 2 additions & 2 deletions docs/README.md
@@ -173,7 +173,7 @@ Understanding inference performance is key to better resource utilization. Use T
- [Performance Tuning Guide](user_guide/performance_tuning.md)
- [Optimization](user_guide/optimization.md)
- [Model Analyzer](user_guide/model_analyzer.md)
-- [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+- [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
- [Inference Request Tracing](user_guide/trace.md)
### Jetson and JetPack
Triton can be deployed on edge devices. Explore [resources](user_guide/jetson.md) and [examples](examples/jetson/README.md).
@@ -185,7 +185,7 @@ The following resources are recommended to explore the full suite of Triton Infe

- **Configuring Deployment**: Triton comes with three tools which can be used to configure deployment settings, measure performance and recommend optimizations.
- [Model Analyzer](https://github.com/triton-inference-server/model_analyzer): Model Analyzer is a CLI tool built to recommend deployment configurations for Triton Inference Server based on a user's Quality of Service requirements. It also generates detailed reports about model performance to summarize the benefits and trade-offs of different configurations.
-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md):
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md):
Perf Analyzer is a CLI application built to generate inference requests and
measure the latency of those requests and the throughput of the model being
served; a minimal invocation is sketched below.
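
A minimal sketch (the model name and endpoint are illustrative, not taken from this repository):

```bash
# Send concurrent inference requests over gRPC and report
# per-request latency and model throughput
perf_analyzer -m my_model -u localhost:8001 -i grpc --concurrency-range 1:4
```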
29 changes: 18 additions & 11 deletions docs/contents.md
@@ -119,17 +119,24 @@ client/src/grpc_generated/java/README
:maxdepth: 1
:caption: Performance Analyzer
-client/src/c++/perf_analyzer/README
-client/src/c++/perf_analyzer/docs/README
-client/src/c++/perf_analyzer/docs/install
-client/src/c++/perf_analyzer/docs/quick_start
-client/src/c++/perf_analyzer/docs/cli
-client/src/c++/perf_analyzer/docs/inference_load_modes
-client/src/c++/perf_analyzer/docs/input_data
-client/src/c++/perf_analyzer/docs/measurements_metrics
-client/src/c++/perf_analyzer/docs/benchmarking
-client/src/c++/perf_analyzer/genai-perf/README
-client/src/c++/perf_analyzer/genai-perf/examples/tutorial
+perf_analyzer/README
+perf_analyzer/docs/README
+perf_analyzer/docs/install
+perf_analyzer/docs/quick_start
+perf_analyzer/docs/cli
+perf_analyzer/docs/inference_load_modes
+perf_analyzer/docs/input_data
+perf_analyzer/docs/measurements_metrics
+perf_analyzer/docs/benchmarking
+perf_analyzer/genai-perf/README
+perf_analyzer/genai-perf/docs/compare
+perf_analyzer/genai-perf/docs/embeddings
+perf_analyzer/genai-perf/docs/files
+perf_analyzer/genai-perf/docs/lora
+perf_analyzer/genai-perf/docs/multi_modal
+perf_analyzer/genai-perf/docs/rankings
+perf_analyzer/genai-perf/docs/tutorial
+perf_analyzer/genai-perf/examples/tutorial
```

```{toctree}
4 changes: 2 additions & 2 deletions docs/examples/jetson/README.md
@@ -53,7 +53,7 @@ Inference Server as a shared library.
## Part 2. Analyzing model performance with perf_analyzer

To analyze model performance on Jetson, the
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
tool is used. The `perf_analyzer` is included in the release tar file or can be
compiled from source.

@@ -65,4 +65,4 @@ From this directory of the repository, execute the following to evaluate model p

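A sketch consistent with the surrounding text (the model name is a placeholder; `-f` writes the latency report to a CSV file):

```bash
# Benchmark over a range of concurrencies and save results to CSV
perf_analyzer -m my_model --concurrency-range 1:4 -f results.csv
```
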
In the example above we saved the results as a `.csv` file. To visualize these
results, follow the steps described
-[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[here](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
4 changes: 4 additions & 0 deletions docs/generate_docs.py
@@ -388,6 +388,10 @@ def main():
if "client" in repo_tags:
clone_from_github("client", repo_tags["client"], github_org)

# Usage generate_docs.py --repo-tag=perf_analyzer:main
if "perf_analyzer" in repo_tags:
clone_from_github("perf_analyzer", repo_tags["perf_analyzer"], github_org)

# Usage generate_docs.py --repo-tag=python_backend:main
if "python_backend" in repo_tags:
clone_from_github("python_backend", repo_tags["python_backend"], github_org)
2 changes: 1 addition & 1 deletion docs/user_guide/debugging_guide.md
@@ -59,7 +59,7 @@ Before proceeding, please see if the model configuration documentation [here](./
- [Custom_models](https://github.com/triton-inference-server/server/tree/main/qa/custom_models), [ensemble_models](https://github.com/triton-inference-server/server/tree/main/qa/ensemble_models), and [python_models](https://github.com/triton-inference-server/server/tree/main/qa/python_models) include examples of configs for their respective use cases.
- [L0_model_config](https://github.com/triton-inference-server/server/tree/main/qa/L0_model_config) tests many types of incomplete model configs.

-Note that if you are running into an issue with [perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/README.md) or [Model Analyzer](https://github.com/triton-inference-server/model_analyzer), try loading the model onto Triton directly. This checks whether the configuration is incorrect or the perf_analyzer or Model Analyzer options need to be updated.
+Note that if you are running into an issue with [perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md) or [Model Analyzer](https://github.com/triton-inference-server/model_analyzer), try loading the model onto Triton directly. This checks whether the configuration is incorrect or the perf_analyzer or Model Analyzer options need to be updated.

## Model Issues
**Step 1. Run Models Outside of Triton**
2 changes: 1 addition & 1 deletion docs/user_guide/faq.md
@@ -99,7 +99,7 @@ available through the [HTTP/REST, GRPC, and C
APIs](../customization_guide/inference_protocols.md).

A client application,
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md),
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md),
allows you to measure the performance of an individual model using a synthetic
load. The perf_analyzer application is designed to show you the tradeoff of
latency vs. throughput.
2 changes: 1 addition & 1 deletion docs/user_guide/jetson.md
@@ -201,7 +201,7 @@ tritonserver --model-repository=/path/to/model_repo --backend-directory=/path/to
```

**Note**:
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
is supported on Jetson, while the [model_analyzer](model_analyzer.md) is
currently not available for Jetson. To execute `perf_analyzer` for the C API, use
the CLI flag `--service-kind=triton_c_api`.
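
A sketch of such an invocation (the paths and model name are illustrative placeholders):

```bash
# Benchmark through Triton's in-process C API instead of a network endpoint
perf_analyzer -m my_model --service-kind=triton_c_api \
    --triton-server-directory=/opt/tritonserver \
    --model-repository=/path/to/model_repo
```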
2 changes: 1 addition & 1 deletion docs/user_guide/model_analyzer.md
@@ -30,7 +30,7 @@

The Triton [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
is a tool that uses
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
to send requests to your model while measuring GPU memory and compute
utilization. The Model Analyzer is specifically useful for characterizing the
GPU memory requirements for your model under different batching and model
2 changes: 1 addition & 1 deletion docs/user_guide/model_configuration.md
@@ -934,7 +934,7 @@ dynamic batcher configurations.
```

* Use the
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
to determine the latency and throughput provided by the default dynamic
batcher configuration.
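
A minimal sketch of such a measurement (the model name is a placeholder; `--concurrency-range` sweeps the offered load):

```bash
# Compare p95 latency and throughput across increasing concurrency
# to evaluate the default dynamic batcher settings
perf_analyzer -m my_model --concurrency-range 1:16:2 --percentile=95
```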

2 changes: 1 addition & 1 deletion docs/user_guide/optimization.md
@@ -44,7 +44,7 @@ single GPU.
Unless you already have a client application suitable for measuring
the performance of your model on Triton, you should familiarize
yourself with
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
The Performance Analyzer is an essential tool for optimizing your model's
performance.

4 changes: 2 additions & 2 deletions docs/user_guide/performance_tuning.md
@@ -73,7 +73,7 @@ For additional material, see the
verify that we can run inference requests and get a baseline performance
benchmark of your model.
Triton's
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
tool specifically fits this purpose. Here is a simplified output for
demonstration purposes:

@@ -103,7 +103,7 @@ For additional material, see the
There are many variables that can be tweaked just within your model
configuration (`config.pbtxt`) to obtain different results.
- As your model, config, or use case evolves,
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
is a great tool to quickly verify model functionality and performance.

3. How can I improve my model performance?
