From c4f9c8782c87794d0146827047b387c9501c5ebb Mon Sep 17 00:00:00 2001
From: fpetrini15
Date: Wed, 31 Jul 2024 16:04:52 -0700
Subject: [PATCH 1/5] Purge PA from Client Repo

---
 docs/user_guide/perf_analyzer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user_guide/perf_analyzer.md b/docs/user_guide/perf_analyzer.md
index 7019d51c63..19a89599e5 100644
--- a/docs/user_guide/perf_analyzer.md
+++ b/docs/user_guide/perf_analyzer.md
@@ -27,4 +27,4 @@
 -->

 Perf Analyzer documentation has been relocated to
-[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[here](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).

From b77b849385830ff5e14f521a576ea84b5337ff4f Mon Sep 17 00:00:00 2001
From: fpetrini15
Date: Thu, 1 Aug 2024 09:26:37 -0700
Subject: [PATCH 2/5] Point to PA repo

---
 qa/L0_perf_analyzer_doc_links/test.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qa/L0_perf_analyzer_doc_links/test.sh b/qa/L0_perf_analyzer_doc_links/test.sh
index db80e84974..d0a2c61290 100755
--- a/qa/L0_perf_analyzer_doc_links/test.sh
+++ b/qa/L0_perf_analyzer_doc_links/test.sh
@@ -35,8 +35,8 @@ python3 -m pip install mkdocs-htmlproofer-plugin==0.10.3

 #Download perf_analyzer docs
 TRITON_REPO_ORGANIZATION=${TRITON_REPO_ORGANIZATION:="http://github.com/triton-inference-server"}
-TRITON_CLIENT_REPO_TAG="${TRITON_CLIENT_REPO_TAG:=main}"
-git clone -b ${TRITON_CLIENT_REPO_TAG} ${TRITON_REPO_ORGANIZATION}/client.git
+TRITON_PERF_ANALYZER_REPO_TAG="${TRITON_PERF_ANALYZER_REPO_TAG:=main}"
+git clone -b ${TRITON_PERF_ANALYZER_REPO_TAG} ${TRITON_REPO_ORGANIZATION}/perf_analyzer.git
 cp `pwd`/client/src/c++/perf_analyzer/README.md .
 cp -rf `pwd`/client/src/c++/perf_analyzer/docs .

From 210723ca27d528b798ca75f9c7aaf5b8394fe8fe Mon Sep 17 00:00:00 2001
From: fpetrini15
Date: Thu, 1 Aug 2024 12:36:23 -0700
Subject: [PATCH 3/5] Change search path

---
 qa/L0_perf_analyzer_doc_links/test.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qa/L0_perf_analyzer_doc_links/test.sh b/qa/L0_perf_analyzer_doc_links/test.sh
index d0a2c61290..1e2cdd9bb2 100755
--- a/qa/L0_perf_analyzer_doc_links/test.sh
+++ b/qa/L0_perf_analyzer_doc_links/test.sh
@@ -37,8 +37,8 @@ python3 -m pip install mkdocs-htmlproofer-plugin==0.10.3
 TRITON_REPO_ORGANIZATION=${TRITON_REPO_ORGANIZATION:="http://github.com/triton-inference-server"}
 TRITON_PERF_ANALYZER_REPO_TAG="${TRITON_PERF_ANALYZER_REPO_TAG:=main}"
 git clone -b ${TRITON_PERF_ANALYZER_REPO_TAG} ${TRITON_REPO_ORGANIZATION}/perf_analyzer.git
-cp `pwd`/client/src/c++/perf_analyzer/README.md .
-cp -rf `pwd`/client/src/c++/perf_analyzer/docs .
+cp `pwd`/perf_analyzer/README.md .
+cp -rf `pwd`/perf_analyzer/docs .

 # Need to remove all links that start with -- or -. Mkdocs converts all -- to - for anchor links.
 # This breaks all links to cli commands throughout the docs. This will iterate over all
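
The hunk above ends at its context window, so the comment continues past "This will
iterate over all" in the actual script. For orientation, the state test.sh converges
on after patches 2 and 3 is sketched below; the link-rewriting loop at the end is a
hypothetical stand-in for the part of the script outside the hunk, not its actual code:

    #Download perf_analyzer docs from the standalone repo
    TRITON_REPO_ORGANIZATION=${TRITON_REPO_ORGANIZATION:="http://github.com/triton-inference-server"}
    TRITON_PERF_ANALYZER_REPO_TAG="${TRITON_PERF_ANALYZER_REPO_TAG:=main}"
    git clone -b ${TRITON_PERF_ANALYZER_REPO_TAG} ${TRITON_REPO_ORGANIZATION}/perf_analyzer.git
    cp `pwd`/perf_analyzer/README.md .
    cp -rf `pwd`/perf_analyzer/docs .

    # Hypothetical equivalent of the loop described in the comment: rewrite anchors
    # such as (#--concurrency-range) to (#-concurrency-range), because mkdocs
    # collapses the leading double dash when it generates anchor ids.
    for f in README.md docs/*.md; do
        sed -i 's/(#--/(#-/g' "$f"
    done
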
From 52eef916496539e5927996bd3bdcdd669a82c036 Mon Sep 17 00:00:00 2001
From: fpetrini15
Date: Mon, 5 Aug 2024 16:25:36 -0700
Subject: [PATCH 4/5] Doc updates

---
 README.md                              |  2 +-
 deploy/gke-marketplace-app/README.md   |  2 +-
 deploy/k8s-onprem/README.md            |  2 +-
 docs/README.md                         |  4 ++--
 docs/contents.md                       | 29 ++++++++++++++++++-----------
 docs/examples/jetson/README.md         |  4 ++--
 docs/generate_docs.py                  |  4 ++++
 docs/user_guide/debugging_guide.md     |  2 +-
 docs/user_guide/faq.md                 |  2 +-
 docs/user_guide/jetson.md              |  2 +-
 docs/user_guide/model_analyzer.md      |  2 +-
 docs/user_guide/model_configuration.md |  2 +-
 docs/user_guide/optimization.md        |  2 +-
 docs/user_guide/performance_tuning.md  |  4 ++--
 14 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index 17628b4f03..2200886a20 100644
--- a/README.md
+++ b/README.md
@@ -179,7 +179,7 @@ configuration](docs/user_guide/model_configuration.md) for the model.
   [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
   to learn which backends are supported on your target platform.
 - Learn how to [optimize performance](docs/user_guide/optimization.md) using the
-  [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+  [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
   and
   [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
 - Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in

diff --git a/deploy/gke-marketplace-app/README.md b/deploy/gke-marketplace-app/README.md
index e99b9efbae..01519c9114 100644
--- a/deploy/gke-marketplace-app/README.md
+++ b/deploy/gke-marketplace-app/README.md
@@ -172,7 +172,7 @@ The client example push about ~650 QPS(Query per second) to Triton Server, and w
 ![Locust Client Chart](client.png)

 Alternatively, user can opt to use
-[Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 to profile and study the performance of Triton Inference Server. Here we also
 provide a
 [client script](https://github.com/triton-inference-server/server/tree/master/deploy/gke-marketplace-app/client-sample/perf_analyzer_grpc.sh)

diff --git a/deploy/k8s-onprem/README.md b/deploy/k8s-onprem/README.md
index 4287b23c35..ba78df498a 100644
--- a/deploy/k8s-onprem/README.md
+++ b/deploy/k8s-onprem/README.md
@@ -295,7 +295,7 @@ Image 'images/mug.jpg':
 After you have confirmed that your Triton cluster is operational and can
 perform inference, you can test the load balancing and autoscaling features
 by sending a heavy load of requests. One option for doing this is using the
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 application.

 You can apply a progressively increasing load with a command like:
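
The command block that follows "a command like:" lies outside this hunk's context
window and is not shown. A representative invocation, with a placeholder model name
and cluster endpoint rather than the README's exact values, might be:

    # Step request concurrency from 1 to 10 to apply a progressively increasing load.
    perf_analyzer -m simple -u ${INGRESS_HOST}:${INGRESS_PORT} --concurrency-range 1:10
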
diff --git a/docs/README.md b/docs/README.md
index 22e0c0d691..9826c1fef8 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -173,7 +173,7 @@ Understanding Inference performance is key to better resource utilization. Use T
 - [Performance Tuning Guide](user_guide/performance_tuning.md)
 - [Optimization](user_guide/optimization.md)
 - [Model Analyzer](user_guide/model_analyzer.md)
-- [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+- [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 - [Inference Request Tracing](user_guide/trace.md)
 ### Jetson and JetPack
 Triton can be deployed on edge devices. Explore [resources](user_guide/jetson.md) and [examples](examples/jetson/README.md).
@@ -185,7 +185,7 @@ The following resources are recommended to explore the full suite of Triton Infe
 - **Configuring Deployment**: Triton comes with three tools which can be used to
   configure deployment setting, measure performance and recommend optimizations.
   - [Model Analyzer](https://github.com/triton-inference-server/model_analyzer) Model Analyzer is CLI tool built to recommend deployment configurations for Triton Inference Server based on user's Quality of Service Requirements. It also generates detailed reports about model performance to summarize the benefits and trade offs of different configurations.
-  - [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md):
+  - [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md):
     Perf Analyzer is a CLI application built to generate inference requests and
     measures the latency of those requests and throughput of the model being
     served.

diff --git a/docs/contents.md b/docs/contents.md
index cf5653340d..581cae7f7e 100644
--- a/docs/contents.md
+++ b/docs/contents.md
@@ -119,17 +119,24 @@ client/src/grpc_generated/java/README
 :maxdepth: 1
 :caption: Performance Analyzer

-client/src/c++/perf_analyzer/README
-client/src/c++/perf_analyzer/docs/README
-client/src/c++/perf_analyzer/docs/install
-client/src/c++/perf_analyzer/docs/quick_start
-client/src/c++/perf_analyzer/docs/cli
-client/src/c++/perf_analyzer/docs/inference_load_modes
-client/src/c++/perf_analyzer/docs/input_data
-client/src/c++/perf_analyzer/docs/measurements_metrics
-client/src/c++/perf_analyzer/docs/benchmarking
-client/src/c++/perf_analyzer/genai-perf/README
-client/src/c++/perf_analyzer/genai-perf/examples/tutorial
+perf_analyzer/README
+perf_analyzer/docs/README
+perf_analyzer/docs/install
+perf_analyzer/docs/quick_start
+perf_analyzer/docs/cli
+perf_analyzer/docs/inference_load_modes
+perf_analyzer/docs/input_data
+perf_analyzer/docs/measurements_metrics
+perf_analyzer/docs/benchmarking
+perf_analyzer/genai-perf/README
+perf_analyzer/genai-perf/docs/compare
+perf_analyzer/genai-perf/docs/embeddings
+perf_analyzer/genai-perf/docs/files
+perf_analyzer/genai-perf/docs/lora
+perf_analyzer/genai-perf/docs/multi_modal
+perf_analyzer/genai-perf/docs/rankings
+perf_analyzer/genai-perf/docs/tutorial
+perf_analyzer/genai-perf/examples/tutorial
 ```

 ```{toctree}

diff --git a/docs/examples/jetson/README.md b/docs/examples/jetson/README.md
index 281d5f2a97..b3064a8e28 100644
--- a/docs/examples/jetson/README.md
+++ b/docs/examples/jetson/README.md
@@ -53,7 +53,7 @@ Inference Server as a shared library.

 ## Part 2. Analyzing model performance with perf_analyzer

 To analyze model performance on Jetson,
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 tool is used. The `perf_analyzer` is included in the release tar file or can be
 compiled from source.
@@ -65,4 +65,4 @@ From this directory of the repository, execute the following to evaluate model p

 In the example above we saved the results as a `.csv` file. To visualize these
 results, follow the steps described
-[here](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[here](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
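
The evaluation command referenced by the truncated hunk header ("execute the
following to evaluate model p...") sits outside the shown context. A minimal sketch
of such a run, assuming a hypothetical model name and using perf_analyzer's CSV
export flag:

    # Sweep request concurrency and export the measurements to a CSV file.
    perf_analyzer -m resnet50 --concurrency-range 1:4 -f perf_results.csv
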
diff --git a/docs/generate_docs.py b/docs/generate_docs.py
index 1cc6644fde..6982294d21 100755
--- a/docs/generate_docs.py
+++ b/docs/generate_docs.py
@@ -388,6 +388,10 @@ def main():
     if "client" in repo_tags:
         clone_from_github("client", repo_tags["client"], github_org)

+    # Usage generate_docs.py --repo-tag=perf_analyzer:main
+    if "perf_analyzer" in repo_tags:
+        clone_from_github("perf_analyzer", repo_tags["perf_analyzer"], github_org)
+
     # Usage generate_docs.py --repo-tag=python_backend:main
     if "python_backend" in repo_tags:
         clone_from_github("python_backend", repo_tags["python_backend"], github_org)
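
The new branch mirrors the existing handling of the client repository, letting doc
builds pull perf_analyzer sources at a pinned tag. Going by the usage comment added
in the hunk, and assuming --repo-tag may be passed once per repository as the
surrounding code suggests, an invocation would look like:

    # Build the server docs with perf_analyzer cloned at the main branch.
    python3 docs/generate_docs.py --repo-tag=perf_analyzer:main
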
diff --git a/docs/user_guide/debugging_guide.md b/docs/user_guide/debugging_guide.md
index 3a38f209d3..8305e3ac95 100644
--- a/docs/user_guide/debugging_guide.md
+++ b/docs/user_guide/debugging_guide.md
@@ -59,7 +59,7 @@ Before proceeding, please see if the model configuration documentation [here](./
 - [Custom_models](https://github.com/triton-inference-server/server/tree/main/qa/custom_models), [ensemble_models](https://github.com/triton-inference-server/server/tree/main/qa/ensemble_models), and [python_models](https://github.com/triton-inference-server/server/tree/main/qa/python_models) include examples of configs for their respective use cases.
 - [L0_model_config](https://github.com/triton-inference-server/server/tree/main/qa/L0_model_config) tests many types of incomplete model configs.

-Note that if you are running into an issue with [perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/README.md) or [Model Analyzer](https://github.com/triton-inference-server/model_analyzer), try loading the model onto Triton directly. This checks if the configuration is incorrect or the perf_analyzer or Model Analyzer options need to be updated.
+Note that if you are running into an issue with [perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md) or [Model Analyzer](https://github.com/triton-inference-server/model_analyzer), try loading the model onto Triton directly. This checks if the configuration is incorrect or the perf_analyzer or Model Analyzer options need to be updated.

 ## Model Issues
 **Step 1. Run Models Outside of Triton**

diff --git a/docs/user_guide/faq.md b/docs/user_guide/faq.md
index 523b38f750..7119604c7b 100644
--- a/docs/user_guide/faq.md
+++ b/docs/user_guide/faq.md
@@ -99,7 +99,7 @@ available through the
 [HTTP/REST, GRPC, and C APIs](../customization_guide/inference_protocols.md).

 A client application,
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md),
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md),
 allows you to measure the performance of an individual model using a synthetic
 load. The perf_analyzer application is designed to show you the tradeoff of
 latency vs. throughput.

diff --git a/docs/user_guide/jetson.md b/docs/user_guide/jetson.md
index cda1da111d..e2b2b0ad34 100644
--- a/docs/user_guide/jetson.md
+++ b/docs/user_guide/jetson.md
@@ -201,7 +201,7 @@ tritonserver --model-repository=/path/to/model_repo --backend-directory=/path/to
 ```

 **Note**:
-[perf_analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[perf_analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 is supported on Jetson, while the [model_analyzer](model_analyzer.md) is
 currently not available for Jetson. To execute `perf_analyzer` for C API, use
 the CLI flag `--service-kind=triton_c_api`:
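
Only the `--service-kind=triton_c_api` flag is quoted in the hunk above. The fuller
invocation below is a sketch: the server-directory and model-repository flags are
taken from the Perf Analyzer documentation, and the paths are placeholders to verify
against your installation:

    # Drive the in-process C API instead of an HTTP/GRPC endpoint.
    perf_analyzer -m simple --service-kind=triton_c_api \
        --triton-server-directory=/opt/tritonserver \
        --model-repository=/path/to/model_repo
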
diff --git a/docs/user_guide/model_analyzer.md b/docs/user_guide/model_analyzer.md
index 663a8a277a..6f1668713a 100644
--- a/docs/user_guide/model_analyzer.md
+++ b/docs/user_guide/model_analyzer.md
@@ -30,7 +30,7 @@
 The Triton
 [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
 is a tool that uses
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 to send requests to your model while measuring GPU memory and compute
 utilization. The Model Analyzer is specifically useful for characterizing the
 GPU memory requirements for your model under different batching and model

diff --git a/docs/user_guide/model_configuration.md b/docs/user_guide/model_configuration.md
index e7a2d29c3c..1b0e64a533 100644
--- a/docs/user_guide/model_configuration.md
+++ b/docs/user_guide/model_configuration.md
@@ -934,7 +934,7 @@ dynamic batcher configurations.
   ```

 * Use the
-  [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+  [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
   to determine the latency and throughput provided by the default dynamic
   batcher configuration.

diff --git a/docs/user_guide/optimization.md b/docs/user_guide/optimization.md
index f842198a90..307374de89 100644
--- a/docs/user_guide/optimization.md
+++ b/docs/user_guide/optimization.md
@@ -44,7 +44,7 @@ single GPU.

 Unless you already have a client application suitable for measuring the
 performance of your model on Triton, you should familiarize yourself with
-[Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md).
+[Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md).
 The Performance Analyzer is an essential tool for optimizing your model's
 performance.

diff --git a/docs/user_guide/performance_tuning.md b/docs/user_guide/performance_tuning.md
index 49cad9e637..446534da99 100644
--- a/docs/user_guide/performance_tuning.md
+++ b/docs/user_guide/performance_tuning.md
@@ -73,7 +73,7 @@ For additional material, see the
    verify that we can run inference requests and get a baseline performance
    benchmark of your model.
    Triton's
-   [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+   [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
    tool specifically fits this purpose. Here is a simplified output for
    demonstration purposes:
@@ -103,7 +103,7 @@
    There are many variables that can be tweaked just within your model
    configuration (`config.pbtxt`) to obtain different results.
 - As your model, config, or use case evolves,
-  [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+  [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
   is a great tool to quickly verify model functionality and performance.

 3. How can I improve my model performance?

From 6979fbccc5f55ee07ca7c3f3f345dcfebaf04150 Mon Sep 17 00:00:00 2001
From: fpetrini15
Date: Mon, 5 Aug 2024 16:47:41 -0700
Subject: [PATCH 5/5] Copyright changes

---
 deploy/gke-marketplace-app/README.md  | 2 +-
 deploy/k8s-onprem/README.md           | 2 +-
 docs/README.md                        | 2 +-
 docs/contents.md                      | 2 +-
 docs/examples/jetson/README.md        | 2 +-
 docs/user_guide/debugging_guide.md    | 2 +-
 docs/user_guide/faq.md                | 2 +-
 docs/user_guide/model_analyzer.md     | 2 +-
 docs/user_guide/optimization.md       | 2 +-
 docs/user_guide/perf_analyzer.md      | 2 +-
 qa/L0_perf_analyzer_doc_links/test.sh | 2 +-
 11 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/deploy/gke-marketplace-app/README.md b/deploy/gke-marketplace-app/README.md
index 01519c9114..595d4634ab 100644
--- a/deploy/gke-marketplace-app/README.md
+++ b/deploy/gke-marketplace-app/README.md
@@ -1,5 +1,5 @@
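
Every patch in this series ultimately redirects readers to the relocated README, so
a cheap smoke test for the migration is to confirm the new URL resolves before
relying on the rewritten links (assumes curl and network access):

    # Fail fast if the relocated perf_analyzer README does not resolve.
    curl -fsSL -o /dev/null \
        https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md \
        && echo "perf_analyzer README reachable"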