PA Migration: Doc Updates #925

Merged · 2 commits · Aug 6, 2024
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,5 +1,5 @@
<!--
-Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -47,4 +47,4 @@ The User Guide describes how to configure Model Analyzer, choose launch and sear

The following resources are recommended:

-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
8 changes: 4 additions & 4 deletions docs/config.md
@@ -1,5 +1,5 @@
<!--
-Copyright (c) 2020-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -718,7 +718,7 @@ but profile `model_2` using GPU.
This field allows the user to pass `perf_analyzer` any CLI options it needs to
execute properly. Refer to [the
`perf_analyzer`
-docs](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+docs](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
for more information on these options.
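
For context, a minimal sketch of what this looks like in a Model Analyzer YAML config; the model name is hypothetical and the flag values are illustrative:

```yaml
profile_models:
  add_sub:
    perf_analyzer_flags:
      # Perf Analyzer CLI options are passed through by name,
      # without the leading dashes.
      measurement-interval: 10000
      percentile: 95
```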

### Global options to apply to all instances of Perf Analyzer
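
The same block can also sit at the top level of the config, where it applies to every profiled model; a sketch with hypothetical model names:

```yaml
perf_analyzer_flags:
  percentile: 95
profile_models:
  - model_A
  - model_B
```
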
@@ -779,7 +779,7 @@ perf_analyzer_flags:
If a model configuration has variable-sized dimensions in the inputs section,
then the `shape` option of the `perf_analyzer_flags` option must be specified.
More information about this can be found in the
-[Perf Analyzer documentation](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#input-data).
+[Perf Analyzer documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/input_data.md).
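
For illustration, a sketch of pinning a variable-sized input via `shape`; the model name, input name, and dimensions are hypothetical:

```yaml
profile_models:
  my_model:
    perf_analyzer_flags:
      # Dimensions follow Perf Analyzer's name:dim1,dim2,... format
      # (batch dimension excluded).
      shape: "INPUT0:3,224,224"
```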

### SSL Support:

@@ -810,7 +810,7 @@ profile_models:
```

More information about this can be found in the
-[Perf Analyzer documentation](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#ssltls-support).
+[Perf Analyzer documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/measurements_metrics.md#ssltls-support).
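
As an illustrative sketch only, SSL options can be forwarded the same way; the flag names below are assumptions to be checked against the linked Perf Analyzer docs, and the paths are hypothetical:

```yaml
profile_models:
  my_model:
    perf_analyzer_flags:
      # Assumed Perf Analyzer gRPC SSL flags; verify spellings in the docs.
      ssl-grpc-use-ssl: true
      ssl-grpc-root-certifications-file: /path/to/ca.crt
```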

#### **Important Notes**:

8 changes: 4 additions & 4 deletions docs/config_search.md
@@ -98,7 +98,7 @@ It has two modes:

The parameters that are automatically searched are
[model maximum batch size](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size),
-[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md#request-concurrency).
+[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/cli.md#measurement-options).
Additionally, [dynamic_batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#dynamic-batcher) will be enabled if it is legal to do so.

_An example model analyzer YAML config that performs an Automatic Brute Search:_
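
The example itself is collapsed in this view; a representative sketch, with a hypothetical model name and illustrative search bounds:

```yaml
model_repository: /path/to/model/repository/
run_config_search_max_instance_count: 3
run_config_search_max_concurrency: 8
profile_models:
  - model_A
```
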
@@ -128,13 +128,13 @@ You can also modify the minimum/maximum values that the automatic search space w

---

-### [Request Concurrency Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#concurrency-mode))
+### [Request Concurrency Search Space](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/inference_load_modes.md#concurrency-mode)

- `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
- `--run-config-search-min-concurrency: <val>`: Changes the request concurrency minimum automatic search space value
- `--run-config-search-max-concurrency: <val>`: Changes the request concurrency maximum automatic search space value
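
These bounds can equivalently be set in the YAML config file, where the CLI dashes become underscores; the values and model name are illustrative:

```yaml
run_config_search_min_concurrency: 2
run_config_search_max_concurrency: 128
profile_models:
  - model_A
```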

-### [Request Rate Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#request-rate-mode)
+### [Request Rate Search Space](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/inference_load_modes.md#request-rate-mode)

- `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
- `--run-config-search-min-request-rate: <val>`: Changes the request rate minimum automatic search space value
@@ -422,7 +422,7 @@ _This mode has the following limitations:_

- Summary/Detailed reports do not include the new metrics

-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md#command-line-options) documentation for a list of the flags that can be specified.

LLMs can be optimized using either Quick or Brute search mode.
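
A minimal sketch of such a config, assuming the CLI's `--model-type` maps to a `model_type` key in the YAML file and that `streaming` is a valid GenAI-Perf flag; the model name is hypothetical:

```yaml
model_repository: /path/to/model/repository/
model_type: LLM
genai_perf_flags:
  streaming: true
profile_models:
  - my_llm
```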

4 changes: 2 additions & 2 deletions docs/metrics.md
@@ -1,5 +1,5 @@
<!--
-Copyright (c) 2020-2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -24,7 +24,7 @@ tags, which are used in various places to configure Model Analyzer.

These metrics come from the perf analyzer and are parsed and processed by the
model analyzer. See the [perf analyzer
-docs](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+docs](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
for more info on these

* `perf_throughput`: The number of inferences per second measured by the perf
  analyzer.
2 changes: 1 addition & 1 deletion docs/model_types.md
@@ -119,7 +119,7 @@ _Profiling this model type has the following limitations:_

- Summary/Detailed reports do not include the new metrics

-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md) documentation for a list of the flags that can be specified.

LLMs can be optimized using either Quick or Brute search mode.
