diff --git a/docs/README.md b/docs/README.md
index a1150e17..a43cde21 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -47,4 +47,4 @@ The User Guide describes how to configure Model Analyzer, choose launch and sear
 
 The following resources are recommended:
 
-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
diff --git a/docs/config.md b/docs/config.md
index 4de28027..75140cb5 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -718,7 +718,7 @@ but profile `model_2` using GPU.
 
 This field allows the user to pass `perf_analyzer` any CLI options it needs to
 execute properly. Refer to [the `perf_analyzer`
-docs](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+docs](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 for more information on these options.
 
 ### Global options to apply to all instances of Perf Analyzer
@@ -779,7 +779,7 @@ perf_analyzer_flags:
 If a model configuration has variable-sized dimensions in the inputs section,
 then the `shape` option of the `perf_analyzer_flags` option must be specified.
 More information about this can be found in the
-[Perf Analyzer documentation](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#input-data).
+[Perf Analyzer documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/input_data.md).
 
 ### SSL Support:
 
@@ -810,7 +810,7 @@ profile_models:
 ```
 
 More information about this can be found in the
-[Perf Analyzer documentation](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#ssltls-support).
+[Perf Analyzer documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/measurements_metrics.md#ssltls-support).
 
 #### **Important Notes**:
 
diff --git a/docs/config_search.md b/docs/config_search.md
index 80e47aa1..bc6aac55 100644
--- a/docs/config_search.md
+++ b/docs/config_search.md
@@ -98,7 +98,7 @@ It has two modes:
 
 The parameters that are automatically searched are
 [model maximum batch size](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size),
-[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md#request-concurrency).
+[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/cli.md#measurement-options).
 Additionally, [dynamic_batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#dynamic-batcher) will be enabled if it is legal to do so.
 
 _An example model analyzer YAML config that performs an Automatic Brute Search:_
@@ -128,13 +128,13 @@ You can also modify the minimum/maximum values that the automatic search space w
 
 ---
 
-### [Request Concurrency Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#concurrency-mode))
+### [Request Concurrency Search Space](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/inference_load_modes.md#concurrency-mode)
 
 - `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
 - `--run-config-search-min-concurrency: `: Changes the request concurrency minimum automatic search space value
 - `--run-config-search-max-concurrency: `: Changes the request concurrency maximum automatic search space value
 
-### [Request Rate Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#request-rate-mode)
+### [Request Rate Search Space](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/inference_load_modes.md#request-rate-mode)
 
 - `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
 - `--run-config-search-min-request-rate: `: Changes the request rate minimum automatic search space value
@@ -422,7 +422,7 @@ _This mode has the following limitations:_
 
 - Summary/Detailed reports do not include the new metrics
 
-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md#command-line-options) documentation for a list of the flags that can be specified.
 
 LLMs can be optimized using either Quick or Brute search mode.
 
diff --git a/docs/metrics.md b/docs/metrics.md
index 18b87dfc..40a29d18 100644
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -24,7 +24,7 @@ tags, which are used in various places to configure Model Analyzer.
 
 These metrics come from the perf analyzer and are parsed and processed by the
 model analyzer. See the [perf analyzer
-docs](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+docs](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
 for more info on these
 
 * `perf_throughput`: The number of inferences per second measured by the perf
diff --git a/docs/model_types.md b/docs/model_types.md
index 6af0a09f..03ac3743 100644
--- a/docs/model_types.md
+++ b/docs/model_types.md
@@ -119,7 +119,7 @@ _Profiling this model type has the following limitations:_
 
 - Summary/Detailed reports do not include the new metrics
 
-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md) documentation for a list of the flags that can be specified.
 
 LLMs can be optimized using either Quick or Brute search mode.
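
For context on the `perf_analyzer_flags` passages updated in docs/config.md above, the following is a minimal sketch of a per-model configuration that forwards CLI options to Perf Analyzer, including the `shape` option required when a model has variable-sized input dimensions. The model name, input names, and dimension values are illustrative assumptions, not taken from the diff:

```yaml
model_repository: /path/to/model/repository

profile_models:
  example_model:               # hypothetical model with variable-sized input dims
    perf_analyzer_flags:
      percentile: 95           # forwarded to perf_analyzer as --percentile=95
      shape:                   # required when the model config declares dynamic input shapes
        - INPUT0:3,224,224
        - INPUT1:128
```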
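Likewise, the concurrency and request-rate search bounds referenced in the docs/config_search.md hunks can be narrowed from their 1-to-1024 defaults. A sketch assuming the usual mapping of CLI flags to snake_case YAML keys (verify the exact key names against config.md); the bounds shown are arbitrary examples:

```yaml
profile_models:
  - example_model                        # placeholder model name
run_config_search_min_concurrency: 2     # CLI: --run-config-search-min-concurrency
run_config_search_max_concurrency: 64    # CLI: --run-config-search-max-concurrency
```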
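Finally, for the LLM profiling passages in docs/config_search.md and docs/model_types.md, a hedged sketch of a config that sets the model type to LLM and passes options through to GenAI-Perf. The `streaming` flag is one GenAI-Perf CLI option; check the linked GenAI-Perf documentation for the full list and for whether `genai_perf_flags` should be set globally or per model in your setup:

```yaml
model_repository: /path/to/model/repository
model_type: LLM                 # config-file equivalent of --model-type LLM

profile_models:
  - example_llm                 # placeholder LLM model name

genai_perf_flags:
  streaming: true               # forwarded to genai-perf as --streaming
```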