PA Migration: Doc Updates
fpetrini15 committed Aug 5, 2024
1 parent 0094dba commit 4e324f3
Showing 5 changed files with 10 additions and 10 deletions.
2 changes: 1 addition & 1 deletion docs/README.md
@@ -47,4 +47,4 @@ The User Guide describes how to configure Model Analyzer, choose launch and sear

The following resources are recommended:

-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md): Perf Analyzer is a CLI application built to generate inference requests and measure the latency of those requests and the throughput of the model being served.
6 changes: 3 additions & 3 deletions docs/config.md
@@ -718,7 +718,7 @@ but profile `model_2` using GPU.
This field allows the user to pass `perf_analyzer` any CLI options it needs to
execute properly. Refer to [the
`perf_analyzer`
-docs](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+docs](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
for more information on these options.
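
As a hedged illustration (the model name and flag values here are hypothetical, not part of this commit), passing extra CLI options through `perf_analyzer_flags` might look like:

```yaml
profile_models:
  add_sub:
    perf_analyzer_flags:
      # hypothetical values; any valid perf_analyzer CLI option
      # (without the leading dashes) can be listed here
      percentile: 95
      measurement-interval: 10000
```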

### Global options to apply to all instances of Perf Analyzer
@@ -779,7 +779,7 @@ perf_analyzer_flags:
If a model configuration has variable-sized dimensions in the inputs section,
then the `shape` option of the `perf_analyzer_flags` option must be specified.
More information about this can be found in the
-[Perf Analyzer documentation](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#input-data).
+[Perf Analyzer documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/input_data.md).
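
For concreteness, a minimal sketch of the `shape` option for a model with two variable-sized inputs (the model and tensor names are hypothetical):

```yaml
profile_models:
  my_model:
    perf_analyzer_flags:
      # pin each variable-sized input to a concrete shape
      shape:
        - input_ids:1,128
        - attention_mask:1,128
```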

### SSL Support:

@@ -810,7 +810,7 @@ profile_models:
```

More information about this can be found in the
-[Perf Analyzer documentation](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#ssltls-support).
+[Perf Analyzer documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/measurements_metrics.md#ssltls-support).
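
As a rough sketch only (the model name and certificate path are hypothetical; verify the exact flag set against the linked docs), enabling SSL for a gRPC client through the pass-through flags might look like:

```yaml
profile_models:
  my_model:
    perf_analyzer_flags:
      # hypothetical paths; flags mirror perf_analyzer's SSL CLI options
      ssl-grpc-use-ssl: true
      ssl-grpc-root-certifications-file: /path/to/ca.crt
```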

#### **Important Notes**:

8 changes: 4 additions & 4 deletions docs/config_search.md
@@ -98,7 +98,7 @@ It has two modes:

The parameters that are automatically searched are
[model maximum batch size](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size),
-[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md#request-concurrency).
+[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/cli.md#measurement-options).
Additionally, [dynamic_batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#dynamic-batcher) will be enabled if it is legal to do so.

_An example model analyzer YAML config that performs an Automatic Brute Search:_
@@ -128,13 +128,13 @@ You can also modify the minimum/maximum values that the automatic search space w

---

-### [Request Concurrency Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#concurrency-mode))
+### [Request Concurrency Search Space](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/inference_load_modes.md#concurrency-mode)

- `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
- `--run-config-search-min-concurrency: <val>`: Changes the request concurrency minimum automatic search space value
- `--run-config-search-max-concurrency: <val>`: Changes the request concurrency maximum automatic search space value
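
For instance, a sketch of narrowing the concurrency sweep (assuming these options take their usual snake_case form in the YAML config, as elsewhere in the Model Analyzer docs; the model name is hypothetical):

```yaml
profile_models:
  - my_model
# sweep concurrency 2, 4, 8, ..., 128 instead of the full default range
run_config_search_min_concurrency: 2
run_config_search_max_concurrency: 128
```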

-### [Request Rate Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#request-rate-mode)
+### [Request Rate Search Space](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/inference_load_modes.md#request-rate-mode)

- `Default:` 1 to 1024 request rates, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
- `--run-config-search-min-request-rate: <val>`: Changes the request rate minimum automatic search space value
@@ -422,7 +422,7 @@ _This mode has the following limitations:_

- Summary/Detailed reports do not include the new metrics

-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+To profile LLMs, you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI or config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md#command-line-options) documentation for a list of the flags that can be specified.

LLMs can be optimized using either Quick or Brute search mode.
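
A minimal sketch of such a config, assuming `model_type` is the config-file counterpart of `--model-type` (the model name and flag values are hypothetical):

```yaml
model_type: LLM
profile_models:
  - my_llm
genai_perf_flags:
  # hypothetical choices; any GenAI-Perf CLI option can be listed here
  backend: vllm
  streaming: true
```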

2 changes: 1 addition & 1 deletion docs/metrics.md
@@ -24,7 +24,7 @@ tags, which are used in various places to configure Model Analyzer.

These metrics come from the perf analyzer and are parsed and processed by the
model analyzer. See the [perf analyzer
-docs](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
+docs](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md)
for more info on these metrics.

* `perf_throughput`: The number of inferences per second measured by the perf
2 changes: 1 addition & 1 deletion docs/model_types.md
@@ -119,7 +119,7 @@ _Profiling this model type has the following limitations:_

- Summary/Detailed reports do not include the new metrics

-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+To profile LLMs, you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI or config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md) documentation for a list of the flags that can be specified.

LLMs can be optimized using either Quick or Brute search mode.

