From 0ec42e39b237bee00dfdf483d8c48d16809bda99 Mon Sep 17 00:00:00 2001
From: pvijayakrish
Date: Tue, 13 Aug 2024 12:02:16 -0700
Subject: [PATCH] Revert "Update batching explanation in docs (#36)"

This reverts commit a812e25fbb2c3a448a749d00638b7a86dade354c.
---
 genai-perf/README.md          |  7 -------
 genai-perf/docs/embeddings.md | 12 ------------
 2 files changed, 19 deletions(-)

diff --git a/genai-perf/README.md b/genai-perf/README.md
index 864dfcfb..617d58cf 100644
--- a/genai-perf/README.md
+++ b/genai-perf/README.md
@@ -335,13 +335,6 @@ You can optionally set additional model inputs with the following option:
   model with a singular value, such as `stream:true` or `max_tokens:5`. This
   flag can be repeated to supply multiple extra inputs.
 
-For [Large Language Models](docs/tutorial.md), there is no batch size (i.e.
-batch size is always `1`). Each request includes the inputs for one individual
-inference. Other modes such as the [embeddings](docs/embeddings.md) and
-[rankings](docs/rankings.md) endpoints support client-side batching, where
-`--batch-size N` means that each request sent will include the inputs for `N`
-separate inferences, allowing them to be processed together.
-