From 0ec42e39b237bee00dfdf483d8c48d16809bda99 Mon Sep 17 00:00:00 2001
From: pvijayakrish
Date: Tue, 13 Aug 2024 12:02:16 -0700
Subject: [PATCH] Revert "Update batching explanation in docs (#36)"

This reverts commit a812e25fbb2c3a448a749d00638b7a86dade354c.
---
 genai-perf/README.md          |  7 -------
 genai-perf/docs/embeddings.md | 12 ------------
 2 files changed, 19 deletions(-)

diff --git a/genai-perf/README.md b/genai-perf/README.md
index 864dfcfb..617d58cf 100644
--- a/genai-perf/README.md
+++ b/genai-perf/README.md
@@ -335,13 +335,6 @@ You can optionally set additional model inputs with the following option:
   model with a singular value, such as `stream:true` or `max_tokens:5`. This
   flag can be repeated to supply multiple extra inputs.
 
-For [Large Language Models](docs/tutorial.md), there is no batch size (i.e.
-batch size is always `1`). Each request includes the inputs for one individual
-inference. Other modes such as the [embeddings](docs/embeddings.md) and
-[rankings](docs/rankings.md) endpoints support client-side batching, where
-`--batch-size N` means that each request sent will include the inputs for `N`
-separate inferences, allowing them to be processed together.
-