From fc89c6937d4da858b97c4a99457cb37c0a00ff31 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 19 Sep 2024 10:43:08 +0200
Subject: [PATCH 1/2] [DOCS] Adds pre-cleaning recommendation to ELSER docs. (#2796)

(cherry picked from commit 34a6c7b5bf54c1b5af672a9609b8d6c0708f793d)

# Conflicts:
# docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index 007ba5946..a78a3fcb4 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -441,6 +441,27 @@ To learn more about ELSER performance, refer to the <>.
 
 
 [discrete]
+<<<<<<< HEAD
+=======
+[[pre-cleaning]]
+== Pre-cleaning input text
+
+The quality of the input text significantly affects the quality of the embeddings.
+To achieve the best results, it's recommended to clean the input text before generating embeddings.
+The exact preprocessing you may need to do heavily depends on your text.
+For example, if your text contains HTML tags, use the {ref}/htmlstrip-processor.html[HTML strip processor] in an ingest pipeline to remove unnecessary elements.
+Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results.
+
+
+[discrete]
+[[elser-adaptive-allocations]]
+== Adaptive allocations
+
+include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
+
+
+[discrete]
+>>>>>>> 34a6c7b5 ([DOCS] Adds pre-cleaning recommendation to ELSER docs. (#2796))
 [[further-readings]]
 == Further reading
 

From 3135e8cf8fa66116beced8f88a2ea7e6a2681adb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 19 Sep 2024 12:06:20 +0200
Subject: [PATCH 2/2] Apply suggestions from code review

---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index a78a3fcb4..8e8a74dc9 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -441,8 +441,6 @@ To learn more about ELSER performance, refer to the <>.
 
 
 [discrete]
-<<<<<<< HEAD
-=======
 [[pre-cleaning]]
 == Pre-cleaning input text
 
@@ -453,15 +451,6 @@ For example, if your text contains HTML tags, use the {ref}/htmlstrip-processor.
 Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results.
 
 
-[discrete]
-[[elser-adaptive-allocations]]
-== Adaptive allocations
-
-include::ml-nlp-shared.asciidoc[tag=ml-nlp-adaptive-allocations]
-
-
-[discrete]
->>>>>>> 34a6c7b5 ([DOCS] Adds pre-cleaning recommendation to ELSER docs. (#2796))
 [[further-readings]]
 == Further reading
 