diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index 007ba5946..8e8a74dc9 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -441,6 +441,16 @@ To learn more about ELSER performance, refer to the <>.
 
 
 [discrete]
+[[pre-cleaning]]
+== Pre-cleaning input text
+
+The quality of the input text significantly affects the quality of the embeddings.
+To achieve the best results, it's recommended to clean the input text before generating embeddings.
+The exact preprocessing you may need to do heavily depends on your text.
+For example, if your text contains HTML tags, use the {ref}/htmlstrip-processor.html[HTML strip processor] in an ingest pipeline to remove unnecessary elements.
+Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results.
+
+
 [[further-readings]]
 == Further reading
 
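
Note on the new "Pre-cleaning input text" section: the HTML example could be made concrete. The following is a minimal Python sketch, not part of the patch. `build_pipeline` returns a request body that could be sent to `PUT _ingest/pipeline/<name>` with a single `html_strip` processor; the field name `body_content` is a hypothetical placeholder. `strip_html` is only a rough local stand-in for previewing what tag-free text looks like. It is built on the standard-library `html.parser` and is not claimed to match the Elasticsearch processor's output exactly.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Keeps text nodes and discards tags (rough local preview only)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join text nodes with a space, then collapse whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw: str) -> str:
    """Return `raw` with HTML tags removed and whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    return parser.text()


def build_pipeline(field: str = "body_content") -> dict:
    """Body for PUT _ingest/pipeline/<name>; `field` is a hypothetical name."""
    return {
        "description": "Strip HTML before generating ELSER embeddings",
        "processors": [{"html_strip": {"field": field}}],
    }
```

For instance, `strip_html("<p>Hello <b>world</b></p>")` gives `Hello world`. Because text nodes are joined with a space, markup inside a word would split it, so this helper suits spot-checking sample documents rather than production cleaning, which the ingest pipeline would handle.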