From f382ff9b3f78198349452a002b1eebde7b5903c4 Mon Sep 17 00:00:00 2001
From: Sean Story
Date: Tue, 30 Jan 2024 01:26:31 -0600
Subject: [PATCH] Suggest chunking for large ELSER fields (#2660)

(cherry picked from commit f4dacc9dd2b116377ceea3c2707ad1f97356f582)
---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index 80e37da4a..3596e58e4 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -408,9 +408,11 @@ image::images/ml-nlp-elser-v2-test.png[alt="Testing ELSER",align="center"]
 * ELSER works best on small-to-medium sized fields that contain natural
 language. For connector or web crawler use cases, this aligns best with fields
 like _title_, _description_, _summary_, or _abstract_. As ELSER encodes the
-first 512 tokens of a field, it may not be as good a match for `body_content` on
-web crawler documents, or body fields resulting from extracting text from office
-documents with connectors.
+first 512 tokens of a field, it may not provide as relevant of results for large
+fields. For example, `body_content` on web crawler documents, or body fields
+resulting from extracting text from office documents with connectors. For larger
+fields like these, consider "chunking" the content into multiple values, where
+each chunk can be under 512 tokens.
 * Larger documents take longer at ingestion time, and {infer} time per document
 also increases the more fields in a document that need to be processed.
 * The more fields your pipeline has to perform inference on, the longer it takes
@@ -521,4 +523,4 @@ image::images/ml-nlp-elser-v2-opt-bm-results.png[alt="ELSER V2 optimized benchma
 respectively 14 docs/s and 16 docs/s, indicating a performance improvement due
 to virtual cores of 12%.
 
-image::images/ml-nlp-elser-v2-cp-bm-results.png[alt="ELSER V2 cross-platform benchmarks",align="center"]
\ No newline at end of file
+image::images/ml-nlp-elser-v2-cp-bm-results.png[alt="ELSER V2 cross-platform benchmarks",align="center"]
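
Note on the chunking suggestion added in the first hunk: the patch does not say how to chunk, so the following is only a minimal sketch of one way to split a large field into multiple values that each stay under the 512-token limit. The function name `chunk_text` and the `tokens_per_word` ratio are hypothetical. The token count is a rough word-based estimate, not the output of ELSER's actual tokenizer, so the ratio is an assumption that should be tuned against real content.

```python
def chunk_text(text: str, max_tokens: int = 512, tokens_per_word: float = 1.5) -> list[str]:
    """Split text into chunks whose *estimated* token count is under max_tokens.

    ASSUMPTION: tokens are approximated as (whitespace-separated words) *
    tokens_per_word. This is a conservative heuristic, not the model's real
    tokenizer, so actual token counts may differ.
    """
    if max_tokens <= 1 or tokens_per_word <= 0:
        raise ValueError("max_tokens must be > 1 and tokens_per_word must be > 0")

    words = text.split()
    # Largest word count whose estimated tokens stay strictly below max_tokens;
    # always allow at least one word per chunk so the loop makes progress.
    words_per_chunk = max(1, int((max_tokens - 1) / tokens_per_word))

    return [
        " ".join(words[start:start + words_per_chunk])
        for start in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    body_content = "lorem " * 1000  # stand-in for a large crawled body field
    chunks = chunk_text(body_content)
    print(len(chunks), [len(c.split()) for c in chunks])
```

Each returned string could then be stored as one value of a multi-valued field, so that inference runs on every chunk rather than only on the first 512 tokens of the whole body. Splitting on word boundaries is the simplest choice; splitting on sentence or paragraph boundaries may preserve more context per chunk.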