Description
While text_embedding and sparse_embedding are the first candidates for chunking at the inference call, we should also consider rerank. The current strategy for Elasticsearch rerank is to truncate, and we don't apply chunking in the text similarity retriever either. With chunking implemented in the rerank API we could also extract the best fragments of the text at no additional cost, and the approach would be adaptable to any rerank provider.
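To make the "best fragments" idea concrete, here is a minimal sketch of how chunked rerank scoring could surface the best fragment per document: each chunk is scored against the query independently and the document keeps the highest-scoring one. The class and method names (`ChunkScore`, `bestFragment`, the `rerankModel` callback) are placeholders for this issue, not existing Elasticsearch APIs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

// Score of a single document fragment for a given query.
record ChunkScore(String fragment, float score) {}

class ChunkedRerank {
    // Rerank every chunk of a document and return the best-scoring fragment;
    // the document-level score is then the max over its chunks.
    static ChunkScore bestFragment(String query, List<String> chunks,
                                   BiFunction<String, String, Float> rerankModel) {
        return chunks.stream()
            .map(chunk -> new ChunkScore(chunk, rerankModel.apply(query, chunk)))
            .max(Comparator.comparingDouble(ChunkScore::score))
            .orElseThrow();
    }
}
```

Since every chunk already has to be scored to rank the document, returning the top fragment adds no extra inference calls, which is what makes it "free" for any provider.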
Rerank is a good place to start, but it is not straightforward for the Elastic reranker, which is a cross-encoder model that processes the query and the document together. The combined token count must be < 512, and because the query length is variable, chunking the document is harder: either chunk sizes are computed dynamically once the query is known, or we pick a fixed low number, say 256-token chunks, and truncate the query at 256 tokens.
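A rough sketch of the two options for fitting the query plus a document chunk into the 512-token window. The 512 and 256 figures come from the discussion above; the class and method names are illustrative only.

```java
final class RerankChunkBudget {
    static final int MODEL_WINDOW = 512;   // combined query + document token limit
    static final int FIXED_CHUNK = 256;    // static fallback chunk size
    static final int FIXED_QUERY = 256;    // query truncated to the remainder

    // Option 1: size document chunks dynamically once the query length is known,
    // using whatever budget the query leaves in the model window.
    static int dynamicChunkSize(int queryTokens) {
        return Math.max(0, MODEL_WINDOW - queryTokens);
    }

    // Option 2: fixed split - 256-token chunks with the query truncated at 256 tokens,
    // which avoids re-chunking per query but wastes budget for short queries.
    static int[] fixedSplit() {
        return new int[] { FIXED_QUERY, FIXED_CHUNK };
    }
}
```

The dynamic option keeps more of the document per chunk when queries are short, at the cost of chunking at query time rather than ahead of time.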
Cohere truncates documents after 4096 tokens, which is large enough for most reasonable chunk sizes.
The purpose of this issue is to investigate how we can add chunking to rerank, design the solution, and implement it.