Description
While text_embedding and sparse_embedding are the first candidates for chunking at the inference call, we should also consider rerank. The current strategy for Elasticsearch rerank is to truncate, and we don't apply chunking in the text similarity retriever either. With chunking implemented in the rerank API we could also extract the best fragments of the text at no additional cost, and the approach would be adaptable to any rerank provider.
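To make the "best fragments" idea concrete, here is a minimal sketch of how chunked rerank scoring could surface the best fragment per document: each chunk is scored against the query independently and the document keeps the highest-scoring one. The class and method names (`ChunkScore`, `bestFragment`, the `rerankModel` callback) are placeholders for this issue, not existing Elasticsearch APIs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

// Score of a single document fragment for a given query.
record ChunkScore(String fragment, float score) {}

class ChunkedRerank {
    // Rerank every chunk of a document and return the best-scoring fragment;
    // the document-level score is then the max over its chunks.
    static ChunkScore bestFragment(String query, List<String> chunks,
                                   BiFunction<String, String, Float> rerankModel) {
        return chunks.stream()
            .map(chunk -> new ChunkScore(chunk, rerankModel.apply(query, chunk)))
            .max(Comparator.comparingDouble(ChunkScore::score))
            .orElseThrow();
    }
}
```

Since every chunk already has to be scored to rank the document, returning the top fragment adds no extra inference calls, which is what makes it "free" for any provider.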
Rerank is a good place to start, but it is not straightforward for the Elastic reranker, which is a cross-encoder model that processes the query and the document together. The combined token count must be < 512, and because the query length is variable, chunking the document is harder: either chunk sizes are computed dynamically once the query is known, or we pick a fixed low number, say 256-token chunks, and truncate the query at 256 tokens.
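A rough sketch of the two options for fitting the query plus a document chunk into the 512-token window. The 512 and 256 figures come from the discussion above; the class and method names are illustrative only.

```java
final class RerankChunkBudget {
    static final int MODEL_WINDOW = 512;   // combined query + document token limit
    static final int FIXED_CHUNK = 256;    // static fallback chunk size
    static final int FIXED_QUERY = 256;    // query truncated to the remainder

    // Option 1: size document chunks dynamically once the query length is known,
    // using whatever budget the query leaves in the model window.
    static int dynamicChunkSize(int queryTokens) {
        return Math.max(0, MODEL_WINDOW - queryTokens);
    }

    // Option 2: fixed split - 256-token chunks with the query truncated at 256 tokens,
    // which avoids re-chunking per query but wastes budget for short queries.
    static int[] fixedSplit() {
        return new int[] { FIXED_QUERY, FIXED_CHUNK };
    }
}
```

The dynamic option keeps more of the document per chunk when queries are short, at the cost of chunking at query time rather than ahead of time.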
Cohere truncates documents after 4096 tokens, which is large enough for most reasonable chunk sizes.
The purpose of this issue is to investigate how we can add chunking to rerank, design the solution, and implement it.