[PROPOSAL] Optimize embedding processors for update scenario #793

Open
martin-gaievski opened this issue Jun 14, 2024 · 2 comments
Labels
enhancement Enhancements Increases software capabilities beyond original client specifications

Comments

@martin-gaievski
Member

What/Why

What are you proposing?

Text embedding processors could be smarter and skip unnecessary calls to the model in update scenarios.

What users have asked for this feature?

I have one customer case where they do the following:

  • configure a remote model
  • configure a text embedding processor and attach it to an ingest pipeline that is set as the default at the index level
  • ingest a document, which creates embeddings
  • update some document fields that are neither the embedding field nor the original field the embedding is generated from

Our text embedding processor does not analyze the document state or what exactly changed; it just calls the model again. That adds no value, because it receives the same embeddings and sets them on the new document. The model call can be avoided by simply copying the embeddings from the original document.
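To make the cost of this behavior concrete, here is a minimal Python simulation. It is not plugin code: `FakeModel`, `embed_on_ingest`, and the field names `title`, `text`, and `text_embedding` are all hypothetical. It only illustrates that a processor which ignores document state calls the model once per write, even when the result is identical.

```python
class FakeModel:
    """Stand-in for a remote embedding model; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def embed(self, text):
        self.calls += 1
        # Deterministic toy "embedding": identical text gives identical output.
        return [float(len(text)), float(sum(map(ord, text)) % 97)]


def embed_on_ingest(doc, model, source_field="text", target_field="text_embedding"):
    """Mimics today's behavior: call the model whenever the source field is present."""
    out = dict(doc)
    if source_field in out:
        out[target_field] = model.embed(out[source_field])
    return out


model = FakeModel()
original = embed_on_ingest({"title": "v1", "text": "hello world"}, model)

# The update touches only "title"; the text that feeds the embedding is unchanged.
updated = embed_on_ingest({"title": "v2", "text": "hello world"}, model)

print(model.calls)  # 2
print(updated["text_embedding"] == original["text_embedding"])  # True
```

The second call returns exactly what was already stored, so it is pure overhead.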

What problems are you trying to solve?

We can reduce the number of calls to the remote model. For the end customer that means:

  • a lower bill, as they typically pay per call
  • better stability, as the system can hit the model's rate limit and other calls can then be throttled

What is the developer experience going to be?

We need a way to make this behavior configurable. In some cases you need to regenerate embeddings even if they were already in the document, for example after you've deployed an updated version of the model.
I suggest the following logic:

  • if the embeddings in the current document are empty, always call the model
  • if the embeddings are not empty, check the flag. If the flag says 'update', call the model; otherwise copy the embeddings from the original document
  • the default behavior should be 'always update'. This ensures backward compatibility with today's code
  • the update/do-not-update behavior should be configurable at the processor level, so that different processors in one cluster can be configured differently
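The decision rules above can be sketched as a small pure function. This is only an illustration in Python, not the plugin's Java code; the `UpdatePolicy` enum, its value names, and `should_call_model` are hypothetical names chosen for this example.

```python
from enum import Enum


class UpdatePolicy(Enum):
    ALWAYS_UPDATE = "always_update"    # hypothetical name; the proposed default
    REUSE_EXISTING = "reuse_existing"  # hypothetical name; copy when possible


def should_call_model(existing_embedding, policy=UpdatePolicy.ALWAYS_UPDATE):
    """Return True when the processor should call the model."""
    # Rule 1: no embedding yet (None or empty), so there is nothing to copy.
    if not existing_embedding:
        return True
    # Rule 2: an embedding exists, so the flag decides.
    return policy is UpdatePolicy.ALWAYS_UPDATE


print(should_call_model(None, UpdatePolicy.REUSE_EXISTING))        # True
print(should_call_model([0.1, 0.2], UpdatePolicy.REUSE_EXISTING))  # False
print(should_call_model([0.1, 0.2]))                               # True
```

Because the policy defaults to `ALWAYS_UPDATE`, a processor that never sets the flag behaves as it does today.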

Are there any security considerations?

No security concerns regarding calls to the model; this functionality exists today. The new flag that regulates processor behavior should be added in a safe manner.

Are there any breaking changes to the API?

No. With the suggested logic the default is 'always update', so today's behavior remains the same.

What is the user experience going to be?

If users want to fine-tune the processor behavior, they will need to set a new processor parameter.
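As a rough idea of what that could look like, the sketch below builds an ingest pipeline body as a Python dict. `model_id` and `field_map` are existing `text_embedding` processor options; `update_policy` and its value `reuse_existing` are hypothetical names invented for this example, since the proposal does not name the parameter. The field names and the `<model_id>` placeholder are illustrative as well.

```python
import json

pipeline_body = {
    "description": "Text embedding pipeline that reuses stored embeddings on update",
    "processors": [
        {
            "text_embedding": {
                "model_id": "<model_id>",  # placeholder, fill in a deployed model's ID
                "field_map": {"passage_text": "passage_embedding"},
                # HYPOTHETICAL parameter: not an existing option.
                # Omitting it would keep today's behavior ('always update').
                "update_policy": "reuse_existing",
            }
        }
    ],
}

# The body that would be sent when creating the ingest pipeline.
print(json.dumps(pipeline_body, indent=2))
```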

Are there breaking changes to the User Experience?

No

Why should it be built? Any reason not to?

The main reasons are saving cost on calls to the model and lowering the chance of requests being throttled. Customers who have this problem today may not even realize that the system is making unnecessary calls to the model.

What will it take to execute?

This will be a code change in the plugin, for every processor that we want to onboard to this feature. The change is something like:

  • add a parameter to the processor factory so the new behavior can be enabled
  • modify the processor as per the suggested logic: check whether the embedding field exists in the original document. If it does not, call the model (today's logic). If it does, check the parameter: if it says not to update, return; otherwise call the model.
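The two steps above might fit together roughly as follows. This is a Python sketch under stated assumptions, not the plugin's actual factory or processor: `create_processor`, `EmbeddingProcessor`, the `update_policy` parameter, and the config keys are hypothetical, and the sketch simply assumes the original document is handed to the processor (how the plugin would obtain it is not covered by the proposal). It follows the proposed logic literally and does not check whether the source text itself changed.

```python
VALID_POLICIES = {"always_update", "reuse_existing"}  # hypothetical values


class EmbeddingProcessor:
    def __init__(self, model, source_field, target_field, update_policy):
        self.model = model  # any callable: text -> list of floats
        self.source_field = source_field
        self.target_field = target_field
        self.update_policy = update_policy

    def process(self, new_doc, original_doc=None):
        out = dict(new_doc)
        if self.source_field not in out:
            return out  # nothing to embed
        existing = (original_doc or {}).get(self.target_field)
        if existing and self.update_policy == "reuse_existing":
            # Embedding already stored and the flag says not to update: copy it.
            out[self.target_field] = list(existing)
            return out
        # No stored embedding, or policy is 'always_update': today's logic.
        out[self.target_field] = self.model(out[self.source_field])
        return out


def create_processor(config, model):
    """Factory step: read the new parameter, defaulting to today's behavior."""
    policy = config.get("update_policy", "always_update")
    if policy not in VALID_POLICIES:
        raise ValueError(f"unsupported update_policy: {policy!r}")
    return EmbeddingProcessor(
        model, config["source_field"], config["target_field"], policy
    )


calls = []


def fake_model(text):
    calls.append(text)  # record each model invocation
    return [float(len(text))]


config = {
    "source_field": "text",
    "target_field": "text_embedding",
    "update_policy": "reuse_existing",
}
processor = create_processor(config, fake_model)
stored = processor.process({"title": "v1", "text": "hello"})
updated = processor.process({"title": "v2", "text": "hello"}, original_doc=stored)

print(len(calls))  # 1
print(updated["text_embedding"] == stored["text_embedding"])  # True
```

With `update_policy` left out of `config`, the same update would call the model a second time, which matches the backward-compatible default.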

Any remaining open questions?

Some edge cases may be:

@navneet1v moved this from Backlog to Backlog (Hot) in Vector Search RoadMap on Jun 14, 2024
@dblock added the enhancement label and removed the untriaged label on Jul 1, 2024
@dblock
Member

dblock commented Jul 1, 2024

[Catch All Triage - Attendees 1, 2, 3, 4, 5]

@zhichao-aws
Member

Agree to add a processor-level flag to update. Are there any reasons we shouldn't implement this feature?

Projects
Status: Backlog (Hot)
Development

No branches or pull requests

3 participants