Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove optimization for duplicate entities in online retrieval #4474

Open
tokoko opened this issue Sep 2, 2024 · 4 comments
Open

Remove optimization for duplicate entities in online retrieval #4474

tokoko opened this issue Sep 2, 2024 · 4 comments

Comments

@tokoko
Copy link
Collaborator

tokoko commented Sep 2, 2024

There's an optimization in online store codebase which looks for unique entity values in the input of get_online_features and deduplicates them prior to invoking online_read call. The optimization was introduced as part of #2223. While probably beneficial in some niche use cases, the code needed for this is too complex and makes it hard to make further changes.

I'm planning to work on refactoring in order to allow individual online store implementations to get data for multiple feature views at once which is currently impossible. (#3259) I'm thinking of removing the optimization first and maybe reintroducing it later after all the other changes are merged.

@judahrand
Copy link
Member

I agree that this optimization makes it tricky to do other refactoring (although it wasn't exactly easy before this was added). However, I also think that it would be a real shame to lose this optimization 'long term'. It can have a significant saving when batching calls to the feature store. I also don't think that pushing it down into the online store layer makes sense as in that case each store would have to reimplement the logic.

@tokoko
Copy link
Collaborator Author

tokoko commented Sep 2, 2024

yeah... I agree, this is probably not even the biggest offender there tbh, but it's too intertwined with the code that prepares the final input for a read_online call, so any meaningful change there still means making a lot of changes in this optimization as well.

I'll definitely revisit adding it back in, it will probably just have to move up a bit in the workflow, it should probably happen for all entities together rather than on an individual feature view level as it's implemented right now. That way it could still be left out of a concrete online store layer.

thanks for the feedback, that's really valuable.

@franciscojavierarceo
Copy link
Member

Actually looking at the online read code right now it's kind of wild. I'm sure it's designed for good reason but a lot of serialization and variable initialization happens in places you wouldn't expect.

@tokoko
Copy link
Collaborator Author

tokoko commented Sep 3, 2024

@judahrand Apparently dynamodb doesn't accept duplicates in BatchGetItem, so some sort of deduplication (at least for dynamo) looks unavoidable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants