-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OnDemandFeatureViews with RequestSource returns different columns depending on online/offline feature retrieval. #4479
Comments
Oh this is not ideal. Can you post your client usage? Just to make sure I understand it. I don't think it should be included unless explicitly requested though, so probably the |
@franciscojavierarceo Yes, I've just posted a minimal example in the description! As you can see, the offline transformation with pandas returns the additional columns: |
Cool, so I'm going to add a unit test first to reproduce this and then work backwards from there. This minimal example is really helpful. Ideally we can have a fix out before the next release. Also, Job, would you mind having a look at a slightly related (though not really) PR here: #4585 It's going to impact On Demand Feature Views so want to get community feedback. |
Okay, I was able to reproduce the behavior in a uint test here: #4615 I'll make a patch for this tomorrow and update the Also, I'll look at adding the python mode to |
(responding in Job's place as his colleague) Thanks @franciscojavierarceo!
Yes, would be valuable for some of our use cases. Would be great to further align online/offline feature retrieval.
Sorry, saw this message a bit late and see it's already merged. Let me know if I can review a PR for the above mentioned changes! I'll try to make sure I get notified properly by Github when tagged in this repo. |
Of course you still can! |
Looking into this, it looks the complicated piece is that it's going to vary per offline store. 😅 |
@franciscojavierarceo Do you have a link to the piece of code for Postgres? Ideally, there is some contract regardless of which database you are using, so that your ODFV always returns the same set of fields. Allowing you to swap your off/online store without having to change the your feature retrieval code. |
After reflecting upon this further, I think the current behavior makes sense. Online, you want to be space efficient when considering the impact of network latency so you want to send and receive the minimally required set of data, which means after transformation you only need to send the output in the response. Offline, you will probably prioritize reproducibility and being able to iterate, which means you will want the underlying inputs into your transformation, so the extra data is tolerable. I'm going to close this issue. Let me know if you have any concerns. |
I added the test in the PR to highlight this behavior and comment on why it's useful. |
Okay, PR is merged. Closing this out. |
@franciscojavierarceo Understand the reasoning, although I can imagine some people would value a consistent experience between offline and online feature retrieval more. That said, it is not as important as the two other issues we have open still: |
Context
I'm using an
ODFV
with aPandasTransformation
and aRequestSource
input. Furthermore, I'm using Postgres offline and online store.Expected Behavior
I would expect both the
get_online_features_async
method and theget_historical_features
method to return the same set of features. However, when retrieving features withget_online_features_async
, the input data from theRequestSource
is not present in the output response. On the other hand, when retrieving features withget_historical_features
, the input data from theRequestSource
is present in the output response.I'm not sure which behavior is to be expected. However, I think think that both approaches should return the same columns.
Current Behavior
When calling the
get_online_features_async
method I do not see the input from theRequestSource
back in the output response, while I would expect it to be there.One could also argue that it should not be in the output response. That would also be an option. However, I would assume that the online and offline feature retrieval would return the same output in terms of columns.
Steps to reproduce
docker-compose.yml
feature_store.yml
Insert into offline store (postgres)
postgres_init/create-offline-store-database.sql
bootstrap.py
materialize.py
inference.py
Specifications
Version: 0.36.0
Platform: macOS - M1
Subsystem: Sonoma 14.1.1
Possible Solution
The text was updated successfully, but these errors were encountered: