-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change ORM query style in MDS query #117
Conversation
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
Please find the detailed integration test report here Please find the ci env pod logs here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
src/mds/query.py
Outdated
@@ -83,10 +83,9 @@ def add_filter(query): | |||
query = query.where(Metadata.data[path].has_key(field)) | |||
else: | |||
values = ["*" if v == "\*" else v for v in values] | |||
path = path if path == "_guid_type" else list(path.split(".")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the performance been impacted because there is a list got passed as indices?
If so, instead of only looking for the _guid_type field, shouldn't we make sure it doesn't use a list if there is no . in the path for any field names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, hold on, this worked in the specific case where we want to filter on guid_type, but this presumably breaks for records that don't have that field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we populate that as a common pattern but it's not required, so I think we need to be a little more careful here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we just not pass a list if it's a single item without "."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path = path if "." not in path else list(path.split("."))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it will work for any query with a single item path and not hard-code the _guid_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that your change is more generalized, Alex. However, the key point here is that there’s an index on data->>_guid_type
in the MIDRC's metadata database. Because of this, using a string instead of a list has a much greater impact. Without an index, the choice between a string and a list would make little to no difference. I can still make the change, since baking this into code seems like a bad idea to me as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I was trying to propose adding an index to _guid_type
is because we hardcode the MDS query from Data portal. --> code here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, hard-coding usage on the frontend is one thing, but hard-coding those patterns on the backend means every frontend and client (for which ours is not the only one) would need to behave with the same assumptions. We don't want to enforce a particular pattern of usage unless that structure is built into the API itself and the expectations are clear.
So I still think the general string is better. It means if there is an index for the particular field being searched, it will be used (which means better performance beyond just the _guid_type if someone has indexed other columns). Technically MDS has an endpoint to index columns, though I'm not sure how well used that is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is generalized enough
Please find the detailed integration test report here Please find the ci env pod logs here |
The PR description format somewhat polluted the release notes, please read this doc for guidance (eg an internal google doc should not be in the release notes) |
* Change ORM query style in MDS search to use DB indexing better --------- Co-authored-by: nss10 <[email protected]>
Link to JIRA ticket if there is one: MIDRC-363
Improvements
data._guid_type
for specific commons (MIDRC)Google doc with the observations : Discovery page loading issue MIDRC