Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ORM query style in MDS query #117

Merged
merged 4 commits into from
Nov 1, 2024
Merged

Conversation

nss10
Copy link
Contributor

@nss10 nss10 commented Oct 29, 2024

Link to JIRA ticket if there is one: MIDRC-363

Improvements

  • MDS search response now makes use of MDS index on the data._guid_type for specific commons (MIDRC)

Google doc with the observations : Discovery page loading issue MIDRC

Copy link

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

Copy link

filepath $$\textcolor{#23d18b}{\tt{passed}}$$ SUBTOTAL
$$\textcolor{#23d18b}{\tt{tests/test\_discoverypage.py}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/test\_aggregate\_mds.py}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/test\_metadata\_ingestion.py}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{3}}$$ $$\textcolor{#23d18b}{\tt{3}}$$

Please find the detailed integration test report here

Please find the ci env pod logs here

george42-ctds
george42-ctds previously approved these changes Oct 30, 2024
Copy link
Contributor

@george42-ctds george42-ctds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

src/mds/query.py Outdated
@@ -83,10 +83,9 @@ def add_filter(query):
query = query.where(Metadata.data[path].has_key(field))
else:
values = ["*" if v == "\*" else v for v in values]
path = path if path == "_guid_type" else list(path.split("."))
Copy link
Contributor

@mfshao mfshao Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the performance been impacted because there is a list got passed as indices?
If so, instead of only looking for the _guid_type field, shouldn't we make sure it doesn't use a list if there is no . in the path for any field names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, hold on, this worked in the specific case where we want to filter on guid_type, but this presumably breaks for records that don't have that field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we populate that as a common pattern but it's not required, so I think we need to be a little more careful here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just not pass a list if it's a single item without "."?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path = path if "." not in path else list(path.split("."))

Copy link
Contributor

@Avantol13 Avantol13 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it will work for any query with a single item path and not hard-code the _guid_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that your change is more generalized, Alex. However, the key point here is that there’s an index on data->>_guid_type in the MIDRC's metadata database. Because of this, using a string instead of a list has a much greater impact. Without an index, the choice between a string and a list would make little to no difference. I can still make the change, since baking this into code seems like a bad idea to me as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I was trying to propose adding an index to _guid_type is because we hardcode the MDS query from Data portal. --> code here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, hard-coding usage on the frontend is one thing, but hard-coding those patterns on the backend means every frontend and client (for which ours is not the only one) would need to behave with the same assumptions. We don't want to enforce a particular pattern of usage unless that structure is built into the API itself and the expectations are clear.

So I still think the general string is better. It means if there is an index for the particular field being searched, it will be used (which means better performance beyond just the _guid_type if someone has indexed other columns). Technically MDS has an endpoint to index columns, though I'm not sure how well used that is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! 🚀

Copy link
Contributor

@Avantol13 Avantol13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is generalized enough

Copy link

filepath $$\textcolor{#23d18b}{\tt{passed}}$$ SUBTOTAL
$$\textcolor{#23d18b}{\tt{tests/test\_discoverypage.py}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/test\_aggregate\_mds.py}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{tests/test\_metadata\_ingestion.py}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{3}}$$ $$\textcolor{#23d18b}{\tt{3}}$$

Please find the detailed integration test report here

Please find the ci env pod logs here

@nss10 nss10 merged commit eff41c4 into master Nov 1, 2024
8 checks passed
@nss10 nss10 deleted the chore/query_performance_orm branch November 1, 2024 16:03
@paulineribeyre
Copy link
Contributor

paulineribeyre commented Nov 4, 2024

The PR description format somewhat polluted the release notes, please read this doc for guidance (eg an internal google doc should not be in the release notes)

matthewwest55 pushed a commit to matthewwest55/metadata-service that referenced this pull request Jan 21, 2025
* Change ORM query style in MDS search to use DB indexing better

---------

Co-authored-by: nss10 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants