Change ORM query style in MDS query #117

nss10 · 2024-10-29T22:18:44Z

Link to JIRA ticket if there is one: MIDRC-363

Improvements

MDS search now makes use of the database index on data._guid_type for specific commons (MIDRC).

Google doc with the observations: Discovery page loading issue MIDRC

github-actions · 2024-10-29T22:19:19Z

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

github-actions · 2024-10-29T23:12:06Z

filepath	passed	SUBTOTAL
tests/test_discoverypage.py	1	1
tests/test_aggregate_mds.py	1	1
tests/test_metadata_ingestion.py	1	1
TOTAL	3	3

Please find the detailed integration test report here

Please find the ci env pod logs here

george42-ctds

Looks good.

mfshao · 2024-10-30T01:34:08Z

src/mds/query.py

@@ -83,10 +83,9 @@ def add_filter(query):
                query = query.where(Metadata.data[path].has_key(field))
            else:
                values = ["*" if v == "\*" else v for v in values]
+                path = path if path == "_guid_type" else list(path.split("."))


So was performance impacted because a list was being passed as the index?
If so, instead of special-casing the _guid_type field, shouldn't we avoid using a list for any field name whose path contains no "."?

yeah, hold on, this works in the specific case where we want to filter on _guid_type, but it presumably breaks for records that don't have that field

we populate that as a common pattern but it's not required, so I think we need to be a little more careful here

could we just not pass a list if it's a single item without "."?

path = path if "." not in path else list(path.split("."))

then it will work for any query with a single item path and not hard-code the _guid_type
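A minimal sketch of the suggestion above, pulled out into a standalone helper so the behavior is easy to check. The function name `normalize_path` is hypothetical and is not part of the MDS codebase; the PR applies the same expression inline.

```python
from typing import List, Union


def normalize_path(path: str) -> Union[str, List[str]]:
    """Return a plain key for a single-segment path, a list for a dotted path.

    Hypothetical helper: mirrors the one-line expression suggested in the
    review, without special-casing any particular field name.
    """
    return path if "." not in path else path.split(".")


# A single-segment path stays a string, whatever the field is called.
print(normalize_path("_guid_type"))
print(normalize_path("some_other_field"))

# A dotted path is split into its segments.
print(normalize_path("gen3_discovery.project_id"))
```

Because the check is on the presence of "." rather than on the field name, any top-level field gets the same treatment as _guid_type.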

I agree that your change is more general, Alex. However, the key point here is that there's an index on data->>_guid_type in MIDRC's metadata database, which is why using a string instead of a list has such a large impact there. Without an index, the choice between a string and a list would make little to no difference. I can still make the change, since baking this into the code seems like a bad idea to me as well.

The reason I proposed adding an index on _guid_type is that we hard-code the MDS query sent from Data Portal (code here).

yeah, hard-coding usage on the frontend is one thing, but hard-coding those patterns on the backend means every frontend and client (and ours is not the only one) would need to operate under the same assumptions. We don't want to enforce a particular pattern of usage unless that structure is built into the API itself and the expectations are clear.

So I still think the general string approach is better. It means that if there is an index on the particular field being searched, it will be used (which means better performance beyond just _guid_type if someone has indexed other columns). Technically MDS has an endpoint to index columns, though I'm not sure how widely used that is.
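To illustrate why the string form can use the index while the list form may not, here is a rough sketch of the PostgreSQL JSONB expressions the two forms correspond to. It assumes the filter compares the field as text, so a string key maps to the `->>` operator and a list of keys maps to the `#>>` path operator. The function `render_text_accessor` is hypothetical and only builds illustrative SQL fragments; it is not SQLAlchemy's compiler and is not taken from the MDS code.

```python
from typing import List, Union


def render_text_accessor(column: str, path: Union[str, List[str]]) -> str:
    """Build an illustrative PostgreSQL JSONB text accessor (hypothetical helper).

    A string key uses the ->> operator; a list of keys uses the #>> path
    operator with a text-array literal.
    """
    if isinstance(path, str):
        return column + " ->> '" + path + "'"
    return column + " #>> '{" + ",".join(path) + "}'"


# An expression index is only considered when the query uses the same
# expression, so these two forms are not interchangeable for the planner.
print(render_text_accessor("data", "_guid_type"))
print(render_text_accessor("data", ["_guid_type"]))
```

Under these assumptions, an index built on the `->>` expression lines up with the first form only, which is consistent with the observation that the difference matters only where such an index exists.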

Updated! 🚀

Avantol13

I'm not sure this is generalized enough

github-actions · 2024-10-31T19:13:11Z

filepath	passed	SUBTOTAL
tests/test_discoverypage.py	1	1
tests/test_aggregate_mds.py	1	1
tests/test_metadata_ingestion.py	1	1
TOTAL	3	3

Please find the detailed integration test report here

Please find the ci env pod logs here

paulineribeyre · 2024-11-04T18:43:36Z

The PR description format somewhat polluted the release notes; please read this doc for guidance (e.g., an internal Google doc should not be in the release notes).

* Change ORM query style in MDS search to use DB indexing better --------- Co-authored-by: nss10 <[email protected]>

Change ORM query style in MDS query

d6005e3

nss10 and others added 2 commits October 29, 2024 17:25

Increment version number in pyproject.toml

a1e0355

Apply automatic documentation changes

ee60695

nss10 requested review from george42-ctds and Avantol13 October 29, 2024 22:55

george42-ctds previously approved these changes Oct 30, 2024

mfshao reviewed Oct 30, 2024

Avantol13 requested changes Oct 30, 2024

Remove hardcoded _guid_type in MDS query

2829326

nss10 dismissed george42-ctds’s stale review via 2829326 October 31, 2024 18:17

nss10 requested a review from Avantol13 October 31, 2024 18:21

Avantol13 approved these changes Nov 1, 2024

nss10 merged commit eff41c4 into master Nov 1, 2024
8 checks passed

nss10 deleted the chore/query_performance_orm branch November 1, 2024 16:03

matthewwest55 pushed a commit to matthewwest55/metadata-service that referenced this pull request Jan 21, 2025

Change ORM query style in MDS query (uc-cdis#117)

0eaa8b1

* Change ORM query style in MDS search to use DB indexing better --------- Co-authored-by: nss10 <[email protected]>
