Improves `search` to handle smaller search terms. #4735

medha-14 · 2025-01-03T11:10:32Z

Description

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist:

No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
All tests pass: $ python -m pytest (or $ nox -s tests)
The documentation builds: $ python -m pytest --doctest-plus src (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ nox -s quick.

Further checks:

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

codecov · 2025-01-03T12:13:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.24%. Comparing base (a7253b8) to head (977f962).
Report is 24 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4735      +/-   ##
===========================================
+ Coverage    99.22%   99.24%   +0.01%     
===========================================
  Files          303      303              
  Lines        23070    23262     +192     
===========================================
+ Hits         22891    23086     +195     
+ Misses         179      176       -3


medha-14 · 2025-01-06T10:00:43Z

Could I get a review on this one?

kratman · 2025-01-06T15:31:04Z

@medha-14 Sorry, there is a bit of a backlog due to the upcoming release and everyone coming back from vacation. Don't worry, we will review this shortly.

medha-14 · 2025-01-06T16:33:26Z

Thank you for the update! I just thought you missed this one.

agriyakhetarpal

Thanks! I think min_similarity could be slightly higher here because the threshold here is too low and can lead to false positives in the search. But, at the same time, being able to resolve potential typos in the search query would require a lower threshold. This makes me feel that we could expose it with a sensible default, but I don't yet know what a sensible default would be. It should be higher than the current 40%, though. In-line comment about this below:

agriyakhetarpal · 2025-01-07T14:18:35Z

src/pybamm/util.py

@@ -163,14 +185,24 @@ def search(self, keys: str | list[str], print_values: bool = False):
            search_keys = [k.strip().lower() for k in keys if k.strip()]

        known_keys = list(self.keys())
-        known_keys.sort()
-
+        min_similarity = 0.4


min_similarity is also defined above, so it gets defined twice. How about making it an argument for _find_matches()?

Follow-up question: do you think it would make sense to expose it publicly for users through search() as well?
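A minimal sketch of what this suggestion could look like, with the threshold defined once as a default argument and forwarded from search() to _find_matches(). The class name SearchableDict, the list return value, and the per-word comparison with difflib.SequenceMatcher are assumptions made for illustration; they are not taken from the PR's code.

```python
import difflib


class SearchableDict(dict):
    """Hypothetical stand-in for the dictionary class in src/pybamm/util.py."""

    def _find_matches(self, search_key, known_keys, min_similarity: float = 0.4):
        # Keep keys that contain the search term and have at least one word
        # whose similarity to the term reaches min_similarity (assumed logic).
        matches = []
        for key in known_keys:
            key_lower = key.lower()
            if search_key not in key_lower:
                continue
            best = max(
                difflib.SequenceMatcher(None, search_key, word).ratio()
                for word in key_lower.split()
            )
            if best >= min_similarity:
                matches.append(key)
        return matches

    def search(self, keys, print_values: bool = False, min_similarity: float = 0.4):
        # The threshold is defined once, as a default, and passed through.
        if isinstance(keys, str):
            keys = [keys]
        search_keys = [k.strip().lower() for k in keys if k.strip()]
        known_keys = list(self.keys())

        results = []
        for search_key in search_keys:
            for key in self._find_matches(search_key, known_keys, min_similarity):
                if key not in results:
                    results.append(key)

        for key in results:
            print(f"{key}\t{self[key]}" if print_values else key)
        # Returning the list is a convenience for this sketch only.
        return results


params = SearchableDict(
    {"Electrolyte concentration": 1000.0, "Electrode conductivity": 100.0}
)
default_hits = params.search("conc")
strict_hits = params.search("conc", min_similarity=0.5)
```

With the default of 0.4 the short term conc still finds the concentration key, while a caller who wants fewer false positives can raise the threshold per call.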

medha-14 · 2025-01-12T05:06:02Z

> Thanks! I think min_similarity could be slightly higher here because the threshold here is too low and can lead to false positives in the search. But, at the same time, being able to resolve potential typos in the search query would require a lower threshold. This makes me feel that we could expose it with a sensible default, but I don't yet know what a sensible default would be. It should be higher than the current 40%, though. In-line comment about this below:

Sorry for the delayed response. To clarify, min_similarity only applies to substring matches, that is, when the search_key is found within one of the known keys. Typos and queries with no substring match are handled independently by difflib.get_close_matches(), which uses its own cutoff threshold. Raising min_similarity too far would filter out even relevant matches.
For example, with a threshold of 0.5, a search for conc would not match concentration: their similarity ratio is approximately 0.47, which falls below the threshold.
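The 0.47 figure can be reproduced with the standard library, assuming the ratio is computed with difflib.SequenceMatcher.ratio() on the search term and the single word concentration (the comment does not say exactly how the PR computes it). The second half shows difflib.get_close_matches() resolving a typo with its own cutoff; the candidate list there is made up for illustration.

```python
import difflib

# ratio() is 2 * M / T, where M is the number of matched characters and T is
# the combined length of both strings: 2 * 4 / (4 + 13) = 8 / 17.
ratio = difflib.SequenceMatcher(None, "conc", "concentration").ratio()
print(round(ratio, 2))  # 0.47

# A 0.5 threshold would reject this pair; the current 0.4 accepts it.
passes_at_half = ratio >= 0.5
passes_at_current = ratio >= 0.4

# Typos go through get_close_matches(), which applies its own cutoff
# (0.6 is the library default) and does not use min_similarity.
candidates = ["concentration", "conductivity", "temperature"]  # illustrative
close = difflib.get_close_matches("concentraton", candidates, n=3, cutoff=0.6)
print(close[0])  # concentration
```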

medha-14 added 2 commits January 3, 2025 16:16

modified method

5c19af6

added test

6710053

medha-14 requested a review from a team as a code owner January 3, 2025 11:10

added changelog

23a7855

kratman and others added 2 commits January 3, 2025 10:21

Merge branch 'develop' into search_improve

9691232

Merge branch 'develop' into search_improve

acd0fe4

Merge branch 'develop' into search_improve

977f962

Saransh-cpp requested a review from agriyakhetarpal January 7, 2025 13:53

agriyakhetarpal reviewed Jan 7, 2025

Merge branch 'develop' into search_improve

11cc9b0
