Query refactoring #561

Merged
merged 3 commits into from
Jan 16, 2024

Member

@elshize elshize commented Dec 30, 2023

Weights are now stored together with term IDs and resolved at construction time according to one of several policies. In our tools, we use the default policy, which removes duplicate terms and sets each term's weight to the number of times it occurs in the query. Other policies are, for the time being, only available programmatically via the library API.
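
As a rough illustration of the default policy described above, here is a minimal sketch. All names (`WeightedTerm`, `WeightPolicy`, `resolve_terms`) are hypothetical and are not taken from the PISA API. The `Unit` policy is only an assumed example of what an alternative policy could look like.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration only; not the actual PISA API.
using TermId = std::uint32_t;

struct WeightedTerm {
    TermId id;
    float weight;
};

enum class WeightPolicy {
    Sum,   // collapse duplicates; weight = number of occurrences (the default)
    Unit,  // assumed alternative: collapse duplicates; weight = 1
};

// Resolves weights once, at query construction time.
// The result is ordered by term ID because std::map keeps its keys sorted.
std::vector<WeightedTerm> resolve_terms(
    std::vector<TermId> const& term_ids, WeightPolicy policy = WeightPolicy::Sum)
{
    std::map<TermId, float> counts;
    for (auto id: term_ids) {
        counts[id] += 1.0F;  // value-initialized to 0 on first access
    }
    std::vector<WeightedTerm> terms;
    terms.reserve(counts.size());
    for (auto const& [id, count]: counts) {
        terms.push_back({id, policy == WeightPolicy::Sum ? count : 1.0F});
    }
    // Deduplication can only shrink the term list.
    assert(terms.size() <= term_ids.size());
    return terms;
}
```

With this sketch, a query whose term IDs are `7, 3, 7, 7` resolves to two terms: ID 3 with weight 1 and ID 7 with weight 3.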

Some legacy code used to parse and process queries has been removed in favor of the text analyzer and the new query parser.

Because weights are resolved when a query object is created, I also refactored cursor creation: each cursor now simply takes its weight from the query.
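
The idea of cursors taking their weight straight from the query can be sketched as follows. This is an assumption-laden illustration: `Query`, `ScoredCursor`, and `make_cursors` are hypothetical stand-ins, and real cursors would also wrap a posting list and a scorer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the actual PISA API.
struct WeightedTerm {
    std::uint32_t id;
    float weight;
};

// A query whose weights were already resolved when it was constructed.
struct Query {
    std::vector<WeightedTerm> terms;
};

// Stand-in for a scored cursor: it only remembers the term and its weight.
struct ScoredCursor {
    std::uint32_t term_id;
    float query_weight;

    // Scales a per-posting term score by the query-side weight.
    float score(float term_score) const { return query_weight * term_score; }
};

// No counting or deduplication happens here: the weight is copied
// directly from the query, one cursor per resolved term.
std::vector<ScoredCursor> make_cursors(Query const& query)
{
    std::vector<ScoredCursor> cursors;
    cursors.reserve(query.terms.size());
    for (auto const& term: query.terms) {
        cursors.push_back({term.id, term.weight});
    }
    assert(cursors.size() == query.terms.size());
    return cursors;
}
```

The point of the design is that cursor construction no longer needs to know which weighting policy was used; it only reads the already-resolved weight.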

Fixes #501

@elshize elshize requested a review from JMMackenzie December 30, 2023 01:42
@elshize elshize force-pushed the term-weights branch 2 times, most recently from eda2e12 to 088a40f Compare December 30, 2023 02:18

codecov bot commented Dec 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (994d101) 93.21% compared to head (30ee2a7) 93.23%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #561      +/-   ##
==========================================
+ Coverage   93.21%   93.23%   +0.02%     
==========================================
  Files          91       90       -1     
  Lines        4483     4452      -31     
==========================================
- Hits         4179     4151      -28     
+ Misses        304      301       -3     


@elshize elshize self-assigned this Dec 31, 2023
Member

@JMMackenzie JMMackenzie left a comment

This looks great, nice work.

One quick thought: do we need to worry about the Thresholds tool/use here? I assume not, since thresholds are applied on a per-query basis, but I am just trying to think of any corner cases we might have missed. It might also be worth running some regression tests on this, since it's quite a sensitive (but very positive) change to the inner processing logic. I'd be happy to run some tests comparing to master in the coming week or two if that would be useful.

Review comment on include/pisa/query.hpp (outdated, resolved)
Member Author

elshize commented Jan 4, 2024

Right, I don't think thresholds should be any different than queries, but I'll give it another look.

Good idea about regression tests. There's a test Docker image that I created for that but haven't finished. Maybe it's a good idea to continue with it to make the tests easier to repeat in the future (or even automate).

Member Author

elshize commented Jan 15, 2024

I need to fix the conflicts, and after that, I'll run the Docker image that was just merged in the other PR to check whether there's any regression.

elshize and others added 3 commits January 14, 2024 21:25
Member Author

elshize commented Jan 15, 2024

@JMMackenzie The regression test was successful. Are you OK with merging it?

@JMMackenzie
Member

Great, let's merge!

@elshize elshize merged commit 2ba2753 into master Jan 16, 2024
10 checks passed
@elshize elshize deleted the term-weights branch January 16, 2024 00:16
Development

Successfully merging this pull request may close these issues.

Query::term_weights is not assigned