Query refactoring #561

elshize · 2023-12-30T01:41:59Z

Weights are now stored together with term IDs and resolved at construction time according to one of the policies. In our tools, we use the default policy that removes duplicates and sets the weight to the number of occurrences of the term in a query. Other policies are, for the time being, only available programmatically via the library API.
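To illustrate the default policy described above, here is a minimal sketch, not the actual PISA code. `WeightedTerm`, `resolve_by_occurrences`, and `resolve_keep_duplicates` are hypothetical names chosen for this example; the second function only assumes that a policy keeping repeated terms exists, as the `keep_duplicates` commit message in this PR suggests.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical type for illustration; not PISA's actual query term type.
struct WeightedTerm {
    std::uint32_t id;
    float weight;
};

// Sketch of the default policy: duplicates are collapsed and the weight of
// each term is the number of times it occurs in the query.
// Terms come out ordered by ID because std::map iterates in key order.
std::vector<WeightedTerm> resolve_by_occurrences(std::vector<std::uint32_t> const& term_ids) {
    std::map<std::uint32_t, float> counts;
    for (auto id : term_ids) {
        counts[id] += 1.0F;
    }
    std::vector<WeightedTerm> terms;
    terms.reserve(counts.size());
    for (auto const& [id, weight] : counts) {
        terms.push_back(WeightedTerm{id, weight});
    }
    return terms;
}

// Sketch of an alternative policy (assumed): every occurrence is kept as a
// separate term with weight 1, preserving the original order.
std::vector<WeightedTerm> resolve_keep_duplicates(std::vector<std::uint32_t> const& term_ids) {
    std::vector<WeightedTerm> terms;
    terms.reserve(term_ids.size());
    for (auto id : term_ids) {
        terms.push_back(WeightedTerm{id, 1.0F});
    }
    return terms;
}
```

For a query with term IDs `7 3 7 7`, the first function yields two terms (ID 3 with weight 1 and ID 7 with weight 3), while the second yields four terms of weight 1.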

Some legacy code used to parse and process queries has been removed in favor of the text analyzer and the new query parser.

Because weights are resolved when a query object is created, I also refactored creating the cursors: now the weight is simply taken from the query.
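A rough sketch of what this means for cursor construction, under stated assumptions: `WeightedTerm`, `ScoredCursor`, and `make_cursors` are hypothetical names, and scaling a base score by the query weight is an assumed scoring model, not necessarily how PISA applies weights. The point is only that the cursor factory no longer counts occurrences itself.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not PISA's actual API.
struct WeightedTerm {
    std::uint32_t id;
    float weight;
};

struct ScoredCursor {
    std::uint32_t term_id;
    float query_weight;

    // Assumed scoring model: the term's base score scaled by its query weight.
    float score(float base_score) const { return query_weight * base_score; }
};

// The weight is copied straight from the already-resolved query term;
// no deduplication or counting happens at cursor construction time.
std::vector<ScoredCursor> make_cursors(std::vector<WeightedTerm> const& query_terms) {
    std::vector<ScoredCursor> cursors;
    cursors.reserve(query_terms.size());
    for (auto const& term : query_terms) {
        cursors.push_back(ScoredCursor{term.id, term.weight});
    }
    return cursors;
}
```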

Fixes #501

codecov · 2023-12-30T02:38:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (994d101) 93.21% compared to head (30ee2a7) 93.23%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #561      +/-   ##
==========================================
+ Coverage   93.21%   93.23%   +0.02%     
==========================================
  Files          91       90       -1     
  Lines        4483     4452      -31     
==========================================
- Hits         4179     4151      -28     
+ Misses        304      301       -3


JMMackenzie

This looks great, nice work.

One quick thought: do we need to worry about the Thresholds tool/use here? I assume not, since thresholds are applied on a per-query basis, but I am just trying to think of any corner cases we might have missed. It might also be worth running some regression tests on this, since it's quite a sensitive (but very positive) change to the inner processing logic. I'd be happy to run some tests comparing to master in the coming week or two if that would be useful.

include/pisa/query.hpp

elshize · 2024-01-04T11:39:29Z

Right, I don't think thresholds should be any different than queries, but I'll give it another look.

Good idea about regression tests. There's a test Docker image that I started creating for that but haven't finished. Maybe it's a good idea to continue with it to make this easier to repeat in the future (or even automate).

elshize · 2024-01-15T02:21:52Z

I need to fix the conflicts, and after that, I'll run the Docker image that was just merged in the other PR to check for any regression.


elshize · 2024-01-15T13:32:02Z

@JMMackenzie The regression test was successful. Are you OK with merging it?

JMMackenzie · 2024-01-16T00:07:29Z

Great, let's merge!

elshize requested a review from JMMackenzie December 30, 2023 01:42

elshize force-pushed the term-weights branch 2 times, most recently from eda2e12 to 088a40f Compare December 30, 2023 02:18

elshize force-pushed the term-weights branch from 088a40f to 8778efe Compare December 31, 2023 13:39

elshize self-assigned this Dec 31, 2023

JMMackenzie approved these changes Jan 4, 2024


include/pisa/query.hpp (outdated, resolved)

JMMackenzie approved these changes Jan 15, 2024


elshize and others added 3 commits January 14, 2024 21:25

Typo fix

dab0c01

Add efficiency disclaimer to keep_duplicates

30ee2a7

elshize force-pushed the term-weights branch from e1262f9 to 30ee2a7 Compare January 15, 2024 02:25

elshize merged commit 2ba2753 into master Jan 16, 2024
10 checks passed

elshize deleted the term-weights branch January 16, 2024 00:16
