Query::term_weights is not assigned #501

Closed
elshize opened this issue Dec 24, 2022 · 11 comments · Fixed by #561

@elshize
Member

elshize commented Dec 24, 2022

Query::term_weights seems to never be assigned. The --weighted flag in the queries and evaluate_queries programs ignores it; instead, the weights are resolved when creating cursors.

@seanmacavaney

I noticed this recently when working on the Python integration. Fixing this would simplify the integration since we wouldn't need to resort to term repetition to apply the weighting.
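For context, the term-repetition workaround can be sketched roughly like this (a hypothetical helper, not the actual integration code): an integer weight is emulated by repeating the term that many times in the query's term list.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the workaround: since term_weights is ignored,
// an integer weight w is emulated by listing the term w times.
std::vector<std::string> repeat_for_weight(
    const std::vector<std::pair<std::string, int>>& weighted)
{
    std::vector<std::string> terms;
    for (const auto& [term, weight] : weighted) {
        for (int i = 0; i < weight; ++i) {
            terms.push_back(term);
        }
    }
    return terms;
}
```

This only supports integer weights, which is part of why honouring term_weights directly would simplify the integration.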

@elshize
Member Author

elshize commented Dec 31, 2022

@seanmacavaney thanks for letting us know. Just to be clear: by fixing, you mean that term_weights should be used and should affect query processing, right?

I have some other stuff open now that I want to merge first, but I can look closer into this after that.

@seanmacavaney

Yes. I expected that term_weights would be honoured during query processing, but they appear to only be used in intersection.

The issue isn't urgent though. Query term repetition is good enough for the time being.

@elshize
Member Author

elshize commented Dec 31, 2022

> I expected that term_weights would be honoured during query processing

Yeah, I was also surprised :)

@JMMackenzie
Member

I always figured we had term_weights for future-proofing, in case we wanted to support query inputs like hello:20 world:10 or something along those lines.

Would it be more or less painful to have a vector of pairs/structs? Having separate vectors accessed by index seems a bit tedious and error-prone, at least in my opinion.

Thinking something like:

struct query_term {
  uint32_t term_id;
  double weight;
  ...
};
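A minimal sketch of how such a struct could pair with the hello:20 world:10 input idea above; the parser name is hypothetical, and it keeps string tokens rather than term IDs (the lexicon lookup would happen elsewhere). A bare token defaults to weight 1.0.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: a query term pairing a token with its weight.
// In PISA the token would become a uint32_t term_id after lexicon lookup.
struct query_term {
    std::string term;
    double weight;
};

// Parse input like "hello:20 world:10 plain"; a token without a
// ":<number>" suffix gets the default weight of 1.0.
std::vector<query_term> parse_weighted_query(const std::string& input)
{
    std::vector<query_term> terms;
    std::istringstream in(input);
    std::string token;
    while (in >> token) {
        auto colon = token.rfind(':');
        if (colon != std::string::npos && colon + 1 < token.size()) {
            terms.push_back(
                {token.substr(0, colon), std::stod(token.substr(colon + 1))});
        } else {
            terms.push_back({token, 1.0});
        }
    }
    return terms;
}
```

With a single vector of structs, the term and its weight travel together, avoiding the index-alignment hazards of parallel vectors.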

@elshize
Member Author

elshize commented Jan 26, 2023

> Would it be more or less painful to have a vector of pairs/structs?

Possibly. This sounds like a good idea, but I would have to look at the code again to see if there's anything preventing (or discouraging) it.

@elshize elshize self-assigned this Dec 22, 2023
@seanmacavaney

Amazing -- thanks for the fix @elshize!

@elshize
Member Author

elshize commented Jan 16, 2024

@seanmacavaney please note that this is quite a rewrite of query parsing/handling. I'm not sure how much it will affect your Python binding once you upgrade.

We would love to provide better stability in our APIs, but I'm currently actively trying to improve multiple parts of the library, so unfortunately it will get worse before it gets better.

If at any point you have any questions or issues, I'd be more than happy to help with any future upgrades.

@seanmacavaney

Thanks for letting me know. It sounds like it's best to hold off on any changes to the Python integration until the API stabilises a bit.

Are there some specific GitHub issues that you recommend I subscribe to in order to keep an eye on this progress?

@elshize
Member Author

elshize commented Jan 18, 2024

> Are there some specific GitHub issues that you recommend I subscribe to in order to keep an eye on this progress?

Not really, but it may be a good idea to open a tracking issue. Let me think briefly about how best to organize it, and I'll let you know.

@elshize
Member Author

elshize commented Jan 18, 2024

@seanmacavaney I created an issue; there's not much there yet, but you can subscribe to it to get updates: #569
