-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Index tags #10441
base: main
Are you sure you want to change the base?
[core] Index tags #10441
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋
We have an invariant that data_sources_nodes in ES and in core are in sync
I think we should keep this invariant, so we should move tags inside nodes. It will then, for free, be indexed by the regular process
IMO would make all the code a lot simpler too, remove some duplication, etc.
I agree but I remember we had a discussion a month ago about putting tags in node and we decided not to (not sure why exactly) ? |
IMO because at the time it was not needed; Tags as filter just recently came around in go/product |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 sorry for the delay, thanks for moving tags to nodes :)
left a few other coms
@@ -45,6 +45,19 @@ | |||
"provider_visibility": { | |||
"type": "keyword", | |||
"index": false | |||
}, | |||
"tags": { | |||
"type": "text", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we'd need to understand if that's the right way of indexing
Will you never sort by tags, will you sometimes need exact tag match, will you sometimes filter rather than rank, etc.
Is there doc somewhere on the type of query that should be supported long term?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there's exact match (so "keyword") and prefix; for prefix I think keyword is fine too although not sure
I see in the design doc there are questions about fuzzy search. Was this sorted out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to @flvndvd keywords cannot handle prefix directly, we need ngram (so edge) for that. Question about fuzzy search is not clear to me, I don't see the point of fuzzy searching on tags, but I've added the mapping in case we need it.
&self, | ||
query: Option<String>, | ||
query_type: Option<TagsQueryType>, | ||
data_sources: Option<Vec<String>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not reuse the datasourceviewfilter?
providing proper DSV filters is how we ensure permissionning in core, so I would stick to it if possible
@@ -0,0 +1,231 @@ | |||
use bb8::Pool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scripts in bin/ are meant to be frequently reused; temporary backfill scripts should go in migrations
does this one really belong here in your opinion?
if it does, could it become part of backfill_nodes? it'd be better if there are not too many different scripts there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's temporary, i'll move it to migrations
fixes: https://github.com/dust-tt/tasks/issues/2047
fixes: https://github.com/dust-tt/tasks/issues/2046
Description
Update schema for tags in ES. Adds a new tags field with keyword, text/standard and text/edge_analyzer .
Update tags when upserting table or document.
Add a backfill for existing docs/tables.
Adds endpoint to query tags
Tests
run simple query to list tags values :
For the same query with core endpoint :
Risk
Deploy Plan
run migration :
deploy core
run backfills