-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: full text search for pubs #919
Conversation
… data while fetching
like, really slightly
useEffect(() => { | ||
const down = (e: KeyboardEvent) => { | ||
if (e.key === "k" && (e.metaKey || e.ctrlKey)) { | ||
e.preventDefault(); | ||
onOpenChange?.(true); | ||
setOpen(true); | ||
} | ||
}; | ||
|
||
document.addEventListener("keydown", down); | ||
return () => document.removeEventListener("keydown", down); | ||
}, [onOpenChange]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be done more elegantly, we should probably have some global thing for this, but for now this is fine i think
<CommandEmpty className="flex w-full flex-col items-center p-6"> | ||
{isFetching ? ( | ||
<Loader2 className="h-4 w-4 animate-spin" /> | ||
) : isError ? ( | ||
<AlertCircle className="h-4 w-4" /> | ||
) : ( | ||
"No results found" | ||
)} | ||
</CommandEmpty> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not very elegant, lots of states not well represented
@@ -5,7 +5,7 @@ import type { GetPubsResult } from "./server"; | |||
export type PubTitleProps = { | |||
title?: string | null; | |||
createdAt: Date; | |||
values: | |||
values?: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need the values for this technically
Title: "Machine Learning Fundamentals", | ||
Description: "An introduction to basic ML concepts", | ||
Content: "This article covers neural networks and deep learning basics.", | ||
Tags: "AI, ML, Technology", | ||
}, | ||
}, | ||
{ | ||
pubType: "Basic Pub", | ||
values: { | ||
Title: "Deep Learning Applications", | ||
Description: "Advanced applications of deep learning", | ||
Content: "Exploring real-world uses of neural networks in computer vision.", | ||
Tags: "AI, Deep Learning, CV", | ||
}, | ||
}, | ||
{ | ||
pubType: "Basic Pub", | ||
values: { | ||
Title: "Web Development Guide", | ||
Description: "Complete guide to modern web development", | ||
Content: "Learn about React, TypeScript, and Node.js fundamentals.", | ||
Tags: "Web, JavaScript, Programming", | ||
}, | ||
}, | ||
{ | ||
pubType: "Basic Pub", | ||
values: { | ||
Title: "Database Design Patterns", | ||
Description: "Essential patterns for database architecture", | ||
Content: "Understanding SQL and NoSQL database design principles.", | ||
Tags: "Database, SQL, Architecture", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely wrote these myself
ALTER TABLE "pubs" ADD COLUMN "searchVector" tsvector; | ||
|
||
-- CreateIndex | ||
CREATE INDEX "pubs_searchVector_idx" ON "pubs" USING GIN ("searchVector"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this adds the column and the idx (note: we should add more indices)
WITH updates AS ( | ||
SELECT | ||
affected."pubId", | ||
CASE | ||
WHEN tmp."pubId" IS NULL THEN pubs."title" | ||
WHEN TG_OP = 'DELETE' OR tmp."value" IS NULL THEN NULL | ||
ELSE tmp."value" | ||
END AS new_title | ||
FROM ( | ||
SELECT DISTINCT "pubId" | ||
FROM inserted_updated_deleted_rows | ||
) AS affected | ||
LEFT JOIN tmp_affected_pubs tmp ON tmp."pubId" = affected."pubId" | ||
JOIN pubs ON pubs.id = affected."pubId" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just slightly rewriting this as a CTE, st i can use that title in the search vector as well
platform/core/prisma/migrations/20241219131700_fix_title_update_trigger/migration.sql
Lines 46 to 67 in 78aab11
UPDATE | |
"pubs" | |
SET | |
"updatedAt" = CURRENT_TIMESTAMP, | |
"title" = CASE | |
-- this is what's changed from the earlier function | |
WHEN tmp."pubId" IS NULL THEN | |
pubs."title" | |
WHEN TG_OP = 'DELETE' OR tmp."value" IS NULL THEN | |
NULL | |
ELSE | |
tmp."value" | |
END | |
FROM ( SELECT DISTINCT | |
"pubId" | |
FROM | |
inserted_updated_deleted_rows | |
) AS affected | |
LEFT JOIN tmp_affected_pubs tmp ON tmp."pubId" = affected."pubId" | |
WHERE | |
"pubs"."id" = affected."pubId"; | |
RETURN NULL; |
UPDATE "pubs" | ||
SET | ||
"updatedAt" = CURRENT_TIMESTAMP, | ||
"title" = updates.new_title, | ||
-- we weight the searchVector based on the title and its values | ||
"searchVector" = ( | ||
SELECT | ||
setweight(to_tsvector('english', COALESCE(updates.new_title, '')), 'A') || | ||
setweight(to_tsvector('english', COALESCE( | ||
(SELECT string_agg(CAST(value #>> '{}' AS TEXT), ' ') | ||
FROM pub_values | ||
WHERE "pubId" = updates."pubId"), | ||
'')), 'B') | ||
) | ||
FROM updates | ||
WHERE "pubs"."id" = updates."pubId"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this, very naively, just sets the search vector as
(with A priority) title
(with B priority) `${pub.Values.join(' ')}`
we could do this more sophisticatedly, ignore all the values we don't want to look up (maybe only add stringlike fields), join with values we do want to look up (add the user info instead of the memberId for instance), or give certain fields different priorities (rich text fields should probably have different priorities in the future)
but for now, i think this is fiiine
"flex h-full w-full flex-col overflow-hidden rounded-md bg-white text-gray-950", | ||
"flex h-full w-full flex-col overflow-hidden bg-white text-gray-950", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting: for some reason this "rounded" caused the whole command list to be blurry
<DialogTitle className={cn(showTitle || "sr-only")}>{title}</DialogTitle> | ||
<DialogDescription className={cn(showDescription || "sr-only")}> | ||
{description} | ||
</DialogDescription> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are necessary for accessibilty, otherwise you get a bunch of errors in the console/in dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels great to use. Thanks for all the code comments and great job on those tests too!
Issue(s) Resolved
High-level Explanation of PR
This PR adds, in order
searchVector
TSVECTOR
column on the pubs databasets_query
Rough explanation of postgres FTS
I'd recommend this excellent video of the great Ben Awad, will do a much better job of explaining how it works than I or the postgres docs can: https://www.youtube.com/watch?v=szfUbzsKvtE&t=362s
But here's a shot at it:
Postgres can create a vectorized representation of a bunch of stuff with it's
to_tsvector
function/TSVECTOR
column.It will basically create small, weighted "stubs" of words, which are easy to compare. eg it will unconjugate verbs and unpluralize nouns, so
running->run:
andducks->duck:
, roughly. You can then assign weights to certain parts of those vectors, which helps in ranking.Then postgres does some magic and you can search it with this syntax
very weird syntax, but it works really well.
Doing
to_tsvector
every query is very slow though, so I created a column on thepubs
table calledsearchVector
, and added an index and a trigger that updates it. This makes like a 500ms -> 1ms diff in perf.Then I made some search thingy using shadcn's Command thing, which uses it.
That's basically it! Much easier than I thought it would be.
Downsides
Absolutely no permission filtering!
Should definitely be done!
But even if we filter permissions, we won't be find related pubs the user has permissions to through the parent pub!
Maybe kind of slow in the long term, but the query can definitely be optimized a bunch.
No typed return responses (should do that)
Test Plan
Go to arcadia
CMD+K
Look up something latin-y, like
vilis
.See a lot of search results!
Navigate to search result on keyboard
Enter
Be taken to pub!
Modify a pub, ideally it's title
Check you are able to find it with this new title
Screenshots (if applicable)
Screenshare.-.2025-02-03.7_00_20.PM.mp4
Screenshare.-.2025-02-03.7_05_33.PM.mp4
Notes