Better search #858

Merged Feb 3, 2025 (93 commits)
Commits
dc569a3
Bitap impl!!
lukebrody Jan 4, 2025
0d23a02
Crazy bitap algo
lukebrody Jan 4, 2025
f2cf5a7
Let’s try using a lib
lukebrody Jan 4, 2025
48ba580
Results are looking good
lukebrody Jan 4, 2025
9acf3d2
Write down the new plan
lukebrody Jan 5, 2025
430a777
Redo search, seems to be working well enough
lukebrody Jan 5, 2025
5fe042e
Let’s cache token matches
lukebrody Jan 5, 2025
3b1e0eb
Speed up cache somewhat
lukebrody Jan 5, 2025
c939eac
More optimization
lukebrody Jan 5, 2025
2b94085
Looking good!
lukebrody Jan 5, 2025
706a1d9
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 5, 2025
3abfc2e
handle historical cds
lukebrody Jan 5, 2025
9fb99b5
clean up
lukebrody Jan 5, 2025
f4e3b3b
Add memory tests
lukebrody Jan 5, 2025
78c41fd
Merge branch 'memory-tests' into better-search
lukebrody Jan 5, 2025
3782ce9
Use caching when running testcafe
lukebrody Jan 5, 2025
7ffb4a3
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 5, 2025
e6bee34
Merge branch 'testcafe-cache' into better-search
lukebrody Jan 5, 2025
514512a
Revert "Merge branch 'testcafe-cache' into better-search"
lukebrody Jan 9, 2025
6cc5e07
Merge branch 'master' of github.com:kavigupta/urbanstats into origin/…
lukebrody Jan 9, 2025
856cd2a
Discard the decompressed search index when the user is not using the …
lukebrody Jan 9, 2025
a0bcb91
lint
lukebrody Jan 9, 2025
f88de97
split up memory tests
lukebrody Jan 9, 2025
2d98d15
decrease article threshold
lukebrody Jan 9, 2025
972f543
going to add placement statistic
lukebrody Jan 11, 2025
7df1ddb
Add position hueristic for results
lukebrody Jan 11, 2025
8e8734f
changes
lukebrody Jan 11, 2025
d428d7a
log result explanation
lukebrody Jan 11, 2025
46b5bc5
do tokenization
lukebrody Jan 13, 2025
416b526
going to introduce new search algorithm
lukebrody Jan 14, 2025
5b98cc5
happy with results, now to work on performance
lukebrody Jan 14, 2025
4c9c1c1
improve performance
lukebrody Jan 14, 2025
2492a65
High impact optimization
lukebrody Jan 14, 2025
de53a8d
remove many marginal optimizations that added complexity
lukebrody Jan 14, 2025
db763f8
signatures for tokens
lukebrody Jan 14, 2025
7d50587
let's see if we can encode the existence of two letters
lukebrody Jan 14, 2025
bd68992
with bigints
lukebrody Jan 14, 2025
25afd95
quite happy with how search is working/performing
lukebrody Jan 14, 2025
d7e803c
nicer performance logging
lukebrody Jan 14, 2025
30a3066
Remove caching from protobufs
lukebrody Jan 16, 2025
dbf667b
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 16, 2025
2066c32
Add unit test runner and example test
lukebrody Jan 19, 2025
012bb77
Add run unit tests command
lukebrody Jan 19, 2025
5f2043f
Fix check dependency
lukebrody Jan 19, 2025
0c713a1
Make the test pass again
lukebrody Jan 19, 2025
0fd76cc
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 19, 2025
889a47b
Merge branch 'unit-testing' of https://github.com/kavigupta/urbanstat…
lukebrody Jan 19, 2025
d2b8e4e
Add search test
lukebrody Jan 19, 2025
75c327a
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 19, 2025
0f726b1
Doing weights, but need more information (e.g. incomplete tokens match)
lukebrody Jan 19, 2025
2f2350a
Add incomplete and swap matches
lukebrody Jan 20, 2025
43fb97a
Fix up tests
lukebrody Jan 20, 2025
5601133
Sum weights to 1
lukebrody Jan 20, 2025
182b7da
Remove `allowPartial` in bitap search
lukebrody Jan 20, 2025
6b08587
clean up performance logging
lukebrody Jan 20, 2025
883eac4
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 22, 2025
ea98e77
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Jan 22, 2025
368bd4a
Count full search index load time
lukebrody Jan 27, 2025
15854ee
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 1, 2025
1875412
async search index
lukebrody Feb 1, 2025
d983686
Revert "async search index"
lukebrody Feb 1, 2025
ac11d31
Implement worker and clean up
lukebrody Feb 1, 2025
dfdf670
Include iframes and workers in memory
lukebrody Feb 1, 2025
fbe0502
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 1, 2025
76a6ff3
fix memory meter
lukebrody Feb 1, 2025
12e7fcd
Redo memory util to include all targets
lukebrody Feb 1, 2025
306ef51
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 1, 2025
cb75f24
Use token pointers to save memory
lukebrody Feb 1, 2025
05a19f3
Merge branch 'betterMemoryTest' into better-search
lukebrody Feb 1, 2025
8642eaf
use queue for worker so message are ordered
lukebrody Feb 1, 2025
7bc4bb7
Use new memory monitor
lukebrody Feb 1, 2025
84eec68
Discard the raw index when we’re done with it
lukebrody Feb 1, 2025
8aa6675
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 1, 2025
2601e14
update memory tests
lukebrody Feb 1, 2025
e50f28e
Fix memory tests
lukebrody Feb 2, 2025
08ea10d
Fix comment accuracy
lukebrody Feb 2, 2025
641b4a5
Minor fixes
lukebrody Feb 2, 2025
18d5220
Refactor and fix tests involving search
lukebrody Feb 2, 2025
64e35b1
Merge branch 'betterSearchTests' into better-search
lukebrody Feb 2, 2025
390210c
update search screenshots
lukebrody Feb 2, 2025
55bdd06
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 2, 2025
b5e9d00
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 2, 2025
2c78727
Simplify search worker
lukebrody Feb 2, 2025
900c1f9
Fix focus range
lukebrody Feb 2, 2025
e372205
Fix search
lukebrody Feb 2, 2025
bd3ac63
update screenshot
lukebrody Feb 2, 2025
8aa1c96
Increase errors and add test
lukebrody Feb 2, 2025
442779f
add some tests
kavigupta Feb 3, 2025
54da5e6
some tests were bad
kavigupta Feb 3, 2025
cffddd5
Reference token objects directly, rather than with indices
lukebrody Feb 3, 2025
4e5f4dd
Merge branch 'better-search' of https://github.com/kavigupta/urbansta…
lukebrody Feb 3, 2025
94406cb
bump version
lukebrody Feb 3, 2025
4f4b7db
Merge branch 'master' of https://github.com/kavigupta/urbanstats into…
lukebrody Feb 3, 2025
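
Several commits above refer to a bitap search, but the bitap code itself is not part of the diff shown on this page. For orientation only, here is a minimal TypeScript sketch of the textbook exact-match (shift-and) form of bitap. The function name `bitapIndexOf` and the 31-character pattern limit are assumptions made for this illustration. Commit messages such as "Increase errors and add test" suggest the PR's version tolerates mismatches, which this sketch does not attempt.

```typescript
// Illustrative only: textbook exact-match bitap (shift-and), not the PR's code.
// Returns the index of the first occurrence of `pattern` in `text`, or -1.
function bitapIndexOf(text: string, pattern: string): number {
    const m = pattern.length
    if (m === 0) {
        return 0
    }
    if (m > 31) {
        // Assumption of this sketch: the state fits in a 32-bit integer.
        throw new Error('pattern too long for a 32-bit bitap state')
    }

    // masks.get(c) has bit i set when pattern[i] === c.
    const masks = new Map<string, number>()
    for (let i = 0; i < m; i++) {
        const c = pattern.charAt(i)
        masks.set(c, (masks.get(c) ?? 0) | (1 << i))
    }

    // Bit i of `state` is set when pattern[0..i] matches the text ending here.
    let state = 0
    const accept = 1 << (m - 1)
    for (let j = 0; j < text.length; j++) {
        state = ((state << 1) | 1) & (masks.get(text.charAt(j)) ?? 0)
        if ((state & accept) !== 0) {
            return j - m + 1
        }
    }
    return -1
}
```

The appeal of the technique is that each text character costs one shift, one OR, and one AND, regardless of pattern length, as long as the pattern fits in a machine word.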
1 change: 1 addition & 0 deletions react/eslint.config.mjs
@@ -134,6 +134,7 @@ export default tseslint.config(
                 ignore: ['eslint-enable']
             }],
             "@typescript-eslint/method-signature-style": ["error", "property"], // https://www.totaltypescript.com/method-shorthand-syntax-considered-harmful
+            '@typescript-eslint/no-inferrable-types': 'off',
         },
     },
     {
138 changes: 48 additions & 90 deletions react/src/components/search.tsx
@@ -1,11 +1,10 @@
-import React, { CSSProperties, ReactNode, useEffect, useMemo, useRef, useState } from 'react'
+import React, { CSSProperties, ReactNode, useEffect, useRef, useState } from 'react'

-import { loadProtobuf } from '../load_json'
 import { Navigator } from '../navigation/Navigator'
 import { useColors } from '../page_template/colors'
 import { useSetting } from '../page_template/settings'
-import { isHistoricalCD } from '../utils/is_historical'
 import '../common.css'
+import { SearchParams } from '../search'

 export function SearchBox(props: {
     onChange?: (inp: string) => void
@@ -21,18 +20,17 @@ export function SearchBox(props: {

     // Keep these in sync
     const [query, setQuery] = useState('')
-    const normalizedQuery = useRef('')
+    const queryRef = useRef('')

     const [focused, setFocused] = React.useState(0)

-    const searchQuery = normalizedQuery.current
-    const firstCharacter = searchQuery.length === 0 ? undefined : searchQuery[0]
+    const searchQuery = queryRef.current

-    const indexCache = useMemo(() => firstCharacter === undefined ? undefined : loadProtobuf(`/index/pages_${firstCharacter}.gz`, 'SearchIndex'), [firstCharacter])
+    const searchWorker = useRef<SearchWorker | undefined>()

     const reset = (): void => {
         setQuery('')
-        normalizedQuery.current = ''
+        queryRef.current = ''
         setMatches([])
         setFocused(0)
     }
@@ -70,36 +68,24 @@ export function SearchBox(props: {

     // Do the search
     useEffect(() => {
-        if (indexCache === undefined) {
-            // Occurs when query is empty
-            setMatches([])
-            setFocused(0)
-            return
-        }
-        void indexCache.then(({ elements, priorities }) => {
-            // we can skip searching if the query has changed since we were waiting on the indexCache
-            if (normalizedQuery.current !== searchQuery) {
+        void (async () => {
+            if (searchQuery === '') {
+                setMatches([])
+                setFocused(0)
                 return
             }
-
-            let matchesNew = []
-            for (let i = 0; i < elements.length; i++) {
-                const matchCount = isAMatch(searchQuery, normalize(elements[i]))
-                if (matchCount === 0) {
-                    continue
-                }
-                if (!showHistoricalCDs) {
-                    if (isHistoricalCD(elements[i])) {
-                        continue
-                    }
-                }
-                matchesNew.push([matchCount, i, matchCount - priorities[i] / 10])
+            if (searchWorker.current === undefined) {
+                searchWorker.current = createSearchWorker()
             }
-            matchesNew = top10(matchesNew)
-            matchesNew = matchesNew.map(idx => elements[idx])
-            setMatches(matchesNew)
-        })
-    }, [searchQuery, indexCache, showHistoricalCDs])
+            const result = await searchWorker.current({ unnormalizedPattern: searchQuery, maxResults: 10, showHistoricalCDs })
+            // we should throw away the result if the query has changed since we submitted the search
+            if (queryRef.current !== searchQuery) {
+                return
+            }
+            setMatches(result)
+            setFocused(f => Math.max(0, Math.min(f, result.length - 1)))
+        })()
+    }, [searchQuery, showHistoricalCDs, searchWorker])

     return (
         <form
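
The effect in this hunk discards a worker result when `queryRef.current` no longer equals the query that was submitted, so a slow answer for an old query cannot overwrite newer results. A minimal sketch of that guard in isolation, assuming a hypothetical `createQueryTracker` helper that stands in for the ref and the `setMatches` call (neither the helper nor its method names appear in the PR):

```typescript
// Hypothetical stand-in for `queryRef` plus `setMatches`; names are illustrative.
function createQueryTracker<T>(apply: (result: T) => void) {
    let latestQuery = ''
    return {
        // Call on every keystroke, mirroring `queryRef.current = e.target.value`.
        update(query: string): void {
            latestQuery = query
        },
        // Call when a search finishes. The result is applied only if the query
        // it was computed for is still the latest one; otherwise it is dropped.
        accept(submittedQuery: string, result: T): boolean {
            if (latestQuery !== submittedQuery) {
                return false
            }
            apply(result)
            return true
        },
    }
}
```

For example, after `update('san')` followed by `update('san f')`, a late result for `'san'` is rejected while the result for `'san f'` is applied.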
@@ -113,14 +99,23 @@ export function SearchBox(props: {
                 type="text"
                 className="serif"
                 style={{
-                    ...props.style }}
+                    ...props.style,
+                }}
                 placeholder={props.placeholder}
                 onKeyUp={onTextBoxKeyUp}
                 onChange={(e) => {
                     setQuery(e.target.value)
-                    normalizedQuery.current = normalize(e.target.value)
+                    queryRef.current = e.target.value
                 }}
                 value={query}
+                onFocus={() => {
+                    if (searchWorker.current === undefined) {
+                        searchWorker.current = createSearchWorker()
+                    }
+                }}
+                onBlur={() => {
+                    searchWorker.current = undefined
+                }}
             />

             <div
@@ -171,59 +166,22 @@ export function SearchBox(props: {
     )
 }

-function top10(matches: number[][]): number[] {
-    const numPrioritized = 3
-    const sortKey = (idx: number) => {
-        return (a: number[], b: number[]) => {
-            if (a[idx] !== b[idx]) {
-                return b[idx] - a[idx]
-            }
-            return a[1] - b[1]
-        }
-    }
-    matches.sort(sortKey(2))
-    const overallMatches = []
-    for (let i = 0; i < Math.min(numPrioritized, matches.length); i++) {
-        overallMatches.push(matches[i][1])
-        matches[i][0] = -100
-    }
-    matches.sort(sortKey(0))
-    for (let i = 0; i < Math.min(10 - numPrioritized, matches.length); i++) {
-        if (matches[i][0] === -100) {
-            break
-        }
-        overallMatches.push(matches[i][1])
-    }
-    return overallMatches
-}
+const workerTerminatorRegistry = new FinalizationRegistry<Worker>((worker) => { worker.terminate() })

-/*
-Check whether a is a substring of b (does not have to be contiguous)
-
-*/
-function isAMatch(a: string, b: string): number {
-    let i = 0
-    let matchCount = 0
-    let prevMatch = true
-    // eslint-disable-next-line @typescript-eslint/prefer-for-of -- b is a string
-    for (let j = 0; j < b.length; j++) {
-        if (a[i] === b[j]) {
-            i++
-            if (prevMatch) {
-                matchCount++
-            }
-            prevMatch = true
-        }
-        else {
-            prevMatch = false
-        }
-        if (i === a.length) {
-            return matchCount + 1
-        }
-    }
-    return 0
-}
+type SearchWorker = (params: SearchParams) => Promise<string[]>

-function normalize(a: string): string {
-    return a.toLowerCase().normalize('NFD').replace(/[\u0300-\u036f]/g, '')
+function createSearchWorker(): SearchWorker {
+    const worker = new Worker(new URL('../searchWorker', import.meta.url))
+    const messageQueue: ((results: string[]) => void)[] = []
+    worker.addEventListener('message', (message: MessageEvent<string[]>) => {
+        messageQueue.shift()!(message.data)
+    })
+    const result: SearchWorker = (params) => {
+        worker.postMessage(params)
+        return new Promise((resolve) => {
+            messageQueue.push(resolve)
+        })
+    }
+    workerTerminatorRegistry.register(result, worker)
+    return result
 }
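
`createSearchWorker` pairs requests with responses purely by order: each incoming `message` event resolves the oldest promise in `messageQueue`. That only works if the worker answers in the order requests were posted, which the commit "use queue for worker so message are ordered" suggests is the intent. A minimal sketch of that first-in, first-out pairing on its own, using a hypothetical `ResponseQueue` class that is not part of the PR:

```typescript
// Hypothetical helper (not from the PR): pairs each posted request with the
// next response to arrive, assuming responses come back in request order.
class ResponseQueue<T> {
    private readonly resolvers: ((value: T) => void)[] = []

    // Register interest in the next unclaimed response.
    // The Promise executor runs synchronously, so the resolver is queued
    // before this method returns.
    nextResponse(): Promise<T> {
        return new Promise<T>((resolve) => {
            this.resolvers.push(resolve)
        })
    }

    // Hand an incoming response to the oldest waiting request.
    // Returns false when nothing was waiting, instead of throwing.
    deliver(value: T): boolean {
        const resolve = this.resolvers.shift()
        if (resolve === undefined) {
            return false
        }
        resolve(value)
        return true
    }

    // Number of requests still waiting for a response.
    get pending(): number {
        return this.resolvers.length
    }
}
```

Unlike the `messageQueue.shift()!` call in the diff, `deliver` reports an unexpected message rather than asserting that a resolver exists. Whether that difference matters depends on whether the worker can ever send a message that no request is waiting for.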
2 changes: 1 addition & 1 deletion react/src/page_template/template.tsx
@@ -138,7 +138,7 @@ function TemplateFooter(): ReactNode {
 }

 function Version(): ReactNode {
-    return <span id="current-version">23.6.3</span>
+    return <span id="current-version">23.7.0</span>
 }

 function LastUpdated(): ReactNode {