Add a highlighter #205

Open
valencik opened this issue Mar 30, 2024 · 2 comments


@valencik
Contributor

It's important to show users their query in the context of the resulting documents.
Consider the example below, where the terms cats, effect, and effects are bolded in the search results:

[Screenshot, 2024-03-30: DuckDuckGo results for the query "cats-effect", with the matching terms bolded]
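The bolding in the screenshot can be approximated by wrapping every word of a result that matches a query term. This is only a minimal sketch: `highlight` and `normalize` are hypothetical names, and the trailing-"s" stripping is a toy stand-in for whatever analyzer (stemmer) the engine actually applies. It is what lets "effects" match the query term "effect" here.

```python
import re


def normalize(word: str) -> str:
    # Toy normalization (an assumption, not a real stemmer): lowercase and
    # drop a trailing "s" so that "effects" matches the query term "effect".
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") else w


def highlight(text: str, query: str) -> str:
    """Wrap each word of `text` whose normalized form is a query term in <b>...</b>."""
    query_terms = {normalize(t) for t in re.findall(r"\w+", query)}

    def bold_if_match(m: re.Match) -> str:
        word = m.group(0)
        return f"<b>{word}</b>" if normalize(word) in query_terms else word

    # Replace word by word, leaving punctuation and spacing untouched.
    return re.sub(r"\w+", bold_if_match, text)


print(highlight("Cats Effect is a runtime; effects are values.", "cats-effect"))
# <b>Cats</b> <b>Effect</b> is a runtime; <b>effects</b> are values.
```

Highlighting this way only marks terms; it does not decide *which* part of a long document to show, which is the harder half of the problem.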

The design space for a highlighter is reasonably large; Lucene alone has several implementations.
I'm hoping we can get something basic working without too much trouble.

@valencik
Contributor Author

Collecting some rough thoughts here for a first attempt.

for each doc in docs
  for each fragment in doc
    score query against fragment
    update max scoring fragment for doc
  format max scoring fragment
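A runnable version of the loop above, kept close to the pseudocode so each line maps to a step. It is a sketch under stated assumptions: documents arrive already split into fragments, `score` is a hypothetical stand-in that counts query-term occurrences (a real implementation would presumably reuse the engine's own scoring), and "format" means bolding the query terms.

```python
import re
from typing import Dict, List, Optional


def tokens(text: str) -> List[str]:
    return [t.lower() for t in re.findall(r"\w+", text)]


def score(query: str, fragment: str) -> int:
    # Stand-in scorer (assumption): number of fragment tokens that are query terms.
    terms = set(tokens(query))
    return sum(1 for t in tokens(fragment) if t in terms)


def format_fragment(query: str, fragment: str) -> str:
    # Bold every word that is (case-insensitively) a query term.
    terms = set(tokens(query))
    return re.sub(
        r"\w+",
        lambda m: f"<b>{m.group(0)}</b>" if m.group(0).lower() in terms else m.group(0),
        fragment,
    )


def best_fragments(query: str, docs: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each doc id to its highest-scoring fragment, formatted for display."""
    results: Dict[str, str] = {}
    for doc_id, fragments in docs.items():           # for each doc in docs
        best: Optional[str] = None
        best_score = -1
        for fragment in fragments:                    # for each fragment in doc
            s = score(query, fragment)                # score query against fragment
            if s > best_score:                        # update max scoring fragment for doc
                best, best_score = fragment, s
        if best is not None:                          # format max scoring fragment
            results[doc_id] = format_fragment(query, best)
    return results
```

Because the comparison is strict, ties keep the earliest fragment, and a document with no matching fragment falls back to its first one, which is often a reasonable default snippet.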

What the heck is a fragment? Good question.
Ideally it's a small enough snippet of document content that you can comfortably render it on your search engine results page.
It could be a sentence, a paragraph, or perhaps a whole section.
Clearly this would need to be configurable, as it depends a lot on your document structure.
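One way to make the fragment unit configurable is a small table of splitting strategies chosen by name. The splitters below are deliberately naive and hypothetical: sentences end at `.`, `!` or `?`, paragraphs are separated by blank lines, and sections are assumed to start with Markdown-style `#` headings.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_paragraphs(text: str) -> List[str]:
    # Blank lines separate paragraphs.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sections(text: str) -> List[str]:
    # Assumption: a new section begins at a line starting with a Markdown heading.
    return [s.strip() for s in re.split(r"\n(?=#{1,6} )", text) if s.strip()]


FRAGMENTERS: Dict[str, Callable[[str], List[str]]] = {
    "sentence": split_sentences,
    "paragraph": split_paragraphs,
    "section": split_sections,
}


def fragment(text: str, mode: str = "sentence") -> List[str]:
    """Split `text` into fragments using the configured strategy."""
    return FRAGMENTERS[mode](text)
```

Keeping the strategy as a plain function makes it easy to plug in a smarter sentence splitter later without touching the highlighting loop.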

Hopefully we can reuse a lot of existing pieces here.
For example, if we can get fragments for each doc then we can index the fragments as if they were documents, query that new fragment index, and take the top result.
Can we prepare some of this ahead of time? If we record the fragment boundaries at indexing time, perhaps we wouldn't need to create a new fragment index during the highlighting stage.
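If fragment boundaries are recorded at indexing time, the highlighting stage only needs to slice and score spans of the stored text rather than build a separate fragment index. A minimal sketch, assuming sentence fragments stored as `(start, end)` character offsets; `StoredDoc`, `index_doc`, and `best_fragment` are hypothetical names, and the term-count score is a placeholder for a real scorer.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sentence pattern: starts at a non-space, non-terminator character
# and runs up to (and including) the next ., ! or ?.
SENTENCE = re.compile(r"[^\s.!?][^.!?]*[.!?]?")


@dataclass
class StoredDoc:
    text: str
    # (start, end) character offsets of each fragment, recorded at indexing time.
    boundaries: List[Tuple[int, int]] = field(default_factory=list)


def index_doc(text: str) -> StoredDoc:
    # Done once per document when indexing, not at query time.
    return StoredDoc(text, [(m.start(), m.end()) for m in SENTENCE.finditer(text)])


def best_fragment(doc: StoredDoc, query: str) -> str:
    """Slice the stored fragments out of the text and return the best-scoring one."""
    if not doc.boundaries:
        return ""
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def score(span: Tuple[int, int]) -> int:
        start, end = span
        # Placeholder scorer (assumption): count fragment words that are query terms.
        return sum(1 for w in re.findall(r"\w+", doc.text[start:end]) if w.lower() in terms)

    start, end = max(doc.boundaries, key=score)  # ties go to the earliest fragment
    return doc.text[start:end]
```

The trade-off is extra storage per document in exchange for skipping the fragmenting step on every query.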

@valencik
Copy link
Contributor Author

Some initial work was done in #255.
