Message search: substring matching #1437

Open
jhwheeler opened this issue Jan 20, 2025 · 4 comments

jhwheeler commented Jan 20, 2025

Problem

Stream Chat's message search seems to lack support for continuous substring/text matching. The current operators either match individual tokenized words ($q) or require an exact full-string match ($eq). Unless I'm missing something, which I hope I am!

Current Behavior

When searching for the phrase "a b", using text: { $q: "a b" }, we get these results:

  • "a b" (good match)
  • "a b c" (good match)
  • "a" (unwanted match)
  • "a message" (unwanted match)
  • "in a row" (unwanted match)

Even wrapping the search term in quotes doesn't help:

text: { $q: '"a b"' }

Desired Behavior

We need to match messages where the search string appears as a continuous substring. For example, searching for "a b" should match:

"a b"
"a b c"
"hello a b there"

But NOT match:

"a thing b"
"this is a message with b"

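The desired semantics can be sketched locally as a plain substring check. This is only an illustration of the matching rule being requested, not an existing Stream Chat operator; `containsPhrase` is a hypothetical helper, and the case-insensitive comparison is an assumption the examples above do not settle.

```typescript
// Hypothetical helper: true when `phrase` occurs in `text` as one continuous run.
// Case-insensitive comparison is an assumption, not something the issue specifies.
function containsPhrase(text: string, phrase: string): boolean {
  return text.toLowerCase().includes(phrase.toLowerCase());
}

// The example messages from this issue.
const samples: string[] = [
  "a b",
  "a b c",
  "hello a b there",
  "a thing b",
  "this is a message with b",
];

// Keeps only the first three messages; the last two are rejected.
const phraseMatches = samples.filter((s) => containsPhrase(s, "a b"));
console.log(phraseMatches);
```
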
Why Client-Side Filtering Isn't an Ideal Solution

While results could be filtered client-side, this breaks pagination:

  1. The user searches for "a b"
  2. The server returns the first page of 100 results, many of them without the exact phrase
  3. After client-side filtering, we might have only 20 actual matches
  4. When the user scrolls, we need multiple page requests to collect enough true matches
  5. This results in excessive API calls and inaccurate result counts
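
The cost of the steps above can be sketched with a small simulation. Everything here is hypothetical: `Page`, `FetchPage`, `collectMatches`, and the fake server stand in for a real paginated search call and are not part of the Stream Chat SDK. The fake corpus assumes that one in five returned messages truly contains the phrase, mirroring the "20 out of 100" figure.

```typescript
// One page of raw server results plus the offset of the next page
// (hypothetical shape; undefined means there are no more pages).
interface Page {
  texts: string[];
  next: number | undefined;
}

type FetchPage = (offset: number, limit: number) => Promise<Page>;

// Requests pages until `wanted` true matches are collected or the server
// runs out of results. Returns the matches and the number of requests made.
async function collectMatches(
  fetchPage: FetchPage,
  phrase: string,
  wanted: number,
  pageSize: number,
): Promise<{ matches: string[]; requests: number }> {
  const matches: string[] = [];
  let offset: number | undefined = 0;
  let requests = 0;
  while (offset !== undefined && matches.length < wanted) {
    const page: Page = await fetchPage(offset, pageSize);
    requests += 1;
    for (const text of page.texts) {
      // Client-side filter: keep only continuous-substring matches.
      if (matches.length < wanted && text.includes(phrase)) matches.push(text);
    }
    offset = page.next;
  }
  return { matches, requests };
}

// Fake server: 500 messages, every fifth one really contains "a b".
const corpus: string[] = Array.from({ length: 500 }, (_, i) =>
  i % 5 === 0 ? `note ${i}: a b` : `note ${i}: a message`,
);

const fakeFetch: FetchPage = async (offset, limit) => ({
  texts: corpus.slice(offset, offset + limit),
  next: offset + limit < corpus.length ? offset + limit : undefined,
});
```

With this fake data, `collectMatches(fakeFetch, "a b", 100, 100)` needs five requests to fill a single page of 100 true matches, and the total number of true matches is unknown until every page has been fetched.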

Current Implementation

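import type { DefaultGenerics, MessageFilters, SearchMessageSort } from 'stream-chat';

// chatClient, channelFilter, searchTerm, and PAGE_SIZE are defined elsewhere in the app.

// Wrap the term in double quotes in an attempt to force phrase matching.
const messageFilter: MessageFilters = {
  text: { $q: `"${searchTerm}"` }
};

// Newest messages first.
const sort: SearchMessageSort<DefaultGenerics> = [{ created_at: -1 }];

const results = await chatClient.search(
  channelFilter,
  messageFilter,
  {
    sort,
    limit: PAGE_SIZE
  }
);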

(I added the quotes around the searchTerm in order to try to force the substring matching, but to no avail.)

Proposal

Hopefully, there is a way to do this already, and I simply haven't been able to figure it out. If so, please disabuse me of my ignorance and let me know how to do this with Stream Chat's existing capabilities.

If not, then I propose we add support for MongoDB's $regex operator for message search. This would allow for flexible pattern matching including continuous substrings:

text: { $regex: "a b" }

This is a well-established pattern that would solve the continuous text matching problem while leveraging existing MongoDB functionality. It would also provide additional flexibility for other search patterns when needed.
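
One detail a regex-based operator would bring with it: a raw search term has to be escaped before it is used as a pattern, otherwise characters such as `+` or `.` change its meaning. The sketch below only illustrates that step locally with JavaScript's `RegExp`; `escapeRegExp` and `substringPattern` are hypothetical helpers, and whether a server-side operator would accept such a pattern is not established here.

```typescript
// Escapes characters that have special meaning in a regular expression so
// the search term is matched literally.
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a pattern that matches `term` as a continuous, literal substring.
function substringPattern(term: string): RegExp {
  return new RegExp(escapeRegExp(term));
}

const abPattern = substringPattern("a b");
console.log(abPattern.test("hello a b there")); // continuous occurrence
console.log(abPattern.test("a thing b"));       // words present but not adjacent
```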

Thank you!

szuperaz self-assigned this Jan 20, 2025

szuperaz (Contributor) commented

Hi,

We don't use MongoDB on our backend, so we can't leverage existing MongoDB functionality. That being said, we plan to add support for this feature, but it will be part of a bigger overhaul of the search functionality, and I can't give you an ETA for it.

jhwheeler (Author) commented

Thank you for the update, @szuperaz 🙏

jhwheeler (Author) commented

@szuperaz I've now noticed that this also applies to substrings within a word that do not start at the beginning of the word. For example, if there is a message "foobar" and I search for "oo" or "oobar", it doesn't match; it only works if I include the "f", e.g. "fo" or "foob".

Is this also not supported? Seems like quite a large limitation...
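
The behavior reported in this thread is consistent with a simple token-prefix model, sketched below. This is purely an assumption made to summarize the observations; it is not a description of how Stream's backend works, and `tokenize` and `tokenPrefixMatch` are hypothetical names.

```typescript
// Toy model (an assumption, not Stream's implementation): a message matches
// when any query token is a prefix of any message token.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

function tokenPrefixMatch(message: string, query: string): boolean {
  const messageTokens = tokenize(message);
  return tokenize(query).some((q) => messageTokens.some((m) => m.startsWith(q)));
}

console.log(tokenPrefixMatch("foobar", "fo"));     // prefix of the word
console.log(tokenPrefixMatch("foobar", "oobar"));  // infix only
console.log(tokenPrefixMatch("in a row", "a b"));  // one query token is enough
```

Under this model, "fo" and "foob" match "foobar" while "oo" and "oobar" do not, and "a b" matches "in a row" because the single token "a" is enough.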

szuperaz (Contributor) commented

I'm able to reproduce this, but it's the same as the previous issue; I can't give an ETA for a fix; we plan to tackle this with our search overhaul. I also passed along your other requests for consideration for the overhaul.
