This repository has been archived by the owner on Nov 2, 2020. It is now read-only.

Search for words including special characters does not produce expected result #3

Open
kahlep opened this issue May 2, 2017 · 0 comments

kahlep commented May 2, 2017

Search for e.g. "wa§§er" does not match/highlight the results correctly.
Results include hits for "wa" and "er".

EDIT:
The tokenizer of Solr omits search words that are put in quotes when processing the query.
E.g. searching for "wa§§er" (with quotes) reduces the hits from 121 to 20 in collection 4048 (users are actively working in this collection, so the numbers may have changed by now).
However, the hits also include strings like "wa= §§er".

While experimenting with the search feature for this issue I came across the following things:

  • Special character escaping: In TrpSearcher there is a searchText.trim().replaceAll(" ", "\ ") call which deals with single spaces only. Older versions of Solr required escaping of characters reserved by the query syntax (see https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters). Might this be needed here too in order to make those chars searchable, or is it already done elsewhere?
  • Although the quoted search term narrows the results, the highlighting does not work as expected. The postprocessing of the Solr result should be checked for a possible issue there.
  • Faulty pagination of results in TranskribusSwtGui: unclear if this is a bug within TranskribusSearch or in another component. Although there should be 10 (of 20) results on each page, when using the query from above the first page includes 8 hits and the second page 5.
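
To make the escaping question concrete, here is a minimal sketch of what escaping all reserved characters (instead of only spaces) could look like. It is not the code in TrpSearcher: the class name QueryEscaper and the method escape are hypothetical, and the character list is taken from the Lucene query syntax page linked above. If SolrJ is on the classpath, ClientUtils.escapeQueryChars does a similar job and would probably be preferable to a hand-rolled helper.

```java
// Hypothetical helper, not part of TranskribusSearch.
final class QueryEscaper {

    // Characters reserved by the Lucene query syntax (see the linked page),
    // plus the space that the existing replaceAll call already targets.
    private static final String RESERVED = "+-&|!(){}[]^\"~*?:\\ ";

    // Trims the input and prefixes every reserved character with a backslash.
    static String escape(String searchText) {
        String trimmed = searchText.trim();
        StringBuilder sb = new StringBuilder(trimmed.length() * 2);
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Note that "§" is not among the reserved characters, so QueryEscaper.escape("wa§§er") returns the term unchanged. Escaping alone would therefore probably not explain the "wa" / "er" hits; that part looks more like a tokenizer/analyzer matter on the Solr side.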
kahlep added the bug label May 2, 2017