Irregular results #114
Comments
Hello, I noticed same kind of issue on YunoHost's documentation. |
Try disabling the stemmer and rebuilding the index after that. See if that helps. |
Already tried that. Same irregular results. |
Just tried your link and it shows the same 2 results for "geokoordina" and "geokoo". |
Cannot confirm. In fact, all of these should output more than 2 results. |
I don't know enough about the German language, but the difference between "geokoordina" and "geokoordinat" really looks like a stemmer issue. Setting stemmer to 'no' should return the same results for both queries. |
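To illustrate why a stemmer can produce this kind of gap, here is a minimal sketch in Python. It is not TNTSearch code: `toy_stem` is a hypothetical stemmer that only strips a trailing "en", and `prefix_search` assumes plain prefix matching against the indexed terms. With the identity "stemmer" every prefix of an indexed word matches; with the toy stemmer the query "geokoordinate" no longer does.

```python
def identity(word: str) -> str:
    """Stand-in for stemmer: 'no' (terms are indexed unchanged)."""
    return word


def toy_stem(word: str) -> str:
    """Hypothetical stemmer: strips a trailing 'en' only.

    This is NOT the Porter or German stemmer, just enough to show the effect.
    """
    return word[:-2] if word.endswith("en") else word


def build_index(words, stem):
    """Index the stemmed, lower-cased form of every word."""
    return {stem(w.lower()) for w in words}


def prefix_search(index, query, stem):
    """Assumed matching rule: an indexed term matches if it starts with the stemmed query."""
    q = stem(query.lower())
    return sorted(term for term in index if term.startswith(q))


words = ["Geokoordinaten"]
plain = build_index(words, identity)    # {"geokoordinaten"}
stemmed = build_index(words, toy_stem)  # {"geokoordinat"}

for query in ["geokoo", "geokoordina", "geokoordinate", "geokoordinaten"]:
    print(query, prefix_search(plain, query, identity), prefix_search(stemmed, query, toy_stem))
```

In this toy model the unstemmed index answers all four queries, while the stemmed index misses "geokoordinate" because the query is left untouched but the indexed term was shortened. A real stemmer has many more rules, so the exact queries that fail will differ.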
Yeah, I thought the same. |
Without stemmer:
With stemmer:
Check whether you really disabled the stemmer, i.e. set it to 'no', because at least the last difference is a stemmer issue. Stemming is a complex beast, so to properly debug the issue I would disable it for now. |
As I said, I tested it without the stemmer at the beginning and two more times afterwards. The results are the same. |
Same here:
But if I add the final "e", I get no results:
My config:
enabled: true
search_route: /search
query_route: /s
built_in_css: true
built_in_js: true
built_in_search_page: true
enable_admin_page_events: true
search_type: auto
fuzzy: true
phrases: true
stemmer: "no"
display_route: true
display_hits: true
display_time: true
live_uri_update: true
limit: '20'
min: '3'
snippet: '300'
index_page_by_default: true
scheduled_index:
  enabled: false
  at: '30 3 * * *'
  logs: logs/tntsearch-index.out
filter:
  items:
    - [email protected]
published: true
powered_by: false
search_object_type: Grav
PHP:
Grav and TNTSearch are up to date. |
Did you rebuild the index after disabling the stemmer? Delete the old index file completely. Again, this looks like a German stemmer issue:
|
Of course, I've rebuilt the index several times. And the German stemmer was never activated at any time. I've even deleted the index files before reindexing to make sure they are built from scratch.
EDIT: It looks like the indexing process does something weird, as I can't find the word "Spectre" in the index:
|
Same problem is still present here. |
Okay, I don't know why, but it seems the indexer still used the PorterStemmer even though I had "no" in my config. Now, after changing the value via the Grav Admin interface and then setting it back to "no" via a text editor (the same thing I did yesterday), it seems to work correctly and the word "Spectre" is indexed fine. On a side note: selecting "Disable" in the Grav Admin thingy turns the YAML into |
Interesting indeed. Could be related to #116, which is unfortunately still waiting to be merged. Also check https://github.com/teamtnt/tntsearch/pull/243/files. I'm not sure which Grav version you are using or how up to date the TNTSearch library it includes is. |
I can also confirm, on v3.3.1, Grav v1.7.18. Stemmer does make a difference, but disabling it doesn't fix the problem.
With
With
I don't know if I'm setting something wrong, but this is too unreliable. |
@bgdnlp try the patches in https://github.com/teamtnt/tntsearch/pull/243/files and #116 |
This fixed my problem. Thanks. |
I've updated to the latest Grav 1.7.23 and applied the changes noted in #114 (comment), but I'm still not getting the desired results. The test case is a search for "spk", which should return "spk1000" and "spk7457", but only the first appears:
A search for "spk7" returns "spk7457", which should also appear in the previous search:
I don't believe I've missed anything, but here is a diff showing the changes I've applied:
|
@thekenshow your case is different from this issue. This issue deals with the stemmer, which operates only on normal words. If numbers are involved, you should create a separate issue ticket. |
Ah, good to know, thanks. Filed a new issue. |
Everything is up to date and freshly indexed. I have two pages containing the word "geokoordinaten", which is German for "geocoordinates".
When searching for "geokoordinaten" I get 2 results, which is correct.
"geokoordina" outputs 0 results.
"geokoo" outputs 2 results again.
On the other hand, when searching for "geoko" I get 2 results again, but different pages. Shouldn't this output 4 pages?
I don't know whether this is a bug or something is set up badly, but I don't see what the problem is here.
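The expectation behind these observations can be written down as a small check: if one query is a prefix of another, the longer query's results should be a subset of the shorter query's results. The sketch below is Python and only encodes that expectation; it does not call TNTSearch. The page names are hypothetical placeholders, and it assumes "geokoo" returns the same two pages as "geokoordinaten". With fuzzy matching enabled, the search engine may not promise this property at all.

```python
from itertools import permutations


def prefix_violations(results):
    """Return (shorter, longer) query pairs where the longer query finds
    pages that the shorter prefix query does not."""
    violations = []
    for shorter, longer in permutations(results, 2):
        if longer.startswith(shorter) and not results[longer] <= results[shorter]:
            violations.append((shorter, longer))
    return sorted(violations)


# Observed result sets; page names are hypothetical placeholders.
observed = {
    "geokoordinaten": {"page-a", "page-b"},
    "geokoordina": set(),
    "geokoo": {"page-a", "page-b"},
    "geoko": {"page-c", "page-d"},
}

for shorter, longer in prefix_violations(observed):
    print(f'"{longer}" finds pages that its prefix "{shorter}" misses')
```

Running the same check against result sets collected before and after a reindex or a config change is one way to see whether a change actually made the results consistent.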
You can try it out here:
https://www.jfewo.de/docs/de/suche?q=geokoordinaten
This is the config: