Remove IN BOOLEAN MODE #14
Removed IN BOOLEAN MODE, as it does not seem to add any features.
Hi there, thanks for the contribution, nice to see you use this library. There are quite a few reasons why a search might return no data when using boolean mode. Things like index token size can sometimes make results unpredictable. Also, InnoDB and MyISAM handle full-text search rather differently. From what I remember, boolean mode is also a performance consideration. I will need to test this against a few of our running applications to see how it affects results. Hopefully I can find the time later this week to check whether the results are still sane. I'll also try to dig in and find out what the considerations were for using boolean mode.
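For anyone who wants to check the factors mentioned above on their own server, here is a minimal sketch. The system variables and the stopword table are standard MySQL; the table name `search_index` is only a placeholder for whatever table holds the full-text index.

```sql
-- Minimum indexed word length; MyISAM and InnoDB use different variables.
SHOW VARIABLES LIKE 'ft_min_word_len';           -- MyISAM, default 4
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  -- InnoDB, default 3

-- Default stopwords that InnoDB leaves out of its full-text indexes.
SELECT * FROM information_schema.INNODB_FT_DEFAULT_STOPWORD;

-- Which storage engine backs the search table?
-- `search_index` is a hypothetical table name.
SELECT ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'search_index';
```

Words shorter than the minimum length, and stopwords, are not indexed, which is one way a search can behave differently from what the search string suggests.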
Hi, Ah, if it is for performance then ignore my pull request.
Hi @WebDevEtc, I spent the last day figuring out this issue. It happens because, by default, MySQL/MariaDB collations treat spaces (" "), periods ("."), and commas (",") as punctuation. Long story short, collations "weight" characters to determine how to filter or sort them, and the punctuation characters mentioned above are considered EOL or 'stopwords'. To solve this, we need MySQL/MariaDB to treat them as regular characters rather than punctuation. The MySQL documentation presents three solutions. The first requires changing the source code and recompiling, which isn't a viable option for me. The second and third are good and not too hard to follow.
First things first: we need to know which character we're trying to fix. Take a look at the link below and find the hex equivalent of the character you're trying to fix. In my case, it was 2E, the period. Now, we need to find the collation files on the database server.
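The command used at this step is not shown here. As an assumption based on the MySQL manual's procedure for adding a collation, the character set directory can be looked up with the `character_sets_dir` system variable:

```sql
-- Directory that holds Index.xml and the per-charset files such as latin1.xml.
SHOW VARIABLES LIKE 'character_sets_dir';
```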
The result should be a table whose value is a directory path (I was using Docker). At this point I opened a second terminal, but this isn't necessary. Back on the server, outside of the MySQL/MariaDB command line:
NOTE: Before continuing the second step, open latin1.xml and look closely at the
Assign the User-defined Collation to our database/table/column.
All we need to do is assign our collation to our database, table, or column. In my case, I just needed to assign it to two columns, so I ran the following command:
Here are some links that might be helpful:
This should solve your problem if you don't have any existing data in the table. If you do have existing data and you try to run the query above, you may get an error similar to the one below:
The issue here is due to attempting to convert 4-byte characters into a narrower character set. To solve this, we need to convert our data to binary first, and then to latin1. For more info, check out this link. Run the following query in the MySQL/MariaDB command line:
Then follow it with:
We are done. We can now search for a term containing our character, and the database engine will match against it.
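The exact statements from this comment are not preserved, so the following is only a sketch of how the collation assignment and the binary round-trip might look. All names are hypothetical: the table `articles`, the column `indexed_title`, the index `ft_indexed_title`, and the collation `latin1_fulltext_ci` (whatever name was given to the user-defined collation). The BLOB-then-TEXT pattern is the one the MySQL manual describes for changing a column's character set without conversion.

```sql
-- Hypothetical names throughout; adjust to your own schema.

-- A FULLTEXT index cannot exist on a BLOB column, so drop it first.
ALTER TABLE articles DROP INDEX ft_indexed_title;

-- Step 1: strip the character set by switching to a binary type.
-- Note: MODIFY resets attributes such as NOT NULL unless they are restated.
ALTER TABLE articles MODIFY indexed_title BLOB;

-- Step 2: reinterpret the stored bytes as latin1 with the custom collation.
-- Bytes are not transcoded, so only plain ASCII text survives unchanged.
ALTER TABLE articles
  MODIFY indexed_title TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci;

-- Rebuild the full-text index so it uses the new collation.
ALTER TABLE articles ADD FULLTEXT INDEX ft_indexed_title (indexed_title);
```

On a table with no existing data, step 1 can likely be skipped and the column modified directly. Back up the table before trying any of this on real data.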
I like this package, have used it on several projects and also on https://github.com/WebDevEtc/BlogEtc
However, I've had a problem with some searches not showing any results. I had a play around, and it works if I remove the boolean mode parts of the query.
for example, if I change the (generated) SQL query from:
to
(The first query shows zero results; the second shows results, as it should, since there is content in `indexed_title` with the exact text "what is the difference between something and something else". I don't really know why it wasn't working.)
I did some digging in the source code, and as far as I can tell, it isn't possible for the user to add any boolean search queries, as `TermBuilder::terms()` removes any boolean characters (such as `-` or `+`).
So, I was wondering: why is the boolean mode stuff there (the `+` before each word, and the `IN BOOLEAN MODE`)?
I thought that maybe it used to be possible to do boolean searches (such as `restaurant -pizza`), but then with this fix: #9 that functionality got removed, while the "IN BOOLEAN MODE" parts stayed?
Anyway, I've removed the IN BOOLEAN MODE. Maybe there was some other reason that I missed, and it should stay. If that is the case, I'd be interested in the reason why.
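To make the difference between the two modes concrete, here is a rough sketch of the two query shapes. These are not the generated queries from this package; the table name `search_index` and the search terms are placeholders, and only the column name `indexed_title` comes from the description above.

```sql
-- Boolean mode: every term prefixed with + must be present in a matching row.
SELECT *
FROM search_index
WHERE MATCH (indexed_title)
      AGAINST ('+what +is +the +difference' IN BOOLEAN MODE);

-- Natural language mode (the default when no modifier is given):
-- rows are matched and ranked by relevance, and no single term is mandatory.
SELECT *
FROM search_index
WHERE MATCH (indexed_title)
      AGAINST ('what is the difference');
```

Because boolean mode with `+` makes every term mandatory, a single term that the index handles unexpectedly can affect the whole result, whereas natural language mode only loses that term's contribution to the ranking.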
I've attached a pull request.
Description
Removed `$termsBool` and `IN BOOLEAN MODE`.
Tested on local machine and a production machine, it works fine.
Motivation and context
See above
How has this been tested?
Tested, seems ok.
Screenshots (if appropriate)
n/a
Types of changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:
Checklist: