Forum search discards too many common words
Posted: Thu Sep 25, 2025 1:35 pm
by Koub
I just ran an advanced search for the words old, mod, and version (search.php?keywords=old+mod+version&ter ... mit=Search).
The result came back ignoring old and mod, and flooded me with 50+ pages of posts containing the word "version" (see below).

- 2025-09-25 15_33_10-Factorio Forums - Search et 26 pages de plus - Profil 1 – Microsoft Edge.jpg (12.4 KiB)
Would it be possible to set up the forum search to accept any string of length >= 2, please? I'm a heavy search user, and I really need to be able to search for even common words.
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 2:14 pm
by Loewchen
Happened around the last forum update: viewtopic.php?p=661790#p661790.
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 3:23 pm
by Sanqui
Let me offer some insight into the search situation. phpBB offers three search backends that we can use: phpBB Native Fulltext, MySQL Fulltext, and Sphinx Fulltext.
We were using phpBB Native Fulltext for years. There were reports that it had multiple issues, such as not supporting words with underscores as well as often failing to find all results.
In January this year, I switched the backend to MySQL Fulltext. This backend appears to be reliable in finding posts, and it performs better with quoted queries (it supports underscores and phrases), but as this thread notes, when not using quotes it has a minimum length of 4 characters per word, which doesn't appear possible to change. You are able to search for the phrase "old mod version".
The third option is Sphinx, an open source search server. Its website is a bit dated as it features a "Chat on Skype" link on the homepage, but it did receive one fresh release this year, so it's not completely abandoned. What its advantages are and what exciting new issues it would surface is unclear to me without setting it up to try... so if there is interest I can bump the priority.
EDIT: There is a MySQL config variable to change the minimum word length too. I'm not sure if phpBB will understand if it's changed but it's something I will attempt.
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 4:02 pm
by Sanqui
I've set the MySQL variables ft_min_word_len and innodb_ft_min_token_size to 2, confirmed that the setting is applied on the server, and rebuilt the post index, but so far I can't see a difference in the search results...
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 4:25 pm
by Amarula
Ah the good old days when my search for the error message "there are no trains" returned no results because all the words were stripped out as common! Happy to report that at least now I get results when I search using quotation marks.
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 5:15 pm
by eugenekay
Sanqui wrote: Thu Sep 25, 2025 3:23 pm
The third option is Sphinx, an open source search server. Its website is a bit dated as it features a "Chat on Skype" link on the homepage, but it did receive one fresh release this year, so it's not completely abandoned. What are its advantages and what exciting new issues it would surface is unclear to me without setting it up to try... so if there is interest I can bump the priority.
Sphinx / searchd are basically "finished software", so it is not surprising that the release cadence has slowed since 2001. Please note that Version 3 and above are no longer GPL Licensed or open source - and it is seemingly maintained by a single Developer / Corporation. The Version 2 codebase was forked in 2017 to become Manticore Search. I have never tested it with phpBB, but it reportedly works well. Ready-to-install packages are provided for most major Linux Distributions. Be aware that some configuration options have been Deprecated or Renamed - which may lead to unexpected startup errors.
The biggest problem I have seen with Sphinx installations has been the Indexer choking the system on Disk Access - this was in the era before Solid State Disks were commonplace.
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 7:50 pm
by Koub
Sanqui wrote: Thu Sep 25, 2025 4:02 pm
I've set the MySQL variables ft_min_word_len and innodb_ft_min_token_size to 2, confirmed that the setting is applied on the server, and rebuilt the post index, but so far I can't see a difference in the search results...
Doesn't seem to change anything, the words old and mod are still ignored in my search:

- 2025-09-25 21_47_38-Window.jpg (15.2 KiB)
Note that I'm searching for any combination of the words old, mod, and version anywhere in the post, and not for the string "old mod version".
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 8:01 pm
by Tertius
The 3-letter words seem to be already ignored by the forum software, without even being sent to the MySQL search engine. Otherwise they wouldn't be indicated as "ignored". There must be some setting to increase the length within the forum software as well. Or it read the value from the MySQL config, cached it, and didn't update when the index length was modified within MySQL.
Re: Forum search discards too many common words
Posted: Thu Sep 25, 2025 8:20 pm
by eugenekay
Tertius wrote: Thu Sep 25, 2025 8:01 pm
There must be some setting to increase the length within the forum software as well,
fulltext_mysql.php:
    // check word length
    $clean_len = utf8_strlen(str_replace('*', '', $clean_word));
    if (($clean_len < $this->config['fulltext_mysql_min_word_len']) || ($clean_len > $this->config['fulltext_mysql_max_word_len']))
    {
        $this->common_words[] = $word;
        unset($this->split_words[$i]);
    }
The fulltext_mysql_min_word_len configuration value (in phpBB) has a default Value of 4. The computed array of "common words" is later used by the actual search function to craft the "error message" shown in the Screenshot.
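For readers who do not read PHP, the length check above can be sketched in Python roughly as follows. This is only an illustration of the logic in the quoted snippet, not phpBB's actual code: the function name split_by_length is hypothetical, the default min_len=4 comes from the post above, and max_len=254 is an assumed placeholder for fulltext_mysql_max_word_len.

```python
def split_by_length(words, min_len=4, max_len=254):
    """Separate search words into kept words and 'common' (ignored) words.

    min_len mirrors fulltext_mysql_min_word_len (default 4 per the post above);
    max_len is an assumed stand-in for fulltext_mysql_max_word_len.
    """
    kept, common = [], []
    for word in words:
        # Wildcards do not count towards the word length.
        clean_len = len(word.replace("*", ""))
        if clean_len < min_len or clean_len > max_len:
            common.append(word)   # reported to the user as ignored
        else:
            kept.append(word)     # passed on to the fulltext query
    return kept, common


if __name__ == "__main__":
    print(split_by_length(["old", "mod", "version"]))
    print(split_by_length(["old", "mod", "version"], min_len=2))
```

With the default of 4, old and mod land in the ignored list before MySQL is ever queried, which would explain why changing only the MySQL variables made no visible difference.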
PHP is a hell of a drug.
Re: Forum search discards too many common words
Posted: Thu Oct 09, 2025 1:51 am
by Osmo
I find myself somewhat frequently trying to search for "GUI" in Bug Reports and modding forums
Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 9:46 am
by Sanqui
So I went ahead and changed the fulltext_mysql_min_word_len setting to 2 manually in the database. After clearing the cache and together with the change made to MySQL/MariaDB earlier, it seems like it is now possible to search for old mod version as Koub wished. 3- and 2-character words are no longer ignored.
Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 12:27 pm
by Rseding91
Sanqui wrote: Mon Oct 13, 2025 9:46 am
… 3- and 2- character words are no longer ignored.
Nice!
Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 2:06 pm
by Loewchen
Thank you, having the letter limit removed is great. There are still many words that just cannot be searched for though. It seems to run against an English frequency list, while at the same time some popular words work.
E.g. the search for "cannot use much more known example without it today" will actually just search for "today", ignoring the other words while pretending to actually use them:
Searched query: +cannot +use +much +more +known +example +without +it +today
There are of course words that are fine to discard, but words like without, cannot, example, not... would be great to be able to use.
Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 2:42 pm
by Koub
Sanqui wrote: Mon Oct 13, 2025 9:46 am
[...]
Much wow so unexpected, I'm very grateful, thanks

Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 2:55 pm
by eugenekay
Loewchen wrote: Mon Oct 13, 2025 2:06 pm
Thank you, having the letter limit removed is great. There are still many words that just cannot be searched for though, it seems to run against an English frequency list while at the same time more popular words work.
MariaDB - Full-Text Index Stopwords
Depending upon the Table type MySQL/MariaDB will (by default) use a built-in list of Stopwords. It is not clear (as a user) if the MyISAM or the InnoDB table type is used for this forum... These words from the query appear on the MyISAM list:
"cannot"
"example"
"it"
"known"
"more"
"much"
"use"
"without"
Whereas the InnoDB list only matches "it". This suggests that the generally-less-performant MyISAM table type is being used! Switching to InnoDB is almost always a good idea, unless the application's SQL behaviour precludes it. Documentation dating from the 2010s-era claims that InnoDB does not support FULLTEXT search - this was changed with the MySQL 5.6 release, so it ought to work fine. Converting tables between engine types is a bit of an adventure.
There are corresponding MySQL/MariaDB system variables which can be used to override this behavior by including a custom (empty) stopword file: ft_stopword_file and innodb_ft_user_stopword_table.
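To make the comparison concrete, here is a small Python sketch of how the two stopword lists would treat Loewchen's query. It is an illustration under assumptions, not what the server runs: effective_terms is a hypothetical helper, the MyISAM set contains only the words named above (the real list is much longer), and the InnoDB set is the default list as I recall it from the MySQL documentation, so it should be checked against INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD.

```python
# Only the MyISAM stopwords named in the post above; the real list is far longer.
MYISAM_STOPWORDS_SUBSET = {
    "cannot", "example", "it", "known", "more", "much", "use", "without",
}

# Assumed copy of the InnoDB default stopword list (verify on your own server).
INNODB_DEFAULT_STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}


def effective_terms(query, stopwords):
    """Return the query words that survive stopword filtering, in order."""
    return [word for word in query.lower().split() if word not in stopwords]


if __name__ == "__main__":
    query = "cannot use much more known example without it today"
    print(effective_terms(query, MYISAM_STOPWORDS_SUBSET))   # ['today']
    print(effective_terms(query, INNODB_DEFAULT_STOPWORDS))
```

Under the MyISAM subset only "today" survives, matching the behaviour reported above; under the assumed InnoDB list only "it" is dropped.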
Good Luck!
Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 3:54 pm
by Loewchen
eugenekay wrote: Mon Oct 13, 2025 2:55 pm
...
Thanks for the info, I don't know what the impact for the server would be, but the default InnoDB would be perfect. I would gladly do without any 2 letter words if that is necessary to compensate.
Re: Forum search discards too many common words
Posted: Mon Oct 13, 2025 4:14 pm
by eugenekay
Loewchen wrote: Mon Oct 13, 2025 3:54 pm
Thanks for the info, I don't know what the impact for the server would be, but the default InnoDB would be perfect. I would gladly do without any 2 letter words if that is necessary to compensate.
Each workload / dataset will benchmark differently under either engine, so "it depends".
In general, modern InnoDB (meaning: SSDs and 3+ GHz multiple-core processors) is as-fast-or-faster than MyISAM, while supporting "real SQL" features like Transactions, rollbacks, and more-reliable Replication. In the old days MyISAM was usually faster for simple benchmarks; but the enforced global Lock during writes made it slower for multiple users in real workloads.
In a perfect world everything would be on Postgres (or SQLite for simple apps).
