[0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Fri Jun 12, 2020 4:47 pm
by MiniHerc
What did you do?
I searched the Production log for Uranium
What happened?
In addition to Uranium ore, U-235 and U-238, copper wire, rails and repair packs also were listed.
What did you expect to happen instead? It might be obvious to you, but do it anyway!
I expected only Uranium ore, U-235 and U-238 to be listed.

Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Fri Jun 12, 2020 4:56 pm
by Bilka
This is a result of having fuzzy search turned on, you can turn it off in the settings.

Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Fri Jun 12, 2020 5:22 pm
by MiniHerc
Bilka wrote: Fri Jun 12, 2020 4:56 pm
This is a result of having fuzzy search turned on, you can turn it off in the settings.
Thanks, but how the hell does fuzzy search get copper wire from searching uranium ?!?
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Fri Jun 12, 2020 8:31 pm
by boskid
Heh, I looked into why it happens:
Code:
44.916 Info StringMatcher.cpp:71: can also be used to manually connect and disconnect electric poles and power switches with [font=default-semibold][color=#80cef0]left mouse button[/color][/font]. | uranium | true
44.916 Info StringMatcher.cpp:71: copper cable | uranium | false
As you can see, it clearly matches

Most of the letters come from the description, but the last two ("UM") come from "default-semibold" in the font tag.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Wed Jun 17, 2020 7:16 pm
by movax20h
boskid wrote: Fri Jun 12, 2020 8:31 pm
Heh, I looked into why it happens:
Code:
44.916 Info StringMatcher.cpp:71: can also be used to manually connect and disconnect electric poles and power switches with [font=default-semibold][color=#80cef0]left mouse button[/color][/font]. | uranium | true
44.916 Info StringMatcher.cpp:71: copper cable | uranium | false
As you can see, it clearly matches

Most of the letters come from the description, but the last two ("UM") come from "default-semibold" in the font tag.
One would argue that matching shouldn't be affected by font or color tags.
Bug.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Wed Jun 17, 2020 7:56 pm
by invisus
movax20h wrote: Wed Jun 17, 2020 7:16 pm
One would argue that matching shouldn't be affected by font or color tags.
Bug.
From a UX perspective, I agree.
One wouldn't reasonably expect a search in the production GUI to match on "hidden" tags, any more than they'd expect a "ctrl+f" search in a browser to match on HTML tags.
From the user standpoint, this seems like bad behavior indeed.
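To illustrate the point about hidden tags, here is a minimal Python sketch of stripping markup before matching. The tag shape in the regex is an assumption inferred from the log line boskid posted; it is not Factorio's actual rich text grammar or search code, and `strip_tags` is a hypothetical helper name.

```python
import re

# Assumed tag shape: [name], [name=value] or [/name]. This is a guess based on
# the log line quoted in the thread, not Factorio's real rich text grammar.
TAG = re.compile(r"\[/?[a-z-]+(?:=[^\]]*)?\]")

def strip_tags(text: str) -> str:
    """Remove bracketed formatting tags so only the visible text is searched."""
    return TAG.sub("", text)

raw = ("switches with [font=default-semibold][color=#80cef0]"
       "left mouse button[/color][/font].")
print(strip_tags(raw))  # switches with left mouse button.
```

Running the search on the stripped text would at least keep letters inside tag names such as "default-semibold" from contributing to a match.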
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Wed Jun 17, 2020 9:11 pm
by blahfasel2000
movax20h wrote: Wed Jun 17, 2020 7:16 pm
One would argue that matching shouldn't be affected by font or color tags.
Uhm, yeah, but that ignores the elephant in the room. Just looking for the characters in the order they are in the search word without regard to what's in between those characters is certainly a creative, but not a very useful implementation of "fuzzy search". Just look at how it actually found the "uranium" in the description string (assuming I understood boskid correctly):
can also be Used to manually connect and disconnect electRic poles ANd power swItches with (font=defaUlt-seMibold)(color=#80cef0)left mouse button(/color)(/font).
Fuzzy search is generally supposed to find matches that are similar to the search phrase, to find something even if the spelling isn't 100% right. I wouldn't call two strings similar just because one happens to have the characters from the other strewn around in random places that just so happen to be in the right order...
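The behaviour described above amounts to an in-order subsequence test. The following Python sketch reproduces the two results from boskid's log under that assumption; it is an illustration of the idea, not the game's StringMatcher code, and `subsequence_match` is a hypothetical name.

```python
def subsequence_match(pattern: str, text: str) -> bool:
    """True if every character of pattern occurs in text, in order, with any
    number of other characters in between (case-insensitive)."""
    remaining = iter(text.lower())
    # `ch in remaining` advances the iterator until it finds ch or runs out,
    # so each pattern character must appear after the previous one.
    return all(ch in remaining for ch in pattern.lower())

description = (
    "can also be used to manually connect and disconnect electric poles "
    "and power switches with [font=default-semibold][color=#80cef0]"
    "left mouse button[/color][/font]."
)

print(subsequence_match("uranium", description))     # True
print(subsequence_match("uranium", "copper cable"))  # False
```

With the font and color tags removed from the description, the same test returns False, because no "m" follows the second "u" any more.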
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Wed Jun 17, 2020 10:48 pm
by Impatient
Makes me think that the search algo may be a regex with ".*" for the allowed characters in between. Then maybe ".{0,x}" could ease the problem.
But this does not address the problem of character displacement in the search string, like in "uanruim". A proper fuzzy search algo like Levenshtein ( https://en.wikipedia.org/wiki/Levenshtein_distance ) can handle this, but may be too expensive, as it is O(strLength1 * strLength2).
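Both ideas in this post can be sketched in a few lines of Python. The helper names `gap_regex` and `levenshtein` and the `max_gap` parameter are made up for illustration, and nothing here is taken from the game's code. The regex variant only bounds the gap between consecutive pattern characters; the second function is the textbook dynamic-programming edit distance with the quadratic cost mentioned above.

```python
import re

def gap_regex(pattern: str, max_gap: int) -> re.Pattern:
    """Allow at most max_gap arbitrary characters between pattern characters."""
    gap = ".{0,%d}" % max_gap
    return re.compile(gap.join(re.escape(ch) for ch in pattern), re.IGNORECASE)

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance between two whole strings, O(len(a) * len(b))."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]

print(gap_regex("uranium", 2).search("copper cable"))  # None
print(levenshtein("kitten", "sitting"))                # 3
```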
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Thu Jun 18, 2020 2:14 am
by blahfasel2000
Levenshtein itself is only suitable for a fuzzy string comparison, not for a fuzzy substring search.
None of the on-line fuzzy search algorithms (on-line meaning that only the search pattern can be preprocessed; there's no prior indexing of the data to be searched) are really performance wonders. However, we aren't talking about something that has to search megabytes of data on every tick, 60 times a second. We are talking about occasionally searching a few hundred strings, maybe a few thousand in a heavily modded game, for a generally short pattern. The bitap algorithm as implemented by the agrep utility, for example, can search ~560,000 lines of logfiles (~54 MB of data in total) in less than 0.2 seconds on my machine. My PC has a Ryzen 5 3600X CPU, so it's no slouch, but with the much smaller relevant dataset in Factorio it should be plenty fast enough for an interactive search even on a potato.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Thu Jun 18, 2020 6:24 pm
by movax20h
blahfasel2000 wrote: Thu Jun 18, 2020 2:14 am
Levenshtein itself is only suitable for a fuzzy string comparison, not for a fuzzy substring search.
None of the on-line fuzzy search algorithms (on-line meaning that only the search pattern can be preprocessed; there's no prior indexing of the data to be searched) are really performance wonders. However, we aren't talking about something that has to search megabytes of data on every tick, 60 times a second. We are talking about occasionally searching a few hundred strings, maybe a few thousand in a heavily modded game, for a generally short pattern. The bitap algorithm as implemented by the agrep utility, for example, can search ~560,000 lines of logfiles (~54 MB of data in total) in less than 0.2 seconds on my machine. My PC has a Ryzen 5 3600X CPU, so it's no slouch, but with the much smaller relevant dataset in Factorio it should be plenty fast enough for an interactive search even on a potato.
There is no need to preprocess the pattern or prepare the data (index). There are fewer than 1000 elements to be searched, most of them shorter than 20 characters, and the pattern is also pretty short and only typed in by a human. Even the most naive implementation would be fast on a potato.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Thu Jun 18, 2020 6:53 pm
by Impatient
blahfasel2000 wrote: Thu Jun 18, 2020 2:14 am
Levenshtein itself is only suitable for a fuzzy string comparison, not for a fuzzy substring search.
...
Can you elaborate on this? A Levenshtein distance can be calculated for any two strings. In what way does a string comparison differ from a substring search in the context of Levenshtein? I am inclined to think that it makes no difference to a Levenshtein algo.
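One way to see the distinction is that plain Levenshtein distance compares two whole strings, so every extra character in a long item name or description counts as an edit. A common adaptation for substring search, shown below as a Python sketch with the hypothetical name `best_substring_distance`, keeps the same table but lets a match start and end anywhere in the text. This is offered only as an illustration of the difference, not as what blahfasel2000 or the game had in mind.

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row for the empty pattern prefix is all zeros: a match may start at any
    # position in text at no cost (whole-string Levenshtein uses 0, 1, 2, ...).
    previous = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        current = [i]  # matching pattern[:i] against nothing costs i deletions
        for j, tc in enumerate(text, start=1):
            current.append(min(
                previous[j] + 1,               # drop a pattern character
                current[j - 1] + 1,            # absorb an extra text character
                previous[j - 1] + (pc != tc),  # substitute, or free if equal
            ))
        previous = current
    # The match may also end anywhere, so take the minimum over the last row.
    return min(previous)

print(best_substring_distance("uranium", "uranium ore"))           # 0
print(best_substring_distance("uranum", "processed uranium ore"))  # 1
```

A search built on this would accept an entry only when the result is at or below a small threshold, which would reject "copper cable" for the query "uranium".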
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Posted: Thu Jun 18, 2020 9:18 pm
by netmand
All this talk of elephants and Levenshtein's potato is a bit kooky. Come on! Repair packs and Rails should totally be a part of the fuzzy search results for uranium!