[0.18.31] Searching production for Uranium shows things unrelated to Uranium
What did you do?
I searched the Production log for Uranium
What happened?
In addition to Uranium ore, U-235 and U-238, copper wire, rails and repair packs were also listed.
What did you expect to happen instead? It might be obvious to you, but do it anyway!
I expected only Uranium ore, U-235 and U-238 to be listed.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
This is a result of having fuzzy search turned on. You can turn it off in the settings.
I'm an admin over at https://wiki.factorio.com. Feel free to contact me if there's anything wrong (or right) with it.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Heh, I looked into why it happens:
Code:
44.916 Info StringMatcher.cpp:71: can also be used to manually connect and disconnect electric poles and power switches with [font=default-semibold][color=#80cef0]left mouse button[/color][/font]. | uranium | true
44.916 Info StringMatcher.cpp:71: copper cable | uranium | false
As you can see, it clearly matches: most of the letters come from the description, but the last two, "UM", come from "default-semibold" in the font tag.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
boskid wrote: ↑Fri Jun 12, 2020 8:31 pm
Heh, i looked into why it happens:
...
One would argue that matching shouldn't be affected by font or color tags.
Bug.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
From a UX perspective, I agree.
One wouldn't reasonably expect a search in the production GUI to match on "hidden" tags, any more than they'd expect a "ctrl+f" search in a browser to match on HTML tags.
From the user standpoint, this seems like bad behavior indeed.
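To illustrate the point about hidden tags, here is a minimal Python sketch that strips the tags before matching. It is not Factorio's code: `TAG_RE`, `visible_text` and `is_subsequence` are hypothetical names, only the `[font=...]` and `[color=...]` tags seen in the log above are handled, and the in-order letter matching is an assumption about how the fuzzy search behaves.

```python
import re

# The description string from boskid's log output.
DESCRIPTION = (
    "can also be used to manually connect and disconnect electric poles "
    "and power switches with [font=default-semibold][color=#80cef0]"
    "left mouse button[/color][/font]."
)

# Assumption: only the font and color tags from the log are handled here.
TAG_RE = re.compile(r"\[/?(?:font|color)(?:=[^\]]*)?\]")


def visible_text(rich_text: str) -> str:
    """Return the text a player would actually see, without the tags."""
    return TAG_RE.sub("", rich_text)


def is_subsequence(pattern: str, text: str) -> bool:
    """True if the letters of pattern occur in text in order, gaps allowed."""
    remaining = iter(text.lower())
    # "ch in remaining" consumes the iterator up to the first hit,
    # so each letter is searched for after the previous one.
    return all(ch in remaining for ch in pattern.lower())


print(is_subsequence("uranium", DESCRIPTION))                # raw string
print(is_subsequence("uranium", visible_text(DESCRIPTION)))  # tags removed
```

With the tags removed, the trailing "um" from "default-semibold" is gone, so this particular description no longer matches.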
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Uhm, yeah, but that ignores the elephant in the room. Just looking for the characters in the order they are in the search word without regard to what's in between those characters is certainly a creative, but not a very useful implementation of "fuzzy search". Just look at how it actually found the "uranium" in the description string (assuming I understood boskid correctly):
can also be Used to manually connect and disconnect electRic poles ANd power swItches with (font=defaUlt-seMibold)(color=#80cef0)left mouse button(/color)(/font).
Fuzzy search is generally supposed to find matches that are similar to the search phrase, to find something even if the spelling isn't 100% right. I wouldn't call two strings similar just because one happens to have the characters from the other strewn around in random places that just so happen to be in the right order...
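The behaviour described in this post can be reproduced with a short Python sketch. It assumes the matcher is a plain in-order (subsequence) test, which is only inferred from the log. The real StringMatcher.cpp is not shown in this thread, and `subsequence_positions` and `highlight` are hypothetical names.

```python
from typing import List, Optional

# The description string from boskid's log output.
DESCRIPTION = (
    "can also be used to manually connect and disconnect electric poles "
    "and power switches with [font=default-semibold][color=#80cef0]"
    "left mouse button[/color][/font]."
)


def subsequence_positions(pattern: str, text: str) -> Optional[List[int]]:
    """Greedy leftmost positions of each pattern letter in text, or None."""
    lowered = text.lower()
    positions = []
    start = 0
    for ch in pattern.lower():
        idx = lowered.find(ch, start)
        if idx == -1:
            return None  # this letter never appears after the previous hit
        positions.append(idx)
        start = idx + 1
    return positions


def highlight(pattern: str, text: str) -> Optional[str]:
    """Upper-case the matched letters, or return None if there is no match."""
    positions = subsequence_positions(pattern, text)
    if positions is None:
        return None
    chars = list(text)
    for idx in positions:
        chars[idx] = chars[idx].upper()
    return "".join(chars)


print(highlight("uranium", DESCRIPTION))
print(highlight("uranium", "copper cable"))
```

The first call upper-cases the same letters as in the post, including the "U" and "M" inside the font tag. The second returns None, which agrees with the `false` in the log.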
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Makes me think that the search algo may be a regex with ".*" for the allowed characters in between. Then maybe ".{0,x}" could ease the problem.
But this does not address the problem of character displacement in the search string, as in "uanruim". A proper fuzzy search algo like Levenshtein ( https://en.wikipedia.org/wiki/Levenshtein_distance ) can handle this, but may be too expensive, as it is O(strLength1 * strLength2).
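The ".*" versus ".{0,x}" idea can be tried out with Python's `re` module. This is only a sketch of the guess above, not a claim about what the game does. `gap_regex` and its `max_gap` parameter are hypothetical names.

```python
import re
from typing import Optional

# The description string from boskid's log output.
DESCRIPTION = (
    "can also be used to manually connect and disconnect electric poles "
    "and power switches with [font=default-semibold][color=#80cef0]"
    "left mouse button[/color][/font]."
)


def gap_regex(pattern: str, max_gap: Optional[int] = None) -> "re.Pattern[str]":
    """Build a regex that allows a gap between consecutive pattern letters.

    max_gap=None gives the unbounded ".*" form; an integer x gives ".{0,x}".
    """
    gap = ".*" if max_gap is None else ".{0,%d}" % max_gap
    body = gap.join(re.escape(ch) for ch in pattern)
    return re.compile(body, re.IGNORECASE | re.DOTALL)


# Unbounded gaps: behaves like the in-order matching seen in the log.
print(bool(gap_regex("uranium").search(DESCRIPTION)))
# At most 2 characters between letters: the description no longer matches.
print(bool(gap_regex("uranium", 2).search(DESCRIPTION)))
print(bool(gap_regex("uranium", 2).search("Uranium ore")))
# Swapped letters still fail, as noted above.
print(bool(gap_regex("uanruim", 2).search("Uranium ore")))
```

Bounding the gap removes the copper cable false positive, but a misspelling with swapped letters is still not found.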
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
Levenshtein itself is only suitable for a fuzzy string comparison, not for a fuzzy substring search.
None of the on-line fuzzy search algorithms (meaning that only the search pattern can be preprocessed; there's no prior indexing of the data to be searched) are really performance wonders. However, we aren't talking about something that has to search megabytes of data on every tick, 60 times a second. We are talking about occasionally searching a few hundred strings, maybe a few thousand in a heavily modded game, for a generally short pattern. The bitap algorithm as implemented by the agrep utility, for example, can search ~560,000 lines of logfiles (~54MB of data in total) in less than 0.2 seconds on my machine. My PC has a Ryzen 5 3600X CPU, so it's no slouch, but with the much smaller relevant dataset in Factorio it should be plenty fast enough for an interactive search even on a potato.
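For readers who haven't met bitap, here is a rough Python sketch of the approximate variant (shift-and with an error budget, in the style usually credited to Wu and Manber). It is not agrep's actual implementation and makes no claim about speed. `bitap_fuzzy_contains` and `max_errors` are hypothetical names.

```python
def bitap_fuzzy_contains(pattern: str, text: str, max_errors: int) -> bool:
    """True if some substring of text is within max_errors edits of pattern."""
    pattern, text = pattern.lower(), text.lower()
    m = len(pattern)
    if m == 0:
        return True

    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1      # keep only the m low bits
    accept = 1 << (m - 1)    # bit that signals "whole pattern matched"

    # state[d], bit j set: pattern[:j+1] matches a suffix of the text read
    # so far with at most d errors. Before reading anything, up to d leading
    # pattern letters can be "matched" by treating them as missing.
    state = [((1 << d) - 1) & full for d in range(max_errors + 1)]
    if state[max_errors] & accept:
        return True

    for ch in text:
        mask = masks.get(ch, 0)
        old = state[:]
        state[0] = ((old[0] << 1) | 1) & mask
        for d in range(1, max_errors + 1):
            match = ((old[d] << 1) | 1) & mask
            insertion = old[d - 1]                # extra character in text
            substitution = (old[d - 1] << 1) | 1  # wrong character in text
            deletion = (state[d - 1] << 1) | 1    # character missing in text
            state[d] = (match | insertion | substitution | deletion) & full
        if state[max_errors] & accept:
            return True
    return False


print(bitap_fuzzy_contains("uranim", "Uranium ore", 1))
print(bitap_fuzzy_contains("uranium", "copper cable", 1))
```

The pattern is preprocessed into bitmasks once, and each text character then costs a handful of bit operations per allowed error. That matches the "only the pattern is preprocessed" setting described above.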
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
blahfasel2000 wrote: ↑Thu Jun 18, 2020 2:14 am
Levenshtein itself is only suitable for a fuzzy string comparison, not for a fuzzy substring search.
None of the on-line (meaning that only the search pattern can be preprocessed, there's no prior indexing of the data to be searched) fuzzy search algorithms are really performance wonders.
...
There is no need to preprocess the pattern or prepare the data (index). There are fewer than 1000 elements to be searched, most of them shorter than 20 characters, and the pattern is also pretty short and only typed in by a human. Even the most naive implementation would be fast on a potato.
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
blahfasel2000 wrote: ↑Thu Jun 18, 2020 2:14 am
Levenshtein itself is only suitable for a fuzzy string comparison, not for a fuzzy substring search.
...
Can you elaborate on this? A Levenshtein distance can be calculated for any two strings. In what way does a string comparison differ from a substring search in the context of Levenshtein? I am inclined to think that it makes no difference to a Levenshtein algorithm.
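One possible reading of the distinction, sketched in Python: the plain Levenshtein distance compares two whole strings, so a short query against a longer item name is penalised for every extra character. A substring search needs a variant where starting and ending anywhere in the text is free (often attributed to Sellers). Both functions are illustrative assumptions, not something taken from the game or from the quoted post.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance between the two whole strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute / match
        prev = cur
    return prev[-1]


def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    prev = [0] * (len(text) + 1)  # a match may start anywhere for free
    for i, cp in enumerate(pattern, 1):
        cur = [i]
        for j, ct in enumerate(text, 1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (cp != ct)))
        prev = cur
    return min(prev)              # and may end anywhere for free


print(levenshtein("uranium", "uranium ore"))              # whole strings
print(best_substring_distance("uranium", "uranium ore"))  # best substring
print(levenshtein("uanruim", "uranium"))                  # scrambled query
```

Both run in O(len1 * len2) time. The only differences are the all-zero first row and taking the minimum of the last row, so the recurrence is the same but the question answered is a different one.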
Re: [0.18.31] Searching production for Uranium shows things unrelated to Uranium
All this talk of elephants and Levenshtein's potato is a bit kooky. Come on! Repair packs and Rails should totally be a part of the fuzzy search results for uranium!