Cyrillic search is case sensitive
Hello!
I noticed that with the English alphabet, searching for an item in inventory menus is not case-sensitive. But in the Russian version, all searches for Russian-named items are case-sensitive, which is inconvenient every time I need to find an item in a menu.
Usually, I just drop the first letter of the item's name, because it may be in either case.
Re: Cyrillic search is case sensitive
Thanks for the report; however, I don't think this will change. We don't have any language-agnostic system for converting a character to lowercase, so non-English characters have this issue.
If you want to get ahold of me I'm almost always on Discord.
- TruePikachu
Re: Cyrillic search is case sensitive
Is `std::tolower` not sufficient for this when provided with a suitable locale?
Re: Cyrillic search is case sensitive
TruePikachu wrote: ↑Sun Dec 23, 2018 7:42 am
Is `std::tolower` not sufficient for this when provided with a suitable locale?
Nope. `std::tolower` only supports single-character values - no UTF support.
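To illustrate the point about single-character values, here is a minimal sketch (not the game's code) of what happens when `std::tolower` from `<cctype>` is applied byte by byte to a UTF-8 `std::string`. The helper name `bytewise_lower` is made up for this example, and it assumes the default "C" locale, where bytes outside ASCII are left untouched.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercases a string one byte at a time, the way a
// naive std::tolower loop over std::string would.
std::string bytewise_lower(std::string s) {
    for (char& c : s) {
        // The cast to unsigned char avoids undefined behavior for bytes >= 0x80.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With this, `bytewise_lower("IRON")` gives `"iron"`, but the two-byte UTF-8 sequence for Cyrillic "Ж" (`0xD0 0x96`) comes back unchanged, because neither byte is a letter on its own.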
Re: Cyrillic search is case sensitive
To get any multibyte support you have to use a multibyte-aware container; std::string is not multibyte aware.
A saner solution would be to use a Unicode library for any string that can be Unicode. For C++ I guess that'd be ICU http://site.icu-project.org/
My Mods: mods.factorio.com
- TruePikachu
Re: Cyrillic search is case sensitive
Modern C++ (and C, for that matter) does have native Unicode support.
`std::basic_string` and its derivatives aren't naturally variable-length-character aware, but by using e.g. `std::u32string` ≡ `std::basic_string<char32_t>` for string storage you can avoid the problems of variable-length characters, or by using e.g. `std::codecvt<char32_t, char, std::mbstate_t>` as a conduit for converting UTF-8 to and from UCS-4, one can keep the memory savings of using UTF-8 while still being able to resolve UCS-4 codepoints.
EDIT: Just did some testing, Windows doesn't appear to like doing locale-based case conversions in UCS-4, but everything is fine when using `wchar_t` as the intermediate (which is UCS-2 under Windows, and likely sufficient):
("FACTORI⚙" if anyone's curious)
Code: Select all
#include <cstdint>
#include <cwchar>
#include <iomanip>
#include <iostream>
#include <locale>
#include <string>
using namespace std;

// Presumably left over from the char32_t attempt; the wchar_t path below does not use it.
locale::id codecvt<char32_t,char,mbstate_t>::id;

int main() {
    // Requires the en_US.utf8 locale to be available on the system.
    locale::global(locale("en_US.utf8"));

    // UTF-8 encoded string
    string data = u8"\uff26\uff21\uff23\uff34\uff2f\uff32\uff29\u2699";
    cout << "UTF-8:";
    for(auto c : data) {
        cout << " 0x" << uppercase << hex << setw(2) << setfill('0')
             << static_cast<int>(static_cast<uint8_t>(c));
    }
    cout << endl;

    // Conversion to wide string, not using C++17 deprecated functionality
    auto& facet = use_facet<codecvt<wchar_t,char,mbstate_t>>(locale());
    wstring wide(data.size(), L'\0');
    mbstate_t state = {};
    const char* d_next;
    wchar_t* w_next;
    facet.in(state,
             &data[0], &data[data.size()], d_next,
             &wide[0], &wide[wide.size()], w_next);
    // Trim the buffer to the number of wide characters actually written
    wide.resize(w_next - &wide[0]);

    // Print each wide character as a number (cast so it is never treated as a character)
    cout << "Wide: ";
    for(auto c : wide)
        cout << " 0x" << uppercase << hex << setw(4) << setfill('0')
             << static_cast<unsigned>(c);
    cout << endl;

    // Locale-aware lowercase conversion, one wide character at a time
    cout << "Lower:";
    for(auto c : wide)
        cout << " 0x" << uppercase << hex << setw(4) << setfill('0')
             << static_cast<unsigned>(tolower(c, locale()));
    cout << endl;
    return 0;
}
Code: Select all
UTF-8: 0xEF 0xBC 0xA6 0xEF 0xBC 0xA1 0xEF 0xBC 0xA3 0xEF 0xBC 0xB4 0xEF 0xBC 0xAF 0xEF 0xBC 0xB2 0xEF 0xBC 0xA9 0xE2 0x9A 0x99
Wide: 0xFF26 0xFF21 0xFF23 0xFF34 0xFF2F 0xFF32 0xFF29 0x2699
Lower: 0xFF46 0xFF41 0xFF43 0xFF54 0xFF4F 0xFF52 0xFF49 0x2699
Re: Cyrillic search is case sensitive
That's interesting, I guess my C is a bit rusty.
Re: Cyrillic search is case sensitive
This bug hurts me a lot. I end up searching for science packs and other items/settings as "cience pack", "ogistics requests", "oboport", etc., since sometimes the main word is not the first one (e.g., the Russian name for the SE rocket science pack is "Science pack of rocketry" while the others are "... science pack"). I hope this will be fixed in 2.0.
Re: Cyrillic search is case sensitive
This is still an issue in version 2.0.15, and with Space Age, it is much more impactful. For example depending on which letter you start with, the Iron or Copper plate recipes will either show up as regular smelting or foundry smelting.
It is incredibly inconvenient, and it should be possible to use a different or custom function to turn letters lowercase that will fix search everywhere for a large portion of users who don't use Latin.
- xargo-sama
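For readers wondering what a "custom function to turn letters lowercase" could look like, below is a rough sketch under stated assumptions. It is not how the game implements search. The names `decode_utf8`, `simple_lower` and `contains_ignore_case` are hypothetical, the decoder assumes well-formed UTF-8 and does no validation, and the mapping only covers ASCII and the basic Cyrillic block (U+0400 to U+042F). Proper support for every script would need full Unicode case-folding data, for example from a library such as ICU.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into code points. Assumes well-formed input; no validation.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)             { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6) { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE) { cp = b & 0x0F; extra = 2; }
        else                      { cp = b & 0x07; extra = 3; }
        ++i;
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Hypothetical minimal mapping: ASCII and basic Cyrillic only.
char32_t simple_lower(char32_t c) {
    if (c >= U'A' && c <= U'Z')      return c + 0x20;  // A..Z -> a..z
    if (c >= 0x0410 && c <= 0x042F)  return c + 0x20;  // U+0410..U+042F -> U+0430..U+044F
    if (c >= 0x0400 && c <= 0x040F)  return c + 0x50;  // U+0400..U+040F -> U+0450..U+045F
    return c;
}

// Case-insensitive substring test on UTF-8 input.
bool contains_ignore_case(const std::string& haystack, const std::string& needle) {
    std::u32string h = decode_utf8(haystack);
    std::u32string n = decode_utf8(needle);
    for (auto& c : h) c = simple_lower(c);
    for (auto& c : n) c = simple_lower(c);
    return h.find(n) != std::u32string::npos;
}
```

Lowercasing both the item name and the query before comparing means the user no longer has to guess the case of the first letter. Greek or other scripts would simply fall through `simple_lower` unchanged in this sketch.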
Re: Cyrillic search is case sensitive
I didn't want to comment until I was 100% this was making it in, but I will be sharing some good news tomorrow regarding this.
Re: Cyrillic search is case sensitive
Not the OP, but finally. Thank you very much!
Are other languages like Greek also case insensitive after this change?