Cyrillic search is case sensitive

rokot108
Burner Inserter

Post by rokot108 »

Hello!

I noticed that with the English alphabet, searching for an item in inventory menus is not case-sensitive. But in the Russian version, every search for a Russian-named item is case-sensitive, which is inconvenient every time I need to find an item in a menu.
As a workaround, I usually just drop the first letter of the item's name, since it could be in either case.

Rseding91
Factorio Staff

Post by Rseding91 »

Thanks for the report; however, I don't think this will change. We don't have any language-agnostic system for converting a character to lowercase, so non-English characters have this issue.
If you want to get ahold of me I'm almost always on Discord.

TruePikachu
Filter Inserter


Post by TruePikachu »

Is `std::tolower` not sufficient for this when provided with a suitable locale?

Post by Rseding91 »

TruePikachu wrote:
Sun Dec 23, 2018 7:42 am
Is `std::tolower` not sufficient for this when provided with a suitable locale?

Nope. `std::tolower` only works on single-character values, so there is no UTF support.
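
For illustration, here is a minimal sketch of the limitation, assuming item names are stored as UTF-8 in `std::string` and the program runs in the default "C" locale. `lower_bytes` is a hypothetical helper written for this example, not Factorio code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercase a string one byte at a time, the way an
// ASCII-only search would.
std::string lower_bytes(std::string s) {
    for (char& c : s) {
        // std::tolower sees a single byte. In the "C" locale only 'A'..'Z'
        // are mapped; the lead and continuation bytes of a UTF-8 sequence
        // are returned unchanged.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With this helper, `lower_bytes("Iron")` gives `"iron"`, while the two bytes `0xD0 0xA0` (the UTF-8 encoding of Cyrillic "Р") come back untouched, so a lowercase query never matches the capitalized name.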

Optera
Smart Inserter


Post by Optera »

To get any multibyte support you have to use a multibyte-aware container; `std::string` is not multibyte aware.

A saner solution would be to use a Unicode library for any string that can contain Unicode. For C++, I guess that'd be ICU: http://site.icu-project.org/
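
A small sketch of what "not multibyte aware" means in practice, assuming the string holds well-formed UTF-8. `count_code_points` is a hypothetical helper introduced only for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
// Assumes well-formed UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the four-letter word "Руда" encoded as UTF-8, `std::string::size()` reports 8 because it counts bytes, while `count_code_points` reports 4. Any per-`char` operation on such a string therefore sees half a letter at a time.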

Post by TruePikachu »

Modern C++ (and C, for that matter) does have native Unicode support.
`std::basic_string` and its derivatives aren't inherently aware of variable-length multibyte characters, but there are two ways around that. Using e.g. `std::u32string` ≡ `std::basic_string<char32_t>` for string storage avoids the problems of variable-length characters entirely. Alternatively, using e.g. `std::codecvt<char32_t, char, std::mbstate_t>` as a conduit for converting UTF-8 to and from UCS-4 keeps the memory savings of UTF-8 while still letting you resolve UCS-4 code points.
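
As a rough sketch of the first option, the following decodes UTF-8 into a `std::u32string` by hand rather than through `std::codecvt`. `decode_utf8` is a hypothetical helper, and the decoder is deliberately minimal: it replaces truncated or misplaced bytes with U+FFFD but does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 into one char32_t per code point.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        bool ok = true;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { ok = false; }  // stray or invalid byte
        if (ok && i + extra >= in.size()) {
            ok = false;  // sequence is cut off by the end of the string
        }
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) {
            out.push_back(cp);
            i += extra + 1;
        } else {
            out.push_back(static_cast<char32_t>(0xFFFD));
            ++i;  // resynchronize on the next byte
        }
    }
    return out;
}
```

Once the text is in this form, every element is a whole code point, so per-character operations such as case mapping no longer have to deal with partial byte sequences.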

EDIT: Just did some testing. Windows doesn't appear to like doing locale-based case conversions in UCS-4, but everything is fine when using `wchar_t` as the intermediate (which is UCS-2 under Windows, and likely sufficient):

#include <cstdint>
#include <cwchar>
#include <iomanip>
#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    // The locale name is platform-specific; adjust it for your system.
    locale::global(locale("en_US.utf8"));
    // UTF-8 encoded string (u8 literals are char-based before C++20)
    string data = u8"\uff26\uff21\uff23\uff34\uff2f\uff32\uff29\u2699";
    cout << "UTF-8:";
    for(auto c : data) {
        cout << " 0x" << uppercase << hex << setw(2) << setfill('0')
            << static_cast<int>(static_cast<uint8_t>(c));
    }
    cout << endl;
    // Conversion to wide string, not using functionality deprecated in C++17
    auto& facet = use_facet<codecvt<wchar_t,char,mbstate_t>>(locale());
    wstring wide(data.size(), L'\0');
    mbstate_t state = {};
    const char* d_next;
    wchar_t* w_next;
    facet.in(state,
            data.data(), data.data() + data.size(), d_next,
            &wide[0], &wide[0] + wide.size(), w_next);
    // Shrink to the number of wide characters actually written
    wide.resize(w_next - &wide[0]);
    cout << "Wide: ";
    // Cast so the numeric value is printed rather than a character
    for(auto c : wide)
        cout << " 0x" << uppercase << hex << setw(4) << setfill('0')
            << static_cast<unsigned long>(c);
    cout << endl;
    cout << "Lower:";
    for(auto c : wide)
        cout << " 0x" << uppercase << hex << setw(4) << setfill('0')
            << static_cast<unsigned long>(tolower(c, locale()));
    cout << endl;
    return 0;
}

Output:

UTF-8: 0xEF 0xBC 0xA6 0xEF 0xBC 0xA1 0xEF 0xBC 0xA3 0xEF 0xBC 0xB4 0xEF 0xBC 0xAF 0xEF 0xBC 0xB2 0xEF 0xBC 0xA9 0xE2 0x9A 0x99
Wide:  0xFF26 0xFF21 0xFF23 0xFF34 0xFF2F 0xFF32 0xFF29 0x2699
Lower: 0xFF46 0xFF41 0xFF43 0xFF54 0xFF4F 0xFF52 0xFF49 0x2699
("FACTORI⚙" if anyone's curious)
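
To connect this back to the search problem, here is a sketch of a case-insensitive substring test built on the same locale-aware `std::tolower` overload from `<locale>`. `contains_ci` is a hypothetical helper, not anything from the game. It assumes both strings have already been converted to `std::wstring` as in the program above, and that the supplied locale knows the case mappings for the characters involved.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: true if `needle` occurs in `haystack` when both are
// compared character by character after lowercasing with `loc`.
bool contains_ci(const std::wstring& haystack, const std::wstring& needle,
                 const std::locale& loc) {
    if (needle.empty()) {
        return true;  // an empty query matches everything
    }
    auto same = [&loc](wchar_t a, wchar_t b) {
        return std::tolower(a, loc) == std::tolower(b, loc);
    };
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(), same) != haystack.end();
}
```

With `std::locale::classic()` only ASCII letters are folded, so `contains_ci(L"Iron Plate", L"PLATE", std::locale::classic())` is true. Whether Cyrillic names match depends on the locale passed in. Note also that a one-to-one per-character mapping cannot handle case conversions that change the length of the text, which is one reason a full Unicode library may still be preferable.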

Post by Optera »

That's interesting, I guess my C is a bit rusty.
