Cyrillic search is case sensitive

This subforum contains all the issues which we already resolved.
rokot108
Burner Inserter
Posts: 5
Joined: Fri Feb 03, 2017 3:23 pm

Post by rokot108 »

Hello!

I noticed that with the English alphabet, searching for an item in inventory menus is not case-sensitive. But in the Russian version, all searches for Russian-named items are case-sensitive, which is inconvenient every time I need to find an item in a menu.
Usually I just drop the first letter of the item's name, since it may be in either case.
Rseding91
Factorio Staff
Posts: 14798
Joined: Wed Jun 11, 2014 5:23 am

Post by Rseding91 »

Thanks for the report; however, I don't think this will change. We don't have any language-agnostic system for converting a character to lowercase, so non-English characters have this issue.
If you want to get ahold of me I'm almost always on Discord.
TruePikachu
Filter Inserter
Posts: 978
Joined: Sat Apr 09, 2016 8:39 pm

Post by TruePikachu »

Is `std::tolower` not sufficient for this when provided with a suitable locale?
Rseding91
Factorio Staff
Posts: 14798
Joined: Wed Jun 11, 2014 5:23 am

Post by Rseding91 »

TruePikachu wrote (Sun Dec 23, 2018 7:42 am): "Is `std::tolower` not sufficient for this when provided with a suitable locale?"
Nope. std::tolower only supports single-character values - no UTF support.
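
To illustrate the limitation, here is a minimal sketch, assuming the program runs in the default "C" locale and the text is stored as UTF-8 in a `std::string`. The names `lower_bytes` and `demo_lower_bytes` are hypothetical and not taken from the game's code. The single-character `std::tolower` sees each byte separately, so a two-byte Cyrillic letter comes back unchanged:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Lowercases a string one byte at a time with the single-character
// std::tolower from <cctype>. Hypothetical helper, for illustration only.
std::string lower_bytes(const std::string& text) {
    std::string out = text;
    for (char& c : out) {
        // Cast to unsigned char first: passing a negative char is undefined.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Call from main() to run the checks.
void demo_lower_bytes() {
    // ASCII letters are lowercased as expected.
    assert(lower_bytes("Iron Plate") == "iron plate");
    // U+0416 (Cyrillic capital ZHE) is the two bytes D0 96 in UTF-8.
    // Neither byte is an uppercase letter in the "C" locale, so nothing changes.
    const std::string zhe = "\xD0\x96";
    assert(lower_bytes(zhe) == zhe);
}
```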
Optera
Smart Inserter
Posts: 2920
Joined: Sat Jun 11, 2016 6:41 am

Post by Optera »

To get any multibyte support you have to use a multibyte-aware container; `std::string` is not multibyte-aware.

A saner solution would be to use a Unicode library for any string that can contain Unicode. For C++ I guess that'd be ICU: http://site.icu-project.org/
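
As a small illustration of what "not multibyte-aware" means in practice, the sketch below (assuming valid UTF-8 input; `count_code_points` is a hypothetical helper) shows that `std::string::size()` reports bytes, not characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Call from main() to run the checks.
void demo_count_code_points() {
    // Six Cyrillic letters (U+0416 U+0435 U+043B U+0435 U+0437 U+043E),
    // each encoded as two bytes in UTF-8.
    const std::string word = "\xD0\x96\xD0\xB5\xD0\xBB\xD0\xB5\xD0\xB7\xD0\xBE";
    assert(word.size() == 12);              // std::string counts bytes
    assert(count_code_points(word) == 6);   // the text has six code points
}
```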
TruePikachu
Filter Inserter
Posts: 978
Joined: Sat Apr 09, 2016 8:39 pm

Post by TruePikachu »

Modern C++ (and C, for that matter) does have native Unicode support.
`std::basic_string` and its derivatives aren't naturally aware of multibyte, variable-length characters, but you can avoid the problems of variable-length characters by using e.g. `std::u32string` ≡ `std::basic_string<char32_t>` for string storage. Alternatively, by using e.g. `std::codecvt<char32_t, char, std::mbstate_t>` as a conduit for converting UTF-8 to and from UCS-4, one can keep the memory savings of UTF-8 while still being able to resolve UCS-4 code points.
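
For illustration, the UTF-8 to UCS-4 step can be sketched without any library facet. This is a hand-written decoder, not `std::codecvt`, and `decode_utf8` is a hypothetical name. It is deliberately minimal: malformed input becomes U+FFFD, and overlong encodings and surrogates are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points (one char32_t per character).
std::u32string decode_utf8(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(char32_t{0xFFFD}); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= utf8.size()) { ok = false; break; }  // truncated sequence
            const unsigned char next = static_cast<unsigned char>(utf8[i]);
            if ((next & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (next & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t{0xFFFD});
    }
    return out;
}

// Call from main() to run the checks.
void demo_decode_utf8() {
    // EF BC A6 is U+FF26 (fullwidth F), E2 9A 99 is U+2699 (gear).
    const std::u32string decoded = decode_utf8("\xEF\xBC\xA6\xE2\x9A\x99");
    assert(decoded.size() == 2);
    assert(decoded[0] == 0xFF26);
    assert(decoded[1] == 0x2699);
}
```

Once the text is a `std::u32string`, each element is a whole code point, so per-character operations such as case mapping no longer have to deal with variable-length sequences.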

EDIT: Just did some testing, Windows doesn't appear to like doing locale-based case conversions in UCS-4, but everything is fine when using `wchar_t` as the intermediate (which is UCS-2 under Windows, and likely sufficient):

Code:

#include <iomanip>
#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    locale::global(locale("en_US.utf8"));
    // UTF-8 encoded string (u8 literals are char8_t from C++20 on, so this line needs C++17 or earlier)
    string data = u8"\uff26\uff21\uff23\uff34\uff2f\uff32\uff29\u2699";
    cout << "UTF-8:";
    for(auto c : data) {
        cout << " 0x" << uppercase << hex << setw(2) << setfill('0')
            << static_cast<int>(static_cast<unsigned char>(c));
    }
    cout << endl;
    // Conversion to wide string, not using functionality deprecated in C++17
    auto& facet = use_facet<codecvt<wchar_t,char,mbstate_t>>(locale());
    wstring wide(data.size(),L'\0');
    mbstate_t state = {};
    const char* d_next;
    wchar_t* w_next;
    facet.in(state,
            &data[0], &data[data.size()], d_next,
            &wide[0], &wide[wide.size()], w_next);
    wide.resize(w_next - &wide[0]);
    cout << "Wide: ";
    for(auto c : wide)
        cout << " 0x" << uppercase << hex << setw(4) << setfill('0')
            << static_cast<unsigned long>(c);  // print the code unit as a number
    cout << endl;
    cout << "Lower:";
    for(auto c : wide)
        cout << " 0x" << uppercase << hex << setw(4) << setfill('0')
            << static_cast<unsigned long>(tolower(c,locale()));
    cout << endl;
    return 0;
}

Output:

UTF-8: 0xEF 0xBC 0xA6 0xEF 0xBC 0xA1 0xEF 0xBC 0xA3 0xEF 0xBC 0xB4 0xEF 0xBC 0xAF 0xEF 0xBC 0xB2 0xEF 0xBC 0xA9 0xE2 0x9A 0x99
Wide:  0xFF26 0xFF21 0xFF23 0xFF34 0xFF2F 0xFF32 0xFF29 0x2699
Lower: 0xFF46 0xFF41 0xFF43 0xFF54 0xFF4F 0xFF52 0xFF49 0x2699
("FACTORI⚙" if anyone's curious)
Optera
Smart Inserter
Posts: 2920
Joined: Sat Jun 11, 2016 6:41 am

Post by Optera »

That's interesting, I guess my C is a bit rusty.
Hares
Filter Inserter
Posts: 620
Joined: Sat Oct 22, 2022 8:05 pm

Post by Hares »

This bug hurts me a lot. I end up searching for science packs and other items/settings as "cience pack", "ogistics requests", "oboport", etc., since sometimes the main word is not the first one (e.g., the Russian name for the SE rocket science pack is "Science pack of rocketry", while the others are "... science pack"). I hope this will be fixed in 2.0.
Osmo
Burner Inserter
Posts: 7
Joined: Wed Oct 23, 2024 12:08 pm

Post by Osmo »

This is still an issue in version 2.0.15, and with Space Age it is much more impactful. For example, depending on the case of the letter you start with, the Iron or Copper plate recipes will show up as either regular smelting or foundry smelting.
[Two screenshots attached: изображение.png (106.21 KiB) and изображение.png (94.52 KiB)]
It is incredibly inconvenient, and it should be possible to use a different or custom lowercasing function, which would fix search everywhere for the large portion of users who don't use the Latin alphabet.
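
As a rough idea of what such a custom function could look like, here is a minimal sketch. It assumes the text has already been decoded to UTF-32, and it covers only ASCII and the Cyrillic capitals U+0400 to U+042F. `simple_lower` and `contains_ignore_case` are hypothetical names; this is not how the game implements search.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Lowercases one code point. Only ASCII A-Z and the Cyrillic capitals
// U+0400..U+042F are handled; everything else is returned unchanged.
char32_t simple_lower(char32_t cp) {
    if (cp >= U'A' && cp <= U'Z') return static_cast<char32_t>(cp + 0x20);
    // U+0410..U+042F map to U+0430..U+044F (the basic Russian alphabet).
    if (cp >= 0x0410 && cp <= 0x042F) return static_cast<char32_t>(cp + 0x20);
    // U+0400..U+040F map to U+0450..U+045F (includes U+0401 -> U+0451, IO).
    if (cp >= 0x0400 && cp <= 0x040F) return static_cast<char32_t>(cp + 0x50);
    return cp;
}

// Case-insensitive substring test on decoded text.
bool contains_ignore_case(const std::u32string& haystack,
                          const std::u32string& needle) {
    if (needle.empty()) return true;
    auto same = [](char32_t a, char32_t b) {
        return simple_lower(a) == simple_lower(b);
    };
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(), same) != haystack.end();
}

// Call from main() to run the checks.
void demo_contains_ignore_case() {
    // The Russian word for "iron", capitalised: U+0416 U+0435 U+043B U+0435 U+0437 U+043E.
    const std::u32string name = U"\u0416\u0435\u043B\u0435\u0437\u043E";
    assert(contains_ignore_case(name, U"\u0436\u0435\u043B"));   // lowercase query
    assert(contains_ignore_case(name, U"\u0416\u0415\u041B"));   // uppercase query
    assert(!contains_ignore_case(name, U"\u043C\u0435\u0434"));  // unrelated word
}
```

A fixed table like this is enough for Russian, but other scripts need their own ranges, which is why a full solution would normally rely on Unicode case-mapping data rather than hand-written offsets.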
xargo-sama
Long Handed Inserter
Posts: 57
Joined: Mon Jun 05, 2023 1:04 pm

Post by xargo-sama »

I didn't want to comment until I was 100% sure this was making it in, but I will be sharing some good news tomorrow regarding this.
DeltaKilo
Burner Inserter
Posts: 14
Joined: Wed May 02, 2018 9:18 pm

Post by DeltaKilo »

Not the OP, but finally. Thank you very much!
Are other languages like Greek also case-insensitive after this change?