Solved: Regular expression help

Place to get help with not working mods / modding interface.
Pi-C
Smart Inserter
Posts: 1742
Joined: Sun Oct 14, 2018 8:13 am


Post by Pi-C »

I need some help with regular expressions. I want to get the base name of an entity that "AAI Programmable Vehicles" has appended a suffix to. These suffixes can differ, like this:

Code:

item-_-ghost
item-0-_-driver
I want to get "item" in both cases. This works, but I have to use a different regex for each name pattern:

Code:

item_name = string.match("item-_-ghost", "^(.+)%-_%-.+$")
log(tostring(item_name))

item_name = string.match("item-0-_-driver", "^(.+)%-[0-9]+%-_%-.+$")
log(tostring(item_name))
I tried to cover both cases with one regex, but this just returns nil:

Code:

item_name = string.match("item-_-ghost", "^(.+)(%-[0-9]+)*%-_%-.+$")
log(tostring(item_name))

item_name = string.match("item-0-_-driver", "^(.+)(%-[0-9]+)*%-_%-.+$")
log(tostring(item_name))
How could I get this to work?
Last edited by Pi-C on Fri May 01, 2020 7:37 pm, edited 1 time in total.
A good mod deserves a good changelog. Here's a tutorial (WIP) about Factorio's way too strict changelog syntax!
posila
Factorio Staff
Posts: 5409
Joined: Thu Jun 11, 2015 1:35 pm

Re: Regular expression help

Post by posila »

Split from the gameplay questions thread.
asdff45
Long Handed Inserter
Posts: 65
Joined: Sun Aug 07, 2016 1:23 pm

Re: Regular expression help

Post by asdff45 »

If you want to ignore everything after the first `-`, then `^[^-]+` is a way to do it.

Code:

item_name = string.match("item-_-ghost", "^[^-]+")
log(tostring(item_name))

item_name = string.match("item-0-_-driver", "^[^-]+")
log(tostring(item_name))
This will print out `item` in both cases.

To clarify how it works:
`^` anchors the match to the beginning of the string.
`[^-]` matches any character that is not a `-`.
`+` means the previous token must occur one or more times.

When it reaches a `-`, `[^-]` no longer matches, so the repetition stops there. Since that is the end of the pattern, it just returns `item`.
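A quick way to check this outside the game is plain Lua, with `print` standing in for Factorio's `log`. The third call uses a dashed name from the logs posted in this thread and shows the flip side: since matching stops at the first `-`, the pattern only works when the base name itself contains no dash. `first_segment` is just an illustrative helper name.

```lua
-- `^` anchors at the start, `[^-]` is any character except `-`,
-- and `+` repeats it as often as possible (at least once).
local function first_segment(name)
  return string.match(name, "^[^-]+")
end

print(first_segment("item-_-ghost"))          -- item
print(first_segment("item-0-_-driver"))       -- item
-- Matching stops at the first dash, so dashed base names are cut short:
print(first_segment("hover-car-0-_-driver"))  -- hover
```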

Here is a really helpful site for developing a regex: https://regexr.com/

Re: Regular expression help

Post by Pi-C »

asdff45 wrote: Fri Apr 24, 2020 10:16 pm If you want to ignore everything after the first `-`, then `^[^-]+` is a way to do it.

Code:

item_name = string.match("item-_-ghost", "^[^-]+")
log(tostring(item_name))

item_name = string.match("item-0-_-driver", "^[^-]+")
log(tostring(item_name))
This will print out `item` in both cases.
Sorry, that wouldn't work! However, it's my fault because I didn't make it clear that "item" was meant to represent a string that could include '-'. It was an over-simplification on my part. Anyway, I've tested my mod with different vehicle mods; here are some of the item names I want to get:

Code:

better-cargo-plane
crawler
dirigible-blimp
Dodge-Challenger
Hauling-Truck
hcraft-entity
I've even found a case that is worse because the name actually contains a number:

Code:

raven-1
When it reaches a `-`, `[^-]` no longer matches, so the repetition stops there. Since that is the end of the pattern, it just returns `item`.
Sorry, but for the names from the list above, that would only work for "crawler" (for the other items, part of the name I want to get would be cut off). AAI always appends "-_-STRING" to the names, and in some cases there is also "-[0-9]+" before that, which is why I included it in my regex.

I already have a fall-back, so if the regex returns an item name that doesn't match any item it's not really a problem. But of course I want the regex to be general enough that it finds a correct result in most cases. :-D

An explanation of what I wanted to do:

Code:

"^(.+)(%-[0-9]+)*%-_%-.+$"

-- Use everything from the start of the string
^(.+)

-- Look for one or more digits preceded by '-'. This part may be missing (indicated by '*')
(%-[0-9]+)*

-- Look for AAI's separator
%-_%-

-- Match everything from that position to the end of the string because AAI uses different suffixes
.+$
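Why the combined pattern returns nil: in Lua patterns a quantifier (`*`, `+`, `-`, `?`) can only follow a single character class, never a capture group. After the closing `)`, the `*` is therefore matched as a literal asterisk. The plain-Lua check below demonstrates this; the name containing an asterisk is made up purely for illustration.

```lua
-- The combined pattern from above.
local pattern = "^(.+)(%-[0-9]+)*%-_%-.+$"

-- Neither name contains a literal "*", so both calls return nil.
print(string.match("item-_-ghost", pattern))     -- nil
print(string.match("item-0-_-driver", pattern))  -- nil

-- A made-up name with a literal "*" after the number does match,
-- showing that the "*" is not acting as a quantifier here.
local base, num = string.match("item-0*-_-driver", pattern)
print(base, num)  -- prints item and -0
```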

Here is a really helpful site for developing a regex: https://regexr.com/
Thanks, I'll check that out!
Tynach
Inserter
Posts: 31
Joined: Sun Aug 21, 2016 9:22 pm

Re: Regular expression help

Post by Tynach »

Your examples include the names you want to pull, but not the sorts of things that those names might be embedded in. Will the parts you want to cut off at the end always include either an underscore or a number, and always be preceded by a dash?

Re: Regular expression help

Post by Pi-C »

Tynach wrote: Sun Apr 26, 2020 12:12 am Your examples include the names you want to pull, but not the sorts of things that those names might be embedded in. Will the parts you want to cut off at the end always include either an underscore or a number, and always be preceded by a dash?
These are actual names I've seen in the logs:

Code:

hover-car-0-_-driver
Dodge-Challenger-0-_-driver
Hauling-Truck-0-_-drive
hcraft-entity-0-_-driver
car-vehicle-machine-gun-_-driver
tank-tank-cannon-_-driver
So all of them contain "-_-STRING", and some of them may also contain "-NUMBER". Looking at AAI's control file, STRING may be any of these:

Code:

"solid"
"ghost"
"navigator"
"driver"
"buffer"
So far, "0" is the only value for NUMBER I've seen, but other numbers may also occur. That's why I've used "[0-9]+" in the regex.

Re: Regular expression help

Post by asdff45 »

So, `.+?(?=-_)` would match everything up to `-_`, so it would return

Code:

hover-car-0-_-driver			=> hover-car-0
Dodge-Challenger-0-_-driver		=> Dodge-Challenger-0
Hauling-Truck-0-_-drive			=> Hauling-Truck-0
hcraft-entity-0-_-driver		=> hcraft-entity-0
car-vehicle-machine-gun-_-driver	=> car-vehicle-machine-gun
tank-tank-cannon-_-driver		=> tank-tank-cannon
And `^.+?(?=-[0-9]?[-_])` would match everything up to either `-_` or a dash followed by a digit and another dash (such as `-0-`), so it would return

Code:

hover-car-0-_-driver			=> hover-car
Dodge-Challenger-0-_-driver		=> Dodge-Challenger
Hauling-Truck-0-_-drive			=> Hauling-Truck
hcraft-entity-0-_-driver		=> hcraft-entity
car-vehicle-machine-gun-_-driver	=> car-vehicle-machine-gun
tank-tank-cannon-_-driver		=> tank-tank-cannon
Oktokolo
Filter Inserter
Posts: 884
Joined: Wed Jul 12, 2017 5:45 pm

Re: Regular expression help

Post by Oktokolo »

Pi-C wrote: Sun Apr 26, 2020 12:43 am These are actual names I've seen in the logs:

Code:

hover-car-0-_-driver
Dodge-Challenger-0-_-driver
Hauling-Truck-0-_-drive
hcraft-entity-0-_-driver
car-vehicle-machine-gun-_-driver
tank-tank-cannon-_-driver
So all of them contain "-_-STRING", and some of them may also contain "-NUMBER". Looking at AAI's control file, STRING may be any of these:

Code:

solid|ghost|navigator|driver|buffer
So far, "0" is the only value for NUMBER I've seen, but other numbers may also occur. That's why I've used "[0-9]+" in the regex.
The following should match the desired item names whether they contain an unsigned int or not:
prefix, n, type = string.match(name, "^(.-)(%-[0-9]+)?%-_%-([a-z]+)$")
You should check type against the list of allowed type strings. Lua decided to do yet another homebrew regex dialect, more limited than already existing dialects, and therefore does not support alternation like (solid|ghost|navigator|driver|buffer).
If one or more integers precede the dash-underscore-dash separator, the last of them is captured as n.
You should then check whether an item named prefix or prefix .. n exists.
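The two follow-up checks can be sketched in plain Lua, assuming the pattern has already produced the three captures. The suffix is called `suffix` here instead of `type`, so Lua's built-in `type` function is not shadowed. The suffix list is the one from AAI's control file quoted above, and `is_known` is a hypothetical callback standing in for however the mod checks whether an item with a given name exists.

```lua
-- Suffixes AAI appends after "-_-" (list from this thread; may be incomplete).
local ALLOWED_SUFFIXES = {
  solid = true, ghost = true, navigator = true, driver = true, buffer = true,
}

-- Pick the item name from the captured parts.
-- n is nil or "" when there is no number; otherwise it includes its dash ("-1").
-- is_known(name) is a hypothetical predicate: true if such an item exists.
local function resolve_item(prefix, n, suffix, is_known)
  if prefix == nil or not ALLOWED_SUFFIXES[suffix] then
    return nil  -- no AAI separator, or an unexpected suffix
  end
  -- Prefer prefix .. n, so that names like "raven-1" keep their number.
  if n and n ~= "" and is_known(prefix .. n) then
    return prefix .. n
  end
  if is_known(prefix) then
    return prefix
  end
  return nil
end

-- Hypothetical item list for the usage example.
local items = { ["raven-1"] = true, ["hover-car"] = true }
local function is_known(name) return items[name] == true end

print(resolve_item("raven", "-1", "driver", is_known))        -- raven-1
print(resolve_item("hover-car", "-0", "driver", is_known))    -- hover-car
print(resolve_item("hover-car", "-0", "teleport", is_known))  -- nil
```

If a hard-coded suffix list is undesirable, the `ALLOWED_SUFFIXES` check can be dropped without affecting the rest of the function.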

I did not actually test this, so it might match shit, fail with an error, or create a singularity destroying the known universe. :twisted:

Re: Regular expression help

Post by asdff45 »

I just noticed that Lua has no full regex implementation, and neither my patterns nor Oktokolo's work.

So I created one in the Lua playground: `^(.+)%-_%-.+`. It returns the same results as the first one in my last post:

Code:

prefix = string.match(name, "^(.+)%-_%-.+")

hover-car-0-_-driver			=> hover-car-0
Dodge-Challenger-0-_-driver		=> Dodge-Challenger-0
Hauling-Truck-0-_-drive			=> Hauling-Truck-0
hcraft-entity-0-_-driver		=> hcraft-entity-0
car-vehicle-machine-gun-_-driver	=> car-vehicle-machine-gun
tank-tank-cannon-_-driver		=> tank-tank-cannon
Stripping the number as well in a single call seems to be impossible.
As soon as I add the number to the pattern, it either breaks or nothing changes :(

Re: Regular expression help

Post by Pi-C »

Oktokolo wrote: Sun Apr 26, 2020 8:54 pm The following should match the desired item names whether they contain an unsigned int or not:
prefix, n, type = string.match(name, "^(.-)(%-[0-9]+)?%-_%-([a-z]+)$")
Typo? Guess it should be "^(.+)" instead of "^(.-)". :-)
You should check type against the list of allowed type strings. Lua decided to do yet another homebrew regex dialect, more limited than already existing dialects, and therefore does not support alternation like (solid|ghost|navigator|driver|buffer).
I don't think that such a hard-coded list is a good idea! I may well have missed some suffixes, and Earendel could also add new ones or change existing ones in the future. So I thought the more general "%-(.+)$" would work better.
I did not actually test this, so it might match shit, fail with an error, or create a singularity destroying the known universe. :twisted:
I got that to work in my editor (also with sed, after removing the percent signs and adding escape characters for bash). However, it doesn't work in the game. As asdff45 pointed out, that's because of Lua's limited RegEx implementation, so that really isn't your fault. :-)

Anyway, thanks for your help!

Re: Regular expression help

Post by Pi-C »

asdff45 wrote: Sun Apr 26, 2020 11:12 pm I just noticed that Lua has no full regex implementation, and neither my patterns nor Oktokolo's work.
Yep, I've got the same result. :-(
So I created one in the Lua playground: `^(.+)%-_%-.+`. It returns the same results as the first one in my last post:
-- snip --
Stripping the number as well in a single call seems to be impossible.
As soon as I add the number to the pattern, it either breaks or nothing changes :(
That's exactly what I've experienced! I was sure it must work, but I still got the wrong results. In the end, I worked around it with this definition:

Code:

item_name = string.match(state.car.name, "(.+)%-[0-9]+%-_%-.+$") or
                         string.match(state.car.name, "^(.+)%-_%-.+$")

For the first match, I used "(.+)%-[0-9]+" so that numbers that actually belong to the entity name (like "raven-1" mentioned above) would be preserved: since "(.+)" is greedy, only the last number before AAI's separator is removed. Needless to say, a one-catches-everything expression would have been so much nicer, and it bothered me quite a lot that I couldn't get it to work. (I always thought I knew a thing or two about regexes -- of course, getting the syntax right in different environments like sed, grep, awk etc., possibly having to escape special characters in a shell, is still something else!)
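The same two-step fallback as a standalone plain-Lua function; `base_name` is a hypothetical name and its `name` parameter stands in for `state.car.name`. One caveat worth knowing: if an entity name ends in a number and AAI adds no number of its own (the made-up `raven-1-_-driver` below), the first pattern still matches and strips the `-1`. Whether AAI ever produces such names is not established in this thread.

```lua
-- Try "<base>-<number>-_-<suffix>" first, then fall back to "<base>-_-<suffix>".
local function base_name(name)
  return string.match(name, "(.+)%-[0-9]+%-_%-.+$")
      or string.match(name, "^(.+)%-_%-.+$")
end

print(base_name("hover-car-0-_-driver"))              -- hover-car
print(base_name("car-vehicle-machine-gun-_-driver"))  -- car-vehicle-machine-gun
print(base_name("raven-1-0-_-driver"))                -- raven-1 (made-up name)
print(base_name("raven-1-_-driver"))                  -- raven (caveat: "-1" is lost)
print(base_name("crawler"))                           -- nil (no AAI separator)
```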

So, thanks a lot for taking the time to test things -- and for confirming that the problem is indeed caused by the limitations of Lua! :-D

Re: Regular expression help

Post by Oktokolo »

Pi-C wrote: Mon Apr 27, 2020 12:11 pm
Oktokolo wrote: Sun Apr 26, 2020 8:54 pm The following should match the desired item names whether they contain an unsigned int or not:
prefix, n, type = string.match(name, "^(.-)(%-[0-9]+)?%-_%-([a-z]+)$")
Typo? Guess it should be "^(.+)" instead of "^(.-)". :-)
No, that should be the non-greedy quantifier: in Lua patterns, `-` matches zero or more characters, but as few as possible (the manual confirms support for this). If the first group is greedy, there is never anything left for the optional second group to match.
But Lua doesn't support quantifiers after a closing parenthesis, so I have to move the question mark inside the second group:
prefix, n, type = string.match(name, "^(.-)(%-?[0-9]*)%-_%-([a-z]+)$")
This pattern is slightly less strict (the dash and the digits are now optional independently of each other), but it still matches the number separately.
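A plain-Lua check of the corrected pattern against names from the logs, next to the same pattern with a greedy `(.+)`. Because `%-?` and `[0-9]*` may both match nothing, a greedy first capture leaves them nothing to consume; the lazy `(.-)` stops as early as possible, so the number ends up in the second capture. The third capture is called `suffix` rather than `type` to avoid shadowing Lua's built-in function, and `item7-_-ghost` is a made-up name illustrating the looser matching.

```lua
local lazy   = "^(.-)(%-?[0-9]*)%-_%-([a-z]+)$"
local greedy = "^(.+)(%-?[0-9]*)%-_%-([a-z]+)$"

local names = {
  "hover-car-0-_-driver",
  "Dodge-Challenger-0-_-driver",
  "car-vehicle-machine-gun-_-driver",
  "tank-tank-cannon-_-driver",
}

-- Lazy prefix: the "-0" (when present) lands in the second capture.
for _, name in ipairs(names) do
  local prefix, n, suffix = string.match(name, lazy)
  print(name, "=>", prefix, n, suffix)
end

-- Greedy prefix: the number stays in the prefix and n is empty.
print(string.match("hover-car-0-_-driver", greedy))  -- hover-car-0, "", driver

-- Less strict: digits right before the separator are split off even without a dash.
print(string.match("item7-_-ghost", lazy))           -- item, 7, ghost
```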
LUA Playground

Re: Regular expression help

Post by Pi-C »

Sorry, I must have missed this post!
Oktokolo wrote: Mon Apr 27, 2020 2:40 pm
Pi-C wrote: Mon Apr 27, 2020 12:11 pm
Oktokolo wrote: Sun Apr 26, 2020 8:54 pm The following should match the desired item names whether they contain an unsigned int or not:
prefix, n, type = string.match(name, "^(.-)(%-[0-9]+)?%-_%-([a-z]+)$")
Typo? Guess it should be "^(.+)" instead of "^(.-)". :-)
No, that should be the non-greedy quantifier: in Lua patterns, `-` matches zero or more characters, but as few as possible (the manual confirms support for this). If the first group is greedy, there is never anything left for the optional second group to match.
Ah, yes. That's useful! :-)
But Lua doesn't support quantifiers after a closing parenthesis, so I have to move the question mark inside the second group:
prefix, n, type = string.match(name, "^(.-)(%-?[0-9]*)%-_%-([a-z]+)$")
This pattern is slightly less strict (the dash and the digits are now optional independently of each other), but it still matches the number separately.
Great, this is just what I needed! Thanks a lot for your help. :-)