Five-in-a-row game, with deep-learning AI

This board is to show, discuss and archive useful combinator- and logic-creations.
Smart triggering, counters and sensors, useful circuitry, switching as an art :), computers.
Please provide if possible always a blueprint of your creation.
Post Reply
deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Five-in-a-row game, with deep-learning AI

Post by deep_remi »

Hi,

I made a game of five-in-a-row (gomoku narabe) with combinators. It includes a small convolutional neural network for the AI Opponent.

Youtube demo: https://www.youtube.com/watch?v=-Ng8E_s6Xgo

Image
Image

blueprint: https://www.kayufu.com/factorio/gomoku_blueprint.txt
savegame: https://www.kayufu.com/factorio/gomoku.zip

This blueprint is 5Mb in size, and pushes the limits of Factorio a little bit. If you wish to paste it into the editor, then make sure to be in zoomed-out map mode. Otherwise Factorio will become unusable. When you paste it, Factorio will freeze for about one minute, but should then continue to run smoothly. Also, you'll have to explore a rather large area beforehand, to accommodate its large size.

The AI is strong enough to be fun and challenging for a beginner like me. It could be made much stronger with a larger network, but it would hurt FPS. On my machine, this blueprint runs a bit more than 50 FPS, but cannot reach 60 FPS. Optimization tips are welcome.

I thought I could also create a similar system for games such as Othello or Go. I chose gomoku, because the rules of the game are easy to wire into combinators. If anybody can build the rules of Othello or Go with combinators, then I'd be glad to provide a small convnet for AI.

Questions and feedback are welcome.

Rémi
Last edited by deep_remi on Thu Mar 24, 2022 9:25 pm, edited 1 time in total.

Qon
Smart Inserter
Smart Inserter
Posts: 2091
Joined: Thu Mar 17, 2016 6:27 am
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by Qon »

Welcome to the forums. As this is your first post, I'd say this is decent enough for a beginner. You get passing marks, but you really should beat AlphaGo at Go with your next project or I would consider you to have grown complacent.
Ok fine, it's slightly above expectations and I'm mildly impressed.

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

Qon wrote:
Thu Mar 24, 2022 8:29 pm
Welcome to the forums. As this is your first post, I'd say this is decent enough for a beginner. You get passing marks, but you really should beat AlphaGo at Go with your next project or I would consider you to have grown complacent.
Haha, I am working on it.

aka13
Filter Inserter
Filter Inserter
Posts: 671
Joined: Sun Sep 29, 2013 1:18 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by aka13 »

This is very cool!
Pony/Furfag avatar? Opinion discarded.

mmmPI
Smart Inserter
Smart Inserter
Posts: 2674
Joined: Mon Jun 20, 2016 6:10 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by mmmPI »

I was expecting more discussion about this !

I wouldn't mind reading the full story and design philosophy of why you made it this way. Or at least some more vulgarisation to bridge the gap of understanding between demonstration and explanation.

I'm especially interested about the way the AI understand the value of the position, on a theorical level i expected such machine to require training. I have intuitition that the rules of which move is allowed and which is forbidden is hardcoded. Implictly explained when you say you would do the convnet if anybody can build the rules of Othello or Go with combinators.

I'm curious how should such rules are to be written using combinators. Maybe it would require a quick explanation on what is the AI seeing ? such as the tile of the board must have coordinate, the AI knows the displayed position and uses the neural network to choose the best "next move" that respect the rules ?

Rules are then only describing what is allowed for different players ? Do they contain implicit objectives ? like what is "win" or is it part of the neural network ?

If you take the example of tic tac toe on a 3x3 grid the rules would be something like : "all coordinate are valid except those already used" ? so the first move for the AI would use the neural network to sort out the 8 different tile that can be played and choose one amongst them ? if the human player goes first.

The rules would be materialized as the bunch of combinators that provide the valid coordinate ?

Amarula
Filter Inserter
Filter Inserter
Posts: 509
Joined: Fri Apr 27, 2018 1:29 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by Amarula »

Very Cool!
Have you thought about writing up an article in a more depth for Alt-F4? I would love to read that...
My own personal Factorio super-power - running out of power.

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

mmmPI wrote:
Sat Apr 23, 2022 2:01 pm
I was expecting more discussion about this !

I wouldn't mind reading the full story and design philosophy of why you made it this way. Or at least some more vulgarisation to bridge the gap of understanding between demonstration and explanation.
Thanks for your interest. I am glad to give more details.
I'm especially interested about the way the AI understand the value of the position, on a theorical level i expected such machine to require training. I have intuitition that the rules of which move is allowed and which is forbidden is hardcoded. Implictly explained when you say you would do the convnet if anybody can build the rules of Othello or Go with combinators.
This neural is trained with an algorithm similar to DeepMind's AlphaZero:
https://www.deepmind.com/research/highl ... ch/alphago
That website is mainly PR, but if you wish to understand how it works, you can take a look at the academic papers at the bottom of the page.

In summary, the neural network learns to play from scratch, by playing a lot of games against itself. Learning takes place offline (not within Factorio). I wrote a piece of code that converts the resulting neural network file into a Factorio blueprint automatically.
I'm curious how should such rules are to be written using combinators. Maybe it would require a quick explanation on what is the AI seeing ? such as the tile of the board must have coordinate, the AI knows the displayed position and uses the neural network to choose the best "next move" that respect the rules ?

Rules are then only describing what is allowed for different players ? Do they contain implicit objectives ? like what is "win" or is it part of the neural network ?
The neural network takes the board position as input. This is what is visualized at the end of the Youtube video. There are three input channels: one for opponent pieces, one for my pieces, and one for empty squares. The output of the neural network indicates where to play the next move.

The game logic is wired by hand. It handles preventing play in an already occupied square, alternating side to move, choosing AI's side, and detecting the end of the game.

In order to make this work for another game, such as Othello, it would be necessary to wire the logic to flip the disks, detect legal moves, and play a pass when no other play is available. The neural network would be identical: it takes the 3 channels as input (opponent, myself, and empty), and produces an evaluation of all the moves.

Detecting the win and scoring the end of the game is not strictly necessary for the game to work. But it makes using the game more pleasant for the user.
If you take the example of tic tac toe on a 3x3 grid the rules would be something like : "all coordinate are valid except those already used" ? so the first move for the AI would use the neural network to sort out the 8 different tile that can be played and choose one amongst them ? if the human player goes first.

The rules would be materialized as the bunch of combinators that provide the valid coordinate ?
Yes. Tic tac toe is pretty much identical to gomoku narabe. The only rule is to not play in an already occupied square.

For other games, legal moves are more tricky to determine, and the board has to be updated in a more complex way after each move (flipping disks for Othello, or capturing stones in Go).

Programming the rules of Go with combinators looks like a very fun and difficult challenge.

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

Amarula wrote:
Sat Apr 23, 2022 2:12 pm
Very Cool!
Have you thought about writing up an article in a more depth for Alt-F4? I would love to read that...
Thanks for the suggestion. I did not know about Alt-F4. I might write something when I have time. But I am too busy right now. I thought about making a video with more detailed explanations of each part of the system. Interactively visualizing combinators in action might give a better understanding of how things work.

quyxkh
Smart Inserter
Smart Inserter
Posts: 1027
Joined: Sun May 08, 2016 9:01 am
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by quyxkh »

deep_remi wrote:
Sat Apr 23, 2022 5:45 pm
Programming the rules of Go with combinators looks like a very fun and difficult challenge.
Heh. I'd guess you could keep the game as a memory buffer of zobrist hashes for ko detection, do superko and forbid repeats ever? One array for the move history, a second array carrying four liberty-for-group signals per intersection, plus the two hash values for the moves, a third array of group registers, ... the board update mechanics don't seem unmanageable.

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

quyxkh wrote:
Sat Apr 23, 2022 6:23 pm
deep_remi wrote:
Sat Apr 23, 2022 5:45 pm
Programming the rules of Go with combinators looks like a very fun and difficult challenge.
Heh. I'd guess you could keep the game as a memory buffer of zobrist hashes for ko detection, do superko and forbid repeats ever? One array for the move history, a second array carrying four liberty-for-group signals per intersection, plus the two hash values for the moves, a third array of group registers, ... the board update mechanics don't seem unmanageable.
Well, if you manage to make a Go board that works, then I'll be happy to provide a small neural network for AI.

My system uses one different signal for each square of the board. I could not find more than 260 or so signals. So max square size is 15x15 or 16x16 (but with almost no signal left for auxilliary variables). If there are ways to get 361 signals, then 19x19 is possible.

Qon
Smart Inserter
Smart Inserter
Posts: 2091
Joined: Thu Mar 17, 2016 6:27 am
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by Qon »

deep_remi wrote:
Sat Apr 23, 2022 5:45 pm
mmmPI wrote:
Sat Apr 23, 2022 2:01 pm
I was expecting more discussion about this !

I wouldn't mind reading the full story and design philosophy of why you made it this way. Or at least some more vulgarisation to bridge the gap of understanding between demonstration and explanation.
Thanks for your interest. I am glad to give more details.
Thanks for writing more about this! Feel free to write in more detail about any detail :D
Is the entire circuit network + game generated as a single blueprint from your code? Did you use any libraries or tools to generate the blueprint? What language did you program in? How much time did all the parts take to make, at least a rough estimation? And more...
deep_remi wrote:
Sat Apr 23, 2022 7:03 pm
If there are ways to get 361 signals, then 19x19 is possible.
If you install a mod that adds signals or items, yes. You don't have to use the extra items of course, but the extra items become extra signals. There are mods that focus mainly on adding lots of virtual signals as well so you don't have to install that many, even 1 might be enough.

The mod https://mods.factorio.com/mod/speaker-signals-2 adds 220 new signals (the image shows 210 but 10 more were probably added after picture was made), all in similar style. It's 22 different symbols with 10 color and style variations each. If you need to write the names of each signal into your code then this should have an easy pattern so you can generate all of them from a simple loop. Though a lua command could be used to export all the signal names anyways...

The mod https://mods.factorio.com/mod/SchallVirtualSignal which adds 90 new signals signal for each number 10-99, 24 greek letters, 6 colors, 30 mathematical symbols, 30 arrows, 16 signs for communication (like warnings) and 152 more signals which looks similar to the previously listed but are larger. So I count to a total of 248 additional signals from that mod. The XL icons are probably large in alt-view as well, which might be distracting if you use that for debugging.

There's also a modpack for the big ones that add virtual signals https://mods.factorio.com/mod/snouz_all ... /downloads
Install it through in game mod portal to get all the "non optional" ones automatically and click the arrow in game on the optional ones to get them also easily if you think you can use them all 8-)

https://mods.factorio.com/query/signal?version=1.1

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

Qon wrote:
Sun Apr 24, 2022 7:32 am
Thanks for writing more about this! Feel free to write in more detail about any detail :D
Is the entire circuit network + game generated as a single blueprint from your code? Did you use any libraries or tools to generate the blueprint? What language did you program in? How much time did all the parts take to make, at least a rough estimation? And more...
The code generates separate blueprints for different parts of the system:
  • the board,
  • the convolutional neural network,
  • the debug boards (small boards located to the right of the network),
  • the argmax function to select the best move,
  • and the table of constants used to detect 5-in-a-row lines.
The rest of the game logic, and the glue between these parts was made by hand.

I code in C++. I used this library for json: https://github.com/nlohmann/json

It took me about one week of work to develop this factorio code. But I had thought about it for a long time before, and I started with clear ideas of how to make everything work.

This does not include the time to develop the neural-network training code. I develop board game AI as a job (https://www.kayufu.com/), and have been developing my system for years.

Thanks for the mod suggestion.

Qon
Smart Inserter
Smart Inserter
Posts: 2091
Joined: Thu Mar 17, 2016 6:27 am
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by Qon »

Thanks for the info!

Have you considered chess as well? A lot more rules to take into account, but it's the game I'm more familiar with. Also the different pieces are perhaps much more difficult to visualize so that might be the bigger issue. But Recursive Blueprints mod could be used to display detailed graphics with tiles. With 2-4 types of tiles you could just paste over and only the differing tiles would have to be replaced by bots, and not many tiles change each move so might actually be quite quick for bots to update the board if you also research (with lua command) bot speed 50 or something.

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

Qon wrote:
Sun Apr 24, 2022 12:48 pm
Thanks for the info!

Have you considered chess as well? A lot more rules to take into account, but it's the game I'm more familiar with. Also the different pieces are perhaps much more difficult to visualize so that might be the bigger issue. But Recursive Blueprints mod could be used to display detailed graphics with tiles. With 2-4 types of tiles you could just paste over and only the differing tiles would have to be replaced by bots, and not many tiles change each move so might actually be quite quick for bots to update the board if you also research (with lua command) bot speed 50 or something.
A chess user interface in Factorio is doable, but is going to be really very difficult to make. Even more difficult than Go. Graphics is not the most difficult part. The most difficult part is the complexity of the rules. A move is not just pointing at a square, but moving from one square to another. And you also have to handle special moves such as castling, promotions, etc.

Again, if anobody manages to implement the rules of chess in Factorio, then I can provide a small convolutional neural network for AI.

mmmPI
Smart Inserter
Smart Inserter
Posts: 2674
Joined: Mon Jun 20, 2016 6:10 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by mmmPI »

deep_remi wrote:
Sat Apr 23, 2022 5:45 pm
This neural is trained with an algorithm similar to DeepMind's AlphaZero:
In summary, the neural network learns to play from scratch, by playing a lot of games against itself. Learning takes place offline (not within Factorio). I wrote a piece of code that converts the resulting neural network file into a Factorio blueprint automatically.
i'm glad i asked the question before trying to understand what kind of math are doing the bottom left 150 000 combinators :)

I had read some academic papers out of curiosity but not enough to associate each terminology to a mathematical representation, i have read about different methods to implement bayesian inference network, i'm more familiar with the general information concept rather than the specific equation and training which gives the name to the algorithm, the mathematical formalisation on academic paper is cryptic to me. Most of the time i'm not sure i can tell the difference between marketing terms or different function only by the name :(

I had better luck with videos like this https://www.youtube.com/watch?v=aircAruvnKk


I've spent a bit of time to try and understand how it works and made this picture :

gomoku.png
gomoku.png (30.11 KiB) Viewed 6094 times
It roughly correspond to the different parts you mentionned :
deep_remi wrote:
Sun Apr 24, 2022 11:44 am
The code generates separate blueprints for different parts of the system:
  • the board,
  • the convolutional neural network,
  • the debug boards (small boards located to the right of the network),
  • the argmax function to select the best move,
  • and the table of constants used to detect 5-in-a-row lines.
The rest of the game logic, and the glue between these parts was made by hand.
deep_remi wrote:
Sat Apr 23, 2022 5:45 pm
It handles preventing play in an already occupied square, alternating side to move, choosing AI's side, and detecting the end of the game.
There are some things hidden under my paint edit, but i think i got them name placed correctly on the map. However the following text is my own understanding so probably contain many mistakes or approximations :)


The weird colored arrow represent the flow of information during an update loop (as i see in my mind). When a player makes a move by changing a chest quantity in the display board, the signal that carry this information will go through an horizontal bus in the convnet. I'm not quite sure what is called a neuron, what is called a layer, but that's what i tried to show with "parameter1" "parameter2" and "filter". Those 3 element are repeated 68 times. in a grid of 8x9 with one row not full.

I assume one of the parameter is to give a name/identify the adress of the associated filter. and the other parameter are value injected into the filter alongside the data from the board. It seem to me that the "filter" part are identical, it's difficult to understand what is happening inside one of the "filter" due to them being 2000 combinators, following the wiring is very difficult, i will study one of them rather than looking at the whole system next time :)

To describe the 8x9 grid, i used the term "layer" because it seem to me that the 2nd row is fed with results from the first row which is the only one that receive direct input from the board. The 3rd row receiving values from the 2nd, the 4rth from the 3rd and so on. which creates the vertical axis for transmission of informations inside the network.

Would you say there is 68 neurons ? in 9 layers ?

Then to follow the update loop, it seem to me that those value after the filter are all gathered along side another vertical wire to bring the results to the "move filter". At this point there is also some interaction with the red box and some value stored on constant combinator are injected alongside the previous data and the ouput is only 1 move. ( the signal corresponding to one of the tile of the board ). argmax function to select the best move ?

At all time the update loop is connected to the "win check" and transmit the state of the board. The "win check" seem to me like a dictionnary of all possible position for a 5-in-a-row. If ever after a move is done, the board contain one of the winning combo, it's game over.

At this point : what is the rule for making rules ?

It seem like for another game the "win check" would need to be adapted, that's not necessary but makes the game more pleasant for user you said. Let's take the example of chess, that would mean having a bunch of wired combinators able to detect when there is a check mate, so probably programmed around the possibility for the King to move ? For a simpler game like checker that would mean detecting when one player has no more pieces ? for Go it would be counting stones at the end ? That's to me like 1 part of the "rules".

At this point ( with the chess example) it would be required to program the different possible move for each pieces to evaluate the king's possible move. And also to prevent the human player from cheating because i would imagine in this case the convnet would output 2 information, (1) which piece to move (2) where to move, and the AI would already know because of its training which moves are allowed and would never propose forbidden move. That could simplify writing another part of the "rules" ? assuming human won't cheat at first might make it simpler to build and potentially adding a anti-cheat module later only for convenience ?

There's still this part of handling the special interaction after one move, such as flipping stones, eating a pawn, promotions. At this point it seem to me that this is a module that would interact mostly with the display/board that has 2 main focus => 1)feed the convnet with the information necessary for deciding next move. 2) display it to the player. It feels a little outside of the "rules" but more an adapation of the red box in my mind, the "update loop"/ player turn.

more generic: and about the convnet

I was interested in how the ai would evaluate the position, but now i think the wording is not appropiate to describe how it function. We don't know how the AI evaluate the position or why one is considered a good move. Ai see the board and tell next square because it has trained and "know" it is a good move = a move that leads to "win" with highest probability. The convnet i see as filled memory system, where the parameters are carefully self-adjusted/weighted over many games after deafeat and victories against itself. The neuron itself given enough time i could understand i think the wiring but some quick explanation would help a lot. The magic is in the architecture of the convnet now :)

about the game in the game

I have played several games, sometimes i mess up when placing things on chests and i play twice in a row, and then the AI change side and beats me with my own color. One need to be careful to only place things 1 at a time, and not forgot the game is paused sometimes !

Other times the AI just beat me. It happened once that the AI didn't choose the winning move right away but lost 1 turn to block my attempt before winning i was confused. Also i'm not sure how to properly start a game, the AI always makes the 3rd move, this happenened on the save game, i didn't try the blueprint, maybe i did something wrong, i didn't put their pawn in the corner but i could have just so i could get a win for once :twisted:

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

mmmPI wrote:
Tue Apr 26, 2022 12:35 am
I had better luck with videos like this https://www.youtube.com/watch?v=aircAruvnKk
This is a good video, but it does not explain the "convolutional" part. Each layer of my neural network performs a 3x3 convolution. You can find an explanation in that video:
https://youtu.be/YRhxdVk_sIs?t=266
I've spent a bit of time to try and understand how it works and made this picture :
Nice!
I assume one of the parameter is to give a name/identify the adress of the associated filter. and the other parameter are value injected into the filter alongside the data from the board. It seem to me that the "filter" part are identical, it's difficult to understand what is happening inside one of the "filter" due to them being 2000 combinators, following the wiring is very difficult, i will study one of them rather than looking at the whole system next time :)
What you labelled "parameter 1" are the 3x3 convolution weights. "parameter 2" is the bias, and ReLU (Rectified Linear Unit) activation. The large arrays with 3x3 blocks are indeed all identical to each other. They perform the additions of the convolutions (add all 3x3 values around each point of the board).
To describe the 8x9 grid, i used the term "layer" because it seem to me that the 2nd row is fed with results from the first row which is the only one that receive direct input from the board. The 3rd row receiving values from the 2nd, the 4rth from the 3rd and so on. which creates the vertical axis for transmission of informations inside the network.
That's correct. If you look carefuly, you'll notice that there also some "shortcut" connections that skip one layer. It is a so-called "residual" neural network:
https://en.wikipedia.org/wiki/Residual_neural_network
Would you say there is 68 neurons ? in 9 layers ?
Yes. Only one channel of the 4 outputs is used. The other 3 are evaluation neurons that estimate the probability of win, draw, and loss. But they are not used here.
Then to follow the update loop, it seem to me that those value after the filter are all gathered along side another vertical wire to bring the results to the "move filter". At this point there is also some interaction with the red box and some value stored on constant combinator are injected alongside the previous data and the ouput is only 1 move. ( the signal corresponding to one of the tile of the board ). argmax function to select the best move ?
That's correct. Injected values from the constant combinators are used to make sure that only one move is selected, even if the neural network gives two identical values to two different moves. There is a bitwise "and" combinator that clears the 8 least significant bits, and the constants have a different value for each signal (1, 2, 3, ...) to make sure that the resulting signals are all different from each other. Also, a big negative value is added to occupied squares, such that an illegal move is not selected.
At all time the update loop is connected to the "win check" and transmit the state of the board. The "win check" seem to me like a dictionnary of all possible position for a 5-in-a-row. If ever after a move is done, the board contain one of the winning combo, it's game over.

At this point : what is the rule for making rules ?

It seem like for another game the "win check" would need to be adapted, that's not necessary but makes the game more pleasant for user you said. Let's take the example of chess, that would mean having a bunch of wired combinators able to detect when there is a check mate, so probably programmed around the possibility for the King to move ? For a simpler game like checker that would mean detecting when one player has no more pieces ? for Go it would be counting stones at the end ? That's to me like 1 part of the "rules".

At this point ( with the chess example) it would be required to program the different possible move for each pieces to evaluate the king's possible move. And also to prevent the human player from cheating because i would imagine in this case the convnet would output 2 information, (1) which piece to move (2) where to move, and the AI would already know because of its training which moves are allowed and would never propose forbidden move. That could simplify writing another part of the "rules" ? assuming human won't cheat at first might make it simpler to build and potentially adding a anti-cheat module later only for convenience ?

There's still this part of handling the special interaction after one move, such as flipping stones, eating a pawn, promotions. At this point it seem to me that this is a module that would interact mostly with the display/board that has 2 main focus => 1)feed the convnet with the information necessary for deciding next move. 2) display it to the player. It feels a little outside of the "rules" but more an adapation of the red box in my mind, the "update loop"/ player turn.
For games such as gomoku, Othello, or Go, a move is a point of the board, which is simpler to handle. The convolutional neural network produces a value for each point of the board, and may give a high value to points that are illegal moves. That's why it is important to filter out illegal moves when picking the move chosen by the network.

I don't do this, but it would be possible to train the neural network to give low values to illegal moves. But it would still be not very safe to rely on the network's decision about move legality, because it might produce inaccurate outputs. So the system should be able to determine which move is legal or not.

For games where pieces are moved, such as chess, things are more complicated. We usually have a different output channel for each piece of the board, and each direction of movement. DeepMind's AlphaZero uses 73 convolutional output neurons to encode a chess move (see table S2 in that paper: https://arxiv.org/pdf/1712.01815.pdf).
more generic: and about the convnet

I was interested in how the ai would evaluate the position, but now i think the wording is not appropiate to describe how it function. We don't know how the AI evaluate the position or why one is considered a good move. Ai see the board and tell next square because it has trained and "know" it is a good move = a move that leads to "win" with highest probability. The convnet i see as filled memory system, where the parameters are carefully self-adjusted/weighted over many games after deafeat and victories against itself. The neuron itself given enough time i could understand i think the wiring but some quick explanation would help a lot. The magic is in the architecture of the convnet now :)
Vertical wires in Factorio overlap, so it makes it a bit difficult to visualize the connection pattern. The explanations I gave above, and the link to the convnet youtube video should help to make things a bit clearer. I'll try to give more details:
  • The "parameter 1" blocks multiply each input channel by 9 constant values
  • The 9 outputs are summed over input channels, so it produces 9 different values that are fed into the large array of 3x3 additions. These are full-board values (one different signal for each square of the board)
  • Each of the 3x3 block of the large array computes one signal as the sum of the 3x3 different signals from the local neighborhood. You may notice that 3x3 blocks on the side are a bit different, because there is not value coming from the outside of the board.
  • The 15x15 signals obtained by these 3x3 addition blocks are combined into the output for this channel
about the game in the game

I have played several games, sometimes i mess up when placing things on chests and i play twice in a row, and then the AI change side and beats me with my own color. One need to be careful to only place things 1 at a time, and not forgot the game is paused sometimes !
Yes. I should disable move input until the AI's move is played. The game-playing logic has a timer that counts cycles to wait for the neural-network output. I could have used it to disable user input as well, but I was too lazy to do it.
Other times the AI just beat me. It happened once that the AI didn't choose the winning move right away but lost 1 turn to block my attempt before winning i was confused. Also i'm not sure how to properly start a game, the AI always makes the 3rd move, this happenened on the save game, i didn't try the blueprint, maybe i did something wrong, i didn't put their pawn in the corner but i could have just so i could get a win for once :twisted:
The AI needs at least one piece of each color on the board in order to play. It was not trained to play from the empty board, and produces nonsense moves if it tries.

Serious gomoku players use a special protocol for opening moves, to balance the advantage of playing first. Otherwise, the first player has too big an advantage. Expert players can win easily if they play first and there are no restrictions on the first move. A simple way to balance the game is to force the first player to play one square away from the side of the board, like I did in the youtube video.

Here is a gomoku opening book I built with a strong neural network. It can help you to choose a balanced opening:
https://www.crazy-sensei.com/book/gomoku_15x15/

I also noticed that the AI will sometimes miss an immediate win. 8 layers of 8 neurons is really a very small network, and it is not big enough to produce a really very smart AI. The typical size of strong game-playing networks is 40 layers of 256 neurons. But that would be too big for Factorio. And also way too strong for most people to have fun.

I tried to optimize performance by powering off the network when it is not used, but it does not help at all. It seems that combinators use almost exatcly the same CPU time whether they are powered on or off.

mmmPI
Smart Inserter
Smart Inserter
Posts: 2674
Joined: Mon Jun 20, 2016 6:10 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by mmmPI »

deep_remi wrote:
Tue Apr 26, 2022 8:46 am
Each layer of my neural network performs a 3x3 convolution. You can find an explanation in that video:
https://youtu.be/YRhxdVk_sIs?t=266
Thank you, never heard of that, i had to watch some more videos to not feel lost :

How Blurs & Filters Work - Computerphile ( 8 min ) about kernel convolution, what math is being done, visually explained.
https://www.youtube.com/watch?v=C_zFhWdM4ic

Finding the Edges (Sobel Operator) - Computerphile (8 min ) combining several of them and using it for an image editing purpose
https://www.youtube.com/watch?v=uihBwtPIBxM

CNN: Convolutional Neural Networks Explained ( 15 min ) how they are combined in a neural network like the very genery one described in the early video.
https://www.youtube.com/watch?v=py5byOOHZM8

That's not related to gaming AI, but there are pictures and drawing very similar to a board game, those helped me "vizualize" what you called "perform a 3x3 convolution" ; what is could be used for ; some examples of vulgarisation too. The videos are referenced in each others as they are related.
deep_remi wrote:
Tue Apr 26, 2022 8:46 am
  • The "parameter 1" blocks multiply each input channel by 9 constant values
  • The 9 outputs are summed over input channels, so it produces 9 different values that are fed into the large array of 3x3 additions. These are full-board values (one different signal for each square of the board)
  • Each of the 3x3 block of the large array computes one signal as the sum of the 3x3 different signals from the local neighborhood. You may notice that 3x3 blocks on the side are a bit different, because there is not value coming from the outside of the board.
  • The 15x15 signals obtained by these 3x3 addition blocks are combined into the output for this channel
Vertical wires in Factorio overlap, so it makes it a bit difficult to visualize the connection pattern.
It's easier to see the wiring when the 3x3 alternate rotation. Now i realise the large array is composed of 3x3 blocks organised in a 15x15 grid like the board. (45x45)
deep_remi wrote:
Tue Apr 26, 2022 8:46 am
What you labelled "parameter 1" are the 3x3 convolution weights. "parameter 2" is the bias, and ReLU (Rectified Linear Unit) activation. The large arrays with 3x3 blocks are indeed all identical to each other. They perform the additions of the convolutions (add all 3x3 values around each point of the board).
That make sense now, the word "filter" where i used it is not correct; because it has another particular meaning in this context. seemed to be associated with the convolution weights in the video you linked; a "filter" is a a 3x3 matrix containing some numbers that the AI while training adjusted to detect something that helped her win games.

Whereas the large array is a way to do paralellization of all operation that constitute the convolution. It is the factorio implementation of the operation, only arithmetic combinators.

Injected values from the constant combinators are used to make sure that only one move is selected, even if the neural network gives two identical values to two different moves. There is a bitwise "and" combinator that clears the 8 least significant bits, and the constants have a different value for each signal (1, 2, 3, ...) to make sure that the resulting signals are all different from each other. Also, a big negative value is added to occupied squares, such that an illegal move is not selected.
I spotted this located at the intersection of the green and red box in the annoted pictures.

For games such as gomoku, Othello, or Go, a move is a point of the board, which is simpler to handle. The convolutional neural network produces a value for each point of the board, and may give a high value to points that are illegal moves. That's why it is important to filter out illegal moves when picking the move chosen by the network.

I don't do this, but it would be possible to train the neural network to give low values to illegal moves. But it would still be not very safe to rely on the network's decision about move legality, because it might produce inaccurate outputs. So the system should be able to determine which move is legal or not.

For games where pieces are moved, such as chess, things are more complicated. We usually have a different output channel for each piece of the board, and each direction of movement. DeepMind's AlphaZero uses 73 convolutional output neurons to encode a chess move (see table S2 in that paper: https://arxiv.org/pdf/1712.01815.pdf).
This is surprising, i expected the machine to only be allowed to do legal move during training by design and therefore be unable to even propose an illegal move when the resulting network is exported and playing. But an illegal "ouput" is more complex to define in chess as it depend on what piece is selected first which generates illegal move (pieces do not have same rule of movement) and then on what is on the board around that piece ( what square is occupied, by who ) whereas in Gomoku, it's only the 2nd part that matter. Maybe that explain the different approach ?

The AI needs at least one piece of each color on the board in order to play.

Here is a gomoku opening book I built with a strong neural network. It can help you to choose a balanced opening:
https://www.crazy-sensei.com/book/gomoku_15x15/

I also noticed that the AI will sometimes miss an immediate win. 8 layers of 8 neurons is really a very small network, and it is not big enough to produce a really very smart AI.
I have found 1 way to win when i put their pawn on the corner and i start with 2 in the middle, hahaha, i don't need balanced opening i need less obvious way to cheat :D I tried playing the exact same again to see if the AI would do something different, in all i've looked i haven't seen some pseudo-randomness added, but i couldn't remember exactly my own moves and the game went different. I expect the same exact game to be happening though if i write down my move ?

deep_remi wrote:
Tue Apr 26, 2022 8:46 am
If you look carefuly, you'll notice that there also some "shortcut" connections that skip one layer. It is a so-called "residual" neural network:https://en.wikipedia.org/wiki/Residual_neural_network

Only one channel of the 4 outputs is used. The other 3 are evaluation neurons that estimate the probability of win, draw, and loss. But they are not used here.

I tried to optimize performance by powering off the network when it is not used, but it does not help at all. It seems that combinators use almost exatcly the same CPU time whether they are powered on or off.
(I have re-ordered and altered some quote)

I noticed the "shortcut", the wikipage contain 3 times "clarifications needed" in the only 3 paragraphs that are not obscure formula :). Though i don't think i really need to understand WHY they are here for my idea of optimization. They do make things a bit more complicated but the math is not fundamentally different.

( i removed the 3 unused neurons for my testings )

When i loaded the map on a 2009 laptop, it was running at 18 UPS or so, and when looking at what was taking the most time using the F4 menu and "show-time-usage". I noticed it's not the circuit network that consume the update time but the entity for around 38 or so, electric network for 10 and circuit network accounted only for 2 roughly. Therefore my conclusion is that there are too many combinators :). Not that the value are constantly changing or that there are too many calculations being done.

I think for this particular case the massive parralelization occuring is actually not benefiting the UPS metric. I think if there was only 1 single neuron/ array of (45x45) instead of 68, the Gomoku itself would be slower to unfold in real time, but you would still get 60 UPS, potentially even more. It's just an idea, but if you allow yourself to work with some delay of 2 or 3 second for example, 120 180 ticks. Then you can create a timer after each human move during which you have 3 ticks to use the single neuron as the neuron 1, and store the result, then 3 ticks to use the single neuron as neuron 2, and store the result, then 3 ticks to use the single neuron as neuron 3 ans store the result. and so on.

This would require to have the 68 "parameter 1" and the 68 "parameter 2", stored both in constant combinators and maybe 9 or 27 unique channels (on top of the 15x15 already used for the board and the few used for the player turn and reset. ) And a better cartography of where the "shortcuts" are located than what i have at the moment. But from what i've seen it would be feeding to neuron 16 located in a bottom layer, the result of neuron 4 located in an upper layer, sometimes.(made up illustrative numbers) Carefully managing the serialization. Swapping the parameter as input of the single neuron for the minimum amount of time to perform the convolution.

3 second or 180 tick delay is enough i think as it gives 3 ticks for the swapping, but maybe it's possible with only 2 ticks, maybe 4 or 5 ticks are required to store and fetch parameters. and that would mean the AI would take 5 or 6 second to make a move, but maybe those 5 or 6 second you can have 120 or 180 UPS. That's the kind of trade-off i'm seeing possible here. Maybe using only 8 neurons instead of 1 or 68 is a good compromise, to reduce the total number of active/powered entity as it seemed to be the cause of UPS-loss for me.

deep_remi
Burner Inserter
Burner Inserter
Posts: 13
Joined: Wed Mar 23, 2022 9:17 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by deep_remi »

mmmPI wrote:
Wed Apr 27, 2022 7:33 am
I have found 1 way to win when i put their pawn on the corner and i start with 2 in the middle, hahaha, i don't need balanced opening i need less obvious way to cheat :D I tried playing the exact same again to see if the AI would do something different, in all i've looked i haven't seen some pseudo-randomness added, but i couldn't remember exactly my own moves and the game went different. I expect the same exact game to be happening though if i write down my move ?
Yes, the AI is deterministic.

I think for this particular case the massive parralelization occuring is actually not benefiting the UPS metric. I think if there was only 1 single neuron/ array of (45x45) instead of 68, the Gomoku itself would be slower to unfold in real time, but you would still get 60 UPS, potentially even more. It's just an idea, but if you allow yourself to work with some delay of 2 or 3 second for example, 120 180 ticks. Then you can create a timer after each human move during which you have 3 ticks to use the single neuron as the neuron 1, and store the result, then 3 ticks to use the single neuron as neuron 2, and store the result, then 3 ticks to use the single neuron as neuron 3 ans store the result. and so on.
On my machine, I get close to 60 UPS, so I did not bother much. Re-using the 45x45 array seems rather complicated to do.

I feel there should be a way to optimize the Factorio code such that a large circuit that is powered off does not take UPS time. That would be the best way.

mmmPI
Smart Inserter
Smart Inserter
Posts: 2674
Joined: Mon Jun 20, 2016 6:10 pm
Contact:

Re: Five-in-a-row game, with deep-learning AI

Post by mmmPI »

deep_remi wrote:
Wed Apr 27, 2022 9:20 am
On my machine, I get close to 60 UPS, so I did not bother much. Re-using the 45x45 array seems rather complicated to do.

I feel there should be a way to optimize the Factorio code such that a large circuit that is powered off does not take UPS time. That would be the best way.
I saw you posted the save in the optimization save file thread already. That's where it belongs i think if you mean optimizing factorio's code.

I've been playing a bit more with the blueprint, to get more familiar before attempting a modification, i think it's doable to reuse the 45x45 array from a factorio combinator point of view, i'm not sure i can do it, or that it is a good idea but it leads me to try and understand the little details which are interesting.

For example i had not given too much thought about the possibility to choose your own color, but now i realize the implication on the build, during normal game, the value from the board are sent to the neural network using 3 different horizontal red wire. The upper one is always carrying the signals for empty tile on the board, but the bottom two alternate between sending human's ooccupied tile and AI's occupied square. This is triggered by a change in the value of the "Z" signal that is either 1 or 2 depending on which turn is it to play.

This means you can read the move the AI would play if it were its turn. namely the output of the "move filter" is a signal that designate the tile the AI would play if it was playing against itself when it's the human turn to play. The signal is already carried via red wire which makes it very easy to plug on one of the debug board, then it's like an advice/cheat for your game :)

I plan on using this Z signal value change to trigger a timer that would help syncing input data for the "single neuron" idea. That seem rather complicated to do, but i have lots of free time and i'm enjoying playing with those :)

Post Reply

Return to “Combinator Creations”