Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 2:56 pm
by pathtoneuralink
Greetings,

I am a Deep Learning Engineer who has started experimenting with Deep Reinforcement Learning on Factorio. I am interested in seeing what strategies an agent similar to AlphaStar @deepmind would develop while learning to play this game. One thing that would be incredibly helpful for creating such an agent is a database of replays. Please create a feature to collect and store player replays. Perhaps make it an opt-out feature, so that players automatically collect this data unless they don't want their replays captured. Should my research be successful, it could be a great advance for the field of Artificial Intelligence and Machine Learning. The long-term planning required to "beat" Factorio has never been successfully demonstrated and I would love your help making this happen!

Sincerely,

Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 3:03 pm
by pathtoneuralink
I am attempting to teach a Deep Reinforcement Learning algorithm to play Factorio. One feature that would be incredibly helpful in doing so is a database of player replays. Please consider implementing automatic collection of player replays as an "opt-out" feature, meaning that players would have to choose not to have their replays captured and stored.

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 4:37 pm
by mmmPI
Hi
I would be very interested to see what an algorithm can come up with as a Factorio base.

BUT: asking that, by default, every player's computer does extra work to store information about how they use it feels really wrong.

I would understand if other interested people willingly shared their replays to help you, and I would send some just to see if your algorithm starts drawing the same stupid things I do with rails when I'm bored.
pathtoneuralink wrote: Wed Jul 10, 2019 2:56 pm The long term planning required to "beat" Factorio has never been successfully demonstrated
I'm not sure what you mean by that, but there are several speedrunners, or even teams of them. They probably produce more useful material than your average player, since they aim at objectives, don't do experiments during a game, and are efficient. They are also doing it for the public, so they may be more willing to share their work for mutual exposure.

Compare that to letting AlphaStar learn by watching trained chess grandmasters, rather than any game played on a chess app.
I am not sure if quality matters over quantity of replays in your case, but I hope it does, because it seems interesting and most of the time my replays are just me playing x hours of a new game with all cheats to try an idea I had during the day. Sometimes it would be like 4 new games of 1 hour just drawing some rails.

Again, it feels wrong to ask to have it "on" by default. That's exactly how some of the most annoying pieces of software behave: they launch on start by default, auto-update at the worst time of course, call it a "security update" when really it's updating the incorporated ads or integrating more and more possibilities to use hardware they sell with the software you are forced to use, and store tons of useless data about you and your usage of the software, even years after they stopped maintaining it because they are selling/forcing a new version instead of fixing the mess they made with the previous one, etc.

Most of the programs I use are already like that, and I have never seen someone ask for this kind of automatic behavior to stop because it's no longer relevant. It's always about starting it. If it's to gather 0.00001% of the needed data every day, is it worth it? How much time, or how many replays, do you think are needed to make something useful? How can anyone estimate those? How much data do you think that represents in terms of storage? For how long do you think the investment in real money should be planned to store all that? What are the conditions that would put an end to the process because it is "over"? What are the costs of those for the player in electricity, CPU usage, storage, internet data? Those pile up little by little towards infinity :(

I hope I do not sound rude or aggressive. I would be very interested to see the results, as I have carefully watched this kind of algorithm playing StarCraft and Dota, and I am dreaming about a Civilization 12 or 15 that would have an AI that can beat my ass for the prices they charge. But this makes me angry: I think it's counterproductive for the general field because it relies on insidious methods that create distrust.

Maybe I'm just an angry dude, but you could still try and convince me, because you will probably encounter some more serious objections from others later :D

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 5:04 pm
by eradicator
I am me. I am interested in how the fuck you got the idea that what you're demanding is not completely ridiculous on too many levels to even attempt to explain it (cost, legality, bandwidth, etc...). And why you think it's ok to make not one but *two* threads about it as your very first posts. This could lead to great advances in troll detection.

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 6:38 pm
by urza99814
pathtoneuralink wrote: Wed Jul 10, 2019 2:56 pm Greetings,

I am a Deep Learning Engineer who has started experimenting with Deep Reinforcement Learning on Factorio. I am interested in seeing what strategies an agent similar to AlphaStar @deepmind would develop while learning to play this game. One thing that would be incredibly helpful for creating such an agent is a database of replays. Please create a feature to collect and store player replays. Perhaps make it an opt-out feature, so that players automatically collect this data unless they don't want their replays captured. Should my research be successful, it could be a great advance for the field of Artificial Intelligence and Machine Learning. The long-term planning required to "beat" Factorio has never been successfully demonstrated and I would love your help making this happen!

Sincerely,
Harvesting vast quantities of user data without the user's consent and without informing them is NOT something that responsible developers should be doing.

Pretty sure Factorio already allows exporting replays, so why not ask people to voluntarily submit those rather than asking the devs to do your work for you and to do it by force? If they implemented it the way you suggest, you wouldn't be getting my replays anyway, because I'd immediately uninstall Factorio. That kind of behavior is absolutely unacceptable IMO.

If you're a deep learning engineer, you ought to understand how easy it is for AI systems to deanonymize data and to draw conclusions from seemingly irrelevant datasets. We can identify individuals from how they walk or from the style of their writing; we can identify specific individuals based on professionally "anonymized" datasets of medical information... There's no way that anyone could convince me that this couldn't possibly leak sensitive information, especially if it's implemented by game devs rather than security experts. I'm not willing to take that risk, and I'm not going to trust anyone who just automatically assumes that I will.

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 7:31 pm
by Deadlock989
pathtoneuralink wrote: Wed Jul 10, 2019 3:03 pm I am attempting to teach a Deep Reinforcement Learning algorithm to play Factorio. One feature that would be incredibly helpful in doing so is a database of player replays. Please consider implementing auto collection of player replays as an "opt-out" feature. Meaning that players will have to select not to have their replays captured and stored.
The whiff of bullshit is strong with this one.

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 7:58 pm
by Koub
[Koub] First, don't make several threads about the same suggestion a few minutes apart. Just merged both topics.

That was the moderator-me speaking. Now the me-me (if that makes sense).

On the one hand, the idea of using machine learning to see if an AI could learn to play Factorio optimally (for speed, or for UPS, or for whatever goal) could be fun.

On the other hand, using automatic data collection to create a huge database of replays, with an opt-out option is, as eradicator stated, a multi-layered absurdity.

I don't know where you live @OP, but in Europe, collecting and processing data that can be personal (which a replay contains), even with its owner's consent (and even if it's publicly available), must obey strict rules: explaining why you collect it and for how long, guaranteeing that the personal data is "safe", guaranteeing that you don't collect more data than you strictly need to achieve your goal, ... I'm no specialist, but if you want more detail, I'm sure there are websites that explain these things.

Also, I don't know the weight of a replay, but a savegame can weigh anywhere from a few MB to a few hundred MB, and I'm sure the replays will be of the same order. Can you imagine how much data would be uploaded daily somewhere? That would be insane. I'm a lucky guy, I have amongst the best internet connections available to regular people. But there are places with upload quotas and crappy connections - and not only in remote places. This would kill people's Internet bill.
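To put rough numbers on the upload concern, here is a back-of-envelope sketch. Only the "few MB to a few hundred MB" range comes from the post above; the player count and the replays-per-day figure are made-up placeholders, not Factorio statistics.

```python
# Back-of-envelope upload estimate. The player count and replay rate are
# placeholder assumptions, not measured Factorio statistics.

def daily_upload_gb(players: int, replays_per_player: float, replay_mb: float) -> float:
    """Total data uploaded per day, in gigabytes (1 GB = 1000 MB)."""
    return players * replays_per_player * replay_mb / 1000.0

# Hypothetical scenario: 10,000 active players, one replay each per day.
low = daily_upload_gb(10_000, 1, 5)     # "a few MB" per replay
high = daily_upload_gb(10_000, 1, 300)  # "a few hundred MB" per replay
print(f"{low:.0f} GB/day to {high:.0f} GB/day")
```

Under these assumptions the total scales linearly with every input, so doubling the player count or the replay size doubles the daily volume.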

Let's imagine we go this far ... Now what would be needed is something to parse and interpret the replay, and translate it into something usable by a deep learning machine. Why would the devs invest time into this ?

I'll be honest, this suggestion is nowhere near convincing me it's realistic or desirable.

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 9:54 pm
by mudcrabempire
The amount of data needed for machine learning is huge. If I remember correctly, the Dota2 bot developed by OpenAI is processing millions of games per day. And more data is almost always better (you still have to do quality control). So we are talking here about amounts of data where "manually gathering some replays" is like a drop in a barrel. Or at least a cup.

I assume this is why the OP wanted some kind of automated data collection. But yeah, as the others pointed out before me, this suggestion was poorly thought-out/poorly worded at best.

If you are serious about this, I suggest that you look into the Dota2 bot from OpenAI more closely. As far as I understood, they did not use replays for training but instead just let the AI play against itself. Over and over and over and over and over... In order to tell the AI what's good and bad, they used some kind of reward system, and in order to help the AI learn all of the many aspects of the game, they started with a very restricted game and slowly permitted the AI the use of more and more options.

But I'm not writing anything here which is not better explained on their website. Also, I don't know much about the details of machine learning, but I'll just go ahead and assume that you have access to some serious computing power? Like, you're probably not gonna get anywhere with a generic personal computer.
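As a rough illustration of the two ideas above (a reward signal, and a restricted game whose options are unlocked gradually), here is a toy sketch. It is not OpenAI's method and it does not do self-play: it is a simple bandit-style learner, and the action names, reward values, and optimistic starting value are all invented for the example.

```python
import random

# Hypothetical action sets, unlocked stage by stage (names are illustrative only).
CURRICULUM = [
    ["mine", "craft"],
    ["mine", "craft", "place_belt"],
    ["mine", "craft", "place_belt", "place_assembler"],
]

# Toy reward per action, standing in for a real in-game score (made-up values).
REWARD = {"mine": 1.0, "craft": 2.0, "place_belt": 3.0, "place_assembler": 5.0}

def train(episodes_per_stage=200, epsilon=0.1, lr=0.1, seed=0):
    """Learn a value estimate per action while the allowed set grows."""
    rng = random.Random(seed)
    value = {}
    for allowed in CURRICULUM:
        for action in allowed:
            # Newly unlocked actions start optimistically so they get tried.
            value.setdefault(action, 10.0)
        for _ in range(episodes_per_stage):
            if rng.random() < epsilon:
                action = rng.choice(allowed)                    # explore
            else:
                action = max(allowed, key=lambda a: value[a])   # exploit
            reward = REWARD[action] + rng.gauss(0, 0.1)         # noisy feedback
            value[action] += lr * (reward - value[action])      # move estimate toward reward
    return value

values = train()
best = max(values, key=values.get)
print(best)
```

Each stage reuses what was learned in the previous one, so the learner only has to evaluate the newly unlocked option instead of the whole action set at once.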

Re: Database of Replays for Machine Learning

Posted: Wed Jul 10, 2019 9:57 pm
by slippycheeze
pathtoneuralink wrote: Wed Jul 10, 2019 2:56 pm I am a Deep Learning Engineer who has started experimenting with Deep Reinforcement Learning on Factorio.
...did your ethics board sign off on this? Because I'm gonna guess you either don't have one, or didn't tell them you were planning to push for an opt-out on people literally all around the world, because that is a legal set of bear traps you just don't want.

Now, what you *could* do is ask players to donate maps with the "replay" option set, which would give you what you want, already exists, and is an opt-in. You still need an ethics (or legal) approval because the data certainly straddles the lines the GDPR and friends cover, so I'd be concerned about ensuring my, eg, data destruction and withdrawal of consent policies were up to date. (or, at least, the lawyers and ethics board were clear I didn't need them.)

Beyond that, you need to rethink your strategy for opening this conversation. As you can see, this was a poor tactical move, and you are gonna have way more difficulty getting voluntary participation now. Sorry. People are hard.

Re: Database of Replays for Machine Learning

Posted: Thu Jul 11, 2019 6:00 am
by Oktokolo
pathtoneuralink wrote: Wed Jul 10, 2019 2:56 pm I am a Deep Learning Engineer who has started experimenting with Deep Reinforcement Learning on Factorio. I am interested in seeing what strategies an agent similar to AlphaStar @deepmind would develop while learning to play this game. One thing that would be incredibly helpful for creating such an agent is a database of replays. Please create a feature to collect and store player replays.
Do not demand that other people do your homework. You don't need a single real life replay to train an ANN. Come up with a fitness function and a way to interface with Factorio - and let your ANN play the game.
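A fitness function in this sense can be very small. The sketch below is one possible shape, assuming some interface (not shown, and not an existing Factorio API) can hand back a summary of the game state. The `Snapshot` fields and the weights are hypothetical; the only game fact used is that Factorio runs at 60 ticks per second.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Hypothetical game-state summary; field names are illustrative, not a Factorio API."""
    science_packs: int     # science packs produced so far
    rocket_launched: bool  # whether the win condition was reached
    ticks: int             # elapsed game ticks

def fitness(s: Snapshot) -> float:
    """Higher is better: reward progress, reward winning, penalise elapsed time."""
    score = float(s.science_packs)
    if s.rocket_launched:
        score += 10_000.0        # arbitrary large bonus for finishing
    score -= s.ticks / 3600.0    # 60 ticks/s, so 3600 ticks = 1 minute of game time
    return score

print(fitness(Snapshot(science_packs=100, rocket_launched=False, ticks=7200)))
```

The weights decide what the agent ends up optimising for, so a speedrun-style goal would weight the time penalty much more heavily than this example does.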

Re: Database of Replays for Machine Learning

Posted: Thu Jul 11, 2019 10:24 am
by darkfrei
AlphaZero has no database. It just tries things and checks whether they give a better result than before.

Re: Database of Replays for Machine Learning

Posted: Thu Jul 11, 2019 11:56 pm
by slippycheeze
darkfrei wrote: Thu Jul 11, 2019 10:24 am Alpha Zero has no database, just trying and checking if it makes better result as before.
They were talking about AlphaStar, which used both supervised and reinforcement learning as part of the training process, and worked in the RTS area. You don't necessarily need that, especially with what AlphaZero demonstrated, but it can sure help bootstrap the early days quicker. Especially when the interactions are as rich as Factorio's, compared to Go, that can be significant. "Which cell of a 64x64 tile grid do I play on" is a little faster to guess through, and to infer the rules from, than "which of the 2560x1440 pixels, plus a full keyboard", where approximately zero useful things happen to improve fitness without multiple sequential interactions.

Not that I'm in favour of the opt-out strategy for collecting data or anything, just ... don't assume that the *only* way to train these days is AlphaZero style. :)
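The size gap between those two input spaces is easy to work out. The grid and screen dimensions below are the ones quoted in the post; the keyboard is left out because any key count would be a guess.

```python
# Number of distinct "where do I act" choices per step, using the figures from the post.
grid_cells = 64 * 64           # tile grid
screen_pixels = 2560 * 1440    # mouse targets on a 1440p screen

ratio = screen_pixels // grid_cells
print(grid_cells, screen_pixels, ratio)
```

So the pixel space alone is 900 times larger per single step, before counting keyboard input or the fact that useful outcomes need several steps in sequence.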

Re: Database of Replays for Machine Learning

Posted: Tue Jul 16, 2019 5:32 pm
by Darinth
I'm mostly just curious if this is something that could be modded into the game. That makes it inherently opt-in, with the community potentially opting to create a library of playthroughs for researchers.