monte carlo simulations for AI?


monte carlo simulations for AI?

Post by iceiceice »

Hi,

Recently I had an idea for an AI project. The idea is still evolving, but a necessary component is the ability to do Monte Carlo simulations.

Specifically, I want the AI to use the following procedure to evaluate whether it can "take a fight". I have in mind a heuristic which represents "max-aggression dogfighting", i.e. plan your turn and try to kill as much gold as possible. What I would like to be able to do is make a copy of the game state, switch all parties to be controlled by this heuristic, then finish my AI's turn and run one opponent turn, and record who killed more gold. Then repeat this 50-100 times and use the average to decide whether it's a good fight or not. (At least that's the basic idea.) I also hope to use the procedure to decide whether the AI can defend efficiently from some hypothetical position. (To be clear, the AI will not be max-aggression dogfighting at all times; it will just consider whether it should do so, and what happens if its enemy does so, when it is making up its mind.)
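
In rough C++, the loop I have in mind is something like the sketch below. Every name in it is a placeholder for machinery that doesn't exist yet, and the playout just returns dummy numbers so the shape of the procedure is clear:

#include <cstddef>
#include <random>

struct game_state { /* map, units, sides, time of day, ... (placeholder) */ };

struct sim_result { int my_gold_killed = 0; int enemy_gold_killed = 0; };

// Placeholder playout: switch every side to the max-aggression heuristic,
// finish my turn, run one opponent turn, and total the gold value of kills.
// Takes the state by value so each playout mutates a throwaway copy.
sim_result play_out_one_round(game_state s)
{
    static std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> gold(0, 60);
    return {gold(rng), gold(rng)};  // dummy outcome, stands in for a real playout
}

// Run 50-100 playouts and accept the fight if the average gold trade is positive.
bool looks_like_a_good_fight(const game_state& current, std::size_t n_trials = 100)
{
    double net = 0.0;
    for (std::size_t i = 0; i < n_trials; ++i) {
        sim_result r = play_out_one_round(current);
        net += r.my_gold_killed - r.enemy_gold_killed;
    }
    return net / n_trials > 0.0;
}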

It seems that if you want to make a copy of the game state, the easiest thing is going to be to write this in C++, which is fine because I'm more comfortable in C++ than Lua anyways. So over the past two days I've been looking at the source code in the AI module on and off.

Because of the class architecture used in Wesnoth, it seems that this is potentially much more complicated than I thought it would be, so I'm writing to ask for advice.

One way that you could write a game like Wesnoth is to have a class, e.g. game_state, which contains all info directly relevant to the board game: the map, a list of units with their stats, a list of sides, a time-of-day object, etc. When the state changes, this class has methods, e.g. move_unit, attack_unit, recruit_unit, end_turn, which are either void and update the state, or which return a new game_state object updated with the changes. (I guess immutable objects with modifier methods are more typical of Java style, but anyways it's a minor design decision.) This object could also have serialization functions for when you want to save the game, etc. In case you want to do simulations like I do, you can just copy the game state and simulate away, then throw the copy away when you are done.
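
For concreteness, the kind of interface I mean would look something like this (purely illustrative; nothing like it exists in the Wesnoth source as-is):

#include <string>
#include <vector>

struct unit { std::string type; int side; int hitpoints; /* stats... */ };

class game_state {
public:
    // Mutators: either void-and-update-in-place, or (Java-style)
    // returning a fresh game_state with the change applied.
    void move_unit(int unit_id, int to_hex);
    void attack_unit(int attacker_id, int defender_id);
    void recruit_unit(int side, const std::string& unit_type);
    void end_turn();

    std::string serialize() const;  // save-game support

    // The compiler-generated copy constructor is all a simulation needs:
    // copy, simulate on the copy, throw the copy away.
private:
    std::vector<unit> units_;
    int num_sides_ = 2;
    int time_of_day_ = 0;
    // map, etc.
};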

Instead, what I have found is that Wesnoth has a single global namespace called "resources" which contains pointers to all essential info and modules. For example, there is a game map, a list of units, a class called game_state (different from what I mention above), but also a list of controllers, a display, a sound manager, a whiteboard manager, etc. Updates are made to the game state in the following way: AI classes generate AI action objects, which instruct a particular unit to move / attack somewhere. AI classes derive from a "read_write_context" class, which contains methods like "execute_move_action", to which these action objects are passed. These methods pass the AI action objects to the global helper class "actions", which in turn casts them as fully-fledged action objects and asks them to execute themselves. In e.g. /src/actions/attack.cpp, we see that the attack::perform() function is where the battle calculations are gathered, the randomness is sampled, the outcomes are recorded, and the animations are told to occur.

Because these functions all refer to the global pointers, there is no way for me to pass them my dummy game_state to operate on instead. But it would be a terrible idea to try to reimplement all the combat calculations and such; I would much prefer to reuse all the game-state modification code.

The most direct solution, then, is that I should clone all the relevant objects pointed to in "resources" and store them in a safe place, then run my simulation using the core code, and at the end replace the pointed-to objects with the saved clones. I would have to be able to disable any updates to the display / replay recorder. However, because all of these pointers are global, I can't know what the side effects of this would be without reading all the source code. For all I know there may be some other thread / process which will explode if I mess around with these pointers, and even if I coded it up and tried it I might not detect this problem for a long time. Can someone who is generally knowledgeable about the source code comment on the wisdom of this?

The best alternative I can think of is to implement an "AI_sandbox" context, also deriving from read_write_context, which would store a local copy of the game state (built from info that I grab from "resources"), and which would override the "execute_move_action" etc. methods so that these actions apply to the local copy. Then I could implement my heuristics as rudimentary AIs and attach them to this context, and do my simulation by calling play_turn on each side. The downside is that I would still have to copy/paste a lot of core game code, e.g. the code that passes info to the battle_calculations object, performs the hits, manages the time of day, etc. It's not the end of the world, but it's a lot of refactoring, and it is always bad to duplicate code like this, in case e.g. that code gets updated in the future.
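
As a toy illustration of that shape (the overridden method name is the one from the real read_write_context; everything else here is invented, and the real interface is much larger):

#include <map>

struct move_action { int unit_id; int to_hex; };

// Stand-in for the engine's read_write_context.
class rw_context {
public:
    virtual ~rw_context() = default;
    virtual void execute_move_action(const move_action& a) = 0;
};

// The normal context routes actions through the global "resources" state.
// The sandbox applies them to a private clone instead.
class ai_sandbox : public rw_context {
public:
    void execute_move_action(const move_action& a) override
    {
        // Update the local copy of the game state; the real board,
        // display, and recorder are never touched.
        local_unit_positions_[a.unit_id] = a.to_hex;
    }
private:
    std::map<int, int> local_unit_positions_;  // toy stand-in for a cloned unit_map
};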

Did I miss anything in my assessment here? I remember that in SeattleDad's ML recruiter thread he asked about doing Monte Carlo simulations in Wesnoth; has anyone already done what I am trying to do?

Thanks for reading this far :)
~iceiceice~

Re: monte carlo simulations for AI?

Post by mattsc »

iceiceice wrote:Thanks for reading this far :)
Thanks for posting all that. :) My answer is going to be pretty disappointing: I know quite well what the AI does and am very interested in anything that has to do with it, but I only have a very tenuous idea of how C++ works. The person you really want to talk to about this is Crab (although some other folks are also much more qualified than I am), who's very busy in his non-Wesnoth world at the moment. So you might want to send him a PM pointing him to this thread - and tell him that I'm sorry to bug him with yet another thing, even if indirectly. :mrgreen:

Anyways, to answer your last question, no, I don't think anybody has done this yet.

Re: monte carlo simulations for AI?

Post by JaMiT »

iceiceice wrote:The most direct solution, then, is that I should clone all the relevant objects pointed to in "resources" and store them in a safe place, then run my simulation using the core code, and at the end replace the pointed-to objects with the saved clones.
Well, I would store the originals in a safe place, work with the clones, and at the end restore the originals. (The end result should be the same, but it just seems more polite to return the originals than to return copies. And it better simulates a call stack.) But otherwise, yeah, this is how you would go about setting up a dummy environment. Except, I suppose that would throw off the statistics and possibly other bookkeeping. Hmm... for a proof of concept, I would clone/replace *resources::units for your simulation, and also *resources::teams if you might include recruiting or recalling. I think that would be enough to see how well your simulations work. (It probably is not enough for an end result, but how much prep work do you want to do before a proof of concept?)
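
A minimal sketch of that swap, written as an RAII guard and assuming unit_map and std::vector<team> can be deep-copied:

#include <vector>

#include "resources.hpp"
#include "team.hpp"
#include "unit_map.hpp"

// RAII guard: point resources:: at clones for the simulation's lifetime,
// then restore the original pointers untouched.
class simulation_guard {
public:
    simulation_guard()
        : saved_units_(resources::units)
        , saved_teams_(resources::teams)
        , unit_clone_(*resources::units)   // deep copies to simulate on
        , team_clone_(*resources::teams)
    {
        resources::units = &unit_clone_;
        resources::teams = &team_clone_;
    }
    ~simulation_guard()
    {
        resources::units = saved_units_;
        resources::teams = saved_teams_;
    }
private:
    unit_map* saved_units_;
    std::vector<team>* saved_teams_;
    unit_map unit_clone_;
    std::vector<team> team_clone_;
};

Constructing one of these at the top of the simulation routine points the engine at the clones; the destructor puts the originals back even if the simulation exits early.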

For the final result, you might need to define a copy constructor for play_controller. If done right, it might then be enough to store all the pointers (not what they point to) from resources.hpp, then clone *resources::controller. (The play_controller sets most of the other resources; in one sense, play_controller is the overall object you were looking for.) After disposing of your copy, restore the resources:: pointers. In theory, at least. I think it's plausible enough to worry about later.

iceiceice wrote:I would have to be able to disable any updates to the display / replay recorder.
Don't worry about the recorder until you have your simulation working. (If the simulation takes too long to put into practice, then there is no need for replays.) To disable updates to the display, include "video.hpp", then create a video-locking object:
#include "video.hpp"
update_locker lock_update(resources::screen->video());

The display is locked for the life of this object.

Re: monte carlo simulations for AI?

Post by iceiceice »

Thanks for the tips, everyone. Based on JaMiT's advice I've decided to implement the helper AI first and get it running as well and as fast as possible before trying to get the Monte Carlo stuff to work. If I can't get it running in 10-50 ms or so, then the Monte Carlo idea is probably shot anyways. (I don't know how fast it runs right now.) If you're interested, I've set up a GitHub repo with my progress.

Quick question: Is there any way to use the --load and --ai-config flags together? I've set up a bunch of test scenarios which are snapshots before a big battle in a game with two human opponents, and I would like to swap my test AIs in to see what they do.

If not, I guess I can change the controllers in the save-game files using a sed script or something.

Re: monte carlo simulations for AI?

Post by Crow_T »

I'm totally out of my league here, but from my 3D days I recall that QMC was fast and still had good results (https://en.wikipedia.org/wiki/Quasi-Monte_Carlo_method). I'm guessing, though, that these are part of your tests...

Re: monte carlo simulations for AI?

Post by iceiceice »

Hi, I thought I would post an update.

So a big part of my idea was that you can use linear programming to approximately figure out, e.g., where the weak point of the enemy line is, which unit is best to hit first, and things like this. I have not used linear programming packages before, but they are supposed to be highly developed and super fast nowadays; a "small" program has thousands of constraints and variables.

They say that you cannot predict how fast you can solve a program based on size alone; the structure of the LP matters. So some testing needed to be done to see exactly how fast they would be for the LPs I am getting from Wesnoth, and whether that would be good enough.
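
To give a flavor of what such an LP can look like, here is a toy attack-assignment formulation (just an illustration, not the exact LP I'm building). Let x_{ue} be the fraction of attacker u's turn spent on enemy e, g_e the gold value of enemy e, and d_{ue} the expected fraction of e's hitpoints that u removes:

\begin{align*}
\max \quad & \sum_e g_e\, k_e \\
\text{s.t.} \quad & k_e \le \sum_u d_{ue}\, x_{ue} && \text{expected kill fraction of } e \\
& k_e \le 1 && \text{cannot kill a unit more than once} \\
& \sum_e x_{ue} \le 1 && \text{each attacker acts at most once} \\
& x_{ue},\ k_e \ge 0 .
\end{align*}

In a formulation like this, an enemy dying or surviving between playouts just adds or deletes that enemy's rows, which is the kind of cheap LP reuse described below.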

Preliminary testing shows that if I run it on LPs based on the position at a typical climactic battle from one of Cackfiend's or Amikrop1's ladder games, it builds the LP in 10 ms and solves it in < 1 ms. I've tested it on 8 different replays. That is excellent; it means that solving the LP is usually going to be negligible compared to just writing it down. So I think using LPs for Monte Carlo techniques has promise, especially because if you want to solve hundreds of LPs in a turn, they are often related. For example, if you attack a unit and don't kill it, usually that just means you delete a few rows from the program or something like this, and then you are set for the next turn. So the cost to build can be amortized over many turns as well, which means you could potentially run simulations which solve thousands of LPs in a second, if it was all coded up carefully.

The next step is to set up the Monte Carlo stuff and write the AI that is actually supposed to use the LPs to play wisely. Right now all I do is solve them and print debugging output.

Re: monte carlo simulations for AI?

Post by iceiceice »

Ok, time for another update --

Good news: I got the basic Monte Carlo idea suggested by JaMiT to work.

Bad news: Right now the proof-of-concept AI takes about 10-20 seconds per move in the most complicated positions, like the first one attached, even with debugging output turned off on the helper/simulation AI. (Normally it writes the LP to temporary files to help with debugging.) Also it crashes, I think because I have a memory leak somewhere ><.

Good news: The proof-of-concept AI is far from optimized -- it does not do any LP caching, and I have not used the special optimization features of the LP solver library. Right now it reallocates the entire matrix every time I add a row; there are options to make it preallocate instead, which allegedly speeds things up by at least an order of magnitude. I believe that is the bottleneck in overall program performance right now, so it may be possible to get a two-orders-of-magnitude performance increase after optimizations. Also, maybe I can write a modified copy constructor for unit_map which doesn't mind if the input has some active iterators into it. If not, I think 5 seconds / move might be an acceptable goal for this project.

Bad news: The proof-of-concept AI seems like it might actually be worse than the helper AI it is supposed to be boosting... I think I need to increase the number of simulations to drown out the noise ><, so that is something to fight against also.

Good news: On less complicated but still quite complicated positions it seems to be running just fine.

Note: The helper AI is also still far from complete; it does not understand leadership, backstab, or berserk, or that sometimes you should move a unit off the front line to make way for a different unit. But the essence of it is close to what I have imagined.

Also, the Monte Carlo AI is not currently playing the side of the enemy in the simulations, like I want, because I can't do that until I write a copy constructor for play_controller.
Attachments
mc_ai_g2.gz - Monte Carlo AI on the less complicated example (v1.11.7)
helper_replay_g2.gz - helper AI on a less complicated example (v1.11.7)
helper_ai_example.gz - my most complicated replay, where the helper AI runs fine but the Monte Carlo AI crashes (v1.11.7)

Re: monte carlo simulations for AI?

Post by iceiceice »

Crow_T wrote:I'm totally out of my league here, but from my 3D days I recall that QMC was fast and still had good results (https://en.wikipedia.org/wiki/Quasi-Monte_Carlo_method). I'm guessing, though, that these are part of your tests...
Hey, I just looked at this. That's actually pretty cool; I didn't know about it. I'm not sure if it's appropriate here, however.

The idea is that if a function is "smooth" or "simple" in some sense, e.g. it describes a 3D shape or is some smooth function, then I can estimate the integral / volume using low-discrepancy samples instead of truly random samples, and the estimate will still be good. But it can't speed up arbitrary integrals -- suppose the function I want to integrate is f(x) = 1 if x is in the quasirandom sequence and f(x) = 0 otherwise. Then it wouldn't converge better / there would be the potential for massive error. They write that the error can be bounded in terms of some quantity V(f), which measures the variation of f in some sense. So if the variation is small it works, and otherwise you are boned.
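
Concretely, the bound in the article is the Koksma-Hlawka inequality:

\left| \frac{1}{N}\sum_{i=1}^{N} f(x_i) - \int_{[0,1]^d} f(u)\,du \right| \;\le\; V(f)\, D_N^{*}(x_1,\ldots,x_N)

where V(f) is the variation of f (in the Hardy-Krause sense) and D_N^{*} is the star discrepancy of the sample points. Low-discrepancy sequences make D_N^{*} small, so the bound only helps when V(f) is finite and modest.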

The function I'm trying to integrate by Monte Carlo is "quality of AI performance" as a function of the sequence of random bits given by the RNG. I really have no idea if that is going to have small or large variation... but I don't think the function is going to be at all simple, and it definitely won't have uniformly bounded variation, in the sense that if you change one or a few bits from the RNG it can sometimes totally change the course of the game.

I guess I could just try it and see what happens. If the QMC AI plays better, then for sure I'll keep the patch.

Re: monte carlo simulations for AI?

Post by taptap »

In Go there has been a lot of success with Monte Carlo based AIs recently, but if you add multiple moves per turn + randomness in the results + partial moves + different sequences of moves etc., you will need a lot more to get anywhere similar to Go.

Personally, I believe the AI could already gain a lot of strength if it were possible to shuffle / sort the move order according to the situation. That is much easier to do, but it would be a new mechanism to play with for those programming the AI.
I am a Saurian Skirmisher: I'm a real pest, especially at night.

Re: monte carlo simulations for AI?

Post by iceiceice »

taptap wrote:In Go there has been a lot of success with Monte Carlo based AIs recently, but if you add multiple moves per turn + randomness in the results + partial moves + different sequences of moves etc., you will need a lot more to get anywhere similar to Go.

Personally, I believe the AI could already gain a lot of strength if it were possible to shuffle / sort the move order according to the situation. That is much easier to do, but it would be a new mechanism to play with for those programming the AI.
That's interesting; I know very little about Go AIs... It seems like Monte Carlo would be quite hard to use in a game like Go, and at least to me it seems that Monte Carlo should be quite useful in Wesnoth. In Go, a single stone can dramatically affect the health of many stones which are very far away. In Wesnoth that's just not the case: a single move only directly affects one unit on its turn (although of course a unit can have macro implications). Chance plays a role in Wesnoth, but in a large battle it plays less and less of a role, so Monte Carlo is a natural way to test whether a plan is good or not. Go positions seem quite volatile, so it's not clear that Monte Carlo will quickly lead to good results (depending on exactly what you have in mind, of course). Edit: I read some more about this... interesting, but my guess is this playthroughs idea won't be good for Wesnoth.

The core of what I am trying to do is get the AI to build good plans on its turn, and then use Monte Carlo to measure how good an outcome they might lead to. I'm finding that you can use linear programming techniques to get the AI to quickly build a tactical plan that is pretty good -- you don't need Monte Carlo for that to be useful, but I think it has the potential to make it very useful.

This stuff is also only relevant to managing a large battle with, say, 10-15 units on each side. For a small skirmish, e.g. figuring out how to keep the enemy scout away from my villages by threatening to ZoC it, I will probably have to think up a totally different routine. Also, I don't have a good idea right now of how to get the AI to manage strategy, like which sides to reinforce and such. It would be nice if there were some unified procedure, but I don't really see it yet.