AI based on Deep Learning

Discussion of all aspects of the game engine, including development of new and existing features.

Moderators: Forum Moderators, Developers

mattsc
Posts: 1063
Joined: October 13th, 2010, 6:14 pm
Location: Wandering, mostly aimlessly

Re: AI based on Deep Learning

Post by mattsc » December 19th, 2017, 12:53 am

milwac wrote:This is an excellent resource mattsc, thank you! While I look through this code, perhaps I can be a bit more specific about what I'm looking for right now, and ask : are there some simple functions that I can write in a script in Python or Lua to access in game properties like - side to move, villages each side has, types of units available, positions of units, reachable hexes for a unit etc.? I wish to visualize these properties while a game/replay is being played, and see which ones could be useful as input features, before I think about the AI code itself.
I don't know about python, but a lot of this is built-in Wesnoth Lua functionality. Go to LuaWML and check out the links in Section 4. For example:

- side to move: wesnoth.current.side
- villages: wesnoth.get_villages
- types of units: wesnoth.unit_types
- reachable hexes: wesnoth.find_reach

If you want more complex functions, there's a good chance that I or somebody else has already written something in our own AI experiments. Just ask, it might exist. These are very different types of AI though, so likely not much help to what you want to do otherwise.

The bigger problem I see is that you say you want to visualize this while you are playing a replay. The way Wesnoth works is that anything you want to see in a replay needs to be there while the game is being "recorded" (played). So while that is possible (some of my AIs have that type of debugging mechanism built in; see, for example, eval_exec_CA.lua), it has to be present when the game is started and cannot be applied after the fact to previous replays. Well, it is possible, but you would have to add it by editing the replay files. Writing a script to do so automatically is, of course, a possibility.

User avatar
milwac
Posts: 29
Joined: April 9th, 2010, 11:40 am

Re: AI based on Deep Learning

Post by milwac » January 2nd, 2018, 2:03 pm

Thanks mattsc for all your help (on this thread and offline) in understanding how a new AI can be written in Wesnoth and how the AI functions in general.

So far I have come to the conclusion that simulating a sequence of moves (in a single turn or otherwise) is not possible with the current AI framework provided in Lua. It is only possible to evaluate the current state of the game and perform some actions (moves) on it, but it is not possible to copy over the current state, perform actions on that state, evaluate the final modified state and then perform the actual actions on the original state. This process is essential for any learning algorithm to work, in fact, it is essential for any algorithm which tries to 'look ahead' for instance, minimax. It is also essential to perform any sort of optimization algorithm to optimize actions of a single turn (recruitment or unit placement).

This has been my observation. mattsc did suggest to me in a pm that it might be possible to write a wrapper around the evaluation and execution functions of the candidate actions, and call them.. but i wonder if there is a possibility to save a copy of the game state and run these wrapper functions on that copy, in such a way that the standard wesnoth.<xyz> queries can be made on the copy. So far, I think not.

If someone thinks that there is a way to perform simulations like the way I want, please do let me know.

Cheers

neverEnough
Code Contributor
Posts: 1
Joined: June 27th, 2011, 1:08 am

Re: AI based on Deep Learning

Post by neverEnough » January 28th, 2018, 4:34 am

milwac wrote:[..]Another key feature of reinforcement learning is 'delayed reward'. When I said, killing the opponent's leader is the aim of an N v N game - the delayed reward in this case is the leader-kill. The algorithm is trained in such a way that all situations that can lead to a leader-kill are given high values and all situations that lead to such situations are also given high values and so on. Of course there can be good and bad moves in between and a match can be lost with a single good or bad move. Such patterns are accounted for by sampling a lot of possible directions probabilistically from a given state of the game and all those future states evaluated to determine whether the current state is a good state to be in, or not. As we know, Wesnoth has a a huge branching factor, and so how we deal with this is still up for discussion at this point. But the core idea is - yes, it is possible to assign intermediate rewards to any given state of the game and determine how good or bad it is for a said player.
Hi,
I'm not experienced but as player I can tell that any reward may be reduced to gold (aside from leader kill ofc). Collecting and holding villages has a numerical value as much as killing an unit which have costed some amount on the enemy side. On Isar's for example, detecting an easy kill and reaching it at turn 4-5, say a naga, will weight exactly -14 gold for the enemy ability to reach his future goal. Pretty much better than locking 2 units on 2 villages and increasing the total game income by .. mumble ..+4 i think. Not sure if opponent's upkeep factor should be considered, probably not.
Given this statement, everything leads to valuating the risk of -future- attacks (even from the first turn!), moves to reduce risks which doesn't exclude "don't move at all" and how much resources put in every plan which is matter of enough ai train.

An interesting point to me is the wesnoth rng and how tactical choices are rewarded in ai's learning since the same assumptions trough the game will lead to different results, so the ai growth should be less elastic than alphago or chess implementations. I would like to read more brainstormings.
Anyway, thx for trying it

scales
Posts: 2
Joined: September 20th, 2018, 10:31 am

Re: AI based on Deep Learning

Post by scales » September 20th, 2018, 10:52 am

Hi,

I had an idea to develop a learning AI for Wesnoth, and apparently was not the only one with this idea :)
milwac wrote:
January 2nd, 2018, 2:03 pm

So far I have come to the conclusion that simulating a sequence of moves (in a single turn or otherwise) is not possible with the current AI framework provided in Lua. It is only possible to evaluate the current state of the game and perform some actions (moves) on it, but it is not possible to copy over the current state, perform actions on that state, evaluate the final modified state and then perform the actual actions on the original state. This process is essential for any learning algorithm to work, in fact, it is essential for any algorithm which tries to 'look ahead' for instance, minimax. It is also essential to perform any sort of optimization algorithm to optimize actions of a single turn (recruitment or unit placement).

Cheers
Do you think it is necessary for Reinforcement Learning algorithms? From what I gathered, one would need to save the current state and action, continue for the next state, and then update the return value for the previous state. Would that be possible?

User avatar
milwac
Posts: 29
Joined: April 9th, 2010, 11:40 am

Re: AI based on Deep Learning

Post by milwac » September 20th, 2018, 11:46 am

scales wrote:
September 20th, 2018, 10:52 am
Do you think it is necessary for Reinforcement Learning algorithms? From what I gathered, one would need to save the current state and action, continue for the next state, and then update the return value for the previous state. Would that be possible?
What you are referring to is a play-through. Random play-throughs in Wesnoth would be impractical (if not impossible) to learn from as you'd probably need billions of games to even get basic strategies right. Random play throughs won't even work for chess or anything much more simpler than Wesnoth. Also random actions are very erratic - even if one were able to run billions of random games on a super computer - it is hard to reward a sequence of moves as 'good' or 'bad'. Sometimes a blunder after 20 good moves decides the game (towards a loss most probably) and all 20 good moves are scored as 'bad' moves.

So in order to make a great AI learn, you need to make it play as good as possible from what it has already learnt. Initially it plays random of course, but over the course of time it develops a 'value' function for the current state of the game given a bunch of inputs (gold, units, levels, positions of units, enemy units, leader safety and so on) which it then uses in a number of simulations starting from a given state. Then after the simulations are over, it picks the move with the best average 'value' arising from that move. After the game is over, the 'value' function can be improved using the moves of this game which was just played. This goes on for several games.

I am myself not an expert in the gory details of reinforcement learning algorithms, but AFAIK this is how algorithms like AlphaZero etc work. To put things into perspective :

Stockfish evaluated millions of positions from any given position before making a move while AlphaZero only evaluated about 80,000 positions. Still AlphaZero was able to beat Stockfish comprehensively. It's not about how many states you can see ahead, it's about which ones you 'choose' to see.

From my brief look into the AI framework 9 months or so ago, I didn't find any way to save the current Wesnoth state, and simulate playing the game from that state without actually playing the game. I'm sure this should be possible to implement, but I am not sure how much of an effort it'll be. At some stage however, if some AI giants were to take this up as a challenge (especially since general game playing AIs are here now and the DOTA2 AIs are beating the best humans already), I'm sure someone out there will be very interested that such a feature be made available in the API.

User avatar
milwac
Posts: 29
Joined: April 9th, 2010, 11:40 am

Re: AI based on Deep Learning

Post by milwac » September 20th, 2018, 12:30 pm

neverEnough wrote:
January 28th, 2018, 4:34 am
Hi,
I'm not experienced but as player I can tell that any reward may be reduced to gold (aside from leader kill ofc). Collecting and holding villages has a numerical value as much as killing an unit which have costed some amount on the enemy side. On Isar's for example, detecting an easy kill and reaching it at turn 4-5, say a naga, will weight exactly -14 gold for the enemy ability to reach his future goal. Pretty much better than locking 2 units on 2 villages and increasing the total game income by .. mumble ..+4 i think. Not sure if opponent's upkeep factor should be considered, probably not.
Anyway, thx for trying it
This is certainly a good first approximation for a reward function. Gold/Income/Upkeep in Wesnoth is equivalent to material advantage in games like Chess. Sometimes though there might be strategic 'sacrifices' - like a poisoned village which may be very dangerous to take. Also a leaderkill as a last ditch attempt at pulling off a sneaky win from a hopelessly lost position. Such situations need to be taken into account (the lookahead should be strong enough to see that the small loss of gold now leads to large gain later). I also feel gold + cost of total on field units is a good reward function for starters.

scales
Posts: 2
Joined: September 20th, 2018, 10:31 am

Re: AI based on Deep Learning

Post by scales » September 20th, 2018, 2:05 pm

milwac wrote:
September 20th, 2018, 11:46 am
scales wrote:
September 20th, 2018, 10:52 am
Do you think it is necessary for Reinforcement Learning algorithms? From what I gathered, one would need to save the current state and action, continue for the next state, and then update the return value for the previous state. Would that be possible?
What you are referring to is a play-through. Random play-throughs in Wesnoth would be impractical (if not impossible) to learn from as you'd probably need billions of games to even get basic strategies right. Random play throughs won't even work for chess or anything much more simpler than Wesnoth. Also random actions are very erratic - even if one were able to run billions of random games on a super computer - it is hard to reward a sequence of moves as 'good' or 'bad'. Sometimes a blunder after 20 good moves decides the game (towards a loss most probably) and all 20 good moves are scored as 'bad' moves.

So in order to make a great AI learn, you need to make it play as good as possible from what it has already learnt. Initially it plays random of course, but over the course of time it develops a 'value' function for the current state of the game given a bunch of inputs (gold, units, levels, positions of units, enemy units, leader safety and so on) which it then uses in a number of simulations starting from a given state. Then after the simulations are over, it picks the move with the best average 'value' arising from that move. After the game is over, the 'value' function can be improved using the moves of this game which was just played. This goes on for several games.
I did mean play-throughs, though not random ones. Actually I was thinking along the ideas you have outlined. This should be theoretically possible to implement, if there is a way of saving state-action pairs and associate values with them. I am not familiar with Wesnoth code though, so not sure if this is possible. I will take a look if I can find some time for this.

User avatar
Dugi
Posts: 4846
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains
Contact:

Re: AI based on Deep Learning

Post by Dugi » September 21st, 2018, 7:21 am

My friend is developing a self-learning bot for Starcraft (the old one, that has a library for extracting game state from RAM) as part of his PhD thesis. They found strategy games to be particularly difficult to get create bots for because of the extreme variance of game states, there is an unlimited number of unit types, units have health in addition to that, the game area is quite large etc. The vector of game states is way greater than in chess or Go. They described the game state with a vector with 4096 values, I don't know what values he took there and I assume that he simply didn't deal with all units if there was too many of them. They used a Markov matrix-based learning system rather than a neural network (he was doing it in Java and it was too slow to use a neural network in real time, but maybe he was just a bad programmer). The resulting AI was beaten by noobs in fair games, but defeated all other bots they tried. The AI also made 100.000 actions per second while humans usually do far less, like 5. So this may be a really challenging task.

I think that comparisons with AlphaZero and the DotA bot might be irrelevant, because AlphaZero played a so many games that no human can play it in a thousand lifetimes. It didn't use logic or any other understandable approach, nobody saw any meaning in its moves until it won. I don't know much about DotA, but the bot who has beaten humans might have been using a build that can make good use of inhuman reactions and timing. It could be an analogy to those undefeatable bots in FPS games that always aim correctly for the head while running and using a shaky weapon.

This is no real time game and only one move is made at a time, so there's more time to compute but the computer can't make any use if its naturally better reactions and precision. Yet, I see a big problem at mapping the game state to the entrance layer of a neural network. Some neural networks can adjust its number of neurons while learning, but once learned, they cannot adjust. Maybe if the network was convoluted, with some sections repeating, it could handle maps of any size (by spreading out the repeating parts), but there's still a lot of other variance. But this would already require quite a special AI algorithm since typical AI learns from manually set results while this one would have to learn from playing against itself.

Post Reply