GSoC project idea: measure how well units are balanced

Telcontar
Posts: 56
Joined: April 4th, 2013, 1:55 am

GSoC project idea: measure how well units are balanced

Post by Telcontar » November 21st, 2014, 6:05 am

Hi all,
I posted this message on the developers' list a few days ago, but maybe some people only look at this forum, so I am posting an edited version here:

During the Google Summer of Code (GSoC) reunion I had a project idea. It was based on the fact that it is difficult to fine-tune the strength of the units.
For example, recently the Footpad and the Dwarvish Lord have been weakened a bit from their original spec.

I wonder if some automatic analysis can show us units that are potentially too strong/too weak? Of course humans would make the final decision.

I see a couple of ways to analyze the strength of units automatically:

(1) The Great Tournament:

The idea is to have units stand right next to each other and fight to the death. This is done for each possible terrain a unit can stand on.
The result (chance of survival) gives us an indication of the relative strength of a unit against another one.
To be fair, the results have to be weighted by how frequently a unit appears in a scenario using that terrain.

(a) Duels (1 on 1 fights).

This is the most straightforward version. We would of course run this outside the Wesnoth engine, without any GUI and animations. For each
possible terrain tile, a battle would be generated for each possible pairing of units. Of course each battle would have to be run twice
(each unit comes first in one battle). Each round would involve the best attack, and typically generate a histogram of outcomes. For each
of the possible outcomes in that histogram the next round would again generate a histogram; histograms can be merged after each round.

This would then give us a percentage of cases where each unit eventually survives or dies. For example, if we take an Iron Mauler on
a castle tile vs. a Shaman in water, the outcome is probably 99%/1%. Maybe in a different setting, the Shaman has a 2% chance of survival. :-)
This shows us that the Shaman is a weak fighter (which is obvious but automating this shows us such stats for each pairing of units).
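
To make the histogram idea concrete, here is a minimal sketch in Python. It is not the Wesnoth combat code: the `Fighter` class and its fields are hypothetical, each unit has a single attack, and specials, healing and time of day are ignored. It assumes strikes alternate within a round, starting with the unit whose turn it is, and it keeps one histogram over (HP of A, HP of B) that is merged after every strike.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fighter:
    """Hypothetical, simplified unit: one attack, no specials, no healing."""
    hp: int
    damage: int        # damage per successful strike
    strikes: int       # strikes per combat round
    hit_chance: float  # chance that each strike lands on the opponent


def fight_round(dist, first, second):
    """Play one combat round on a histogram {(hp_first, hp_second): prob}.

    Strikes alternate, starting with `first`; a decided fight is left alone.
    """
    order = []
    for i in range(max(first.strikes, second.strikes)):
        if i < first.strikes:
            order.append(0)
        if i < second.strikes:
            order.append(1)
    for striker in order:
        unit = first if striker == 0 else second
        merged = defaultdict(float)
        for (hp_a, hp_b), p in dist.items():
            if hp_a == 0 or hp_b == 0:  # one unit is already dead
                merged[(hp_a, hp_b)] += p
                continue
            if striker == 0:
                after_hit = (hp_a, max(0, hp_b - unit.damage))
            else:
                after_hit = (max(0, hp_a - unit.damage), hp_b)
            merged[after_hit] += p * unit.hit_chance
            merged[(hp_a, hp_b)] += p * (1.0 - unit.hit_chance)
        dist = merged
    return dict(dist)


def duel(a, b, max_rounds=50):
    """Return (P(a wins), P(b wins), P(undecided after max_rounds))."""
    dist = {(a.hp, b.hp): 1.0}
    for rnd in range(max_rounds):
        if rnd % 2 == 0:  # a attacks on even rounds
            dist = fight_round(dist, a, b)
        else:             # b attacks: swap roles, then swap the keys back
            swapped = {(hb, ha): p for (ha, hb), p in dist.items()}
            swapped = fight_round(swapped, b, a)
            dist = {(ha, hb): p for (hb, ha), p in swapped.items()}
    a_wins = sum(p for (ha, hb), p in dist.items() if hb == 0)
    b_wins = sum(p for (ha, hb), p in dist.items() if ha == 0)
    return a_wins, b_wins, 1.0 - a_wins - b_wins


# Made-up stats, not taken from any real unit.
strong = Fighter(hp=50, damage=20, strikes=2, hit_chance=0.7)
weak = Fighter(hp=18, damage=3, strikes=2, hit_chance=0.4)
strong_first = duel(strong, weak)
weak_first = duel(weak, strong)
```

Running `duel` in both orders covers the "each unit comes first in one battle" requirement. The terrain would enter through `hit_chance`, which depends on the tile the opponent stands on.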

(b) Group battles (one unit each from a faction?)

Maybe something similar can be done for a group of units, fighting in a hedgehog formation against each other. However, the number of tile
combinations is very big then (maybe all units of one faction would stand on the same type of tile each time). We would also have to move
units when a unit dies, to continue the fight.

(c) Weighting.

Dwarvish units do poorly in forests, unlike elves; but dwarves appear often in scenarios with mountains and caves, so the result of a dwarf
fighting on such terrain would be given much greater weight. This will avoid giving the impression that dwarves are weak because they have
trouble in water/forests, etc.
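
As a rough illustration of the weighting, the per-terrain results could be combined like this. The terrain names, survival chances and frequencies below are invented for the example; real values would come from the duel results and from the map statistics.

```python
def weighted_score(survival_by_terrain, terrain_frequency):
    """Average per-terrain survival chances, weighted by how often the
    unit meets each terrain. Terrains without frequency data get weight 0."""
    total = 0.0
    total_weight = 0.0
    for terrain, survival in survival_by_terrain.items():
        weight = terrain_frequency.get(terrain, 0.0)
        total += weight * survival
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no overlapping terrain data")
    return total / total_weight


# Invented numbers for a dwarf-like unit.
survival = {"mountain": 0.70, "cave": 0.65, "forest": 0.35, "water": 0.20}
# Invented counts of how often the unit meets each terrain in scenarios.
frequency = {"mountain": 40, "cave": 35, "forest": 15, "water": 10}

weighted = weighted_score(survival, frequency)
unweighted = sum(survival.values()) / len(survival)
```

With these made-up numbers the weighted score comes out at about 0.58, against an unweighted mean of about 0.475, which is the effect described above: poor results on rarely seen terrain no longer drag the unit down.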

(2) The Great Race:

Similar idea, but the goal is to reach a far-away point across lots of tiles (again not using the GUI). The tiles would again be weighted by
appearances of a given unit in maps with a given tile. We can then count how many turns it takes for a unit to cross, say, 1000 tiles.
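
A minimal sketch of such a race, under simplifying assumptions: the unit walks a straight line of tiles, a tile can only be entered if enough movement points remain in the current turn, and leftover points are lost. The function names, terrain names and movement costs are hypothetical.

```python
import random


def turns_to_cross(track, move_cost, moves_per_turn):
    """Count the turns needed to walk a list of terrain names in order."""
    turns = 1
    remaining = moves_per_turn
    for terrain in track:
        cost = move_cost[terrain]
        if cost > moves_per_turn:
            raise ValueError(f"unit can never enter {terrain!r}")
        if cost > remaining:  # not enough points left: wait for next turn
            turns += 1
            remaining = moves_per_turn
        remaining -= cost
    return turns


def random_track(terrain_frequency, length, seed=0):
    """Draw a track whose terrain mix follows the given frequencies."""
    rng = random.Random(seed)
    terrains = list(terrain_frequency)
    weights = [terrain_frequency[t] for t in terrains]
    return rng.choices(terrains, weights=weights, k=length)


# Invented terrain mix and movement costs for two unit types.
frequency = {"grass": 50, "forest": 30, "hills": 20}
track = random_track(frequency, length=1000)
elf_like = turns_to_cross(track, {"grass": 1, "forest": 1, "hills": 2}, 5)
dwarf_like = turns_to_cross(track, {"grass": 1, "forest": 3, "hills": 1}, 4)
```

Using the same seeded track for every unit keeps the comparison fair; averaging over several seeds would smooth out lucky or unlucky tile sequences.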

(3) Result:

We can compare the overall scores of (1) and (2) against the recruitment and XP costs. Maybe we can see a trend? Maybe we can get a
better intuition for how much special abilities like heal, poison, ambush, charge, etc. are worth?
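
One simple way to look for such a trend is to fit a straight line of score against cost and list the units that sit far from it. This is only a sketch: a linear relationship is an assumption, and the unit names, costs and scores below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(units, threshold):
    """units: {name: (cost, score)}.

    Return {name: residual} for units whose score lies more than
    `threshold` above or below the fitted trend line.
    """
    names = list(units)
    costs = [units[n][0] for n in names]
    scores = [units[n][1] for n in names]
    slope, intercept = fit_line(costs, scores)
    residuals = {n: units[n][1] - (slope * units[n][0] + intercept)
                 for n in names}
    return {n: r for n, r in residuals.items() if abs(r) > threshold}


# Invented data: four units on a common trend, one suspiciously strong.
units = {
    "unit_a": (10, 20), "unit_b": (20, 40),
    "unit_c": (30, 60), "unit_d": (40, 80),
    "unit_e": (25, 90),
}
flagged = flag_outliers(units, threshold=20)
```

A positive residual hints at a unit that is strong for its cost, a negative one at a weak unit. The gap may also simply reflect an ability the simulation does not model, which is exactly the "what is healing worth" question.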

I think it would be interesting to get some insight on whether we can find possibly overpowered/underpowered units in such a way. It would give us a way to express this numerically, which could help to balance new units against existing ones.

In terms of programming, I think this is very feasible for a summer project:

(1) Extraction of unit stats.

(2) Extraction of map stats for all built-in maps and campaigns (together with the recruit list). This may be tricky for cases where
the recruit list in a campaign changes over time. Maybe the WML can be analyzed for such changes?

(3) Implementing the virtual fights/races. Perhaps the Wesnoth engine can be used in that way, as long as we don't use the GUI for millions
of simulations, as that would be too slow.

(4) Analysis/visualization of the results. This may be non-trivial as there are hundreds of units so this generates a lot of data.

tekelili
Posts: 1038
Joined: August 19th, 2009, 9:28 pm

Re: GSoC project idea: measure how well units are balanced

Post by tekelili » November 21st, 2014, 6:20 am

I have not thought deeply about all the benefits of your idea, but my first impression is that you could get cleaner data by asking experienced players who regularly play World Conquest. That campaign offers a large pool of units to join the player's army and forces players to fight very mixed enemy armies. As recall plays a major role, players tend to "abuse" overpowered units by making them as large a fraction of their army as they can. It also looks like a better field to measure the "strength" of units with specials such as heal, slow or poison. Just my 2 cents.

Edit: Also note that from a human vs. human perspective, the Orcish Grunt is an overpowered unit due to its HP-ZOC/gold ratio and village-stealing threat. That would also be pretty hard to measure in a simulation.
Be aware English is not my first language and I could have explained bad myself using wrong or just invented words.
World Conquest II

ancestral
Developer
Posts: 1108
Joined: August 1st, 2006, 5:29 am
Location: Motion City

Re: GSoC project idea: measure how well units are balanced

Post by ancestral » November 21st, 2014, 1:54 pm

Telcontar, I thought hard about this, because I don’t think this is anything that has been explored holistically. What I mean to say is, you need lots of actual game results before you can prove any points.

Also, before beginning, we need to define what “imbalanced” is.
  • Is a unit that is chosen most or least often by human players perhaps imbalanced?
  • Is a unit that deals more damage imbalanced?
  • Is a unit that is able to kill more often imbalanced?
  • Is a unit that costs less and levels up faster imbalanced?
I think many people will easily say it depends. Perhaps it’s natural to have more fighters than mages. Some units offer higher risk vs. reward which is expected. Others are intended purely in situational usage. Many choices may depend even on play style. There are other factors, too: one’s skill level with a faction, the number of opponents, whether the opponents are human or AI, the type of factions a player is facing, the makeup and size of the map, the time of day the game starts with, the timer length… it's impossible to account for everything.

So what if we neutralized or minimized most of these variables, or just averaged them over a wide array of situations? What then? The question we need to ask is: why should we choose one unit over all others? Or, what is the importance of the unit’s relationship to the faction? This should tell us if a unit is imbalanced. How do we determine this value? Let’s see if this problem has come close to being solved before in the real world…

In baseball, individual statistics have been kept for well over 100 years. Batting average, or the percentage of the time a player can hit a ball with a bat, is a common, individual stat, and an important one, too: in order to win a game you need more runs than the opponent; to score a run, you need to get a hit.[1] But although hitting is important, it's just one part of baseball, right? Pitching, fielding, running, even staying healthy are just some of the other helpful attributes for a player, and most people aren’t and cannot be great at everything. So players play to their strengths, and that creates roles (a power hitter, a slap hitter, a relief pitcher, a utility fielder, etc.)

All right, so in Wesnoth, all units aren’t equal, and each unit plays a role too. How can we gauge a unit’s power, or importance to its team? If instead of recruiting that fighter you recruited any other unit, would you fare better or worse? To answer that question, we turn back to baseball.

WAR, or Wins Above Replacement, is a relatively new value that baseball writers and fans use that attempts to define a player’s contribution to the team. It’s shown to provide very strong correlation to a team’s actual win-loss record. There isn’t one standardized formula yet, but the variations all use the generation and prevention of runs as the metric to define the value (runs from hitting, running and fielding). How was the formula created? From going over baseball data from professional baseball seasons, then fine-tuned per position role.

So let’s say each role is a unit, hits are damage and runs are kills. If you can gather enough data, then you can find, per unit, which are more valuable to a faction; and from that, find which units are overpowered and which are underpowered. (Again, this need not be perfect; we’re not going after perfection, which is practically impossible. We’re simply trying to find correlation.)

Once you’ve concluded which units are underpowered or overpowered, then maybe some suggestions can be made on numbers to tweak. Most likely it’s going to take some experimentation, repeated computations and plenty more data to get better.

Where do you find the data? I’m not sure we really have enough from players. Perhaps a place to start is by simply running simulations with AI players. (It may be that AI and human behavior are correlated, and it may be they’re not, but at least it’s a starting point. I still question whether the AI is equipped well to work with all the units, but there’s only one way to find out.) Keep track of helpful stats; off the top of my head, maybe things like team wins, ties and losses, times in combat, defense bonus, terrain bonus, resistance, kills, damage dealt, damage possible, villages captured, turns, turns after death, experience and healing points per unit. (You might need to record these using an add-on.) Run 100 simulations of 2 vs. 2 for each of the 16 two-player built-in maps, rotating through each of the 49 different faction combinations. (Maybe put a turn limit of 30 on them?) That should give you a lot of data to analyze.
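
To get a feel for the size of that batch, here is a small sketch that enumerates the jobs. The map and faction names are placeholders, and reading 49 as 7 x 7 ordered pairings (mirror matches included) is an assumption made for the example.

```python
from itertools import product

GAMES_PER_PAIRING = 100
TURN_LIMIT = 30
# Placeholder names; the real lists would come from the game data.
MAPS = [f"map_{i:02d}" for i in range(1, 17)]      # 16 two-player maps
FACTIONS = [f"faction_{i}" for i in range(1, 8)]   # assumed: 7 factions


def schedule():
    """Yield one job description per simulated game."""
    pairings = list(product(FACTIONS, repeat=2))   # 7 x 7 = 49 ordered pairs
    for game_map, (side1, side2) in product(MAPS, pairings):
        for run in range(GAMES_PER_PAIRING):
            yield {
                "map": game_map,
                "side1": side1,
                "side2": side2,
                "run": run,
                "turn_limit": TURN_LIMIT,
            }


total_games = sum(1 for _ in schedule())  # 16 * 49 * 100 = 78400
```

That is 78,400 games in total, so the simulations would have to run headless and the per-game stats would need to be written to something that can be aggregated afterwards.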

[1] Ignoring walks, hit-by-pitches and other exceptional hitting actions.
Last edited by ancestral on November 21st, 2014, 4:31 pm, edited 1 time in total.

tekelili
Posts: 1038
Joined: August 19th, 2009, 9:28 pm

Re: GSoC project idea: measure how well units are balanced

Post by tekelili » November 21st, 2014, 2:12 pm

ancestral wrote:Where do you find the data? I’m not sure we really have enough from players. Perhaps a place to start is by simply running simulations with AI players.
The problem I see with these kinds of ideas (even though I find them interesting) is that, realistically, BfW AI performance is close to computer chess playing in 1960... and you cannot expect to learn about chess from such "intelligence".

In other words, the AI will never tell you how good units are as healers, as it never rotates front units and regroups for healing.

Wintermute
Inactive Developer
Posts: 840
Joined: March 23rd, 2006, 10:28 pm
Location: On IRC as "happygrue" at: #wesnoth-mp

Re: GSoC project idea: measure how well units are balanced

Post by Wintermute » November 21st, 2014, 3:11 pm

Telcontar wrote:Hi all,
I posted this message on the developers' list a few days ago, but maybe some people only look at this forum, so I am posting an edited version here:

During the Google Summer of Code (GSoC) reunion I had a project idea. It was based on the fact that it is difficult to fine-tune the strength of the units.
For example, recently the Footpad and the Dwarvish Lord have been weakened a bit from their original spec.

I wonder if some automatic analysis can show us units that are potentially too strong/too weak? Of course humans would make the final decision.
Hi Telcontar, we may have talked about this over breakfast a bit? :D

I only have time for a short reply, so briefly: units these days are balanced (mostly) by multiplayer faction, rather than against each other. Thus units fill a role in the faction. The poacher is not the best archer when compared one on one to other archers, but he IS the best unit (aside from those with weak armor) that the Knalgan Alliance can put in a forest hex. As such, it's not ever going to do well compared one on one with other units, but it serves an important purpose within the faction. This complicates automated testing for balance.

In fact, I proposed a GSoC project for last summer that I thought *would* help quite a bit with both general balance and also for specific UMC authors to assess balance in their add-ons. Basically, we would be taking the replays that we already archive and making them available for data analysis purposes, somewhat like what you are writing about above. We had a student who was interested in it, and wrote a proposal here. Unfortunately (for us!) he chose to work on a project for another organization, but I will make a similar proposal next year and I hope we will have another good applicant for it!

EDIT: Fixed link.
"I just started playing this game a few days ago, and I already see some balance issues."

zookeeper
WML Wizard
Posts: 9739
Joined: September 11th, 2004, 10:40 pm
Location: Finland

Re: GSoC project idea: measure how well units are balanced

Post by zookeeper » November 21st, 2014, 3:29 pm

Indeed, there is no way to produce any meaningful information on balance by analyzing unit stats and/or AI matches.

What might be useful would perhaps be something like analyzing all matches (excluding special scenarios etc.) played on the MP server and checking if there are any units which are predominantly ignored by the winning side. It wouldn't say anything about faction balance as such, but it could help identify units which don't see much use (for whatever reason), so that in case the MP devs would like those units to be used more commonly (to make gameplay with/against that faction more varied), they could come up with some tweak to make them more preferable. But I wouldn't know whether the MP devs would actually find data like that useful or not.
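
A rough sketch of that check, assuming each archived match has already been reduced to the recruit lists of the winning and losing side. The record format and the unit names are hypothetical; extracting them from real replays is the hard part and is not shown.

```python
from collections import Counter


def recruit_shares(games):
    """games: iterable of {"winner": [unit, ...], "loser": [unit, ...]}.

    Return {unit: (share of winners' recruits, share of losers' recruits)}.
    """
    win, lose = Counter(), Counter()
    for game in games:
        win.update(game["winner"])
        lose.update(game["loser"])
    win_total = sum(win.values()) or 1   # avoid division by zero
    lose_total = sum(lose.values()) or 1
    units = set(win) | set(lose)
    return {u: (win[u] / win_total, lose[u] / lose_total) for u in units}


def ignored_by_winners(games, max_share=0.05):
    """Units that make up at most `max_share` of the winners' recruits."""
    shares = recruit_shares(games)
    return sorted(u for u, (w, _) in shares.items() if w <= max_share)


# Invented example with placeholder unit names.
games = [
    {"winner": ["unit_a"] * 9 + ["unit_b"],
     "loser": ["unit_a"] * 5 + ["unit_b"] * 4 + ["unit_c"]},
]
rarely_used = ignored_by_winners(games)
```

The shares would have to be computed per faction, since a unit can only be recruited by the faction that owns it.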

Wintermute
Inactive Developer
Posts: 840
Joined: March 23rd, 2006, 10:28 pm
Location: On IRC as "happygrue" at: #wesnoth-mp

Re: GSoC project idea: measure how well units are balanced

Post by Wintermute » November 21st, 2014, 4:19 pm

zookeeper wrote:Indeed, there is no way to produce any meaningful information on balance by analyzing unit stats and/or AI matches.

What might be useful would perhaps be something like analyzing all matches (excluding special scenarios etc.) played on the MP server and checking if there are any units which are predominantly ignored by the winning side. It wouldn't say anything about faction balance as such, but it could help identify units which don't see much use (for whatever reason), so that in case the MP devs would like those units to be used more commonly (to make gameplay with/against that faction more varied), they could come up with some tweak to make them more preferable. But I wouldn't know whether the MP devs would actually find data like that useful or not.
Just so. One of the hopes I had for the GSoC project above was that he would have time to also add enough of an interface to allow UMC folks (and anyone!) to pull up just this kind of data. On a certain map with these two factions, what is the breakdown of the units that are recruited? How often does player one as this faction win? Does it change much if they are player 2? Does recruiting or not recruiting a particular unit have a noticeable impact on win percentages? Some of the same kinds of questions it seems you are getting at with the WAR stuff, ancestral. Once we have a database for the replays we are already archiving now, then we can start to get answers to this type of question. I think that would be useful (or at the very least interesting!) for developers and players alike.
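
For the "recruiting or not recruiting a particular unit" question, the query could be as simple as the sketch below. It assumes the replay database can hand back, for one faction on one map, a list of (set of recruited units, won or not) records; that format and the unit names are made up for the example.

```python
def win_rate_split(records, unit):
    """records: list of (recruited_units: set, won: bool).

    Return (win rate when `unit` was recruited, win rate when it was not).
    A rate is None when there are no games in that group.
    """
    with_unit = [won for recruits, won in records if unit in recruits]
    without_unit = [won for recruits, won in records if unit not in recruits]

    def rate(results):
        return sum(results) / len(results) if results else None

    return rate(with_unit), rate(without_unit)


# Invented records with placeholder unit names.
records = [
    ({"unit_a", "unit_b"}, True),
    ({"unit_a"}, True),
    ({"unit_a"}, False),
    ({"unit_b"}, False),
    ({"unit_b"}, True),
]
with_a, without_a = win_rate_split(records, "unit_a")
```

A gap between the two rates is only a correlation: stronger players may simply prefer certain units, so the numbers would need a large sample and some care before anyone draws balance conclusions from them.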

I think a more data-driven approach along the lines of what Telcontar and ancestral are saying would be quite welcome, but it will be much more useful for balancing to use human data. That said, the same data might be really important to the AI folks when trying to figure out how to improve the AI: given that the AI is still trying to keep up with high-level human players, the more information we can give it about how those humans play the better. With regard to the sample size, we have years' worth of saved replays from games on the server. One issue is that those are spread across different versions of the game, and stats change over time... Running simulations to bulk up the sample would be a great idea except that tekelili makes valid points about its usefulness in practice. Still, over time, looking at win rates (or completion rates for some add-ons) on specific maps would be quite helpful, as would data on unit recruitment overall.

Telcontar
Posts: 56
Joined: April 4th, 2013, 1:55 am

Re: GSoC project idea: measure how well units are balanced

Post by Telcontar » November 25th, 2014, 11:23 pm

Wintermute wrote:
zookeeper wrote:Indeed, there is no way to produce any meaningful information on balance by analyzing unit stats and/or AI matches.

What might be useful would perhaps be something like analyzing all matches (excluding special scenarios etc etc) played on the MP server and checking if there's any units which are predominantly ignored by the winning side.
I think a more data driven approach along the lines of what Telcontar and ancestral are saying would be quite welcome but it will be much more useful for balancing to use human data. That said, the same data might be really important to the AI folks when trying to figure out how to improve the AI - given that the AI is still trying to keep up with high level human players, the more information we can give it about how those humans play the better.
Hi Wintermute,
I think you have some good points about using human-vs-human match data. In any case, the two approaches seem complementary:
  • Human-vs-human matches give us better information on the actual strength of a unit, but they do not tell us much about why players prefer some units over others. We just get a result (for example "berserkers, archers and mages are not recruited as often as normal fighters") without the reason ("these units need other units to protect them").
  • Simple statistics give us an insight into certain attributes (battle prowess, speed) of a unit, but they do not take into account formations or special abilities. However, the "gap" in the calculated "value" may help us to explain what healing etc. is actually "worth". In a sense, the simplicity of the analysis helps us to understand the result better.
I think the second approach may be particularly useful to help us gauge the strength of the non-mainline factions, which are not tested as much as mainline.

In a sense, I see the first approach (data from human matches) as a sort of "top-down" analysis, while the second one (a more straightforward and systematic automated comparison) as "bottom-up".
We could advertise both as potential GSoC ideas, and limit ourselves to carrying out one of them. I am OK with giving preference to analyzing match data if strong proposals exist on either topic.

However, I see one more issue with match data: Is it clear to the player that their data is stored on a server? I can imagine some players having gripes with that if they didn't know about it beforehand.

Wintermute
Inactive Developer
Posts: 840
Joined: March 23rd, 2006, 10:28 pm
Location: On IRC as "happygrue" at: #wesnoth-mp

Re: GSoC project idea: measure how well units are balanced

Post by Wintermute » November 26th, 2014, 1:09 am

Telcontar wrote: I think the second approach may be particularly useful to help us gauge the strength of the non-mainline factions, which are not tested as much as mainline.

In a sense, I see the first approach (data from human matches) as a sort of "top-down" analysis, while the second one (a more straightforward and systematic automated comparison) as "bottom-up".
We could advertise both as potential GSoC ideas, and limit ourselves to carrying out one of them. I am OK with giving preference to analyzing match data if strong proposals exist on either topic.
I would welcome more data of all types to be available to be analysed, sure. :D If I had to choose I would have a strong preference toward human-human games, though for others they might want other things. UMC authors, for example would surely be interested in human-AI games if it's their content!
Telcontar wrote:However, I see one more issue with match data: Is it clear to the player that their data is stored on a server? I can imagine some players having gripes with that if they didn't know about it beforehand.
Yes and no. Since we don't require registration on the server we also don't really make users read stuff first. However, that information is available in our Code of Conduct, so it *could* be known to any user who cares to look.

Telcontar
Posts: 56
Joined: April 4th, 2013, 1:55 am

Re: GSoC project idea: measure how well units are balanced

Post by Telcontar » November 27th, 2014, 2:28 am

Wintermute wrote:
Telcontar wrote: In a sense, I see the first approach (data from human matches) as a sort of "top-down" analysis, while the second one (a more straightforward and systematic automated comparison) as "bottom-up".
I would welcome more data of all types to be available to be analysed, sure. :D If I had to choose I would have a strong preference toward human-human games, though for others they might want other things. UMC authors, for example would surely be interested in human-AI games if it's their content!
Telcontar wrote:However, I see one more issue with match data: Is it clear to the player that their data is stored on a server? I can imagine some players having gripes with that if they didn't know about it beforehand.
Yes and no. Since we don't require registration on the server we also don't really make users read stuff first. However, that information is available in our Code of Conduct, so it *could* be known to any user who cares to look.
We can perhaps wait and see what direction students prefer for a summer project.

As for automatically uploading the replay, it may not be completely obvious to players that matches that allow observers are also archived.

This is not directly related to this topic, but has it been discussed whether "Archive replays on server" should be made a separate option or not? Some players may not want their tactical blunders to be preserved for eternity, so I think giving them control over this is a good idea.

I also think there are cases where someone may want any of these combinations of options:
  • Allow observers and archive replays (current behavior).
  • Allow observers but don't preserve replay.
  • Private game and no replay (current behavior).
  • Private game but after the game is over a copy is archived.
If this has not been discussed yet, I will create a new post.

Wintermute
Inactive Developer
Posts: 840
Joined: March 23rd, 2006, 10:28 pm
Location: On IRC as "happygrue" at: #wesnoth-mp

Re: GSoC project idea: measure how well units are balanced

Post by Wintermute » November 27th, 2014, 1:11 pm

Telcontar wrote:As for automatically uploading the replay, it may not be completely obvious to players that matches that allow observers are also archived.

This is not directly related to this topic, but has it been discussed whether "Archive replays on server" should be made a separate option or not? Some players may not want their tactical blunders to be preserved for eternity, so I think giving them control over this is a good idea.

I also think there are cases where someone may want any of these combinations of options:
  • Allow observers and archive replays (current behavior).
  • Allow observers but don't preserve replay.
  • Private game and no replay (current behavior).
  • Private game but after the game is over a copy is archived.
If this has not been discussed yet, I will create a new post.
Here is the privacy section currently on the wiki.
MP Code of Conduct wrote:Generally speaking
Your own client will not track your activity, other than via save games, the preferences file, or other files in the userdata directory, nor will it collect any personally identifiable information.
Any official wesnoth server will generally log all transactions in some form, and tie them to your IP address.
Furthermore
Any public chats made in the MP Lobby are logged publicly.
Any public chats made in an MP game are logged in the publicly available replay of that game.
Any private messages, sent to a single player using /msg, are sent only to the recipient and not logged.
Any game which is created with "observers" checked is considered public, and the full replay is made available.
If "observers" is unchecked, then the game is considered private, and no public replay is saved.

A public log is made when a player logs in or leaves any of the wesnoth servers.
Your preferences file (stored on your computer) contains a list of recently used nicknames, and recently used servers.
Emphasis mine. If that doesn't answer your question or you would like to discuss changes then a new post would be best.

EDIT: I misread you a bit, It looks like a new post for the topic would be best.

Xudo
Posts: 561
Joined: April 3rd, 2009, 5:26 pm

Re: GSoC project idea: measure how well units are balanced

Post by Xudo » November 28th, 2014, 6:48 pm

I think the approach that BTree used in AI development might work here too. One difference: he used a behavioral tree to test an algorithm, whereas here the method would be used to test unit stats in similar AI matchups.
