Ladder Council

Discussion of all aspects of multiplayer development: unit balancing, map development, server development, and so forth.

Moderator: Forum Moderators

Aldarisvet
Translator
Posts: 836
Joined: February 23rd, 2015, 2:39 pm
Location: Moscow, Russia

Re: Ladder Council

Post by Aldarisvet »

iceiceice wrote: It just depends on exactly what conclusions you are looking to draw.
If you are looking to demonstrate "every factional matchup is balanced to +/- 5%, with 95% confidence" or something very scientific, then yes you would need a lot more data. If you flip a coin 100 times, the standard deviation is 10, so observing only 30 heads or as many as 70 heads is not statistically significant at that level, in regards to the assumption that the coin is fair.
I forgot all my university courses.
I've just checked. 'Standard deviation' is actually 5 for flipping a coin 100 times, not 10. Not much.
Binom_raspr_form_10.png
But all that means is that in 68.2% of cases the result must fall within that +/- 5% deviation.
Standard_deviation_diagram_(decimal_comma).svg.png
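The figure of 5 can be checked in a few lines of Python. This is only a sketch: the function name `binomial_std` is made up for illustration, and it assumes independent flips of a fair coin.

```python
import math

def binomial_std(n, p=0.5):
    """Standard deviation of the number of successes in n independent trials."""
    return math.sqrt(n * p * (1 - p))

# The sample sizes below are arbitrary illustrations, not data from the thread.
for n in (25, 100, 400):
    sd = binomial_std(n)
    # Express the same spread in percentage points of the observed rate.
    print(f"n={n}: std = {sd:.1f} heads = {100 * sd / n:.1f} percentage points")
```

Note that quadrupling the number of flips only halves the spread measured in percentage points.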
facebook.com/wesnothian/ - everyday something new about Wesnoth
My campaign:A Whim of Fate, also see it's prequel Zombies:Introduction
Art thread:Mostly frankenstains
Horus2
Posts: 407
Joined: September 26th, 2010, 1:05 pm

Re: Ladder Council

Post by Horus2 »

tekelili wrote:@Aldarisvet: Your doubts are reasonable, but you are skipping the most important factor: Players Skill Difference

No matter how many games are played, even with 1.000.000 games you could have totally biased data. If all those games were played between a very skilled player and a noob, and they were always picking a random faction and side, you would have totally balanced outcomes for factions and sides (about 50% victory), despite the map having some imbalances.
Aldarisvet wrote: Well, normally the situation that you described will just never happen.
But about a quarter of games would be noobs vs noobs, and this can distort the data because some races are harder to play than others. Or some races are easier to play.
Given that rebels are actually the easiest faction to play, and possibly the most beloved and popular in mainline campaigns, it is possible that noobs would play better with rebels than with other factions (in noobs vs noobs battles).
Strongly disagree with these quotes.

I cannot stress enough that almost all players involved were big guns. Aside from me playing 65 times, there were 16 matches from Mint, 16 from abhijit, 13 Rigor, 9 khiM, 6 khorne-flakes, 5 The_Black_Sword, 5 RiceMuncher, 3 Caritas, 3 Computer_player, and in a side role, a couple more from some of the legends: amikrop1, Blop, d, Dauntless, Demogorgon, Dreadnough, gamelle, Ichorid, Janitor, Kira1, Kral, nelson, Nordmann, Oook, thefish, unicorn.
Even then, whenever they made bad plays, the rating got ruthlessly demoted to one star, or in rare cases the replay was left out of account entirely. "Noob vs noob" replays simply do not exist in the archives.
Deviations from 50% are not necessarily the fault of the map, but of the matchup itself. Sadly, the Knalga vs Undead win ratio presented here is what i consider "normal". I would not expect mappers to do the work of gods. I really wish there were statistics for other, maybe more conservative maps to compare to, but currently, Ruphus Isle is the most methodically evaluated map of the game (at least i do not know about such developer manifestos for core multiplayer maps).

But the most important thing to mention is how pointless 1000 replays are in themselves. Not only is it an absurdly big number and a sink of time and energy (once again, amassing replays for these stats took four years, which is ridiculous, but hopefully, now that it is allowed for ladder, we will get more feedback to crunch in a more natural way); even if i filter out the worst, it is little more than an obsession with hard data. Statistics like this should never be taken as more than a loose guide and a support for theories plus good old-fashioned human logic. If anyone had previously argued that Loyalists are probably inferior here, then we would have something to support his claims. I consulted with Mint yesterday and asked what he expected to score at the bottom, and he replied "anything but Loyalists". We will surely look into the roots of it in the future, but i think we have the right to consider it a statistical anomaly for the time being. And if any further changes are made to the map, i will certainly not increase terrain density.

Collecting data was far from useless, though, as these stats are more than enough to debunk some of the myths made by sceptics:
  • Either side has an advantage (48-52 is pleasantly even)
  • Dominant chaotic rushes, the original reason why Hornshark Isle was imbalanced (no strong correlation between Northerners and Undead performance, as the latter is only 4th)
  • Factions with flying/water units too powerful (again, no correlation, two factions with water units compete with two factions with flying units for the first place)
  • Once one lost 2 villages, it is not possible to counterattack (the opposite proven by massive amount of examples, it is just that unlike in a classic dual-front map, you have to retreat to the middle; also numerous lawful victories were achieved between the period of turn 9 and 12)
tekelili
Posts: 1039
Joined: August 19th, 2009, 9:28 pm

Re: Ladder Council

Post by tekelili »

tekelili wrote:
Horus2 wrote:Strongly disagree with these quotes.
If you can submit data that reinforce your experiment's outcomes, then you should apologize for not providing them in the first place, instead of showing disappointment at valid complaints about the description of the experiment's environment ;)
Be aware English is not my first language and I could have explained bad myself using wrong or just invented words.
World Conquest II
Horus2
Posts: 407
Joined: September 26th, 2010, 1:05 pm

Re: Ladder Council

Post by Horus2 »

tekelili wrote:
tekelili wrote:
Horus2 wrote:Strongly disagree with these quotes.
If you can submit data that reinforce your experiment's outcomes, then you should apologize for not providing them in the first place, instead of showing disappointment at valid complaints about the description of the experiment's environment ;)
I am not disappointed. :lol: But i stated very clearly that every single replay had to pass quality control and get weighted, as Velensk also pointed out above.
iceiceice
Posts: 1056
Joined: August 23rd, 2013, 2:10 am

Re: Ladder Council

Post by iceiceice »

Aldarisvet wrote:
iceiceice wrote: It just depends on exactly what conclusions you are looking to draw.
If you are looking to demonstrate "every factional matchup is balanced to +/- 5%, with 95% confidence" or something very scientific, then yes you would need a lot more data. If you flip a coin 100 times, the standard deviation is 10, so observing only 30 heads or as many as 70 heads is not statistically significant at that level, in regards to the assumption that the coin is fair.
I forgot all my university courses.
I've just checked. 'Standard deviation' is actually 5 for flipping a coin 100 times, not 10. Not much.
Binom_raspr_form_10.png
Yes, you are right, I was being sloppy.

The standard deviation is the square root of variance. And the variance of a sum of independent random variables is the sum of their individual variances, so to find the variance for 100 coins we just need to find it for one coin. A single coin is either heads or tails, so lazily, the variance is obviously at most 1, since the value is always between 0 and 1, that's what I used :) That leads to upper bound of 100, and for standard deviation, an upper bound of 10. But that's not the tight bound -- the expectation of the coin is 1/2, so the distance from the mean, in both cases 0 and 1, is always 1/2 actually. The squared distance from the mean is thus 1/4, so that's the correct variance for a single coin. So the variance for n coins is actually n/4, and the standard deviation is 1/2 sqrt(n), so half what I said.
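Written out step by step, the argument above looks roughly like this. The symbols $X_i$ (one flip, 1 for heads and 0 for tails) and $S_n$ (the number of heads in $n$ flips) are notation introduced here for illustration, and the flips are assumed fair and independent.

```latex
\begin{align*}
X_i &\in \{0, 1\}, \qquad \Pr[X_i = 1] = \tfrac12, \qquad \mathbb{E}[X_i] = \tfrac12 \\
\operatorname{Var}(X_i) &= \mathbb{E}\!\left[\left(X_i - \tfrac12\right)^2\right] = \tfrac14 \\
S_n &= \sum_{i=1}^{n} X_i, \qquad
\operatorname{Var}(S_n) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{n}{4} \\
\sigma(S_n) &= \frac{\sqrt{n}}{2}, \qquad \sigma(S_{100}) = 5
\end{align*}
```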
Aldarisvet wrote:
But all that means is that in 68.2% of cases the result must fall within that +/- 5% deviation.
Standard_deviation_diagram_(decimal_comma).svg.png
It means more than that though, as long as the distribution is "near gaussian". A common thing that people like doctors will have to know is that "95% of the data is within two standard deviations of the mean" -- this is a pretty common rule of thumb for people who have to actually look at statistics to do things. More generally, the probability that a random data point is more than `k` deviations from the mean is less than a function on the order of `e^{-k^2/2}`, which is vanishingly small for even modest k. People sometimes talk about a "6 sigma" event in finance or whatever as being a once-in-a-lifetime or once-in-several-decades deviation. I guess more recently in popular culture people talk about "black swan events".
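For the 100-coin case these rules of thumb can be compared with exact numbers. The sketch below sums the binomial distribution directly and puts it next to a standard large-deviation bound (Hoeffding's inequality, which for fair coin flips gives `2*exp(-k^2/2)`). The function names are invented for illustration, and independent fair flips are assumed.

```python
import math

def prob_within(n, k_sigma, p=0.5):
    """Exact probability that a Binomial(n, p) count lies within
    k_sigma standard deviations of its mean (boundaries included)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    total = 0.0
    for heads in range(n + 1):
        if abs(heads - mean) <= k_sigma * sd:
            total += math.comb(n, heads) * p**heads * (1 - p)**(n - heads)
    return total

def hoeffding_tail_bound(k_sigma):
    """Upper bound on the chance that a sum of fair coin flips lands
    k_sigma or more standard deviations from its mean."""
    return 2 * math.exp(-k_sigma**2 / 2)

for k in (1, 2, 3, 4):
    outside = 1 - prob_within(100, k)
    print(f"k={k}: exact P(outside) = {outside:.5f}, bound = {hoeffding_tail_bound(k):.5f}")
```

The bound is loose for small k (at k=1 it exceeds 1 and says nothing), but it falls off quickly as k grows, which is the point being made.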

You might be very skeptical of the idea that assuming that some distribution is gaussian or near gaussian is reasonable -- it's very convenient and seems hard to justify. I used to be quite skeptical of such claims myself when I was much younger. However it turns out that usually it's an extremely reasonable assumption. Basically whenever you are adding many random numbers together, as long as the random numbers are typically similar in magnitude, and they are "mostly" independent of one another, the sum must tend extremely quickly towards a gaussian distribution -- it's just a fundamental mathematical phenomenon. Whenever you add two distributions A and B which are independent, the result tends to be smoother than both A and B in some sense, and it turns out that the maximally smooth distribution under additive perturbations is the Gaussian.

Many people in school learn something called a "Central Limit Theorem" which shows this in some formal sense when adding copies of a given distribution, usually in terms of Levy-distance of distributions. But if you are mainly interested in Large Deviations bounds, which is usually what is most useful, then actually a much stronger and simpler bound is due to Bernstein and now called the "Chernoff bound". Chernoff's bound, and many powerful derivatives of it, are extremely important in modern theoretical computer science and discrete mathematics. They are extremely useful for analyzing failure probabilities of algorithms, constructions of data structures and routing schemes, or more generally in combinatorics and explicit constructions. Usually, if researchers are adding a bunch of random variables together and it "looks like" a Chernoff bound might apply because there is "no clear reason" that they should be highly correlated, the assumption is that the Chernoff bound is indeed true and it's just a technical challenge to figure out how to prove it. I proved a quite strong bound of this kind once and submitted it as part of a paper which was accepted to a tier one theory conference a few years ago: http://eccc.hpi-web.de/report/2012/042/

:geek:

So my general intuition here, anyways, is that even if the games being played are not completely statistically independent, i.e., even if sometimes the same player played in some of the games, even if the players are talking to each other about the games, etc., you should expect a nearly-Gaussian distribution to emerge very quickly, and the "95% of the data within 2 standard deviations of the mean" assumption and all related assumptions are quite reasonable, IMO.
Pentarctagon
Project Manager
Posts: 5561
Joined: March 22nd, 2009, 10:50 pm
Location: Earth (occasionally)

Re: Ladder Council

Post by Pentarctagon »

I don't pretend to understand most of the statistical stuff, but it would be interesting to see an apples-to-apples comparison as it were. Which factions "noobs" are more likely to win/lose with vs "pros".
99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code
Aldarisvet
Translator
Posts: 836
Joined: February 23rd, 2015, 2:39 pm
Location: Moscow, Russia

Re: Ladder Council

Post by Aldarisvet »

@iceiceice

I can't go too far in this matter without brain stress; I am an economist, not a mathematician, and my current work is quite far from statistical analysis.
I just wanted to recall at least something from my education of 10 years ago: how we can judge whether some deviation is really abnormal or just a result of randomness.
So a 58% win rate for rebels is in the interval between one and two sigmas, and the probability of landing there is 13.6%. So this can happen even if rebels do not have a real advantage on the map.
And the 35.29% win rate for loyals is, conditionally, at the 3-sigma threshold from 50%, because the deviation from 50 is almost 3*5=15. And between 3 sigmas and infinity (actually 10 sigmas is the maximum deviation here) we have just 0.1% probability, so this deviation could really say something (say that the map is really unbalanced against loyals), but still, with that quite low probability, it can be just a result of randomness.
However, our sample is too small to judge. Probably I was not right about 1000 games in total, but 100 games for every faction would be good from a statistical point of view.
I am just trying to write simple things so everyone can really understand simple criteria for judging whether some win rate indicates imbalance or not. I hope I did it right.
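The sigma of 5 used above corresponds to 100 games, so the same win rate means something different with fewer games. A minimal sketch of that check follows. It uses a normal approximation and reports a two-sided tail probability, which is a different quantity from the one-sided band probabilities (13.6%, 0.1%) quoted above. The function names and the game counts of 30 and 100 are hypothetical, since the per-faction counts are not given in this thread.

```python
import math

def z_score(win_rate, n_games, p0=0.5):
    """Number of standard deviations between the observed win rate and p0."""
    sd_of_rate = math.sqrt(p0 * (1 - p0) / n_games)
    return (win_rate - p0) / sd_of_rate

def two_sided_p_value(z):
    """Normal-approximation chance of a deviation at least this large
    in either direction, if the true win rate were p0."""
    return math.erfc(abs(z) / math.sqrt(2))

# n values are hypothetical; the real per-faction game counts are not stated here.
for rate in (0.58, 0.3529):
    for n in (30, 100):
        z = z_score(rate, n)
        print(f"rate={rate:.2%}, n={n}: z = {z:+.2f}, p = {two_sided_p_value(z):.4f}")
```

With 100 games the two rates sit about 1.6 and 2.9 sigmas from 50%, matching the estimates above; with fewer games the same rates are closer to the mean in sigma terms.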
Mint
Posts: 159
Joined: January 22nd, 2011, 9:29 am
Location: Location Location Location

Re: Ladder Council

Post by Mint »

SigurdFireDragon wrote:With Sandbox Map Picker, I noticed that Rime Grotto and Hellhole isn't in any set. And Cynsaun Battlefield is in both 'Adventurous & Conservative' and 'Museum' Also that 'random start time' isn't checked by default. Are these intentional?
Hi, sorry it took so long for a reply, I thought I had replied already but it seems I forgot to.

Rime Grotto is currently not in a set as it is unfinished. Currently the plan with it is to have an income change at some point but that may not ever work, so it might just become a regular map. Hellhole is complete, however we are yet to vote on where it goes.

Cynsaun Battlefield was only meant to be in the museum, and I fixed it when I saw your message. Thank you for reporting it.

Random start time is not checked by default as all the maps are balanced for a set time of day. For most maps this is dawn, but for a few maps the starting time is either pushed forward or pushed backward by one element in the cycle (for example Fallenstar Lake starts at second watch instead of dawn). So yes this is intentional.

Thanks for the feedback! :)
Horus2
Posts: 407
Joined: September 26th, 2010, 1:05 pm

Re: Ladder Council

Post by Horus2 »

Sometimes the greatest opponent on a map is yourself. This was the case when i played against someone who found an exploit on Hellhole, and the circumstances allowed him to demonstrate it first-hand. Good laughs were had. :lol:
Obviously the map has been updated since, but the replay remains as a proof that being able to think outside the box is a valuable skill, and not just an overused phrase.

In the following days we are going to vote on the inclusion of Hellhole. If you already saw it in play and you liked it, wish the map good luck! :)
Horus2
Posts: 407
Joined: September 26th, 2010, 1:05 pm

Re: Ladder Council

Post by Horus2 »

Hereby i summon Party Skeleton to aid us in celebrating the successful inclusion of Hellhole to the adventurous category. The updated version of the Sandbox Map Picker is ready to be downloaded.

Happy inclusion-day!

Image
Horus2
Posts: 407
Joined: September 26th, 2010, 1:05 pm

Re: Ladder Council - Pyrennis gfx change

Post by Horus2 »

The Walls of Pyrennis is a fast-paced and well-received ladder map, but no one complains openly about its obtrusive flaw: readability. Because it uses the same snowy orcish keep tiles both for the impassable walls and for recruiting, it is very hard on the eye, so the first impressions of the players are not overly positive.
Mint and i both came up with possible graphical changes. We believe the orcish keep for walls is neat, but the actual keeps have to be redesigned. Dreadnough suggested making it a full encampment keep. Our ideas can be seen in the spoiler below:
big image:

This is an open vote for everyone who knows the map. Which change would you vote for? Please bombard us with feedback! (i cannot create a poll, so we have to work with comments)
Also, if you have a different suggestion, feel free to post it.
Last edited by Horus2 on April 14th, 2016, 6:20 pm, edited 1 time in total.
Dreadnough
Posts: 63
Joined: March 7th, 2010, 1:01 pm

Re: Ladder Council

Post by Dreadnough »

Hey,

Actually I am not sure if I like either of them.

Mint's castle design works for me, but the keep really looks out of place compared with the rest of the map. So I can not really vote for it.
Horus2's design works for me, the keep looks fine and not really out of place. But, the fence of the castle and keep is of a different design.

I think a combination of both ideas works best. Take the castle from Mint and the keep from Horus2. This way the fence works and the castle/keep looks different from the impassable terrain.
Ben24626
Posts: 59
Joined: April 8th, 2015, 1:07 am

Re: Ladder Council

Post by Ben24626 »

I like Mint's cause it's easy to tell the difference between impassable and keep.
Elder2
Posts: 405
Joined: July 11th, 2015, 2:13 pm

Re: Ladder Council

Post by Elder2 »

I vote for Mint's version, i think it looks the best.

Btw i have no idea why loyals' win ratio is so bad, it's pretty easy to do well with them if you survive the initial rush, from my experience, especially vs some factions like rebels
Eagle_11
Posts: 759
Joined: November 20th, 2013, 12:20 pm

Re: Ladder Council

Post by Eagle_11 »

Hi, i liked the Grand Houde Mountain map so much that i made some changes to it:
2pMtHoudeAltMap.png
2p_GrandHoudeAlt.cfg