Ladder Council
Moderators: Forum Moderators, Developers
 Aldarisvet
 Translator
 Posts: 755
 Joined: February 23rd, 2015, 2:39 pm
 Location: Moscow, Russia
Re: Ladder Council
iceiceice wrote: It just depends on exactly what conclusions you are looking to draw.
If you are looking to demonstrate "every factional matchup is balanced to +/- 5%, with 95% confidence" or something very scientific, then yes you would need a lot more data. If you flip a coin 100 times, the standard deviation is 10, so observing only 30 heads or as many as 70 heads is not statistically significant at that level, in regards to the assumption that the coin is fair.
I forgot all my university courses.
I've just checked. 'Standard deviation' is actually 5 for flipping a coin 100 times, not 10. Not much. But all it means is that in 68.2% of cases the result must be within that +/- 5% deviation.
facebook.com/wesnothian/  everyday something new about Wesnoth
My campaign:A Whim of Fate, also see Zombies:Introduction single map campaign
Art thread:Mostly frankenstains
Re: Ladder Council
tekelili wrote: @Aldarisvet: Your doubts are reasonable, but you are skipping the most important factor: Players Skill Difference
No matter how many games are played, even with 1,000,000 games you could have totally biased data. If all those games were played between a very skilled player and a noob, and they were always picking random faction and side, you would have totally balanced outcomes for factions and sides (about 50% victory), despite the map having some imbalances.
Aldarisvet wrote: Well, normally the situation that you described will just never happen.
But about a quarter of games would be noobs vs noobs, and this can distort the data because some races are harder to play than others. Or some races are easier to play.
Given that rebels are actually the easiest faction to play, and possibly the most beloved and popular in mainline campaigns, it is possible that noobs would play better with rebels than with other factions (in noob vs noob battles).
Strongly disagree with these quotes.
I cannot stress enough that almost all players involved were big guns. Aside from me playing 65 times, there were 16 matches from Mint, 16 from abhijit, 13 Rigor, 9 khiM, 6 khorneflakes, 5 The_Black_Sword, 5 RiceMuncher, 3 Caritas, 3 Computer_player, and in a side role, a couple more from some of the legends: amikrop1, Blop, d, Dauntless, Demogorgon, Dreadnough, gamelle, Ichorid, Janitor, Kira1, Kral, nelson, Nordmann, Oook, thefish, unicorn.
Even then, whenever they made bad plays, the rating got ruthlessly demoted to one star, or in rare cases, entirely left out of account. "Noob vs noob" replays simply do not exist in the archives.
Deviations from 50% are not necessarily the fault of the map, but of the matchup itself. Sadly, the Knalga vs Undead win ratio presented here is what i consider "normal". I would not expect mappers to do the work of gods. I really wish there were statistics for other, maybe more conservative maps to compare against, but currently, Ruphus Isle is the most methodically evaluated map of the game (at least i do not know of such developer manifestos for core multiplayer maps).
But the most important thing to mention is how pointless 1000 replays are in themselves. Not only is it an absurdly big number and a sink of time and energy (once again, amassing replays for these stats took four years, which is ridiculous, but hopefully, now that the map is allowed on the ladder, we will get more feedback to crunch, in a more natural way); even if i filter out the worst, it is little more than an obsession with hard data. Statistics like this should never be taken as more than a loose guide and a support for theories plus good old-fashioned human logic. If anyone had previously argued that Loyalists are probably inferior here, then we would have something to support his claims. I consulted with Mint yesterday and asked what he expected to score at the bottom, and he replied "anything but Loyalists". We will surely look into the roots of it in the future, but i think we have the right to consider it a statistical anomaly, for the time being. And if any further changes are made to the map, i will certainly not increase terrain density.
Collecting data was far from useless, though, as these stats are more than enough to debunk some of the myths made by sceptics:
 Either side has an advantage (48-52 is pleasantly even)
 Dominant chaotic rushes, the original reason why Hornshark Isle was imbalanced (no strong correlation between Northerners and Undead performance, as the latter is only 4th)
 Factions with flying/water units are too powerful (again, no correlation, two factions with water units compete with two factions with flying units for the first place)
 Once one has lost 2 villages, it is not possible to counterattack (the opposite is proven by a massive number of examples; it is just that, unlike on a classic dual-front map, you have to retreat to the middle; also, numerous lawful victories were achieved between turns 9 and 12)
Horus, organiser of International Wesnoth Tournament 2016
Re: Ladder Council
Horus2 wrote: Strongly disagree with these quotes.
If you can submit data that reinforces your experiment's outcomes, then you should apologize for not providing it in the first instance, instead of showing disappointment at requests for a valid description of the experiment environment.
Be aware English is not my first language and I may have explained myself badly, using wrong or just invented words.
World Conquest II
Re: Ladder Council
tekelili wrote:
Horus2 wrote: Strongly disagree with these quotes.
If you can submit data that reinforces your experiment's outcomes, then you should apologize for not providing it in the first instance, instead of showing disappointment at requests for a valid description of the experiment environment.
I am not disappointed. But i stated very clearly that every single replay had to pass a quality control and get weighted, as Velensk also pointed out above.
Horus, organiser of International Wesnoth Tournament 2016
Re: Ladder Council
Aldarisvet wrote:
iceiceice wrote: It just depends on exactly what conclusions you are looking to draw.
If you are looking to demonstrate "every factional matchup is balanced to +/- 5%, with 95% confidence" or something very scientific, then yes you would need a lot more data. If you flip a coin 100 times, the standard deviation is 10, so observing only 30 heads or as many as 70 heads is not statistically significant at that level, in regards to the assumption that the coin is fair.
I forgot all my university courses.
I've just checked. 'Standard deviation' is actually 5 for flipping a coin 100 times, not 10. Not much.
Yes, you are right, I was being sloppy.
The standard deviation is the square root of the variance. And the variance of a sum of independent random variables is the sum of their individual variances, so to find the variance for 100 coins we just need to find it for one coin. A single coin is either heads or tails, so lazily, the variance is obviously at most 1, since the value is always between 0 and 1; that's what I used. That leads to an upper bound of 100 on the variance, and for the standard deviation, an upper bound of 10. But that's not the tight bound - the expectation of the coin is 1/2, so the distance from the mean, in both cases 0 and 1, is always 1/2 actually. The squared distance from the mean is thus 1/4, so that's the correct variance for a single coin. So the variance for n coins is actually n/4, and the standard deviation is 1/2 sqrt(n), so half what I said.
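As a quick check of the n/4 derivation above, here is a minimal Python sketch. The function names `coin_flip_std` and `coin_flip_std_exact` are made up for illustration; the first uses the "variances of independent flips add up" argument, the second recomputes the same number directly from the binomial probabilities.

```python
import math

def coin_flip_std(n: int, p: float = 0.5) -> float:
    """Standard deviation of the number of heads in n independent flips."""
    # One flip has variance p * (1 - p); for independent flips the variances add.
    return math.sqrt(n * p * (1 - p))

def coin_flip_std_exact(n: int, p: float = 0.5) -> float:
    """Same quantity, computed directly from the binomial distribution."""
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return math.sqrt(var)

print(coin_flip_std(100))        # 5.0
print(coin_flip_std_exact(100))  # agrees with the formula up to rounding error
```

For 100 flips both give 5, i.e. half of the lazy upper bound of 10.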
Aldarisvet wrote: But all that it means that in 68.2% cases the deviation must be within that +/- 5% deviation.
It means more than that though, as long as the distribution is "near Gaussian". A common thing that people like doctors will have to know is that "95% of the data is within two standard deviations of the mean" - this is a pretty common rule of thumb for people who have to actually look at statistics to do things. More generally, the probability that a random data point is more than `k` deviations from the mean is less than a function on the order `e^{-k^2}`, which is vanishingly small for even modest k. People sometimes talk about a "6 sigma" event in finance or whatever as being a once-in-a-lifetime or once-in-several-decades deviation. I guess more recently in popular culture people talk about "black swan events".
You might be very skeptical of the idea that assuming that some distribution is Gaussian or near Gaussian is reasonable - it's very convenient and seems hard to justify. I used to be quite skeptical of such claims myself when I was much younger. However it turns out that usually it's an extremely reasonable assumption. Basically whenever you are adding many random numbers together, as long as the random numbers are typically similar in magnitude, and they are "mostly" independent of one another, the sum must tend extremely quickly towards a Gaussian distribution - it's just a fundamental mathematical phenomenon. Whenever you add two distributions A and B which are independent, the result tends to be smoother than both A and B in some sense, and it turns out that the maximally smooth distribution under additive perturbations is the Gaussian.
Many people in school learn something called a "Central Limit Theorem" which shows this in some formal sense when adding copies of a given distribution, usually in terms of Lévy distance of distributions. But if you are mainly interested in Large Deviations bounds, which is usually what is most useful, then actually a much stronger and simpler bound is due to Bernstein and is now called the "Chernoff bound". Chernoff's bound, and many powerful derivatives of it, are extremely important in modern theoretical computer science and discrete mathematics. They are extremely useful for analyzing failure probabilities of algorithms, constructions of data structures and routing schemes, or more generally in combinatorics and explicit constructions. Usually, if researchers are adding together a bunch of random variables that "look like" a Chernoff bound might apply, because there is "no clear reason" that they should be highly correlated, they assume that the Chernoff bound is indeed true and that it's just a technical challenge to figure out how to prove it. I proved a quite strong bound of this kind once and submitted it as part of a paper which was accepted to a tier one theory conference a few years ago: http://eccc.hpi-web.de/report/2012/042/
So my general intuition here, anyways, is that even if the games being played are not completely statistically independent, i.e., even if sometimes the same player played in some of the games, even if the players are talking to each other about the games, etc., you should expect a nearly Gaussian distribution to emerge very quickly, and the "95% of the data within 2 standard deviations of the mean" assumption and all related assumptions are quite reasonable, IMO.
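To make the "two standard deviations" rule and the large-deviation bound concrete for the 100-coin example, here is a small Python sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, and the bound used is Hoeffding's inequality, one standard Chernoff-style bound, chosen here because it has a simple closed form (it is not necessarily the exact bound the post has in mind). For fair coins it works out to 2 * e^(-k^2/2) at `k` standard deviations.

```python
import math

def binom_two_sided_tail(n: int, dev: int) -> float:
    """Exact P(|heads - n/2| >= dev) for n fair coin flips."""
    hits = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= 2 * dev)
    return hits / 2**n

def gaussian_two_sided_tail(k_sigma: float) -> float:
    """P(|Z| >= k_sigma) for a standard normal Z."""
    return math.erfc(k_sigma / math.sqrt(2))

def hoeffding_bound(n: int, dev: float) -> float:
    """Hoeffding upper bound on P(|heads - n/2| >= dev), capped at 1."""
    return min(1.0, 2 * math.exp(-2 * dev**2 / n))

n, sigma = 100, 5  # sigma = sqrt(n) / 2 for fair coins
for k in (1, 2, 3):
    dev = k * sigma
    print(k, binom_two_sided_tail(n, dev), gaussian_two_sided_tail(k), hoeffding_bound(n, dev))
```

The exact and Gaussian columns stay close to each other, while the Hoeffding column is a looser but guaranteed upper bound that shrinks very quickly as `k` grows.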
 Pentarctagon
 Forum Administrator
 Posts: 4007
 Joined: March 22nd, 2009, 10:50 pm
 Location: Earth (occasionally)
Re: Ladder Council
I don't pretend to understand most of the statistical stuff, but it would be interesting to see an apples-to-apples comparison, as it were: which factions "noobs" are more likely to win/lose with vs "pros".
99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code
 Aldarisvet
 Translator
 Posts: 755
 Joined: February 23rd, 2015, 2:39 pm
 Location: Moscow, Russia
Re: Ladder Council
@iceiceice
I can't go too far in this matter without brain strain; I am an economist, not a mathematician, and my current work is quite far from statistical analysis.
I just wanted to remember at least something from my education of 10 years ago: how we can judge whether some deviation is really abnormal or just a result of randomness.
So a 58% win rate for rebels is between the one-sigma and two-sigma marks, and the probability of landing in that interval is 13.6%. So this can happen even if rebels do not have a real advantage on the map.
And the 35.29% win rate for loyals is roughly at the 3-sigma threshold from 50%, because its deviation from 50 is almost 3*5=15. Between 3 sigmas and infinity (actually 10 sigmas is the maximum deviation here) we have just 0.1% probability, so this deviation could really mean something (namely that the map is really unbalanced against loyals), but still, with that quite low probability, it could be just a result of randomness.
However, our sample is too small to judge. Probably I was not right about 1000 games in total, but 100 games for every faction would be good from a statistical point of view.
I am just trying to write simple things so everyone can really understand simple criteria for judging whether some win rate indicates imbalance or not. I hope I did it right.
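The sigma arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sample size of 100 games per faction is the assumption behind "sigma = 5" in the post, not a figure taken from the actual replay archive. With fewer games per faction, sigma is larger as a percentage, so the same win rate would be less surprising.

```python
import math

def win_rate_z_score(wins: int, games: int, p0: float = 0.5) -> float:
    """Number of standard deviations between observed wins and the expected p0 * games."""
    sigma = math.sqrt(games * p0 * (1 - p0))
    return (wins - p0 * games) / sigma

def one_sided_tail(z: float) -> float:
    """P(Z >= |z|) for a standard normal Z: chance of a deviation at least this large on one side."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sample of 100 games per faction (an assumption, see above).
for wins in (58, 35):
    z = win_rate_z_score(wins, 100)
    print(wins, round(z, 2), one_sided_tail(z))
```

Under that assumption, 58 wins sits 1.6 sigmas above 50 and 35 wins sits 3 sigmas below it, which matches the intervals quoted above.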
facebook.com/wesnothian/  everyday something new about Wesnoth
My campaign:A Whim of Fate, also see Zombies:Introduction single map campaign
Art thread:Mostly frankenstains
Re: Ladder Council
SigurdFireDragon wrote: With Sandbox Map Picker, I noticed that Rime Grotto and Hellhole aren't in any set, and Cynsaun Battlefield is in both 'Adventurous & Conservative' and 'Museum'. Also, 'random start time' isn't checked by default. Are these intentional?
Hi, sorry it took so long to reply; I thought I had replied already but it seems I forgot to.
Rime Grotto is currently not in a set as it is unfinished. Currently the plan with it is to have an income change at some point but that may not ever work, so it might just become a regular map. Hellhole is complete, however we are yet to vote on where it goes.
Cynsaun Battlefield was only meant to be in the museum, and I fixed it when I saw your message. Thank you for reporting it.
Random start time is not checked by default as all the maps are balanced for a set time of day. For most maps this is dawn, but for a few maps the starting time is either pushed forward or pushed backward by one element in the cycle (for example Fallenstar Lake starts at second watch instead of dawn). So yes this is intentional.
Thanks for the feedback!
Re: Ladder Council
Sometimes the greatest opponent on a map is yourself. This was the case when i played against someone who found an exploit on Hellhole, and the circumstances allowed him to demonstrate it firsthand. Good laughs were had.
Obviously the map has been updated since, but the replay remains as proof that being able to think outside the box is a valuable skill, and not just an overused phrase.
In the following days we are going to vote on the inclusion of Hellhole. If you already saw it in play and you liked it, wish the map good luck!
Horus, organiser of International Wesnoth Tournament 2016
Re: Ladder Council
Hereby i summon Party Skeleton to aid us in celebrating the successful inclusion of Hellhole in the adventurous category. The updated version of the Sandbox Map Picker is ready to be downloaded.
Happy inclusion-day!
Horus, organiser of International Wesnoth Tournament 2016
Re: Ladder Council - Pyrennis gfx change
The Walls of Pyrennis is a fast-paced and well-received ladder map, but no one complains openly about its obtrusive flaw: readability. Because it uses the same snowy orcish keep tiles both for the impassable walls and for recruiting, it is very hard on the eye, so the first impressions of the players are not overly positive.
Mint and i both came up with possible graphical changes. We believe the orcish keep for walls is neat, but the actual keeps have to be redesigned. Dreadnough suggested making it a full encampment keep. Our ideas can be seen in the spoiler below:
This is an open vote for everyone who knows the map. Which change would you cast your vote for? Please bombard us with feedback! (i cannot create a poll, so we have to work with comments)
Also, if you have a different suggestion, feel free to post it.
Last edited by Horus2 on April 14th, 2016, 6:20 pm, edited 1 time in total.
Horus, organiser of International Wesnoth Tournament 2016

 Posts: 55
 Joined: March 7th, 2010, 1:01 pm
Re: Ladder Council
Hey,
Actually I am not sure if I like either of them.
Mint's castle design works for me, but the keep really looks out of place compared with the rest of the map. So I cannot really vote for it.
Horus2's design works for me; the keep looks fine and not really out of place. But the fence of the castle and the fence of the keep are of different designs.
I think a combination of both ideas works best. Take the castle from Mint and the keep from Horus2. This way the fence works and the castle/keep looks different from the impassable terrain.
Re: Ladder Council
I like Mint's cause it's easy to tell the difference between impassable and keep.
Re: Ladder Council
I vote for Mint's version, i think it looks the best.
Btw i have no idea why loyals' win ratio is so bad; in my experience it's pretty easy to do well with them if you survive the initial rush, especially vs some factions like rebels.
Re: Ladder Council
Hi, i liked the Grand Houde Mountain map so much that i made some changes to it: