Statistical analysis of the RNG

mrchadt · Post by **mrchadt** » March 30th, 2007, 10:14 am

That's it, really. I think this would be a useful addition to the stats table already in place: a mean of your dice rolls and their corresponding standard deviation.
I can't imagine it would be too difficult to do, but I can't imagine myself doing it, so does any kind person in dev think this is a good idea?

Post by **torangan** » March 30th, 2007, 12:22 pm

The random generator used is the default random implementation from C/C++. I'm pretty sure you can find scientific data about it.

mrchadt · Post by **mrchadt** » March 30th, 2007, 12:56 pm

Sorry, I didn't make myself clear. What I meant was a statistical report of the numbers generated by the RNG for a player in a game of Wesnoth, similar to the stats box which can be viewed by pressing s.
I think this would give a good idea of how lucky or unlucky one has been in a game.
I hope this is clearer.

Chris NS · Post by **Chris NS** » March 30th, 2007, 1:49 pm

There already is: damage taken/inflicted against EV (expected value). And I personally would drop even that (or at least remove it from the default display). Luck is something you cannot simply measure from how many attacks did and didn't hit, as a few key attacks involving a key unit often outweigh a string of good or bad luck on the other 90% of the battlefield.

Post by **Dave** » March 31st, 2007, 12:13 am

I would agree that the 'expected values' already show everything useful to be shown.

Since rolling a 50 is as good as rolling a 90 when all you need is a 40 to hit, I don't see how 'standard deviation' could be meaningful.

David

rrenaud · Post by **rrenaud** » March 31st, 2007, 2:18 am

It gives you an estimate of how biased the spread between the expectation and actual value is. If the actual damage dealt for your attacks is more than 3 standard deviations above the mean, you can be fairly confident that you have taken advantage of luck.

Still, I do agree that an unweighted average doesn't tell the whole story. Certainly, not every attack was equally important. But judging just how crucial a given attack was seems like a hard AI problem. Also, standard deviation is probably hard to put into perspective for many players.
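As an illustration of the "3 standard deviations" check described above, here is a minimal Python sketch. The function name and the sample numbers are hypothetical, and it assumes the variance of the expected damage is already known; it is not how Wesnoth computes anything.

```python
import math


def luck_z_score(actual_damage, expected_damage, variance):
    """Return how many standard deviations the actual damage lies from the EV."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (actual_damage - expected_damage) / math.sqrt(variance)


# Hypothetical game: 120 damage dealt, 90 expected, variance 100 (SD = 10).
z = luck_z_score(120, 90, 100)
print(f"z = {z:.1f}")  # z = 3.0
```

A value near 0 means the result was close to expectation; a value above 3 or below -3 would be the "fairly confident" case from the post.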

Post by **MSchmahl** » March 31st, 2007, 10:04 am

This may be off-topic for this thread, but I've often wondered how the EV in the stats table is calculated. Is it per-combat, or per-strike?

For example, suppose I attack a Dark Adept that has 8 hp in water with my Lancer (12-3) at dawn. It honestly doesn't matter whether I manage to kill on the first strike, or the third. I suspect, however, that if the Lancer hits on the third strike, the EV accumulator goes up by 12x3x0.8, or 28.8, while the actual damage goes up by 12. Which means that I was "unlucky". Another way of looking at it would be that I had a 99.2% chance of doing that 12 damage, which means the EV should have been 11.904, or perhaps 7.936 (8 times 0.992).
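The arithmetic in this example can be checked with a short Python sketch. The function names are made up for illustration, and the per-strike formula is only the one suspected in the post, not a confirmed description of the stats table.

```python
def per_strike_ev(damage, strikes, hit_chance):
    """EV if every strike is counted, even ones made after the target is dead."""
    return damage * strikes * hit_chance


def chance_of_at_least_one_hit(strikes, hit_chance):
    """Probability that at least one of the strikes lands."""
    return 1 - (1 - hit_chance) ** strikes


# Lancer (12-3) against a Dark Adept with 8 hp, 80% chance to hit.
damage, strikes, hit_chance, target_hp = 12, 3, 0.8, 8
p_kill = chance_of_at_least_one_hit(strikes, hit_chance)

print(f"per-strike EV:          {per_strike_ev(damage, strikes, hit_chance):.3f}")  # 28.800
print(f"chance of a kill:       {p_kill:.3f}")                                      # 0.992
print(f"EV counting one hit:    {damage * p_kill:.3f}")                             # 11.904
print(f"EV capped at target hp: {target_hp * p_kill:.3f}")                          # 7.936
```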

mrchadt · Post by **mrchadt** » March 31st, 2007, 11:24 am

To MSchmahl: the point you made there I have discussed before in previous threads, which is partially why I feel that this idea would offer some useful information.

To Dave: yes, you are right that rolling a 90 is the same as rolling a 50 if you need 40 to hit, but not if you need 60 to hit. My point here is that the average dice throw might well be 50, but if the spread is not uniform, which the SD would show, then one would have a better idea of how lucky one was in a particular game.

My idea of having this was to see the correlation between what the RNG rolls and damage EV. I was wondering if the damage-versus-EV figure could be negative or positive even if the RNG was in fact average with a uniform distribution. Obviously, if the distribution were not uniform (to be extreme, say mostly 10's and 90's, which would give 50 as an average but a high SD, or lots of 40's, 50's and 60's for a low SD), then the damage EV could be anything.

Well this is just an idea, thanks for taking your time to look at it.
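To make the 10's-and-90's versus 40's-50's-60's comparison concrete, here is a small Python sketch. It assumes rolls are integers from 0 to 99 and that a roll at or above the needed number is a hit; both are assumptions for illustration, not a description of how Wesnoth resolves attacks.

```python
import math
import random
import statistics


def roll_summary(rolls):
    """Return (mean, population standard deviation) of a list of rolls."""
    return statistics.mean(rolls), statistics.pstdev(rolls)


def hit_rate(rolls, needed):
    """Fraction of rolls that hit, assuming a roll >= needed is a hit."""
    return sum(1 for r in rolls if r >= needed) / len(rolls)


# Theoretical values for a uniform integer roll on 0..99 (assumed range).
n = 100
uniform_mean = (n - 1) / 2                 # 49.5
uniform_sd = math.sqrt((n * n - 1) / 12)   # about 28.87

rng = random.Random(2007)                  # fixed seed so the run is repeatable
uniform_rolls = [rng.randrange(n) for _ in range(10_000)]
extreme_rolls = [10, 90] * 50              # mean 50, high SD
clustered_rolls = [40, 50, 60] * 30        # mean 50, low SD

for name, rolls in [("uniform", uniform_rolls),
                    ("extreme", extreme_rolls),
                    ("clustered", clustered_rolls)]:
    mean, sd = roll_summary(rolls)
    print(f"{name:9s} mean={mean:6.2f} sd={sd:6.2f} "
          f"hit rate at 40={hit_rate(rolls, 40):.2f} at 60={hit_rate(rolls, 60):.2f}")
```

The extreme and clustered samples share the same mean but give very different hit rates, which is the effect a standard deviation next to the mean would help expose.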

Post by **Darth Fool** » March 31st, 2007, 11:58 am

If you are really curious, you could always write a simple parser for the WML save files, taking data out of the [random] tags to plot whatever you want. Such a tool would be useful for whenever this subject periodically returns, and if there was some statistic that turned out to be really interesting, what better way to convince a dev to add it into the main Wesnoth code than to have a nice plot showing why it is cool?

Yogin · Post by **Yogin** » April 21st, 2007, 11:37 am

I wanted to do this at one point, but I couldn't figure out the correct way to calculate variance. I got as far as figuring out the variance of a sum of independently parameterized Bernoullis. But the problem I had was when each Bernoulli distribution got multiplied by strike damage. That scales its standard deviation by the strike damage, but you can't just add standard deviations or add variances when combining multiple strikes with different strike damages.

I'll think about it some more, but the solution for independent Bernoullis is easy:

X ~ Bern(p1)
Y ~ Bern(p2)

EV(X+Y) = EV(X)+EV(Y)

Var(X+Y) = [Var(X)+Var(Y)]/2

This can be extended to a sum of multiple independent Bernoullis, where 2 = # of distributions.

I don't think you can just put a d^2 in front of each variance where d = damage of each strike. It seems more complex than that.

Maybe I just need some sleep to get the correct formula.

rrenaud · Post by **rrenaud** » April 21st, 2007, 9:26 pm

What is wrong with algorithm 1 at wikipedia?

http://en.wikipedia.org/wiki/Algorithms ... g_variance

You can keep O(1) memory and output variance in O(1) time in an online fashion by just keeping track of the number of events, the sum of the events, and the sum of the events squared.
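A minimal Python sketch of the bookkeeping described here: keep the count, the sum, and the sum of squares, and derive the variance on demand. The class name is hypothetical and the sample values are only illustrative. This sum-of-squares form can lose precision through cancellation when the mean is large relative to the spread, so treat it as a sketch rather than a robust implementation.

```python
class RunningVariance:
    """Track the population variance of a stream of values in O(1) memory."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, value):
        """Record one new event."""
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def mean(self):
        if self.count == 0:
            raise ValueError("no values recorded")
        return self.total / self.count

    def variance(self):
        """Population variance: E[x^2] - (E[x])^2, clamped at zero."""
        mean = self.mean()
        return max(self.total_sq / self.count - mean * mean, 0.0)


rv = RunningVariance()
for dealt in [5, 5, 5, 0, 3, 0]:   # illustrative damage values, one per strike
    rv.add(dealt)
print(rv.mean(), rv.variance())    # 3.0 5.0
```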

Yogin · Post by **Yogin** » April 21st, 2007, 11:53 pm

rrenaud wrote:What is wrong with algorithm 1 at wikipedia?

http://en.wikipedia.org/wiki/Algorithms ... g_variance

You can keep O(1) memory and output variance in O(1) time in an online fashion by just keeping track of the number of events, the sum of the events, and the sum of the events squared.

Umm... Algorithms 1-3 all have nothing whatsoever to do with the issue at hand. Algorithms 1 and 2 find the variance/standard deviation/variation for a sample of numbers x(1), x(2), ... x(n). Algorithm 3 finds the variance after x(n+1) has been added to the sample.

What we would like to do is find the variance/standard deviation/variation for the distribution created by cumulative strikes of different damage. We have no sample or population. All we have are raw probability distributions.

rrenaud · Post by **rrenaud** » April 22nd, 2007, 1:41 am

Wow, I was so off. I see now that you want to calculate the variance of say, 4 attacks with damage 5 and chance to hit of 60% along with 3 attacks with 2 damage and 70% chance to hit, and not the variance of attacks doing damage 5,5,5,0,3,0.

Hopefully I am less clueless in this post.

Why would you figure that putting a d^2 in front of the Bernoulli doesn't work?

It should be fine by 6 here.

http://en.wikipedia.org/wiki/Variance#P ... .2C_formal

Also, why do you divide by 2 when summing variances of independent random variables? That seems to contradict 8a listed there.

Swarm and slow, however, are quite annoying and get rid of the nice simplicity in calculations.
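The two rules invoked above are that Var(aX) = a^2 Var(X), and that the variance of a sum of independent variables is the sum of their variances, whether or not they are identically distributed. A Python sketch can check both on the example of 4 strikes of 5 damage at 60% and 3 strikes of 2 damage at 70%. It assumes every strike is resolved independently and that none is cut short by the target dying; the function names are hypothetical.

```python
from itertools import product


def ev_and_variance(strikes):
    """Closed form for independent strikes given as (damage, hit_chance) pairs."""
    ev = sum(d * p for d, p in strikes)
    var = sum(d * d * p * (1 - p) for d, p in strikes)   # d^2 * Var(Bernoulli(p))
    return ev, var


def brute_force(strikes):
    """Enumerate every hit/miss pattern and compute EV and variance directly."""
    ev = 0.0
    second_moment = 0.0
    for outcome in product([0, 1], repeat=len(strikes)):
        prob = 1.0
        total = 0
        for hit, (d, p) in zip(outcome, strikes):
            prob *= p if hit else 1 - p
            total += d * hit
        ev += prob * total
        second_moment += prob * total * total
    return ev, second_moment - ev * ev


strikes = [(5, 0.6)] * 4 + [(2, 0.7)] * 3
ev, var = ev_and_variance(strikes)
bf_ev, bf_var = brute_force(strikes)
print(f"closed form: EV={ev:.2f} Var={var:.2f}")        # EV=16.20 Var=26.52
print(f"brute force: EV={bf_ev:.2f} Var={bf_var:.2f}")  # EV=16.20 Var=26.52
```

The two results agree with no division by the number of distributions.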

Yogin · Post by **Yogin** » April 22nd, 2007, 6:16 am

rrenaud wrote:Wow, I was so off. I see now that you want to calculate the variance of say, 4 attacks with damage 5 and chance to hit of 60% along with 3 attacks with 2 damage and 70% chance to hit, and not the variance of attacks doing damage 5,5,5,0,3,0.

Yes, but you need to do it per strike, rather than per battle. Otherwise EV is always overestimated.

Hopefully I am less clueless in this post.

significantly

Why would you figure that putting a d^2 in front of the Bernoulli doesn't work?

It should be fine by 6 here.

http://en.wikipedia.org/wiki/Variance#P ... .2C_formal

Also, why do you divide by 2 when summing variances of independent random variables? That seems to contradict 8a listed there.

Hmm... It's been 5 years or so since I studied stats, so I might've been wrong. I was under the impression that property 8a only applies to IID random variables (IID means independent, identically distributed). If it applies to non-identical RVs, then it's much easier than I thought. That is why I tried to derive it through other properties. I'll take a closer look at this. I suspect I'm wrong, which makes the calculations easier.

Swarm and slow, however, are quite annoying and get rid of the nice simplicity in calculations.

Should be ok, since you calculate EV after the battle ends, and do it per strike. Hmm... actually, I guess the issue's more complicated than that. Calculating per strike isn't the easiest way, either. You must do it per battle, and use the combat calculator we have. That results in a unique distribution. Kind of a multinomial, but not exactly.
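A rough Python sketch of the per-battle idea: build the full distribution of damage dealt when strikes stop mattering once the target is dead, then read the EV and variance off that distribution. This is not Wesnoth's combat calculator. It is a one-sided simplification that ignores retaliation, swarm, slow and every other special, and the function names are hypothetical.

```python
def damage_distribution(damage, strikes, hit_chance, target_hp):
    """Return {damage_dealt: probability} for one attacker against one target."""
    states = {target_hp: 1.0}              # remaining hp -> probability
    for _ in range(strikes):
        next_states = {}
        for hp, prob in states.items():
            if hp == 0:                    # target already dead: nothing changes
                next_states[0] = next_states.get(0, 0.0) + prob
                continue
            hp_after_hit = max(hp - damage, 0)   # damage is capped at remaining hp
            next_states[hp_after_hit] = next_states.get(hp_after_hit, 0.0) + prob * hit_chance
            next_states[hp] = next_states.get(hp, 0.0) + prob * (1 - hit_chance)
        states = next_states
    return {target_hp - hp: prob for hp, prob in states.items()}


def mean_and_variance(dist):
    """EV and variance of a {value: probability} distribution."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var


# The Lancer (12-3, 80% to hit) against an 8 hp Dark Adept from earlier in the thread.
dist = damage_distribution(damage=12, strikes=3, hit_chance=0.8, target_hp=8)
mean, var = mean_and_variance(dist)
print(f"P(kill)={dist[8]:.3f} EV={mean:.3f} Var={var:.4f}")  # P(kill)=0.992 EV=7.936 Var=0.5079
```

Because each battle yields its own distribution, the per-battle variances can then be summed across independent battles to get a game-wide figure.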

mrchadt · Post by **mrchadt** » April 22nd, 2007, 8:22 am

Thanks for having a look into this, I hadn't noticed this thread had been replied to in a while.
I was thinking more of analysing the numbers produced by the RNG rather than how much damage is done in game play.
But I do think that giving the variance or standard deviation of the damage done/taken would be useful too. I think at present the EV is calculated as chance to hit * damage done on each attack, which gives different numbers than if the EV is worked out by clicking on an enemy and looking at the damage calculations, the second way being better imo.
If the information from the damage calculations was collected together this could be used as a starting point for analysis.