Another Statistical Measure

Gidoza
Posts: 20
Joined: April 23rd, 2015, 4:35 pm

Another Statistical Measure

Post by Gidoza »

Hi,

I've been thinking about the statistical measure in Wesnoth that displays actual damage vs. expected damage overall (taken and delivered). Generally, this is a good measure. However, if it were used for a real statistical study of some kind (I'm not being SUPER serious about that, just emphasizing that multiple measures are useful for a good overall analysis), I don't think it would pass, mainly because it gives no indication of where the damage is going or how effective it is. The simplest example is healing: healing units skew the data by producing total damage values that are not representative of the gold spent on units (if healing did not exist, then this might work).

My thought is that it would be interesting to have a second measure that displays information on expected kills. I'm not exactly positive how the math would work on this, but here's the idea:

1. If a unit could not die because the maximum damage of the attacking unit is unable to kill it, no information is recorded.

2. If a unit is generally expected to die (say 90%) and does, then a record of that expectation and fulfillment is made.

3. If a unit is generally expected to die and does not (failure on a 90% death chance), this would shift the graph sharply in the other direction - but this is where my knowledge of statistical calculation gets fuzzy on how to do this.
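
One way the three rules above could be tallied is sketched below. This is only a rough illustration, not how Wesnoth's statistics work: the `KillLedger` class and its fields are hypothetical names, and it assumes the game can supply a kill chance for each attack. Each recorded attack adds its kill chance to the expected total and 1 to the actual total if the unit died, so a failed 90% kill moves the balance by -0.9 while a successful one moves it by only +0.1.

```python
from dataclasses import dataclass


@dataclass
class KillLedger:
    """Running tally of expected versus actual kills (hypothetical helper)."""

    expected: float = 0.0  # sum of kill chances over recorded attacks
    actual: int = 0        # number of recorded attacks that ended in a kill
    attacks: int = 0       # number of recorded attacks

    def record(self, kill_chance: float, died: bool) -> None:
        """Record one attack; kill_chance is a probability between 0 and 1."""
        # Rule 1: if the unit could not die, nothing is recorded.
        if kill_chance <= 0.0:
            return
        # Rules 2 and 3: record the expectation and whether it was fulfilled.
        self.expected += kill_chance
        self.actual += int(died)
        self.attacks += 1

    @property
    def surplus(self) -> float:
        """Actual kills minus expected kills (positive means lucky kills)."""
        return self.actual - self.expected


if __name__ == "__main__":
    ledger = KillLedger()
    ledger.record(0.9, died=True)   # expected kill that happened
    ledger.record(0.9, died=False)  # expected kill that failed
    print(ledger.attacks, round(ledger.surplus, 2))
```

Summing probabilities like this is the simplest choice; it says nothing about how much the failed kill mattered tactically, which is the harder part of the idea.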


Why this matters: Whereas the overall stats of a game can even out in the long run, huge strokes of luck in certain situations (killing a unit you didn't expect to, or failing to kill one you expected to) affect considerably more than just the immediate damage statistics. Failing to kill a unit that was reasonably expected to die, for example, can profoundly affect a battle: positioning is off, other units don't have access, the unit you thought was safe is now stranded, etc... and moreover, the attacked unit may even get away, heal up, and come back again. The opposite is true as well (taking out a unit I never could have imagined killing and then scoring more convenient hits because of it). None of this is seriously captured in the current statistical analysis. The proposed method would give a more global "picture" of what is actually *happening*, which is a more precise (if still imperfect) overall assessment.
Ravana
Forum Moderator
Posts: 2933
Joined: January 29th, 2012, 12:49 am
Location: Estonia

Re: Another Statistical Measure

Post by Ravana »

I think it might be too confusing for mainline, but if add-ons were given easier access to damage statistics, that might be a possibility.
Tad_Carlucci
Inactive Developer
Posts: 503
Joined: April 24th, 2016, 4:18 pm

Re: Another Statistical Measure

Post by Tad_Carlucci »

Sounds to me like an awful lot of work to test the RNG. Most likely the desire is to disprove the claims that it is flawed. Such an effort seems fruitless, and may even backfire. I'm not opposed to some UMC trying, but it should not go into mainline.
I forked real life and now I'm getting merge conflicts.
Gidoza
Posts: 20
Joined: April 23rd, 2015, 4:35 pm

Re: Another Statistical Measure

Post by Gidoza »

Tad_Carlucci wrote: January 13th, 2020, 3:26 pm Sounds to me like an awful lot of work to test the RNG. Most likely the desire is to disprove the claims that it is flawed. Such an effort seems fruitless, and may even backfire. I'm not opposed to some UMC trying, but it should not go into mainline.
I'm not sure I understand your point. Whereas in Chess the matter of skill is not in question because randomness does not exist, there is always a random element in Wesnoth, and so unless that randomness is removed, the most accurate possible measure of the RNG statistics is important in order to know precisely what happened. I don't see why accurate measurements can "backfire" or really even why this needs to be explained, frankly.
Tad_Carlucci
Inactive Developer
Posts: 503
Joined: April 24th, 2016, 4:18 pm

Re: Another Statistical Measure

Post by Tad_Carlucci »

The issue is not technical, it's psychological. No amount of "proof" will convince someone who believes the RNG is flawed. Given the nature of randomness, your method is guaranteed to produce some highly skewed results which will be looked upon as "proof" the RNG is biased even though such results are, in fact, a required part of the proof it is, in fact, not biased. That is the backfire ... your proof of non-bias will be taken as proof of bias.
Whiskeyjack
Posts: 476
Joined: February 7th, 2015, 1:27 am
Location: Germany

Re: Another Statistical Measure

Post by Whiskeyjack »

Tad_Carlucci wrote: January 13th, 2020, 6:00 pm The issue is not technical, it's psychological. No amount of "proof" will convince someone who believes the RNG is flawed. Given the nature of randomness, your method is guaranteed to produce some highly skewed results which will be looked upon as "proof" the RNG is biased even though such results are, in fact, a required part of the proof it is, in fact, not biased. That is the backfire ... your proof of non-bias will be taken as proof of bias.
I think Gidoza's suggestion wasn't another "RNG sux mimimi" or "RNG is dysfunctional" thread, but rather about the statistic comparing actual to expected hits in a given game, which is a good (or not so good - that's the point of contention) indicator in multiplayer games of how much luck factored into that one specific match.
(I think it's shown under menu -> statistics or with hotkey s)
Under blood-red skies, an old man sits
In the ruins of Carthage - contemplating prophecy.
Tad_Carlucci
Inactive Developer
Posts: 503
Joined: April 24th, 2016, 4:18 pm

Re: Another Statistical Measure

Post by Tad_Carlucci »

The point is, at some point, someone is going to see something happen, calculate the odds of that happening as 1-in-10-million, and proceed to claim that it proves something is fishy with "luck" (aka the RNG) when, instead, it proves such long-shot occurrences can happen and this actually disproves one way the RNG might be broken.

Put another way, to ensure such long-shots never occurred we would need to break the RNG so it would no longer be random.
Pentarctagon
Project Manager
Posts: 5496
Joined: March 22nd, 2009, 10:50 pm
Location: Earth (occasionally)

Re: Another Statistical Measure

Post by Pentarctagon »

I mean, that already happens occasionally anyway with people complaining about a mage missing 6 attacks in a row or the like.
99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code
Mawmoocn
Posts: 154
Joined: March 16th, 2019, 3:54 pm

Re: Another Statistical Measure

Post by Mawmoocn »

Gidoza wrote: January 12th, 2020, 5:44 pm 2. If a unit is generally expected to die (say 90%) and does, then a record of that expectation and fulfillment is made.
I think you should record any chance of death (0.0001% or lower, but not 0), as your data won’t show the difference (of luck? / skill?) if you kill a unit at a lower percentage (even if that was a 0.1% chance of happening).

If there are recorded brackets for every (chance to kill) percentage, you’ll see whether it’s partially accurate, for example if the failure rate is higher than normal for the 70% - 80% range.
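
A minimal sketch of such bracket recording, assuming each attack arrives as a (chance to kill in percent, died) pair. The `bracket_counts` function, the 10-point bracket width, and the sample records are all hypothetical.

```python
from collections import defaultdict


def bracket_counts(records, width=10):
    """Group (kill_chance_percent, died) records into brackets of `width` points."""
    counts = defaultdict(lambda: {"alive": 0, "dead": 0})
    for chance, died in records:
        if chance <= 0:
            continue  # no chance to kill, so nothing is recorded
        # A 100% chance is placed in the top bracket rather than a new one.
        low = min(int(chance // width) * width, 100 - width)
        counts[(low, low + width)]["dead" if died else "alive"] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(75, True), (72, False), (0.1, True), (100, True), (0, False)]
    for bracket, tally in sorted(bracket_counts(sample).items()):
        print(bracket, tally)
```

Even a 0.1% kill lands in the lowest bracket instead of being discarded, which is the point of recording every non-zero chance.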

The real problem is that it won’t record any information before units are vulnerable to dying, so you can get situations like when a Peasant dies to a 1 HP Lich... basically it won’t highlight the "details" without additional info.

There are strategies that use death to win. With this in consideration, the information can’t always be “accurate” if strategies or tactics contain a mix of saving and sacrificing units.

I think it may need other supporting/additional features, because it seems incomplete, and it would be nice to see this information somewhere.
Gidoza
Posts: 20
Joined: April 23rd, 2015, 4:35 pm

Re: Another Statistical Measure

Post by Gidoza »

Tad_Carlucci wrote: January 13th, 2020, 10:08 pm The point is, at some point, someone is going to see something happen, calculate the odds of that happening as 1-in-10-million, and proceed to claim that it proves something is fishy with "luck" (aka the RNG) when, instead, it proves such long-shot occurrences can happen and this actually disproves one way the RNG might be broken.

Put another way, to ensure such long-shots never occurred we would need to break the RNG so it would no longer be random.
In seriousness, I don't understand your point. You can't claim 1-in-10-million chances and a broken RNG if you have more data. On the other hand, if a 1-in-10-million stroke of luck occurs in my favour, I at least have enough honesty as the victor to declare that if it had been a game of Chess in an otherwise similar situation, I would not have won.

Of course the stats are likely in most situations to balance out; of course small deviations will give many an excuse to cry foul. But since you bring it up, I can't blame anyone who would want a Chess-like Wesnoth without any probability at all.
Gidoza
Posts: 20
Joined: April 23rd, 2015, 4:35 pm

Re: Another Statistical Measure

Post by Gidoza »

Mawmoocn wrote: January 14th, 2020, 11:43 am
Gidoza wrote: January 12th, 2020, 5:44 pm 2. If a unit is generally expected to die (say 90%) and does, then a record of that expectation and fulfillment is made.
I think you should record any chance of death (0.0001% or lower, but not 0), as your data won’t show the difference (of luck? / skill?) if you kill a unit at a lower percentage (even if that was a 0.1% chance of happening).

If there are recorded brackets for every (chance to kill) percentage, you’ll see whether it’s partially accurate, for example if the failure rate is higher than normal for the 70% - 80% range.

The real problem is that it won’t record any information before units are vulnerable to dying, so you can get situations like when a Peasant dies to a 1 HP Lich... basically it won’t highlight the "details" without additional info.

There are strategies that use death to win. With this in consideration, the information can’t always be “accurate” if strategies or tactics contain a mix of saving and sacrificing units.

I think it may need other supporting/additional features, because it seems incomplete, and it would be nice to see this information somewhere.
I appreciate this rebuttal and you make a particularly good point with the intentional death strategy. In principle one could say that you want to measure a player's intentions, but measuring intentions is a little...difficult...
Mawmoocn
Posts: 154
Joined: March 16th, 2019, 3:54 pm

Re: Another Statistical Measure

Post by Mawmoocn »

Gidoza wrote: January 16th, 2020, 5:19 pm In principle one could say that you want to measure a player's intentions, but measuring intentions is a little...difficult...
I thought it could work well with another supporting feature, but that’s optional, because recording the chance to die/kill relies on chance to hit and damage. It'll be sufficient if the goal is to compare "would be" deaths.

Another way to skew “death” statistics is if the unit is in a “conditional state” (the chance of death can be partially removed by abilities that heal or reduce damage while in combat) during the attack phase/combat with another unit.

Slow and Drain are abilities that fit this criterion, as they reduce/heal damage if certain conditions are met. UMC can be different if other similar abilities appear.

It also includes units similar to the Dwarvish Thunderer (1-strike units).

Basically, conditional elements are CTH abilities/attacks and other triggers inside combat that heal or reduce damage.


Anyways, I tried to visualise your idea but it has no graph, only numbers.

CTK = Chance to kill


CTK          Alive   Dead   Real CTK
5% - 10%       496     25       4.8%
21% - 29%       35      8      18.6%
35% - 37%       40     12      23.1%
41% - 47%       80     91      53.2%
51% - 60%      221    195      46.9%
61% - 66%      349    467      57.2%
75% - 79%       91    397      81.4%
89% - 90%       12    268      95.7%

Total         1324   1463
* This is a hypothetical example
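
For reference, the "Real CTK" column can be derived from the other two as Dead / (Alive + Dead). The sketch below does that for the hypothetical rows above and also reports whether each observed rate falls inside its bracket. The function names are made up for illustration.

```python
# (bracket label, alive, dead) rows copied from the hypothetical table above.
ROWS = [
    ("5% - 10%", 496, 25),
    ("21% - 29%", 35, 8),
    ("35% - 37%", 40, 12),
    ("41% - 47%", 80, 91),
    ("51% - 60%", 221, 195),
    ("61% - 66%", 349, 467),
    ("75% - 79%", 91, 397),
    ("89% - 90%", 12, 268),
]


def observed_ctk(alive, dead):
    """Observed kill rate in percent: deaths divided by all recorded attacks."""
    return 100.0 * dead / (alive + dead)


def bracket_bounds(label):
    """Turn a label such as '21% - 29%' into the pair (21.0, 29.0)."""
    low, high = (float(part.strip().rstrip("%")) for part in label.split(" - "))
    return low, high


def summarize(rows):
    """Return (label, observed %, position relative to bracket) for each row."""
    out = []
    for label, alive, dead in rows:
        low, high = bracket_bounds(label)
        rate = observed_ctk(alive, dead)
        verdict = "below" if rate < low else "above" if rate > high else "inside"
        out.append((label, round(rate, 1), verdict))
    return out


if __name__ == "__main__":
    for label, rate, verdict in summarize(ROWS):
        print(f"{label:<10} {rate:5.1f}%  {verdict} bracket")
```

A rate outside its bracket is not proof of anything by itself, since small brackets (43 attacks in the second row) swing widely by chance.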

The problem is that it lacks verbose info (for every encountered chance percentage), and maybe it’s not what you’re looking for?

Well, if you can somehow differentiate which unit dies or their HP value before death, it’ll be complex but probably informative?

It lacks damage and chance to hit info, so it can probably be improved later.
Gidoza
Posts: 20
Joined: April 23rd, 2015, 4:35 pm

Re: Another Statistical Measure

Post by Gidoza »

I appreciate your effort to take me seriously and draw the distinction about what I am and am not saying.

My observation on some of the RNG posts that have happened in various places is that there is some "talking past" that goes on. There is definitely good luck/bad luck that can be attributed to a game (e.g. if a coin is flipped 10,000 times, seeing 100 heads in a row does not mean that the coin is broken). The fact of the matter is that - since the game is one of strategy - 100 heads in a row might not break the RNG, but it can break someone's game. The RNG is fine, but it does not make sense to critique someone's tactics based on a streak of bad luck. A ton of lucky hits with Northerners (for example) in the daytime will kill nothing; while a streak of unlucky hits in the night will also kill nothing - yet the statistical register will come out saying that a statistically appropriate amount of damage was inflicted (which is true). Reality is that the RNG is not broken; but reality is also that the RNG can break a game. A single unit that gets to heal when it couldn't have before, or a town taken at a critical time which was hopelessly unlikely, etc...these have vital impacts on the game as a whole, in a way that can't be measured by the damage totals in any meaningful way.

Mawmoocn wrote: January 30th, 2020, 11:42 pm
Gidoza wrote: January 16th, 2020, 5:19 pm In principle one could say that you want to measure a player's intentions, but measuring intentions is a little...difficult...
I thought it could work well with another supporting feature, but that’s optional, because recording the chance to die/kill relies on chance to hit and damage. It'll be sufficient if the goal is to compare "would be" deaths.

Another way to skew “death” statistics is if the unit is in a “conditional state” (the chance of death can be partially removed by abilities that heal or reduce damage while in combat) during the attack phase/combat with another unit.

Slow and Drain are abilities that fit this criterion, as they reduce/heal damage if certain conditions are met. UMC can be different if other similar abilities appear.

It also includes units similar to the Dwarvish Thunderer (1-strike units).

Basically, conditional elements are CTH abilities/attacks and other triggers inside combat that heal or reduce damage.


Anyways, I tried to visualise your idea but it has no graph, only numbers.

CTK = Chance to kill


CTK          Alive   Dead   Real CTK
5% - 10%       496     25       4.8%
21% - 29%       35      8      18.6%
35% - 37%       40     12      23.1%
41% - 47%       80     91      53.2%
51% - 60%      221    195      46.9%
61% - 66%      349    467      57.2%
75% - 79%       91    397      81.4%
89% - 90%       12    268      95.7%

Total         1324   1463
* This is a hypothetical example

The problem is that it lacks verbose info (for every encountered chance percentage), and maybe it’s not what you’re looking for?

Well, if you can somehow differentiate which unit dies or their HP value before death, it’ll be complex but probably informative?

It lacks damage and chance to hit info, so it can probably be improved later.
IceSandslash
Developer
Posts: 17
Joined: February 12th, 2023, 1:13 pm

Re: Another Statistical Measure

Post by IceSandslash »

Good idea. With the caveat that every battle with a chance of casualties should be recorded, as stated by Mawmoocn.

Regarding graphs, (alert, wall of jargon)
Spoiler:
And that brings me to the next point. What people would actually be interested in is knowing how many units died that should have survived, how many that should have died survived, and - of course - how many died fairly. And that's an easy calculation. Suppose there were N distinct chances to kill C_i, each used in n_i battles, resulting in d_i deaths. Then the theoretical number of deaths would be Σ(C_i * n_i), and the empirical number would be Σd_i. Whether the player performed better or worse is just the difference (Σd_i) - Σ(C_i * n_i). Positive: your rolls were good. Negative: your rolls were bad. And the p-value for Σd_i would also be computed for good measure.
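
A minimal sketch of that calculation, under the assumption that every battle is an independent trial with its stated kill chance. The post does not say which p-value is meant, so this sketch returns both one-sided tail probabilities from the exact distribution of the total number of deaths. The `kill_luck` function and the sample numbers are hypothetical.

```python
def kill_luck(brackets):
    """Compare expected and observed deaths.

    brackets: list of (C_i, n_i, d_i) = (kill chance between 0 and 1,
    number of battles at that chance, deaths observed in those battles).
    Returns (expected, observed, difference, p_upper, p_lower).
    """
    expected = sum(c * n for c, n, _ in brackets)
    observed = sum(d for _, _, d in brackets)

    # Exact distribution of total deaths: dist[k] = P(exactly k deaths).
    dist = [1.0]
    for c, n, _ in brackets:
        for _ in range(n):
            nxt = [0.0] * (len(dist) + 1)
            for k, p in enumerate(dist):
                nxt[k] += p * (1.0 - c)   # this battle: target survives
                nxt[k + 1] += p * c       # this battle: target dies
            dist = nxt

    p_upper = sum(dist[observed:])        # P(at least this many deaths)
    p_lower = sum(dist[: observed + 1])   # P(at most this many deaths)
    return expected, observed, observed - expected, p_upper, p_lower


if __name__ == "__main__":
    # Hypothetical data: two battles at 90% with one death,
    # two battles at 50% with two deaths.
    print(kill_luck([(0.9, 2, 1), (0.5, 2, 2)]))
```

The distribution is built one battle at a time, which is quadratic in the number of battles but stays fast for a few thousand of them. A small p_upper suggests unusually many kills and a small p_lower unusually few.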
Mawmoocn
Posts: 154
Joined: March 16th, 2019, 3:54 pm

Re: Another Statistical Measure

Post by Mawmoocn »

Hi, I know I took a long time to reply, but the truth is that I only recently found out how complicated the RNG is. In my original post, I thought of a way that left damage out of the statistics, because it's harder to materialize your idea using damage...

To cut the story short,
Gidoza wrote: March 29th, 2023, 2:49 am A ton of lucky hits with Northerners (for example) in the daytime will kill nothing; while a streak of unlucky hits in the night will also kill nothing
based on my small research and this statement, the current RNG has 2 RNGs.

Chance to hit with number of strikes/hits, and damage with number of strikes/hits.

While they coexist, the RNGs for these 2 are in a symbiotic relationship.

They don't affect each other.

While I don't have the actual data, the proof that there are 2 RNGs is:

This is hard to explain, so to generalize: in a coin flip, there are 3 outcomes: heads, tails, and the coin standing by itself.

Let's say that in order to make the flip valid, it must stop on the table, or it will be ignored.

You have 3 tries, you bet all on the coin standing by itself, and surprisingly, it worked???

In this situation, where could you find chance to hit?

Gidoza wrote: March 29th, 2023, 2:49 am that can't be measured by the damage totals in any meaningful way.
To be honest, for measuring vital information, despite the potential limitations, I could only think of using a graph based on damage with turn information.

Other ways are to compare damage and number of strikes/hits in a statistical form, or to use "team performance" (units attacking 1 unit) statistics.

They may have limitations, including other conditions when calculating valid results (damage is the final result of combat), and damage itself.