Ladder Site Online...

Post by **Angry Andersen** » September 10th, 2008, 12:04 pm

Wintermute wrote:
Angry Andersen wrote: k=40*exp(-N/100), where N is the total number of games played...
The function starts at k=40 for a player's first game and then decreases smoothly to a value of ~15 after 100 games played and ~5 after 200 games. So after 200 games, the rating of the player will only change at roughly 1/8th of the speed that it did for the first games.
Maybe the k-values should have some absolute lower limit C, in which case the general shape of the formula would become k=A*exp(-N/B)+C

Implementing different systems in parallel, as suggested by eyerouge, would be brilliant for testing purposes!
I agree that testing things with the data we have is a good place to start. I think that your model is likely the way to go "down the road" if not right now. The potential issue that I see is that if we use your model with the current stats, there are simply not enough games played for this to be any improvement on the current system. Again, this may or may not be the case, but what I could imagine happening is that (since few people have played even 100 games) ratings in general will be low, and that seems like it would be LESS stable whenever a good, new player joins. That player can play 20 games, win almost all of them, and perhaps shoot up OVER the heads of everyone else. Keep in mind that the ranks of players who have played lots of games will be moving slowly. Then such a hotshot newcomer might lose 5 games against top players and plummet down almost as fast. This is of course speculation - I haven't run the numbers, but have you considered/dismissed such possibilities?

I also wonder, as I mentioned before, whether rounding off low K-values would cause poor results. If you have only 5 points to play with, won't this encourage preying on weak players, because you always get at least 1 point, and can't really get that many points from anyone?

As you correctly noticed, the change to k-values in the Elo system that I suggested assumes that scores are kept as floating-point values, even if just integer numbers are displayed. Otherwise, rounding errors are likely to cause distortions.
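A minimal sketch of the k-factor schedule and the floating-point storage described above. The formula k=A*exp(-N/B)+C and the values A=40, B=100 come from the thread; the standard 400-point Elo expected-score curve and all function names are assumptions made for illustration, not the ladder's actual code.

```python
import math


def k_factor(games_played, a=40.0, b=100.0, c=0.0):
    """k = A*exp(-N/B) + C; the defaults reproduce k = 40*exp(-N/100)."""
    return a * math.exp(-games_played / b) + c


def expected_score(rating, opponent_rating):
    """Standard Elo expected score (assumed: logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating, opponent_rating, score, games_played):
    """Return the new rating as a float; score is 1.0 for a win, 0.0 for a loss."""
    k = k_factor(games_played)
    return rating + k * (score - expected_score(rating, opponent_rating))


def displayed(rating):
    """Round only for display; the stored rating stays a float."""
    return round(rating)


# A veteran with 200 games beats an equally rated opponent.
stored = update_rating(1500.0, 1500.0, 1.0, games_played=200)
shown = displayed(stored)
```

Keeping `stored` as a float means small gains at low k-values accumulate instead of being rounded up to a whole point per win.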

The rating system I developed (not the k-factor correction) has one interesting property, which none of the other systems have: the order in which games were played has NO influence on the resulting scores. This has quite an influence on the situation you described above: the distortions of other players' scores caused by games against provisional players with inaccurate scores correct themselves as more information (=games) about the provisional player becomes available.

An example:
Player A: good player, high but not top score
Player B: entered the ladder recently (~average score), but is an expert player

Player B plays against player A and wins. After that, player B wins many other games and becomes one of the top ranked Players. What would the effects on scores be:

ELO-type systems:
Player A is punished for losing a game against an opponent with a much lower score. The following games by player B have no influence on this.

My algorithm:
Player A is punished for losing a game against an opponent with a much lower score. As player B continues to rise in score, the score of Player A is also rehabilitated, since now it becomes obvious that he lost against a pretty strong opponent. I.e. with the new information, it becomes evident that Player B was likely to win the game.

All algorithms in which the order of games plays a role in the resulting scores reward games against currently overrated opponents and punish games against currently underrated opponents (as in my example). So picking the right opponents at the right time becomes very important to achieve top scores and DISTORTS the scores of players (i.e. scores reflect tactical picking of opponents AND playstrength, rather than just the latter).

This effect is much attenuated with my algorithm: even if the opponent was over-/underrated at the time of the game, this will be corrected as more information about that opponent becomes available. For the same reason, the influence of lucky wins/losses should also be reduced.
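The order dependence of sequential Elo-type updates can be illustrated with a small sketch. It assumes a textbook Elo update with a fixed k of 32 and a 400-point scale, and made-up starting ratings for three hypothetical players; none of this is the ladder's actual configuration. The same six results are processed in two different orders.

```python
def expected(rating, opponent_rating):
    """Standard Elo expected score (assumed 400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def run_elo(games, start, k=32.0):
    """Apply sequential Elo updates to a list of (winner, loser) games."""
    ratings = dict(start)
    for winner, loser in games:
        delta = k * (1.0 - expected(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


# Hypothetical starting ratings: A is established, B is an underrated newcomer.
start = {"A": 1700.0, "B": 1500.0, "C": 1500.0}

# Identical results, different order: B beats A either first or last.
b_beats_a_first = [("B", "A")] + [("B", "C")] * 5
b_beats_a_last = [("B", "C")] * 5 + [("B", "A")]

early = run_elo(b_beats_a_first, start)
late = run_elo(b_beats_a_last, start)

# A ends lower when the loss came while B was still underrated.
a_penalty_gap = late["A"] - early["A"]
```

In both runs the set of results is the same; only the timing of A's loss differs, and that alone changes A's final rating.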

Would it be possible to export the current database to a text file? That way I could import it into Matlab and calculate scores for comparison.

Post by **Wintermute** » September 10th, 2008, 6:45 pm

Angry Andersen wrote: As you correctly noticed, the change to k-values in the Elo system that I suggested assumes that scores are kept as floating-point values, even if just integer numbers are displayed. Otherwise, rounding errors are likely to cause distortions.

Well, other than the obvious joy of having several decimal places tacked onto each rating, I seem to remember some problem with keeping decimals leading to rating deflation? Though I admit that I don't see that anywhere, now that I am looking for it - so maybe I just made that up.

Angry Andersen wrote: The rating system I developed (not the k-factor correction) has one interesting property, which none of the other systems have: the order in which games were played has NO influence on the resulting scores. This has quite an influence on the situation you described above: the distortions of other players' scores caused by games against provisional players with inaccurate scores correct themselves as more information (=games) about the provisional player becomes available.

An example:
Player A: good player, high but not top score
Player B: entered the ladder recently (~average score), but is an expert player

Player B plays against player A and wins. After that, player B wins many other games and becomes one of the top ranked Players. What would the effects on scores be:

ELO-type systems:
Player A is punished for losing a game against an opponent with a much lower score. The following games by player B have no influence on this.

My algorithm:
Player A is punished for losing a game against an opponent with a much lower score. As player B continues to rise in score, the score of Player A is also rehabilitated, since now it becomes obvious that he lost against a pretty strong opponent. I.e. with the new information, it becomes evident that Player B was likely to win the game.

Well, that is an interesting idea. That seems like it would work very well in most cases. Two more questions though:
1) Might calculation become prohibitively expensive? It seems that as more games are played, there will be lots of recalculating going on for a player's score... which will then affect ALL the games (that is, the opponents' scores) that the player participated in. And then if those scores are recalculated, won't that affect the player's score again? I guess maybe I don't understand what your algorithm is doing, but it seems like if players' past scores are calculated based on their current score, then a single change in score will mean LOTS of calculation, as the effects ripple down the ladder and then ripple back and forth until a new balance is reached.

2) While it *shouldn't* be a problem, there seems to be increased potential for players to abuse the ladder rankings, either intentionally or by accident. Consider a player who plays quite well, beats lots of good players, and then stops taking things as seriously. That player may be credited with wins that would not be repeatable, since the player's playing level has dropped - which would impact the scores of all those good players that were beaten on the way up. In the worst case, real sabotage is possible, as a good player could beat an individual (or several) lots of times, and then purposefully trash his/her score in some kind of a vendetta, which would harm the other innocent players.

I can imagine a chess player playing grand masters at the peak of their career and establishing a high rating. But then as the masters get older (and their ratings drop) the player will be punished because his old opponents are "getting worse". Granted we don't really care about 20 years from now, but I think the question is valid: what happens to your scores as players move on/drop out of Wesnoth? Perhaps this is only a theoretical problem - what do you think?

The other potential factor that I see is that many players start playing Wesnoth, join the ladder and start playing games at a pretty low level. They may get MUCH better over time, ending up with quite a good score. Their early opponents may get lots of credit for beating a weak opponent who was accurately rated at the time, but has since moved up. Why should those early games count for so much?

Angry Andersen wrote: All algorithms in which the order of games plays a role in the resulting scores reward games against currently overrated opponents and punish games against currently underrated opponents (as in my example). So picking the right opponents at the right time becomes very important to achieve top scores and DISTORTS the scores of players (i.e. scores reflect tactical picking of opponents AND playstrength, rather than just the latter).

While this is true, I think the worst damage can be pretty well handled with a provisional rating system (such a system is currently in place and will likely continue to be tweaked as the ladder grows). I am not sure that this is any worse a problem than the issues that I raised above. Testing may help determine that:

Angry Andersen wrote: Would it be possible to export the current database to a text file? That way I could import it into Matlab and calculate scores for comparison.

That would be great. I don't do anything officially with the ladder (I'm all talk!), but perhaps eyerouge can/will be able to supply you with a text file. Maybe a tab-delimited Excel export would work?

Post by **eyerouge** » September 11th, 2008, 7:46 am

Angry Andersen

I've included database dumps in various formats (CSV, LaTeX and Open Doc) which contain the results of each game played, along with the winner / loser names. Shout if something isn't obvious in them and I'll explain.

About the suggested system: As Wintermute notes, the system doesn't differentiate player skills over time - it doesn't take into account that a player I win/lose against at time x doesn't have the same skills at time y. In the Elo system that person isn't really even treated as "the same" player. In your system the person is always the same, "constant" in skills and has no development.

The only way I see to fix this problem, granted I understood your suggestion at all, is if game results in the rating calculations are only valid for x amount of time. For example, when a player wins, your formula would go to work but only take into account e.g. the results of all games from the latest 30 days or so. However, that is also problematic since there is no real rational way to determine that time.

(On a sidenote: A system with a similar problem, if I have understood it, is the Glicko one - there the time in between games seems to have an effect on the rating, which is absurd since time passed in between games doesn't equal loss in skills or mean that the player didn't keep on playing outside of the rating framework.)

Post by **Angry Andersen** » September 11th, 2008, 8:17 pm

@eyerouge:
Thanks a lot for the files! I managed to import the data and am currently running some tests. There are some minor problems with importing the data. Could you tell me the total number of games there should be and also the maximum length of the name of a player? That would allow me to validate if everything is ok.

Wintermute wrote: Well, that is an interesting idea. That seems like it would work very well in most cases. Two more questions though:
1) Might calculation become prohibitively expensive? It seems that as more games are played, there will be lots of recalculating going on for a player's score... which will then affect ALL the games (that is, the opponents' scores) that the player participated in. And then if those scores are recalculated, won't that affect the player's score again? I guess maybe I don't understand what your algorithm is doing, but it seems like if players' past scores are calculated based on their current score, then a single change in score will mean LOTS of calculation, as the effects ripple down the ladder and then ripple back and forth until a new balance is reached.

That is a pretty good description of what is happening.

I use a random walk procedure to find the scores for all players that best explain the data (game results). Of course, this calculation is more demanding than what the present system does. My 'prototype' version, which hasn't been optimized yet, needed almost 4 minutes(!) to calculate all scores from scratch (over 4000 games and over 500 players). But there are many ways in which the performance can be improved to become much faster. In particular, initializing with previous results rather than random values should lead to MUCH faster performance. So, yes, currently it is too 'expensive' in terms of calculation demands, but I'm sure this can be improved a lot.
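The post does not spell out the model or the search procedure, so the following is only a rough sketch of one way such a fit could look, not the algorithm actually used. It assumes an Elo-like logistic win probability, a weak pull towards 1500 so that unbeaten players keep finite scores, and a simple random walk that keeps a random change to one player's score only when it explains the results better. All names and constants are hypothetical.

```python
import math
import random


def objective(ratings, games, scale=400.0, mean=1500.0, spread=400.0):
    """Log-likelihood of all (winner, loser) results plus a weak prior (assumed)."""
    total = 0.0
    for winner, loser in games:
        diff = ratings[winner] - ratings[loser]
        total += math.log(1.0 / (1.0 + 10.0 ** (-diff / scale)))
    # The prior is an assumption; it keeps unbeaten players from drifting to infinity.
    total -= 0.5 * sum(((r - mean) / spread) ** 2 for r in ratings.values())
    return total


def random_walk_fit(games, steps=5000, step_size=20.0, seed=0, initial=None):
    """Nudge one random player's score per step; keep the nudge only if it helps."""
    rng = random.Random(seed)
    players = sorted({name for game in games for name in game})
    ratings = {name: 1500.0 for name in players}
    if initial:
        # Warm start from previous results instead of a flat starting point.
        for name in players:
            if name in initial:
                ratings[name] = float(initial[name])
    best = objective(ratings, games)
    for _ in range(steps):
        name = rng.choice(players)
        old = ratings[name]
        ratings[name] = old + rng.gauss(0.0, step_size)
        candidate = objective(ratings, games)
        if candidate > best:
            best = candidate
        else:
            ratings[name] = old  # reject the step
    return ratings


# Made-up results: B is unbeaten, A mostly beats C.
sample_games = [("B", "A"), ("B", "C"), ("B", "C"), ("A", "C"), ("A", "C"), ("C", "A")]
fitted = random_walk_fit(sample_games)
```

Because the objective is a sum over all games, the order in which results are listed does not change what counts as the best fit. Every step here rescans all games, which matches the cost concern raised in the question; passing the previous scores as `initial` is the warm start mentioned in the post.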

Wintermute wrote: 2) While it *shouldn't* be a problem, there seems to be increased potential for players to abuse the ladder rankings, either intentionally or by accident. Consider a player who plays quite well, beats lots of good players, and then stops taking things as seriously. That player may be credited with wins that would not be repeatable, since the player's playing level has dropped - which would impact the scores of all those good players that were beaten on the way up. In the worst case, real sabotage is possible, as a good player could beat an individual (or several) lots of times, and then purposefully trash his/her score in some kind of a vendetta, which would harm the other innocent players.

I can imagine a chess player playing grand masters at the peak of their career and establishing a high rating. But then as the masters get older (and their ratings drop) the player will be punished because his old opponents are "getting worse". Granted we don't really care about 20 years from now, but I think the question is valid: what happens to your scores as players move on/drop out of Wesnoth? Perhaps this is only a theoretical problem - what do you think?

The other potential factor that I see is that many players start playing Wesnoth, join the ladder and start playing games at a pretty low level. They may get MUCH better over time, ending up with quite a good score. Their early opponents may get lots of credit for beating a weak opponent who was accurately rated at the time, but has since moved up. Why should those early games count for so much?

I agree that the assumption of a stable performance is not realistic and can lead to the problems you describe. A possible solution would be to reduce the influence of a game as time passes. For example, the "weight" of a game could smoothly decline with time, so that it is halved for every 3 months since it was played. That way, a game that was played one year ago would only count 1/16th of a game played today. This would also encourage players to stay active.
(this could be introduced in my algorithm by a simple multiplication, i.e. the calculation demands wouldn't be affected much)
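The halving rule above can be written as a one-line weight. The sketch below assumes game age is measured in months and that the weight simply multiplies each game's contribution to an Elo-like log-likelihood; the function names and the (winner, loser, age) layout are hypothetical.

```python
import math


def game_weight(age_months, half_life_months=3.0):
    """Weight halves for every 3 months since the game was played."""
    return 0.5 ** (age_months / half_life_months)


def weighted_log_likelihood(ratings, games, scale=400.0):
    """Sum of per-game log-likelihoods, each multiplied by its age weight.

    games holds (winner, loser, age_months) tuples (assumed layout).
    """
    total = 0.0
    for winner, loser, age_months in games:
        diff = ratings[winner] - ratings[loser]
        p_win = 1.0 / (1.0 + 10.0 ** (-diff / scale))
        total += game_weight(age_months) * math.log(p_win)
    return total


# Four halvings in twelve months give the 1/16th mentioned in the post.
one_year_old = game_weight(12.0)
```

Since the weight is a plain multiplier per game, it adds only one multiplication to each term of the fit.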

Angry Andersen wrote: All algorithms in which the order of games plays a role in the resulting scores reward games against currently overrated opponents and punish games against currently underrated opponents (as in my example). So picking the right opponents at the right time becomes very important to achieve top scores and DISTORTS the scores of players (i.e. scores reflect tactical picking of opponents AND playstrength, rather than just the latter).

Wintermute wrote:While this is true, I think the worst damage can be pretty well handled with a provisional rating system (such a system is currently in place and will likely continue to be tweaked as the ladder grows). I am not sure that this is any worse a problem than the issues that I raised above. Testing may help determine that:

I hope the solution offered above might fix such problems, but only testing will tell.

Post by **eyerouge** » September 11th, 2008, 9:49 pm

I don't remember how many games I exported; the line number on which the CSV version ends should show it - one line per game. I think it's 4 284 in total or 2 284. Also, there are games included that are contested (=1 or TRUE) or withdrawn (=1 or TRUE), and those should not be counted towards the rating - dismiss them altogether in the calculations.

Max. amount of characters in a nickname can be up to 40.
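A rough sketch of how the export could be filtered and sanity-checked before rating. The thread does not show the real column layout, so the column names `winner`, `loser`, `contested` and `withdrawn` and the sample rows are made up for illustration; only the flag values (1 or TRUE) and the 40-character nickname limit come from the posts above.

```python
import csv
import io

TRUE_FLAGS = {"1", "TRUE"}
MAX_NAME_LENGTH = 40  # nickname limit stated in the thread


def rated_games(csv_text):
    """Return (winner, loser) pairs, skipping contested and withdrawn games."""
    games = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        contested = row["contested"].strip().upper() in TRUE_FLAGS
        withdrawn = row["withdrawn"].strip().upper() in TRUE_FLAGS
        if contested or withdrawn:
            continue  # dismissed altogether, as requested
        for name in (row["winner"], row["loser"]):
            if len(name) > MAX_NAME_LENGTH:
                raise ValueError("name longer than expected: " + name)
        games.append((row["winner"], row["loser"]))
    return games


# Made-up rows with hypothetical column names, for illustration only.
sample = """winner,loser,contested,withdrawn
alice,bob,0,0
bob,carol,1,0
carol,alice,FALSE,TRUE
alice,carol,0,0
"""

kept = rated_games(sample)
```

An over-long name raising an error is one cheap way to notice that a line was split or merged wrongly during import.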

Post by **Fosprey** » September 14th, 2008, 1:02 am

I'm posting the problem here since I can't find the contact mail anymore on ladder.subversiva.org.
I wanted to know what happened to the replay feature. It was a great feature and I don't see it anymore. What happened?

Post by **Yogibear** » September 14th, 2008, 9:16 am

It was abandoned due to host problems on the server side (eyerouge can't do anything about it because he doesn't have access to that part of the server).

We will be informed, as soon as the problem is fixed, but for now it's better to stay away from uploading replays.

Post by **eyerouge** » September 15th, 2008, 2:34 pm

Fosprey wrote: I'm posting the problem here since I can't find the contact mail anymore on ladder.subversiva.org.
I wanted to know what happened to the replay feature. It was a great feature and I don't see it anymore. What happened?

Exactly as Yogi wrote (thnx btw). The feature will be enabled as soon as the problem is fixed or we have another free (and spam/ad free) webhost that is at least as good or better.

Post by **Kalis** » September 18th, 2008, 4:25 am

Eyerouge, here's a question.

I had a match against Murderer earlier. We started a Weldyn Channel game, he randomed dwarves, and then he promptly left because he didn't want to play them.

Is that legitimate? I was pretty annoyed, so I reported that as my victory, and he just contested the game.

Post by **Yogibear** » September 18th, 2008, 9:59 am

First of all:
Please don't use this thread to clarify such questions, this is for development of the ladder, not for settling player disputes. If you have problems, either send a mail or PM to eyerouge. In the future there will (hopefully) be a ladder administrator to contact for this.

Second:
I am not in the position to make official statements (as if there is anything "official" about the ladder) but I am pretty sure that the following is correct and in the original sense of the ladder, its rules and the intentions behind them.

To answer your question:
No, this is not legitimate. The only way to make it so would be, if you both agree on cancelling the game before it ends (or even better: If you agree on that before the game starts).
If you don't do that, standard rules apply and if a player leaves and doesn't come back he lost.

However:
It's not like we play for money, do we? And the game actually ended before it started. If i were you, i'd not report the game, tell Murderer that you are a little upset and why, and agree on the exact terms the next time you play each other. For example you could as well agree not to play certain mirror matches. Whatever.
This is also in the sense of the ladder administrators (atm eyerouge) that players settle their disputes by themselves. We all have better things to do than spend our time teaching people politeness and seriousness.

You have the right to insist on your victory, but in my opinion there is no need to do so. Relax and take it easy.

Post by **ADmiral-N** » September 18th, 2008, 10:52 am

And the next time you play against him, think twice about the sportsmanship rating.

Edit:
On second thought, isn't Murderer a Rebels player? Why would he play random? Did you ask if he was ready before starting the game?
Don't post the answers in this thread; rather, consider them from your point of view and especially Murderer's. Then decide on actions.

Post by **Fosprey** » September 18th, 2008, 11:18 am

It's desiree that we use docs new maps? if so, should we add that in the game's name?

Post by **Wintermute** » September 18th, 2008, 12:43 pm

Fosprey wrote:It's desiree that we use docs new maps? if so, should we add that in the game's name?

People always desiree to use doc's new maps!

Seriously though, using the most current map is probably a good idea. It is possible of course that there could be "problems" with relatively untested maps, so perhaps be aware that if using new maps a dispute might arise (i.e. this new map isn't fair, etc.) and it may be hard to "prove" that one way or another until the maps become a bit more established. However, testing is good - use the new maps.

I don't think I'd put anything in the game title, but it would be nice to inform the opponent in the game lobby, since they may be planning something and need to change it up (and there is a timer).

Post by **mrmoose** » September 18th, 2008, 4:02 pm

Kalis wrote:Eyerouge, here's a question.

I had a match against Murderer earlier. We started a Weldyn Channel game, he randomed dwarves, and then he promptly left because he didn't want to play them.

Is that legitimate? I was pretty annoyed, so I reported that as my victory, and he just contested the game.

just to throw my two cents in i think u should count this game. to me him leaving automatically gives u a victory theres no reason u should be punished because the person u were playing against didnt like the faction he randomly got and decided to quit. after all it was his choice to go random he could have just as easily chose his faction thats his fault not urs... anyways sorry for posting here just had to give my opinion

Post by **Kalis** » September 18th, 2008, 6:05 pm

ok I'll e-mail him about it yogi bear, thanks.

And the reason I asked is because it's happened to me around 3x in the past week now, and I'm getting really fed up with it.
It's like more and more people feel this is a legit strategy.

And Admiral, no he doesn't always play rebels. He goes random too.

The Battle for Wesnoth Forums
