Ladder Site Online...

Discussion of all aspects of multiplayer development: unit balancing, map development, server development, and so forth.

Moderator: Forum Moderators

Kolbur
Posts: 122
Joined: April 29th, 2009, 9:33 am

Re: Ladder Site Online...

Post by Kolbur »

@milwac
Are you sure that your Glicko implementation and successive tests with the ladder data are flawless? There are a few things that don't add up for me... :wink:

If losing vs much lower rated players gets punished as extremely as you said, why are the Glicko graphs you posted perfectly flat after the early games? All these players have lost vs some clearly weaker players at some point, so I would expect some serious downward spikes, but there aren't any.

Also, the claim that the Glicko rating always converges to a player's true current skill seems quite questionable if it ranks the same person so differently under different nicks. There are some much more extreme examples than Dauntless in there. Let's take this one:
#53. Primus_Pilus [1736]
#2357. plk2 [1283]
:lol2:

Honestly I don't believe that Glicko could be that bad... :wink:

PS: Repairing the ladder is hopefully in the works.
milwac
Posts: 29
Joined: April 9th, 2010, 11:40 am

Re: Ladder Site Online...

Post by milwac »

Kolbur wrote: Are you sure that your Glicko implementation and successive tests with the ladder data are flawless?
I'd like to think so! It wasn't that hard actually ;) but it would help to have some test cases to prove that my implementation is not buggy. I tested with some hardcoded values and the results seem reasonable. But since you mentioned this I'll take another look.
Kolbur wrote: ...why are the Glicko graphs you posted perfectly flat after the early games?
The spikes are mostly there during the initial phase only. After that, the players' RDs stabilise and they are less affected by unnatural results (either a loss or a win, hence aliases cannot affect an already established player much). The 'severe' treatment that I mentioned depends on a player's RD, and is more pronounced in the early stages. You would have seen more spikes if I had increased the initial RD to 350 (as it should be), but then I came across a lot of negative ratings and hence stuck with an initial value of 150. I should have made this clear before, apologies.

Some examples :

Before Match:
Winner Rating 2000 / Loser Rating 1500 / Winner RD 100 / Loser RD 200
After Match:
Winner Rating 2003 / Loser Rating 1308 / Winner RD 99 / Loser RD 193

Explanation : Winner is a good player and his rating is quite certain (low RD), so he doesn't gain much. Loser however has a high RD and hence he loses a lot of points.

Before Match:
Winner Rating 1500 / Loser Rating 2200 / Winner RD 350 / Loser RD 100
After Match:
Winner Rating 2111 / Loser Rating 2198 / Winner RD 337 / Loser RD 99

Explanation : Here something unnatural happened. Loser with a low RD wasn't expected to lose this one. But even then since the winner has a high RD, the loser is punished less. The winner however jumps up, but still has a high RD (which means he can come down quickly too)

Before Match:
Winner Rating 1600 / Loser Rating 2200 / Winner RD 50 / Loser RD 100
After Match:
Winner Rating 1613 / Loser Rating 2199 / Winner RD 49.9 / Loser RD 99.5

Explanation : Somewhat sad, but the winner had already played at a 1600 level for so long that even comprehensively outplaying his top-ten counterpart, without any RNG help, doesn't help his rating much.
Kolbur wrote: There are some much more extreme examples than Dauntless in there. Let's take this one:
#53. Primus_Pilus [1736]
#2357. plk2 [1283]
I guess it's because of how the ladder was treated by most (now) veteran players. They might have lost a lot of games initially against weak players (because of luck etc.), and their RD had more or less stabilised by the time they started making it big, so it was hard for their rating to recover. If you look at players with more than 800 games, most of them are suffering from these dismal stats: neki, plk2, skb, and to some extent Dauntless as well. OTOH if you take a look at people who have played fewer games but won most of them, they are right at the top. Some people who could be newbies either didn't stay too long, or were never challenged enough as their performances were not noticed. (In Glicko good new players are spotted instantly.) You cannot really judge the effectiveness of Glicko from a recomputation of rankings that ran on a 4 year old Elo system. It is just wrong. Hence even if I feel the rating system should be clearly defined, I don't see much sense in this migration.

If you started afresh you'd see things the way they are: Goldilocks and Primus_Pilus are where they should be.
Kolbur wrote: Honestly I don't believe that Glicko could be that bad...
Haha.. what can I say, fact is sometimes stranger than fiction ;) But still I wouldn't be so disappointed, because these results have always decreased the RD, never increased it. Say a player took a break for a year and came back much stronger than before. The RD should be set to the maximum value of 350 in such cases (which it will be if this is implemented in the ladder). IMO the RD should increase by one for every passive day, which should fix most of these inconsistencies.
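
The two ways of letting RD grow while a player is inactive can be compared with a rough sketch. The first function is the one-point-per-passive-day idea above; the second is the square-root rule from Glickman's description of Glicko, RD' = min(sqrt(RD^2 + c^2 * t), 350). The value of c, and treating one day as one rating period, are assumptions picked purely for illustration; nothing here is what the ladder actually runs.

```python
import math

MAX_RD = 350.0  # ceiling: the RD of an unrated (or long-absent) player


def rd_linear(rd, passive_days, per_day=1.0):
    """Proposed rule: RD grows by a fixed amount per passive day, capped at 350."""
    return min(rd + per_day * passive_days, MAX_RD)


def rd_glicko(rd, periods, c=15.0):
    """Glicko rule: RD' = min(sqrt(RD^2 + c^2 * t), 350).

    c is a tuning constant (15.0 is an assumed value for illustration);
    periods is the number of rating periods without a game.
    """
    return min(math.sqrt(rd * rd + c * c * periods), MAX_RD)


# Compare both rules for an established player (RD 50) after a break.
for days in (0, 30, 365):
    print(days, rd_linear(50.0, days), round(rd_glicko(50.0, days), 1))
```

With these assumed constants the linear rule reaches the 350 cap within a year, while the square-root rule grows quickly at first and then flattens out.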

PS: I have no idea about what is wrong with the ladder currently, I see some disk quota exceeded error on the site. Hopefully we have not run out of DB space (?!)
Kolbur
Posts: 122
Joined: April 29th, 2009, 9:33 am

Re: Ladder Site Online...

Post by Kolbur »

milwac wrote: Haha.. what can I say, fact is sometimes stranger than fiction ;) But still I wouldn't be so disappointed, because these results have always decreased the RD, never increased it. Say a player took a break for a year and came back much stronger than before. The RD should be set to the maximum value of 350 in such cases (which it will be if this is implemented in the ladder). IMO the RD should increase by one for every passive day, which should fix most of these inconsistencies.
I think that is your mistake. If the RD always decreases with each match no matter the outcome, it's no wonder that people get stuck with their earlier ratings. plk2 for example started a lot weaker than he is today and improved over time, but since his RD converged early it was impossible for him to improve his rating significantly. Now this seems pretty undesirable, don't you think? The actual mistake is that RD should go up for both players if a strong player loses vs a notably weaker opponent, at least if both their RDs are low. Their ratings may not change by much, but since their RDs were low there seems to be something wrong with this result, so the RD should go up and not down. Players (with the same number of games) that play very inconsistently (losing vs lower rated players but winning vs stronger ones too) should have a higher RD than players that play very consistently.
milwac wrote: Some examples :

Before Match:
Winner Rating 2000 / Loser Rating 1500 / Winner RD 100 / Loser RD 200
After Match:
Winner Rating 2003 / Loser Rating 1308 / Winner RD 99 / Loser RD 193

Explanation : Winner is a good player and his rating is quite certain (low RD), so he doesn't gain much. Loser however has a high RD and hence he loses a lot of points.
This looks ok. It is the expected result so the RD goes down for both. I wonder why the winner gained any points though.

milwac wrote: Before Match:
Winner Rating 1500 / Loser Rating 2200 / Winner RD 350 / Loser RD 100
After Match:
Winner Rating 2111 / Loser Rating 2198 / Winner RD 337 / Loser RD 99

Explanation : Here something unnatural happened. Loser with a low RD wasn't expected to lose this one. But even then since the winner has a high RD, the loser is punished less. The winner however jumps up, but still has a high RD (which means he can come down quickly too)
I'm not so sure about this one. It's not really that unnatural, since the winner's RD basically means that he hasn't really been rated so far. The rating changes look ok. The winner's RD should indeed go down (since it was so high before), but I don't know if the loser's RD should change at all. Playing vs a high RD opponent means that there is not a lot of information to gain.

milwac wrote: Before Match:
Winner Rating 1600 / Loser Rating 2200 / Winner RD 50 / Loser RD 100
After Match:
Winner Rating 1613 / Loser Rating 2199 / Winner RD 49.9 / Loser RD 99.5

Explanation : Somewhat sad, but since the winner had already played at a 1600 level for such a long time that even after outplaying his top ten counterpart comprehensively without any rng help still doesn't help his rating much.
Now this is definitely not correct. First, the loser should lose more points than the winner gains since his RD is higher than the winner's, no? Secondly, RD should go up for both. The information gained from this match is that the current ratings are not as stable as they seemed before, so the RD needs to be adjusted. This way the winner can improve his rating over time more easily if he continues to beat higher rated players, thanks to the higher RD (improved skill -> improved rating). The loser on the other hand risks even more points the next time he plays. Now if this result was just a stroke of luck, the RD should go down again for both players in the successive games if they keep playing consistently with regard to their rating.

milwac wrote: I guess it's because of how the ladder was treated by most (now) veteran players. They might have lost a lot of games initially against weak players (because of luck etc.), and their RD had more or less stabilised by the time they started making it big, so it was hard for their rating to recover. If you look at players with more than 800 games, most of them are suffering from these dismal stats: neki, plk2, skb, and to some extent Dauntless as well. OTOH if you take a look at people who have played fewer games but won most of them, they are right at the top. Some people who could be newbies either didn't stay too long, or were never challenged enough as their performances were not noticed. (In Glicko good new players are spotted instantly.) You cannot really judge the effectiveness of Glicko from a recomputation of rankings that ran on a 4 year old Elo system. It is just wrong. Hence even if I feel the rating system should be clearly defined, I don't see much sense in this migration.

If you started afresh you'd see things the way they are: Goldilocks and Primus_Pilus are where they should be.
Well, veteran players should obviously be able to be at the top too. If this is really how it is supposed to be then the Glicko rating is absolutely rubbish. But I doubt that. You probably guessed that already. :wink:
I'm not sure about the actual math involved in the Glicko RD changes but I suggest you look into it again. I agree that RD should climb automatically over time if a player is passive, but only slowly and not immediately.

milwac wrote: PS: I have no idea about what is wrong with the ladder currently, I see some disk quota exceeded error on the site. Hopefully we have not run out of DB space (?!)
This is indeed the problem, and the solution proposed so far is to delete all the old replays, which can't be played with the current version anyway.
milwac
Posts: 29
Joined: April 9th, 2010, 11:40 am

Re: Ladder Site Online...

Post by milwac »

@Kolbur :

Firstly I'd like to discuss the inconsistencies in the 3rd example. After you mentioned your observation, I also thought that the rating changes were fishy which led me to discover a bug in my program where I was updating the wrong delta for the loser! Many thanks for this careful observation! The corrected ratings are updated.

Accordingly, this is what the corrected examples look like -

#1.
Before Match:
Winner Rating 2000 / Loser Rating 1500 / Winner RD 100 / Loser RD 200
After Match:
Winner Rating 2003 / Loser Rating 1488 / Winner RD 99.1 / Loser RD 193.5

The winner's rating still rises slightly because she is assumed to lie between 1800 and 2200 and the loser between 1100 and 1900 (rating ± 2 RD in each case). The two ranges overlap slightly (an 1800 may have beaten a 1900), hence the small rating change.
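
The ranges quoted here are simply rating ± 2·RD, which can be checked with a few lines. This is only a sketch of that arithmetic; the helper names are made up for illustration, and the width of 2 is the rough 95% band implied by the numbers above.

```python
def rating_interval(rating, rd, width=2.0):
    """Plausible skill range for a player: rating +/- width * RD."""
    return rating - width * rd, rating + width * rd


def overlap(a, b):
    """Return the overlapping part of two (low, high) ranges, or None."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


winner = rating_interval(2000, 100)  # (1800.0, 2200.0)
loser = rating_interval(1500, 200)   # (1100.0, 1900.0)
print(winner, loser, overlap(winner, loser))
```

The overlap is the 1800-1900 band, which is why the winner is still credited with a few points.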

#2.
Before Match:
Winner Rating 1500 / Loser Rating 2200 / Winner RD 350 / Loser RD 100
After Match:
Winner Rating 2111 / Loser Rating 2165 / Winner RD 337.4 / Loser RD 99.6

Maybe my dropping the part after the decimal point bothered you ;)

#3.
Before Match:
Winner Rating 1600 / Loser Rating 2200 / Winner RD 50 / Loser RD 100
After Match:
Winner Rating 1613 / Loser Rating 2146 / Winner RD 49.9 / Loser RD 99.5
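
For anyone who wants test cases: below is a minimal Python sketch of the single-game Glicko-1 update, following the formulas in Glickman's published description. The function names are made up here, and treating every match as its own rating period is an assumption about how the recomputation was done, so this is not the ladder's actual code. By hand calculation it should land within about a point of the three corrected examples above and agree with the RDs to roughly one decimal.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant q


def g(rd):
    """Attenuation factor: discounts a result against an uncertain opponent."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def expected(r, r_opp, rd_opp):
    """Expected score of a player rated r against the given opponent."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r, rd, r_opp, rd_opp, score):
    """Return (new_rating, new_rd) after one game; score is 1 for a win, 0 for a loss."""
    e = expected(r, r_opp, rd_opp)
    g_opp = g(rd_opp)
    d2_inv = Q * Q * g_opp * g_opp * e * (1.0 - e)  # 1 / d^2
    var = 1.0 / (1.0 / rd ** 2 + d2_inv)            # new RD squared
    return r + Q * var * g_opp * (score - e), math.sqrt(var)


def play(winner, loser):
    """Update both players from their pre-match (rating, RD) pairs."""
    (rw, dw), (rl, dl) = winner, loser
    return update(rw, dw, rl, dl, 1.0), update(rl, dl, rw, dw, 0.0)


examples = [((2000, 100), (1500, 200)),
            ((1500, 350), (2200, 100)),
            ((1600, 50), (2200, 100))]
for w, l in examples:
    (rw, dw), (rl, dl) = play(w, l)
    print(f"winner {rw:.1f} / RD {dw:.1f}   loser {rl:.1f} / RD {dl:.1f}")
```

Note that the new RD is computed from 1/RD^2 plus a non-negative term, so in this update it can only shrink or stay the same, whatever the result.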

Now addressing your issue about unexpected results.
Kolbur wrote: The actual mistake is that RD should go up for both players if a strong player loses vs a notably weaker opponent, at least if both their RDs are low. Their ratings may not change by much, but since their RDs were low there seems to be something wrong with this result, so the RD should go up and not down. Players (with the same number of games) that play very inconsistently (losing vs lower rated players but winning vs stronger ones too) should have a higher RD than players that play very consistently.
And what RD actually should be.
Kolbur wrote: Secondly, RD should go up for both. The information gained from this match is that the current ratings are not as stable as they seemed before, so the RD needs to be adjusted. This way the winner can improve his rating over time more easily if he continues to beat higher rated players, thanks to the higher RD (improved skill -> improved rating). The loser on the other hand risks even more points the next time he plays. Now if this result was just a stroke of luck, the RD should go down again for both players in the successive games if they keep playing consistently with regard to their rating.
Unfortunately the Glicko RD is not the factor you are talking about. I remember mentioning the need for some similar factor in one of my earlier posts. I presume the sigma factor in the Glicko-2 system could account for this. This factor is something inherent to a player, which makes her play exceptionally stronger or weaker on any given day. OTOH the Glicko RD is the plain simple certainty measure obtained from past games, regardless of any inherent ability. But to reiterate what I said earlier, a constant increase over passive periods would almost certainly fix this anomaly, which prevents strong players from moving up.

To put it simply, you cannot negate the observations from 100 past games with one anomalous game. If anything, the anomalous game should be disregarded in this respect. (Glicko-2 perhaps does something else and increases the RD as you mentioned.) The 3rd example, bad as it looks, really does look unexpected, doesn't it? Something like the 2200 had an urgent errand to run, and his 10 year old daughter took over the computer. The intention all along is to converge to a player's true skill; if RD had to increase every now and then, this wouldn't be possible, would it?

Either way, it is always better to come up with suitable modifications than to adopt some system as it is. To this end, the variance of RD should be looked into.
Kolbur
Posts: 122
Joined: April 29th, 2009, 9:33 am

Re: Ladder Site Online...

Post by Kolbur »

I looked up Glicko and the way milwac handled the RD values is correct. Only the RD increase over time is missing.
Now the Glicko ranking looks a lot better already. There are still problems, though, which make the Glicko rating system not very suitable for our needs.

I find the assumption that the certainty of the rating simply increases (RD decreases) with more games, regardless of the outcomes, a very weak one. It has the effect that players get stuck in their early rating range even if their playing skill increases. With Glicko this is somewhat compensated by the RD value increasing over time (rating periods); this effect is not included in the numbers milwac has provided so far. This has other unfortunate effects though. Players with a low rating and a low RD can only improve their rating after a long period of time, even if they have actually improved significantly. This encourages them to create new accounts instead of using their old one, because climbing in the rating is much easier that way. The time it takes to improve one's rating could be made shorter by letting the RD increase faster, but then some less active players would end up with big RD values every time they play. The thing is, we don't have anything like rating periods. Some people play some games each day, others only a few per month, and then there are a few who only play a low number of games per year. This is why we can't have a rating system that relies on time periods for its mechanics.

Also, you can see in milwac's Glicko rankings that there are a number of players at the top who only beat up low ranked players. Owlface is the most striking example. He has 30 wins and no losses, but he beat barely any competent opponents (only eyerouge; all the others were < 1550 in Elo).

I hope we can see some Trueskill numbers in the future too, I heard this rating system is a lot better... :wink:
milwac
Posts: 29
Joined: April 9th, 2010, 11:40 am

Re: Ladder Site Online...

Post by milwac »

Upon the suggestion of tiboloid I looked up trueskill again and figured that the 'RD' in trueskill indeed takes into account the outcome of a game and doesn't always decrease. I was quite happy about this and wrote the implementation. Here are the results :

http://xntrick.comuf.com/wesnoth/trueskill_ranks.html

Hoping that the implementation is not buggy, it seems we can now be sure of which system to implement in the ladder!
Rigor
Posts: 941
Joined: September 27th, 2007, 1:40 am

Re: Ladder Site Online...

Post by Rigor »

wow! this one looks very good!

btw: the ladder site is not working because the server is full of old replays and we have no more disk space. it will take a little while yet to delete all the old games (pre-1.8 versions).
moloch
Posts: 3
Joined: February 27th, 2011, 2:27 pm

Re: Ladder Site Online...

Post by moloch »

I noticed that there are some new reports, but I haven't been able to report for 3 days.
When I try to report I get these 2 lines:

Warning: mysql_query(): supplied argument is not a valid MySQL-Link resource in /home/subversiva/data/www/ladder.subversiva.org/autologin.inc.php on line 15
Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/subversiva/data/www/ladder.subversiva.org/autologin.inc.php on line 16

The page I end up on is http://ladder.subversiva.org/report.php

The message is a normal access denied:

Access denied.

Please log in to use this function. Only members of the ladder can access this page. Become one and compete today!

But I'm logged in as usual. :(
soul_steven
Posts: 144
Joined: September 5th, 2009, 5:47 pm

Re: Ladder Site Online...

Post by soul_steven »

moloch, Rigor explained the problem just a second ago. In short, there is only a certain amount of space on the server (hypothetically, say, 300,000,000 games and replays can be reported) and we have reached that limit. They are deleting old replays so that the problem will be fixed temporarily. Just be patient.
anoel
Posts: 23
Joined: October 6th, 2010, 12:05 am

Re: Ladder Site Online...

Post by anoel »

if the rating becomes constant after a short while, won't it get a bit boring soon?
nelson
Posts: 91
Joined: March 19th, 2008, 11:15 pm

Re: Ladder Site Online...

Post by nelson »

Don't you think that deleting all of the old replays is somewhat problematic? Does this mean that the Wesnoth ladder will have no history? Does this mean that if e.g. HODOR stops playing when the next version of Wesnoth comes out, that all of his games will eventually be deleted, and in the future nobody will be able to see his games and marvel at his skill?

I hope that we are at least backing up these old replays somewhere, for posterity.

Why don't we move to some other server with more space instead of deleting our history?
soul_steven
Posts: 144
Joined: September 5th, 2009, 5:47 pm

Re: Ladder Site Online...

Post by soul_steven »

nelson, as far as I know the replays from before 1.8 do not play anymore, which is why they are being deleted. As for moving to a different, upgraded server, I have no idea; your guess is as good as mine. Maybe we had the best possible? My only concern about this solution is that, while it is a good temporary fix, I would assume this problem will come up again in the future.
Rigor
Posts: 941
Joined: September 27th, 2007, 1:40 am

Re: Ladder Site Online...

Post by Rigor »

nelson wrote:I hope that we are at least backing up these old replays somewhere, for posterity. Why don't we move to some other server with more space instead of deleting our history?
soul_steven wrote:this problem will come up again in the future I would assume?
marvelous conclusions, gentlemen. and who pays for "some other server with more space"? :lol2: :lol2: :lol2: :lol2: :lol2:

army of loanshar...eh

reliable server providers, step forward!
moloch
Posts: 3
Joined: February 27th, 2011, 2:27 pm

Re: Ladder Site Online...

Post by moloch »

Who is actually paying for the hosting? It could be a good idea to accept Paypal donations. I'm sure that in our community there are some guys who would be happy to give a small donation to the creators of the ladder for the hosting, the time spent working on the ladder code and so on, so that we can help someone who's dedicating part of his time to giving us a great gaming experience! :)
Maybe we can collect a list of users who can make a small contribution, there are so many of us!
Gallifax
Multiplayer Moderator
Posts: 137
Joined: October 23rd, 2006, 5:36 pm
Location: Who cares?

Re: Ladder Site Online...

Post by Gallifax »

Maybe no one will pay, Rigor, but wouldn't it be sensible to think of another solution first before deleting history? :) Or at least ask people?

Why don't the replays play? I still have many of the previous stable and dev versions on my hard drive. Well, I just need to find which PCs exactly, but I could still play very old replays.

I was just thinking earlier whether it would be possible to get the old exe files somewhere for download, so we could even watch those 5 or 6 year old replays.