Ladder Site Online...

eyerouge · Post by **eyerouge** » September 1st, 2008, 12:11 pm

tsr: fixed.

wintermute: I think that the ladder code should support easily changing the 400 constant, so I'll mess around with it in the near future and make it changeable from the configuration file. Once that is done it will be easy to convert the ladder to whatever value we want. Then we could try the 800 setting and see what impact it has.

Post by **Wintermute** » September 1st, 2008, 2:03 pm

eyerouge wrote:wintermute: I think that the ladder code should support easily changing the 400 constant, so I'll mess around with it in the near future and make it changeable from the configuration file. Once that is done it will be easy to convert the ladder to whatever value we want. Then we could try the 800 setting and see what impact it has.

Sounds good!

eyerouge · Post by **eyerouge** » September 2nd, 2008, 9:05 am

Wintermute wrote:Sounds good!

Check it out. Was that what you hoped for? Print them out and have them side by side. I think you are correct that we should consider changing it to 800. Any more thoughts? Also, would you do me the favour of writing up a short text explaining the difference and what happens mathematically in a way that a 4-year-old would understand it?

I think we need to do it like that if we're changing something this big, people need to understand what happens and why, how it works, when we announce it.

It would be kind of cool to have some diagram over the win expectancy or at least a table or something, to see how it changes.
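A table like the one asked for here can be sketched in a few lines. This is only a minimal sketch: it assumes the ladder uses the standard Elo expectancy formula, E = 1 / (1 + 10^(-diff/scale)), where `scale` is the 400 constant under discussion. The function names and the chosen rating differences are made up for illustration and are not taken from the ladder code.

```python
def expected_score(rating_diff, scale=400.0):
    """Standard Elo win expectancy for the player who is rating_diff points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


def expectancy_table(diffs=(0, 100, 200, 400, 800), scales=(400, 800)):
    """Return rows of (diff, expectancy for each scale), rounded to 3 decimals."""
    return [
        (d, *[round(expected_score(d, s), 3) for s in scales])
        for d in diffs
    ]


if __name__ == "__main__":
    print("diff   scale=400   scale=800")
    for diff, e400, e800 in expectancy_table():
        print(f"{diff:>4}   {e400:>9.3f}   {e800:>9.3f}")
```

Under this formula, a 400-point lead with the 800 setting gives the same expectancy as a 200-point lead with the 400 setting, so the larger constant makes the system treat any given rating gap as less decisive.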

Post by **Wintermute** » September 3rd, 2008, 1:54 pm

eyerouge wrote:
Wintermute wrote:Sounds good!
Check it out. Was that what you hoped for? Print them out and have them side by side. I think you are correct that we should consider changing it to 800. Any more thoughts? Also, would you do me the favour of writing up a short text explaining the difference and what happens mathematically in a way that a 4-year-old would understand it? I think we need to do it like that if we're changing something this big, people need to understand what happens and why, how it works, when we announce it.

It would be kind of cool to have some diagram over the win expectancy or at least a table or something, to see how it changes.

I'll take a look tonight or tomorrow.

Angry Andersen · Post by **Angry Andersen** » September 5th, 2008, 1:51 pm

@Wintermute and eyerouge:

I followed your discussion on the rating formula. Here are some thoughts:

- The ELO rating system is inherently probabilistic: that is, even for chess games it is assumed that some games are won due to good or bad luck. Such 'luck' influences can have many forms, explicitly rolling dice is just one. Other (less obvious) influences would be fluctuations in players' concentration (e.g. 'having a bad day'). Wouldn't it be 'lucky' to win a game against a good player and get lots of points, just because he had a bad day or was distracted for a moment?
What I'm saying is: don't overrate the differences of the influence of luck between chess and Wesnoth. Even a game where the world's best player only had a 60% chance of beating an average player (i.e. a very random game, with only little player influence) could be rated with the same procedure. But the spread of scores would be much smaller, which correctly reflects the limited influence of skill on game results.

- I have some doubts that your suggested change to the formula will solve the problem. By doing the 400-800 change, you are effectively rescaling all scores. That is, in the long run, you will get a ladder where players who were 200 points apart before the change will now be 400 points apart.
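Both points above can be checked with a small calculation. The sketch below assumes the standard Elo expectancy formula and asks: at what rating gap does the formula predict exactly the win rate a player really has? That gap is where two ratings settle in the long run. The function name is hypothetical.

```python
import math


def equilibrium_gap(win_probability, scale=400.0):
    """Rating gap at which the Elo expectancy equals win_probability.

    Derived by solving p = 1 / (1 + 10 ** (-gap / scale)) for gap.
    """
    return scale * math.log10(win_probability / (1.0 - win_probability))


if __name__ == "__main__":
    for p in (0.6, 0.76, 0.9):
        print(p, round(equilibrium_gap(p, 400), 1), round(equilibrium_gap(p, 800), 1))
```

A player who wins 60% of the time against an average player ends up only about 70 points ahead with the 400 constant, which is the small spread described for a very random game. Switching to 800 doubles every such gap without changing any win probability, which is the rescaling effect.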

I would rather suggest, that the number of points won or lost in a game is reduced, so that more games are needed for bigger score changes. This in turn reduces the influence of luck.

A more sophisticated solution would be to reduce the influence of a single game on the score of a player for each game that person previously played. That way, a player with 1500 points and 20 games would NOT be treated the same as someone with 1500 points and 100 games. Such an approach would take the precision of the current ranking into account (i.e. smoothly decrease the k-value for a game based on how experienced players are).

BTW I suggested a self-made, statistically based scoring algorithm related to ELO and the Rasch-Model some months ago along with code in Matlab (see 'coders corner'). The algorithm should be able to deal with both 1v1 or team games and not only rates players, but also factions (...who will all get the same score eventually, since balancing in Wesnoth is excellent). Tell me if you're interested.

Jozrael · Post by **Jozrael** » September 6th, 2008, 2:13 am

Any idea when the replay upload function will be reinstated? I played my first two games on the ladder (under nick: ZergRush) today and went 1-1 against the same person (overconfident in the first game and my leader got penned in by a dying orc army T_T).

nagyokos9 · Post by **nagyokos9** » September 6th, 2008, 8:47 am

It would be useful to add the option to search among ladder players by nationality on the ladder webpage.

Post by **Wintermute** » September 7th, 2008, 12:37 am

Angry Andersen wrote:- The ELO rating system is inherently probabilistic: that is, even for chess games it is assumed that some games are won due to good or bad luck. Such 'luck' influences can have many forms, explicitly rolling dice is just one. Other (less obvious) influences would be fluctuations in players' concentration (e.g. 'having a bad day'). Wouldn't it be 'lucky' to win a game against a good player and get lots of points, just because he had a bad day or was distracted for a moment?
What I'm saying is: don't overrate the differences of the influence of luck between chess and Wesnoth. Even a game where the world's best player only had a 60% chance of beating an average player (i.e. a very random game, with only little player influence) could be rated with the same procedure. But the spread of scores would be much smaller, which correctly reflects the limited influence of skill on game results.

I understand what you are saying here, and I largely agree. Having played tournament chess for years (also, years ago) I am quite aware of how even in chess you can walk away from a game and think "I should have won that game, it was just that one stupid move..." However, we're talking about games that are very different. I would argue that Wesnoth is "closer" to poker than to chess (I love all three games we're talking about here), if I had to pick one. Wesnoth is a game of calculated risk in the same way that poker is. Chess is not. I think that there is a difference between the incalculable factors that go into a chess victory, and real randomness. Bobby Fischer's famous 1972 match, where he chose to not show up for a game in some sort of mind game (which might have worked, considering that he made quite the comeback after that), seems to me to be an example of what you call "luck" in chess. I maintain that this is different from luck in Wesnoth.

Angry Andersen wrote:- I have some doubts that your suggested change to the formula will solve the problem. By doing the 400-800 change, you are effectively rescaling all scores. That is, in the long run, you will get a ladder where players who were 200 points apart before the change will now be 400 points apart.

This is true. My idea would be to spread the scores out a bit while also tweaking the way that scores are calculated, with the goal of providing more stability. Fewer players bouncing around 5 ranks or more by losing 3 games in a row, for example. The key here is that more is going on than just doubling the distance between players.

Angry Andersen wrote:I would rather suggest, that the number of points won or lost in a game is reduced, so that more games are needed for bigger score changes. This in turn reduces the influence of luck.

A more sophisticated solution would be to reduce the influence of a single game on the score of a player for each game that person previously played. That way, a player with 1500 points and 20 games would NOT be treated the same as someone with 1500 points and 100 games. Such an approach would take the precision of the current ranking into account (i.e. smoothly decrease the k-value for a game based on how experienced players are).

Reducing the points awarded per game might have a similar long-term effect on stability, but there are two questions I have about that. First, is there a danger that by awarding a low number of points per game we introduce more rounding error into the calculations? I don't know for sure, but it seems like it could be a factor, since we currently do NOT keep fractional points. Second, since the whole ladder is still basically provisional (only a handful of players have more than 50 games played), it might take a long time to "get anywhere" if major changes in rating are on the order of 100s of games. Perhaps that would be fine later on after many hundreds of games have been played by a large enough quorum of players.

Angry Andersen wrote:BTW I suggested a self-made, statistically based scoring algorithm related to ELO and the Rasch-Model some months ago along with code in Matlab (see 'coders corner'). The algorithm should be able to deal with both 1v1 or team games and not only rates players, but also factions (...who will all get the same score eventually, since balancing in Wesnoth is excellent ). Tell me if you're interested.

I have not seen that, but it sounds interesting. I know nothing of the Rasch-Model, do you know of a good place to get some info? AFAIK, no one is dead-set on Elo, but it is just what has been easy to implement/does a pretty good job.

eyerouge · Post by **eyerouge** » September 7th, 2008, 1:32 am

Angry Andersen wrote:- The ELO rating system is inherently probabilistic: that is, even for chess games it is assumed that some games are won due to good or bad luck. Such 'luck' influences can have many forms, explicitly rolling dice is just one. Other (less obvious) influences would be fluctuations in players' concentration (e.g. 'having a bad day').

I'm actually one of the people (albeit we're not many) who agree with you. I personally also believe that luck has much less to do with the average Wesnoth game than most players, especially newcomers, seem to believe.

I base it on the fact that good players seem to keep winning most of the time or more often even if the RND is "against them", since being good by definition also means that you can handle the risks better in a game where there are "random" elements like dice etc.

I also don't think that an unmodified Elo (more or less the current system we use) that doesn't try to compensate for the luck factor is a huge problem. The suggestion was originally Wintermute's.

That said, I do however agree with what he wants to accomplish, even if I'm clueless about the correctness of the method he suggested.

(Due to my own lack of math knowledge, nothing else...)

If you guys look at the PDFs I posted you'll see that the numbers seem to make way more sense in the 800 sheets. Why? Because:

Wintermute wrote:Wesnoth is a game of calculated risk in the same way that poker is.

Although I don't agree with Wintermute that Wesnoth is closer to poker than to chess, since I'd claim the very opposite, I would still maintain that Wesnoth is closer to poker than chess is close to poker. I would also insist that Wesnoth has way more inherent and built-in random opportunities/variables than chess has. Proving it is easy: In Wesnoth you have everything you can find in chess, but more of it (units, combos, space etc). But more importantly: Whatever can distract you in chess can distract you in Wesnoth. And on top of that, you have a RND that is built in. You don't have that in chess. That makes Wesnoth, indeed, into a game that is closer to poker than chess is.

We still don't have players that are beyond 1900. That's potentially a problem. It seems as if it's hard to climb beyond that or even to that point. Question is why. Answer lies in part in what Wintermute is discussing - the influence of RND. (I would however argue that since all players can pick opposition, they should keep to players in their own class and by doing so see to it that they don't lose chunks of points by playing a much lower rated player and lose the game due to RND going crazy)

Angry Andersen wrote:I would rather suggest, that the number of points won or lost in a game is reduced, so that more games are needed for bigger score changes. This in turn reduces the influence of luck.

Our previous problem was that we used too low K values and movement on the ladder was too small. If you take into account the patience people have and the number of Wesnoth games they'll play on the ladder on average, among other things, the result can easily become a dead ladder or one where there is little to no distinction between players in the eyes of the players.

Also, the smaller each movement is, the longer time it takes for each player to reach his/her "true" rating.

Angry Andersen wrote:A more sophisticated solution would be to reduce the influence of a single game on the score of a player for each game that person previously played. That way, a player with 1500 points and 20 games would NOT be treated the same as someone with 1500 points and 100 games. Such an approach would take the precision of the current ranking into account (i.e. smoothly decrease the k-value for a game based on how experienced players are).

I have been thinking about a way to measure rating accuracy for a very long time and would love to see some concrete way to do it. Bring it on. Show me how it can be done with a formula and explain it to me like I'm 4 years old and we could see if it works and should be coded or not. Sounds like a great feature to the ladder system.

Btw: The K value is already decreased for players that are more experienced. The more games you play, the closer you get to a lower K value. A game doesn't have a K value: The two players do. And each one of them has their own depending on their current Elo.
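To illustrate what "each player has their own K" means for a single game, here is a rough sketch. The rating tiers and K values in `k_for_rating` are invented for this example and are NOT the ladder's real settings (those live in the elo.class in SVN); the standard Elo expectancy formula is assumed as well.

```python
def expected_score(own, opponent, scale=400.0):
    """Standard Elo win expectancy of the player rated `own`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - own) / scale))


def k_for_rating(rating):
    """Hypothetical K tiers for illustration only, not the ladder's values."""
    if rating < 1600:
        return 32
    if rating < 2000:
        return 24
    return 16


def update_ratings(winner, loser, scale=400.0):
    """Return the new (winner, loser) ratings, each player using their own K."""
    exp_winner = expected_score(winner, loser, scale)
    exp_loser = 1.0 - exp_winner
    new_winner = winner + k_for_rating(winner) * (1.0 - exp_winner)
    new_loser = loser + k_for_rating(loser) * (0.0 - exp_loser)
    return new_winner, new_loser


if __name__ == "__main__":
    print(update_ratings(1500, 1500))  # same K on both sides
    print(update_ratings(1700, 1500))  # winner and loser use different K
```

One consequence of per-player K values: when the two players are in different tiers, the points one gains are not the points the other loses, so the total number of points on the ladder is not constant.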

You can see settings and the exact formula in the SVN of the ladder project, in the elo.class.

Wintermute wrote:Second, since the whole ladder is still basically provisional (only a handful of players have more than 50 games played), it might take a long time to "get anywhere" if major changes in rating are on the order of 100s of games. Perhaps that would be fine later on after many hundreds of games have been played by a large enough quorum of players.

I think Winter is on to something here: We must act in a pragmatic way, and adjust later on if needed. Most players don't play that many games, and we must take that into account by creating a system that is usable in such a multiplayer setting. The ladder has been around for exactly one year now and it didn't pick up steam until just recently. And even now we only have a max of 20 games/day. Overall, only 1% (!) of all Wesnoth games are ladder games, according to the Wesnoth server logs last time I saw them.

Wintermute wrote:I have not seen that, but it sounds interesting. I know nothing of the Rasch-Model, do you know of a good place to get some info? AFAIK, no one is dead-set on Elo, but it is just what has been easy to implement/does a pretty good job

mr russ has actually written some code so that the ladder can use a totally different system than Elo - the Glicko(2?). I think he has the code more than half done and that it works as intended, but we haven't implemented it yet since he seems buried in IRL stuff right now.

It would be super cool to let the ladder system support as many rating systems as possible - whoever is interested in writing a small class (not much code at all really) is welcome to do so, just tell me and I'll set you up with everything you need. The wesnoth ladder will however keep on using Elo until we have some solid proof that it's better to switch over to an alternative rating system and that system is already integrated into the ladder.

Wintermute: Did you look at the PDFs? Still up for it? And where's that write-up so people like me can understand it?

Joz wrote: Any idea when the replay upload function will be reinstated?

No, it's in the hands of chains and his friend and involves some physical fiddling with the computer the site is on. I can't influence it since it's across the Atlantic for me.

Post by **Wintermute** » September 7th, 2008, 3:08 am

eyerouge wrote:Wintermute: Did you look at the PDFs? Still up for it? And where's that write-up so people like me can understand it?

It seems to do about what I expected. I am still for it, and I will think about a brief write-up.

Zlodzei · Post by **Zlodzei** » September 8th, 2008, 2:48 pm

ladder only works good when you play against rivals with ~ the same rating. it cant be fixed. ever.

Post by **Doc Paterson** » September 8th, 2008, 4:18 pm

Zlodzei wrote:ladder only works good when you play against rivals with ~ the same rating. it cant be fixed. ever.

Nice to know.

Angry Andersen · Post by **Angry Andersen** » September 9th, 2008, 4:13 pm

Below is a suggested formula for a smoothly decreasing k-value in an ELO-type rating system (the rate at which scores change with each reported game is proportional to the k-value, i.e. big values=fast change, small values=slow change). Such a formula prevents an arbitrary division between provisional and non-provisional players, by placing every player somewhere on a provisional - non-provisional continuum based on the number of games played. With an appropriate choice of parameters, such a function might add a lot to the stability of the ladder.

k=40*exp(-N/100), where N is the total number of games played

(unfortunately I don't know how to insert a plot of the function into this message)

The function starts at k=40 for a player's first game and then decreases smoothly to a value of ~15 after 100 games played and ~5 after 200 games. So after 200 games, the rating of the player will only change at roughly 1/7th of the speed that it did for the first games (exp(-2) is about 0.135).
Maybe the k-values should have some absolute lower limit C, in which case the general shape of the formula would become k=A*exp(-N/B)+C
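Since a plot could not be attached, the values of the suggested function can be printed instead. This is a minimal sketch of k=A*exp(-N/B)+C with the parameters from the post as defaults (A=40, B=100, C=0); the function and parameter names are chosen here for illustration.

```python
import math


def k_value(games_played, a=40.0, b=100.0, c=0.0):
    """Smoothly decaying k-value: k = a * exp(-games_played / b) + c."""
    return a * math.exp(-games_played / b) + c


if __name__ == "__main__":
    print("games   k (no floor)   k (a=30, c=10)")
    for n in (0, 20, 50, 100, 200, 400):
        print(f"{n:>5}   {k_value(n):>12.2f}   {k_value(n, a=30.0, c=10.0):>14.2f}")
```

Note that with a lower limit C the starting value becomes A + C, so to keep the first game at k=40 with a floor of 10, A would have to be 30, as in the second column.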

I suggested my self-made algorithm to the TripleA ladder. They have a pretty similar discussion about rating systems, but unfortunately these guys are less active than the Wesnoth community, so things move slowly. If anyone is interested, the posts can be found in the TripleA-forums:
http://tripleawarclub.org/forums/index. ... topic=1346

Implementing different systems in parallel, as suggested by eyerouge, would be brilliant for testing purposes!

@eyerouge & Wintermute: as far as I understand, we all agree that Chess, BfW and Poker all involve some amount of randomness. The amount of randomness obviously differs between all 3, but there is no difference in principle in applying an ELO-type rating system to them. I suggest that k-values should be lower for games that involve more randomness, since obviously more games will be needed to estimate a player's playing strength.

Angry Andersen · Post by **Angry Andersen** » September 9th, 2008, 4:35 pm

Wintermute wrote:I have not seen that, but it sounds interesting. I know nothing of the Rasch-Model, do you know of a good place to get some info? AFAIK, no one is dead-set on Elo, but it is just what has been easy to implement/does a pretty good job.

The Rasch-Model is a psychological-mathematical model. It is the theoretical basis of psychometric tests, e.g. intelligence tests. The ELO-system is similar to this model in some ways, i.e. the ELO-system can be derived from the principles underlying the Rasch-model. Wikipedia ( http://en.wikipedia.org/wiki/Rasch_model ) has some information on this model. For the mathematically inclined, this would be the correct place to start in order to obtain a deeper understanding of rating procedures in general. There is a whole science called 'test theory' which is devoted to estimating people's abilities based on responses to 'test items' (a test item here would be a single game between different players).

Post by **Wintermute** » September 9th, 2008, 6:03 pm

Angry Andersen wrote:k=40*exp(-N/100), where N is the total number of games played...
The function starts at k=40 for a player's first game and then decreases smoothly to a value of ~15 after 100 games played and ~5 after 200 games. So after 200 games, the rating of the player will only change at roughly 1/7th of the speed that it did for the first games (exp(-2) is about 0.135).
Maybe the k-values should have some absolute lower limit C, in which case the general shape of the formula would become k=A*exp(-N/B)+C

Implementing different systems in parallel, as suggested by eyerouge, would be brilliant for testing purposes!

I agree that testing things with the data we have is a good place to start. I think that your model is likely the way to go "down the road" if not right now. The potential issue that I see is that if we use your model with the current stats, there are simply not enough games played for this to be any improvement on the current system. Again, this may or may not be the case, but what I could imagine happening is that (since few people have played even 100 games), ratings in general will be low, and that seems like it would be LESS stable whenever a good, new player joins. That player can play 20 games, win almost all of them, and perhaps shoot up OVER the heads of everyone else. Keep in mind that the ranks of players who have played lots of games will be moving slowly. Then such a hotshot newcomer might lose 5 games against top players and plummet down almost as fast. This is of course speculation - I haven't run the numbers, but have you considered/dismissed such possibilities?

I also wonder, as I mentioned before, whether rounding off low K-values would cause poor results. If you have only 5 points to play with, won't this encourage preying on weak players, because you always get at least 1 point, and can't really get that many points from anyone?
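The rounding concern can be made concrete with a small sketch. It assumes the standard Elo expectancy formula with the 400 constant, whole-number points rounded half up, and a winner who always receives at least 1 point. The rounding rule and the 1-point minimum are assumptions taken from the question above, not confirmed behaviour of the ladder code.

```python
import math


def expected_score(rating_diff, scale=400.0):
    """Standard Elo win expectancy for the player who is rating_diff points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


def win_gain(rating_diff, k, min_points=1):
    """Return (exact, awarded) points for a win by the player rating_diff ahead.

    awarded is rounded half up to a whole number and never drops below
    min_points; both rules are assumptions made for this illustration.
    """
    exact = k * (1.0 - expected_score(rating_diff))
    awarded = max(min_points, math.floor(exact + 0.5))
    return exact, awarded


if __name__ == "__main__":
    print("diff   exact   awarded")
    for diff in (0, 100, 200, 400, 800):
        exact, awarded = win_gain(diff, k=5)
        print(f"{diff:>4}   {exact:>5.2f}   {awarded:>7}")
```

With k=5, beating an equal opponent is worth 3 points after rounding, while beating someone 800 points lower still pays 1 point even though the exact value is about 0.05. Under these assumptions the weak opponent is overpaid by a factor of roughly 20, which is the incentive problem raised above.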
