Ladder Site Online...
Moderator: Forum Moderators
Re: Ladder Site Online...
tsr: fixed.
wintermute: I think the ladder code should make the 400 constant easy to change, so I'll mess around with it in the near future and make it changeable from the configuration file. Once that is done, it will be easy to convert the ladder to whatever value we want. Then we could try the 800 setting and see what impact it has.
- Wintermute
- Inactive Developer
- Posts: 840
- Joined: March 23rd, 2006, 10:28 pm
- Location: On IRC as "happygrue" at: #wesnoth-mp
Re: Ladder Site Online...
eyerouge wrote: wintermute: I think that the ladder code should support the easy change of the 400 constant, so I'll mess around with it in the near future and make it changeable from the configuration file. Once that is done it will be easy converting the ladder to whatever value we want. Then we could try the 800 setting and see what impact it has.
Sounds good!
"I just started playing this game a few days ago, and I already see some balance issues."
Re: Ladder Site Online...
Wintermute wrote: Sounds good!
Check it out. Was that what you hoped for? Print them out and have them side by side. I think you are correct that we should consider changing it to 800. Any more thoughts? Also, would you do me the favour of writing up a short text explaining the difference and what happens mathematically, in a way that a 4-year-old would understand?

It would be kind of cool to have a diagram of the win expectancy, or at least a table or something, to see how it changes.
- Attachments
- 400vs800.tar.bz2 (1.76 MiB): PDFs of 3 players' history and the ladders when using a constant of 400 vs 800 in the Elo formula.
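For readers without the PDFs, the win expectancy table asked for above can be sketched directly from the standard Elo expectancy formula. This is only an illustration, not the ladder's own code: the function names, the 1500 base rating, and the chosen rating differences are assumptions made for the example.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo win expectancy of player A against player B.

    `scale` is the constant under discussion (400 today, 800 proposed).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def expectancy_table(diffs=(0, 100, 200, 400, 800), scales=(400.0, 800.0)):
    """Return rows of (rating difference, expectancy per scale)."""
    rows = []
    for diff in diffs:
        # The base rating of 1500 is arbitrary: only the difference matters.
        row = tuple(expected_score(1500 + diff, 1500, s) for s in scales)
        rows.append((diff,) + row)
    return rows


if __name__ == "__main__":
    print("diff   c=400  c=800")
    for diff, e400, e800 in expectancy_table():
        print(f"{diff:4d}   {e400:.3f}  {e800:.3f}")
```

With the constant at 800, a given rating lead translates into a smaller predicted advantage: a 400-point lead under 800 predicts the same result as a 200-point lead under 400.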
- Wintermute
- Inactive Developer
- Posts: 840
- Joined: March 23rd, 2006, 10:28 pm
- Location: On IRC as "happygrue" at: #wesnoth-mp
Re: Ladder Site Online...
eyerouge wrote: Check it out. Was that what you hoped for? Print them out and have them side by side. I think you are correct that we should consider changing it to 800. Any more thoughts? Also, would you do me the favour of writing up a short text explaining the difference and what happens mathematically, in a way that a 4-year-old would understand? I think we need to do it like that if we're changing something this big; people need to understand what happens and why, how it works, when we announce it. It would be kind of cool to have a diagram of the win expectancy, or at least a table or something, to see how it changes.
I'll take a look tonight or tomorrow.

"I just started playing this game a few days ago, and I already see some balance issues."
-
- Posts: 205
- Joined: September 15th, 2006, 1:22 pm
Re: Ladder Site Online...
@Wintermute and eyerouge:
I followed your discussion on the rating formula. Here are some thoughts:
- The ELO rating system is inherently probabilistic: even for chess it is assumed that some games are won due to good or bad luck. Such 'luck' influences can take many forms; explicitly rolling dice is just one. Other (less obvious) influences would be fluctuations in players' concentration (e.g. 'having a bad day'). Wouldn't it be 'lucky' to win a game against a good player and get lots of points, just because he had a bad day or was distracted for a moment?
What I'm saying is: don't overrate the difference in the influence of luck between chess and Wesnoth. Even a game where the world's best player had only a 60 percent chance of beating an average player (i.e. a very random game, with only little player influence) could be rated with the same procedure. But the spread of scores would be much smaller, which correctly reflects the limited influence of skill on game results.
- I have some doubts that your suggested change to the formula will solve the problem. By changing 400 to 800, you are effectively rescaling all scores. That is, in the long run, you will get a ladder where players who were 200 points apart before the change will be 400 points apart.
I would rather suggest that the number of points won or lost in a game be reduced, so that more games are needed for bigger score changes. This in turn reduces the influence of luck.
A more sophisticated solution would be to reduce the influence of a single game on the score of a player for each game that person previously played. That way, a player with 1500 points and 20 games would NOT be treated the same as someone with 1500 points and 100 games. Such an approach would take the precision of the current ranking into account (i.e. smoothly decrease the k-value for a game based on how experienced players are).
BTW, I suggested a self-made, statistically based scoring algorithm related to ELO and the Rasch-Model some months ago, along with code in Matlab (see 'coders corner'). The algorithm should be able to deal with both 1v1 and team games, and rates not only players but also factions (...who will all get the same score eventually, since balancing in Wesnoth is excellent). Tell me if you're interested.
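Both claims in this post (the small spread in a very random game, and the 400-to-800 change being a rescaling) can be checked by inverting the standard Elo expectancy formula. The sketch below assumes the usual logistic form of that formula; `rating_gap` is a name made up for this example and is not part of the ladder code.

```python
import math


def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo win expectancy of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def rating_gap(p_win, scale=400.0):
    """Rating lead at which the stronger player is expected to score p_win.

    This is expected_score solved for the rating difference.
    """
    return scale * math.log10(p_win / (1.0 - p_win))


if __name__ == "__main__":
    # If the best player only scores 60% against an average player,
    # the two end up roughly 70 points apart on a 400 scale.
    print(round(rating_gap(0.60, 400.0), 1))
    # The same skill gap on an 800 scale is exactly twice as wide.
    print(round(rating_gap(0.60, 800.0), 1))
```

The gap is proportional to the constant, so doubling the constant doubles every long-run rating difference without changing who is expected to beat whom.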

Re: Ladder Site Online...
Any idea when the replay upload function will be reinstated? I played my first two games on the ladder (under nick: ZergRush) today and went 1-1 against the same person (overconfident in the first game and my leader got penned in by a dying orc army T_T).
Search by nationality among Ladder players?
It would be useful to add the option to search among ladder players by nationality on the ladder webpage.
- Wintermute
- Inactive Developer
- Posts: 840
- Joined: March 23rd, 2006, 10:28 pm
- Location: On IRC as "happygrue" at: #wesnoth-mp
Re: Ladder Site Online...
Angry Andersen wrote: - The ELO rating system is inherently probabilistic: that is, even for chess games it is assumed that some games are won due to good or bad luck. Such 'luck' influences can have many forms, explicitly rolling dice is just one. Other (less obvious) influences would be fluctuations in players' concentration (e.g. 'having a bad day'). Wouldn't it be 'lucky' to win a game against a good player and get lots of points, just because he had a bad day or was distracted for a moment? What I'm saying is: don't overrate the differences of the influence of luck between chess and Wesnoth. Even a game where the world's best player only had a 60 percent chance of beating an average player (i.e. a very random game, with only little player influence) could be rated with the same procedure. But the spread of scores would be much smaller, which correctly reflects the limited influence of skill on game results.
I understand what you are saying here, and I largely agree. Having played tournament chess for years (also, years ago

Angry Andersen wrote: - I have some doubts that your suggested change to the formula will solve the problem. By doing the 400-800 change, you are effectively rescaling all scores. That is, in the long run, you will get a ladder where players who were 200 points apart before the change will now be 400 points apart.
This is true. My idea would be to spread the scores out a bit while also tweaking the way that scores are calculated, with the goal of providing more stability: fewer players bouncing around 5 ranks or more by losing 3 games in a row, for example. The key here is that more is going on than just doubling the distance between players.
Angry Andersen wrote: I would rather suggest, that the number of points won or lost in a game is reduced, so that more games are needed for bigger score changes. This in turn reduces the influence of luck. A more sophisticated solution would be to reduce the influence of a single game on the score of a player for each game that person previously played. That way, a player with 1500 points and 20 games would NOT be treated the same as someone with 1500 points and 100 games. Such an approach would take the precision of the current ranking into account (i.e. smoothly decrease the k-value for a game based on how experienced players are).
Reducing the points awarded per game might have a similar long-term effect on stability, but there are two questions I have about that. First, is there a danger that by awarding a low number of points per game we introduce more rounding error into the calculations? I don't know for sure, but it seems like it could be a factor, since we currently do NOT keep fractional points. Second, since the whole ladder is still basically provisional (only a handful of players have more than 50 games played), it might take a long time to "get anywhere" if major changes in rating are on the order of 100s of games. Perhaps that would be fine later on after many hundreds of games have been played by a large enough quorum of players.
Angry Andersen wrote: BTW I suggested a self-made, statistically based scoring algorithm related to ELO and the Rasch-Model some months ago along with code in Matlab (see 'coders corner'). The algorithm should be able to deal with both 1v1 or team games and not only rates players, but also factions (...who will all get the same score eventually, since balancing in Wesnoth is excellent). Tell me if you're interested.
I have not seen that, but it sounds interesting. I know nothing of the Rasch-Model, do you know of a good place to get some info? AFAIK, no one is dead-set on Elo, but it is just what has been easy to implement/does a pretty good job.
"I just started playing this game a few days ago, and I already see some balance issues."
Re: Ladder Site Online...
Angry Andersen wrote: - The ELO rating system is inherently probabilistic: that is, even for chess games it is assumed that some games are won due to good or bad luck. Such 'luck' influences can have many forms, explicitly rolling dice is just one. Other (less obvious) influences would be fluctuations in players' concentration (e.g. 'having a bad day').
I'm actually among the bunch of people (albeit we're not many) that agree with you. I personally also believe that luck has much less to do with the average Wesnoth game than most players, especially newcomers, seem to believe.
I base it on the fact that good players seem to keep winning most of the time even if the RND is "against them", since being good by definition also means that you can handle the risks better in a game where there are "random" elements like dice etc.
I also don't think that an unmodified Elo (more or less the current system we use) that doesn't try to compensate for the luck factor is a huge problem. The suggestion was originally Wintermute's.
That said, I do however agree with what he wants to accomplish, even if I'm clueless about the correctness of the method he suggested.

If you guys look at the pdf:s I posted you'll see that the numbers seem to make way more sense in the 800 sheets. Why? Because:
Wintermute wrote: Wesnoth is game of calculated risk in the same way that poker is.
Although I don't agree with Wintermute that Wesnoth is closer to poker than to chess, since I'd claim the very opposite, I would still maintain that Wesnoth is closer to poker than chess is. I would also insist that Wesnoth has way more inherent, built-in random opportunities/variables than chess has. Proving it is easy: in Wesnoth you have everything you can find in chess, but more of it (units, combos, space etc). But more importantly: whatever can distract you in chess can distract you in Wesnoth. And on top of that, you have a RND that is built in. You don't have that in chess. That makes Wesnoth, indeed, a game that is closer to poker than chess is.
We still don't have players that are beyond 1900. That's potentially a problem. It seems as if it's hard to climb beyond that or even to that point. Question is why. Answer lies in part in what Wintermute is discussing - the influence of RND. (I would however argue that since all players can pick opposition, they should keep to players in their own class and by doing so see to it that they don't lose chunks of points by playing a much lower rated player and lose the game due to RND going crazy)
Angry Andersen wrote: I would rather suggest, that the number of points won or lost in a game is reduced, so that more games are needed for bigger score changes. This in turn reduces the influence of luck.
Our previous problem was that we used too low K values and movement on the ladder was too small. If you take into account the patience people have and the number of Wesnoth games they'll play on the ladder on average, among other things, the result can easily become a dead ladder, or one where there is little to no distinction between players in the eyes of the players.
Also, the smaller each movement is, the longer time it takes for each player to reach his/her "true" rating.
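The trade-off described here (smaller movements mean a slower climb to the "true" rating) can be illustrated with a deliberately simplified model. Everything in the sketch is an assumption made for the example: the player always faces opponents rated 1500, gains the average (expected) number of points each game instead of actual wins and losses, and the K values are illustrative rather than the ladder's settings.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo win expectancy of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def games_to_reach(k, true_rating=1700.0, start=1500.0, opponent=1500.0,
                   target=1650.0, scale=400.0, max_games=100_000):
    """Count games until an underrated player's rating reaches `target`.

    The player wins at the rate implied by `true_rating`, and each game
    moves the rating by its average drift: k * (true rate - predicted rate).
    """
    true_rate = expected_score(true_rating, opponent, scale)
    rating = start
    games = 0
    while rating < target and games < max_games:
        rating += k * (true_rate - expected_score(rating, opponent, scale))
        games += 1
    return games


if __name__ == "__main__":
    for k in (32, 16, 8):
        print(f"K={k:2d}: {games_to_reach(k)} games to get within 50 points")
```

In this model the number of games needed grows roughly in inverse proportion to K, which is the patience problem raised above.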
Angry Andersen wrote: A more sophisticated solution would be to reduce the influence of a single game on the score of a player for each game that person previously played. That way, a player with 1500 points and 20 games would NOT be treated the same as someone with 1500 points and 100 games. Such an approach would take the precision of the current ranking into account (i.e. smoothly decrease the k-value for a game based on how experienced players are).
I have been thinking about a way to measure rating accuracy for a very long time and would love to see some concrete way to do it. Bring it on. Show me how it can be done with a formula and explain it to me like I'm 4 years old, and we could see if it works and should be coded or not. Sounds like a great feature for the ladder system.
Btw: The K value is already decreased for players that are more experienced. The more games you play, the closer you get to a lower K value. A game doesn't have a K value: the two players do. And each one of them has their own, depending on their current Elo.
You can see settings and the exact formula in the SVN of the ladder project, in the elo.class.
Wintermute wrote: Second, since the whole ladder is still basically provisional (only a handful of players have more than 50 games played), it might take a long time to "get anywhere" if major changes in rating are on the order of 100s of games. Perhaps that would be fine later on after many hundreds of games have been played by a large enough quorum of players.
I think Winter is on to something here: we must act in a pragmatic way, and adjust later on if needed. Most players don't play that many games, and we must take that into account by creating a system that is usable in such a multiplayer setting. The ladder has been around for exactly one year now and it didn't pick up steam until just recently. And even now we only have a max of 20 games/day. Overall, only 1% (!) of all Wesnoth games are ladder games, according to the Wesnoth server logs last time I saw them.
Wintermute wrote: I have not seen that, but it sounds interesting. I know nothing of the Rasch-Model, do you know of a good place to get some info? AFAIK, no one is dead-set on Elo, but it is just what has been easy to implement/does a pretty good job
mr russ has actually written some code so that the ladder can use a totally different system than Elo - the Glicko(2?). I think he has the code more than half done and that it works as intended, but we haven't implemented it yet since he seems buried in IRL stuff right now.
It would be super cool to let the ladder system support as many rating systems as possible - whoever is interested in writing a small class (not much code at all really) is welcome to do so, just tell me and I'll set you up with everything you need. The wesnoth ladder will however keep on using Elo until we have some solid proof that it's better to switch over to an alternative rating system and that system is already integrated into the ladder.
Wintermute: Did you look at the pdf:s? Still up for it? And where's that write-up so people like me can understand it?

Joz wrote: Any idea when the replay upload function will be reinstated?
No, it's in the hands of chains and his friend and involves some physical fiddling with the computer the site is on. I can't influence it since it's across the Atlantic for me.

- Wintermute
- Inactive Developer
- Posts: 840
- Joined: March 23rd, 2006, 10:28 pm
- Location: On IRC as "happygrue" at: #wesnoth-mp
Re: Ladder Site Online...
eyerouge wrote: Wintermute: Did you look at th pdf:s? Still up for it? And where's that write-up so people like me can understand it?
It seems to do about what was expected. I am still for it, and I will think about a brief write-up.
"I just started playing this game a few days ago, and I already see some balance issues."
Re: Ladder Site Online...
ladder only works good when you play against rivals with ~ the same rating. it cant be fixed. ever.
I can see you!...
- Doc Paterson
- Drake Cartographer
- Posts: 1973
- Joined: February 21st, 2005, 9:37 pm
- Location: Kazakh
- Contact:
Re: Ladder Site Online...
Zlodzei wrote: ladder only works good when you play against rivals with ~ the same rating. it cant be fixed. ever.
Nice to know.

I will not tell you my corner / where threads don't get locked because of mostly no reason /
because I don't want your hostile disease / to spread all over the world.
I prefer that corner to remain hidden /
without your noses. -Nosebane, Sorcerer Supreme
-
- Posts: 205
- Joined: September 15th, 2006, 1:22 pm
Re: Ladder Site Online...
Below is a suggested formula for a smoothly decreasing k-value in an ELO-type rating system (the rate at which scores change with each reported game is proportional to the k-value, i.e. big values=fast change, small values=slow change). Such a formula avoids an arbitrary division between provisional and non-provisional players by placing every player somewhere on a provisional - non-provisional continuum based on the number of games played. With an appropriate choice of parameters, such a function might add a lot to the stability of the ladder.
k=40*exp(-N/100), where N is the total number of games played
(unfortunately I don't know how to insert a plot of the function into this message)
The function starts at k=40 for a player's first game and then decreases smoothly to a value of ~15 after 100 games played and ~5 after 200 games. So after 200 games, the rating of the player will change at only about 1/8th of the speed that it did for the first games.
Maybe the k-values should have some absolute lower limit C, in which case the general form of the formula would become k=A*exp(-N/B)+C
I suggested my self-made algorithm to the TripleA ladder. They have a pretty similar discussion about rating systems, but unfortunately these guys are less active than the Wesnoth community, so things move slowly. If anyone is interested, the posts can be found in the TripleA-forums:
http://tripleawarclub.org/forums/index. ... topic=1346
Implementing different systems in parallel, as suggested by eyerouge, would be brilliant for testing purposes!
@eyerouge & Wintermute: as far as I understand, we all agree that Chess, BfW and Poker all involve some amount of randomness. The amount of randomness obviously differs between all 3, but there is no difference in principle in applying an ELO-type rating system to them. I suggest that k-values should be lower for games that involve more randomness, since obviously more games will be needed to estimate a player's playing strength.
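Since a plot could not be attached, the suggested formula k=A*exp(-N/B)+C can at least be tabulated. The sketch below is a direct transcription with A=40, B=100 and C=0 as in the post; the function name `k_value` and the floor of 5 used in the second column are illustrative assumptions.

```python
import math


def k_value(games_played, a=40.0, b=100.0, c=0.0):
    """Smoothly decaying K: k = a * exp(-games_played / b) + c.

    a is the starting K above the floor, b sets how quickly K decays,
    and c is the optional absolute lower limit.
    """
    return a * math.exp(-games_played / b) + c


if __name__ == "__main__":
    print("games   K (c=0)   K (a=35, c=5)")
    for n in (0, 20, 50, 100, 200, 400):
        # With a floor c, a is lowered so that K still starts at 40.
        print(f"{n:5d}   {k_value(n):7.2f}   {k_value(n, a=35.0, c=5.0):7.2f}")
```

Without the floor, K keeps shrinking towards zero, so a long-time player's rating would eventually almost stop moving; the constant C prevents that.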
-
- Posts: 205
- Joined: September 15th, 2006, 1:22 pm
Re: Ladder Site Online...
Wintermute wrote: I have not seen that, but it sounds interesting. I know nothing of the Rasch-Model, do you know of a good place to get some info? AFAIK, no one is dead-set on Elo, but it is just what has been easy to implement/does a pretty good job.
The Rasch-Model is a psychological-mathematical model. It is the theoretical basis of psychometric tests, e.g. intelligence tests. The ELO-system is similar to this model in some ways, i.e. the ELO-system can be derived from the principles underlying the Rasch-model. Wikipedia ( http://en.wikipedia.org/wiki/Rasch_model ) has some information on this model. For the mathematically inclined, this would be the correct place to start in order to obtain a deeper understanding of rating procedures in general. There is a whole science called 'test theory' which is devoted to estimating persons' abilities based on responses to 'test items' (a test item here would be a single game between different players).
- Wintermute
- Inactive Developer
- Posts: 840
- Joined: March 23rd, 2006, 10:28 pm
- Location: On IRC as "happygrue" at: #wesnoth-mp
Re: Ladder Site Online...
Angry Andersen wrote: k=40*exp(-N/100), where N is the total number of games played... The function starts at k=40 for a player's first game and then decreases smoothly to a value of ~15 after 100 games played and ~5 after 200 games. So after 200 games, the rating of the player will only change at 1/8th of the speed that it did for the first games. Maybe the k-values should have some absolute lower limit C, in which the general shape of the formula would become k=A*exp(-N/B)+C. Implementing different systems in parallel, as suggested by eyerouge, would be brilliant for testing purposes!
I agree that testing things with the data we have is a good place to start. I think that your model is likely the way to go "down the road", if not right now. The potential issue that I see is that if we use your model with the current stats, there are simply not enough games played for this to be any improvement on the current system. Again, this may or may not be the case, but what I could imagine happening is that (since few people have played even 100 games) ratings in general will be low, and that seems like it would be LESS stable whenever a good, new player joins. That player can play 20 games, win almost all of them, and perhaps shoot up OVER the heads of everyone else, keeping in mind that the ranks of players who have played lots of games will be moving slowly. Then such a hotshot newcomer might lose 5 games against top players and plummet down almost equally fast. This is of course speculation - I haven't run the numbers, but have you considered/dismissed such possibilities?
I also wonder, as I mentioned before, whether rounding off low K-values would cause poor results. If you have only 5 points to play with, won't this encourage preying on weak players, because you always get at least 1 point, and can't really get that many points from anyone?
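The rounding concern can be made concrete with a small sketch. It assumes whole-point ratings (the thread says fractional points are not kept) and treats the "always at least 1 point" rule as an optional minimum gain; whether the ladder really applies such a minimum, and how it rounds, is not confirmed here. The names and ratings are made up for the example.

```python
import math


def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo win expectancy of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def points_for_win(k, winner, loser, scale=400.0, min_gain=0):
    """Return (exact, awarded) points for the winner of a game.

    `awarded` is rounded half-up to a whole number and then raised to
    `min_gain`, a hypothetical "at least this many points" rule.
    """
    exact = k * (1.0 - expected_score(winner, loser, scale))
    awarded = max(min_gain, math.floor(exact + 0.5))
    return exact, awarded


if __name__ == "__main__":
    k = 5
    cases = (("beats much weaker", 1900, 1500),
             ("beats equal", 1500, 1500),
             ("beats much stronger", 1500, 1900))
    for label, winner, loser in cases:
        exact, plain = points_for_win(k, winner, loser)
        _, floored = points_for_win(k, winner, loser, min_gain=1)
        print(f"{label:20s} exact={exact:.2f} rounded={plain} with_min={floored}")
```

With K=5, a win over a player 400 points lower is worth under half a point exactly, so a 1-point minimum more than doubles its value, while the largest possible gain is still only 5. That is the incentive to prey on weak players described above.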
"I just started playing this game a few days ago, and I already see some balance issues."