Ladder Site Online...

Post by **DrinianRzor** » April 9th, 2008, 11:00 pm

I agree with having two separate ratings for teams and individual players. As I mentioned to Morphine on the server, I suggest not letting Isar's Cross be a valid 2v2 map for the ladder. Personally I find that it's not as well balanced between all civs, and IMO luck is a larger factor.

Post by **eyerouge** » April 10th, 2008, 12:20 am

morphine:

The main issue is that new teams would always start at 1500 rating, regardless of the players' skill. This is always an issue with a new ladder, when beginners get dominated by very good players and lose a lot of rating from it. However, once enough games are played and newcomers (especially already experienced players) become a small percentage of the ladder, this issue becomes irrelevant.

Actually, it's not much of an issue in a ladder system that all new players get the same entry rating: it must be so, and, as you write yourself, it is only "problematic" until the newcomer has played enough games to be rated correctly. Elo gets more accurate the more games you play, as do most other systems; if a system didn't, it would seem flawed.

The problem with Wesnoth is that the number of ladder games played is low compared to other games using an ELO rating. Games take a while and most players aren't looking for hardcore gaming.

I've heard this argument from chains as well. I believe it's not true: Elo systems are used for a million types of games, which all take different amounts of effort and/or time to finish. Chess, for one, is not a game where a normal, average player has played 1000 rated games. That would take a long time to accomplish in most chess clubs, yet they all use Elo all the time. When Elo was invented, chess wasn't even played on the internet, which meant players had to meet in person.

In any case, there are already at least two or three players closing in on 100 games on the ladder. Some of them are more or less new to it. A game of Wesnoth doesn't take much longer (if longer at all) than an average chess game. Even if they don't match exactly, the difference isn't large enough to matter.

I also dispute that it matters how players use the ladder, whether they're hardcore or not. I myself am not, and I love to use it anyway since it keeps track of my progress. A hardcore player would of course have similar intentions, but a totally different mindset from mine. Yet we can both use the same tool and benefit from the statistics.

Let's, for the sake of argument, assume Wesnoth players really do play fewer games than, for example, chess players. We are in the middle of evaluating the rating system, and so far we've concluded that our K-value is too low. It's set to 24 now. We will probably raise it and, of course, recalculate everyone's rating; chains has already written the code for it. A higher K would mean something like "it takes fewer games to be properly rated than now". I won't go into the details here, as it would require plenty of explaining and he would be able to do it better than me, but the result is that a higher K will widen the range between the top and bottom players and also make the rating system more "sensitive" to each game than it is now. In essence, it won't matter much that fewer games are played if we change the current Elo somewhat.
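To make the K-value point concrete, here is a minimal Python sketch of the standard Elo update. K=24 is the ladder's current value from the post; K=40 is only a hypothetical stand-in for "a higher K", not a decided number.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation for `rating` playing against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_delta(rating: float, opponent: float, score: float, k: float) -> float:
    """Rating change after one game; score is 1.0 for a win and 0.0 for a loss."""
    return k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # 24 is the ladder's current K; 40 is only a hypothetical "higher K".
    for k in (24, 40):
        even = elo_delta(1500, 1500, 1.0, k)   # evenly rated opponents
        upset = elo_delta(1400, 1600, 1.0, k)  # 200 points lower, but wins
        print(f"K={k}: even win {even:+.1f}, upset win {upset:+.1f}")
```

This prints +12.0 / +18.2 for K=24 and +20.0 / +30.4 for K=40: every change scales linearly with K, which is why a higher K moves a new player toward his "real" rating in fewer games.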

Even on the current 1v1 ladder, we can see a lot of beginners getting trashed by experienced players. They don't stand the slightest chance, yet they have a very close rating. Of course, the more we play on the ladder, the more this issue fades away. However, with a 2v2 ladder there will be even fewer games played by the exact same teams, especially if people want to play with many different allies, which is, I think, an important part of the fun in Wesnoth.

We are also fixing this, and have almost agreed on a solution: simply give no points to the higher-rated player if he is 200 or more Elo points above the lower-rated player. We will probably also use other ranges here to scale the Elo the winner would get, with the modifier being a percentage based on the Elo difference between them.

Examples:

1. The higher-rated player wins against the lower-rated player. Current Elo would give him X points even if the rating difference is >= 200. Our idea is to set the modifier to 0 since he is 200 or more points ahead, so he'd win X * 0 = 0 Elo points by beating the less skilled player. An example of how this would change the ladder can already be seen: there is one player in particular who excels at picking apart much lower-skilled players. (And no, I won't mention names, but what is in the ladder is already public, so go figure...)

2. Same scenario, but this time the higher-rated player is only 150 Elo points ahead. The modifier would then be set to 0.25, meaning he'd win X * 0.25 = Y points by beating the lower-rated player.

3. Same scenario, with 0 to 99 points of difference. The modifier could be set to, say, 1, so 1 * X = Y = points won.

All the above are just examples. We're not sure what the ranges will be or how many different ranges we will have. They will, of course, change if we go with another K value. We will also probably want to keep it simple, with 2-3 ranges at most.
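The modifier idea can be sketched like this. The post only fixes three data points (a lead of 0-99 gives 1, a lead of 150 gives 0.25, a lead of 200 or more gives 0); putting the middle band's lower edge at 100 is an assumption, and `MODIFIER_BANDS`, `winner_modifier` and `modified_gain` are hypothetical names. The post doesn't say whether the loser's loss is scaled too, so this sketch only scales the winner's gain.

```python
def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Hypothetical bands as (minimum rating lead of the winner, multiplier).
# Only 0-99 -> 1, 150 -> 0.25 and >= 200 -> 0 come from the post;
# the lower edge of 100 for the middle band is an assumption.
MODIFIER_BANDS = [(200, 0.0), (100, 0.25), (0, 1.0)]


def winner_modifier(winner: float, loser: float) -> float:
    """Multiplier applied to the winner's normal Elo gain X."""
    lead = winner - loser
    if lead <= 0:
        return 1.0  # equal ratings or an upset: the gain is never reduced
    for min_lead, multiplier in MODIFIER_BANDS:
        if lead >= min_lead:
            return multiplier
    return 1.0


def modified_gain(winner: float, loser: float, k: float = 24) -> float:
    """X * modifier, where X is the plain Elo gain for the winner."""
    x = k * (1.0 - expected_score(winner, loser))
    return x * winner_modifier(winner, loser)


if __name__ == "__main__":
    for lead in (250, 150, 50):
        print(lead, round(modified_gain(1500 + lead, 1500), 2))
```

Keeping the bands in one small table makes it easy to try 2-3 ranges and re-tune them whenever K changes.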

The effect of all that is simple: players will play other players of about the same skill (+-200) as themselves. As a result, farming newbies won't really be a problem.

(We'd probably also get better overall quality in the games played... but that's not the point of the modifier.)

Lastly, about the players' ratings being close: they won't be in the future, as we will make the K value higher.

especially if people want to play with many different allies, which is, I think, an important part of the fun in Wesnoth.

I'm not sure we should base things on assumptions like those. It seems better to just give people the tool and let them use it as they want, provided the tool still measures what it's supposed to.

I myself would never, ever play a 2vs2 game with a random partner; I'm not interested in it. You are, so it's cool for you. If we design the 2vs2 system, we will do it for the "random" team, as we have already agreed, so maybe this is not an issue after all, since both types of players can be happy with it.

What would be the purpose of a 2v2 ladder? As I see it, it would be mainly 2 things:
1) Have a competitive team play (as in, your rank reflects your gaming skill).
2) Be able to play games with players of roughly the same skill level as you, both as allies and opponents

Agreed. I'd also add that although the two points above might appear to be met, they could very well miss their target if a weird Elo were behind them. So we must really ask ourselves what exactly we measure by using a specific way of calculating the 2vs2 Elo in a 2vs2 ladder.

If most of the teams keep hovering around 1500 for a while, then you lose both the competitive aspect and the matching system aspect. If it takes 20 games (40+ hours? not counting the scheduling time) for a team to reach its appropriate rating, the ~1500 ratings will just be a real mess, with random players against random players. Don't get me wrong, it can be fun, but the ladder wouldn't make sense.

I think I have indirectly answered this one above: the 1500 problem won't be there, since we will use a much higher K than we use today (and lower the K for top-tier teams, as they do in chess, to stabilize the top; regular Elo theory). So there are no worries here, since the argument, while fully correct, depends on the current ladder K-value of 24. Also, since we will introduce the modifiers, people won't play fully random opponents; they will usually play people within their own class. (The implementation of the modifiers may need to be rethought for the 2vs2 Elo.)
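"A much higher K" for newcomers and "a lower K for top-tier teams" can be combined into one K schedule. Every threshold below (20 provisional games, a 1900 top tier, K values of 40 and 16) is hypothetical; only K=24 comes from the post.

```python
# Hypothetical schedule; only K_DEFAULT = 24 is the ladder's current value.
K_PROVISIONAL = 40      # hypothetical: new teams/players move quickly
K_DEFAULT = 24          # current ladder K
K_TOP_TIER = 16         # hypothetical: damped changes at the top
PROVISIONAL_GAMES = 20  # hypothetical
TOP_TIER_RATING = 1900  # hypothetical


def k_factor(rating: float, games_played: int) -> int:
    """Pick K from a team's (or player's) rating and number of games."""
    if games_played < PROVISIONAL_GAMES:
        return K_PROVISIONAL
    if rating >= TOP_TIER_RATING:
        return K_TOP_TIER
    return K_DEFAULT


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, games_played: int, opponent: float, score: float) -> float:
    """Elo update where each side uses its own K."""
    k = k_factor(rating, games_played)
    return rating + k * (score - expected_score(rating, opponent))
```

One side effect of per-side K: the winner's gain and the loser's loss no longer have to be equal, so the ladder total drifts slightly.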

Could you be more specific about how this rating would be calculated? If I'm correct, the ELO system is a zero-sum system, for a reason, so this rating would have to be calculated in a similar way to what I suggested: your team's personal ratings against the opposing team's personal ratings. The risk you would take could be completely different between your team's rating and your personal rating. Let's say you are playing 1450 (your team) vs 1550, you would risk something like -8 points for a +14 points gain. However, if your personal rating is 1650, and the opposing team's average personal rating is 1450, you would risk something like -18 points for a +4 points gain.
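The "something like" figures above can be checked with the plain Elo formula, assuming the current K=24. `stake` is a hypothetical helper returning what you win on a win and what you lose on a loss.

```python
K = 24  # current ladder K-value


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def stake(rating: float, opponent: float, k: float = K) -> tuple[float, float]:
    """Return (points won on a win, points lost on a loss)."""
    e = expected_score(rating, opponent)
    return k * (1.0 - e), k * e


# Judged by team rating: a 1450 team facing a 1550 team.
team_win, team_loss = stake(1450, 1550)
# Judged by personal rating: a 1650 player facing an opposing average of 1450.
solo_win, solo_loss = stake(1650, 1450)
print(f"team view:     +{team_win:.0f} / -{team_loss:.0f}")
print(f"personal view: +{solo_win:.0f} / -{solo_loss:.0f}")
```

With K = 24 this prints +15 / -9 and +6 / -18, in line with the rough estimates: the same game carries a very different risk depending on which rating is used.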

This is the exciting part

Two ways come to mind in which we could do it:

Way 1: Team is an individual
This is the way I had in mind in my last reply, but clearly not the only one. In this solution it works like this:

P1 & P2 = Team7
P3 & P8 = Team10

When a team is created, it gets 1500 (with the current K, which will change, so read the numbers only as examples). The team is treated just like an individual in the current 1vs1 ladder. Whenever P1 & P2 play together, that combination of players is Team7, a unique team.

Now, say they lose against Team10. We do a regular Elo calculation using each team's Elo rating.
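Way 1 can be sketched as a rating table keyed by the pair of players, so P1 & P2 always map to the same team whatever order they're listed in. `TeamLadder` and its methods are hypothetical names; start rating 1500 and K=24 are the current values from the post.

```python
from collections import defaultdict

START_RATING = 1500.0
K = 24


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


class TeamLadder:
    """Way 1: every unique pair of players is rated as one entity."""

    def __init__(self) -> None:
        # frozenset makes (P1, P2) and (P2, P1) the same team.
        self.ratings = defaultdict(lambda: START_RATING)

    def rating(self, *players: str) -> float:
        return self.ratings[frozenset(players)]

    def report(self, winners: tuple, losers: tuple, k: float = K) -> float:
        """Plain 1v1-style Elo update between the two team ratings."""
        w, l = frozenset(winners), frozenset(losers)
        delta = k * (1.0 - expected_score(self.ratings[w], self.ratings[l]))
        self.ratings[w] += delta
        self.ratings[l] -= delta
        return delta


ladder = TeamLadder()
ladder.report(winners=("P3", "P8"), losers=("P1", "P2"))  # Team10 beats Team7
print(ladder.rating("P2", "P1"), ladder.rating("P3", "P8"))  # 1488.0 1512.0
```

Nothing needs registering: the first game a pair plays creates the team at 1500.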

On top of this, we could at the same time have a personal 2vs2 record that would work like the system below:

Way 2: Players Teamwork Rating

This is probably what you suggested: every individual player has a 2vs2 Elo rating that is affected by each game, and how it is affected depends on the partner's 2vs2 rating and the opposition's. So:

Team 1 Elo = (p1 1500 + p2 1500) / 2 = 1500
Team 2: ...etc etc
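Way 2 as a minimal sketch: each team's strength is the average of its members' 2v2 ratings, and (an assumption, since the post doesn't say how the change is split) both members receive the same change. Function names are hypothetical.

```python
K = 24


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def team_rating(ratings: dict, team: tuple) -> float:
    """Way 2: a team's strength is the average of its members' 2v2 ratings."""
    return sum(ratings[p] for p in team) / len(team)


def report(ratings: dict, winners: tuple, losers: tuple, k: float = K) -> float:
    """Both members of a team get the same change (assumption)."""
    own = team_rating(ratings, winners)
    opp = team_rating(ratings, losers)
    delta = k * (1.0 - expected_score(own, opp))
    for p in winners:
        ratings[p] += delta
    for p in losers:
        ratings[p] -= delta
    return delta


ratings = {"P1": 1500.0, "P2": 1500.0, "P3": 1500.0, "P8": 1500.0}
report(ratings, winners=("P3", "P8"), losers=("P1", "P2"))
report(ratings, winners=("P1", "P3"), losers=("P2", "P8"))  # partners swapped
print(ratings)
```

Because the rating belongs to the player, not the pair, swapping partners between games needs no special handling.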

The risk you would take could be completely different between your team's rating and your personal rating. Let's say you are playing 1450 (your team) vs 1550, you would risk something like -8 points for a +14 points gain. However, if your personal rating is 1650, and the opposing team's average personal rating is 1450, you would risk something like -18 points for a +4 points gain.

Yes, this could seem like an issue if we use the double-measure system I suggest in Way 1 (otherwise not). However, I would call it a feature, not a problem. Of course the two different Elos suggested in Way 1 could "conflict", since they measure totally different things, but that's also what makes it odd to talk about "conflict" in the first place. A player who strives to have both Elos maxed out at all times would get to play very few games, since all parameters would have to be right for him to gain the most in both. That won't be a problem, because if we have a 2vs2 ladder it will be based on one of the Elo ratings, not an average of both. Or we could even let the 2vs2 ladder rank people by either of them. Again, since the two values are unrelated when we decide your position on the 2vs2 ladder, there is no conflict between them. (Hope I didn't confuse you now.. = / )

However, to summarize, rating specific teams against each other needs a large number of games to make sense. What you propose is to skip the registration part for each team, but still match specific teams against each other. In itself it allows players to play "on the fly" with each other, which solves part of the problem, but the issue with the ratings remains.

No, they don't need more games than any other way of using Elo (1vs1, for example), since the Elo formula is the same. However, I'm beginning to think that your suggestion is more useful in practice, since people will probably have plenty of partners in most cases, and there will be something like 4 games played per unique team on average, even if some teams will have many more. Also, there is no issue with the ratings... now... I hope. = P

First one is obvious and we talked about it already, it's the fact that 2 players used to play with each other are more likely to perform better than 2 strangers. I don't think it's a big issue tho, as people learn general strategies valid with everyone, and they have the time to talk about it in game as well. It's the "lesser evil" I was talking about in my last post.

I agree, and I also agree it's not really a huge problem. Why? Because people choose how, with whom and against whom to play; we leave it open to them. Most would probably form teams in the long run anyhow, and even if they don't, we won't have a problem with the system. It's blind to team composition.

Second one is more tricky, it's the fact someone playing with a much lower rated player will take a lot more risks than him (considering his own rating). The good side of things is that if 2 players of different ratings start playing regularly with each other, they will end up with the same rating (their own team rating somehow). I think we would need some veteran Wesnoth players' opinion about it, as in how much a strong player matters in a matchup of 2 average players vs 1 weak and 1 strong player.

Why is that a problem? I'd maintain that it isn't, because it's an issue of team composition. People could compose their teams either rationally or irrationally. In both cases it would be the result of their own choices, meaning that, in both cases, the player is happy with it as it is. If so, why should we bother thinking about it or trying to enforce teams where both players are equally skilled? What interest would we have in such an endeavour? I'd say none.

Also, if you play with a much lower-rated partner, you would lose fewer points than if you played with a partner rated as high as yourself. It's a perfect balance: the weaker your partner, the more likely you are to lose, but the less you lose; the stronger your partner, the more likely you are to win, but the less you win. Elo solves it all.
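The "perfect balance" holds when the team rating is the average of the partners' ratings (Way 2). A minimal check, assuming K=24 and made-up ratings: a 1700 player teaming with either a 1300 or a 1700 partner against two 1500 opponents. `stake_with_partner` is a hypothetical helper.

```python
K = 24


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def stake_with_partner(me: float, partner: float, opp1: float, opp2: float,
                       k: float = K) -> tuple[float, float]:
    """(gain on a win, loss on a loss), computed from team averages as in Way 2."""
    own = (me + partner) / 2
    opp = (opp1 + opp2) / 2
    e = expected_score(own, opp)
    return k * (1.0 - e), k * e


weak = stake_with_partner(1700, 1300, 1500, 1500)
equal = stake_with_partner(1700, 1700, 1500, 1500)
print(f"weak partner:  +{weak[0]:.1f} / -{weak[1]:.1f}")
print(f"equal partner: +{equal[0]:.1f} / -{equal[1]:.1f}")
```

This prints +12.0 / -12.0 with the 1300 partner and +5.8 / -18.2 with the 1700 partner: the weaker ally costs you less on a loss and earns you more on a win.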

drinian:

I suggest not letting Isar's Cross be a valid 2v2 map for the ladder. Personally I find that it's not as well balanced between all civs, and IMO luck is a larger factor.

I'll leave that discussion to whoever is the balance expert of 2vs2 maps.

I'm clueless in those matters. It's also clear that the discussion must be had sooner or later.

Maybe there are some threads in here that deal with 2vs2 map balance?

Post by **Doc Paterson** » April 10th, 2008, 12:07 pm

eyerouge wrote:
drinian wrote: I suggest not letting Isar's Cross be a valid 2v2 map for the ladder. Personally I find that it's not as well balanced between all civs, and IMO luck is a larger factor.
I'll leave that discussion to whoever is the balance expert of 2vs2 maps. I'm clueless in those matters. It's also clear that the discussion must be had sooner or later. Maybe there are some threads in here that deal with 2vs2 map balance?

Drinian is absolutely right, and I would never recommend Isar's for a tournament, or any match involving a ranking system. Any other 2v2 map in mainline should be fine for these purposes.

Post by **morphine** » April 10th, 2008, 5:19 pm

When you compare Wesnoth to chess (game length, frequency, etc.), there is one big difference between the two: chess players look at their ELO rating over the course of years, not months. There are already millions of players (or thousands at least) with a meaningful rating earned over the past years. Of course we could assume the same with Wesnoth, but I'm not sure building a ladder on this assumption is a good idea, at least as long as it doesn't have a solid player base and a majority of correctly rated players.

About having 2 different ratings, the team rating and the personal rating: I don't think it's a good idea at all, and we'd better have only one of the 2.

Let's consider first how the rating formulas would work for each type of rating.

Rating specific teams against each other is rather straightforward: each team has a given rating, starting at 1500, which goes up and down according to the ELO formula. The hard part is how to rate an individual in a team ladder.

I see 2 ways to do it:

1) Only consider the teams rating to compute how much they gain or lose each game. For example, player1 with 1600 personal rating plays in a 1450 rated team against a 1550 rated team. He will either win +16 points or lose -8 points. This is a fair way to compute a player personal rating after each game (assuming the teams ratings are correct).

2) Consider the personal rating against the team rating, so both players come closer to each other rating-wise after a game, but it isn't fair for the higher rated player of the team. My first suggestion in this thread shows one way to do it, but it can also be made less harsh for the higher rated player.

Both of these formulas would be valid for a truly random team ladder (as in, you can't choose your ally), as the first formula is always fair for both players, and the second formula is fair if you assume you are evenly matched with lower and higher rated allies.
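The two individual formulas can be compared side by side, assuming K=24. The team ratings 1450 vs 1550 come from the example above; the members' personal ratings of 1600 and 1300 are made up, and option 2 uses the opposing team's rating as the opponent. Function names are hypothetical.

```python
K = 24


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def option1_delta(own_team: float, opp_team: float, won: bool, k: float = K) -> float:
    """Option 1: a member's personal change follows the team-vs-team ratings only."""
    return k * (float(won) - expected_score(own_team, opp_team))


def option2_delta(personal: float, opp_team: float, won: bool, k: float = K) -> float:
    """Option 2: each member is judged on his own rating against the opposing team."""
    return k * (float(won) - expected_score(personal, opp_team))


# Team rated 1450 vs a team rated 1550; the personal ratings are illustrative.
for personal in (1600, 1300):
    print(personal,
          "option 1:", round(option1_delta(1450, 1550, True), 1),
          round(option1_delta(1450, 1550, False), 1),
          "option 2:", round(option2_delta(personal, 1550, True), 1),
          round(option2_delta(personal, 1550, False), 1))
```

Under option 1 both members get the same +/- values; under option 2 the 1300 member gains more on a win and the 1600 member loses more on a loss, which is the "coming closer" effect described above.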

Now let's see how it would work if we had one of these 2 individual ladders along with a specific team ladder:

1) If we only consider team ratings, then playing with as many new teams as possible would be the best way to increase your personal rating, but the worst way to increase your team rating. The individual ladder wouldn't make sense, as we would have weak, high rated individuals. As a side effect, if someone wants to focus on his personal rating, we would have strong, low rated teams.

2) If we consider personal rating against team rating, then playing with as few new teams as possible would be the best way to increase both your personal and team rating. The personal ladder wouldn't make sense, as we would have strong, low rated individuals. As a side effect, if someone wants to focus on his personal rating, he would have to restrict himself to playing with the same ally all the time.

Either the only ladder is the specific team ladder, and the individual rating just a statistic in your profile (but it wouldn't really measure anything), or each ladder would suffer from the other in one way or the other. That's why I think we can't have both.

Now as I wrote in my last post, I think it would take too long for most players to achieve anything with a specific ally. Of course it would work in the long run, but it would take a while. Maybe I'm wrong on this point, but I think this is what we need to consider to choose between a team rating and an individual rating.

So this is my last long post on this topic; I'll let you seal the deal.

I think an individual rating would be more social and fun, but on the other hand I don't really have any 2v2 experience in Wesnoth, so it's only theory here.

Post by **eyerouge** » April 10th, 2008, 10:08 pm

morph:

Interesting post.

morphine wrote: When you compare Wesnoth to chess (game length, frequency, etc.), there is one big difference between the two: chess players look at their ELO rating over the course of years, not months. There are already millions of players (or thousands at least) with a meaningful rating earned over the past years. Of course we could assume the same with Wesnoth, but I'm not sure building a ladder on this assumption is a good idea, at least as long as it doesn't have a solid player base and a majority of correctly rated players.

The biggest differences between chess and Wesnoth when it comes to Elo rating are, as I understand it:

The fact that Wesnoth isn't a full-information game, as chess is. We have a random number generator, and in the ladder we also have fog of war. Although Wesnoth has the random element, in the long run it doesn't matter much, since the game is highly skill-based, hence a rating of some kind makes perfect sense (compare with rating a player in a 100% dice-based game).
That in competitive chess a game can be, and most often is, limited to a predetermined time. Again, this doesn't matter much for us, for plenty of reasons, and the same goes for:
That a single turn in chess has no time limit of its own, apart from the overall game clock, if there is one to start with.

I don't understand what the fact that "the big chess players are looking at their ELO rating over the course of years, not months" has to do with anything (except us using a higher K-value, which we will). Every chess player starts with 1500. Every chess player has to play game 1 to get to play game 2, to get to play game 3, and so on. The chess player has a rating all the time, even when she's new in the chess rating system. It doesn't matter at all for the Elo formula itself whether you play your 10th game or your 1000th game; the formula is the same. It matters only to us, when we try to understand how accurate a player's rating is. As you write, the more games a player has behind her, the more inclined we are to assume her rating is accurate, compared with a player who has only a few games behind her.

What you state, that chess players look at their Elo over several years instead of months, is both true and false: yes, veteran chess players have an Elo that reflects several years of play (and so would a veteran Wesnothian). However, most chess players do not.

I honestly don't know the average length of an active chess player's career, or what percentage of chess players are old veterans and how many are newbies or intermediate, but it seems very likely that most chess players are not veterans or anywhere near. Such is usually the case in most competitive games, and even if it's not, what does it actually tell us about the Elo rating?

Elo allows you to compute the rating after every game, or after a series of games. Again, the formula is the same. You can play 1000 games and then compute the Elo, or you can, as the ladder does, compute and display it after every game played. Time has little to do with it, except indirectly, as a longer timespan usually means more games played. (Then again, Gallifax has left me in the dust when it comes to the number of games played, even though I was the first member of the ladder and have been there the longest, so time evidently tells us very little. Individual activity tells us everything.)
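Both ways of computing can be written out side by side, assuming K=24 and (for simplicity) opponents whose ratings don't change. They use the same formula but are not numerically identical, because the batch version computes every expectation from the rating at the start of the series. Function names are hypothetical.

```python
K = 24


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def sequential(rating: float, opponents: list, scores: list, k: float = K) -> float:
    """Update after every game, as the ladder currently does."""
    for opponent, score in zip(opponents, scores):
        rating += k * (score - expected_score(rating, opponent))
    return rating


def batch(rating: float, opponents: list, scores: list, k: float = K) -> float:
    """One update for the whole series; expectations use the starting rating."""
    expected = sum(expected_score(rating, opponent) for opponent in opponents)
    return rating + k * (sum(scores) - expected)


opponents = [1500, 1500, 1500]
scores = [1.0, 1.0, 1.0]
print(round(sequential(1500, opponents, scores), 1), batch(1500, opponents, scores))
```

Three straight wins from 1500 give about 1534.8 game by game versus 1536.0 in one batch; the gap stays small, which supports the point that how often you recompute matters much less than how many games are played.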

All the ranting above doesn't matter much. I mainly wrote it to try to understand your point better, as I'm afraid I don't follow your conclusion: where's the argument, what does it tell us, and why? Luckily we can just skip ahead and, for the sake of simplicity, blame it all on my lack of understanding.

Especially since it's almost off-topic (then again, maybe it's not: if you have a valid point, it would be ignorant of me to deem it off-topic).

About having 2 different ratings, the team rating and the personal rating: I don't think it's a good idea at all, and we'd better have only one of the 2.

I believe I expressed myself clumsily, and I apologize for that: I never meant to suggest that we should use both to determine the actual rank (position on the ladder) of a player. I suggested that we keep a record of both. If we do that, you can in effect view two different types of 2vs2 rankings (it would all still be the same ladder, just two separate views of it): one rating would be one view, and the other another view. Depending on what you want to know or compete in, you'd go for a specific play style. If you want to have the highest team Elo, you wouldn't care much about your individual 2vs2 rating, and vice versa. I agree that the values look incompatible, but I would also point out that it's misleading to even speak of the values in terms of compatibility to begin with: they measure totally different things.

Rating specific teams against each other is rather straightforward: each team has a given rating, starting at 1500, which goes up and down according to the ELO formula. The hard part is how to rate an individual in a team ladder.

I see 2 ways to do it:

1) Only consider the teams rating to compute how much they gain or lose each game. For example, player1 with 1600 personal rating plays in a 1450 rated team against a 1550 rated team. He will either win +16 points or lose -8 points. This is a fair way to compute a player personal rating after each game (assuming the teams ratings are correct).

Everything in that quote is fully OK with me, and I think we agree on it.

2) Consider the personal rating against the team rating, so both players come closer to each other rating-wise after a game, but it isn't fair for the higher rated player of the team. My first suggestion in this thread shows one way to do it, but it can also be made less harsh for the higher rated player.

That, however, I oppose, because 1) it would encourage far less dynamic partner pairings than the alternative system, and 2) it would probably lead to fewer games played. It also seems to make less sense as a way to rate skills compared with the first suggestion.

Either the only ladder is the specific team ladder, and the individual rating just a statistic in your profile (but it wouldn't really measure anything), or each ladder would suffer from the other in one way or the other. That's why I think we can't have both.

A specific team ladder rating (say "Team Baltazar's Elo", where the members of Team Baltazar are always the same) keeps track of that specific team's skills.

A "random" team ladder rating keeps track of how good a team player you are in "random" teams. It's probably better not to even call it random, as it wouldn't always be. A better name is perhaps "non-specific team Elo" or "dynamic team Elo".

Keeping track of both is possible, and there is no conflict at all, because they're two separate concepts showing very different things. The non-specific team Elo is a kind of historical evaluation of all your team games, with whichever partner you played them. The specific-team Elo is of course only of value when we look at the specific team, telling us something about that team's skillset and nothing else (i.e. it doesn't reveal how well the team members play when they are in other teams, which the non-specific team Elo does).

I'd like to clarify once more that I'm not suggesting that we "mix" the two numbers together somehow. On the contrary: they always remain separate entities, and Elo is used to calculate both of them.

An example that illustrates their use: Ms. X joins a new volleyball team. In that team, the coach keeps track of how well she performs. Next season she changes teams, and the season after she changes again. The coach of the third team can of course go back and look at how she performed within every team, and he can then also draw some rough conclusions about her average performance (the result of all she ever did in all her previous teams). Having a higher non-team-specific average would mean she is a more adaptable player, one who has an easier time finding her role within any given team.
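Keeping both records from the same game results could look like this minimal sketch: one table per pair (specific-team Elo) and one per player (non-specific team Elo, via team averages). `DualRatings` is a hypothetical name; start 1500 and K=24 are the current values, and the two numbers are never combined.

```python
from collections import defaultdict

START_RATING = 1500.0
K = 24


def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


class DualRatings:
    """Keeps a specific-team Elo and a non-specific team Elo side by side."""

    def __init__(self) -> None:
        self.team = defaultdict(lambda: START_RATING)    # key: frozenset of the pair
        self.player = defaultdict(lambda: START_RATING)  # key: player name

    def _average(self, players: tuple) -> float:
        return sum(self.player[p] for p in players) / len(players)

    def report(self, winners: tuple, losers: tuple, k: float = K) -> None:
        # Specific-team Elo: the pair is rated as a single entity.
        wt, lt = frozenset(winners), frozenset(losers)
        d_team = k * (1.0 - expected_score(self.team[wt], self.team[lt]))
        self.team[wt] += d_team
        self.team[lt] -= d_team
        # Non-specific team Elo: each player's own rating, via team averages.
        d_player = k * (1.0 - expected_score(self._average(winners), self._average(losers)))
        for p in winners:
            self.player[p] += d_player
        for p in losers:
            self.player[p] -= d_player


ratings = DualRatings()
ratings.report(("P1", "P2"), ("P3", "P8"))
ratings.report(("P1", "P3"), ("P2", "P8"))  # P1 changes partner
print(ratings.team[frozenset({"P1", "P2"})], ratings.player["P1"])
```

After the partner change, the P1 & P2 team rating stays at 1512.0 while P1's own rating moves on to 1524.0: like Ms. X, each team keeps its own record and the player carries a separate overall one.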

Now as I wrote in my last post, I think it would take too long for most players to achieve anything with a specific ally. Of course it would work in the long run, but it would take a while. Maybe I'm wrong on this point, but I think this is what we need to consider to choose between a team rating and an individual rating.

First off, I don't see why it must take longer, unless of course people have different schedules than their in-game partner. While it doesn't have to be so, I agree with you that it will probably be like that for most players.

Second, I don't see why we must make the choice between the different ways. Isn't it true that what I suggest, to use both of them, gives us more info about each player?

So this is my last long post on this topic; I'll let you seal the deal. I think an individual rating would be more social and fun, but on the other hand I don't really have any 2v2 experience in Wesnoth, so it's only theory here.

If I for some valid reason had to choose between the two I'd agree with you that the non-specific team rating is a priority compared to the specific. That said, I don't believe we have to choose or that having both has any kind of drawback at all. On the contrary.

I also don't want to seal the deal, as it would give the impression that I'm some kind of authority (even if you didn't mean it like that): I strongly believe that the coder doing the real work should do it only if he sees it as fun and meaningful himself, otherwise most GPL projects would soon go extinct.

Personally I want both the numbers. I also don't want you to code something you don't believe in yourself, hence, my suggestion is that you implement the system as you want. You could add one of the ways (non-specific team rating) and leave the rest to me/whoever or do both of them. I'd be grateful in any case and you'd contribute whatever you decide.

Post by **nataS** » April 11th, 2008, 9:14 am

We were brainstorming about fighting abuse on the ladder a few moments ago.

I suggested that instead of only the winner, both players have to report the match within two hours (for example) after the game. If the losing player doesn't report, the game will be accepted automatically after the two hours. You could make a rule enforcing this, and if a player fails to report three times in a row, with at least two different players, his/her account gets automatically deactivated pending admin review. Optionally, you could let the player resolve the issue themselves by accepting the games that are still awaiting a report.
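The reporting rule can be sketched as a small state machine: a winner's report waits for the loser's confirmation, is auto-accepted once the two-hour window expires, and a third missed report in a row involving at least two different opponents deactivates the account. All class, field and function names are hypothetical; the window and thresholds are the example values from the suggestion.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

CONFIRM_WINDOW = timedelta(hours=2)  # "two hours (for example)"
MAX_MISSES = 3                       # missed reports in a row before deactivation
MIN_DISTINCT_OPPONENTS = 2           # ...involving at least this many different players


@dataclass
class Game:
    winner: str
    loser: str
    reported_at: datetime    # when the winner filed the report
    status: str = "pending"  # "pending", "confirmed" or "auto_accepted"


@dataclass
class Account:
    name: str
    active: bool = True
    missed: list[str] = field(default_factory=list)  # winners of unconfirmed games


def confirm(game: Game, accounts: dict) -> None:
    """The loser confirms the report, which also resets his missed-report streak."""
    game.status = "confirmed"
    accounts[game.loser].missed.clear()


def process_expired(games: list, accounts: dict, now: datetime) -> list:
    """Auto-accept reports left unconfirmed for longer than the window."""
    accepted = []
    for game in sorted(games, key=lambda g: g.reported_at):
        if game.status != "pending" or now - game.reported_at < CONFIRM_WINDOW:
            continue
        game.status = "auto_accepted"
        loser = accounts[game.loser]
        loser.missed.append(game.winner)
        streak = loser.missed[-MAX_MISSES:]
        if len(streak) == MAX_MISSES and len(set(streak)) >= MIN_DISTINCT_OPPONENTS:
            loser.active = False  # deactivated pending admin review
        accepted.append(game)
    return accepted


t0 = datetime(2008, 4, 11, 9, 0)
accounts = {name: Account(name) for name in ("A", "B", "C")}
games = [Game("B", "A", t0),
         Game("C", "A", t0 + timedelta(minutes=30)),
         Game("B", "A", t0 + timedelta(hours=1))]
process_expired(games, accounts, t0 + timedelta(hours=1))  # nothing is 2 hours old yet
process_expired(games, accounts, t0 + timedelta(hours=4))
print(accounts["A"].active)  # False
```

The optional "resolve it yourself" step could simply call `confirm` on the player's pending games before the window runs out.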

Another option to help fight abuse is to let people upload or save a replay of the game. I know other ladders (certain clanbase games) have this as a rule. But this might scare away players who lack the required experience with computers.

Post by **eyerouge** » April 11th, 2008, 10:47 am

Thanks for your input.

nataS wrote: We were brainstorming about fighting abuse on the ladder a few moments ago.

I suggested that instead of only the winner, both players have to report the match within two hours (for example) after the game. If the losing player doesn't report, the game will be accepted automatically after the two hours. You could make a rule enforcing this, and if a player fails to report three times in a row, with at least two different players, his/her account gets automatically deactivated pending admin review. Optionally, you could let the player resolve the issue themselves by accepting the games that are still awaiting a report.

Your suggestion would probably make it easier to catch abusers in many cases, yes. (It doesn't solve it 100%, as the loser could still report and contest the winner's report, meaning that an idiot could still cause trouble if he wanted to.)

So far, the occasional abuse is not a big enough problem to justify more complicated reporting routines. If the problem gets out of hand, one solution could be what you suggest.

The only time abuse caused serious trouble was recently, due to some flawed code in the original ladder software, which I have fixed now that I've discovered it (all explained in the edits of the latest news post). And even then it all turned out well: almost all active players contacted me, helping out and setting things straight.

What already makes abuse very hard to get away with is that 1) everyone can see the result of it and 2) everyone can see if a player has never been around the main server. Number one is the main thing.

The only pattern I've seen so far when it comes to abuse, in the sense of false reports being made, is that:

The person didn't do it intentionally and simply failed to grasp the system. Probably the person lacks English skills or is very young. This one is very hard to fix unless we start translating the whole site. Then again, if we do that, we encourage more non-English-speaking people to join, which would in turn cause problems once they start playing with members other than those who speak their native tongue.
The person was simply an idiot and did it intentionally. For this one, there is no real total fix. But you are right that the routine can be altered so that it becomes harder for such people. The question is how to maintain balance: how to keep the ladder easy enough to use for the normal player while at the same time fighting the abusing players. The more complicated we make things, the more we punish not only the abusers but also the normal players. It's a thin line sometimes, so we have to weigh the magnitude of the problems against the effect of their "fix".
It's my impression that most false reports have been made by non-English speakers, or people who speak limited English. At least so far. This alone doesn't tell us much and no conclusion should be drawn from it, but it's still a curious coincidence.
They get caught as soon as it's apparent enough or a player sees that somebody reported falsely.
Users who make false reports seem to do it one time only. Very few keep on doing it, and so far all have stopped and have probably quit Wesnoth, or multiplayer Wesnoth, altogether.

Another option to help fight abuse is to let people upload or save a replay of the game. I know other ladders (certain ClanBase games) have this as a rule. But this might scare away players who lack the required experience with computers.

Once chains is done with the replay parser, we will probably enforce the uploading of replays. We currently don't, since we're planning to introduce the parser and the statistics part along with it, but once we've implemented it we're likely to require that people upload a replay. That in itself doesn't stop abuse, however, unless somebody downloads that specific replay, watches it, sees that it's fubar and then reports it to us, which is a lot of things that have to happen with every game for it to work as an anti-abuse measure. If you think about it, that suggests that maybe we shouldn't enforce the upload after all, as it has limited anti-abuse value.

morphine · Post by **morphine** » April 11th, 2008, 3:06 pm

About having the two kinds of ratings (team and individual): I always considered that they would be two different ladders, so there is no misunderstanding about that. However, as you said yourself several times in your last post, they measure totally different things. That's exactly what is bothering me: players would compete on different ladders while playing the same games. If no one played with their ratings in mind, or if all matchups were random, it wouldn't be an issue. But since people can care more about one than the other, it can create the side effects I was talking about in my last post. Everyone should be able to use the ladder for their own purpose, be it gathering stats about their playing time, having a rating that reflects their skill at the game, finding evenly skilled allies and opponents, or just competing for the best rank they can get on the ladder.

That said, here is a quick explanation of my second proposal for individual rating, penalizing the higher rated player (possibly only a little bit). It was a clumsy attempt at mixing individual rating and team rating.

If we go for a simple individual rating (as in, both players in the team lose and gain the same points), it means that two players with different ratings will never reach the same rating if they keep playing together. Let's say player1 (1500 rating) starts teaming most of the time with player2 (1600 rating). If they manage to gain 200 rating together, player1 will be at 1700 while player2 will be at 1800. This is quite a big difference for two players who actually built their whole rating together. With my proposed formula, both of them would end up at 1750 instead.
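To make the arithmetic concrete, here is a minimal Python sketch of the "simple" individual rating, assuming both allies receive exactly the same rating change after every game. The +10 per win is an arbitrary illustrative value, not a ladder setting; the point is only that the 100-point gap between the allies never shrinks.

```python
def apply_shared_delta(p1: float, p2: float, delta: float) -> tuple[float, float]:
    """Simple individual rating: both allies gain or lose exactly the same points."""
    return p1 + delta, p2 + delta


p1, p2 = 1500.0, 1600.0
# 20 wins worth a hypothetical +10 each: the team gains 200 points in total
for _ in range(20):
    p1, p2 = apply_shared_delta(p1, p2, 10.0)

print(p1, p2, p2 - p1)  # 1700.0 1800.0 100.0
```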

However, I agree it's not acceptable in this form, as it's not fair to the higher rated player. Another way to achieve the same result and keep it fair would be to adjust the players' gains and losses according to their rating difference with their ally.

Let's say our team of player1 (1500 rated) and player2 (1600 rated) plays against a 1550 rated team. Their own team rating would be 1550, so with a K-value of 24, their gain/loss ratio would be +12/-12. Given their rating difference (100 points), we adjust their ratio by, let's say, -25% for the higher rated player and +25% for the lower rated player: player1's gain/loss ratio would be +15/-15 and player2's +9/-9.

Let's say the same team plays against a 1650 rated team. Their team's gain/loss ratio would be +16/-8. By the same logic, player1's gain/loss ratio would be +20/-10 and player2's +12/-6.
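A minimal Python sketch of the proposed ally modifier, assuming the team rating is the plain average of the two allies and the standard logistic Elo expectation on a 400-point scale (the function names are hypothetical). For the 1550 opponent it reproduces the +15/-15 and +9/-9 figures exactly. For the 1650 opponent the standard formula gives the team about +15.4/-8.6 rather than the rounded +16/-8, so the modified figures come out at about +19.2/-10.8 and +11.5/-6.5 instead of +20/-10 and +12/-6.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def team_gain_loss(team: float, opponent: float, k: float = 24.0) -> tuple[float, float]:
    """Points the team would gain on a win and lose on a loss."""
    e = expected_score(team, opponent)
    return k * (1.0 - e), k * e


def modified_gain_loss(low: float, high: float, opponent: float,
                       modifier: float = 0.25, k: float = 24.0):
    """Scale the team's gain/loss up for the lower-rated ally, down for the higher-rated one."""
    team = (low + high) / 2.0  # assumption: team rating = average of the two allies
    gain, loss = team_gain_loss(team, opponent, k)
    low_player = (gain * (1 + modifier), loss * (1 + modifier))
    high_player = (gain * (1 - modifier), loss * (1 - modifier))
    return low_player, high_player


low, high = modified_gain_loss(1500, 1600, 1550)
print(low, high)  # (15.0, 15.0) (9.0, 9.0)
```

Note that each ally's own gain/loss ratio stays the same as the team's; only its size changes, which is the "move faster/slower" effect discussed further down.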

I think it would be the most interesting and fair system we've discussed so far:
1) You can play with the ally of your choice and not suffer any penalty.
2) To significantly increase your rating, you are encouraged to play with evenly or better rated players, as both allies and opponents. This follows the same logic as a normal 1v1 ladder.
3) Playing with the same team will truly achieve a team result, both allies converging to the same rating if they are successful.

What do you think of this solution?

Edit: This is my last post about it, I swear.

It took me hours this week just to write these posts... I can implement a first version of a 2v2 ladder based on a personal rating system, taking care to leave the option of adding a specific team ladder later, if there is either popular demand or someone else wants to do it. We could already see who is interested in such a ladder, and make decisions based on that.

eyerouge · Post by **eyerouge** » April 12th, 2008, 1:05 am

it's not a fair system for the higher rated player. Another way to achieve the same result and keep it fair would be to adjust the gains and losses of the players according to their rating difference with their ally.

Let's say our team of player1 (1500 rated) and player2 (1600 rated) plays against a 1550 rated team. Their own team rating would be 1550, so with a K-value of 24, their gain/loss ratio would be +12/-12. Given their rating difference (100 points), we adjust their ratio by, let's say, -25% for the higher rated player and +25% for the lower rated player: player1's gain/loss ratio would be +15/-15 and player2's +9/-9.

Okay, let's check if I understand you correctly.

Team 1 (2vs2 elo 1550)
P1 (2vs2 elo 1500)
P2 (2vs2 elo 1600)

We agree on everything, except maybe for the fact that you want to use a modifier to even out the difference in skill between P1 & P2 when we reward/punish the players for their game. In the quote you suggest that it's fairer to give/take x% more Elo to/from the less skilled player and vice versa.

Exactly why do we want to do that (use the x% modifier)? How do you reach the conclusion that, for example, the less skilled player should gain or lose more Elo?

Also, why do this when Elo does it already? If team 1 wins, in your example, we would take the losing team's rating, and we would:

1. Use Elo with the losing team's rating against p1 in the winning team
2. Do the same for p2.

That results in something very similar to your suggestion, but free from modifiers. Elo itself is the modifier. What we do is take the average Elo of the losing team ((p3+p4) / 2) and treat it as a 1vs1 game against p1, and a 1vs1 game against p2. I'd say that solves all problems.
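A minimal Python sketch of this approach, assuming standard Elo with K=24. Each winner is rated as if he had won a 1vs1 game against the losing team's average; rating each loser as a 1vs1 loss against the winning team's average is an assumption for the symmetric case. The function names are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def individual_updates(winners: list[float], losers: list[float], k: float = 24.0):
    """Rate every player as a 1vs1 game against the opposing team's average rating."""
    avg_losers = sum(losers) / len(losers)
    avg_winners = sum(winners) / len(winners)
    new_winners = [r + k * (1.0 - expected_score(r, avg_losers)) for r in winners]
    new_losers = [r + k * (0.0 - expected_score(r, avg_winners)) for r in losers]
    return new_winners, new_losers


# p1 = 1500 and p2 = 1600 beat a team whose average is 1550
winners, losers = individual_updates([1500, 1600], [1550, 1550])
print([round(r, 1) for r in winners], losers)  # [1513.7, 1610.3] [1538.0, 1538.0]
```

Without any extra modifier, the lower-rated ally gains more (about +13.7) than the higher-rated one (about +10.3) for the same win, which is the sense in which Elo itself acts as the modifier.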

You're claiming that:

I think it would be the most interesting and fair system we've discussed so far:
1) You can play with the ally of your choice and not suffer any penalty.

How is that true? On the contrary, you are always penalized for being the better player, as you want to deduct 25% from what that player would have won using the very same system without the modifier. And as the higher rated player you always get special treatment when you lose, since you lose 25% less than you should have.

2) To significantly increase your rating, you are encouraged to play with evenly or better rated players, as both allies and opponents. This follows the same logic as a normal 1v1 ladder.

Isn't that just the same as saying "anybody can play with anybody"? I mean, if you play with a better player, then you must be the worse player (= 1 better, 1 worse). If you play with equals, you play with equals, and with that we have covered all situations. There are no others.

3) Playing with the same team will truly achieve a team result, both allies converging to the same rating if they are successful.

Now I'm confused. You want the players who play together a lot to achieve the same rating in the long run. Why would we want that if we don't use team specific ratings? Isn't the argument based on a thought that is, in concept, more or less identical to one that advocates that team specific rating matters? Because if it didn't matter to you, then you wouldn't care how the result looked for two players who usually play together; the whole point of team specific rating is to measure exactly those two players' (aka that specific team's) rating. If we don't go for team specific rating, team specific arguments won't really matter... or am I missing something here? *admittedly I'm tired... just came home*

This is my last post about it, I swear. It took me hours this week just to write these posts... I can implement a first version of a 2v2 ladder based on a personal rating system, taking care to leave the option of adding a specific team ladder later, if there is either popular demand or someone else wants to do it. We could already see who is interested in such a ladder, and make decisions based on that.

Sounds just great

The best thing would be to not hardcode any values and to have them all in a config file, like we more or less already have (K-value, modifiers, ranges, etc.). That way the finer points of this discussion won't matter, as the system would be well prepared for any adjustments, and we could enable/disable the modifiers by setting them to 0, just as we could adjust the K (which will need to be higher than 24 according to chains... we're still hunting for the magic number...).
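A minimal sketch of that idea in Python, assuming a JSON config file. The existing ladder is PHP/MySQL and its real setting names aren't given here, so `k_factor` and `ally_modifier` are hypothetical names. A modifier of 0 switches it off without touching the rating code.

```python
import json
from dataclasses import dataclass


@dataclass
class LadderConfig:
    """Tunable ladder values kept out of the rating code (hypothetical field names)."""
    k_factor: float = 24.0
    ally_modifier: float = 0.0  # 0 disables the ally-difference modifier


def load_config(text: str) -> LadderConfig:
    """Parse JSON text; missing fields keep their defaults, unknown keys raise TypeError."""
    return LadderConfig(**json.loads(text))


def scaled_points(points: float, is_lower_rated: bool, cfg: LadderConfig) -> float:
    """Apply the ally modifier; with ally_modifier == 0 the points pass through unchanged."""
    factor = 1 + cfg.ally_modifier if is_lower_rated else 1 - cfg.ally_modifier
    return points * factor


cfg = load_config('{"k_factor": 32}')
print(cfg.k_factor, scaled_points(12.0, True, cfg))  # 32 12.0
```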

So far the ladder is coded in PHP 4.x and uses MySQL. Would that work for you? It's because of our host, and also because I don't think any PHP5-specific features will be required. You'd probably reuse plenty of the existing code.

We'll most likely move it to a PHP5 host in the future, but that doesn't change anything now.

morphine · Post by **morphine** » April 14th, 2008, 2:37 pm

We agree on everything, except maybe for the fact that you want to use a modifier to even out the difference in skill between P1 & P2 when we reward/punish the players for their game. In the quote you suggest that it's fairer to give/take x% more Elo to/from the less skilled player and vice versa.

Exactly why do we want to do that (use the x% modifier)? How do you reach the conclusion that, for example, the less skilled player should gain or lose more Elo?

From my last post:

"If we go for a simple individual rating (as in, both players in the team lose and gain the same points), it means that two players with different ratings will never reach the same rating if they keep playing together. Let's say player1 (1500 rating) starts teaming most of the time with player2 (1600 rating). If they manage to gain 200 rating together, player1 will be at 1700 while player2 will be at 1800. This is quite a big difference for two players who actually built their whole rating together. With my proposed formula, both of them would end up at 1750 instead."

Both players took a team up to a 1750 rating together. So in all fairness, their individual ratings should be 1750 as well, because that's their achievement on the "team" ladder.

Also, why do this when Elo does it already? If team 1 wins, in your example, we would take the losing team's rating, and we would:

1. Use Elo with the losing team's rating against p1 in the winning team
2. Do the same for p2.

That results in something very similar to your suggestion, but free from modifiers. Elo itself is the modifier. What we do is take the average Elo of the losing team ((p3+p4) / 2) and treat it as a 1vs1 game against p1, and a 1vs1 game against p2. I'd say that solves all problems.

If I'm correct, this is what I suggested in my first post. We agreed it wasn't an acceptable solution because the higher rated player on a team would take more risk than his ally, while playing in the same team.

What I propose here is to make the higher rated player move more slowly on the ladder and the lower rated player faster. It's different from what normal Elo does, as it keeps the same gain/loss ratio. It could make for a balanced individual rating system while taking a team's progress into account (when there is one).

We can agree that a team with a 200 point rating difference between its two players is hard to rate correctly. It's as if the lower rated player is taking an opportunity to play with an awesome ally: he should win big and lose equally big. The higher rated player will play with weaker players: his gains and losses should be less significant.

How is that true? On the contrary, you are always penalized for being the better player, as you want to deduct 25% from what that player would have won using the very same system without the modifier. And as the higher rated player you always get special treatment when you lose, since you lose 25% less than you should have.

Maybe there is a flaw in my logic somewhere, but if the gain/loss ratio is the same, then the only difference is the speed at which a player moves on the ladder. When he has played enough games, he will tend toward the same rating, and the smaller the modifier, the bigger the rating difference needs to be for it to make a significant difference.

Isn't that just the same as saying "anybody can play with anybody"? I mean, if you play with a better player, then you must be the worse player (= 1 better, 1 worse). If you play with equals, you play with equals, and with that we have covered all situations. There are no others.

I just meant it would encourage people to look for evenly rated players as allies (which is best for ladder purposes).

Now I'm confused. You want the players who play together a lot to achieve the same rating in the long run. Why would we want that if we don't use team specific ratings? Isn't the argument based on a thought that is, in concept, more or less identical to one that advocates that team specific rating matters? Because if it didn't matter to you, then you wouldn't care how the result looked for two players who usually play together; the whole point of team specific rating is to measure exactly those two players' (aka that specific team's) rating. If we don't go for team specific rating, team specific arguments won't really matter... or am I missing something here? *admittedly I'm tired... just came home*

It's in fact an attempt to take specific team achievements into account in an individual rating ladder.

Let's say the modifiers are 5% (< 25 rating difference), 10% (< 100) and 20% (< 300). To reach the same rating as your higher rated ally, you would need (on average, with K=24, not counting losses):
-25 -> 0: 20 wins (you gain and lose ~10% more points than your ally, 5% more than normal Elo)
-100 -> -25: 30 wins (~20% more)
-300 -> -100: 40 wins (~50% more)
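The win counts above can be checked with a short Python sketch, assuming every game is a win against an evenly rated team (so the team gains K/2 = 12 points), with the lower ally getting (1 + m) of that and the higher ally (1 - m), which shrinks the gap between them by K·m per win. The names are hypothetical. Rounding up, it gives 21, 32 and 42 wins, in line with the rough 20/30/40 in the post.

```python
import math

# (smaller gap, larger gap, modifier) for each band in the post
SEGMENTS = [(0, 25, 0.05), (25, 100, 0.10), (100, 300, 0.20)]


def gap_closed_per_win(modifier: float, k: float = 24.0) -> float:
    """Even match: lower ally gets (k/2)(1+m), higher ally (k/2)(1-m); the difference is k*m."""
    return k * modifier


def wins_to_cross(low_gap: float, high_gap: float, modifier: float, k: float = 24.0) -> int:
    """Wins needed to shrink the rating gap from high_gap down to low_gap at a fixed modifier."""
    return math.ceil((high_gap - low_gap) / gap_closed_per_win(modifier, k))


for low_gap, high_gap, m in SEGMENTS:
    print(f"-{high_gap} -> -{low_gap}: {wins_to_cross(low_gap, high_gap, m)} wins")
```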

This makes three categories of teams: the standard evenly rated team, the team with allies in the same category, and the odd team with a very strong and a very weak player. The second category makes it harder to gain points when you are already highly rated, as it should, if you don't play with a highly rated ally as well.

That said, we can start with a "simple" individual rating, as it wouldn't make a lot of difference at the start anyway, and once we have a small database of ladder games, see whether something should be changed. Same with specific team ratings: everything can be stored for when we need it.

eyerouge · Post by **eyerouge** » April 14th, 2008, 4:49 pm

morphine wrote:
eyerouge wrote:Exactly why do we want to do that (use the x% modifier)? How do you reach the conclusion that, for example, the less skilled player should gain or lose more Elo?

"If we go for a simple individual rating (as in, both players in the team lose and gain the same points), it means that two players with different ratings will never reach the same rating if they keep playing together. /../ This is quite a big difference for two players who actually built their whole rating together. With my proposed formula, both of them would end up at 1750 instead."

Both players took a team up to a 1750 rating together. So in all fairness, their individual ratings should be 1750 as well, because that's their achievement on the "team" ladder.

a) If P1 & P2 play all their 2vs2 games together, in the same team, they will always have the same 2vs2 Elo. Hence there's no need for a modifier, since they have won and lost together and don't play any games with another partner. (This is, however, a rare case in a system that doesn't use team specific rating.) If P1 & P2 play some games together and some with others, and so on, using modifiers makes even less sense, as the rating would be a number that mainly represents how good you are at tricking a better player into playing with you.

b) A rating that is not team specific shouldn't even try to represent the team, since it can never do so properly, as it doesn't measure that; it measures the individual player's skills in team x. By using modifiers we even out the difference in player skills between P1 & P2, who are both on the same team. We seem to do it in order to represent the idea that the players are "equally good" in 2vs2 because they keep winning "together". This is false, and the proof is that you suggest we use the modifier to begin with: if the players were equally good, no modifier would be needed.

morphine wrote:
eyerouge wrote:Also, why do this when Elo does it already? If team 1 wins, in your example, we would take the losing team's rating, and we would:

1. Use Elo with the losing team's rating against p1 in the winning team
2. Do the same for p2.

That results in something very similar to your suggestion, but free from modifiers. Elo itself is the modifier. What we do is take the average Elo of the losing team ((p3+p4) / 2) and treat it as a 1vs1 game against p1, and a 1vs1 game against p2. I'd say that solves all problems.
/../ We agreed it wasn't an acceptable solution because the higher rated player on a team would take more risk than his ally, while playing in the same team.

If I didn't accept it, I must have misunderstood what you suggested, and in that case I apologize for my misreading. I actually advocate that solution because I believe it's the only one where the math works out (all depending on what you want to measure, of course), in the sense that it 1) measures individual player skills and 2) does so fairly.

You're saying that it isn't acceptable because the higher rated player takes a larger risk. That is true, but it is also not a problem: it's a feature of the Elo system that reflects reality. If you play with a weaker player, then reality suggests that you are taking a larger risk. Why should that not be reflected in our rating system? It's real and it makes sense.

If I understand you correctly, you want a system where more games will be played, since it will be easier to find a partner if you know that, as the stronger player, it doesn't matter much if you play with a weaker one: if you lose, you won't be affected too much by the loss because you happened to play with a less skilled player.

I can see how it would benefit activity on the 2vs2 ladder and would most likely lead to more games, since it would be easier to find partners. What I can't see is how the system can be considered fair from any point of view. The modifiers are anti-fair: they take a fair system and turn it into one that reflects reality to a lesser degree. They manipulate the numbers, and that manipulation alone makes the numbers less useful.

If I'm the less skilled player at 1750 and I team up with you, the more skilled player at 1920, and we start playing together and winning (or losing), modifiers are of little use if we want to know my skills or yours. Modifiers make some sense if we want the rating to show us something else, something that is by nature different from individual skills. Why? Because such a rating would reflect the inherent relations between the two partners in the same team. This alone is a problem, which becomes clear if you consider:

Eyerouge & Morphine form team 1. They play 30 games, and eye then leaves the game. Morph finds another partner.
Morph and Peter form team 2. They play 50 games.
At the same time Morph and Anna form team 3. They play 30 games.

In the example above, morphine's individual, non-team-specific Elo rating would be a very strange mix of 1) the results of all the games and 2) the skill relation morph had to each and every partner in the different teams over time.

This means that the more teams you play in, and the more skilled the partners you pick, the more meaningless the rating becomes, since it's inflated/deflated by the modifiers. The relations the modifiers track are a problem, because when morph & Anna play, morph's rating wouldn't reflect morph's skills in a 2vs2: it would reflect his skills in relation to the opponent and also to the ally. Therein lies the problem.

morphine wrote:What I propose here is to make the higher rated player move more slowly on the ladder and the lower rated player faster. It's different from what normal Elo does, as it keeps the same gain/loss ratio.

Elo already does that by using different K values: less skilled players are usually rated with a K of 32 or 24, while high rated players have a lower K. (We're also implementing this on the 1vs1 ladder soon.) The different K values have that very effect.
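A minimal Python sketch of a rating-dependent K, with hypothetical thresholds (K=32 below 1600, 24 below 2000, 16 from 2000 up) chosen only for illustration; the ladder's actual values weren't settled in this thread. One side effect worth knowing: when two players have different K values, the points one gains no longer equal the points the other loses.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def k_factor(rating: float) -> float:
    """Hypothetical tiers: lower-rated players move faster than high-rated ones."""
    if rating < 1600:
        return 32.0
    if rating < 2000:
        return 24.0
    return 16.0


def updated_rating(rating: float, opponent: float, score: float) -> float:
    """score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k_factor(rating) * (score - expected_score(rating, opponent))


print(updated_rating(1500, 1500, 1), updated_rating(2100, 2100, 1))  # 1516.0 2108.0
```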

morphine wrote:We can agree that a team with a 200 point rating difference between its two players is hard to rate correctly. It's as if the lower rated player is taking an opportunity to play with an awesome ally: he should win big and lose equally big. The higher rated player will play with weaker players: his gains and losses should be less significant.

I think this is where half of my confusion lies: I believe there's a major difference between rating individual players' 2vs2 skills and rating a team's skills.

An individual player's 2vs2 rating would be an Elo that only shows how well that specific player has fared in 2vs2 games. The number itself should not show us anything about his partners or his relations to them. It shows us how many points he has acquired by playing together with x in 2vs2 games.

A team's skill is a totally different matter: it shows how well that specific team plays. It is extremely easy to measure: every team gets 1500 when it is formed, and the team always has the same two members. You then compare each team's Elo against the other teams' Elo, and the individual members are not involved anywhere.
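A minimal Python sketch of such a team-specific ladder, assuming standard Elo with K=24 and teams identified by the unordered pair of member names (a `frozenset`). The class and method names are hypothetical. Every new pair starts at 1500, and the members' results with other partners never enter the calculation.

```python
from collections import defaultdict


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


class TeamLadder:
    """Each unordered pair of players is its own ladder entry, created at the start rating."""

    def __init__(self, start: float = 1500.0, k: float = 24.0):
        self.k = k
        self.ratings: defaultdict = defaultdict(lambda: start)

    @staticmethod
    def key(a: str, b: str) -> frozenset:
        """Same team regardless of the order the two names are given in."""
        return frozenset((a, b))

    def report(self, winners: tuple[str, str], losers: tuple[str, str]) -> float:
        """Standard Elo exchange between the two teams; returns the points transferred."""
        w, l = self.key(*winners), self.key(*losers)
        delta = self.k * (1.0 - expected_score(self.ratings[w], self.ratings[l]))
        self.ratings[w] += delta
        self.ratings[l] -= delta
        return delta


ladder = TeamLadder()
ladder.report(("eyerouge", "morphine"), ("Peter", "Anna"))
print(ladder.ratings[TeamLadder.key("morphine", "eyerouge")])  # 1512.0
```

Because each pair is a separate entry, morphine teaming up with Peter later starts a fresh team at 1500, which is exactly the separation between team and individual ratings described above.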

Now, in your quote, you don't make this distinction. You speak of individual ratings and a team rating, and of some kind of relation between the two, part of which is expressed by your modifiers. This is where my main objection lies: there should be no such relation, since there is none in reality that doesn't interfere with rating a player's or a team's skills.

You also claim that "The higher rated player will play with weaker players: his gains and losses should be less significant." Why should they? Where in the world do we rate people that way, and for what purpose? If a team makes it into division 1 in a sport and buys a new crappy player whom they let play in their next game, in what way does that affect the scoring in the sport? It doesn't. That's my point: the scoring is the same, regardless of whether a bad coach bought a bad player and that player was on the field playing like [censored].

I admit that your suggestion is appealing at first. But is it really fair? Doesn't it lead to a heap of situations where it would be exploited by players who only seek to play with more skilled players? And doesn't it cause even more inflation within the whole Elo pool? This last point seems more or less to be a fact and speaks heavily against the modifiers.

morphine wrote:Let's say the modifiers are 5% (< 25 rating difference), 10% (< 100) and 20% (< 300). To reach the same rating as your higher rated ally, you would need (on average, with K=24, not counting losses):

But that's the problem: in the case of specific teams there is no need to reach the same rating as your partner, as you already have one team rating and you both share it (if p1 & p2 = team1 and team1's rating is 1500, then both p1 and p2 have a rating of 1500, since they are team1). You'd compare teams on the ladder, not individuals. Each team has only one rating.

In the case where every player has his own 2vs2 rating, it would still not make sense to adjust his rating to move closer to another player's rating. Why? Because his rating should be his alone, to whatever degree that's possible within an Elo system. If looking at my rating shows not only a relation to my opponents but also a relation to my teammates, we can't use my rating to measure my skills. Surely that number would tell us something else.

That is why I still maintain that it doesn't make sense to try to reach your ally's rating, or for us to actively help that along with modifiers designed to close the gap between you and partners x, y, z, etc.: it distorts skill measurement.

That said, we can start with a "simple" individual rating, as it wouldn't make a lot of difference at the start anyway, and once we have a small database of ladder games, see whether something should be changed. Same with specific team ratings: everything can be stored for when we need it.

Sure, if you're still convinced your approach measures skill better than the one I suggest, it would be really cool to play around with the 2vs2 ladder and fake plenty of results and games to see what conclusions we can draw. We'd have to do that anyway with whichever system we use.

Where should I send/upload the files and dumps of the database?

Post by **Wintermute** » April 14th, 2008, 4:56 pm

I just had a look at the ladder website for the first time, and it is quite impressive! Nice job. I think "unofficial" rankings like this will make a lot of players who wished Wesnoth had such a formal system happy. I am very glad that Wesnoth does not officially endorse rankings, but the system you have set up seems quite good. Even a player like me, who isn't really interested in Wesnoth rankings, might sign up just to see what it's all about.

eyerouge · Post by **eyerouge** » April 14th, 2008, 8:53 pm

Wintermute wrote:I just had a look at the ladder website for the first time, and it is quite impressive! Nice job. I think "unofficial" rankings like this will make a lot of players who wished Wesnoth had such a formal system happy.

Thank you.

Although we're nowhere near finished developing the ladder system itself, it works well enough for its core purposes, and I believe it's on par with most other game ladders out there that use Elo. In any case, we keep improving it whenever we notice that something needs adding or fixing. Currently chains is working on the replay parser; once that's integrated into the ladder, I believe more people will be interested in it, as it will bring some cool number-crunching abilities.

Brutorix · Post by **Brutorix** » April 16th, 2008, 8:00 am

I've been having trouble logging onto the ladder site. I just typed my username and password on the news page and clicked log in, only to be redirected to the news page, not logged in. I've tried again and again but can't seem to get anywhere.

If I mistype my username or password, though, it redirects me to a login failed page. I don't understand what's going on.

tsr · Post by **tsr** » April 16th, 2008, 8:04 am

Brutorix wrote:I've been having trouble logging onto the ladder site. I just typed my username and password on the news page and clicked log in, only to be redirected to the news page, not logged in. I've tried again and again but can't seem to get anywhere.

If I mistype my username or password, though, it redirects me to a login failed page. I don't understand what's going on.

Are you sure you are not logged in? The only differences are that you:
- have your nick at the top of the right column
- have a small 'log out' link at the bottom of the right column
- see a 'profile' icon instead of a 'join' icon as the second one in the top row

/tsr

The Battle for Wesnoth Forums
