I described the conception of a rating system for Insurgency in an earlier post. Having two months of data now, it is time to evaluate this system before I throw it into “production” and allow it to actually balance games.


Methodology 

A rating system’s performance is measured by the quality of its predictions for future games. During a round, all connects, disconnects and team changes are logged. After a round has ended, I therefore have a dataset that looks like this:

winner_team: 3
player | normalized_playtime | team
Sheppy | 0.8 | 2
XFlix  | 1.0 | 3
Neo    | 0.3 | 2
...
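
Concretely, one row of that dataset could be represented like this in code; the class and field names here are my own illustrative assumptions, not the actual log format:

from dataclasses import dataclass

@dataclass
class Rating:
    mu: float      # estimated skill (mean of the Gaussian)
    sigma: float   # uncertainty of that estimate (standard deviation)

@dataclass
class PlayerRound:
    name: str
    rating: Rating               # the player's rating going into the round
    normalized_playtime: float   # fraction of the round the player was present (0.0 - 1.0)
    team: int                    # 2 = Security, 3 = Insurgents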

From that I can calculate what the system would have expected the outcome to be. The code for that (remember, a rating is a Gaussian curve with a mean mu and a standard deviation sigma) looks something like this:

mu1 = sum((p.rating.mu - p.rating.sigma ** 2) * (1 + (1 - p.normalized_playtime)) for p in winner_team)
mu2 = sum((p.rating.mu - p.rating.sigma ** 2) * (1 + (1 - p.normalized_playtime)) for p in loser_team)
delta_mu = mu1 - mu2

At this point I wished I hadn’t slept through most of my stochastics lectures in my 4th semester - whatever. Because I subtract the loser team from the winner team, I would always expect delta_mu to be positive; if it is negative, the system would have expected the loser team to win.
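
Put together, a minimal sketch of this post-round check could look like the following. It reuses the PlayerRound shape from above and is an illustration of the idea, not the actual implementation:

def team_strength(players):
    # conservative per-player estimate (mu - sigma^2), weighted by (2 - normalized_playtime):
    # a factor of 1.0 for a player present the whole round, up to 2.0 for one who barely played
    return sum((p.rating.mu - p.rating.sigma ** 2) * (1 + (1 - p.normalized_playtime))
               for p in players)

def evaluate_round(winner_team, loser_team):
    # delta_mu > 0 means the system favored the team that actually won
    delta_mu = team_strength(winner_team) - team_strength(loser_team)
    return delta_mu, delta_mu > 0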


Actual Evaluation 

Correct: 1726 Incorrect: 750 CorrectRatio: 0.697

That is actually quite good, especially since > 70% of the players joining each day are new players. If we set a threshold for delta_mu at 9 * sigma^2 - nine being the average number of unique players per team per round - and call predictions above it “certain”, we get even better results. Only about a quarter of the games have an absolute delta_mu above this threshold, though:

Correct: 541 Incorrect: 123 C.Ratio: 0.815
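
In code, that “certain” filter is just a threshold check. Which sigma goes into 9 * sigma^2 is an assumption on my part here - a single representative value, e.g. the system’s starting sigma:

AVG_PLAYERS_PER_TEAM = 9

def is_certain(delta_mu, sigma):
    # "certain" prediction: |delta_mu| exceeds nine times sigma squared
    return abs(delta_mu) > AVG_PLAYERS_PER_TEAM * sigma ** 2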

Here is a histogram: the y-axis is the number of rounds (green) and of wrongly predicted rounds (purple), the x-axis is the absolute delta_mu. As expected, the number of errors drops with higher absolute delta_mu.

[Figure: complete rounds prediction histogram]
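
For reference, a rough sketch of how such a histogram can be produced with matplotlib; results is assumed to be a list of (delta_mu, correct) tuples like the ones returned by the evaluation sketch above:

import matplotlib.pyplot as plt

def plot_round_predictions(results, bins=30):
    all_deltas = [abs(d) for d, _ in results]
    wrong_deltas = [abs(d) for d, correct in results if not correct]
    # use the same bin edges for both series so the bars line up
    _, edges, _ = plt.hist(all_deltas, bins=bins, color="green", label="all rounds")
    plt.hist(wrong_deltas, bins=edges, color="purple", label="wrongly predicted")
    plt.xlabel("absolute delta_mu")
    plt.ylabel("number of rounds")
    plt.legend()
    plt.show()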


But what about predicting the outcome at the start of the round?

While the above is somewhat impressive (at least to me), I also tried to apply the existing ratings to the players connected at the start of a round to predict the outcome. This, however, was very unsuccessful. The code is similar to the above, but I obviously don’t have playtimes at that point, so my data just looks like this:

player | team
Sheppy | 2
XFlix  | 3
Neo    | 2
...

From that I put in Security (2) as the winner team and Insurgents (3) as the loser team. After the round I compare the computed value with the actual outcome. That basically means (see the sketch after this list):

Negative  value + security  win = Good
Positive  value + insurgent win = Good
Negative  value + insurgent win = Bad
Positive  value + security  win = Bad
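
A minimal sketch of that pre-round check, sidestepping the sign convention by simply predicting the team with the higher strength; again, the data shapes are my assumptions:

SECURITY, INSURGENTS = 2, 3

def evaluate_preround(players, actual_winner):
    # at round start there are no playtimes yet, so every player counts fully
    strength = lambda team: sum(p.rating.mu - p.rating.sigma ** 2
                                for p in players if p.team == team)
    predicted = SECURITY if strength(SECURITY) > strength(INSURGENTS) else INSURGENTS
    return predicted == actual_winner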

This leads to the following overall predictions:

Correct: 1252 Incorrect: 1222 C.Ratio: 0.506

Whoops. That’s not good at all. In fact that’s really, really shitty, because 50% is what you would get if you just predicted a winner at random. But maybe it’s just the 0.1%-certainty predictions screwing up the stats, so let’s do a histogram again. Negative values are predictions for Security, positive ones are for the Insurgents.

[Figure: pre-round predictions histogram]

Fuck. While the big bar at 0% is obviously meaningless, the problem seems to be that I never reach very high certainties, and even among the highest certainties I have, the error rate is still high (delta_mu = 9 * sigma^2 is what I consider 90% certainty).


“Postmortem” of the pre-round prediction: an attempt at an explanation

So why is the pre-round prediction THAT bad? Well, obviously the only difference to the post-round “prediction” is that players will join and leave, and at the start we don’t yet know about them. First of all, I think this confirms one of my long-standing suspicions: that many games are not screwed up by being unbalanced in the first place, but by players leaving the game. This has some implications for how I will later have to implement my balancing system, because people generally leave when they perceive themselves to be losing (obviously some people just leave at random, but the rate of players leaving the losing team is far higher). At first I wanted to balance teams so that they are equally matched by skill, but now I wonder if I should include the map imbalance in that rating, because otherwise I would create a sort of “negative feedback loop” that causes the team on the less-favored side to lose players and therefore lose way more often than the system would expect them to.


Live-Results 


Feel free to send me a mail to share your thoughts, or if you maybe even want to share data in return for API access!