I described the concept of a rating system for Insurgency in an earlier post. Now that I have two months of data, it is time to evaluate this system before I throw it into “production” and allow it to actually balance games.
A rating system’s performance is measured by the quality of its predictions for future games. During a round, all connects, disconnects and team changes are logged. After the round has ended, I therefore have a dataset that looks like this:
winner_team: 3

player | normalized_playtime | team
Sheppy | 0.8                 | 2
XFlix  | 1.0                 | 3
Neo    | 0.3                 | 2
...
From that I can calculate what the system would have expected the outcome to be. The code for that (remember: a rating is a Gaussian curve with a mean mu and standard deviation sigma) looks something like this:
mu1 = sum((p.rating.mu - p.rating.sigma^2) * (1 + 1 - p.normalized_playtime) for p in winner_team)
mu2 = sum((p.rating.mu - p.rating.sigma^2) * (1 + 1 - p.normalized_playtime) for p in loser_team)
delta_mu = mu1 - mu2
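To make the pseudocode above concrete, here is a minimal runnable sketch. The `Rating` and `Player` classes and all the example numbers are my stand-ins, not the post’s actual data model:

```python
from dataclasses import dataclass

@dataclass
class Rating:
    mu: float     # skill estimate (mean of the Gaussian)
    sigma: float  # uncertainty (standard deviation)

@dataclass
class Player:
    rating: Rating
    normalized_playtime: float  # fraction of the round the player was present

def team_mu(team):
    # Conservative per-player estimate (mu - sigma^2), weighted by the
    # post's playtime factor (1 + 1 - normalized_playtime).
    return sum((p.rating.mu - p.rating.sigma ** 2)
               * (1 + 1 - p.normalized_playtime) for p in team)

def predict_delta_mu(winner_team, loser_team):
    return team_mu(winner_team) - team_mu(loser_team)

# Tiny example with made-up ratings:
winners = [Player(Rating(28.0, 2.0), 0.8), Player(Rating(30.0, 1.5), 1.0)]
losers  = [Player(Rating(24.0, 3.0), 0.3), Player(Rating(25.0, 2.5), 1.0)]
print(predict_delta_mu(winners, losers))  # positive -> winner team was expected to win
```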
At this point I wish I hadn’t slept through most of my stochastics lectures in my 4th semester. Anyway: because I subtract the loser team from the winner team, I would always expect delta_mu to be positive; if it is negative, that means the system would have expected the loser team to win.
Correct: 1726
Incorrect: 750
Correct ratio: 0.697
That is actually quite good, especially since over 70% of the players joining each day are new players. If we set a threshold for delta_mu at 9 * sigma^2 (nine being the average number of unique players per team per round) and call predictions above it “certain”, we get even better results. However, only about a quarter of the games have an absolute delta_mu above this threshold:
Correct: 541
Incorrect: 123
Correct ratio: 0.815
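The threshold split above can be sketched like this. The round records, the `SIGMA` value and all function names are my assumptions for illustration, not the post’s actual code:

```python
SIGMA = 8.333                # example base sigma; the real value comes from the rating system
THRESHOLD = 9 * SIGMA ** 2   # nine = average unique players per team per round

def accuracy(rounds):
    # rounds: list of (delta_mu, prediction_was_correct) tuples
    correct = sum(1 for _, ok in rounds if ok)
    return correct / len(rounds) if rounds else float("nan")

def split_by_certainty(rounds, threshold=THRESHOLD):
    certain = [r for r in rounds if abs(r[0]) > threshold]
    uncertain = [r for r in rounds if abs(r[0]) <= threshold]
    return certain, uncertain

# Example with made-up rounds:
rounds = [(900.0, True), (50.0, False), (-700.0, True), (10.0, True)]
certain, uncertain = split_by_certainty(rounds)
print(accuracy(certain), accuracy(uncertain))
```

The same `accuracy` call on both halves makes it easy to compare the “certain” subset against the rest, which is exactly the comparison in the numbers above.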
Here is a histogram: the y-axis is the number of rounds (green) and of wrongly predicted rounds (purple), the x-axis is the absolute delta_mu. As expected, the number of errors drops with higher absolute delta_mu.
But what about predicting the outcome at the start of the round?
While the above is somewhat impressive (at least to me), I also tried to apply the existing ratings to the connected players at the start of a round to predict the outcome. This, however, was very unsuccessful. The code is similar to the above, but I obviously don’t have playtimes at that point, so my data just looks like this:
player | team
Sheppy | 2
XFlix  | 3
Neo    | 2
...
From that I plug in Security (2) as the winner team and Insurgent (3) as the loser team. After the round I compare the computed value to the actual outcome. That basically means:
Negative value + Security win  = Good
Positive value + Insurgent win = Good
Negative value + Insurgent win = Bad
Positive value + Security win  = Bad
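The scoring rule above can be written as a small helper. This is a sketch of my reading of the post’s convention (negative delta_mu read as a Security prediction); the function name is hypothetical:

```python
SECURITY, INSURGENT = 2, 3  # team ids as used in the post's data

def prediction_correct(delta_mu, actual_winner):
    # Negative delta_mu is read as a Security prediction,
    # positive delta_mu as an Insurgent prediction.
    predicted = SECURITY if delta_mu < 0 else INSURGENT
    return predicted == actual_winner

print(prediction_correct(-12.5, SECURITY))   # negative + Security win  -> True (good)
print(prediction_correct(12.5, INSURGENT))   # positive + Insurgent win -> True (good)
print(prediction_correct(-12.5, INSURGENT))  # negative + Insurgent win -> False (bad)
```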
This leads to the following overall predictions:
Correct: 1252
Incorrect: 1222
Correct ratio: 0.506
Whoops. That’s not good at all. In fact, it’s really, really shitty, because 50% is what you would get by just predicting a winner at random. But maybe it’s just the 0.1%-certainty predictions screwing up the stats, so let’s do a histogram again. Negative values are predictions for Security, positive ones for Insurgent.
Fuck. While the big bar at 0% is obviously meaningless, the problem seems to be that I don’t reach very high certainties at any point, and even among the highest certainties I have, the error rate is still high.
(delta_mu = 9 * sigma^2 is considered 90% certainty.)
“Postmortem” of the pre-round prediction: an attempt at an explanation
So why is the pre-round prediction THAT bad? Obviously, the only difference from the post-round “prediction” is that players will join and leave during the round, and we don’t yet know about them at the start.

I think this confirms one of my long-standing suspicions: many games are not ruined by being unbalanced in the first place, but by players leaving mid-game. This has implications for how I will later have to implement my balancing system, because people generally leave when they perceive themselves to be losing (some people just leave at random, of course, but the rate of players leaving losing teams is far higher). At first I wanted to balance teams so that they are equally matched by skill, but now I wonder whether I should also include the map imbalance in that rating. Otherwise I would create a sort of negative feedback loop: the team on the less-favored side would lose players and therefore lose far more often than the system expects it to.
Feel free to send me a mail to share your thoughts, or if you maybe even want to share data in return for API access!