Back2Warcraft's Elo Rating
Back2Warcraft's Elo Rating is a rating list of professional Warcraft 3 players curated by Back2Warcraft. The ratings are calculated solely from map scores in high-level tournaments. They do not take the eventual tournament placement into account, nor any other factors such as prize money or ladder standings.
The ratings are calculated monthly and released at the beginning of each month. They are based on more than 2500 high-level games played since December 2015.
The ratings use the following standards:
- The first games entered into the system are from WCA 2015, where all players were given an initial rating of 2000
- The rating period is one month
- The ratings prior to November 2016 are considered part of the transient response and are not trustworthy
- All games must come from tournaments where at least half the players already have a rating (preferably) or a well established provisional rating
- The minimum number of games to get a rating is 12; before that, players have a provisional rating only
- The minimum number of games to be included in the rankings is 15
- There is no rating decay; the rating will not decrease if a player is inactive
- There is a decay on the number of games counted for each player, so inactive players are dropped from the rankings after a period without games
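The two game-count thresholds in the list above can be sketched as a small helper. This is only an illustration: the function name, the constant names and the status labels are hypothetical, and the count passed in is assumed to be the player's game count after whatever decay the list applies, since the decay rule itself is not specified here.

```python
# Thresholds taken from the standards listed above.
MIN_GAMES_FOR_RATING = 12   # below this, the rating is provisional
MIN_GAMES_FOR_RANKING = 15  # below this, the player is not listed in the rankings


def rating_status(games_counted: int) -> str:
    """Classify a player by games counted (hypothetical helper and labels)."""
    if games_counted < MIN_GAMES_FOR_RATING:
        return "provisional"
    if games_counted < MIN_GAMES_FOR_RANKING:
        return "rated, not ranked"
    return "ranked"


for games in (5, 12, 15):
    print(games, rating_status(games))
```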
The calculations are based on Arpad Elo's rating system, first implemented in chess in 1960. Back2Warcraft’s Elo Rating, while following the basic principles of this system, is implemented using the slightly improved and more modern version currently in use by the USCF. A straightforward explanation of the mathematics can be found in a paper by Mark E. Glickman, a rating system expert, in Chapter 3 and beyond.
The main differences between this variant and the more traditional Elo system are the variable K-factor and the anti-inflation system described by the special rating formula, see the section below.
The Elo rating is calculated in three basic steps: predict, compare and adjust.
The first thing the system does when considering a new result is to predict what the outcome should have been. So before looking at the actual result, the system calculates the likelihood of each player winning the game based on their rating. This likelihood is called the expected score of each player, where the score is simply defined by win = 1, loss = 0. Since only one player can win, the sum of the expected scores is 1.
So in a game between two players rated P1=2000 and P2=1900, the expected scores are 0.64 for P1 and 0.36 for P2. Or in other words, the higher rated player is expected to win 64% of the time if the ratings are true. Since P2 is expected to score 0.36 points each time these players compete, if there’s a situation where they play each other five times the total expected score is 0.36 x 5 = 1.8. Similarly, P1 has an expected score of 0.64 x 5 = 3.2.
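The figures in this example can be reproduced with the standard logistic Elo expected-score formula on a 400-point scale. The article does not spell the formula out, so treat its use here as an assumption; it does give 0.64 and 0.36 for a 100-point rating gap. The function name is illustrative.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score of a player, assuming the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


p1 = expected_score(2000, 1900)
p2 = expected_score(1900, 2000)

print(round(p1, 2), round(p2, 2))          # 0.64 0.36
print(round(5 * p1, 1), round(5 * p2, 1))  # 3.2 1.8
```

Because the two expected scores are complementary, they always sum to 1, matching the statement that only one player can win.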
The predicted score is then compared to the actual score. For each player the difference between these is called the delta. This value is a measure of how the player is performing compared to what could be expected given their current rating. So a positive delta means the player is overperforming his rating, a negative delta means underperforming.
So continuing with the example above, given a 1-0 victory for P1, the delta is set to (Actual score – Expected score) = (1 – 0.64) = +0.36. For P2 the delta is (0 – 0.36) = -0.36. P1 did slightly better than predicted, P2 slightly worse. If we change the score and let the weaker player win, with a 1-0 victory for P2, the deltas become -0.64 for P1 and +0.64 for P2. So, not surprisingly, when an upset occurs, the deltas are larger.
Now, given a five-game series between P1 and P2 with a 3-2 victory for P1, the delta of P1 would be (3 – 3.2) = -0.2. The delta of P2 would be (2 – 1.8) = +0.2. The relatively small deltas indicate that the ratings of the players are true, and that little adjustment is necessary. A 3-2 score is right in line with what we expect when a player rated 2000 plays against a player rated 1900.
Taking the sum of all the deltas of a player over the rating period then gives one number indicating whether the rating should increase, decrease or stay as is.
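As a rough sketch of the compare step, the snippet below sums (actual score minus expected score) over a list of games. It assumes the standard logistic Elo curve for the expected score and assumes every game in the period is evaluated against the player's rating at the start of the period. The function names and the data layout are illustrative.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score, assuming the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def period_delta(rating: float, results: list[tuple[float, int]]) -> float:
    """Sum of (actual - expected) over games given as (opponent_rating, score)."""
    return sum(score - expected_score(rating, opp) for opp, score in results)


# The 3-2 series from the example: P1 (2000) against P2 (1900).
p1_games = [(1900, 1), (1900, 1), (1900, 1), (1900, 0), (1900, 0)]
p2_games = [(2000, 0), (2000, 0), (2000, 0), (2000, 1), (2000, 1)]

print(round(period_delta(2000, p1_games), 1))  # -0.2
print(round(period_delta(1900, p2_games), 1))  # 0.2
```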
The final step is then to create a new rating for the player, more accurate to how the player is actually performing. The new rating is given by the old rating plus the adjustment based on the delta described above, or mathematically: Rating = OldRating + K*delta, where K > 0.
If the player has overperformed, the delta is a positive number, and the new rating will be higher than the old. If the player has underperformed, the delta will be negative and the rating will decrease. The conversion from a delta to an actual adjustment is done by multiplying the delta by the factor K, called the K-factor.
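The adjustment step can be written out directly from the formula Rating = OldRating + K*delta. The K value of 32 used below is purely illustrative and is not the value used for this list, where K varies per player and rating period.

```python
def updated_rating(old_rating: float, k_factor: float, delta: float) -> float:
    """Apply Rating = OldRating + K * delta, with K required to be positive."""
    if k_factor <= 0:
        raise ValueError("K-factor must be positive")
    return old_rating + k_factor * delta


# P1 (2000) beats P2 (1900) 1-0: deltas of +0.36 and -0.36, illustrative K = 32.
print(round(updated_rating(2000, 32, 0.36), 2))   # 2011.52
print(round(updated_rating(1900, 32, -0.36), 2))  # 1888.48
```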
One of the key characteristics of a rating system is how much weight it puts on any one result. In an Elo system this weight is called the K-factor. A high K-factor puts a lot of weight on new results, while a low value does the opposite.
So if there’s a low K-factor, the system rewards strong performances consistently over time, where players are only slowly climbing or falling on the list. Such a list can feel a little too slow, where the list will lag behind reality.
If the K-factor is high, the system is very quick to adjust. This means that the most recent tournament results are considered the most important. This can be problematic, as the list becomes too volatile, with players rising and falling many places every rating period and the rankings becoming merely a copy of the last tournament placements.
The solution then is to have the K-factor vary. For each player, every rating period the system calculates a unique K-factor based on the level of the player, the number of games played previously and the number of games in the current rating period. Thus, for established pros with many registered games the ratings will put relatively little stock in just a single result, but for a newcomer with almost no registered games the system will quickly adjust the rating based on fresh results.
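One way to picture a variable K-factor is the form K = 800 / (N + m) published for the USCF system, where N is the number of previously counted games and m is the number of games in the current rating period. Whether this list uses exactly this form is an assumption. The sketch also leaves out the rating-dependent cap on N that the USCF system applies, which is where the level of the player would enter. The function and parameter names are illustrative.

```python
def k_factor(prior_games: float, games_this_period: int) -> float:
    """Assumed USCF-style K = 800 / (N + m); not confirmed for this list."""
    total = prior_games + games_this_period
    if total <= 0:
        raise ValueError("at least one game is required")
    return 800.0 / total


print(k_factor(5, 5))   # newcomer: 80.0
print(k_factor(45, 5))  # established player: 16.0
```

With these illustrative numbers, the same delta moves a newcomer's rating five times as far as an established player's, which is the behaviour described above.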
The current ratings were released on 1 March 2017.