Back2Warcraft's Elo Rating

Back2Warcraft's Elo Rating is a rating list of professional Warcraft 3 players curated by Back2Warcraft. The ratings are calculated solely from map scores in high-level tournaments. They do not take the eventual tournament placement into account, nor any other factors such as prize money or ladder standings.
The ratings are calculated monthly and released at the beginning of each month. They are based on more than 2500 high-level games played since December 2015.
Rules
The ratings use the following standards:
 The first games entered into the system are from WCA 2015, where all players were given an initial rating of 2000
 The rating period is one month
 Ratings prior to November 2016 are considered part of the transient response and are not trustworthy
 All games must come from tournaments where at least half the players already have a rating (preferably) or a well-established provisional rating
 The minimum number of games to get a rating is 12; before that, a player has only a provisional rating
 The minimum number of games to be included in the rankings is 15
 There is no rating decay; a player's rating will not decrease through inactivity
 There is a decay on the number of games counted for each player, so inactive players are dropped from the rankings after a period without games
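The two game-count thresholds above can be read as a simple classification. The following is a minimal sketch only: the function name, the status labels and the constant names are hypothetical, and the input is assumed to be the player's game count after whatever decay the list applies, since the decay mechanism itself is not specified here.

```python
# Thresholds taken from the rules above; the names are illustrative.
MIN_GAMES_FOR_RATING = 12   # fewer games: provisional rating only
MIN_GAMES_FOR_RANKING = 15  # fewer games: not shown in the rankings


def rating_status(counted_games: int) -> str:
    """Classify a player by the number of games counted for them."""
    if counted_games < MIN_GAMES_FOR_RATING:
        return "provisional"
    if counted_games < MIN_GAMES_FOR_RANKING:
        return "rated"  # has a full rating but is not yet listed
    return "ranked"


for games in (11, 12, 14, 15):
    print(games, rating_status(games))
```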
Calculations
The calculations are based on Arpad Elo's rating system, first implemented in chess in 1960. Back2Warcraft's Elo Rating follows the basic principles of this system but is implemented using the slightly improved and more modern version currently in use by the USCF. A straightforward explanation of the mathematics can be found in a paper by Mark E. Glickman, a rating system expert, in Chapter 3 and beyond.
The main differences between this variant and the more traditional Elo system are the variable K-factor and the anti-inflation system described by the special rating formula; see the section below.
The Elo rating score is calculated in three basic steps: predict, compare and adjust.
Predict
The first thing the system does when considering a new result is to predict what the outcome should have been. So before looking at the actual result, the system calculates the likelihood of each player winning the game based on their rating. This likelihood is called the expected score of each player, where the score is simply defined by win = 1, loss = 0. Since only one player can win, the sum of the expected scores is 1.
So in a game between two players rated P1=2000 and P2=1900, the expected scores are 0.64 for P1 and 0.36 for P2. Or in other words, the higher rated player is expected to win 64% of the time if the ratings are true. Since P2 is expected to score 0.36 points each time these players compete, if there’s a situation where they play each other five times the total expected score is 0.36 x 5 = 1.8. Similarly, P1 has an expected score of 0.64 x 5 = 3.2.
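The text does not state the formula behind these numbers. As a sketch, assuming the standard Elo logistic curve with a 400-point scale (an assumption, though it does reproduce the 0.64 and 0.36 quoted above), the prediction step could look like this. The function name `expected_score` is illustrative.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score of one game under the standard Elo logistic curve.

    The 400-point scale is the conventional Elo choice and is assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


p1 = expected_score(2000, 1900)
p2 = expected_score(1900, 2000)

print(round(p1, 2), round(p2, 2))          # 0.64 0.36
print(round(p1 + p2, 2))                   # 1.0, the two scores sum to one
print(round(5 * p1, 1), round(5 * p2, 1))  # 3.2 1.8 over five games
```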
Compare
The predicted score is then compared to the actual score. For each player the difference between these is called the delta. This value is a measure of how the player is performing compared to what could be expected given their current rating. So a positive delta means the player is overperforming his rating, a negative delta means underperforming.
So continuing with the example above, given a 1-0 victory for P1, the delta is (Actual score – Expected score) = (1 – 0.64) = +0.36. For P2 the delta is (0 – 0.36) = -0.36. P1 did slightly better than predicted, P2 slightly worse. If we change the score and let the weaker player win, with a 1-0 victory for P2, the deltas become -0.64 and +0.64. So, not surprisingly, when an upset occurs, the deltas are larger in magnitude.
Now, given a five-game series between P1 and P2 ending in a 3-2 victory for P1, the delta of P1 would be (3 – 3.2) = -0.2. The delta of P2 would be (2 – 1.8) = +0.2. The relatively small deltas indicate that the ratings of the players are accurate and that little adjustment is necessary. A 3-2 score is right in line with what we expect when a player rated 2000 plays against a player rated 1900.
Taking the sum of all the deltas of a player over the rating period then gives one number indicating whether the rating should increase, decrease or stay as is.
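The comparison step can be sketched as follows, reusing the worked examples from this section. The expected score is again assumed to follow the standard Elo logistic curve, and the function names and the layout of the `results` list are illustrative, not part of the rating list's actual implementation.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    # Standard Elo logistic curve with a 400-point scale (assumed).
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def delta(rating: float, opponent_rating: float, wins: int, games: int) -> float:
    """Actual score minus expected score over `games` maps against one opponent."""
    return wins - games * expected_score(rating, opponent_rating)


# Single map, P1 (2000) beats P2 (1900) 1-0.
print(round(delta(2000, 1900, 1, 1), 2))  # 0.36
print(round(delta(1900, 2000, 0, 1), 2))  # -0.36

# Five-game series, P1 wins 3-2.
print(round(delta(2000, 1900, 3, 5), 1))  # -0.2
print(round(delta(1900, 2000, 2, 5), 1))  # 0.2

# Summing deltas over a rating period: (opponent rating, wins, games).
results = [(1900, 1, 1), (1900, 3, 5)]
period_delta = sum(delta(2000, opp, w, g) for opp, w, g in results)
print(round(period_delta, 2))  # 0.16
```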
Adjust
The final step is then to create a new rating for the player, more accurate to how the player is actually performing. The new rating is given by the old rating plus the adjustment based on the delta described above, or mathematically: Rating = OldRating + K*delta, where K > 0.
If the player has overperformed, the delta is a positive number and the new rating will be higher than the old. If the player has underperformed, the delta will be negative and the rating will decrease. The conversion from a delta to an actual adjustment is done by multiplying the delta by the factor K, called the K-factor.
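As a small illustration of the update rule Rating = OldRating + K*delta, the sketch below applies it to the single-game deltas from the Compare example. The value K = 32 is purely hypothetical and chosen for readability; it is not the K-factor used for this list.

```python
def adjust(old_rating: float, total_delta: float, k: float) -> float:
    """New rating = old rating + K * delta, with K > 0."""
    if k <= 0:
        raise ValueError("K must be positive")
    return old_rating + k * total_delta


K = 32.0  # hypothetical value, for illustration only

# P1 (2000) beat P2 (1900) in one game: deltas of +0.36 and -0.36.
print(round(adjust(2000, +0.36, K), 2))  # 2011.52
print(round(adjust(1900, -0.36, K), 2))  # 1888.48
```

With the same K for both players, the points gained by one equal the points lost by the other.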
The K-factor
One of the key characteristics of a rating system is how much weight it puts on any one result. In an Elo system this weight is called the K-factor. A high K-factor puts a lot of weight on new results, while a low value does the opposite.
With a low K-factor, the system rewards strong performances sustained over time, and players climb or fall on the list only slowly. Such a list can feel too slow and will lag behind reality.
With a high K-factor, the system is very quick to adjust, so the most recent tournament results are treated as the most important. This can be problematic because the list becomes too volatile, with players rising and falling many places every rating period and the rankings becoming merely a copy of the last tournament placements.
The solution is to let the K-factor vary. For each player, in every rating period, the system calculates a unique K-factor based on the level of the player, the number of games played previously and the number of games in the current rating period. Thus, for established pros with many registered games the ratings put relatively little stock in a single result, while for a newcomer with almost no registered games the system quickly adjusts the rating based on fresh results.
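The exact K-factor formula used for this list is not given here. As a rough sketch of the idea, the USCF system that the list is said to follow uses a K-factor of the form 800 / (N' + m), where N' is an effective number of prior games and m is the number of games in the current period. The constants below (800, and a flat cap of 50 on effective prior games) are assumptions with respect to Back2Warcraft's list. The dependence on the player's level is not modelled; the flat cap is only a stand-in for it.

```python
def k_factor(prior_games: int, period_games: int,
             numerator: float = 800.0, effective_cap: float = 50.0) -> float:
    """Variable K-factor in the style of the USCF formula (assumed constants).

    prior_games   -- games registered before this rating period
    period_games  -- games played in this rating period (must be > 0)
    effective_cap -- hypothetical stand-in for the level-dependent cap
    """
    if period_games <= 0:
        raise ValueError("period_games must be positive")
    effective_prior = min(prior_games, effective_cap)
    return numerator / (effective_prior + period_games)


print(k_factor(5, 3))              # 100.0, newcomer: results move the rating fast
print(round(k_factor(200, 10), 2)) # 13.33, established pro: small adjustments
```

The cap keeps K from shrinking towards zero for players with very long histories, so even an established rating can still move.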
Rating inflation
Prediction
Rating difference  Bo1  Bo3  Bo5  Bo7 

+400  82%  92%  96%  98% 
+300  76%  86%  90%  94% 
+200  68%  76%  81%  85% 
+150  64%  71%  75%  78% 
+100  60%  64%  67%  70% 
+50  55%  57%  59%  60% 
+25  52%  54%  55%  55% 
0  50%  50%  50%  50% 
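How the series columns were derived is not stated. One plausible reading, offered here only as an assumption, is that each map is treated as an independent game with the Bo1 win probability, so that the chance of winning a best-of-N series follows from the binomial formula. The sketch below uses that assumption; for the +200 row (68% per map) it gives the same 76%, 81% and 85% as the table, while other rows can differ by about a percentage point, possibly because of rounding in the Bo1 column.

```python
from math import comb


def series_win_probability(p_map: float, best_of: int) -> float:
    """Probability of winning a best-of-N series, assuming independent maps."""
    if best_of % 2 == 0 or best_of < 1:
        raise ValueError("best_of must be a positive odd number")
    wins_needed = best_of // 2 + 1
    q = 1.0 - p_map
    # Sum over the number of maps lost before the deciding win.
    return sum(
        comb(wins_needed - 1 + losses, losses) * p_map ** wins_needed * q ** losses
        for losses in range(wins_needed)
    )


for n in (1, 3, 5, 7):
    print(f"Bo{n}: {series_win_probability(0.68, n):.2f}")
# Bo1: 0.68
# Bo3: 0.76
# Bo5: 0.81
# Bo7: 0.85
```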
Current rating
The current ratings were released on the 1st of June, 2017.
Rank  Change  Player  Rating  Rating change 

1  (+1)  120  2196  (+43) 
2  (+1)  Happy  2156  (+3) 
3  (-2)  Lyn  2139  (-47) 
4  ()  TH000  2135  (+28) 
5  ()  Moon  2091  (+37) 
6  ()  FoCuS  2066  (+13) 
7  (+1)  Fly100%  2048  (+20) 
8  (-1)  WFZ  2022  (-16) 
9  ()  Infi  2008  (-7) 
10  (+1)  ReMinD  1993  (+6) 
11  (+3)  Foggy  1992  (+23) 
12  ()  LawLiet  1974  (-3) 
13  (+2)  Yumiko  1972  (+4) 
14  (-1)  Check  1959  (-12) 
15  (-5)  Colorful  1955  (-43) 
16  ()  ReprisaL  1954  (+0) 
17  (+2)  XiaoKK  1944  (+0) 
18  ()  Life  1941  (-7) 
19  (+2)  Zhou_Xixi  1908  (-16) 
20  ()  Yange  1899  (-40) 
21  (-4)  Sonik  1890  (-61) 
22  (+2)  Romantic  1886  (+18) 
23  (-1)  alice  1883  (-6) 
24  (+8)  Sini  1853  (+41) 
25  (-2)  Sok  1845  (-44) 
26  (+4)  Hawk  1837  (+19) 
27  (-2)  Lucifer  1835  (-16) 
28  (+1)  Rudan  1831  (+5) 
29  (-1)  Believe  1829  (+0) 
30  (-4)  Bany  1818  (-24) 
31  ()  tbc_bm  1814  (+0) 
32  (-5)  OrcWorker  1807  (-32) 
33  ()  chaemiko  1805  (+10) 
34  (+1)  So.In  1802  (+15) 
35  (-1)  Anima  1775  (-17) 
36  (New)  blade  1753  (+94) 
37  (New)  nowayoc  1744  (+2) 
38  (-2)  Sheik  1673  (-31) 
39  (-2)  Imperius  1627  (-3) 