Back2Warcraft's Elo Rating

From Liquipedia Warcraft Wiki


Back2Warcraft's Elo Rating is a rating list of professional Warcraft 3 players curated by Back2Warcraft. The ratings are calculated solely from map scores in high-level tournaments. They do not take final tournament placement into account, nor any other factors such as prize money or ladder standings.

The ratings are calculated monthly and released at the beginning of each month. They are based on more than 2500 high-level games played since December 2015.

Rules

The ratings use the following standards:

  • The first games entered into the system are from WCA 2015, where all players were given an initial rating of 2000
  • The rating period is one month
  • The ratings prior to November 2016 are considered part of the transient response and are not trustworthy
  • All games must come from tournaments where at least half the players already have a rating (preferably) or a well-established provisional rating
  • The minimum number of games to get a rating is 12; before that, players have a provisional rating only
  • The minimum number of games to be included in the rankings is 15
  • There is no rating decay; the rating will not decrease if a player is inactive
  • There is a decay on the number of games counted for a player, so inactive players are dropped from the rankings after a period without games
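
The game-count thresholds in the rules above can be sketched as a small classification function. This is only an illustration: the function name and the status labels are hypothetical, and the decay applied to the game count is not specified in the rules, so `games_counted` is assumed to be whatever count remains after that decay.

```python
MIN_GAMES_RATED = 12   # threshold for a full rating, from the rules above
MIN_GAMES_RANKED = 15  # threshold for inclusion in the rankings, from the rules above


def rating_status(games_counted: int) -> str:
    """Classify a player by counted games (labels are hypothetical)."""
    if games_counted < MIN_GAMES_RATED:
        return "provisional"  # provisional rating only
    if games_counted < MIN_GAMES_RANKED:
        return "rated"        # has a rating, but is not yet listed
    return "ranked"           # has a rating and appears in the rankings


for n in (5, 12, 15):
    print(n, rating_status(n))
```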

Calculations

The calculations are based on Arpad Elo's rating system, first implemented in chess in 1960. Back2Warcraft’s Elo Rating follows the basic principles of this system, but is implemented using the slightly improved and more modern version currently in use by the USCF. A straightforward explanation of the mathematics can be found in a paper by Mark E. Glickman, a rating system expert, in Chapter 3 and beyond.

The main differences between this variant and the more traditional Elo system are the variable K-factor and the anti-inflation system described by the special rating formula, see the section below.

The Elo rating is calculated in three basic steps: predict, compare and adjust.

Predict

The first thing the system does when considering a new result is to predict what the outcome should have been. So before looking at the actual result, the system calculates the likelihood of each player winning the game based on their rating. This likelihood is called the expected score of each player, where the score is simply defined by win = 1, loss = 0. Since only one player can win, the sum of the expected scores is 1.

So in a game between two players rated P1=2000 and P2=1900, the expected scores are 0.64 for P1 and 0.36 for P2. Or in other words, the higher rated player is expected to win 64% of the time if the ratings are true. Since P2 is expected to score 0.36 points each time these players compete, if there’s a situation where they play each other five times the total expected score is 0.36 x 5 = 1.8. Similarly, P1 has an expected score of 0.64 x 5 = 3.2.
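
The page does not spell out the formula behind these numbers. As a minimal sketch, assuming the standard logistic Elo expectation with a 400-point scale (which reproduces the 0.64 and 0.36 quoted above), the prediction step could look like this. The function name is illustrative only.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score of a player against an opponent.

    Assumes the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


p1 = expected_score(2000, 1900)
p2 = expected_score(1900, 2000)

print(round(p1, 2), round(p2, 2))          # 0.64 0.36
print(round(5 * p1, 1), round(5 * p2, 1))  # 3.2 1.8 over five games
```

Because the curve depends only on the rating difference, the two expected scores always sum to 1.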

Compare

The predicted score is then compared to the actual score. For each player, the difference between the two is called the delta. This value measures how the player is performing compared to what could be expected given their current rating. A positive delta means the player is overperforming their rating; a negative delta means they are underperforming.

Continuing with the example above, given a 1-0 victory for P1, the delta is (Actual score – Expected score) = (1 – 0.64) = +0.36. For P2 the delta is (0 – 0.36) = -0.36. P1 did somewhat better than predicted, P2 somewhat worse. If we instead let the weaker player win, with a 1-0 victory for P2, the deltas become -0.64 for P1 and +0.64 for P2. So, not surprisingly, when an upset occurs, the deltas are larger in magnitude.

Now, given a five-game series between P1 and P2 ending in a 3-2 victory for P1, the delta of P1 would be (3 – 3.2) = -0.2. The delta of P2 would be (2 – 1.8) = +0.2. The relatively small deltas indicate that the ratings of the players are accurate and that little adjustment is necessary. A 3-2 score is right in line with what we expect when a player rated 2000 plays against a player rated 1900.

Taking the sum of all the deltas of a player over the rating period then gives one number indicating whether the rating should increase, decrease or stay as is.
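
The summation over a rating period can be sketched as follows. The expected score again assumes the standard logistic Elo curve with a 400-point scale. The function names and the `(opponent_rating, actual_score)` input format are hypothetical, and the sketch assumes that the ratings held at the start of the period are used for every game in it.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard logistic Elo expectation (assumed, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def period_delta(rating: float, results: list[tuple[float, float]]) -> float:
    """Sum of (actual - expected) over all games in one rating period.

    `results` holds one (opponent_rating, actual_score) pair per game,
    with actual_score 1 for a win and 0 for a loss.
    """
    return sum(score - expected_score(rating, opp) for opp, score in results)


# The 3-2 series from the example above, seen from each side.
p1_games = [(1900, 1), (1900, 1), (1900, 1), (1900, 0), (1900, 0)]
p2_games = [(2000, 0), (2000, 0), (2000, 0), (2000, 1), (2000, 1)]

print(round(period_delta(2000, p1_games), 1))  # -0.2
print(round(period_delta(1900, p2_games), 1))  # 0.2
```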

Adjust

The final step is then to create a new rating for the player, more accurate to how the player is actually performing. The new rating is given by the old rating plus the adjustment based on the delta described above, or mathematically: Rating = OldRating + K*delta, where K > 0.

If the player has overperformed, the delta is a positive number and the new rating will be higher than the old one. If the player has underperformed, the delta is negative and the rating decreases. The conversion from a delta to an actual adjustment is done by multiplying the delta by the factor K, called the K-factor.
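
As a minimal sketch of the adjustment formula, the update is a single multiplication and addition. The value K = 32 below is purely illustrative, since the actual K-factor varies per player and per rating period, and the function name is hypothetical.

```python
def adjust(old_rating: float, delta: float, k: float) -> float:
    """Rating = OldRating + K * delta, with K > 0."""
    if k <= 0:
        raise ValueError("K must be positive")
    return old_rating + k * delta


# P1 from the 3-2 series: delta = -0.2, with an illustrative K of 32.
print(round(adjust(2000, -0.2, 32), 1))  # 1993.6
# P2 from the same series: delta = +0.2.
print(round(adjust(1900, 0.2, 32), 1))   # 1906.4
```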

The K-Factor

One of the key characteristics of a rating system is how much weight it puts on any one result. In an Elo system this weight is called the K-factor. A high K-factor puts a lot of weight on new results, while a low value does the opposite.

With a low K-factor, the system rewards consistently strong performances over time, and players climb or fall on the list only slowly. Such a list can feel too slow and lag behind reality.

If the K-factor is high, the system is very quick to adjust. This means that the most recent tournament results are treated as the most important. This can be problematic, as the list becomes too volatile: players rise and fall many places every rating period, and the rankings become merely a copy of the latest tournament placements.

The solution then is to have the K-factor vary. For each player, every rating period the system calculates a unique K-factor based on the level of the player, the number of games played previously and the number of games in the current rating period. Thus, for established pros with many registered games the ratings will put relatively little stock in just a single result, but for a newcomer with almost no registered games the system will quickly adjust the rating based on fresh results.
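
The page does not give the exact formula. In the USCF system that it refers to, the K-factor has the form 800 / (N' + m), where N' is an effective number of prior games and m is the number of games in the current rating period. The sketch below follows that shape under simplifying assumptions: in the USCF formulation the cap on prior games itself depends on the player's rating, whereas here it is replaced by a fixed, hypothetical cap of 50, and whether Back2Warcraft uses the same constants is not stated.

```python
def k_factor(prior_games: int, games_this_period: int, effective_cap: int = 50) -> float:
    """Simplified USCF-style K-factor: 800 / (N' + m).

    N' is the number of prior games, capped at `effective_cap`
    (a fixed, hypothetical value standing in for the rating-dependent cap).
    """
    if games_this_period < 1:
        raise ValueError("K is only needed when games were played this period")
    effective_prior = min(prior_games, effective_cap)
    return 800.0 / (effective_prior + games_this_period)


print(k_factor(3, 5))               # newcomer: 100.0
print(round(k_factor(200, 5), 1))   # established pro: 14.5
```

A newcomer with three prior games moves roughly seven times as far per unit of delta as a veteran in this sketch, which matches the behaviour described above.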

Rating inflation

Prediction

Match outcome prediction

Rating difference   Bo1   Bo3   Bo5   Bo7
+400                82%   92%   96%   98%
+300                76%   86%   90%   94%
+200                68%   76%   81%   85%
+150                64%   71%   75%   78%
+100                60%   64%   67%   70%
+50                 55%   57%   59%   60%
+25                 52%   54%   55%   55%
0                   50%   50%   50%   50%
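
How the series columns were derived is not stated on the page. One common way to turn a single-game win probability into a best-of-N probability, assuming every game in the series is independent and has the same win probability, is sketched below. The function name is hypothetical.

```python
from math import comb


def series_win_probability(p: float, best_of: int) -> float:
    """Probability of winning a best-of-N series given per-game win probability p.

    Assumes independent games with a constant p (an assumption, not the page's
    stated method). The series is won on the game where the player reaches the
    required number of wins, having lost `losses` games before that point.
    """
    if best_of < 1 or best_of % 2 == 0:
        raise ValueError("best_of must be a positive odd number")
    wins_needed = best_of // 2 + 1
    return sum(
        comb(wins_needed - 1 + losses, losses) * p ** wins_needed * (1 - p) ** losses
        for losses in range(wins_needed)
    )


for n in (1, 3, 5, 7):
    print(f"Bo{n}: {series_win_probability(0.60, n):.2f}")
```

With a per-game probability of exactly 0.60 this gives about 0.65, 0.68 and 0.71 for Bo3, Bo5 and Bo7, close to but not identical with the +100 row, whose Bo1 value is presumably rounded.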

Current rating

The current ratings were released on 1 March 2017.

Rank  Change  Country         Race       Player        Rating  Rating change
1     (-)     South Korea     Orc        Lyn           2182    (-23)
2     (-)     China           Undead     120           2136    (+2)
3     (-)     China           Human      TH000         2115    (-8)
4     (-)     Russia          Undead     Happy         2111    (+52)
5     (+1)    South Korea     Orc        FoCuS         2085    (+48)
6     (-1)    China           Undead     WFZ           2041    (+0)
7     (-)     China           Human      Infi          2032    (-1)
8     (-)     China           Human      Yumiko        2029    (+0)
9     (-)     South Korea     Night Elf  LawLiet       2014    (-3)
10    (-)     China           Orc        Fly100%       1996    (-16)
11    (+7)    China           Night Elf  Zhou_Xixi     1977    (+31)
12    (-)     South Korea     Night Elf  Moon          1972    (-5)
13    (-2)    South Korea     Night Elf  ReMinD        1963    (-20)
14    (-1)    China           Night Elf  Life          1951    (-23)
15    (-)     Ukraine         Night Elf  Sonik         1951    (+0)
16    (+1)    South Korea     Night Elf  Check         1947    (+0)
17    (+2)    China           Night Elf  Colorful      1946    (+9)
18    (-2)    South Korea     Human      ReprisaL      1934    (-13)
19    (-5)    Ukraine         Night Elf  Foggy         1925    (-32)
20    (+2)    United Kingdom  Night Elf  WarchiefRich  1922    (+0)
21    (+2)    China           Orc        ZDR           1921    (+0)
22    (+3)    China           Orc        XiaoKK        1920    (+19)
23    (-2)    South Korea     Night Elf  Bany          1914    (-13)
24    (-4)    China           Human      Romantic      1905    (-28)
25    (+1)    China           Human      Banbo         1893    (+0)
26    (+1)    Serbia          Night Elf  Rudan         1888    (-2)
27    (-3)    South Korea     Undead     Lucifer       1887    (-25)
28    (-)     South Korea     Human      Sok           1885    (+6)
29    (-)     China           Night Elf  Sini          1869    (+0)
30    (-)     Belarus         Orc        OrcWorker     1861    (-3)
31    (-)     Russia          Human      Hawk          1845    (-2)
32    (+4)    South Korea     Undead     Believe       1844    (+44)
33    (-)     China           Undead     tbc_bm        1837    (+11)
34    (New)   South Korea     Night Elf  alice         1825    (-74)
35    (+2)    France          Human      Anima         1797    (+0)
36    (-1)    South Korea     Orc        So.In         1796    (-21)
37    (+1)    China           Undead     Fantafiction  1716    (+0)
38    (New)   South Korea     Human      chaemiko      1699    (+36)
39    (-)     Russia          Human      Imperius      1668    (-21)
40    (-)     Russia          Undead     Sheik         1660    (+43)