Back2Warcraft's Elo Rating

From Liquipedia Warcraft Wiki


Back2Warcraft's Elo Rating is a rating list of professional Warcraft 3 players curated by Back2Warcraft. The ratings are calculated solely from map scores in high-level tournaments. They do not take final tournament placement into account, nor any other factors such as prize money or ladder standings.

The ratings are calculated monthly and released at the beginning of each month. They are based on more than 2500 high-level games played since December 2015.

Rules

The ratings use the following standards:

  • The first games entered into the system are from WCA 2015, where all players were given an initial rating of 2000
  • The rating period is one month
  • Ratings prior to November 2016 are considered part of the transient response and are not trustworthy
  • All games must come from tournaments where at least half the players already have a rating (preferably) or a well-established provisional rating
  • The minimum number of games to get a rating is 12; before that, players have a provisional rating only
  • The minimum number of games to be included in the rankings is 15
  • There is no rating decay; the rating will not decrease if a player is inactive
  • There is a decay on the number of games counted for each player, so inactive players are dropped from the rankings after a period without games
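
The two game-count thresholds in the rules above can be sketched as a small classifier. This is only an illustration: the function name and the status labels are hypothetical, and the game count passed in is assumed to be the already-decayed count, since the decay formula itself is not given here.

```python
# Thresholds taken from the rules above.
MIN_GAMES_FOR_RATING = 12   # below this, a player has a provisional rating only
MIN_GAMES_FOR_RANKING = 15  # below this, a player is not listed in the rankings


def rating_status(games: int) -> str:
    """Classify a player by (decayed) game count. Labels are hypothetical."""
    if games < 0:
        raise ValueError("games must be non-negative")
    if games < MIN_GAMES_FOR_RATING:
        return "provisional"
    if games < MIN_GAMES_FOR_RANKING:
        return "rated"      # has a rating but is not yet listed
    return "ranked"


for n in (5, 12, 15):
    print(n, rating_status(n))
```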

Calculations

The calculations are based on Arpad Elo's rating system, first implemented in chess in 1960. Back2Warcraft’s Elo Rating follows the basic principles of this system but is implemented using the slightly improved, more modern version currently in use by the USCF. A straightforward explanation of the mathematics can be found in a paper by Mark E. Glickman, a rating system expert, in Chapter 3 and beyond.

The main differences between this variant and the more traditional Elo system are the variable K-factor and the anti-inflation system described by the special rating formula; see the sections below.

The Elo rating is calculated in three basic steps: predict, compare and adjust.

Predict

The first thing the system does when considering a new result is to predict what the outcome should have been. So before looking at the actual result, the system calculates the likelihood of each player winning the game based on their rating. This likelihood is called the expected score of each player, where the score is simply defined by win = 1, loss = 0. Since only one player can win, the sum of the expected scores is 1.

So in a game between two players rated P1=2000 and P2=1900, the expected scores are 0.64 for P1 and 0.36 for P2. Or in other words, the higher rated player is expected to win 64% of the time if the ratings are true. Since P2 is expected to score 0.36 points each time these players compete, if there’s a situation where they play each other five times the total expected score is 0.36 x 5 = 1.8. Similarly, P1 has an expected score of 0.64 x 5 = 3.2.
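
The article does not spell out the prediction formula. As a minimal sketch, assuming the standard Elo logistic curve with a 400-point scale (an assumption, though it reproduces the 0.64 / 0.36 split in the example above), the expected score could be computed like this. The function name is illustrative only.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score of a player under the standard Elo logistic curve.

    Assumption: base 10 and a 400-point scale, as in the classic Elo formula.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


p1 = expected_score(2000, 1900)
p2 = expected_score(1900, 2000)

print(round(p1, 2), round(p2, 2))          # 0.64 0.36
print(round(5 * p1, 1), round(5 * p2, 1))  # 3.2 1.8 (expected scores over five games)
```

Because the curve depends only on the rating difference, the two expected scores always sum to 1.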

Compare

The predicted score is then compared to the actual score. For each player, the difference between the two is called the delta. This value measures how the player is performing compared to what could be expected given their current rating. A positive delta means the player is overperforming their rating; a negative delta means underperforming.

Continuing with the example above, given a 1-0 victory for P1, P1's delta is (Actual score – Expected score) = (1 – 0.64) = +0.36. For P2 the delta is (0 – 0.36) = -0.36. P1 did slightly better than predicted, P2 slightly worse. If the weaker player wins instead, with a 1-0 victory for P2, the deltas become -0.64 for P1 and +0.64 for P2. So, not surprisingly, when an upset occurs, the deltas are larger in magnitude.

Now, given a five-game series between P1 and P2 that P1 wins 3-2, the delta of P1 would be (3 – 3.2) = -0.2. The delta of P2 would be (2 – 1.8) = +0.2. The relatively small deltas indicate that the players' ratings are accurate and that little adjustment is necessary. A 3-2 score is right in line with what we expect when a player rated 2000 plays a player rated 1900.

Taking the sum of all the deltas of a player over the rating period then gives one number indicating whether the rating should increase, decrease or stay as is.
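
The compare step can be sketched as follows. The expected score again assumes the standard Elo logistic curve with a 400-point scale, which the article does not state explicitly. The function names and the shape of the `results` list are hypothetical, and combining the two example results into one rating period is purely for illustration.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score (assumed formula, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def delta(rating: float, opponent: float, wins: int, games: int) -> float:
    """Actual score minus expected score over `games` maps against one opponent."""
    return wins - games * expected_score(rating, opponent)


def period_delta(rating: float, results: list[tuple[float, int, int]]) -> float:
    """Sum of deltas over a rating period; each result is (opponent_rating, wins, games)."""
    return sum(delta(rating, opp, wins, games) for opp, wins, games in results)


print(round(delta(2000, 1900, 1, 1), 2))  # 0.36  (single 1-0 win for P1)
print(round(delta(2000, 1900, 3, 5), 2))  # -0.2  (3-2 series win for P1)
print(round(period_delta(2000, [(1900, 1, 1), (1900, 3, 5)]), 2))  # 0.16
```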

Adjust

The final step is then to create a new rating for the player, more accurate to how the player is actually performing. The new rating is given by the old rating plus the adjustment based on the delta described above, or mathematically: Rating = OldRating + K*delta, where K > 0.

If the player has overperformed, the delta is a positive number and the new rating will be higher than the old. If the player has underperformed, the delta will be negative and the rating will decrease. The conversion from a delta to an actual adjustment is done by multiplying the delta by the factor K, called the K-factor.
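
The adjustment formula can be written out directly. In this sketch the K value of 20 is an arbitrary placeholder chosen for illustration; the list itself uses a per-player K-factor whose exact value is not given here.

```python
def adjust(old_rating: float, k: float, delta: float) -> float:
    """Rating = OldRating + K * delta, with K > 0."""
    if k <= 0:
        raise ValueError("K must be positive")
    return old_rating + k * delta


# Hypothetical K = 20, applied to the 3-2 series example (deltas of -0.2 and +0.2).
print(adjust(2000, 20, -0.2))  # 1996.0
print(adjust(1900, 20, +0.2))  # 1904.0
```

With the same K for both players the adjustments mirror each other; with per-player K-factors they generally do not.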

The K-Factor

One of the key characteristics of a rating system is how much weight to put on any one result. In an Elo system this weight is called the K-factor. A high K-factor puts a lot of weight on new results, while a low value does the opposite.

With a low K-factor, the system rewards strong performances sustained over time, and players climb or fall on the list only slowly. Such a list can feel too slow and lag behind reality.

With a high K-factor, the system is very quick to adjust. This means that the most recent tournament results are treated as the most important. This can be problematic, as the list becomes too volatile, with players rising and falling many places every rating period and the rankings merely a copy of the last tournament placements.

The solution then is to have the K-factor vary. For each player, every rating period the system calculates a unique K-factor based on the level of the player, the number of games played previously and the number of games in the current rating period. Thus, for established pros with many registered games the ratings will put relatively little stock in just a single result, but for a newcomer with almost no registered games the system will quickly adjust the rating based on fresh results.
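
The exact K-factor formula used for this list is not given here. As a hedged sketch, the published US Chess formula has the form K = 800 / (N' + m), where N' is the player's effective number of prior games (capped depending on rating level) and m is the number of games in the current rating period. Whether Back2Warcraft uses the same constant and cap is an assumption, and the effective game count is taken as an input rather than derived.

```python
def k_factor(effective_prior_games: float, games_this_period: int) -> float:
    """K = 800 / (N' + m), the form used in the published US Chess formula.

    Assumption: the constant 800 and this exact form are not confirmed for
    Back2Warcraft's list; they only illustrate a K that shrinks with experience.
    """
    if games_this_period < 1:
        raise ValueError("a player needs at least one game in the period")
    if effective_prior_games < 0:
        raise ValueError("effective_prior_games must be non-negative")
    return 800.0 / (effective_prior_games + games_this_period)


print(round(k_factor(50, 10), 1))  # 13.3  established player: small K
print(round(k_factor(3, 5), 1))    # 100.0 newcomer: large K
```

The effect matches the description above: the more games already on record, the less a single new result moves the rating.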

Rating inflation

Prediction

Match outcome prediction

Rating difference  Bo1  Bo3  Bo5  Bo7
+400               82%  92%  96%  98%
+300               76%  86%  90%  94%
+200               68%  76%  81%  85%
+150               64%  71%  75%  78%
+100               60%  64%  67%  70%
+50                55%  57%  59%  60%
+25                52%  54%  55%  55%
0                  50%  50%  50%  50%
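
How the Bo3, Bo5 and Bo7 columns were produced is not stated. One plausible reading, sketched below, is that every map is treated as an independent game with the same Bo1 win probability, which makes the series probability a sum of negative binomial terms. Under that assumption the results land close to the table's values. The function name is illustrative only.

```python
from math import comb


def series_win_probability(p: float, best_of: int) -> float:
    """Probability of winning a best-of-N series given per-map win probability p.

    Assumption: maps are independent and p is constant across the series.
    """
    if best_of < 1 or best_of % 2 == 0:
        raise ValueError("best_of must be a positive odd number")
    need = best_of // 2 + 1  # maps required to win the series
    q = 1.0 - p
    # The series ends on the winner's final map win, after k losses (k < need).
    return sum(comb(need - 1 + k, k) * p**need * q**k for k in range(need))


# Per-map probability from the +400 row of the table (82%).
print([round(100 * series_win_probability(0.82, n)) for n in (1, 3, 5, 7)])
# [82, 91, 96, 98]
```

Longer series favour the stronger player, which is why each row of the table increases from left to right.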

Current rating

The current ratings were released on 1 June 2017.

Rank  Change  Country      Race       Player     Rating  Rating change
1     (+1)    China        Undead     120        2196    (+43)
2     (+1)    Russia       Undead     Happy      2156    (+3)
3     (-2)    South Korea  Orc        Lyn        2139    (-47)
4     (-)     China        Human      TH000      2135    (+28)
5     (-)     South Korea  Night Elf  Moon       2091    (+37)
6     (-)     South Korea  Orc        FoCuS      2066    (+13)
7     (+1)    China        Orc        Fly100%    2048    (+20)
8     (-1)    China        Undead     WFZ        2022    (-16)
9     (-)     China        Human      Infi       2008    (-7)
10    (+1)    South Korea  Night Elf  ReMinD     1993    (+6)
11    (+3)    Ukraine      Night Elf  Foggy      1992    (+23)
12    (-)     South Korea  Night Elf  LawLiet    1974    (-3)
13    (+2)    China        Human      Yumiko     1972    (+4)
14    (-1)    South Korea  Night Elf  Check      1959    (-12)
15    (-5)    China        Night Elf  Colorful   1955    (-43)
16    (-)     South Korea  Human      ReprisaL   1954    (+0)
17    (+2)    China        Orc        XiaoKK     1944    (+0)
18    (-)     China        Night Elf  Life       1941    (-7)
19    (+2)    China        Night Elf  Zhou_Xixi  1908    (-16)
20    (-)     South Korea  Night Elf  Yange      1899    (-40)
21    (-4)    Ukraine      Night Elf  Sonik      1890    (-61)
22    (+2)    China        Human      Romantic   1886    (+18)
23    (-1)    China        Night Elf  alice      1883    (-6)
24    (+8)    China        Night Elf  Sini       1853    (+41)
25    (-2)    South Korea  Human      Sok        1845    (-44)
26    (+4)    Russia       Human      Hawk       1837    (+19)
27    (-2)    South Korea  Undead     Lucifer    1835    (-16)
28    (+1)    Serbia       Night Elf  Rudan      1831    (+5)
29    (-1)    South Korea  Undead     Believe    1829    (+0)
30    (-4)    South Korea  Night Elf  Bany       1818    (-24)
31    (-)     China        Undead     tbc_bm     1814    (+0)
32    (-5)    Belarus      Orc        OrcWorker  1807    (-32)
33    (-)     South Korea  Human      chaemiko   1805    (+10)
34    (+1)    South Korea  Orc        So.In      1802    (+15)
35    (-1)    France       Human      Anima      1775    (-17)
36    (New)   n/a          n/a        blade      1753    (+94)
37    (New)   n/a          n/a        nowayoc    1744    (+2)
38    (-2)    Russia       Undead     Sheik      1673    (-31)
39    (-2)    Russia       Human      Imperius   1627    (-3)