Elo rating

From Liquipedia StarCraft Brood War Wiki

The Elo rating system is the method the TLPD uses to calculate the relative skill levels of StarCraft players in 1v1 match-ups. It is named after its creator, Arpad Elo, who devised it as a chess rating system. Other objective player ranking systems include the KeSPA ranking and the ladder points systems on the Fish server and ICCup.

Simple Explanation

At its simplest, every player begins at the same starting point. A win gives the player points, while a loss makes the player lose points. Winning against an opponent with a higher Elo earns the player more points, while losing to an opponent with a lower Elo causes the player to lose more points. Conversely, winning against an opponent with a lower Elo earns the player fewer points, and losing to an opponent with a higher Elo costs the player fewer points.

Detailed Explanation

The difference in the Elo ratings between two players serves as a predictor of the outcome of a game. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player's Elo rating is represented by a number which increases or decreases based upon the outcome of games between rated players. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. In a series of games between a high-rated player and a low-rated player, the high-rated player is expected to score more wins. If the high-rated player wins, then only a few rating points will be taken from the low-rated player. However, if the low-rated player scores an upset win, many rating points will be transferred. The low-rated player will also gain a few points from the high-rated player in the event of a draw. As a result, the rating system is self-correcting. A player whose rating is too low should, in the long run, do better than the rating system predicts, and thus gain rating points until the rating reflects their true playing strength.

The Elo rating is calculated by the below equation:
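Assuming the TLPD follows the standard Elo update (an assumption, though one consistent with the definitions of S and K below and with the 64% and 76% win expectancies quoted later), the calculation takes the following form. Here R is the player's current rating, R_opp the opponent's rating, R' the new rating, and E the expected score; these symbol names are chosen for illustration only.

```latex
E = \frac{1}{1 + 10^{(R_{\mathrm{opp}} - R)/400}}, \qquad R' = R + K\,(S - E)
```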

Where S = 1 for a victory and 0 for a defeat. The K-factor or K-coefficient, K, equals 40 for players who have played a limited number of games (fewer than 20) and 20 for all others. The gradation of the K-factor reduces ratings changes at the top end of the rating spectrum, reducing the possibility of rapid ratings inflation or deflation for those with a low K-factor. At the same time, newer players benefit from the ability to climb to a rating more representative of their skill. All players start at 2000 rating points.
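The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the TLPD's actual code: the expected-score formula is the standard Elo one (assumed here), and the function names `k_factor`, `expected_score`, and `updated_rating` are hypothetical.

```python
START_RATING = 2000  # every player starts here


def k_factor(games_played: int) -> int:
    """K is 40 for players with fewer than 20 rated games, otherwise 20."""
    return 40 if games_played < 20 else 20


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score on a 400-point scale (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def updated_rating(rating: float, opponent_rating: float,
                   won: bool, games_played: int) -> float:
    """Apply one game result: S is 1 for a win and 0 for a loss."""
    s = 1.0 if won else 0.0
    k = k_factor(games_played)
    return rating + k * (s - expected_score(rating, opponent_rating))


# Two players at the starting rating beat an equally rated opponent.
newcomer = updated_rating(START_RATING, START_RATING, won=True, games_played=0)
veteran = updated_rating(START_RATING, START_RATING, won=True, games_played=50)
print(newcomer, veteran)  # 2020.0 2010.0
```

For the same result, the newcomer (K = 40) moves twice as far as the veteran (K = 20), which is how the larger K-factor lets new players reach a representative rating sooner.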

The Elo rating is sometimes used to gauge a player's skill level, but is more mathematically attuned to determining the odds or statistical probability of victory. For example, two players who share the same Elo rating at a given moment have an equivalent, or 50/50, chance of winning the game. A player whose rating is 100 points greater than his/her opponent's rating is expected to win 64% of the time. If the difference is 200 points, then the player with the higher rating should have a 76% probability of winning.
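The percentages quoted above can be checked with a short calculation. The sketch below assumes the standard Elo expected-score curve on a 400-point scale; the function name `win_probability` is illustrative.

```python
def win_probability(rating_difference: float) -> float:
    """Expected score of the higher-rated player under the standard
    Elo curve (assumed), given their rating lead in points."""
    return 1.0 / (1.0 + 10 ** (-rating_difference / 400.0))


for diff in (0, 100, 200):
    print(f"{diff:>3} points higher: {win_probability(diff):.0%}")
# Prints 50%, 64%, and 76% for leads of 0, 100, and 200 points.
```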

On the TLPD, two types of Elo ratings are featured[1]:

  • The Elo columns show the current Elo rating of the corresponding player, cumulatively and against each of the three races. This provides a snapshot of current performance.
  • The Elo Peak columns show the highest Elo rating the player ever reached. This is, in effect, an easy gauge of a player's dominance at a given time. By comparing peak performance between players, the reader gains a sense of how dominant one player is compared to another at a fixed time.

Some in the community have also used a calculation called Average Elo to measure a player's Elo peak in performance against all three races cumulatively, rather than the Elo peak against only one race.[2]

Comparison to KeSPA rating

Although both points systems are determined by past performance, the KeSPA ranking is heavily weighted by the type of achievement, while Elo points are determined by how a player performs against previous opponents, specifically by how strong each opponent is as gauged by the difference in Elo ratings between the two players. The strength of the opponent is, in turn, determined by that opponent's performance against their own previous opponents and the relative Elo rating differences in those games.

KeSPA points are proportional to the level of achievement, such that a championship or finals game weighs much more heavily than a qualifying match. In contrast, Elo calculations count each game the same, regardless of the context.[3]

Additionally, KeSPA rankings take into account when the player performed and earned points, weighing the most recent games the most heavily. Games played in the previous three months receive full weighting, while earlier achievements decay over time for up to 12 months. As such, while the KeSPA system only calculates accumulated points over a finite period, the Elo system is cumulative over a lifetime of past performances.

It is noted that both systems are entirely results-based and offer no concrete reflection of the players themselves and how they'll perform on any given day.[4]

Elo Ratings in Reflection

Elo ratings have long been criticized for increasing over time, a tendency known as inflation. Analysis has shown that the average player's Elo rating grew by about 28 points from 2001 to 2009.[5] Some have explained this inflation as partly due to older players continuing to compete as they decline, so that the points they lose are gained by newer players. As such, newer players in each "generation" have the advantage of a points premium. It has also been suggested that an overall increase in ratings reflects greater skill, as players' abilities and game play have improved over the eras by learning from and building upon past players' experiences and strategies.

One observed phenomenon is that Elo ratings require varying amounts of time to reach their peak or maximum.[6]

  • In 2000-2001, during the early years when there were very few pro-gamers and StarCraft was still growing in players and competition, BoxeR hypothetically had an Elo of 2100 when he played the 2001 OSL. His opponents had likely not played many games that counted towards their Elo rating, so their ratings were closer to 2000. When BoxeR won the Starleague, he would earn comparatively few points because his Elo was significantly higher than that of his opponents. Hypothetically, he would have a new Elo of 2120.
  • By 2008, several years later, the stronger players had had time to reach their maximum or representative rating. Bisu would also have an Elo of 2100. When he won the 2008 MSL, Bisu played against experienced players who had endured qualifiers and were well-traveled Starleague players, earning Elo points along the way, so they likely also had Elo ratings in the 2100 range. With these victories, Bisu would hypothetically gain 100 Elo points to reach a peak of 2200. Starting at the same Elo rating (2100), and playing and winning as convincingly as BoxeR did in 2001, Bisu would thus gain more points. After a career of the same length as BoxeR's, Bisu would have a 2200 peak Elo versus BoxeR's 2120, even though both were equally "dominant" in winning their respective Starleagues. This is because Bisu played against 2100 Elo opponents, whereas BoxeR played against 2000 Elo opponents.

As such, in this example, the earlier player needed much more time to reach his peak Elo because stronger, higher-rated competition was not yet available.
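The effect in this example can be illustrated by computing the points earned for a single win in each scenario. This is a sketch under stated assumptions: the standard Elo expected-score formula, K = 20 for an established player, and hypothetical function names. It shows only the first win; the hypothetical totals of 20 and 100 points in the example are not reproduced, since ratings shift after every game.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score on a 400-point scale (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def points_for_win(rating: float, opponent_rating: float, k: int = 20) -> float:
    """Points gained from one win, with S = 1 and the given K-factor."""
    return k * (1.0 - expected_score(rating, opponent_rating))


early_era = points_for_win(2100, 2000)  # 2100 player beats a 2000 opponent
later_era = points_for_win(2100, 2100)  # 2100 player beats a 2100 opponent
print(f"{early_era:.1f} {later_era:.1f}")  # 7.2 10.0
```

Each win over a 2000-rated opponent is worth roughly 7 points to the 2100-rated player, compared with 10 points for a win over an equally rated opponent, so the same run of victories lifts the later-era player's rating faster.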