Elo Win Probability Calculator

Step 1

Enter player ratings or pick two players from a list. Alternatively, enter an Elo difference or an expected score (and a draw probability for Chess). Dependent variables will automatically adjust.

Choose a rating source: [a selection list with a link to each source; it includes a source covering players rated 2400 or better, Go sources covering players rated 4d or better (notes 1-3), and two Tennis sources (note 4)]

Rating 1:

Rating 2:

Elo difference:

Elo formula:

Player 1 expected score: (for one game, or one set in Tennis; see note 5)

Draw probability: (see note 6)
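
For the curious, the logistic relationships behind these fields look roughly like this in JavaScript (a minimal sketch, not the page's actual code; the names are mine, except that invEloLogistic mirrors the helper available on this page's JavaScript console):

    // A minimal sketch of the Step 1 relationships, logistic case only.

    // Expected score of player 1 from the Elo difference d = rating1 - rating2.
    function eloLogistic(d) {
      return 1 / (1 + Math.pow(10, -d / 400));
    }

    // Inverse mapping: Elo difference from an expected score.
    function invEloLogistic(e) {
      return -400 * Math.log10(1 / e - 1);
    }

    // Note 5: expected score = win probability + half the draw probability,
    // so a known draw probability recovers the win probability:
    function winProbability(expectedScore, drawProbability) {
      return expectedScore - drawProbability / 2;
    }

    // Example: a 200-point favourite scores about 0.76 per game.
    console.log(eloLogistic(200));       // ≈ 0.7597
    console.log(invEloLogistic(0.7597)); // ≈ 200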

Step 2

Enter match details to obtain the probabilities of winning a match.

Competition format: (with an option to require winning by a margin of two, in which case the draw probability must be zero)

One-click settings:

Current score:

Result: a table giving the probability of each outcome (player 1 win, player 2 win, draw).
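
To give an idea of what Step 2 computes (again a sketch, not the page's code): with no draws, a "first to n wins" probability follows from the per-game probability and the current score by a short recursion, and a "win by a margin of two" rule adds a geometric term once the score is tied.

    // Probability that player 1 wins a "first to n wins" match, given a
    // per-game win probability p and the current score. Draws are ignored
    // in this sketch.
    function matchWinProbability(p, n, wins1 = 0, wins2 = 0) {
      if (wins1 >= n) return 1;
      if (wins2 >= n) return 0;
      return p * matchWinProbability(p, n, wins1 + 1, wins2)
           + (1 - p) * matchWinProbability(p, n, wins1, wins2 + 1);
    }

    // "Win by a margin of two": the probability of eventually winning once
    // the score is tied is p^2 / (p^2 + (1-p)^2).
    function winByTwoFromTie(p) {
      const q = 1 - p;
      return (p * p) / (p * p + q * q);
    }

    // Example: an 80% per-game favourite.
    console.log(matchWinProbability(0.8, 3));        // first to 3 wins: ≈ 0.942
    console.log(matchWinProbability(0.8, 3, 0, 2));  // but down 0-2: 0.8^3 = 0.512
    console.log(winByTwoFromTie(0.8));               // tied, win by two: ≈ 0.941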

Notes

  1. EGF ratings are not Elo ratings, but they can be converted to an Elo scale (logistic distribution).
  2. AGA ratings are not Elo ratings, but they can be converted to an Elo scale (normal distribution).
  3. For EGF and AGA you can type "rank 4d" as the name for a generic 4d, "rank 10k" for a generic 10k, etc.
  4. Tennis ratings from Tennis Abstract are Elo ratings (logistic distribution), but for a match. The rating difference is converted to an Elo difference for a set in Step 1 so that we get back probabilities for a match in Step 2 (a possible conversion is sketched after these notes).
  5. The expected score is the win probability plus half of the draw probability.
  6. For Chess, the draw probability is estimated from Rating 1 and Rating 2 using a homemade formula loosely fitting the plot on this page.
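
One way the match-to-set conversion of note 4 could work, as a rough sketch only: assume the match is a best of 3 sets, numerically invert that relationship to get a per-set win probability, and convert it back to an Elo difference. The page's actual conversion may differ (for example for best-of-5 matches).

    // A sketch of the match-to-set conversion described in note 4.
    // Assumes a best-of-3-sets match; the page's actual conversion may differ.
    const eloLogistic    = d => 1 / (1 + Math.pow(10, -d / 400));
    const invEloLogistic = p => -400 * Math.log10(1 / p - 1);

    // Probability of winning a best of 3 sets from a per-set probability s.
    const matchFromSet = s => s * s * (3 - 2 * s);

    // Invert matchFromSet numerically (it is increasing on [0, 1]).
    function setWinFromMatchWin(pMatch) {
      let lo = 0, hi = 1;
      for (let i = 0; i < 60; i++) {
        const mid = (lo + hi) / 2;
        if (matchFromSet(mid) < pMatch) lo = mid; else hi = mid;
      }
      return (lo + hi) / 2;
    }

    // Elo difference for one set from an Elo difference for a match.
    function setEloFromMatchElo(dMatch) {
      return invEloLogistic(setWinFromMatchWin(eloLogistic(dMatch)));
    }

    // A 200-Elo match favourite is a smaller per-set favourite.
    console.log(setEloFromMatchElo(200));  // ≈ 132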

About the Elo scale

The starting point of the Elo rating system is a curve mapping rating differences to expected scores (which are the same as win probabilities if there are no draws). Unfortunately, there is disagreement over which curve should be used. As an example, in a game with no draws, if player A has an 80% chance of beating B, and B has an 80% chance of beating C, then what is the probability of A beating C? If you think that the answer is approximately 94.1%, then you're in the "logistic distribution" camp. If you think that the answer is approximately 95.4%, then you're in the "normal distribution" camp.

[Graph: Elo curves]

Both are justified mathematically. In our example, the chances of winning are in a ratio of 4:1 between A and B and in a ratio of 4:1 between B and C, so it's reasonable to expect a ratio of 16:1 between A and C, giving a probability of A beating C of 16/17 ≈ 94.1%, the logistic distribution value. Another approach is to assume that if A has a (50+ε)% chance of beating B, and if B has a (50+ε)% chance of beating C, then A has a (50+2ε)% chance of beating C. Now, whatever game A, B and C are playing, best-of-n is a meta-game that the players can also play. If we find n such that (50+ε)% becomes 80% in a best-of-n, then (50+2ε)% becomes about 95.4% in that same best-of-n, the normal distribution value (for large n the best-of-n outcome is governed by the normal approximation to the binomial, so doubling the per-game edge doubles the z-score, which takes 80% to about 95.4%).
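
Both numbers are easy to check (a quick sketch; normalCdf uses a standard approximation of the error function):

    // Logistic camp: chances combine as odds. 80% is 4:1, so A vs C is 16:1.
    const logisticAC = 16 / 17;                      // ≈ 0.941

    // Normal camp: edges combine as z-scores. Phi^-1(0.8) ≈ 0.8416, so A vs C
    // corresponds to z ≈ 1.683.
    function normalCdf(z) {
      // Abramowitz & Stegun 7.1.26 approximation of erf (error ~1.5e-7).
      const x = Math.abs(z) / Math.SQRT2;
      const t = 1 / (1 + 0.3275911 * x);
      const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                  - 0.284496736) * t + 0.254829592) * t;
      const erf = 1 - poly * Math.exp(-x * x);
      return z < 0 ? 0.5 * (1 - erf) : 0.5 * (1 + erf);
    }
    const normalAC = normalCdf(2 * 0.8416);          // ≈ 0.954

    console.log(logisticAC, normalAC);               // 0.9411..., 0.9538...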

The normal distribution was apparently Arpad Elo's original suggestion, but it is quite harsh on the underdog and many people claim that the logistic distribution works better in practice, so what should we use? I don't know. This page automatically selects the distribution recommended by the rating source, but it lets you override it.

About Go ratings

Go strength is traditionally expressed with a dan/kyu rank in units of "stones". Since this is the preferred way to communicate Go strength, a rating system must map a rank difference in stones to a win probability (or equivalently to an Elo difference). Unfortunately, I believe that the mappings used by both the AGA and the EGF are very unrealistic, which is a real shame because other parts of their rating systems are very polished. The worst offender is the AGA, which postulates a constant ~270 Elo per stone, which is clearly too much for weak players. The EGF system is better because it attempts to make the value rank-dependent, but the curve used is not ambitious enough and is not a good fit to the EGF's own winning statistics. To fix this, I created the source "Go (pros + EGF + AGA)", which I recommend using for your Go probability needs instead of the individual EGF or AGA sources (the "pros" source is fine and should give the same results up to rounding errors). I also added AlphaGo at 3080 EGF for fun, although we don't really know its true strength.
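
To make "Elo per stone" concrete: a constant value converts a rank difference directly into a win probability. The sketch below uses the logistic curve purely for illustration (the AGA itself recommends a normal curve, see note 2, and the homemade curve plotted below makes the value rank-dependent):

    // A rank difference in stones times a constant "Elo per stone" gives an
    // Elo difference, which a curve (logistic here, for illustration only)
    // turns into a win probability.
    function stonesToWinProbability(stones, eloPerStone) {
      return 1 / (1 + Math.pow(10, -(stones * eloPerStone) / 400));
    }

    // One stone stronger, under two constants quoted on this page:
    console.log(stonesToWinProbability(1, 270));  // ≈ 0.83 (the AGA's ~270 per stone)
    console.log(stonesToWinProbability(1, 191));  // ≈ 0.75 (the keynote value in the table below)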

Below is a graph plotting the AGA's constant, the EGF's curve, and my proposed homemade curve against actual EGF game statistics from 2006-2015. The actual statistics behave strangely below about 12 kyu, as noted by Geoff Kaniuk when also attempting to fit a model. I think that this is happening because the minimum EGF rank is artificially set to 20 kyu, which distorts the winning statistics at the lower end, so we can safely ignore this range. Also, the goal shouldn't be to blindly fit the experimental data, because in practice we expect some error between a player's rating and their true rank, and we can reason that such noise in the ratings causes the experimental curves to be lower than the curve we would obtain if the players' ratings had time to converge to their true values.

[Graph: Go curves]

Below I've collected other estimates of the number of Elo points per stone that I could find or derive. I took some of these estimates into account when designing my curve. It's hard to say how reliable each of these is.

Elo per stone | Source
~50 | Pandanet IGS Ranking System. Points awarded imply values in the range 44-56.
226 for 2d+, 148 for 30k-5k | KGS Rating Math
~160 according to the figure axis, 230 according to the caption | AlphaGo Nature paper, Figure 4a, citing KGS
~187 for 6-10d KGS, 153 for 2-6d KGS | AlphaGo Nature paper, Extended Data Table 6. From the Elo differences of CrazyStone, Zen and Pachi with 4 handicap stones and komi. Assuming that White is getting the komi, this is the correct handicap for a 3-stone difference in strength, so the raw Elo differences were divided by 3.
191 | Opening Keynote: "The Story of Alpha Go". "One stone stronger is about 75%". This converts to invEloLogistic(0.75) ≈ 191 (function available on the JavaScript console; reproduced in the sketch after this table).
302 for pros | Komi statistics. "a change of 1 point in the value of komi would produce a change of 3.1% in the percentage of games won by black." Assuming that one stone is worth 14 points, some bold extrapolation gives invEloLogistic(0.5 + 0.031) * 14 ≈ 302 (function available on the JavaScript console).
infinite for ~13d | Hand of God. "most estimates place God three ranks above top professionals." The EGF scale sets 1p = 7d and 9p = 9.4d, so top pros might be close to 10d.
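
The two invEloLogistic entries above can be reproduced with a one-line sketch of that function (it maps a win probability to an Elo difference on the logistic scale):

    // Elo difference corresponding to a win probability, logistic scale.
    const invEloLogistic = p => -400 * Math.log10(1 / p - 1);

    console.log(invEloLogistic(0.75));              // ≈ 190.8, i.e. about 191
    console.log(invEloLogistic(0.5 + 0.031) * 14);  // ≈ 302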

Sadly, the AlphaGo video claims that some constant value works across all levels. My guess is that the AlphaGo team only deals with professional-level strength, so they never encountered any problem with using a constant. The IGS value may seem out of place, but the median player strength on IGS is about 4 kyu, and 50 Elo per stone is a good match for that level according to my curve.


Page created: September 29, 2016
Page last updated: February 25, 2017

back to François Labelle's homepage