Now that we’ve spent some time outlining the virtues we seek in our tournament designs, we can begin to get geeky. In this post, I’ll introduce a simple way of testing various tournament designs by running simulations to produce an estimate of the tournament’s fairness.
Fairness is not, of course, a simple concept, and some aspects of it are more measurable than others. I don’t see any reasonable way to measure Fairness(A), continuity with past practice. Fairness(B), equal chances, might lend itself to something like a Gini coefficient, where you’d measure the difference between perfect equality of result and the actual results of a simulation. But I think that Fairness(B) is not something we need a separate measure of. We really don’t particularly want to run tournaments where everyone, regardless of skill, has the same chance to win. Fairness(B) reminds us that we need to structure things so that anyone who enters could win the tournament if they play well, but it is going too far to say that we want poor players to win as often as good players.
Fairness(C), the extent to which the tournament rewards superior play, does seem ripe for some kind of quantification. Let me outline an approach:
We’ll test the idea on a small, but not trivially small, tournament: an elimination tournament with 16 entrants. Each of the 16 entrants gets a random “error factor” between zero and one. The error factors are sorted, and the lot of them are rezeroed on the lowest figure. Thus, the best simulated player in any iteration has an error factor of zero, and the other 15 have error factors that range from just above zero to just under one. The error factors are held constant throughout one iteration of the simulation (that is, one simulated tournament), and then redrawn for the next one.
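The draw described above can be sketched in a few lines of Python. This is only an illustration of the description, not the code behind the results in this post; the function name `draw_error_factors` and its parameters are made up for the example.

```python
import random

def draw_error_factors(n_players=16, rng=random):
    """Draw one uniform error factor per player, sort them, and rezero on the lowest.

    Hypothetical helper: index 0 is the best player (error factor 0.0),
    and the last index is the worst player.
    """
    raw = sorted(rng.random() for _ in range(n_players))
    best = raw[0]
    return [e - best for e in raw]

# One iteration's worth of players, seeded so the draw is repeatable.
factors = draw_error_factors(16, random.Random(1))
```

Because the lowest draw is subtracted from all of them, the largest error factor always ends up strictly below one.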
The results of an individual match are obtained by adding a luck factor, also randomly drawn between zero and one, to the error factor of each player. The winner is the player with the lower score. Notice that this means that skill and luck are equally important to the result.
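Here is a minimal sketch of a single match under this model. The names `play_match` and `win_probability` are hypothetical, and the tie rule (first player wins an exact tie) is an assumption, since the post does not say how ties are broken. The closed form in `win_probability` is not from the post either: it follows from the fact that the difference of two uniform draws has a triangular distribution, and it is included only as a check on the simulation.

```python
import random

def play_match(error_a, error_b, rng=random):
    """Return 0 if the first player wins, 1 if the second does."""
    score_a = error_a + rng.random()  # skill component plus uniform luck
    score_b = error_b + rng.random()
    # Lower score wins; awarding exact ties to the first player is an assumption.
    return 0 if score_a <= score_b else 1

def win_probability(gap):
    """Chance the better player wins when the error factors differ by `gap` (0 to 1)."""
    return 1.0 - (1.0 - gap) ** 2 / 2.0

# Compare the simulated win rate with the closed form for a gap of 0.3.
rng = random.Random(7)
trials = 20_000
better_wins = sum(play_match(0.0, 0.3, rng) == 0 for _ in range(trials))
print(better_wins / trials, win_probability(0.3))
```

With no skill gap the better player wins half the time, and with a gap of a full point the luck factor can never make up the difference.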
Now, it might well be objected that this is an oversimplified model of actual play. It might be better, for example, to generate random error rates and luck factors according to a normal distribution. And we’ll probably want to alter the relative importance of skill and luck to reflect what we know about how particular games work – more emphasis on skill for football, perhaps, and less for baseball. More for chess, less for backgammon. But let’s go with something really simple, for now – both random factors are equal, and uniformly distributed.
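For what it’s worth, here is one way those knobs might be exposed later. This is a sketch under assumptions, not part of the simulation reported below: `luck_weight` and `draw_luck` are hypothetical parameters, and the normal-distribution settings in the usage lines are arbitrary.

```python
import random

def play_match_weighted(error_a, error_b, luck_weight=1.0, draw_luck=random.random):
    """Return 0 if the first player wins, 1 if the second does.

    luck_weight scales the luck draw relative to skill (1.0 reproduces the
    equal-weight model; 0.0 makes the result depend on skill alone).
    draw_luck is any zero-argument function returning one luck value.
    """
    score_a = error_a + luck_weight * draw_luck()
    score_b = error_b + luck_weight * draw_luck()
    return 0 if score_a <= score_b else 1

# Equal weights and uniform luck: the simple model used in this post.
result_uniform = play_match_weighted(0.0, 0.4)

# A more skill-heavy game with normally distributed luck (arbitrary settings).
result_normal = play_match_weighted(
    0.0, 0.4, luck_weight=0.5, draw_luck=lambda: random.gauss(0.5, 0.15)
)
```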
In this first test, I’m interested only in who wins the tournament. If the tournament were perfectly fair (remember, we’re talking fairness(C)), the best player would always win. To the extent that anyone else wins, the chance factors in the simulation have submerged the skill factors, and the result is a bit less fair.
Here’s my proposed coefficient of fairness:
1 / (average skill of winner + 0.01)
Now, since we zeroed on the skill of the best player, the average skill of the winner is the average excess error rate of the actual winner. The 0.01 is there to prevent the coefficient from going to infinity – instead, if the best player does always win, the fairness coefficient will be 100.
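The coefficient itself is a one-liner. A minimal sketch, assuming we have collected the (rezeroed) error factor of the winner of each simulated tournament in a list; the name `fairness_coefficient` is made up for the example.

```python
def fairness_coefficient(winner_errors, offset=0.01):
    """1 / (average error factor of the tournament winners + offset)."""
    mean_error = sum(winner_errors) / len(winner_errors)
    return 1.0 / (mean_error + offset)

# If the best player (error factor 0) wins every time, the coefficient is about 100.
print(fairness_coefficient([0.0, 0.0, 0.0]))

# If winners average an excess error of 0.09, the coefficient is about 10.
print(fairness_coefficient([0.08, 0.09, 0.10]))
```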
OK, I’ve run 10 million simulated single-elimination tournaments, and another 10 million double-elimination tournaments. (For now, I’m running the standard lower bracket in the double elimination. There are a number of other ways to draw the lower bracket, and we’ll test them later.) Both simulations are for unseeded, blind-draw tournaments.
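For readers who want to try this at home, here is a self-contained sketch of the single-elimination case only. It follows the model as described in this post but is not the code that produced the figures below; the function name, the seed, the tie rule, and the much smaller default number of tournaments are all assumptions. It also assumes the field size is a power of two, so there are no byes. The double-elimination version is left out because it depends on exactly how the lower bracket is drawn.

```python
import random

def simulate_single_elimination(n_tournaments=10_000, n_players=16, seed=0):
    """Return (fairness coefficient, list of tournament wins by skill rank)."""
    rng = random.Random(seed)
    wins_by_rank = [0] * n_players   # index 0 is the best player
    total_winner_error = 0.0
    for _ in range(n_tournaments):
        # Fresh error factors for each tournament, rezeroed on the best player.
        raw = sorted(rng.random() for _ in range(n_players))
        errors = [e - raw[0] for e in raw]
        # Unseeded blind draw: shuffle the skill ranks into bracket positions.
        field = list(range(n_players))
        rng.shuffle(field)
        while len(field) > 1:
            next_round = []
            for a, b in zip(field[0::2], field[1::2]):
                score_a = errors[a] + rng.random()   # error factor plus luck
                score_b = errors[b] + rng.random()
                next_round.append(a if score_a <= score_b else b)  # lower score wins
            field = next_round
        champion = field[0]
        wins_by_rank[champion] += 1
        total_winner_error += errors[champion]
    mean_error = total_winner_error / n_tournaments
    return 1.0 / (mean_error + 0.01), wins_by_rank

coefficient, wins = simulate_single_elimination()
print(round(coefficient, 2))
print([round(100 * w / sum(wins), 2) for w in wins])
```

With only ten thousand tournaments the output will wobble from seed to seed, so don’t expect it to match the 10-million-run figures to two decimal places.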
The results (the envelope, please):
Single elimination: 8.73
Double elimination: 10.86
Well, the direction seems right – we would expect the double-elimination tournament to be fairer than the single-elimination tournament. But beyond that, the numbers are pretty meaningless because we have nothing else to compare them to. Is 2.13 a big gain in fairness, or a small one?
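One thing we can do right away is run the formula backwards, which turns each coefficient into the average excess error factor of the winner. The sketch below is just that arithmetic; the function name `implied_mean_error` is made up for the example.

```python
def implied_mean_error(coefficient, offset=0.01):
    """Invert 1 / (mean error + offset) to recover the winners' mean error factor."""
    return 1.0 / coefficient - offset

single = implied_mean_error(8.73)    # single elimination
double = implied_mean_error(10.86)   # double elimination
print(round(single, 3), round(double, 3))
```

On the zero-to-one skill scale, that works out to an average winner who is roughly 0.105 worse than the best player in single elimination, and roughly 0.082 worse in double elimination.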
In subsequent posts, I’ll try to provide other reference points that will begin to give a clearer understanding of what’s going on. We’ll look at the effects of seeding. Of the number and placement of byes. Of different ways to draw the lower bracket (there are more than you probably realize). And anything else that attracts our curiosity.
In the meantime, however, let’s give a sense of what this means in practical terms by showing how the hypothetical players fared according to their relative skill levels. The number on the left is the skill rank: 1 is the best player, 2 the second best, and so forth down to 16 for the worst player. The two columns give the percentage of tournaments won by the player of that rank in each of the two tournament formats:
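| Skill rank | Single elimination (% wins) | Double elimination (% wins) |
|---|---|---|
| 1 | 30.15 | 34.69 |
| 2 | 21.82 | 23.61 |
| 3 | 15.61 | 15.73 |
| 4 | 10.98 | 10.20 |
| 5 | 7.58 | 6.47 |
| 6 | 5.14 | 3.99 |
| 7 | 3.39 | 2.38 |
| 8 | 2.19 | 1.38 |
| 9 | 1.36 | 0.77 |
| 10 | 0.82 | 0.40 |
| 11 | 0.47 | 0.20 |
| 12 | 0.26 | 0.095 |
| 13 | 0.13 | 0.041 |
| 14 | 0.064 | 0.018 |
| 15 | 0.027 | 0.0026 |
| 16 | 0.011 | 0.0015 |
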
Using a double elimination format has improved the chances of the top three players, and harmed the chances of everyone else.
There is more that can be learned from this simulation, but that’s for another post.