As promised in yesterday’s post, I ran simulations to see if the 32 bracket would behave the way the 16 bracket did, showing a fairness advantage for the unshifted format where the tournament is seeded. It did.
A number of these bracket simulations have been run now, and I thought it would be good to gather them together in one place so that they can be compared, and tentative inferences drawn.
Here is a table showing the number of rounds, the coefficient of fairness, and the mean number of bad repeat pairings for every combination of these factors: bracket size (16 or 32); single or double elimination; shifted or unshifted (and, for 32s, which shift was used); and blind draw or seeded:
| format | rounds | fairness | duplicates |
|---|---|---|---|
| 16 SE BD | 4 | 8.73 | 0 |
| 16 SE seeded | 4 | 10.99 | 0 |
| 32 SE BD | 5 | 9.01 | 0 |
| 32 SE seeded | 5 | 11.13 | 0 |
| 16 DE unshifted, BD | 8 | 10.86 | 0.42 |
| 16 DE shifted, BD | 7 | 10.97 | 0.40 |
| 16 DE unshifted, seeded | 7 | 12.53 | 0.30 |
| 16 DE shifted, seeded | 7 | 12.28 | 0.39 |
| 32 DE unshifted, BD | 10 | 10.82 | 0.48 |
| 32 DE CD shift, BD | 9 | 10.85 | 0.46 |
| 32 DE ED shift, BD | 9 | 11.03 | 0.53 |
| 32 DE unshifted, seeded | 10 | 12.60 | 0.46 |
| 32 DE CD shift, seeded | 9 | 12.47 | 0.50 |
| 32 DE ED shift, seeded | 9 | 12.54 | 0.63 |

SE = single elimination; DE = double elimination; BD = blind draw. The duplicates count is the average number of times, per tournament, a repeated pairing happens before the last two rounds.
As might be expected, a larger bracket, seeding, and double rather than single elimination are all associated with greater fairness. Shifted blind draw tournaments are fairer than unshifted ones, but where the tournament was seeded the unshifted brackets outperformed the shifted ones. The ED shift outperformed the CD shift on fairness, but not on duplicate reduction.
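For readers who want to check the shifted-versus-unshifted comparison themselves, here is a small Python sketch that takes the double elimination fairness figures from the table and subtracts the unshifted value from each shifted one. The dictionary layout and the name `shift_effect` are my own choices for illustration, not part of the simulation code.

```python
# Fairness figures copied from the double elimination rows of the table,
# keyed by (bracket size, format, draw). The key layout is an assumption
# made for this example only.
fairness = {
    (16, "unshifted", "BD"): 10.86,
    (16, "shifted", "BD"): 10.97,
    (16, "unshifted", "seeded"): 12.53,
    (16, "shifted", "seeded"): 12.28,
    (32, "unshifted", "BD"): 10.82,
    (32, "CD shift", "BD"): 10.85,
    (32, "ED shift", "BD"): 11.03,
    (32, "unshifted", "seeded"): 12.60,
    (32, "CD shift", "seeded"): 12.47,
    (32, "ED shift", "seeded"): 12.54,
}


def shift_effect(size, draw):
    """Fairness of each shifted format minus the unshifted one (same size and draw)."""
    base = fairness[(size, "unshifted", draw)]
    return {
        fmt: round(value - base, 2)
        for (s, fmt, d), value in fairness.items()
        if s == size and d == draw and fmt != "unshifted"
    }


if __name__ == "__main__":
    for size in (16, 32):
        for draw in ("BD", "seeded"):
            print(size, draw, shift_effect(size, draw))
```

A positive difference means the shift helped fairness and a negative one means it hurt. The blind draw differences come out positive and the seeded ones negative, which is the pattern described above.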
The frequency of duplicates varies a good deal from one context to another. They don’t happen at all, of course, in single elimination tournaments. All of the results here are for tournaments with optimal drops, which means that duplicates are considerably rarer than they are for most of the tournaments run in the real world. I haven’t included any brackets with bad drops or any gratuitous imbalance in structure.
This table just scratches the surface of what there is to be discovered. Here are some of the limitations, and suggestions of where I might go in the future. Some of these are reminders, others new ideas:
 All of the simulations assumed a full 16- or 32-team bracket. Later I’ll explore the effect of byes, both as to their number and their placement. My guess is that how one handles byes is at least as important as any other factor.
 The competition model is rather naive. I’ve arbitrarily chosen to weigh luck and skill equally in the result of individual matches, which won’t fit most competitions very well – there should be more chance, for example, in baseball or backgammon, and less in chess or American football.
 Some time soon I plan to rework the model using Gaussian rather than uniform random factors. While I’m at it, I’ll parameterize the skill/luck balance to let me more closely approximate the results of different kinds of real events.
 None of the double elimination brackets required the survivor of the lower bracket to defeat the upper bracket champion twice to win the tournament. I expect that the possibility of this extra round would have a small, but not trivial, positive effect on measured fairness for all of the double elimination formats.
 The fairness statistic is strictly fairness (C), measuring only the tendency to reward superior skill. And in these calculations, the only reward considered was outright tournament wins. There’s obviously much more to do to make sure that tournaments are fair even in the eyes of players who have no realistic hope of winning outright.
 The bracket shifts included in the models don’t exhaust the possibilities. You may think they look exotic, but if so, you ain’t seen nothing yet.
 And, of course, we haven’t even begun to investigate other kinds of tournaments – round robins, Swiss systems, and various hybrids. And we haven’t begun to consider competitions where players compete against a standard, or against more than one opponent at a time, as often happens in such sports as athletics, golf, and skiing.
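To make the competition-model items in the list above more concrete, here is a minimal Python sketch of a single match in which each player's showing is a blend of fixed skill and fresh luck. It is not the code behind the table. The 0 to 1 skill scale, the `luck_weight` parameter, the Gaussian mean and spread, and all function names are assumptions chosen for illustration.

```python
import random


def performance(skill, luck_weight, rng, gaussian=False):
    """One player's showing in one match: a blend of skill and luck."""
    # Assumption: skill is on a 0-1 scale and luck is redrawn every match.
    # The Gaussian parameters (mean 0.5, sd 0.25) are arbitrary placeholders.
    luck = rng.gauss(0.5, 0.25) if gaussian else rng.random()
    return (1 - luck_weight) * skill + luck_weight * luck


def play_match(skill_a, skill_b, luck_weight=0.5, rng=random, gaussian=False):
    """Return True if player A beats player B."""
    a = performance(skill_a, luck_weight, rng, gaussian)
    b = performance(skill_b, luck_weight, rng, gaussian)
    return a > b


def win_rate(skill_a, skill_b, luck_weight, trials=20000, seed=1, gaussian=False):
    """Estimate how often A beats B over many simulated matches."""
    rng = random.Random(seed)
    wins = sum(
        play_match(skill_a, skill_b, luck_weight, rng, gaussian)
        for _ in range(trials)
    )
    return wins / trials


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, win_rate(0.7, 0.3, w))
```

With `luck_weight=0.5` (skill and luck weighted equally) and uniform luck, a player of skill 0.7 should beat one of skill 0.3 roughly 82% of the time. Pushing `luck_weight` toward 1 moves the match toward a coin flip, and pushing it toward 0 lets the stronger player win every time, which is the kind of dial that would separate a backgammon-like event from a chess-like one.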