# Repairing the Magic Mirror

In Magic Mirror on the Wall … I attempted to make use of a new formula for the fairness (C) statistic to determine what bracket configuration for an eight-team single-elimination tourney was the fairest of them all.

Some problems with the new measure soon surfaced, and after a little fruitless tinkering it became apparent that the new measure was not fit for purpose. Revisiting the question of the fairest 8SE, we find that the defects in the bad measure weren’t just theoretical – they led to an incorrect result.

The trouble first showed up as ridiculous figures for individual trials in which one or both of the components of the calculation, the expected aggregate skill value and the actual aggregate skill value, were negative. But the death knell sounded most clearly, I think, when it turned out that the new fairness (C) measure was sensitive to the absolute value of the skill levels of the players. A tourney with skill levels of {0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} was judged less fair than a tourney whose relative skill levels were identical but higher in absolute terms by a fixed amount, {2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9}.

This led us astray because it caused us to prefer an elite tourney, where all of the skill levels are drawn from the upper half of the Gaussian distribution. This raised the mean skill levels, in absolute terms, from {-1.42, -0.85, -0.47, -0.15, 0.15, 0.47, 0.85, 1.42} to {0.14, 0.28, 0.44, 0.60, 0.80, 1.03, 1.31, 1.80}. By all rights, this should have sent fairness (C) down because the mean skill levels are closer together, which increases the influence of luck. But the new measure more than offset this effect by inflating fairness (C) because the absolute values of the mean skills were higher.
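The two sets of mean skill levels can be approximated with a small Monte Carlo run. The sketch below is only an illustration, not the simulator used for these posts: it assumes an ordinary field is eight independent standard normal draws and that an "elite" field can be modelled by taking the absolute value of each draw (one reading of "the upper half of the Gaussian distribution"). The names `mean_sorted_skills` and `average_gap` are made up for this example.

```python
import random


def mean_sorted_skills(draw, n_players=8, trials=50_000, seed=1):
    """Average skill at each rank (weakest to strongest) over many random fields.

    `draw` is a function taking a random.Random and returning one skill value.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_players
    for _ in range(trials):
        field = sorted(draw(rng) for _ in range(n_players))
        for rank, skill in enumerate(field):
            totals[rank] += skill
    return [total / trials for total in totals]


def average_gap(means):
    """Mean distance between adjacent ranks; smaller means a tighter field."""
    return (means[-1] - means[0]) / (len(means) - 1)


# Ordinary field: skills drawn from the whole standard normal distribution.
full_field = mean_sorted_skills(lambda rng: rng.gauss(0.0, 1.0))

# Elite field (assumption): only the upper half, modelled as |N(0, 1)|.
elite_field = mean_sorted_skills(lambda rng: abs(rng.gauss(0.0, 1.0)))

print([round(m, 2) for m in full_field])
print([round(m, 2) for m in elite_field])
print(round(average_gap(full_field), 2), round(average_gap(elite_field), 2))
```

Under these assumptions the two printed lists should land close to the figures quoted above, and the average gap between adjacent ranks should come out noticeably smaller for the elite field, which is the "closer together" effect that ought to give luck more influence.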

Using the good old measure, the standard unseeded conventional bracket scores 26.91. Add seeding and you get down to 14.30. And moving to a cascade bracket gets this down to 7.46. The true winner of the fairest of them all competition should have been the seeded cascade bracket, with luck = 1 and a 65/35 payout.

The management of the pageant regrets this scoring error, but must take action to correct it. Miss Cascade65 will reign as Miss SE8, and Miss CascadeWTA, as first runner-up, will act as Miss SE8 if Miss Cascade65 is unable for any reason to fulfill her obligations. Miss EliteCascade65, who is deposed through no fault of her own, has been retrospectively awarded the title of Miss Congeniality.

## 5 thoughts on “Repairing the Magic Mirror”

1. Winner-take-all scores better in fairness (C) than 65/35 when not seeded, but worse when seeded. This makes sense… I think.

In the conventional bracket, consider that this is single-elimination, giving no chance to escape from a bad draw. If the top two teams end up on the same side of the bracket, one of them cannot be paid, and the second-strongest team being screwed out of money hurts fairness (C). When the bracket is seeded, they can’t meet before the final, so this problem does not exist.
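How often does a blind draw put the top two teams on the same side? A minimal sketch, assuming a conventional eight-slot bracket in which slots 0 to 3 form one half and 4 to 7 the other, and a draw that places teams uniformly at random. The function name `same_half_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def same_half_probability(n_slots=8):
    """Exact chance that the two strongest teams share a half of the bracket."""
    half = n_slots // 2
    # Every ordered pair of distinct slots for (best team, second-best team).
    placements = list(permutations(range(n_slots), 2))
    same_half = sum(1 for a, b in placements if (a < half) == (b < half))
    return Fraction(same_half, len(placements))


p_same_half = same_half_probability()
print(p_same_half, float(p_same_half))
```

For eight slots this comes out to 3/7, so under a blind draw the second-strongest team is locked out of the final a little over 40% of the time before a game is played. Seeding the top two into opposite halves drives that chance to zero.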

In the cascade65 unseeded, a random team instantly gets 35% of the payout (or 65% if they win, which will happen about one-fourth of the time) without doing anything, so of course it’s going to perform poorly. Conversely, when it’s seeded, that guaranteed payout goes to the top team.

I guess I’ll use this as an excuse to write a quick mock-up of the simulator and test a couple things myself. It’s probably not as robust as your version, though it should serve about the same purpose. (Your methodology isn’t fully explained on this site, so I made a few assumptions. All the numbers in this article match, so I’m assuming I’m fairly close.) I ran only 800k trials on each of these, so the “real” numbers may still be a couple hundredths of a point off, but it’s close enough.

If third place is paid, fairness (C) becomes even stronger for the same reason. Assuming 50/30/20 payouts and luck = 1, with a third place match if one is necessary:
Cascade, seeded: f(C) = 5.64 (Not much of a surprise. It’s strong for the same reason 65/35 is strong, as it guarantees payment to a top seed.)
Conventional, seeded: f(C) = 11.96 (More chances at the money = a lower chance for a top-2 team to end up not getting paid.)
Conventional, unseeded: f(C) = 23.22 (Actually loses to winner-take-all by a bit, but still does far better than 65/35.)

Finally, the conventional bracket with 50/25/25 payouts scores 21.88 (unseeded) and 11.72 (seeded). Assuming we’re giving the winner at least half the pot, this is the “fairest one of all.” Of course, you already showed a month ago that 50/25/25 is best when a consolation exists, so this is no surprise.

2. I’m impressed.

I tried to recreate one of your runs, and got a slightly different result. Running an 8 cascade bracket, conventionally seeded, at luck = 1, with a 50/30/20 payout, I got a fairness (C) of 7.19.

That’s not to say that your simulator is wrong – the most exciting thing for me about your having written a simulator is that you may be doing it differently for good reason. If we take the trouble to figure out the discrepancy, I may learn something that will improve mine. So perhaps it’s just as well that my methodology isn’t crystal clear. But I’ll be glad to work with you to figure out what’s going on if you care to.

I’ll try recreating some of your other runs when I get the leisure – probably tomorrow.

1. Hmm… We received different results at 50/30/20 after receiving the same result at 65/35? This suggests that there’s something fundamentally different in our payout models.

Mine was an attempt to copy yours, which makes this somewhat strange.

If I were to hypothetically run the same setup with a 0/100 or 0/0/100 payout (paying only the runner-up or third place, respectively), fairness (C) would actually be a small negative, roughly -0.48 for second place and -0.44 for third. This is because of the chance of the first or second seeds receiving more money than they are “entitled” to, breaking the model.

If I weight each placing’s fairness by its share of the payout, plugging in the 11.74 I received for the winner…

(11.74 * 0.50) + (-0.48 * 0.30) + (-0.44 * 0.20) = 5.87 + -0.14 + -0.09 = 5.64

This isn’t the way I’m actually calculating fairness within my model, but it’s mathematically identical. If there is an error somewhere, it has to be before this.

Incidentally, for the 65/35, the math works out to the same result as your model gave you.

(11.74 * 0.65) + (-0.48 * 0.35) = 7.63 + -0.17 = 7.46

If the math is correct and the pattern holds, then for the 50/30/20 payout to reach your 7.19, the third place finisher by themselves would have to have a positive fairness (C) of roughly 7.3, which is incredibly unlikely.
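The arithmetic above can be checked with a few lines of Python. This is only a sketch of the weighted-sum shortcut described in this comment, treating each payout as a fraction of the pot; it is not how either simulator computes fairness (C). The helper names `blended_fairness` and `implied_component` are invented for the example.

```python
def blended_fairness(component_scores, payout_shares):
    """Weight each single-place fairness (C) score by its share of the pot."""
    if len(component_scores) != len(payout_shares):
        raise ValueError("need one payout share per component score")
    return sum(score * share for score, share in zip(component_scores, payout_shares))


def implied_component(target, known_scores, known_shares, unknown_share):
    """Score the remaining place would need for the blend to hit `target`."""
    return (target - blended_fairness(known_scores, known_shares)) / unknown_share


# Single-place figures quoted in this comment (seeded cascade, luck = 1).
WINNER_ONLY = 11.74
RUNNER_UP_ONLY = -0.48
THIRD_ONLY = -0.44

fc_50_30_20 = blended_fairness(
    [WINNER_ONLY, RUNNER_UP_ONLY, THIRD_ONLY], [0.50, 0.30, 0.20]
)
fc_65_35 = blended_fairness([WINNER_ONLY, RUNNER_UP_ONLY], [0.65, 0.35])

# What third place alone would have to score for 50/30/20 to come out at 7.19.
implied_third = implied_component(
    7.19, [WINNER_ONLY, RUNNER_UP_ONLY], [0.50, 0.30], 0.20
)

print(round(fc_50_30_20, 2), round(fc_65_35, 2), round(implied_third, 2))
```

The two blends reproduce the 5.64 and 7.46 figures, and the back-solved third-place score comes out near 7.3, far from the -0.44 measured directly.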

So that gives a few possibilities as to the similarities and differences in our models:

1. There’s something specifically wrong with paying out third place in one of our models. (Unlikely as, from what I can tell, our results also differ on 16 brackets, even when third place isn’t paid.)
2. One of them has a critical mistake and the results that are the same are because of luck. (Again unlikely.)
3. We’re calculating things differently. (This is probably what’s happening.)

I’d suggest comparing our code, but mine is currently a barely-readable mess since I wrote and debugged it over an afternoon.

1. I just re-ran the three simulations of the 50/30/20 payout, and they all check out–I must have been doing something wrong last night.

For conventional, seeded, f(C) = 11.99;
For conventional, blind draw, f(C) = 23.25;
For the cascade, seeded, f(C) = 5.63.

My bad. I was running lots of differently-seeded cascade brackets for today’s post, and must have been not quite right when I reverted the seeds to test the new payout.

I don’t understand the way you’re calculating f(C), but the results look right (or, rather, they’re the results I get).

I’m really impressed, now, that you could knock out a version of the simulator – even a quick and dirty one – in an afternoon!
