Here’s another first draft of a section from the potential monograph, Tourneygeek’s Guide to Tournaments: Fairness (C).
As before I’m linking to a PDF rather than putting the text here because I use some formatting that won’t work well as a blog post.
I think that fairness (C) is one of the more important concepts, and hope you will find this new explanation a good deal clearer than the one that was initially floated early in tourneygeek’s run, and added to in fits and starts since.
5 thoughts on “TGT: Fairness (C)”
1. Would a Contest with an Elite Threshold equal to or greater than 0 (or a contest with an “anti-Elite” threshold less than or equal to 0) follow some sort of chi-squared distribution or a derivative thereof (e.g., F)?
2. I’m thinking that an “Ugly Bottom” could be mitigated by testing a tournament against a pay structure based on a Harmonic Sequence: All places are paid, but each individual place pays more than the place below it; places of equivalent level of elimination (e.g. the Quarterfinal losers of a single-elim being 5th-8th) are averaged together.
A Harmonic Pay Structure should be used as a diagnostic tool only, and is not intended to be utilized as an actual tournament pay structure. Its sole purpose is to gauge how well a bracket does in maintaining its (theoretical) skill level throughout.
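The harmonic diagnostic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the comment does not say how the sequence is scaled, so the sketch assumes place k is weighted 1/k and the weights are normalized to a 100-point pool. The names `harmonic_payouts`, `average_tiers`, and `SE8_TIERS` are made up for the example.

```python
from fractions import Fraction


def harmonic_payouts(n):
    """Percent of the pool for places 1..n, proportional to 1/place.

    Assumption: weights 1, 1/2, 1/3, ... normalized to sum to 100.
    """
    weights = [Fraction(1, place) for place in range(1, n + 1)]
    total = sum(weights)
    return [float(100 * w / total) for w in weights]


def average_tiers(payouts, tiers):
    """Give every place in a tier the mean payout of that tier.

    tiers is a list of (first_place, last_place) pairs, 1-indexed and
    inclusive, e.g. (5, 8) for the quarterfinal losers.
    """
    out = list(payouts)
    for first, last in tiers:
        block = payouts[first - 1:last]
        mean = sum(block) / len(block)
        out[first - 1:last] = [mean] * len(block)
    return out


# 8-player single elimination: 1st, 2nd, 3rd-4th, 5th-8th.
SE8_TIERS = [(1, 1), (2, 2), (3, 4), (5, 8)]
payouts = average_tiers(harmonic_payouts(8), SE8_TIERS)

for place, pay in enumerate(payouts, start=1):
    print(f"place {place}: {pay:.2f}")
```

Averaging within a tier leaves the pool total unchanged, so the diagnostic payouts still sum to 100.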
Alternately, you could base the diagnostic “payout” for each position on the proportion of the expected place’s “skill quotient” to that of the Field. For any given entrant i, total entrants n, skill level Z(i), such that Z(1)>=Z(2)>=Z(3)>=…>=Z(n):
Each Prize Point P(i) = [Z(i)-Z(n)]/[sigma(j=1->n):(Z(j)-Z(n))]*100.
Again, places of equivalent level of elimination are averaged together.
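As a rough sketch of the formula above, the following Python computes P(i) from a list of skill levels sorted from best to worst and then averages the tied places. The skill values and the 4-player single-elimination tiers are hypothetical, chosen only to show the arithmetic; `skill_payouts` and `average_tiers` are illustrative names.

```python
def skill_payouts(z):
    """P(i) = 100 * (Z(i) - Z(n)) / sum_j (Z(j) - Z(n)).

    z must be sorted so that Z(1) >= Z(2) >= ... >= Z(n).
    """
    if any(a < b for a, b in zip(z, z[1:])):
        raise ValueError("skill levels must be in descending order")
    floor = z[-1]
    total = sum(v - floor for v in z)
    if total == 0:
        raise ValueError("all skill levels are equal; P(i) is undefined")
    return [100 * (v - floor) / total for v in z]


def average_tiers(payouts, tiers):
    """Give every place in a tier the mean payout of that tier."""
    out = list(payouts)
    for first, last in tiers:  # 1-indexed, inclusive
        block = payouts[first - 1:last]
        out[first - 1:last] = [sum(block) / len(block)] * len(block)
    return out


# Hypothetical expected skill levels for places 1..4.
z = [1.5, 0.5, 0.0, -0.5]
raw = skill_payouts(z)

# 4-player single elimination: 1st, 2nd, 3rd-4th.
tiered = average_tiers(raw, [(1, 1), (2, 2), (3, 4)])
print([round(p, 2) for p in tiered])
```

One consequence of subtracting Z(n) is that the last place always receives 0 before averaging, so only the tier averaging gives the bottom place any points.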
This averaging of the payouts of equivalent places seems to justify the practice of Tiered Seeding utilized by ATP: If there is no difference in the result between 33rd and 64th in a 64-team tourney, why should there be a difference in where they’re placed (and, theoretically, who they lose to in the 1st round)?
I brought back my quick mocked-up simulator and I’m getting slightly different fairness (C) values than you in a couple of cases. I’ll check it over, but in a 4-player bracket there’s not much potential for error outside of a problem with the simulation itself.
I ran 2 million trials of 4/SE and received fairness(C) values of 17.72 (winner take all) and 19.51 (65/35). The margin of error on both of those is about 0.03. For ease of reference, your figures were 17.52 and 19.45, both outside that range and significantly in the former’s case.
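For readers who want to reproduce this kind of check, here is a minimal sketch of how a margin of error for a simulated average is commonly estimated. It assumes a 95% normal-approximation interval (z = 1.96), which the comment does not state, and it uses synthetic stand-in data rather than real fairness (C) trials. The function name `mean_with_margin` is made up for the example.

```python
import math
import random
import statistics


def mean_with_margin(samples, z=1.96):
    """Return (mean, margin), where margin = z * standard error of the mean.

    Assumption: z = 1.96 gives a 95% interval under a normal approximation.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return mean, z * std_err


# Synthetic per-trial values, standing in for real simulation output.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

mean, margin = mean_with_margin(samples)
print(f"mean = {mean:.4f} +/- {margin:.4f}")
```

Measured against the quoted 0.03 margin, the two gaps (17.72 vs. 17.52 and 19.51 vs. 19.45) are 0.20 and 0.06, or roughly 6.7 and 2 margins wide.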
I suspect that this discrepancy is caused by the programs themselves, either through slight rounding errors adding up to something significant or a faulty pseudorandom number generator. I’m using Python and relying on other people’s libraries, so it’s probably on my end, but it’s something I’ll keep in mind. Hopefully this doesn’t cause me too many sleepless nights. I’ll leave the number-crunching to the professionals.
Good catch. Thanks for keeping me honest.
I think I’ve found the problem, and it’s on my end.
My simulator has built in a feature that sets a floor and a ceiling for skill levels, which is useful for running “elite” simulations. For some reason (remember, I didn’t write this), it has defaults of 0 and +4 for the floor and ceiling, respectively, and I’ve gotten into the habit of just changing the floor to -4 when I wanted to sample the whole range.
Apparently there is a little bit of an effect from the parts of the tail I was clipping. Rerunning the 4SE WTA luck = 1, with the floor and ceiling at -10 and +10, I get 17.70 for 500,000 trials.
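To get a feel for how much of the distribution a floor and ceiling cut off, the sketch below computes the probability mass outside a given range. It assumes skill levels are drawn from a standard normal distribution, which this thread does not state outright, and it says nothing about how the simulator treats out-of-range draws or how much they move fairness (C). The function name is illustrative.

```python
import math


def normal_mass_outside(floor, ceiling):
    """Probability that a standard normal draw falls outside [floor, ceiling].

    Assumption: skill ~ N(0, 1). Uses erfc so tiny tails keep precision.
    """
    below = 0.5 * math.erfc(-floor / math.sqrt(2))   # P(Z < floor)
    above = 0.5 * math.erfc(ceiling / math.sqrt(2))  # P(Z > ceiling)
    return below + above


for lo, hi in [(0, 4), (-4, 4), (-10, 10)]:
    print(f"floor={lo:+d}, ceiling={hi:+d}: "
          f"mass outside = {normal_mass_outside(lo, hi):.3e}")
```

Under that assumption, the default floor of 0 discards half the distribution, while a range of -4 to +4 leaves out only a few draws in every hundred thousand.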
The differences are so small that I probably won’t bother to re-run everything I’ve done, but I will use the wider range going forward.
Strangely enough, when I was checking my code, I also found a slight error, and it was indeed with the pseudo-random number generator, as I had guessed. Random numbers greater than the mean were being generated very slightly more often than numbers less than the mean. (In mathematical terms, they averaged about 0.500003 instead of 0.5.) I doubt it’s significant, but it’s always worth looking over code if something feels wrong.
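Whether an average of 0.500003 points to a real bias depends on how many numbers were drawn, which the comment does not say. The sketch below assumes the generator is meant to produce uniform values on [0, 1), whose standard deviation is sqrt(1/12), and asks how many draws it would take for that gap to stand out from ordinary sampling noise. The function names and the sample size in the example are hypothetical.

```python
import math

UNIFORM_SD = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)


def uniform_mean_z(observed_mean, n):
    """Z-score of an observed mean of n uniform draws against 0.5."""
    std_err = UNIFORM_SD / math.sqrt(n)
    return (observed_mean - 0.5) / std_err


def draws_needed(observed_mean, z=1.96):
    """Number of draws at which this gap from 0.5 reaches z standard errors."""
    gap = abs(observed_mean - 0.5)
    return math.ceil((z * UNIFORM_SD / gap) ** 2)


# Hypothetical sample size; the actual number of draws is not given.
print(f"z at 10 million draws: {uniform_mean_z(0.500003, 10**7):.3f}")
print(f"draws needed for z = 1.96: {draws_needed(0.500003):.3e}")
```

If the average came from fewer than tens of billions of draws, a gap this small is well within what an unbiased generator would produce by chance.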