One of my favorite web sites, FiveThirtyEight.com, runs a weekly feature by Oliver Roeder called the Riddler, which each week poses a couple of interesting (and usually difficult) questions in math, logic, and probability.
This week’s “Riddler Classic” question concerns tournament design. Given a competition in which the better player wins two-thirds of the time, and that you only care about maximizing the probability that that best player wins, how should your construct a blind-draw tournament with these rather severe constraints: four entrants and four total games; and five entrants and five total games.
(Note that this success criterion is the same as tourneygeek’s earliest measure of fairness (C). So we can relate this challenge to the frequent discussions of fairness (C) elsewhere on this site.)
Now, I’m unable to imagine any competition in which the better player wins two-thirds of the time regardless of the size of the skill differential between the two players. The best player ought to beat the worst player rather more often than they beat the next best. I can see why the Riddler doesn’t want to put its readers to the trouble of using a more realistic match model. But since I have one ready to use, I’m going to seek a solution using my simulator, in which skill levels are handled rather better. In deference to the question as posed, however, I’ll use a skill parameter of 2.4, which is about what’s required to give the better of any two randomly-chosen players a two-thirds chance to prevail.
My solution is below the fold – before looking at it, you might want to give it a shot yourself.
I was dead wrong! For the correct answer (or at least a better one), scroll down to the comment of Donald the Potholer. His “Page Ladder” brackets improve on the ones I found. For the 4/4 case he succeeds 49% of the time, and his 5/5 bracket succeeds 44.8%.
I’ll rewrite this post with new analysis after the official results have been announced.
Four Players
The Riddler notes that the solution for four players and three games is just the standard four player single-elimination bracket. This tourney succeeds when the best player wins both matches.
With the original assumption that the better player always wins two-thirds of the time, the success rate is (2/3)^2, or 44.4%. But with more realistic assumptions the success rate climbs a little. In the first round, the better player wins about 70.2% (rather than 66.7%) of the time because the best player enjoys a somewhat above average skill advantage over the others. This advantage diminishes in the second round, where a surviving best player has to play someone who’s already won a round, but the chances are still a little better: 68.4%. Thus the overall success rate is 48.0%.
Now, what can you do to improve this by playing another game? One extra game is certainly not enough to run any sort of double elimination. And it’s not even enough to turn any round into a best of three. So what can you do with one additional game?
Well, there’s one opponent the winner hasn’t faced. You could have them play. But this sounds like a bad idea. That fourth player is, on average, the worst of the four. If the winner prevails it might reassure you that you have found the best player, but if the best player loses it can only cloud the issue.
How about a rematch of the final? If you simply take the winner of the rematch as the tourney winner, you’ve made the match before it – the one that used to be the final – irrelevant. But how about this. Play a rematch of the final, and if the same player wins you’ll have more confidence that that’s really the best player. If the original winner loses, however, you now have two players with equal claims to the title – they’re one and one against each other, and one and zero against other players. So, in that case, you flip a coin to see who gets the trophy.
The sounds like it might work, but simulation shows that it doesn’t. The best team wins both of the last rounds 33.5% of the time, and splits the last two 28.7%. So, adding half of the splits, 14.35%, to the 33.5 outright wins we get 47.85% success, which is actually a little worse.
Now, if we had another game to play, things would be different. Then we could make the final best two out of three, which yields a success rate of 52.4%. So two additional games makes a difference, and more than half the time you wouldn’t even have to play the fifth game.
Let’s add one more game, a sixth. Now the obvious thing to do is to run a double elimination tourney (without a recharge round). But that doesn’t really help – the success rate is only 52.1%. There are a number of reasons to prefer a double elimination to a single elimination with a multi-game final, but maximizing the rudimentary version of fairness (C) is not among them.
Adding a seventh game would allow you to add a recharge round to the double elimination. Now you get 55.4% success. Another way to use six or seven games is to run a round robin, six games for the basic round, and one game in reserve to resolve some of the ties that round robins are notorious for.
You could also use those seven games to make the basic single elimination with a best-of-five final, which does pretty well at 55.0%. But it would seem odd to most people to lavish five games on the final when two players are eliminated by a single loss, so you’re likely to meet some strong fairness (A) objections.
Five Players
On to the second problem. What’s the best you can do with five players and five games?
As before, let’s start with the baseline provided by the straight single elimination tourney, which is an 8 bracket with three byes, requiring four games.
As might be expected, with five candidates the tourney is somewhat less likely to be won by the best team: 43.1%. Again, is there a way to make this better by adding a single game?
Maybe. Let’s try replaying the final again. But this time, we won’t flip a coin when they two last games are split. We know that the team on top of the bracket is, on average, slightly better than the team on the bottom. So instead of flipping a coin, we’ll give the trophy to that team. And it seem to work, at least a little, nudging success up to 44.0%.
So this time, the extra game can help, a little. And, as before, with another game, deployed to make the final a full best-of-three, there’s a more substantial boost, to 47.2%.
This success comes, however, at a pretty steep cost in terms of fairness (B). Fairness (B) is always an issue when some players get byes. Without the replayed final, the win expectations for the five entry lines are: {14.5%; 14.5%; 23.1%; 24.0%; 24.0%}, which comes to 25.3 for fairness (B). With the replayed final, the expectations are {20.2%; 20.2%; 32.8%; 13.4%; 13.4%}, which makes fairness (B) 39.7. You’d have to want that extra little bit of fairness (C) pretty badly to accept such a grossly unbalanced bracket.
For “extra credit”, Roeder invites us to generalize the question to tournaments with N players and M games. I’ve done a little of this, considering M’s of 5, 6, and 7 for four players, and 6 with five players. I think it’s pretty clear that there’s no simple formula in N and M the would predict optimal results. But one can make the general observation that success declines as N increases, and (usually) improves as M increases.
Perhaps I’ve gotten a little carried away by treating the Riddler’s challenge in the context of actual tournaments rather than ideal ones. I’m not sure that my answers are correct as to the original question is concerned. But perhaps this analysis does show that tournament design is a pretty darned complicated subject.
I think this is a pretty frustrating puzzle because my automatic (and strongly defensible) answer to the actual question will always be “play 2 games of a round-robin.” My greater concerns in tournament design are almost always logistics, fairness, incentive, and lunch menu.
I’ve done some analysis on the 4×4 as a classic Page playoff. The best I can tell is that an unseeded Page will still result in the best team winning 44.4% of the time.
LikeLike
Actually, there’s another option. Well, one that doesn’t involve adding a meaningless game: I call it a “Page Ladder”
Contestants 1 & 2 play the first game. The winner moves straight to the Finals, the Loser enters the ladder at the bottom, along with unlucky contestant #4. Loser is out, winner faces #3. The winner of that game goes to the Final with the winner of the 1st game. How I calculated that process’ rate, it ended up having a 45.68% chance of winning; an improvement of 1.24 percentage points over both Standard Page and Single Elim.
Fortunately, the 5-game 5 entrant problem is a bit more fair: The Page Ladder only yields a 39.2% chance. Adding a recharge for a Round 1 loser only has 38.5%, and for the Round 2 Loser a 39.5%. Having the loser of Round 1 Drop to a Single Elim Tourney with 3, 4, & 5 also yields a 39.5%.
The structure that I found to be the best is a Single Elim+King-of-the-Hill. You have 4 “regular” contenders facing off in a single-elim. The 2 that lose in Round 1 are out, the winners play each other. The entrant that wins 2 rounds goes straight to the Finals. The Loser of Round 2 goes up against the 5th player, the would-be “King of the Hill” for the other Finals slot. This yielded a 40.5% success rate.
Whether these are the answers to this riddle, we should see in about 36 hours. (I didn’t enter because I was only able to solve the 4-entrant question in time.) But once we do, I hope that you’ll have a good bit of analysis on this.
LikeLike
Bravo. I had a nagging feeling that I was missing something.
LikeLiked by 1 person
Metzgerism, you were on the right track with the Page playoff idea. DtP found the needed tweak. Without the earned 1 and 2 positions going to proven good players, you need to drop the loser to the bottom of the stack.
LikeLike
Unfortunately, I didn’t have the “optimal” solutions… from the perspective of trying to find the “Best” Entrant. As is his wont, Riddler posted the solutions in today’s column: https://fivethirtyeight.com/features/can-you-escape-this-enchanted-maze/ This is why I expected a “follow-up” post tomorrow or Sunday and not a “correction”.
There is still a bit of commentary to this, though: The cited 4-with-4 solution, improving on my solution by 1.2 percentage points, goes like this: 1: AvB, 2: W1vC, 3: W1vD, 4: W2vW3.
I bet you can see the problem from here: Excepting the case where W1 loses both Games 2 & 3, Game 4 is a Recharge Round. And, if A wins all of the first 3 games, then Game 4 would not be played. Which, while “optimal”, may not technically fit the definition of the challenge. My consideration was that all N games needed to be played; this solution violates that consideration.
I will leave the 5-with-5 solution to you. I will also direct you to the table that The Riddler cited for any combination of games (G<13) and Players (P<8) (Let's call this table the Friedman-Zieliński Table, after the authors): https://www2.stetson.edu/~efriedma/temp/tournaments/
Of particular note is that the 6-with-5 solution is NOT the standard 6-person single elim: They group 4 competitors on one side and put only 2 on the other side. A basic running of the numbers, however, suggests a commutability among byes for the purpose of "having the best competitor win" (without regard to the disposition of the other competitors): The numbers only set 4 contestants as needing to win 3 games, while the remainder only need to win 2, so theoretically, the standard 6-person Single Elim has the same chance as the grouped-bye variant. (I can see the sports media having a field day ridiculing these souls if they interpret their solution to mean that e.g. the 1 & 2 seeds in an NFL Conference should play each other in the Divisional Rounds rather than the winners of the Wild Card rounds.) On the other hand, there is something to be said for having the conditions for each entrant at a given stage match their respective opponent's condition as much as possible, though that condition still breaks down, albeit in the Final as opposed to the Semi-final in the standard version.
LikeLiked by 1 person
One thing I forgot to mention: The F-Z table only considers G and P. It does not consider the number of Rounds that can be played (R) nor the number of playable Sites (S), which are important practical limitations for tournament directors. USA Ultimate, for example, considers a Nine-Game-per-Team maximum over the course of a two-day tournament to be sacrosanct, codifying it as Theory #3 in its official Tournament Manual. This is a strict imposition on R, not considered by F-Z. As an example, the 4-with-4 solution requires 4 Rounds, whereas the 4-entrant Page Playoff would only require 3.
LikeLike
Agreed. But that’s just one of many practical considerations for real tournaments that aren’t part of the problem. But perhaps we should cut the Riddler some slack here – it’s a math problem, not a real tournament design problem.
LikeLike
Please be patient with me. There’s a lot to say about the matter, and just now I’m busy building new brackets for a tourney to be held soon, which gets priority. But rest assured, I’ll write a follow-up post, probably more than one, when I get the leisure.
I don’t think a solution should be disqualified because it may not use all the games. You could always add a meaningless game if you thought it necessary to pad the tourney to the proper number.
The NFL playoffs are a whole different matter because they’re seeded, and the answers to most of the problems would, I suspect, be different if you took seeding into account. I do take the point about trying to match the records of opponents, but what’s happening here is grouping byes – I suppose it’s the minimal example. And I’ve shown elsewhere in tedious detail that grouping byes impairs fairness, both (B) and (C). So the two may be equivalent with the weird favorite-always-wins-2/3 rule, but with more realistic assumptions the skill progression will be different for the two brackets, and that shows up in the results. It’s not a huge effect – the standard bracket is 40.3%, and the grouped bracket 39.9%. Perhaps the table chooses to do it that way because it’s easier to describe the structure in a small box, but from my point of view it’s just plain wrong.
LikeLike
That 6-in-5 solution actually makes a lot of sense, and my intention is to use something similar for 3*2n-sized brackets: pairwise inequity doesn’t occur until the final, after both teams have already played many games and there is no significant rest advantage. In the standard (NFL/CFL-style) 6-team bracket, that inequity occurs almost immediately, when it provides a strong rest advantage.
I would never do this as a playoff – the “standard” 6-team bracket provides strong incentives to teams with a high amount of elegance – but in a tournament setting I think it provides a pretty nice result.
LikeLike