In Bad Byes, I discussed the problems with grouping byes together in a bracket for a single-elimination tournament. Today, with the help of the newly developed fairness (B) metric, I’ll extend that analysis with an examination of a double-elimination format.
As before, I’ll analyze a tournament with 24 entries. There are some alternatives I’ll look at some day, but the usual way to handle this situation is to use a 32 bracket with eight byes. The question is whether it’s better to concentrate those eight byes in one half of the bracket, or to spread them evenly through the bracket, guided perhaps by the seeding lines.
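To make the two layouts concrete, here is a small Python sketch that places eight byes in a 32 bracket either grouped at the top or spread along the standard seeding lines. The seeding-order routine (seed s meets seed 33 - s in round one), the function names, and the choice to put grouped byes on alternate lines of the top half are my assumptions for illustration, not a description of how the simulator or the brackets discussed here were built.

```python
def seed_order(size):
    """Standard line order for a power-of-two bracket (seed s meets seed size + 1 - s)."""
    order = [1, 2]
    while len(order) < size:
        n = 2 * len(order)
        # Each existing seed s is followed by its first-round opponent n + 1 - s.
        order = [x for s in order for x in (s, n + 1 - s)]
    return order


def bye_lines(entries, grouped):
    """Return (bracket size, set of 0-based lines holding a bye).

    Hypothetical helper; the grouped layout assumes byes <= size // 2.
    """
    size = 1
    while size < entries:
        size *= 2
    byes = size - entries
    if grouped:
        # Assumed grouped layout: every odd line in the top pairs is a bye,
        # so the first `byes` first-round matches are player-vs-bye.
        return size, {2 * i + 1 for i in range(byes)}
    # Spread layout: the phantom seeds entries+1..size are the byes,
    # so each of the top `byes` seeds is paired with a bye.
    order = seed_order(size)
    return size, {line for line, seed in enumerate(order) if seed > entries}


for grouped in (True, False):
    size, lines = bye_lines(24, grouped)
    top_half = sum(1 for line in lines if line < size // 2)
    print("grouped" if grouped else "spread", size, len(lines), top_half)
# grouped 32 8 8
# spread 32 8 4
```

In the spread layout each group of four lines holds exactly one bye, so every bye player meets a first-round winner in the second round; in the grouped layout the top half’s second round is bye player against bye player, which is what lets it start at once.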
As with so many other tournament design issues, the decision pits one goal against another. If the byes are grouped, the players drawing the bye can begin play immediately – their second-round matches are against other players who drew byes, and so are ready to go.
How much time this saves is an open question. Starting a round earlier may help move things along if the slow matches that would otherwise hold up the tournament happen to be in the half of the bracket that starts early. It’s not as helpful as a bracket shift, which actually reduces the number of rounds that need to be played, and it may do no good at all if those slow matches are in the part of the bracket that doesn’t start early.
Against this possible efficiency gain, one has to weigh the fairness loss resulting from the severely unbalanced bracket. In a single-elimination format, grouping the byes caused the fairness (C) statistic to drop from 2.167 to 2.128. And the fairness (C) metric is not particularly good at reflecting fairness (B) problems in the early rounds of a tournament. Applying the new fairness (B) measure shows the problem even more clearly: for the single elimination, fairness (B) drops from 2.924 all the way to 0.971.
But perhaps things aren’t so bad for the grouped byes in a double-elimination format. The lower bracket offers players another path to victory, and perhaps that other path is less compromised by grouping the byes, so that the loss of fairness is mitigated.
Now, when testing an idea with my simulator, I have generally drawn up what I think is the best possible bracket that embodies the idea. I want optimum drops to reduce the number of repeat pairings, for example, and as much bracket balance as the idea allows. But frankly I don’t know how to draw a lower bracket to go with an upper bracket where the byes are grouped. I know that it’s going to be harder to avoid repeat pairings, but I don’t know just how much I want to tinker with the overall balance to reduce the repeats. I can’t really take fairness as my guide, because I’ve already decided, in the interest of efficiency, to accept a degree of imbalance that I’d otherwise avoid.
So I’ve decided to run this test not on one of my own brackets, but on one created by a friend. My friend is an experienced tournament director, and a very creative drawer of brackets. I don’t always (or even usually) agree with him about every detail, but I almost always learn something when I look at one of his designs. He is, for example, the person most responsible for introducing me to the shifted bracket, and I’ve become a big fan of shifted brackets.
Here are analyzed versions of the brackets I’m testing: 24groupcdupper and 24groupcdlower. For comparison, I’ll use a design of my own that spreads the byes in the way I recommend: 24uppercd and 24lowercd.
There are a few things to note about my friend’s bracket. It’s basically a CD shift, so I’ve chosen a CD shift to compare it against. In general, I prefer the ED shift, but CD makes sense in this case because my friend’s bracket was designed not as a full double elimination, but as a championship with consolation, in which the loser of E1 doesn’t drop – that team is simply awarded second place in the championship. Since E1 is not dropping, the ED shift is unavailable, and choosing the CD shift makes perfect sense.
Another anomaly is in the details of the drops. I initially approached the problem of modeling this format as a matter of running a 32 bracket with eight byes. But the way the drops are done in this bracket can’t be accommodated by a 32 bracket – I would have had to use a 64 bracket with 40 byes, and my simulator couldn’t do that without some significant changes. For that reason, I made a separate version of my simulator specifically for running this format. In my main simulator, I handle byes by putting in entrants with skill levels so low that they’ll never win a match. But in the bespoke version, I created only the lines I needed. As a result, the tally of mean wins by skill rank does not compare directly between the two brackets: it counts byes as wins for the comparison bracket, but not for the test bracket.
The results? There are a number of interesting things to note on the analyzed brackets, but I’ll summarize the highlights:
- The grouped byes bracket was a bit lower on fairness (C): 2.880 as opposed to 2.910. I’m still getting used to the revised measure, but I have the sense that this is a fairly substantial difference;
- The grouped byes were awful on fairness (B): 1.963 as opposed to 4.928. On fairness (B), the grouped byes were even worse than a single elimination run on the spread byes, which came in at 2.924. When you contrive to make a double elimination less fair than a single elimination, you must be doing something wrong;
- The grouped-bye bracket did lead to more repeat pairings. Eight of the grouped-bracket matches could be rematches, while that was true of only five on the comparison bracket. Discounting the last two matches (which is only fair, because they wouldn’t be played at all if the test bracket were used, as intended, for a consolation rather than a full double elimination), the grouped bracket averaged 0.639 repeats per tournament, as opposed to 0.377 for the comparison bracket;
- The new fairness (B) statistic penalizes only inequities on the starting lines, but the test bracket has some eye-popping differences elsewhere, too. Look, for example, at the first round of the lower bracket, where all four quadrants differ from one another, and the third quadrant is very different indeed.
All in all, the experiment shows again that grouping the byes, even in a double elimination, substantially impairs the fairness of the bracket. To my mind, you’d have to really, really want to start the second round early to justify treating your players so unfairly.
20 thoughts on “More Bad Byes”
Can you compare against a bracket where the second round can still start immediately, by grouping the byes in every other match instead of across a whole side? So A1/A2 feeds B1, two players with byes go into B2, A3/A4 feeds B3, and so on. And then possibly A1/A2 and A3/A4 feed into B1 and B2, and A5/A6 and A7/A8 feed into B4 and B5.
Jeb Horton has a 96-player bracket that works as you suggested, Kevin.
That is an interesting idea. You would get everyone playing at the start of the tournament, but you’d limit the imbalance of the bracket by mixing everyone in by the third round. I’d expect a simulation to show a result between the two I looked at, but it would be good to know which one it was closer to before deciding whether it was a sensible compromise.
I’ll have to rejigger the simulator a little to test the idea, and I’ll want to see how the standard CD-shift drops work – I may have to come up with new ones. It may take a while. But in honor of the blog’s first real comment I can scarcely refuse to do a little extra work – watch this space.
Dan–This is mostly a copy/paste from our email conversation
This post is backgammon-centric.
To get the Fairness(B) numbers that you did, it seems to me that you had to give players who got a bye in the zero loss bracket credit for a win. I have no inside information on this. I would love to be corrected if I am wrong.
The highest winning percentage for anyone with at least 100 match wins since 2009 in USBGF rated events is 64% (Jonah Seewald, 103-58). Mochy is 129-76, 63%. In backgammon, I think that the top expected win rate is about 5/8. The best player in your simulations won the 24-player single-elimination tournament in 38.6% of the simulations and the double-elimination tournament 45.8% of the time. This translates into a match win rate between 81% and 82%. This is not backgammon. If one wants to make a bracket for a 24-team double-elimination basketball tournament, one has to worry more about strength of schedule in the one-loss bracket. Since the better player in backgammon is generally only a small favorite, strength of schedule in the one-loss bracket is a smaller consideration. The much more important consideration is making sure everyone plays the same number of total matches, within one. Spacing out the byes in the zero-loss portion of the bracket is fairer within the zero-loss bracket, but in a backgammon tournament by much less than your numbers suggest.
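One rough way to turn a tournament-win rate into a per-match win rate is to assume the best player wins every match with the same probability p and solve for p. The sketch below does that by bisection for a blind-drawn 24-player single elimination in a 32 bracket, where (by assumption) one line in three carries a bye and so needs four wins instead of five. The constant-p assumption is a simplification, since real opponents vary in strength, and the function names are hypothetical.

```python
def tournament_win_prob(p, bye_share=1/3, wins_with_bye=4, wins_without_bye=5):
    """Chance of winning the event if every match is won with probability p.

    Assumes a blind draw: with probability bye_share the player lands on a
    bye line and needs one fewer win.
    """
    return bye_share * p ** wins_with_bye + (1 - bye_share) * p ** wins_without_bye


def implied_match_win_rate(target, tol=1e-9, **kwargs):
    """Solve tournament_win_prob(p) == target for p by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # tournament_win_prob is increasing in p, so bisection is safe.
        if tournament_win_prob(mid, **kwargs) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(implied_match_win_rate(0.386), 3))                 # 0.815
print(round(implied_match_win_rate(0.386, bye_share=1.0), 3))  # 0.788 (always a bye)
print(round(implied_match_win_rate(0.386, bye_share=0.0), 3))  # 0.827 (never a bye)
```

Under these assumptions the 38.6% single-elimination figure lands inside the 81% to 82% range quoted in the comment.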
I do the first-round pairings in the consolation bracket a little differently now for the 24 Main/Consolation (this wasn’t really meant for double elimination). In bracket order from the top down, the consolation first round is: B8 v A1, B5 v A4, B3 v A6, B2 v A7, B7 v A3, B1 v A8, B4 v A5, B6 v A2. This is a little better.
Back to single elimination with progressive consolation, not double elimination. We usually don’t get a round number of participants in a tournament. Let us say we get 21 players, and we are using your 24-player bracket. Player #4 got a bye in the first round, then loses his first match. He then gets another bye in the consolation, as there is no A2 loser. With your bracket and a 21-player tourney, sometimes a 1-1 player and a 3-1 player are in the same round of the consolation, perhaps playing each other. Using the first-round pairings above for my 24 (21) bracket, this will never, ever happen. To me, this pretty much blows away any other fairness considerations. When the byes play each other, we know that the B1, B2, B3, and B4 losers will have 0-1, not 1-1, records. In a byes-spaced-evenly tournament, any second-round loser could have either a 0-1 or a 1-1 W-L record. This makes it impossible to avoid the 1-1 v 3-1 possibility in a byes-spaced-evenly tournament. Suppose you ran the simulation on a 21-player tournament, with the first-round pairings above in the byes-bunched-together bracket and win probabilities more suited to backgammon (no one wins more than 5/8 of matches or loses more than 3/8). My best guess is that the byes-bunched-together bracket would get a better fairness (B) number than the byes-spaced-out bracket, even running it as a double-elimination bracket. In saying this, I am assuming that players getting a bye (playing one less match overall and getting to the same round) would get credit for a win, and those getting double byes would get credit for two wins in the fairness (B) metric. If not, then the fairness (B) metric has limited usefulness.
And here’s the relevant part of my e-note:
I definitely take your point about the simulations yielding way too many wins for the best player. My intent in the blog as a whole is, as much as possible, to talk about tournament design in the abstract rather than design for particular kinds of competition, and so I settled on an equal balance of skill and luck as my base case. But I did note that there was more luck in some competitions than others, even mentioning that there would be a lot of luck in backgammon. One of the things I added to my model a while ago was the ability to specify a parameter for the amount of luck. Your information about the winning percentages of Seewald and Mochy should enable me to tune experiments with backgammon in mind – I’ll just boost the luck parameter until I’m finding the right distribution of wins, at least for the best players.
I realize that whenever you have more than 8 byes in my 24 bracket you run the risk of someone getting two byes, and that that’s highly undesirable. I haven’t worked out how it is that your revised drops take care of this, but I will.
I’m a little less confident than you are that the two-byes problem will swamp the bye-grouping effect. But that’s the nice thing about having a simulator – once we have reasonable parameters, and ask it the right questions, we can look forward to actually answering the question.
I like to think that I’ve already got one reasonably solid result from my model. I expected to find that shifting the lower bracket to save a round would compromise fairness to some extent. Finding that it actually helps fairness (except in seeded tournaments) makes it much easier to recommend the approach whole-heartedly. Of course, there may well be more to learn here – finding that the shift effect on fairness becomes negative for seeded tournaments leads me to suspect that it would also become negative if the luck factor was sufficiently reduced.
Dan–I guess I’ll just reply here instead of email.
My bracket does not fix the extra bye in the consolation problem with 18 or 19 players (perhaps that is why I suggested a 21 bracket in a future sim). I did a 20 Main/Consolation bracket years ago that had a working consolation for 17 to 20 players. I don’t know where it is now. This conversation has me thinking that I really need to do a 20 bracket as well.
Also in your email response to my response–“The fairness (B) measure is much simpler than your surmise. It’s just based on the standard deviation of the winning percentage by bracket line for the lines that hold the starting position for players.” To me, this implies that players receiving a bye got no credit for getting wins with fairness (B).
This is right off the page 24uppercd.pdf:
Player  Win%  Match-wins
1 45.8 5.34
2 20.4 4.67
3 11.5 4.19
4 7.19 3.83
5 4.71 3.54
6 3.19 3.30
7 2.21 3.10
8 1.53 2.92
9 1.07 2.76
10 0.76 2.62
11 0.53 2.49
12 0.37 2.37
13 0.25 2.26
14 0.17 2.16
15 0.11 2.06
16 0.08 1.96
17 0.05 1.87
18 0.03 1.78
19 0.02 1.69
20 0.01 1.61
21 0.01 1.52
22 0.00 1.43
23 0.00 1.33
24 0.00 1.20
There were 10000000 double elimination tournament simulations. The first number in each line represents a player. The second number represents the % of times that player won the whole tournament. The third number represents what you called “match wins”, Dan. In a 24-player double elimination tournament with no second final, there are a total of 46 matches played. If you add up the numbers in column 3 for all 24 players, you get 62 “match wins” out of 46 matches played. 62 would be the match win number if the 8 byes in the main and the 8 byes in the consolation each received credit for a match win. This chart is one reason why I thought that you were giving credit for byes.
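The arithmetic in that comment is easy to check. This sketch copies the 24uppercd figures quoted above into a list and compares the column-3 total with the 46 matches played plus 16 byes (8 in the main bracket, 8 in the consolation). The variable names are my own.

```python
# (player, % of tournaments won, mean "match wins") as printed on 24uppercd.pdf
TABLE = [
    (1, 45.8, 5.34), (2, 20.4, 4.67), (3, 11.5, 4.19), (4, 7.19, 3.83),
    (5, 4.71, 3.54), (6, 3.19, 3.30), (7, 2.21, 3.10), (8, 1.53, 2.92),
    (9, 1.07, 2.76), (10, 0.76, 2.62), (11, 0.53, 2.49), (12, 0.37, 2.37),
    (13, 0.25, 2.26), (14, 0.17, 2.16), (15, 0.11, 2.06), (16, 0.08, 1.96),
    (17, 0.05, 1.87), (18, 0.03, 1.78), (19, 0.02, 1.69), (20, 0.01, 1.61),
    (21, 0.01, 1.52), (22, 0.00, 1.43), (23, 0.00, 1.33), (24, 0.00, 1.20),
]

MATCHES_PLAYED = 46   # 24-player double elimination, no second final
BYES = 8 + 8          # main-bracket byes + consolation byes

match_wins = sum(wins for _, _, wins in TABLE)
title_pct = sum(pct for _, pct, _ in TABLE)

print(f"{match_wins:.2f}")     # 62.00
print(MATCHES_PLAYED + BYES)   # 62
print(f"{title_pct:.2f}")      # 99.99 (rounding in the printed table)
```

The column-3 total matches 46 + 16 exactly, which is what you would expect if every bye is tallied as a win.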
The other reason is logic. A line in the above article states, “The grouped byes were awful on fairness (B): 1.963 as opposed to 4.928.” For now, let us assume that we are not giving anyone credit for byes with the fairness (B) statistic. With evenly spaced byes, the player with the bye would be playing a player who just won a match. On average, this player with the bye would be an average player, so that player would lose more than 50% of the time. With byes grouped together, players with byes play other players with byes. The win % for each spot in the bracket should stay closer to 50% over the 10000000 simulations. A 1.963 fairness (B) number for the byes-grouped-together says that the SD for the number of wins in each spot of the draw would be about .5 . Absolutely, positively, no way if byes don’t count for wins. A .5 SD is within the range I would expect if byes were credited as wins.
Dan–Either I misinterpreted your statement and byes really do count as wins, or something got disfigured somewhere.
Correction! I am a liar. I stated in the above response:
With your bracket and a 21 player tourney, sometimes a 1-1 player and a 3-1 player are in the same round of the consolation, perhaps playing each other. Using the first round pairings above for my 24 (21) bracket, this will never, ever happen.
Yes, the double bye situation can happen in my bracket with a 21-player field. If Player #11, who had a bye and starts play in the second round (match B6), loses, then he gets another bye. The double bye situation is still significantly more likely if you use a spaced-out bracket format. There is an easy switch that makes the double bye problem go away with exactly 21 players in my bracket.
I see the confusion. The fairness (B) statistic is calculated by looking at the win percentage from the entry lines on the bracket, not from the experience of individual players. I accumulate many statistics both by player number and by line – the numbers above and below the lines on the bracket are all tied to that particular line, while the numbers in the table in the lower right corner of the upper bracket are totals accumulated by player. At any rate, the numbers I’m calculating fairness (B) from are the twenty-four entry lines on the left of the bracket. For your bracket, reading from the top, they’re 4.876, 4.861, …, 4.892, 3.787, …, and 3.816. The thing I’m trying to show with the fairness (B) measure is the extent to which players can get drawn to lines that, from the very beginning, have unequal chances of winning. That’s why a blind draw to a completely balanced bracket would yield a fairness measure of very nearly 100 – the only variation in winning chances from the starting lines would be the little bit of random variation that is still there even after 10M trials. It’s a line-based measure because I think of it as a characteristic of the way the bracket is drawn, not of the players. I don’t consider it a fairness problem that the best players win more often – on the contrary, the best players winning is exactly what leads to a high score for fairness (C).
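Dan describes fairness (B) only as being based on the standard deviation of the tournament-win percentages on the entry lines. A reading that fits this exchange (the earlier comment takes 1.963 to mean an SD of about .5) is the reciprocal of that standard deviation, so a flatter set of lines scores higher. The sketch below assumes exactly that, using a population SD, and may differ in detail from the simulator’s actual formula; the input profile in the usage example is hypothetical, not simulator output.

```python
from statistics import pstdev


def fairness_b(line_win_pcts):
    """Assumed fairness (B): reciprocal of the population standard deviation
    of tournament-win percentages across the entry lines.

    A perfectly balanced set of lines has zero spread, so the score is infinite.
    """
    sd = pstdev(line_win_pcts)
    return float("inf") if sd == 0 else 1 / sd


# Hypothetical profile: 8 bye lines at 5.00% and 16 non-bye lines at 3.75%.
profile = [5.0] * 8 + [3.75] * 16
print(round(fairness_b(profile), 3))  # 1.697
```

Because the measure looks only at lines, the same function applies to a blind draw or a seeded draw; what changes is which percentages land on which lines.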
You’re right that the match-win figures for my bracket players include the bye wins. That’s because the way I’ve programmed the simulator to handle byes is to fill the bye lines with phantom players that have such low skill levels that they couldn’t possibly win a match. Perhaps I should add some logic that excludes those games from the win tally – it would just be a line or two of code. But I’m not sure that’s the best thing to do, because it would also have an odd effect. It would make it look like players who get byes are at a disadvantage, because (at least on my bracket) they face somewhat better competition in the first round they actually play. That wouldn’t be particularly important in a blind draw, though it would work to the advantage of the worst teams, who would not only have an equal chance at an upper bracket bye, but a better than equal chance of picking up a lower bracket bye. But in a seeded tournament where the higher-ranked players get the byes, the otherwise smooth inverse relationship between skill rank and match wins would break at the bye threshold.
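Here is a minimal sketch of the phantom-player approach to byes and of the “line or two of code” that would keep bye wins out of the tally. The match model (skill plus Gaussian noise scaled by a luck parameter) and every name in it are hypothetical stand-ins, not the simulator’s actual code; the only point is that a phantom with a skill of negative infinity can never win, and that excluding its games is a single conditional.

```python
import random

PHANTOM = float("-inf")  # hypothetical skill of a bye "player": can never win


def play(skill_a, skill_b, luck, rng):
    """Hypothetical match model: higher skill plus scaled Gaussian noise wins.

    Returns 0 if side A wins, 1 if side B wins.
    """
    a = skill_a + luck * rng.gauss(0, 1)
    b = skill_b + luck * rng.gauss(0, 1)
    return 0 if a > b else 1


def first_round_wins(skills, luck, count_byes, rng):
    """Play adjacent lines against each other and tally wins per line.

    With count_byes=False, a win over a phantom is not credited.
    """
    wins = [0] * len(skills)
    for i in range(0, len(skills), 2):
        winner = i + play(skills[i], skills[i + 1], luck, rng)
        is_bye = PHANTOM in (skills[i], skills[i + 1])
        if count_byes or not is_bye:
            wins[winner] += 1
    return wins


rng = random.Random(1)
skills = [1.0, PHANTOM, 0.5, 0.2]
print(first_round_wins(skills, luck=3.0, count_byes=True, rng=rng)[:2])   # [1, 0]
print(first_round_wins(skills, luck=3.0, count_byes=False, rng=rng)[:2])  # [0, 0]
```

However large the luck parameter, the phantom’s total stays at negative infinity, so the real player on the bye line always advances; only the bookkeeping differs.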
The match-win totals by player for your bracket exclude the bye wins. That’s because, owing to the fact that you dropped two B’s into the same first round match in the lower bracket, I couldn’t run the bracket as if it were a 32 with eight byes. I would have to have run it as a 64 with 40 byes, and that would have meant that my phantom bye players would have actually won some matches against other phantom bye players, which would mess up various other things. So, to simulate your bracket, I made a version of my simulator in which the bye matches are not filled with cannon fodder, but simply don’t exist, and thus can’t be counted. I noted this anomaly in the writeup, but didn’t think it significant enough to warrant even more custom programming. Now that you’ve fixed the problem of having two B drops meet each other in the first round of the lower bracket, when I re-run the simulation with a much higher luck factor, I can run the simulation on your bracket the same way I did on mine, and there won’t be that unfortunate difference on the player tally. There should be no effect whatever on the fairness calculations, or on any other numbers.
Dan–I’m interested in which spots in the bracket got which average win numbers, Dan. Any way you could copy all 24 of them somewhere?
Hey Dan–I just figured something out. When you say “win percentage” you are talking about the percentage of times that the player in that spot in the bracket won the WHOLE TOURNAMENT, not how many MATCHES that spot in the bracket won. OK, game changer. Putting in a higher luck factor will make both brackets significantly worse as far as fairness (B) is concerned. If we were doing a 24-player single-elimination coin-flipping tournament, the fairness (B) number should end up at about .55 no matter where you put the byes. If you ran a 21-player double elimination bracket simulation, the higher you set the luck factor, the better the grouped-byes format would look in comparison with the spread-out-byes format.
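The claim that coin flipping makes bye placement irrelevant in single elimination can be checked directly: each entry line’s chance depends only on how many rounds it has to win. This sketch computes the exact line percentages for 24 players in a 32 bracket and the resulting fairness (B), again assuming fairness (B) is the reciprocal of the SD of those percentages. With a population SD this comes to about 0.679, and with a sample SD about 0.665; neither is guaranteed to match the exact definition behind the .55 and .66014 figures quoted in this thread. The function name is hypothetical.

```python
from statistics import pstdev, stdev


def coin_flip_line_pcts(entries=24, size=32):
    """Exact % chance of winning a single elimination from each entry line
    when every match is a coin flip (size must be a power of two).

    Bye lines need one fewer win, and nothing else about their position matters.
    """
    rounds = size.bit_length() - 1   # 5 rounds in a 32 bracket
    byes = size - entries            # 8 byes, so 8 players skip round one
    with_bye = 100 * 0.5 ** (rounds - 1)
    without_bye = 100 * 0.5 ** rounds
    return [with_bye] * byes + [without_bye] * (entries - byes)


pcts = coin_flip_line_pcts()
print(sum(pcts))                    # 100.0
print(round(1 / pstdev(pcts), 3))   # 0.679 (population SD)
print(round(1 / stdev(pcts), 3))    # 0.665 (sample SD)
```

Since the function never asks where the byes sit, any placement of the eight byes gives the same line percentages and the same score.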
Stupid autocorrect. “m i s f i g u r e d”, not “disfigured”.
I just belatedly came to the same conclusion.
I played with the parameters, and decided that setting my luck factor to 3 and the elite threshold to 0 gave a reasonable approximation of backgammon results. That translates to the game being 75% luck and 25% skill, and assumes that only better-than-average players show up for the sort of backgammon tournaments we’re modeling.
Then I ran it on my own 24 DE, and sure enough fairness (B) tanked. So it seems you’re right that, as the luck factor increases, the thing that matters most is equalizing the number of rounds.
Perhaps this means that we should be looking for something along the lines of Joe Czapski’s Balanced Brackets from tournamentdesign.org. He seems to specialize in rather outlandish-looking brackets designed to equalize the number of rounds. I analyzed one of them (in the post “A Balanced Bracket”) and didn’t see much virtue in it from a fairness point of view, in part because his designs create the same kind of “walled gardens” as your grouped byes. But, as I told him at the time, I thought his designs might come into their own when I started taking byes into account.
Joe’s formats are so unlike what I had in mind when I built my simulator that it’s rather a pain in the neck for me to run them, but I can see that I need to revisit them.
I did some fairness (B) calculations assuming a coin-flipping tournament. First, I recalculated the 24-player single elimination and got a fairness (B) number of .66014 this time. Sorry about the corrections. Then, a 21-player coin-flipping tournament. I compared three different formats. First, the 11 byes grouped at the top, with the first round of the consolation as follows: B8 v A1, B5 v A4, B3 v A6, B2 v A7, B7 v A3, B1 v A8, B4 v A5, B6 v A2. Obviously A1, A2, and A3 do not exist, so their opponents get byes. The fairness (B) number here is .81270. The second bracket I tested also had the 11 byes grouped together at the top, with a small switch in the first round of the consolation: B8 v A1, B5 v A4, B3 v A6, B2 v A7, B7 v B6, B1 v A8, B4 v A5, A3 v A2. A1, A2 and A3 still don’t exist, so they create byes in the consolation. The fairness (B) number for this one is .83753. The third bracket tested was Dan’s byes-evenly-spaced-out bracket. The fairness (B) number for this one is .76212. As expected, the byes-grouped-together formats get better fairness (B) numbers in a coin-flipping tournament. In contests where the favorite is usually a bigger favorite, the spread-out-bye format will get better fairness (B) numbers. At some point, there is a crossover where fairness (B) numbers are about the same with either type of bracket. I am not smart enough to figure out exactly where that point is.
The 21 player tournaments are double elimination tournaments.
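The consolation-bye mechanics of the two 21-player grouped-bye pairings can be traced mechanically. This sketch takes each first-round consolation pairing, marks the nonexistent A1, A2 and A3 losers as absent, and reports which B losers receive a consolation bye and which matches vanish entirely. Per the earlier correction, B6 can hold a player who already had a bye (Player #11), so a consolation bye for the B6 loser is the double-bye case; the switched pairing sends that loser against B7’s loser instead. The function name and data layout are my own.

```python
# First-round consolation pairings for the two grouped-bye 21-player brackets.
ORIGINAL = [("B8", "A1"), ("B5", "A4"), ("B3", "A6"), ("B2", "A7"),
            ("B7", "A3"), ("B1", "A8"), ("B4", "A5"), ("B6", "A2")]
SWITCHED = [("B8", "A1"), ("B5", "A4"), ("B3", "A6"), ("B2", "A7"),
            ("B7", "B6"), ("B1", "A8"), ("B4", "A5"), ("A3", "A2")]

ALL_LOSERS = sorted([f"A{i}" for i in range(1, 9)] + [f"B{i}" for i in range(1, 9)])


def consolation_byes(pairings, absent):
    """Return (losers who receive a consolation bye, matches with nobody in them)."""
    if sorted(name for pair in pairings for name in pair) != ALL_LOSERS:
        raise ValueError("each A and B loser must appear exactly once")
    byes, empty = [], []
    for x, y in pairings:
        if x in absent and y in absent:
            empty.append((x, y))
        elif x in absent:
            byes.append(y)
        elif y in absent:
            byes.append(x)
    return byes, empty


ABSENT = {"A1", "A2", "A3"}  # with 21 entries these A matches do not exist
print(consolation_byes(ORIGINAL, ABSENT))  # (['B8', 'B7', 'B6'], [])
print(consolation_byes(SWITCHED, ABSENT))  # (['B8'], [('A3', 'A2')])
```

With the switch, the two missing A losers cancel each other out in one empty match, leaving a single consolation bye instead of three.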