The goal of any statistical model is to represent events in a formal, mathematical way — ideally, with a few relatively simple mathematical functions. Simpler is usually better when it comes to model-building. That doesn’t really work, however, in the case of the College Football Playoff’s selection committee, the group tasked with picking the nation’s four best teams at the end of each season. As you might imagine from a bunch of former coaches and college-administration types, they can sometimes resist the clean logic that an algorithm would love to impose. So while we’ve found that our model can do a reasonably good job of anticipating their decisions, it has to account for the group behaving in somewhat complicated ways.
That’s one of the challenges our College Football Playoff forecast faces, but one of the fun parts, too. Unlike our other prediction models, which only really try to predict the outcomes of games, it also tries to predict the behavior of the humans on the selection committee. Here’s a rundown of how we go about doing that.
The key characteristics of the model are that it’s iterative and probabilistic. It’s iterative in that it simulates the rest of the college season one game (and one week) at a time, instead of jumping directly from the current playoff committee standings to national championship chances. And it’s probabilistic in that it aims to account for the considerable uncertainty in the playoff picture, both in terms of how the games will turn out and in how the humans on the selection committee might react to them.
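As a rough illustration of what "iterative and probabilistic" means in practice, here is a minimal Python sketch. It is not the forecast's actual code: the names `simulate_season`, `schedule` and `win_prob` are hypothetical, and the step of re-predicting the committee's rankings after each simulated week is left out.

```python
import random


def simulate_season(schedule, win_prob, n_sims=1000, seed=0):
    """Return each team's average simulated win total.

    schedule: list of weeks, each a list of (home, away) pairs (hypothetical format).
    win_prob: hypothetical callable (home, away, wins_so_far) -> P(home wins).
    """
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_sims):
        wins = {}
        for week in schedule:  # iterative: advance one week at a time
            for home, away in week:
                wins.setdefault(home, 0)
                wins.setdefault(away, 0)
                p_home = win_prob(home, away, wins)
                # probabilistic: draw the result instead of picking the favorite
                winner = home if rng.random() < p_home else away
                wins[winner] += 1
        for team, count in wins.items():
            totals[team] = totals.get(team, 0) + count
    return {team: total / n_sims for team, total in totals.items()}
```

Because `win_prob` sees the wins accumulated so far in that simulation, later weeks can depend on earlier simulated results, which is the point of stepping through the season rather than jumping straight to the end.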
Games are simulated mostly using ESPN’s Football Power Index. We say “mostly” because we’ve also found that giving a little weight to the playoff committee’s weekly rankings of the top 25 teams helps add to the predictions’ accuracy. (We use the Associated Press Top 25 poll as a proxy for the committee’s rankings until the first set of rankings is released in the second half of the season.) Specifically, the model’s game-by-game forecasts are based on a combination of FPI ratings and committee (or AP) rankings: 75 percent on FPI and 25 percent on the rankings.
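The 75/25 weighting can be sketched in a few lines of Python. Only the weights come from the description above. How a ranking is converted into a rating, and how a rating gap becomes a win probability, are not specified here, so the sketch assumes both inputs are already on the same scale and uses a logistic curve with a hypothetical `scale` parameter.

```python
def blended_rating(fpi_rating, ranking_rating, fpi_weight=0.75):
    """Blend two ratings that are assumed to be on the same scale."""
    return fpi_weight * fpi_rating + (1.0 - fpi_weight) * ranking_rating


def win_probability(rating_diff, scale=400.0):
    """Map a rating gap to a win probability.

    The logistic form and the scale are assumptions made for illustration.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))
```

For instance, `blended_rating(20.0, 10.0)` gives 17.5: three-quarters of the FPI-based number plus one-quarter of the rankings-based one.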
In many ways, that’s the simple part. While predicting games isn’t always the easiest endeavor, there’s a science to it that we’ve applied across our many sports interactives over the years. But the next part, the process of predicting the human committee, is unique to our college football model.
After each set of simulated games, our system begins to guess how the committee will handle those results. These predictions account for the potential margin of victory in each game and for the fact that some wins and losses matter more than others. To assist with this part of the process, alongside a separate formula based simply on wins and losses, we use a version of our old friend the Elo rating. In other sports, we use Elo to help predict the games, but in this case, we mainly rely on it to model how college football’s powers that be tend to react to which teams won and how they did it. This special version of Elo is designed to try to mimic the committee’s behavior.
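For readers unfamiliar with Elo, the sketch below shows a generic Elo update in Python in which bigger wins and more surprising wins move ratings further. It is a textbook-style illustration, not the committee-mimicking version described above: the `k` value and the log-based margin multiplier are hypothetical choices.

```python
import math


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner_rating, loser_rating, margin, k=20.0):
    """Return (new_winner_rating, new_loser_rating) after one game.

    k and the margin multiplier are hypothetical, chosen only for illustration.
    """
    # An upset (low expected score for the winner) is a bigger surprise.
    surprise = 1.0 - expected_score(winner_rating, loser_rating)
    # Bigger margins move ratings more, with diminishing returns.
    multiplier = math.log(margin + 1.0)
    shift = k * multiplier * surprise
    return winner_rating + shift, loser_rating - shift
```

In this setup a win over a higher-rated team is worth more than the same win over a lower-rated one, which is one simple way to express the idea that some wins and losses matter more than others.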
We’ve calculated these Elo ratings back to the 1869 college football season. Between seasons, ratings are reverted partly to the mean, to account for roster turnover and so forth. We revert teams to the mean of all teams in their conference, rather than to the mean of all Football Bowl Subdivision teams. Thus, teams from the Power Five conferences, especially the SEC, start out with a higher default rating. As a consequence, our system gives teams from power conferences a built-in advantage, because that’s how human voters tend to see them.
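Reverting to a conference mean rather than a global mean can be sketched as follows. The function name, the data layout and the `revert_fraction` value are assumptions; the text says only that ratings are reverted "partly."

```python
def revert_to_conference_mean(ratings, conference_of, revert_fraction=1 / 3):
    """Pull each team's rating part of the way toward its conference average.

    ratings: dict of team -> rating.
    conference_of: dict of team -> conference name.
    revert_fraction: hypothetical share of the gap that is closed between seasons.
    """
    by_conference = {}
    for team, rating in ratings.items():
        by_conference.setdefault(conference_of[team], []).append(rating)
    conference_mean = {
        conf: sum(values) / len(values) for conf, values in by_conference.items()
    }
    return {
        team: rating + revert_fraction * (conference_mean[conference_of[team]] - rating)
        for team, rating in ratings.items()
    }
```

Because each team moves toward its own conference's average, a weak team in a strong conference can start the next season rated above a comparable team in a weak conference.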
This conference-centric approach both yields more accurate predictions of game results and better mimics how committee and AP voters rank the teams. For better or worse, teams from non-power conferences (except Notre Dame, that special snowflake among independents) rarely got the benefit of the doubt under the old BCS system, and that’s been the case under the selection committee as well.
Some of the model’s complexity comes in trying to model when the selection committee might choose to break its own seemingly established rules. We discovered in 2014, for example, that the committee isn’t always consistent from week to week: It excluded TCU from the playoff even though the Horned Frogs held the No. 3 spot in the committee’s penultimate rankings and won their final game by 52 points. Rather than sticking with its previous order, the committee can re-evaluate the evidence as it goes. If it has an 8-0 team ranked behind a 7-1 team, for instance, there’s a reasonable chance that the 8-0 team will leapfrog the other in the next set of rankings even if both teams win their next game in equally impressive fashion. That’s because the committee defaults toward looking mostly at wins and losses among power conference teams while putting some emphasis on strength of schedule and less on margin of victory or “game control.”
We’ve had to add other wrinkles to the system over the years. Before the 2015 season, for example, we added a bonus for teams that win their conference championships, since the committee explicitly says that it accounts for conference championships in its rankings (although exactly how much it weights them is difficult to say). And late in 2016, we added an adjustment for head-to-head results, another factor that the committee explicitly says it considers. If two teams have roughly equal résumés but one of them won a head-to-head matchup earlier in the season, it’s a reasonably safe bet that the winner will end up ranked higher.
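One simple way to picture these two wrinkles is the Python sketch below. It is purely illustrative: the résumé `scores`, the size of `champ_bonus` and the `close_enough` threshold are all hypothetical, and the real adjustments may work quite differently.

```python
def rank_teams(scores, champions, head_to_head, champ_bonus=1.0, close_enough=0.5):
    """Order teams by adjusted score, best first.

    scores: dict of team -> résumé score (hypothetical scale).
    champions: set of conference champions, who receive champ_bonus.
    head_to_head: set of (winner, loser) pairs from earlier in the season.
    """
    adjusted = {
        team: score + (champ_bonus if team in champions else 0.0)
        for team, score in scores.items()
    }
    order = sorted(adjusted, key=adjusted.get, reverse=True)
    # One pass over neighbors: if two résumés are roughly equal and the
    # lower-ranked team won the head-to-head game, move it ahead.
    for i in range(len(order) - 1):
        upper, lower = order[i], order[i + 1]
        roughly_equal = adjusted[upper] - adjusted[lower] <= close_enough
        if roughly_equal and (lower, upper) in head_to_head:
            order[i], order[i + 1] = lower, upper
    return order
```

With scores of 10.0 for A, 9.8 for B and 8.0 for C, where C is a conference champion and B beat A, the sketch returns B, A, C: the head-to-head result flips the two close teams, while C's bonus is not enough to close the larger gap.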
Still, there are no guarantees. Not only do we account for the uncertainty in the results of the games themselves, but we also account for the error in how accurately we can predict the committee’s rankings. The potential for error is greater the further you are from the playoff, so the forecast’s uncertainty is higher earlier in the regular season. In early October, for example, as many as 15 or 20 teams will still belong in the playoff “conversation.” That number will gradually be whittled down, probably to around five to seven teams before the committee releases its final rankings.
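The idea that uncertainty shrinks as the season goes on can be sketched by adding random noise to a predicted committee score, with the noise sized by how many weeks remain. The linear rule and the `base_sd` and `per_week_sd` values below are assumptions made for illustration, not the model's actual error structure.

```python
import random


def ranking_error_sd(weeks_remaining, base_sd=1.0, per_week_sd=0.5):
    """Hypothetical rule: prediction error widens linearly with weeks left."""
    return base_sd + per_week_sd * weeks_remaining


def noisy_committee_score(score, weeks_remaining, rng, base_sd=1.0, per_week_sd=0.5):
    """Perturb a predicted committee score with Gaussian noise sized by time remaining."""
    sd = ranking_error_sd(weeks_remaining, base_sd, per_week_sd)
    return score + rng.gauss(0.0, sd)
```

Calling `noisy_committee_score` with a seeded `random.Random` inside each simulation run would let more teams land in the top four early in the season, when the noise is large, and fewer as the final rankings approach.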
Nate Silver is FiveThirtyEight’s founder and editor in chief.
1.5 Forecast updated for 2018 season.
1.4 Forecast published for 2017 season; game-by-game forecasts incorporate team rankings, power conferences given a boost, AP poll used before committee releases rankings.
1.3 Head-to-head results incorporated into model.
1.2 Forecast published for 2016 season.
1.1 Forecast published for 2015 season; conference champion bonus added, uncertainty increased.
1.0 College Football Playoff model first published for the 2014 season.