Analytics and the NFL in 2016, part four: Improving our data collection

How can we improve our methods for statistically evaluating NFL players with more modern techniques?

Football is a sport of small sample sizes. Or is it? True, there are only sixteen games in a regular season. But if you consider things on a per-snap basis, a starter might accrue 900 to 1,100 plays over the course of the year. That compares favorably to baseball, where a top pitcher will face 800 to 1,000 batters and a healthy hitter will get 500 to 600 plate appearances. Basketball is a little different: the average player attempts around 400 shots while a top-40 player attempts over 1,000, but that's due to the nature of a league with small rosters and important superstars. Consider the points at which baseball statistics stabilize, and it's clear that certain measurements should indeed be able to properly evaluate football players - as long as we're smart about using them.

In other words, I think there is room to incorporate counting stats into football. There are two obvious challenges, of course. One is to figure out the impact of each of the 22 players on a given play, including those who never touch the ball. In some cases, it may just be impossible to quantify that value. Another is the fact that 1/16th of the data for a given season is collected on a single day, which gives individual injuries and matchups greater influence. It's not a perfect science, but I think we can find some trends. Let's talk about stats and some of the different NFL positions, ways we can evaluate them right now, and ways we could improve that evaluation with more informative stats.

In general: Splits

The absolute best thing that we can do right now to be more informed about the NFL is to begin recording split data for on-field tactics. Down and distance and point differential are a good start. We have information about certain time situations, such as the quarter and when there is less than two minutes remaining. We can do better.

ESPN has started recording splits based on the offensive formation, number of defenders on the defensive line, and personnel package, but their data is still incomplete. For example: ESPN's splits page for Tyrod Taylor differentiates between Shotgun, Split Backs, I-Formation, and Lone Setback. But that doesn't cover every scenario - what's the difference between shotgun with an empty set, and shotgun with a lone setback? We should standardize the categories here. First, split based on the quarterback's position for the snap (under center, pistol, shotgun). Then, split based on the number of players in the backfield, in-line, and out wide. That gives us a common, comprehensive way to understand the pre-snap look of an offense.
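To make the proposed scheme concrete, here is a minimal sketch of what one standardized pre-snap record might look like. The names `QBAlignment` and `PreSnapLook`, and the choice to count only the five eligible non-quarterback players, are assumptions made for illustration; nothing here reflects how ESPN actually stores its data.

```python
from dataclasses import dataclass
from enum import Enum


class QBAlignment(Enum):
    """Where the quarterback lines up for the snap."""
    UNDER_CENTER = "under center"
    PISTOL = "pistol"
    SHOTGUN = "shotgun"


@dataclass(frozen=True)
class PreSnapLook:
    """Hypothetical record of an offense's pre-snap alignment."""
    qb: QBAlignment
    backfield: int  # eligible players in the backfield (quarterback excluded)
    in_line: int    # eligible players attached to the line, e.g. tight ends
    out_wide: int   # eligible players split wide or in the slot

    def __post_init__(self):
        counts = (self.backfield, self.in_line, self.out_wide)
        # Assumption: each of the five eligible players is counted exactly once.
        if min(counts) < 0 or sum(counts) != 5:
            raise ValueError("eligible players must be non-negative and total five")

    def label(self) -> str:
        return f"{self.qb.value}, {self.backfield}-{self.in_line}-{self.out_wide}"


# The two looks that a single "Shotgun" bucket cannot tell apart.
empty_gun = PreSnapLook(QBAlignment.SHOTGUN, backfield=0, in_line=1, out_wide=4)
one_back_gun = PreSnapLook(QBAlignment.SHOTGUN, backfield=1, in_line=1, out_wide=3)

print(empty_gun.label())
print(one_back_gun.label())
```

With a record like this, shotgun with an empty set and shotgun with a lone setback become two distinct, comparable categories instead of one.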

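Second, standardize the personnel package data. ESPN can tell you the number of receivers and the number of tight ends, but those numbers aren't grouped together on their page. Make this easy: use the two-digit personnel numbering system, in which the first digit is the number of running backs and the second is the number of tight ends (perhaps with an extra digit to represent extra offensive linemen). "11" personnel, for example, means one back, one tight end, and three receivers.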

This will, again, give us a common method to understand how different offenses are constructed. If it can be grouped by and compared against other split data, we'll have new ways to analyze situational stats.
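The personnel numbering is mechanical enough to sketch in a few lines. The function names below are hypothetical, and the third digit for extra linemen follows the extension suggested above rather than any established convention.

```python
def personnel_code(backs: int, tight_ends: int, extra_linemen: int = 0) -> str:
    """Return a personnel label such as "11" or "12".

    First digit: running backs. Second digit: tight ends.
    Optional third digit: extra offensive linemen (an assumed extension).
    """
    # Five eligible spots exist besides the quarterback; an extra lineman
    # takes one of them, and receivers fill whatever is left.
    receivers = 5 - backs - tight_ends - extra_linemen
    if min(backs, tight_ends, extra_linemen, receivers) < 0:
        raise ValueError("counts must be non-negative and fit in five spots")
    code = f"{backs}{tight_ends}"
    if extra_linemen:
        code += str(extra_linemen)
    return code


def receivers_in(code: str) -> int:
    """Infer the number of wide receivers implied by a personnel code."""
    return 5 - sum(int(digit) for digit in code)


print(personnel_code(1, 1), receivers_in("11"))   # one back, one tight end
print(personnel_code(1, 2), receivers_in("12"))   # one back, two tight ends
print(personnel_code(2, 2, extra_linemen=1))      # heavy package with a sixth lineman
```

Because the receiver count is implied by the other digits, a single short code is enough to group plays by package.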

Finally, we need to do the same thing for defensive data. Collect the personnel groupings, find out when a defense has three safeties or two down linemen or is running a straight 4-3. Track the number of defenders in the "box." If we wanted to get fancy, we could try to account for the type of coverage being employed (or at least man versus zone), but it's standard practice to disguise coverages these days, making it a challenge for the layman to accurately track it.

Imagine the leap forward we could see in terms of critical analysis of players: discovering that Tyrod Taylor is more accurate out of the shotgun, or that Jarvis Landry receives more targets when he's running routes against zone coverage. NFL teams might already be doing this, but it's high time that media evaluators adopted the approach.
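Once plays carry split labels, this kind of question reduces to a grouped rate. The following is a rough sketch using only the standard library; the field names (`qb_alignment`, `coverage`, `accurate`) and the rows themselves are made up for illustration and are not real charting data.

```python
from collections import defaultdict


def rate_by_split(plays, split_key, flag_key):
    """Return {split value: (rate, sample size)} for a true/false field."""
    counts = defaultdict(lambda: [0, 0])  # split value -> [flagged, total]
    for play in plays:
        bucket = counts[play[split_key]]
        if play[flag_key]:
            bucket[0] += 1
        bucket[1] += 1
    return {split: (hits / total, total) for split, (hits, total) in counts.items()}


# Invented rows, purely to show the shape of the data.
plays = [
    {"qb_alignment": "shotgun", "coverage": "zone", "accurate": True},
    {"qb_alignment": "shotgun", "coverage": "man", "accurate": True},
    {"qb_alignment": "shotgun", "coverage": "zone", "accurate": False},
    {"qb_alignment": "under center", "coverage": "man", "accurate": False},
    {"qb_alignment": "under center", "coverage": "zone", "accurate": True},
]

for split, (rate, n) in rate_by_split(plays, "qb_alignment", "accurate").items():
    print(f"{split}: {rate:.0%} accurate on {n} attempts")
```

Returning the sample size alongside each rate matters here, since slicing a season into many splits quickly shrinks each bucket.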


Quarterbacks

The bugaboo with NFL scouting is figuring out how to identify good quarterbacks. We have the "eye test" (which uses many different eyes to tell many different stories), we have counting stats (which aren't very good, as I'll explain), and we have results-oriented opinions ("if a quarterback doesn't win in January, he's dead to me"). Every camp has some good points, and every camp thinks the other two are well-meaning, uninformed sheep. So how do we reconcile these opinions with stats?

First and foremost, the existing counting stats don't work. At best, they can tell us over a whole season if a player had good results from his plays. At worst, they tell us someone is playing much better or worse than he really is.

Consider our old friend Ryan Fitzpatrick. Last year, Fitz had a career year with the Jets, and now he's holding out for a big contract. He threw only 15 interceptions against 31 touchdowns, but what the counting stats won't tell you is that Fitz had 15 additional passes that were "interceptable." In other words, they were thrown in the vicinity of a defender who wasn't able to get his hands all the way around the ball for the pick. He also had seven touchdowns that were scored on passes thrown no farther than two yards beyond the line of scrimmage - plays that ask a lot more of the receiver than the quarterback. Were it not for the providence of Chan Gailey's scheme and a dose of luck, he might have been benched somewhere on the way to a season with more interceptions than touchdowns.

Completion percentage largely stabilizes after about 500 attempts, but over the span of a single 35-attempt game it fluctuates too much. Three contested throws could be the difference between a 60% completion rate and one of roughly 51%. Touchdowns tell us only that points were scored; they can't explain whether a pass was accurate or whether the receiver did the hard work.
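The arithmetic behind that swing is easy to check. The sketch below also adds a rough binomial standard error to compare a single game with a 500-attempt season; treating every pass as an independent trial with the same chance of success is a simplifying assumption, not a claim about how real passes behave.

```python
import math


def completion_pct(completions: int, attempts: int) -> float:
    """Completion percentage on a 0-100 scale."""
    return 100.0 * completions / attempts


def standard_error_pct(true_rate: float, attempts: int) -> float:
    """Binomial standard error of completion percentage, in percentage points.

    Assumes independent attempts with an identical success rate, which is
    only a rough approximation of real football.
    """
    return 100.0 * math.sqrt(true_rate * (1.0 - true_rate) / attempts)


game_attempts = 35
base = completion_pct(21, game_attempts)         # 21 of 35 completed
three_fewer = completion_pct(18, game_attempts)  # three contested throws fall incomplete
print(f"{base:.1f}% -> {three_fewer:.1f}%")

for attempts in (35, 500):
    se = standard_error_pct(0.60, attempts)
    print(f"{attempts} attempts: +/- {se:.1f} points")
```

Under those assumptions, a 60% passer's single-game figure carries noise of roughly eight percentage points, against about two over 500 attempts, which is why a one-game completion percentage says so little.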

The work being done by Football Outsiders contributor Cian Fahey is putting us on the right track. We should be recording pass attempts, noting the overall location (accurate or not accurate) regardless of the outcome of the play. Passes that could be intercepted should be noted, and passes that should have been caught but were not should also be recorded. It introduces some judgement calls, but it also creates much more detail, giving us wonderful charts like this:

(Image courtesy of Football Outsiders)

With this, we can understand a detailed view of a quarterback. Where is he accurate? Where does he struggle to deliver the ball? Is he playing in an offense that gives him easy throws? This doesn't show the quarterback's ability to go through progressions or diagnose a defense, but it's an improvement over the status quo - especially if we start thinking about splitting the data on different variables.
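The charting approach described above can be sketched as a per-attempt record that separates the throw from its outcome. The `ChartedPass` fields and the `summarize` helper are hypothetical, not Football Outsiders' actual schema, and the sample attempts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartedPass:
    """One pass attempt, charted by process rather than by result."""
    depth_yards: int     # distance thrown past the line of scrimmage
    accurate: bool       # charter's judgement of ball placement
    interceptable: bool  # a defender had a realistic chance at the ball
    caught: bool
    intercepted: bool


def summarize(passes):
    """Aggregate process-based rates that ignore how each play turned out."""
    n = len(passes)
    if n == 0:
        raise ValueError("need at least one charted pass")
    return {
        "attempts": n,
        "accuracy_rate": sum(p.accurate for p in passes) / n,
        "interceptable_rate": sum(p.interceptable for p in passes) / n,
        # Accurate throws that still fell incomplete.
        "failed_receptions": sum(p.accurate and not p.caught for p in passes),
        "interceptions": sum(p.intercepted for p in passes),
    }


# Invented attempts for illustration only.
sample = [
    ChartedPass(5, accurate=True, interceptable=False, caught=True, intercepted=False),
    ChartedPass(18, accurate=True, interceptable=False, caught=False, intercepted=False),
    ChartedPass(12, accurate=False, interceptable=True, caught=False, intercepted=False),
    ChartedPass(25, accurate=False, interceptable=True, caught=False, intercepted=True),
]

print(summarize(sample))
```

Keeping `interceptable` separate from `intercepted` is the point of the design: it lets the summary credit or blame the quarterback for the throw even when the defender or receiver changed the result.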

Pro Football Focus is doing similar things when charting quarterbacks, receivers, and cornerbacks, but their charts are results-based (completion, incompletion, et cetera), so I put less stock in them.

(Image courtesy Pro Football Focus)

We'll continue the discussion soon with some stat suggestions for running backs, receivers, linemen, and defenders. What do you think is helpful for evaluating quarterbacks?