A look into the Matrix; or how data nerds see the numbers

A look behind the veil of stats-based analysis

One of the widest chasms in professional sports is the one between stats and the eye test. The two sides can coexist peacefully for long stretches, but there’s hell to pay when they don’t agree. As a self-professed data nerd and “person who writes things he’d like you to read,” I figured the slow time of the year was a good moment to show you how I use data to show how smart and cool I am, er, look at the world of football.

If it isn’t on late night, ditch the top ten

One of the main uses of stats is to quantify how far apart teams or players are from each other. The NFL has long thrived on the concept of parity and “any given Sunday,” so it’s nice to be able to tell whether your team is in the pack or not. And if there’s one thing that’s an unreliable measure of how good a team is at something, it’s a top ten list. Here’s a perfect example.

Here’s a chart of points scored per game (or “ppg”) in the 2018 regular season. The Atlanta Falcons are highlighted in red because they were the tenth-best team in this all-important metric. While it’s certainly better to be tenth best than eleventh or worse, take a look at the Oakland Raiders. At 18.1 ppg they’re less than eight points behind Atlanta. Put another way, an average game between the two teams would be decided by a single score (a touchdown plus a two-point conversion).
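For anyone who wants to check that arithmetic, here’s a minimal Python sketch. It uses only numbers from this article: Oakland’s 18.1 ppg, plus an Atlanta figure of 25.9 ppg derived from the 23.4 ppg league average and the 2.5 ppg edge quoted in the next paragraph. The function and constant names are made up for illustration.

```python
# Inputs taken from the article (2018 regular season, points per game).
LEAGUE_AVG_PPG = 23.4
ATLANTA_PPG = LEAGUE_AVG_PPG + 2.5  # derived: tenth-best, 2.5 ppg above average
OAKLAND_PPG = 18.1

# The biggest swing one possession can produce: touchdown (6) + two-point conversion (2).
MAX_ONE_SCORE = 6 + 2


def ppg_gap(a: float, b: float) -> float:
    """Absolute difference in points per game, rounded to one decimal place."""
    return round(abs(a - b), 1)


def is_one_score_gap(a: float, b: float) -> bool:
    """True if a single possession could cover the average-game margin."""
    return ppg_gap(a, b) <= MAX_ONE_SCORE


if __name__ == "__main__":
    print(f"Atlanta vs. Oakland: {ppg_gap(ATLANTA_PPG, OAKLAND_PPG)} ppg apart")
    print("Within one score:", is_one_score_gap(ATLANTA_PPG, OAKLAND_PPG))
```

Run as-is, it reports a 7.8 ppg gap, inside the 8-point ceiling of a single score.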

Still not convinced that the top ten isn’t super meaningful? The league average was 23.4 ppg. That means the tenth-best team in the league was only 2.5 ppg better than perfectly average, less than a field goal per game. There are 16 teams within a field goal of perfectly average, and within that cluster there’s less than a touchdown separating the top of the pile from the bottom.
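The field-goal cluster can be sketched the same way. This is a hypothetical helper run on made-up team numbers, not the actual 2018 table. It also shows why the spread inside the cluster can’t reach a touchdown: if every team sits within 3 points of the average, the top and bottom can be at most 6 points apart.

```python
FIELD_GOAL = 3.0
TOUCHDOWN = 7.0


def within_band(ppg: dict[str, float], center: float, band: float = FIELD_GOAL) -> dict[str, float]:
    """Teams whose points per game fall within `band` of `center` (inclusive)."""
    return {team: pts for team, pts in ppg.items() if abs(pts - center) <= band}


def spread(ppg: dict[str, float]) -> float:
    """Gap between the highest- and lowest-scoring team in the mapping."""
    return max(ppg.values()) - min(ppg.values())


if __name__ == "__main__":
    # Hypothetical teams and numbers for illustration only (not 2018 data).
    sample = {"Team A": 27.0, "Team B": 25.5, "Team C": 23.0, "Team D": 21.0, "Team E": 17.5}
    cluster = within_band(sample, center=23.4)
    print(sorted(cluster))  # ['Team B', 'Team C', 'Team D']
    # Every member is at most 3 points from the center, so the spread is at most 6.
    print(spread(cluster), spread(cluster) < TOUCHDOWN)
```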

Don’t get me wrong, a spot in the top ten is still a good thing, and the statistics-based view doesn’t negate that fact. Instead, it tries to place a team more precisely against the rest of the league.

Defining moments

One of the bigger contributing factors in the age-old debate of “to math or not to math” is the issue of defining measurements. Or rather, measurements that never get defined. This isn’t universal in statistics, but as a data nerd I know all too well that once we get going, we try to tweak and refine measurements to improve accuracy. That leads us down rabbit holes that start at yards per game and end in “ANY/A.” Let’s pick on myself for a moment.
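To show what that rabbit hole looks like, here’s a sketch of the trip from plain yards per attempt to ANY/A (adjusted net yards per attempt), using the weights Pro-Football-Reference applies for ANY/A: +20 yards per touchdown, -45 per interception, sack yardage subtracted, and sacks added to the attempt count. The class and function names and the season line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassingLine:
    """Season passing totals for a quarterback or team."""
    attempts: int
    yards: int
    touchdowns: int
    interceptions: int
    sacks: int
    sack_yards: int


def yards_per_attempt(p: PassingLine) -> float:
    """Y/A: the simple version, passing yards divided by attempts."""
    return p.yards / p.attempts


def adjusted_net_yards_per_attempt(p: PassingLine) -> float:
    """ANY/A: +20 yards per TD, -45 per INT, minus sack yardage,
    divided by attempts plus sacks (so sacks count as dropbacks)."""
    numerator = p.yards + 20 * p.touchdowns - 45 * p.interceptions - p.sack_yards
    return numerator / (p.attempts + p.sacks)


if __name__ == "__main__":
    # Hypothetical season line, for illustration only.
    season = PassingLine(attempts=550, yards=4000, touchdowns=30,
                         interceptions=10, sacks=30, sack_yards=200)
    print(f"Y/A:   {yards_per_attempt(season):.2f}")  # 7.27
    print(f"ANY/A: {adjusted_net_yards_per_attempt(season):.2f}")  # 6.81
```

Each extra term makes the number more accurate, and each one is another thing the reader has to have explained.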

This is an extreme example, but it represents what I mean quite well. What you’re looking at is a small slice of the penalty-harm workbook. Information is pulled from the official play-by-play (the “time” column through “information on the play”) into the columns to the right, and the harm is calculated automatically. And this is only the data-dump page. All of this data is pulled into pivot tables, which let me adjust the information I want to see. Penalty harm by phase of game? You got it. Yards negated by player? Not a problem.
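A pivot table is essentially a group-and-sum. Here’s a minimal Python sketch of that idea. The harm formula itself isn’t published here, so the sketch assumes each penalty row already carries a computed `harm` value; the field names (`phase`, `player`, `harm`, `yards_negated`) and the sample rows are hypothetical.

```python
from collections import defaultdict


def pivot_sum(rows: list[dict], by: str, value: str) -> dict:
    """Group rows by the `by` field and total the `value` field,
    which is what a one-dimensional pivot table does."""
    totals: dict = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[value]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical penalty log; `harm` is assumed to be precomputed upstream.
    penalties = [
        {"phase": "offense", "player": "Player 1", "harm": 2.0, "yards_negated": 15},
        {"phase": "defense", "player": "Player 2", "harm": 1.5, "yards_negated": 0},
        {"phase": "offense", "player": "Player 3", "harm": 0.5, "yards_negated": 22},
        {"phase": "special teams", "player": "Player 1", "harm": 1.0, "yards_negated": 10},
    ]
    print(pivot_sum(penalties, by="phase", value="harm"))
    print(pivot_sum(penalties, by="player", value="yards_negated"))
```

Swapping the `by` argument is the code equivalent of dragging a different field into the pivot table’s rows.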

What can be a problem is that none of this is accessible. This page and many like it are never seen by the reader, and the formula embedded in the cells is a ghost to everyone but me. I used to include a brief explanation in the penalty recap articles, but it bogged down the narrative something fierce. So now, if you don’t know how it’s calculated, you gotta ask.

Which brings me to the main point: Never hesitate to ask. There are a lot of really good stats out there that need an explanation. The bigger beauty is that once you have one, you know what the statistic is really measuring...and also its limitations. There are always limitations.

One stat to rule them all, and on the scoreboard bind them

Here’s the most important lesson, one that in my opinion everyone needs to hear from time to time. There’s only one stat that matters at the end of the day, and it’s called “the score.” Every other statistic and metric is just an explanation of how that final result came about. And, because of course we do, we have an example.

Looking at that chart is a lot like looking at a final score. No matter how you slice it, the Buffalo Bills were the seventh-worst team in sacks during the 2018 season. In case you were wondering, the average was 39, so they weren’t too far below. “Slightly worse than average” is not where you’d like to be, though, especially in a measurement we all understand to have a fairly significant impact on the game. Stat nerds love to jump in with “well, actually...” on stuff like this. I know I’ve done it to many of you. I’d apologize, but I’m not sorry in the least.

So are the Bills bad at getting to the quarterback? Well, actually...they’re slightly above average, not below. Kind of. When it comes to sack efficiency, they’re just a tiny tick above the league average. How can they be above average yet produce a below-average result? Because the sack total is a team stat (like the score), shaped by everything that happens in a game, while sack efficiency measures only the defense, and only on passing downs.

If a team is getting blown out, its opponent tends to run the ball more, which leaves fewer opportunities for a sack. Likewise, if opponents are burning clock, or the defense routinely has to defend a short field, there are fewer plays to defend and, again, fewer chances at a sack.
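Here’s a minimal sketch of the rate-versus-total distinction. The exact sack-efficiency formula isn’t spelled out in this article, so this assumes one common definition, sacks per opponent dropback (pass attempts plus sacks). The function name and both defenses are hypothetical.

```python
def sack_rate(sacks: int, opponent_pass_attempts: int) -> float:
    """Sacks per opponent dropback, where dropbacks = pass attempts + sacks.
    This is an assumed definition of 'sack efficiency' for illustration."""
    dropbacks = opponent_pass_attempts + sacks
    return sacks / dropbacks


if __name__ == "__main__":
    # Two hypothetical defenses with identical efficiency but different workloads.
    busy = sack_rate(sacks=39, opponent_pass_attempts=611)  # 650 dropbacks faced
    quiet = sack_rate(sacks=30, opponent_pass_attempts=470)  # 500 dropbacks faced
    print(f"{busy:.3f} vs {quiet:.3f}")  # 0.060 vs 0.060
```

Same rate, yet one defense finishes with nine more sacks simply because it faced 150 more dropbacks.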

What it all really means is this: while the Buffalo Bills defense was on the field and facing passing situations, it was all right at getting sacks. But as a team, the Bills really were kinda bad at it.

In the end, there are as many ways to enjoy the game as there are fans. Statistics can be an interesting bridge between what we see and the inner workings of the NFL. If you’re into that sort of thing, like I am, consider this article an invitation to come explore with me.