A Stathead's Lament

I have a problem with analytics.

Maybe you do too.

My problem is a lack of data. You may be thinking, "How can there be a lack of data? ESPN can tell us every quarterback who has thrown a touchdown with two minutes left in the game while eating a cheeseburger!" However, JaMarcus Russell and Jared Lorenzen aren't exactly what I'm looking for.

The problem isn't a lack of data - it's a lack of the right data. I think that football has a market inefficiency, but I don't know what it is. Time and again I see arguments based on the same stats: yards, touchdowns, interceptions, sacks. But baseball did the same thing for a hundred years, with batting average, RBIs, and ERA leading the way. And now we know better.

There are some contenders out there - stats that people are claiming solve the problem of analytically deciding who the good football players are. But I haven't bought in yet. I don't think the data they're pulling from is good enough.

Let's start with the contender for "best Moneyball stat" that I have the biggest issue with: Pro Football Focus's plus-minus system, and their "signature stats" like Elusive Rating. For the uninitiated, PFF has a two-pronged approach to analytics. The plus-minus system consists of the PFF analysts watching the full game tape for every player, guessing what the player was supposed to do during a play, deciding if he did it, and giving him a positive or a negative score for that play. Add them up, and you get an overall score that tells you exactly how good he is relative to all other players.
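
The arithmetic side of the plus-minus system is about as simple as it gets. Here's a minimal Python sketch, assuming each play has already been graded by an analyst. The `PlayGrade` structure, the player label, and the grades themselves are hypothetical; PFF doesn't publish its grading code, and its real scale may well differ.

```python
from dataclasses import dataclass


@dataclass
class PlayGrade:
    # One analyst judgment: a player and the grade he got for a single play.
    # Hypothetical structure for illustration, not PFF's real data format.
    player: str
    grade: float  # positive = did his job, negative = didn't, 0 = neutral


def plus_minus(grades: list[PlayGrade]) -> dict[str, float]:
    """Add up each player's per-play grades into one overall score."""
    totals: dict[str, float] = {}
    for g in grades:
        totals[g.player] = totals.get(g.player, 0.0) + g.grade
    return totals


# Hypothetical grades for three snaps by one left tackle.
sample = [
    PlayGrade("LT #71", +1.0),
    PlayGrade("LT #71", -0.5),
    PlayGrade("LT #71", 0.0),
]
print(plus_minus(sample))
```

The summing is trivial. All the real work lives in the `grade` field, and the judgment call that produces each of those numbers is exactly where my objection comes in.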

It's a great system in theory. But it's not really analytics - there are no statistics involved. The entire process is a judgement call on the part of the analyst. If a QB throws to a spot where no receiver is waiting and the ball is intercepted, whose fault is it? Did the WR run the wrong route, or did the QB anticipate the wrong route? PFF doesn't know. Likewise, they have no way to know whether a corner got beaten deep or was simply passing his receiver off to a zone safety who never showed up.

Meanwhile, tracking specific stats is an admirable idea, but I haven't seen any analysis from PFF explaining exactly HOW they came up with the formulas for stats like Elusive Rating. The fact that CJ Spiller was posting an Elusive Rating in the 400s at midseason, when PFF expected their scale to run from 0 to 100, tells us two things: one, Spiller is freaking awesome, and two, the formula wasn't tested very rigorously.

Another contender for Moneyball is Football Outsiders, with their stat DVOA (Defense-adjusted Value Over Average). I like it, but I also don't like it. The idea of DVOA is to replicate something like the concept of Wins Above Replacement (WAR) in baseball: a stat that judges how good a player's performance is relative to what the league average would be in a similar situation, adjusted for the quality of the defense faced. Why is it nice? It's built on real math, which I like, and they have a good explanation of their system on the site. It also takes a pre-existing stat (yards) and adds more dimensions (down and distance, time left) to get a purer result. Great!
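
Football Outsiders' real DVOA method is considerably more involved than anything I'd sketch here, so treat this only as a toy illustration of the "value over average" idea. It assumes raw yards as the value of a play, a league baseline bucketed by down and distance, and a made-up multiplier per defense for the adjustment. None of these names, buckets, or numbers come from Football Outsiders.

```python
from collections import defaultdict

# A play: (down, yards to go, yards gained, defense faced). Toy format, assumed.
Play = tuple[int, int, float, str]


def situation(down: int, to_go: int) -> tuple[int, str]:
    """Bucket a play by down and a coarse distance range (buckets are assumed)."""
    if to_go <= 3:
        dist = "short"
    elif to_go <= 7:
        dist = "medium"
    else:
        dist = "long"
    return (down, dist)


def league_baseline(plays: list[Play]) -> dict[tuple[int, str], float]:
    """Average yards gained in each situation across the league sample."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for down, to_go, yards, _defense in plays:
        key = situation(down, to_go)
        totals[key] += yards
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def value_over_average(
    player_plays: list[Play],
    baseline: dict[tuple[int, str], float],
    defense_factor: dict[str, float],
) -> float:
    """Percent above (or below) what an average play gains in the same spots.

    defense_factor scales the expectation: above 1.0 means a generous defense
    (expect more yards), below 1.0 a stingy one. Purely an assumed adjustment.
    """
    actual = 0.0
    expected = 0.0
    for down, to_go, yards, defense in player_plays:
        actual += yards
        expected += baseline[situation(down, to_go)] * defense_factor.get(defense, 1.0)
    return (actual / expected - 1.0) * 100.0


# Made-up league sample and a made-up player.
league = [(1, 10, 4.0, "A"), (1, 10, 6.0, "B"), (3, 2, 1.0, "A"), (3, 2, 3.0, "B")]
baseline = league_baseline(league)
defense_factor = {"A": 0.9, "B": 1.1}  # hypothetical: A is stingier than average
player = [(1, 10, 7.0, "A"), (3, 2, 2.0, "A")]
print(round(value_over_average(player, baseline, defense_factor), 1))
```

In this made-up sample the player's 9 yards rate about 28.6% over average unadjusted, and about 42.9% once you account for both plays coming against the stingier defense "A". Notice that nothing in the data says who on the field actually earned those yards.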

However, it's really only useful for judging a team as a whole. There is still no way to know whether a running back who gains 8 yards on 3rd and 6 was running behind a beefy line or doing it all by himself on a desperation play. You can claim to have an idea of the value a running back or quarterback brings to the offense, and over time you might have something. But the stat mostly leaves individual defenders and linemen out of the equation - it's incomplete.

Speaking of sites that try to keep track of the "expected return" on any given play, Advanced NFL Stats has been developing stats based on their accumulated "Win Probabilities." The basic idea is that every game has an outcome, and every play in that game increases or decreases each team's chance of winning. Build a big enough database, and you can know exactly how much that 4-yard gain in the 3rd quarter increased your chance of victory. The two stats, Win Probability Added and Expected Points Added, try to estimate how helpful a player is over the course of a season. Again, this sounds like a great idea in theory. But it's mostly incomplete, because of the lack of charting. How do we know what contributed to a linebacker's tackle? We need more data.
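
Once you have a win probability model, the bookkeeping behind Win Probability Added is just subtraction: a play's WPA is the win probability after the play minus the win probability before it, credited to a player. Here's a minimal sketch that assumes the probabilities already come from some model; the plays, numbers, and crediting are hypothetical, not Advanced NFL Stats' actual model. Expected Points Added follows the same pattern, with expected points in place of win probability.

```python
from collections import defaultdict


def win_probability_added(
    plays: list[tuple[str, float, float]],
) -> dict[str, float]:
    """Credit each play's change in win probability to the listed player.

    Each play is (player, wp_before, wp_after), with probabilities in [0, 1]
    assumed to come from an external win-probability model.
    """
    totals = defaultdict(float)
    for player, wp_before, wp_after in plays:
        totals[player] += wp_after - wp_before
    return dict(totals)


# Hypothetical plays: a 4-yard 3rd-quarter gain, a sack, a long catch.
plays = [
    ("RB", 0.55, 0.57),
    ("QB", 0.57, 0.52),
    ("WR", 0.52, 0.61),
]
for player, wpa in win_probability_added(plays).items():
    print(f"{player}: {wpa:+.2f}")
```

The subtraction is the easy part. Here the sack is charged entirely to the quarterback, and that's exactly the kind of call that needs charting data to make fairly.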

So this is the problem I've been running into:

  • I love the idea of Moneyball. Discover something undervalued and maximize its effectiveness. Trust the stat sheet, not what a workout or the film says. Choose a reliable prospect over the boom-or-bust one. Research new ideas and strategies, even if everyone else says they're crazy.
  • I don't like the currently available stats, mostly stemming from a lack of charting. While yards and TDs are how you score points, without context they're useless. I need context! Game situation! Timing! Direction! What defenders were involved! The more info, the better.
  • I WANT to see some good stats - the dream is being able to look at a spreadsheet of college players, pick a name out of it, and say, "This is the running back we want to draft. He may have gone to Albuquerque University, but his 'xxxxxx' is the highest in the class, adjusted for competition."
  • I don't like that the stats we have can't isolate individual performances. For example, a running back judged on TDs will always be held back by a coach who pulls him in goal-line situations, and one judged on YPC will be held back by a bad line. A quarterback judged on completion percentage will be held back by receivers with butterfingers.

Here are some general thoughts I've had on exploring different analytical ways to measure a player's abilities:

  • For running backs: There are two areas that I think are helpful for a running back - consistent success (the back who always gets positive yardage) and explosiveness (the back who can break open an 80-yard run). I think Yards After Contact (or something involving the number of broken tackles) is a good place to start. Along with that, some sort of stat that compares the frequency of long runs to all runs.
  • For quarterbacks: I think Yards Per Attempt is actually one of the strongest stats we have available today. There's not a lot of complexity behind it, but it tends to correlate very strongly with good QBs. When I'm looking at a good QB, the things I want most are the ability to consistently gain yards, and the ability to limit turnovers. YPA tends to handle the first one very well - tweaking the stat might help it take into account the second.
  • For linemen: I don't know where to begin on that one. That is where the challenge of game charting comes in - to grade linemen analytically, you need to know in data form what they're doing on the field in each play.
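
For the running back and quarterback ideas, here's a rough Python sketch. The running back numbers assume charting data that records yards after contact on every carry (the part that's hard to get), and the 20-yard cutoff for a "long run" is an arbitrary assumption of mine. For the quarterback tweak, it borrows the Adjusted Yards per Attempt formula Pro-Football-Reference publishes, which adds 20 yards per touchdown and subtracts 45 per interception: (yards + 20 × TD − 45 × INT) / attempts. The `Carry` structure and all the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Carry:
    # Hypothetical charting record for one run.
    yards: int                # total yards gained on the run
    yards_after_contact: int  # yards gained after first contact (needs charting)


def positive_run_rate(carries: list[Carry]) -> float:
    """Consistency: share of carries that gained positive yardage."""
    return sum(1 for c in carries if c.yards > 0) / len(carries)


def yards_after_contact_per_carry(carries: list[Carry]) -> float:
    """Average yards the back gains after first contact, per carry."""
    return sum(c.yards_after_contact for c in carries) / len(carries)


def long_run_rate(carries: list[Carry], cutoff: int = 20) -> float:
    """Explosiveness: share of carries that went for `cutoff` yards or more.

    The 20-yard default is an assumption, not an established threshold.
    """
    return sum(1 for c in carries if c.yards >= cutoff) / len(carries)


def adjusted_yards_per_attempt(yards: int, tds: int, ints: int, attempts: int) -> float:
    """Plain YPA plus a touchdown bonus and an interception penalty (AY/A)."""
    return (yards + 20 * tds - 45 * ints) / attempts


# Hypothetical four-carry sample and a hypothetical QB season.
carries = [Carry(3, 1), Carry(-1, 0), Carry(25, 12), Carry(5, 3)]
print(f"Positive-run rate: {positive_run_rate(carries):.2f}")
print(f"YAC per carry:     {yards_after_contact_per_carry(carries):.2f}")
print(f"Long-run rate:     {long_run_rate(carries):.2f}")
print(f"YPA:  {4000 / 500:.2f}")
print(f"AY/A: {adjusted_yards_per_attempt(4000, 25, 10, 500):.2f}")
```

Splitting "consistency" and "explosiveness" into separate rates keeps one big run from hiding a string of stuffs, which plain YPC can't do.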

What thoughts do you have on stats in general? Any stats you do like? Don't like? Do you believe in Moneyball?
