I’ve been thinking a lot about baseball player projection systems lately, which is just a fancy way of saying I’ve been sick and unable to play 3D video games without feeling nauseous or write creatively so my brain had to go off and do something dumb. A few days ago, Fangraphs published Dan Szymborski’s 2015 ZiPS projections for the St. Louis Cardinals. If you follow enough of Cardinals/sabermetric Twitter you know that Szymborski took issue with a particular Cardinal blogger who questioned the necessity of these projections and made some fundamental mistakes regarding the ZiPS process. Piling on Cardinals fans is a national pastime for some reasons we don’t bring on ourselves (the media’s terrible Best Fans in Baseball Narrative) and some reasons we do bring on ourselves (I can’t even look at Cincinnati on the map without muttering “kiss the rings”) so the blog post was passed around, ridiculed, and pulled.
Social media drama is the last thing I ever want to care about, but the argument got me thinking. First off, I respect all the hard-as-hell mathematical work that goes into developing projections. I couldn’t do it. I wouldn’t even know where to start learning how to do it. Second, there is clearly an audience for projections–as demonstrated by the anticipation leading up to the ZiPS reveal. So it’s cool someone is putting in the hard work.
But what do good projections really tell us? I haven’t been able to answer that question and it has stuck with me through a haze of cold medicine. This isn’t just about ZiPS, or STEAMER, or PECOTA, or any projection system in particular but about the concept of projection systems in general. What information are they providing?
“They give us a better idea of how players will perform!” you are shouting at your screen while you add my name to a list that includes Murray Chass. And you’re right, that’s exactly what they do. Maybe.
Bear with me on a thought experiment (groan, I know) while I consider two hypothetical projection systems: the PERFECT system and the BEST system.
The PERFECT system: The PERFECT system correctly and accurately projects player performance. As indicated by its name, it gets nothing wrong. In December of 2013, the PERFECT system projected Matt Carpenter to get 709 plate appearances and a .272/.375/.375 triple slash. How does it do this? I dunno. Let’s say that, to borrow heavily from the film Interstellar, it tracks batted balls in the future by the minute changes in gravitational fields as they travel backwards through space-time.
Now, the PERFECT system trivializes baseball in a lot of ways and calls into question important concepts like free will and predestination. But it also does one thing really well: it’s the only projection system in the goddamn universe that predicts Allen Craig will have a .266 BABIP in 2014. It’s also the only one that sees Pat Neshek coming. The really weird stuff–the stuff that has a lot of value to predict–is only truly caught by the PERFECT system.
“That’s not fair!” you reply and move my name above that of noted blogger Murray Chass. “You can’t compare projection systems to literally seeing into the future!” Well I can because this is the internet and on the internet you can advocate for things as crazy as seceding New Hampshire from the Union or SEGA producing Shenmue 3. Also, I need something to compare with the next system.
The BEST system: The BEST system is a bit more realistic. This projection model is top-of-the-line. Using all that math I don’t understand, it provides the most precise predictions possible without any knowledge of the future. I think we can all admit that (Interstellar notwithstanding) there is no way to measure all the random shit that happens in a baseball season. And as someone who watched the Cardinals bat .330 for an entire season with RISP, I know for a fact that the sample size of an entire season isn’t enough to weed out all that random shit.
What the BEST system does, however, is successfully weed out all the random shit in the past stats, and uses that to provide an exquisite shit-free stat line for every player in the upcoming season. The BEST system is so good, its creators boast, that if the 2015 season were to be played 1000 times and the results averaged together, the numbers would be exactly what the BEST system projected for them. This seems like a crazy boast, but the cast of the television series Sliders (which is still running in at least one universe) confirms that it is true. The BEST system is just that good.
Every year, when you run the numbers, the BEST system is going to be named the most accurate projection system. In aggregate, that will be true. But what about each individual player? Sure, the BEST system will be the system most likely to come the closest to the real numbers. But, by design, it will staunchly be unable to identify an outlier. That’s not a bug. It’s a feature of a good projection system.
Remember how I said that playing the season 1000 times would result in averages that equal the BEST system projections? And how great that was? The problem is that 1 of those seasons is going to give you the PERFECT system projections. And then the other 999 seasons are going to drag that pinpoint accurate projection straight to the average.
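If you’d rather see the 1000-seasons idea than take the Sliders cast’s word for it, here is a rough Python sketch. Everything in it is made up for illustration: a hypothetical hitter with a .300 true-talent BABIP, 400 balls in play per season, and every ball in play treated as an independent coin flip. This is not how ZiPS or any real projection system works. It only shows how an average over many seasons sits in the middle while individual seasons wander far away from it.

```python
import random

# All numbers below are assumptions for illustration, not real player data.
TRUE_TALENT = 0.300    # hypothetical true-talent BABIP
BALLS_IN_PLAY = 400    # hypothetical balls in play per season
SEASONS = 1000         # how many times we "replay" the season


def simulate_babip(true_talent, balls_in_play, rng):
    """Play one season: each ball in play is a hit with probability true_talent."""
    hits = sum(rng.random() < true_talent for _ in range(balls_in_play))
    return hits / balls_in_play


rng = random.Random(2015)  # fixed seed so the run is repeatable
seasons = [simulate_babip(TRUE_TALENT, BALLS_IN_PLAY, rng) for _ in range(SEASONS)]

# The BEST system's projection is the average over all the replayed seasons.
best_projection = sum(seasons) / SEASONS

# The weird seasons are still in there; the average just never points at them.
worst_season = min(seasons)
best_season = max(seasons)
biggest_miss = max(abs(s - best_projection) for s in seasons)

print(f"BEST projection: {best_projection:.3f}")
print(f"Worst season:    {worst_season:.3f}")
print(f"Best season:     {best_season:.3f}")
print(f"Biggest miss:    {biggest_miss:.3f}")
```

The averaged projection lands very close to the true-talent number, but the worst and best seasons sit well away from it. Any one of those seasons could be the one that actually gets played, and the average will never pick it out ahead of time.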
What I’m saying is this: the problem with the BEST system is that it’s incredibly conservative. It will predict a decline from Allen Craig, yes, but not because it knows he will turn into a pumpkin. It is because his 2013 was also an outlier. The BEST system will never predict a collapse. Similarly, it will look at everything about Pat Neshek and spit out some mediocre numbers, because of course it will. No one could have seen that coming (and no one should be expected to).
This conservative nature is the problem with any good projection system, because conservative predictions aren’t terribly interesting. With the exception of minor leaguers, the BEST system as described above isn’t going to tell you a lot you couldn’t glean from a glance at the player’s age and MLB stat history. Which is a shame, because developing something like the BEST system that is so (in aggregate) accurate would be an incredible mathematical achievement. It just wouldn’t tell us much we didn’t already know about current MLB players.
This is why the really fascinating stuff in the ZiPS projections for the Cardinals isn’t, say, Matt Holliday’s numbers or Adam Wainwright’s numbers. Someone taking a wild guess or simming the year in MLB: The Show could come up with a .275/.348/.456 triple slash for Holliday. I don’t mean this as an insult to ZiPS, which of course is way more work than that, and will be more accurate for more players. But a conservative prediction that Matt Holliday will continue a gradual decline is, well, not exactly a revelation. And any good projection system will likely come to a similar, conservative result.
The interesting stuff in ZiPS is the projections for guys like Ty Kelly (.254/.333/.358) or Samuel Tuivailala (3.29 ERA, 28.3 K%). Kelly is a journeyman utility infielder with no MLB time projected to be about as good as Kolten Wong. Tuivailala is a converted position player who rocketed through the system in two years on the strength of a 99 mph fastball. Obviously, a system that identifies guys like these who can be immediately productive at the MLB level would be very valuable. Maybe the BEST system as described above would do that, but the problem is that these projections–which are truly interesting, and the reason I like looking at ZiPS–are the most difficult to verify as reliable. Kelly’s numbers are based on the idea he receives 550+ PAs and god help the St. Louis Cardinals if injuries force the team into that situation.
While I like to look at projections and I respect the hell out of the work that goes into them, I’m sympathetic to the argument baseball old-timers put forward that they are meaningless. The more accurate a projection system gets, the less it tells us that we didn’t already know.
Of course, projection systems published on the internet are mostly created to give us something to talk about in the off-season and I just wrote 1000+ words about them. So maybe I’ve already lost any argument I was trying to put forth.