Yesterday we studied strikeouts/game and runs/game for teams and found a “correlation coefficient,” which is a number between -1 and 1. A value of 1 means there is a perfect positive correlation, like the temperature in Fahrenheit and the temperature in Celsius. A value of -1 means there is a perfectly negative correlation, like the amount I spend and my checking account balance. And 0 means there is no correlation, like the amount I spend and the temperature in Celsius.
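For anyone who wants to see those two extremes fall out of the arithmetic, here is a minimal sketch of the standard Pearson correlation coefficient. The function name `pearson_r` and all the sample numbers are made up for illustration; this is not the spreadsheet used for the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only.
celsius = [-10.0, 0.0, 15.0, 25.0, 40.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

spent = [0, 50, 120, 300, 450]
balance = [1000 - s for s in spent]  # assumes a hypothetical 1000 starting balance

r_temp = pearson_r(celsius, fahrenheit)  # perfectly positive relationship
r_money = pearson_r(spent, balance)      # perfectly negative relationship

print(f"Celsius vs. Fahrenheit: {r_temp:+.3f}")
print(f"Spending vs. balance:   {r_money:+.3f}")
```

Any pair of lists where one is an exact straight-line function of the other will land on 1 or -1; real baseball data never does.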
Unfortunately, the number we found was not 1, 0 or -1. It was .54. So what does that mean?
It means it is somewhere in between. Our number shows that strikeouts aren’t everything, but it also shows that they’re something. Can we find something comparable?
We can if we look at a different set of correlations. The most obvious place to start is in another realm of baseball – hitting. If we do the same exercise – comparing the runs per game a team scores to its basic stats for 150 recent teams – what kind of correlations do we see?
(The full results are at the bottom. Also, here is a link to the data.)
The strongest is what you might expect – OPS, which has a .96(!) correlation. In fact, it is this crazy high correlation that drives the interest in OPS. The stats which make up OPS – OBP and SLG – also have high correlations: .87 and .92, respectively.
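As a reminder of how those three stats fit together, here is a small sketch using the standard definitions: OBP is times on base divided by plate appearances that count toward it, SLG is total bases per at-bat, and OPS is simply their sum. The stat line below is a hypothetical hitter, not anyone from the data set.

```python
def obp(hits, walks, hbp, at_bats, sac_flies):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slg(hits, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases divided by at-bats."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season line, for illustration only.
line = dict(hits=150, walks=60, hbp=5, at_bats=550, sac_flies=5,
            doubles=30, triples=3, home_runs=20)

player_obp = obp(line["hits"], line["walks"], line["hbp"],
                 line["at_bats"], line["sac_flies"])
player_slg = slg(line["hits"], line["doubles"], line["triples"],
                 line["home_runs"], line["at_bats"])
player_ops = player_obp + player_slg  # OPS = OBP + SLG

print(f"OBP {player_obp:.3f}  SLG {player_slg:.3f}  OPS {player_ops:.3f}")
```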
The most widely used traditional stats for evaluating offense fall a little lower down the list. Batting average is .76. Home runs are .70. I even worked out HR/AB and HR/PA and they ranked a little lower: .67 and .65.
We still haven’t found the stats that have a correlation close to the .54 that K/9 has to runs given up by a pitching staff. The stats closest to that level are at-bats, walks and doubles, each of which has a correlation around .60. Converting the last one to a rate statistic, I find that doubles/at-bat has a correlation of .547, which is almost dead on.
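The exercise itself is easy to reproduce. Below is a rough sketch of the approach: correlate each team-level stat with runs per game, including a counting stat converted to a rate (doubles per at-bat), then rank the results. The six teams and every number in them are invented, so the correlations it prints will not match the ones quoted here; the real study used 150 teams.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# MADE-UP team seasons, for illustration only.
teams = [
    {"team": "A", "games": 162, "runs": 610, "at_bats": 5450, "doubles": 250, "home_runs": 140},
    {"team": "B", "games": 162, "runs": 655, "at_bats": 5480, "doubles": 262, "home_runs": 150},
    {"team": "C", "games": 162, "runs": 700, "at_bats": 5510, "doubles": 275, "home_runs": 170},
    {"team": "D", "games": 162, "runs": 735, "at_bats": 5530, "doubles": 268, "home_runs": 185},
    {"team": "E", "games": 162, "runs": 780, "at_bats": 5560, "doubles": 301, "home_runs": 200},
    {"team": "F", "games": 162, "runs": 820, "at_bats": 5600, "doubles": 290, "home_runs": 215},
]

# The stat every other column is compared against.
runs_per_game = [t["runs"] / t["games"] for t in teams]

# Counting stats plus one rate stat derived from them.
candidates = {
    "at_bats": [t["at_bats"] for t in teams],
    "doubles": [t["doubles"] for t in teams],
    "home_runs": [t["home_runs"] for t in teams],
    "doubles_per_ab": [t["doubles"] / t["at_bats"] for t in teams],
}

correlations = {name: pearson_r(values, runs_per_game)
                for name, values in candidates.items()}

# Rank by strength of the relationship, ignoring sign.
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, r in ranked:
    print(f"{name:>15}: {r:+.3f}")
```

Ranking by absolute value matters on the pitching side, where a stat like strikeouts should move in the opposite direction from runs given up.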
So we might want to evaluate pitchers by strikeouts about the same way we evaluate batters by doubles. For instance, given a choice between two players, one who hits a lot of doubles, and one who doesn’t, we probably want the guy with the doubles. We also might mention how many doubles a player has as a data point to demonstrate that they have extra power. Doubles are far from a worthless item to track.
But here is probably what we wouldn’t do. We wouldn’t say a free agent is worthless because he was one of the worst at his position in doubles. We probably wouldn’t comb through an organization’s minor league affiliates and suggest that their hitting philosophy is messed up because none of their teams are hitting a lot of doubles. And we wouldn’t suggest that a team scoring lots of runs won’t be able to maintain its pace because it ranks dead last in the league in doubles.
Yet we do all of that when talking about strikeouts and pitching.
The bottom line is that there isn’t a real clean break point here. Strikeouts are important. They might even be the most important stat we can easily evaluate for pitchers, due to a combination of impact and predictability (though we haven’t studied the latter).
But it is exceptionally easy to get carried away with strikeouts, and I think most of the sabermetric community has, including me. It may be time to step back and admit what we don’t know. And acknowledge that clean numbers aren’t always so clean.
Below is a complete list of the correlations we have found, both for hitting and for pitching. The hitting ones are compared to runs scored per game, while the pitching ones are compared to the runs given up per game.