Thursday, August 7, 2008

Mistakes Per Inning Pitched

For a long time, I've believed that one of the most valuable statistics in evaluating a pitcher's performance was WHIP. WHIP gives you an idea of just how often a pitcher allows a runner to reach base, which is a great thing to know. There's just one thing that I never liked about it: WHIP uses hits as one of its factors.

It's a widely accepted belief among sabermetricians that once a ball is put in play the pitcher has little control over whether that ball drops for a hit or not. This led to the development of BABIP, or batting average on balls in play, by Voros McCracken in 1999.

Thus, my problem with WHIP. It's a good metric for evaluating how many baserunners a pitcher allows, but you can't really say that it's all the pitcher's fault. Because hits are one of the three factors in WHIP, a good portion of the eventual statistic is due to defense, not pitching.

ERA has several "defense independent" counterparts, such as DICE and dERA, but there's not really a "defense independent" version of WHIP, a statistic designed, more or less, to show how many baserunners a pitcher allows.

Earlier today when I was at work, I was thinking about baseball statistics, and WHIP in particular. I tried to think of a method that would be simple to calculate and would also produce a defense-independent statistic that serves the original purpose of WHIP: how many baserunners did a pitcher allow?

At first, what I came up with was exceedingly simple, and based on the formula of WHIP itself:

(BB + HBP + HR) / IP

This would, indeed, show exactly how many runners reached base independent of the defense. From there, you could subtract that total from the pitcher's WHIP to see how much the defense had to do with the number of baserunners the pitcher allowed. In that sense, I had achieved the goal.
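
As a rough sketch of the arithmetic described above, the snippet below computes the simple rate and what is left of WHIP once that rate is subtracted. The function names and the sample stat line are hypothetical, invented purely for illustration, and do not belong to any real pitcher.

```python
def di_baserunners_per_inning(bb, hbp, hr, ip):
    """(BB + HBP + HR) / IP: baserunners the defense had no chance to prevent."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (bb + hbp + hr) / ip


def whip(bb, hits, ip):
    """Standard WHIP: (walks + hits) / innings pitched."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (bb + hits) / ip


# Hypothetical stat line, made up for illustration only.
bb, hbp, hr, hits, ip = 50, 5, 20, 200, 200.0

di_rate = di_baserunners_per_inning(bb, hbp, hr, ip)
gap = whip(bb, hits, ip) - di_rate
print(round(di_rate, 3), round(gap, 3))
```

Keep in mind that WHIP leaves out hit batters and counts home runs as hits, so the difference is only a rough indicator of the defense's share rather than an exact split.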

But that didn't seem quite right to me: obviously, a home run costs the team more than a walk. While a walk allows a baserunner, a home run always scores at least one run. I decided that in my final metric, I would stop counting baserunners allowed and instead count the total bases allowed by the pitcher.

This, I believe, is even more important than simply how many baserunners are allowed. Given the choice, any batter would rather have a double, triple, or home run than a walk or a single, because it does more to help the team's quest to win the ballgame. (This is of course with the possible exception of Jason Giambi, who recently shaved his mustache because he hates the Yankees and all their fans.)

In essence, the new metric would capture not only how many mistakes the pitcher made, but also the impact of those mistakes on the team. That led to the name I gave the metric: Mistakes Per Inning Pitched. The new formula went as follows:

(4(HR) + BB + HBP - IBB) / IP

The reason for multiplying homers by four is that a home run nets four total bases. The reason for subtracting IBB is that counting them wouldn't be true to the name of the metric: the pitcher didn't make a mistake; he meant to walk the batter.
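
Here is a minimal sketch of that formula in Python. The helper names (`innings`, `mistakes_per_inning`) and the sample numbers are hypothetical and only meant to show the order of operations: everything in the numerator is summed first, then divided by innings pitched.

```python
def innings(whole, thirds=0):
    """Convert box-score innings (e.g. 169 1/3) into a decimal number."""
    if thirds not in (0, 1, 2):
        raise ValueError("thirds must be 0, 1, or 2")
    return whole + thirds / 3


def mistakes_per_inning(hr, bb, hbp, ibb, ip):
    """M/IP = (4*HR + BB + HBP - IBB) / IP.

    BB is total walks, so intentional walks are subtracted back out.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (4 * hr + bb + hbp - ibb) / ip


# Hypothetical stat line, made up for illustration only.
sample = mistakes_per_inning(hr=20, bb=50, hbp=5, ibb=3, ip=innings(200))
print(round(sample, 3))
```

Writing innings as whole innings plus thirds avoids the common slip of treating "169.1" as a true decimal.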

The final result of the formula is the number of batters a pitcher makes a "mistake" pitching to—that is, allows to reach base without intending to do so—per inning pitched. It also puts weight on the impact of those baserunners—as I said, giving up a home run is worse than giving up a walk.

Ideally, a pitcher would want to have a higher K/IP rating than M/IP rating. This would suggest that the pitcher himself has done more to help the team than to hurt the team. As always, a notable exception exists for "trick" pitchers, i.e. knuckleballers and sinkerballers, as their goal is to induce slow ground balls and lazy flies rather than swings and misses, and so they can be successful without high K-rates.

Now, onto some examples:

  • In 2005, Josh Fogg had 0.903 mistakes per inning pitched and 0.501 strikeouts per inning pitched. Looking at this data, you would suspect it would be a rough year for Josh Fogg, and you'd be right: he gave up 27 home runs, walked 53 batters while striking out 85, and finished the year with a WHIP of 1.47 and an ERA of 5.05 in 169 1/3 IP.
  • Last year, Johan Santana allowed a reasonable 0.84 M/IP while posting a 1.073 K/IP line. This is, obviously, a pretty dramatic difference. As you all know, Johan Santana has had a pretty good career, including a good year last year. He did give up 33 home runs, which is what hurt his M/IP the most, but he also walked only 52 batters while striking out 235 in 219 IP. He finished with a 3.33 ERA and a 1.073 WHIP.
  • In 2006, Zach Duke had 0.636 M/IP and 0.543 K/IP. Despite being a pretty mistake-free pitcher, Duke didn't make many batters miss and had a pretty pedestrian year, finishing with a 4.47 ERA, 1.50 WHIP, 68 walks and only 117 strikeouts in 215 1/3 IP. He had an ERA+ of 99, or 1% worse than the average ERA that year.

When we use this pairing for relief pitchers, however, we find a slightly different story:

  • Last year Jonathan Papelbon, one of the league's finest closers by any metric, had 0.53 M/IP and 1.44 K/IP. He had a good year; if you're into saves, he had 37 of them. He also struck out 84 batters in only 58 1/3 IP, and he only walked 15 and gave up 5 home runs.
  • On the other side, Joe Borowski, who had a horrible year last year—I don't care if you have 45 saves or not, if you finish the year with a 5.07 ERA and a 1.43 WHIP, you didn't have a good year—had 0.77 M/IP but 0.88 K/IP. By the previous correlations, he should have had a pretty average year as far as other statistics go, and yet he was horrible.

The reason for this is that relief pitchers pitch fewer innings, so every strikeout counts more towards their eventual K/IP rating. Naturally, most relievers will have more strikeouts than mistakes just because they don't have many opportunities to screw up, and they are, after all, good enough to pitch in the majors.

A good reliever will have a larger gap between the two numbers than a good starter will. In the example above, Borowski only had a difference of 0.11, while Papelbon, the better pitcher, had a difference of 0.91. It appears that whereas for a starter K/IP is the more important part of this stat platooning, for a reliever the more important part is M/IP.
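
To make the comparison concrete, here is a small sketch that takes the M/IP and K/IP figures quoted in the examples above and prints the gap (K/IP minus M/IP) for each pitcher. The `gap` helper and the dictionary layout are just illustrative choices.

```python
# (M/IP, K/IP) pairs exactly as quoted in the examples above.
rates = {
    "Fogg": (0.903, 0.501),
    "Santana": (0.84, 1.073),
    "Duke": (0.636, 0.543),
    "Papelbon": (0.53, 1.44),
    "Borowski": (0.77, 0.88),
}


def gap(m_ip, k_ip):
    """Positive means more strikeouts than 'mistakes' per inning."""
    return k_ip - m_ip


for name, (m_ip, k_ip) in rates.items():
    print(f"{name:10s} M/IP={m_ip:.3f}  K/IP={k_ip:.3f}  gap={gap(m_ip, k_ip):+.2f}")
```

Both relievers come out positive, which is why the size of the gap, not just its sign, matters when comparing them.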

Of course, the metric can be used independently. At least so far in my research, it's done a pretty good job of analysing a pitcher's control. That alone is a good thing to know about a pitcher. However, I feel it is most effectively used in combination with K/IP.

As far as I know, this metric is new. I've searched the stats glossaries of numerous sites and publications, and I didn't see this formula in any of them. If it's not new, I'll gladly cop to that.

By the way, I promise you that I'm not living in my mother's basement.

UPDATE: There have been a few questions as to what relevance this metric has. For answers on that, see Oliver Perez: A Shining Example Of M/IP Holding Up On Its Own.

UPDATE 2: Some have asked why I don't include wild pitches in the metric. The reason for this is that it is largely up to the scorer to determine what is a wild pitch and what is a passed ball, placing the catcher at fault. Most pitchers only have about 2-5 wild pitches a year anyway, so it's pretty negligible in the final data.

10 comments:

Thomas said...

If this doesn't get Bill James' attention, nothing will. Good work, that was a really interesting post. And I feel like this is something that could legitimately become part of the official sabermetricians' lexicon.

Thomas said...

Oh, and Kevin Kennedy is a tool.

s1c said...

While I think this has merits, what does it add to the discussion that a look at WHIP, ERA+ or other present statistics doesn't provide?

Is a 3-0 fastball down the middle of the plate that gets crushed for a triple a mistake or is a 0-2 pitch that hangs and is doubled a mistake? To me, the one that is doubled is the mistake (the pitcher failed to execute his pitch) but the triple is just one of 6 possible outcomes and neither is figured into your result.

I like the idea, just not really sure if this is or should be the final result.

Nate said...

I will agree with you that it's not a perfect metric. But then: what is? If there was a perfect metric, we'd never have a need to look at anything else. Plus, I'm still thinking about this whole idea. This is just the version I have right now, and it may well get added to later. It's just one of those things that I knew if I didn't publish it, I would put it off and never do it.

As for the "what does it add to the discussion" question: it gives you an idea of a pitcher's overall control. WHIP, like I said, is faulty because hits are part of the metric, and therefore the blame can't be placed solely on the pitcher. ERA+ is simply ERA normalized for league average and ballpark factors, so it's not really addressing the same things as my metric. (Speaking of ERA+, and with a big small-sample-size alert attached: before Jeff Karstens makes his next start, head over to Baseball Reference and look at his infinite ERA+.) What I originally set out to do was create a sort of defense-independent WHIP. As far as that goes, the original metric was sufficient. Like I said, though, I just felt that a home run and a walk are obviously very different outcomes.

Perhaps "mistake" is the wrong terminology for the naming of the metric. Indeed, a pitcher can make a mistake and have the result be a hit. However, a pitcher can also locate a pitch perfectly and still have it result in a hit, so counting all hits as "mistakes" is also faulty. The name is just the best I could come up with other than "Defense-independent WHIP," which makes no sense when you remember what the initials for WHIP stand for.

I appreciate your criticism. I've been waiting for somebody to poke holes in this so that I can possibly work on fine-tuning it.

s1c said...

Actually, I think it is a good metric idea and would add a lot to the discussion on pitchers, but I also think that maybe, while complicating the metric, taking a look at results in the hitter counts vs pitchers counts would need to be considered.

I think what I will try to do is see if I can run some numbers for an entire season this weekend and see how this adds to the pitcher discussion.

Tangotiger said...

Your equation is the same as DICE (or FIP) without the strikeout parameter.

Whether you use 4 and 1 as your coefficients or 12 and 3, that's the same thing. DICE and FIP use 13 and 3.

This proposal therefore is a "FIP lite".

Peter said...

Link to us!

Peter said...

I also think Nate might be on to a better formula, it just needs to be refined.

Nate said...

I noticed a good bit of similarity to DICE/FIP myself, actually. Like I said, I'm still working on this to see if I can improve it at all. I've received a few suggestions in my inbox that I'm going to take into consideration.

What I wonder is: is it such a bad thing that they're similar? When we remove strikeouts from the equation, it's because we're not trying to see how dominant a pitcher is, we're trying to see how "mistake free" he has been on average. In that sense, I think this metric has enough use on its own to use it in addition to DICE or FIP.

Tangotiger said...

It may be. For example, I also use szERA, which is 12-5.4*(K-BB)/PA.

Basically, I only look at the differential of K and walks.

You're looking only at BB and HR to give you some particular insight.

Somewhere in the blogosphere, someone is going to post something like:
(6*HR-K)/PA with various constants around it
and that will describe something else.

Clearly, the subset of something is always designed to show something in particular. It has its limited use, by design. That's neither a good nor bad thing.