SABR-Toothed Triber: You Down with FIP? (And BABIP!)

Published by Scott Sargent on February 25, 2010

“A great deal of what is perceived as being pitching is in fact defense.” – Bill James, 1988

Last week, we talked about wOBA and how it can help us measure a player’s offensive contributions better than many of the more conventional metrics. This week, we’ll try to tackle pitching, perhaps in a more roundabout manner.

Let me start with a story. The story begins in 2000 and involves a paralegal in Chicago with too much time on his hands. I swear this story has something to do with pitching statistics, and we’ll get to the Cleveland Indians pitching staff in due time. Bear with me…

Our paralegal’s name is Voros McCracken, and he liked to play fantasy baseball. He needed to know whom to draft for his team, but as he looked at the numbers, he couldn’t come up with a way to separate pitching statistics from fielding statistics and plain old dumb luck. And if he couldn’t do that, how could he predict which pitchers would perform well, and which would do poorly? Like I said, he had time on his hands, so he looked to see if a pitcher’s ERA did a good job of predicting future ERA, and he found a ton of variation. For example, when he looked at Greg Maddux, his ERA in 1998 was 2.22 but in 1999 it was 3.57. What gives? Was Maddux a great pitcher in ’98 but just a bit above average in ’99? Voros crunched the numbers, looked at splits, gnashed his teeth, but he couldn’t understand why Maddux allowed so many more hits (and runs) just a year after he had been so dominant. Then, Voros looked at other players, and the same thing happened. Kevin Millwood, Maddux’s teammate at the time, went from being fringe-average in ’98 (4.08 ERA) to a stud in ’99 (2.68 ERA). Weird.

And then Voros asked a critical question: What if a pitcher controls much less than we think? What if, in fact, the pitcher has essentially no control over whether a ball hit into the field of play is an out or not?

So he checked. First he gave Maddux another look. In ’98, when opposing hitters put the ball in play against him, they batted .272. In ’99, they batted .334. Then Voros looked at Kevin Millwood. In ’98, Millwood’s batting average on balls in play (BABIP) was .323 but in ’99 it was .247. In front of the exact same Atlanta Braves defense!
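For readers who want to see the arithmetic, BABIP is commonly computed as hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. That is the standard definition rather than something spelled out in this post, and the stat line below is invented purely for illustration. A minimal Python sketch:

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play, using the commonly cited definition."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    # Home runs leave the park, so they are excluded from both hits and chances.
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, not any real pitcher's season.
value = babip(hits=200, home_runs=20, at_bats=800, strikeouts=150, sac_flies=5)
print(f"{value:.3f}")
```

Strikeouts and home runs drop out because no fielder gets a chance on either one, which is exactly why the stat isolates what happens once the ball is in play.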

This was, suffice it to say, interesting news, and Voros has since done some seminal work with what he calls Defense Independent Pitching Statistics (DIPS). Basically, he demonstrates that pitchers can control (i.e. repeat with consistency, regardless of luck and defense) just a few things:

What percent of batters he strikes out.
What percent of batters he walks (or hits with a pitch).
At what rate he gives up HRs (this one causes some arguments, maybe we’ll get to that in the comments).

That’s it. Everything else is dependent on defense or luck: Wins, Losses, Runs, Earned Runs, even Hits. None of these stats have been demonstrated to be repeatable skills.

Well, Voros blew some people’s minds. Some people thought he was crazy. It gives too much credit to strikeout pitchers! What about a pitcher who consistently induces weak grounders? It just can’t be true!

But some others thought he might be onto something. So they started working to see if there was a stat or formula that might be able to isolate the things a pitcher can control.

The first stat used to evaluate a pitcher’s performance regardless of defense and luck was called Fielding Independent Pitching, or FIP (I still pronounce the whole thing—“F-I-P”—but I know most people pronounce it like it rhymes with “zip”). And it only takes into account strikeouts, walks, hit batsmen and HRs allowed. Here it is:

[(HR*13 + (BB + HBP – IBB)*3 – K*2)/IP] + 3.2

It looks complex, but don’t get too bogged down in the formula. Basically, the first part—HR*13—says HR are bad; they’re going to raise your FIP. Walks and HBP demonstrate another “negative skill,” so they’re added together. But before those free passes are multiplied by a coefficient, we have to subtract out IBBs, since they are typically managerial decisions and shouldn’t be attributed to a pitcher’s skill. Finally, strikeouts are good! So we subtract off a factor to account for the strikeouts a pitcher accumulates, divide the whole thing by innings pitched, and add a number (usually around 3.2, but it changes from year to year) to make the whole thing comparable to the league’s ERA scale. Tada!! Not so scary.
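If it helps to see the formula as something you can run, here is a minimal Python sketch. The function and parameter names are my own, the 3.2 constant is the rough value mentioned above (it shifts from year to year), and the season line is made up for illustration. The small helper assumes innings are written in the usual box-score notation, where ".1" and ".2" mean one and two outs.

```python
def innings_from_notation(ip_text):
    """Convert box-score innings such as '128.1' (128 and 1/3) to a number."""
    whole, _, outs = ip_text.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def fip(hr, bb, hbp, ibb, k, ip, constant=3.2):
    """Fielding Independent Pitching, following the formula in the post.

    The constant is an assumed league-scale adjustment of about 3.2.
    """
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip + constant


# Hypothetical season line, invented for illustration only.
example = fip(hr=20, bb=50, hbp=5, ibb=3, k=180, ip=innings_from_notation("200.0"))
print(round(example, 2))
```

Raising the home run total moves the result far more than an extra walk does, which matches the 13-to-3 weighting in the formula.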

But what’s the point? Why not just use ERA and call it a day? Well, there are a few points, I suppose. The first is that FIP is NOT a replacement for ERA. ERA is a fine measuring stick of what happened while a pitcher was on the mound. It just combines the parts for which the pitcher was directly responsible and those parts for which he wasn’t. FIP is one tool that helps us solve McCracken’s problem of measuring pitcher performance versus the mirages of luck and defense. Second, FIP does a better job of predicting what a pitcher will do in the future than ERA (this is what stat-heads mean when they say it has a “high r value” or “a strong predictive value”).

As an example, let’s look at some historical data on CC Sabathia:

Year	ERA	BABIP	BB/9	K/9	HR/9	FIP
2002	4.37	0.29	3.77	6.39	0.73	3.87
2003	3.60	0.29	3.01	6.42	0.87	3.95
2004	4.12	0.29	3.45	6.65	0.96	4.21
2005	4.03	0.30	2.84	7.37	0.87	3.69
2006	3.22	0.31	2.06	8.03	0.79	3.30
2007	3.21	0.32	1.38	7.80	0.75	3.14
2008	2.70	0.31	2.10	8.93	0.68	2.91
2009	3.37	0.28	2.62	7.71	0.70	3.39

There’s a lot of data there, so let me highlight a few points that I find interesting. First, look at the variation in FIP and ERA: the difference between CC’s highest and lowest ERA is 1.67 whereas for FIP, it’s 1.30. I mention this only to demonstrate that FIP is typically more consistent than ERA over a player’s career. (This should be the case, since defense and luck aren’t involved in FIP.) Second, when Sabathia’s FIP was lower than his ERA (like in 2002), it means that he actually pitched better than his ERA would suggest.
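Those spreads are easy to check against the table. The sketch below copies the ERA and FIP columns and uses the simple high-minus-low range as the measure of consistency. That is a crude yardstick, chosen only because it is the one used in the paragraph above.

```python
# (ERA, FIP) by season, copied from the Sabathia table above.
sabathia = {
    2002: (4.37, 3.87),
    2003: (3.60, 3.95),
    2004: (4.12, 4.21),
    2005: (4.03, 3.69),
    2006: (3.22, 3.30),
    2007: (3.21, 3.14),
    2008: (2.70, 2.91),
    2009: (3.37, 3.39),
}

eras = [era for era, _ in sabathia.values()]
fips = [fip for _, fip in sabathia.values()]

# Range = highest season minus lowest season.
era_spread = max(eras) - min(eras)
fip_spread = max(fips) - min(fips)

# Seasons where FIP came in below ERA.
fip_below_era = [year for year, (era, fip) in sabathia.items() if fip < era]

print(f"ERA spread: {era_spread:.2f}")
print(f"FIP spread: {fip_spread:.2f}")
print("FIP below ERA:", fip_below_era)
```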

So let’s look at 2002 and some of his “peripherals” (a fancy word for those stats in the middle) to see what we find. His BABIP is fairly normal: .290-.300 is around league average, and pitchers don’t have much control over the variation in their respective BABIPs. His walk rate that year actually looks high, which should surprise us, since walks inflate your FIP. Even more, his strikeout rate is low compared to his career norm. So what gives? Well, look at the HR rate compared to his career norms. He had an exceptionally good year with the long-ball in ’02, and since HR are so important in the FIP formula, his total FIP was below his ERA, and considerably so.

Now look at his Cy Young season. Believe it or not, CC was unlucky in 2007: his FIP was lower than his ERA. Why? Well, his BABIP was the highest of his career! Yet he somehow pitched his way out of the trouble caused by those extra base hits falling in by cutting his walk rate to the lowest of his career and striking out 7.80 batters per nine innings! And people wanted to give the Cy Young to Josh Beckett! Dopes.

Going from the picture of consistency to…something else, let’s look at Cliff Lee’s numbers:

Year	ERA	BABIP	BB/9	K/9	HR/9	FIP
2005	3.79	0.287	2.32	6.37	0.98	3.79
2006	4.40	0.309	2.60	5.79	1.30	4.73
2007	6.29	0.313	3.33	6.10	1.57	5.48
2008	2.54	0.305	1.37	6.85	0.48	2.83
2009	3.22	0.326	1.67	7.03	0.66	3.11

Typically, you don’t see so much variation in a player’s FIP from year to year (though, it should be noted that Cliff’s variation in ERA is higher), so let’s look to see if we can figure out what Cliff figured out from 2007 to 2008. First, look at 2007.

Yes, he was awful, but not as awful as his ERA suggests: his FIP was nearly a full run lower than his ERA. Why? His 2007 BABIP was certainly high, at .313. Nevertheless, when you’re walking more than three batters per nine innings and letting up an alarming number of HRs, you’re not going to have a lot of easy outings. Obviously, Lee figured something out in 2008. (Did I need to give you a chart for that? Probably not.) His BABIP was right around normal, but look at how much he cut his walks from the previous year! And now look at HRs! He cut his walk rate by more than half and his HR rate by more than two-thirds. When you can do that, you’re going to see some big improvements.
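To put numbers on how far those two rates fell, here is a short Python sketch using the 2007 and 2008 rows from the table. The dictionary layout and the helper name are just illustrative choices.

```python
# BB/9 and HR/9 from the Cliff Lee table above.
lee = {
    2007: {"bb9": 3.33, "hr9": 1.57},
    2008: {"bb9": 1.37, "hr9": 0.48},
}


def reduction_factor(before, after):
    """How many times smaller the 'after' rate is than the 'before' rate."""
    return before / after


walk_factor = reduction_factor(lee[2007]["bb9"], lee[2008]["bb9"])
hr_factor = reduction_factor(lee[2007]["hr9"], lee[2008]["hr9"])

print(f"Walk rate cut by a factor of {walk_factor:.2f}")
print(f"HR rate cut by a factor of {hr_factor:.2f}")
```

The walk rate fell by a factor of about 2.4 and the home run rate by a factor of about 3.3.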

Now look at 2009. He wasn’t so lucky with BABIP: it was the highest of his career. But he struck out more than ever before, had a low walk rate, and continued to minimize his HRs allowed. And that’s what FIP is really all about: your underlying performance can be determined by just a few fundamental things: walks, strikeouts, and HRs.

With that said… How about the current crop of Tribe pitchers? Let’s look at the starters from ’09 (I’ll leave out a few, since there were over 673 starting pitchers on our roster last year):

Name	ERA	BABIP	BB/9	K/9	HR/9	FIP
Carl Pavano	5.37	0.330	1.65	6.30	1.36	4.28
Justin Masterson	4.80	0.318	5.63	7.95	0.83	4.51
David Huff	5.61	0.325	2.88	4.56	1.12	4.69
Aaron Laffey	4.53	0.317	4.36	4.12	0.74	4.76
Jeremy Sowers	5.48	0.297	3.88	3.80	0.84	4.83
Fausto Carmona	6.32	0.330	5.03	5.67	1.15	5.36
Carlos Carrasco	8.87	0.395	4.43	4.43	2.42	7.08

Ummm. Yikes. Well, I guess since we got this far, we might as well go a bit further, but that is one ugly chart.

Let’s start by trying to find some good news. Look at the BABIP column. Remember that BABIPs are generally around .290-.300, and that variations are largely due to luck. Well, last year, our pitching staff was remarkably unlucky. Sowers was the only one to post a reasonably average BABIP; the others ranged anywhere from sorta unlucky (Laffey) to struck-by-lightning-twice-in-one-day unlucky (Carrasco). The other mildly good news is that nearly every pitcher on our team had an FIP below his ERA, which suggests that there is more underlying talent on the pitching staff than was demonstrated last season.
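One way to see the ERA-versus-FIP point at a glance is to subtract one column from the other. The Python sketch below copies the two columns from the table above. A positive gap means the pitcher's ERA was worse than his FIP.

```python
# (ERA, FIP) for the 2009 starters, copied from the table above.
starters_2009 = {
    "Carl Pavano": (5.37, 4.28),
    "Justin Masterson": (4.80, 4.51),
    "David Huff": (5.61, 4.69),
    "Aaron Laffey": (4.53, 4.76),
    "Jeremy Sowers": (5.48, 4.83),
    "Fausto Carmona": (6.32, 5.36),
    "Carlos Carrasco": (8.87, 7.08),
}

# ERA minus FIP, rounded to match the precision of the table.
gaps = {name: round(era - fip, 2) for name, (era, fip) in starters_2009.items()}

# Largest gap first.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<17} {gap:+.2f}")

# Pitchers whose FIP was higher than their ERA.
fip_above_era = [name for name, gap in gaps.items() if gap < 0]
print("FIP above ERA:", fip_above_era)
```

Six of the seven starters show a positive gap. Laffey is the lone exception.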

But that’s about it as far as the good news goes. Fausto Carmona walked nearly as many batters as he struck out, and has done so for two consecutive years. For the sake of comparison, in his last good season, 2007, that ratio was 2.25 strikeouts for every walk. Justin Masterson is the only one on the list who strikes out more than seven batters per nine innings, but his walk rate (5.63 per 9) is unacceptably high. That weakness is compounded by his obvious difficulty with left-handed batters, making him a questionable starter in the long-term picture. David Huff, who has some potential, needs to cut his walks and HRs if he’s not going to miss more bats, and there’s no reason to think he will.

More bad news? Carl Pavano was our best non-Cliff-Lee starter last year. And now he’s going against us with the Twins for an entire season. Fun!

Alright, I promise I won’t leave you so down in the dumps. There’s reason for hope. Let’s look at one more chart before we leave you for the week:

Player	ERA	BABIP	BB/9	K/9	HR/9	FIP	IP
Player A	5.43	0.315	4.07	8.09	1.51	4.97	179
Player B	5.61	0.325	2.88	4.56	1.12	4.69	128.1

Not completely dissimilar. Sure, Player A strikes out more people than Player B, but he also allows more HR and walks roughly 40 percent more batters.

Now if you’ve been reading closely, you’ll recognize Player B. It’s David Huff’s 2009 campaign. Player A? One Clifton Phifer Lee, circa 2004. This isn’t to say that Lee’s career path is typical, because it isn’t. But there are some underlying similarities in makeup, pure stuff, and some early tough luck. Look, I’m not convinced Huff will become half the pitcher Lee became, but in a season like 2010, when all we have to talk about on the pitching staff is potential upside, it’s a fun thought.

As this post is already getting a bit long-winded, I’ll leave some of the other advanced pitching statistics for another time, but I should mention that not everyone who’s looked seriously at the problem of evaluating pitcher skill believes FIP is the best model. I’ll be honest: I have some issues with completely ignoring batted-ball data, for one, and there is some evidence that pitchers have less control over their HR rates than FIP assumes. xFIP and tRA are two other pitching stats that take different approaches to the skill-versus-luck problem; respectively, they attempt to negotiate the issues surrounding HR rates and batted ball types that some have questioned in the FIP model.

Additionally, Baseball Prospectus just unveiled their new pitching stat, SIERA, which I’ve not yet had time to digest in detail (though I’m simultaneously impressed and nonplussed, if that’s possible). Maybe some time soon, we’ll look at these other approaches, and what they have to tell us about the Indians pitchers.

As always, feel free to ask questions below, and I’ll do my best to point you in a direction of an answer. See you next week!

Thanks again to the guys at WFNY for picking me up as an occasional contributor. The Voros McCracken anecdote is borrowed in part from Moneyball: The Art of Winning an Unfair Game written by Michael Lewis (or by Billy Beane’s evil supercomputers, depending on whom you ask). Much of the research in this series is built on ideas from The Book: Playing the Percentages in Baseball, the ongoing work at FanGraphs, StatCorner, The Hardball Times, and Tom Tango’s blog, and the countless other blogs and books that refuse to stop thinking and arguing about baseball.

—

(AP Photo/David Richard)

18 Comments

boogeyman says:

February 25, 2010 at 2:50 pm

Yuck – math! Everyone is afraid to answer but nonetheless nice work. I’ll admit I didn’t read this whole thing because no matter which kind of math I use when Larry Dolan is part of the equation it always equals mediocre to substandard baseball. It’s just the wrong time for the current Indians management to try and feed the poor struggling fans of Cleveland this load of horse puckey! No matter which stats you want to throw at it the Cleveland Indians will struggle in 2010.
Matt C says:

February 25, 2010 at 2:56 pm

Firejoemorgan.com would like to shake your hand for continuing their good work.

VORP’ed!
Scott says:

February 25, 2010 at 2:56 pm

Great stuff again, Jon.

Any chance that FIP is adjusted for how awful a team’s fielding may be? I ask due to the fact that Westbrook, Laffey, Fausto, etc… They’re all ground ball-types that aren’t going to be striking out a lot of batters. Perhaps our BABIP “average” needs to be accounted for as more than just bad luck.
Jon Steiner says:

February 25, 2010 at 3:04 pm

Great question, Scott.

FIP itself isn’t at all concerned with team defense, only the skill of the pitching–it’s defined for that specific purpose. Believe it or not, though, groundball pitchers have an advantage in FIP that strikeout pitchers don’t have: they let up fewer HRs (because you can’t hit a grounder out of the park). But you’re right, groundball pitchers typically have higher BABIPs than flyball/strikeout pitchers for obvious reasons.

I’m putting a piece together on fielding and how it interacts with Runs Allowed. Next week, we’ll either look at defense or batting orders. Any requests?
Rick says:

February 25, 2010 at 3:11 pm

I’m having trouble understanding the defense/luck portion. There are balls that are put in play that would be a hit no matter who is playing defense. They are just hit in the right spot. Also, what about if a manager calls for a shift in the outfield, or brings the infielders in to try and cut off a run and they get beat? Is that the pitcher’s fault?
Scott says:

February 25, 2010 at 3:13 pm

“There are balls that are put in play that would be a hit no matter who is playing defense. They are just hit in the right spot”

Right…and that’s why typical BABIP is about .300. Roughly three in ten balls put in play (in theory) should be hits.
Jon Steiner says:

February 25, 2010 at 3:35 pm

Rick, you seem to understand it perfectly: you’ve nailed the problem with ERA, essentially. But let me take your question in parts:

“There are balls that are put in play that would be a hit no matter who is playing defense. They are just hit in the right spot.”

Exactly. There is some portion of all batted balls that are going to be hits, no matter the pitcher or defense. This is why BABIPs for pitchers typically have a normal range (.290-.300 is average). This doesn’t mean that a pitcher can’t still get stuck with crummy luck for a year, where more grounders than normal are finding holes. Think about the Maddux and Millwood example at the beginning of the post again: they had the same defense, but random luck changed their BABIPs considerably and in opposite directions.

“Also, what about if a manager calls for a shift in the outfield, or brings the infielders in to try and cut off a run and they get beat? Is that the pitcher’s fault?”

No, it’s not the pitcher’s fault, but ERA includes it (that’s the heart of the matter, really).

If one manager puts on a shift for Hafner, while another doesn’t, and Hafner hits a weak grounder down the 3B line, a pitcher’s ERA may suffer in the first scenario, but not the second. FIP sees balls in play as not controlled by the pitcher. They’re either hits or outs, but the pitcher doesn’t determine which–luck and defense do.

Advocates of FIP (and other defense-independent stats) say that since we don’t really know which pitchers will let up “groundballs with eyes” and which won’t, we shouldn’t use the results of groundballs (or flyballs or line drives) to assess a pitcher’s skill. It’s random for the most part, or at least influenced by someone other than pitcher (i.e. the defense).

If you’re trying to separate defense from luck, that’s a tougher issue (and we may talk in detail about defense sometime soon), but looking at the pitchers’ remarkably high BABIPs from last year tells me either that they had bad luck, bad defense, or both. It also tells me that we have a groundball staff, since groundball pitchers usually have higher BABIPs. Either way though, their underlying skill (FIP) was better than their results (ERA). Like I said, it’s not great news, but they should be able to perform better than last year.

Wow. That was a long response to a simple question. Sorry ’bout that.
Mitch says:

February 25, 2010 at 3:41 pm

All of Cleveland will be focused on the Cavs hopefully through June, followed by LeBronapalooza 2010 in July. If LeBron leaves, the pressure on the Indians will quadruple, which is not good for this group of young pitchers.
Mitch says:

February 25, 2010 at 4:02 pm

@Jon Steiner

Let’s look at batting order. It seems prudent considering Grady will likely not hit leadoff, and Brantley may be in AAA with the Branyan addition (with LaPorta taking Brantley’s spot in the OF). Asdrubal seems like a classic #2 hitter.
Alex says:

February 25, 2010 at 4:13 pm

I like this series. As a math-y person it’s good to see on one of my favorite sites and looks (to me) simple enough for the average non-math person to understand. Quality work.
Tommy says:

February 25, 2010 at 8:02 pm

I agree with Alex. Please keep these types of articles coming. The intelligent conversation it provokes is good stuff.
JD says:

February 27, 2010 at 9:57 am

I do love these articles, keep ’em coming.

Admittedly though, I’m still on the fence about just how little control pitchers have on BABIP. I understand the theory behind it, and would have to say that on a broad basis I agree with it…but anyone who watched Carlos Carrasco pitch last year knows that almost everything he threw got POUNDED. So wouldn’t some of his ridiculously high BABIP be due to the fact that he threw a bunch of meatballs up to the plate that got turned into line drives, which far more often wind up as hits than fly balls or ground balls?

Livan Hernandez is another good example; over his last two awful years, he’s been blistered for BABIPs of .347 (2008) and .333 (2009). Not all of that can be attributed to luck/defense, right? Some of that has to do with the fact that he’s just not foolin’ anyone anymore.

One last side note, does that mean our defense stinks given that for many of our pitchers there was a sizeable difference between ERA and FIP?
Jon Steiner says:

February 27, 2010 at 11:11 am

JD,

It’s counter-intuitive for sure, but if we need evidence that Livan Hernandez is a bad pitcher, we don’t need BABIP. Would it surprise you that Randy Johnson’s career BABIP is higher than Hernandez’s?

Look at these BABIPs from last year:

Verlander: .328
Cliff Lee: .326
Lester: .323
Halladay: .313
Greinke: .313

Those are significantly high, and they were posted by 5 of the best pitchers in the AL.

There was an article on BP this week that you might find interesting. It basically argues that what affects a pitcher’s BABIP more than anything is his line drive rate (since 80% of line drives become hits, compared to only 16% of GBs and 14% of FBs). Unfortunately, line drive rate varies incredibly from year to year. That’s why BABIPs are basically random: because line drive rates are. Here’s the link: http://baseballprospectus.com/article.php?articleid=10113 (it’s free, and worth a read if you’re interested in this stuff.)

Glad you asked about defense. We’re going to address that in a future post in detail, but overall, our defense was not as impressive as some people think, especially last year.