With the Cleveland Indians continuing to struggle at the plate with runners in scoring position (RISP)[1], analysts and fans have put added emphasis on working out how much of that struggle is luck and how much is skill. In particular, the focus has been on how many balls the team hits hard. However, that focus may be on the wrong set of data; we should perhaps be looking instead at how often the Indians make soft contact[2].
Some of the renewed thinking on hard contact came after Mark Simon of ESPN sent out the table below, which shows the Indians with the fourth-worst batting average on balls they hit hard. The table does not break out players with high pull percentages who hit the ball into the shift (more fielders on the pull side would mean a lower relative batting average). Still, it suggests that some amount of bad fortune has been at play on hard-hit balls this season.
MLB teams ranked by their batting average when recording a hard-hit ball pic.twitter.com/nlH8i3GH3w
— Mark Simon (@MarkASimonSays) July 20, 2015
However, are hard-hit balls the most reliable factor in determining the general success (or failure) of Indians batters this season? To gauge the correlation between hard-hit balls and a batter's overall BABIP, I charted the 2015 Cleveland Indians batters' hard%[3] versus BABIP[4].
I chose BABIP over wRC+ and OPS+ to isolate what happens on contact. While wRC+ and OPS+ better capture a hitter's overall success by taking other factors into account, those other factors also pollute the data. For instance, as most fans know, Carlos Santana walks to an extreme degree. The walks are counted positively in both wRC+ and OPS+ and make it more difficult to separate out how much his hard contact rate is affecting his overall success. Choosing BABIP allows us to focus on just that contact.
What I expected to see if there is a strong correlation between hard% and BABIP is that the BABIP would rise as hard% rose. Therefore, the scatter plot would show the dots starting in the lower left hand corner (low BABIP, low hard%) and continue to rise to the upper right hand corner (high BABIP, high hard%) with only an outlier or two in the upper left (low BABIP, high hard%) and lower right (high BABIP, low hard%).
To make it easier to see visually, I plotted a regression line on the chart. The dots should more or less follow the line if there is a strong correlation. And, the big red square in the middle is the MLB average.
The chart above is not what I was initially expecting. Ignoring the biggest outliers (Ryan Raburn and Michael Bourn)[5], the dots do seem to rise somewhat across the chart. But it takes some eye straining and thought to see it, which it should not.
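For readers who want to try this kind of check themselves, here is a minimal sketch of fitting a least-squares line and computing the Pearson correlation coefficient (r), which puts a number on how closely the dots follow the line. The hard% and BABIP values are made-up placeholders, not the Indians' actual statistics, and treating hard% as the explanatory variable is an assumption of this sketch rather than a description of how the chart was built.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Placeholder numbers for illustration only, NOT real player data.
hard_pct = [22.0, 25.0, 28.0, 31.0, 34.0]
babip = [0.270, 0.310, 0.280, 0.300, 0.320]

slope, intercept, r = fit_line(hard_pct, babip)
print(f"BABIP ~ {intercept:.3f} + {slope:.4f} * hard%, r = {r:.2f}")
```

An r near 1 or -1 means the dots hug the line; an r near 0 means the kind of eye straining described above.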
These unexpected results reminded me of some recent conversations I have had about Carlos Santana. WFNY’s Jacob Rosen described in great detail on Tuesday why some of the criticisms being lobbed at Santana are unfair. However, there is also an undeniable drop-off in his power numbers this season. When discussing it earlier, I had noticed that his soft% (soft contact rate) has risen in each of the last four seasons (13.3%, 15.4%, 20%, 22%), and it appeared there might be some relationship with his wRC+ (120, 134, 131, 115).
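As a rough check on that hunch, the sketch below computes the Pearson correlation between the four soft% values and the four wRC+ values quoted above. The `pearson_r` helper is written out here for illustration. With only four seasons, the result (weakly negative, at roughly -0.26) is a hint at best and not evidence of a relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Values quoted in the paragraph above, in the order given (four seasons).
soft_pct = [13.3, 15.4, 20.0, 22.0]
wrc_plus = [120, 134, 131, 115]

r = pearson_r(soft_pct, wrc_plus)
print(f"soft% vs wRC+ over four seasons: r = {r:.2f}")
```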
Therefore, it might be possible that hard% was the wrong side of the contact rates to be studying. Turning my attention to the soft%, I charted the 2015 Indians batters in a similar manner with soft% versus BABIP. To make it easier to read, I inverted the soft% numbers on the chart (because a lower soft% is good and a higher soft% is bad), so that the regression line would still rise from the lower left hand corner to the upper right hand corner as it did on the first graph.
Aha! Now, that is the type of inter-dependency that I was expecting. Sure, there are a couple of outliers (Roberto Perez and Lonnie Chisenhall), but the overall scatter plot follows the line.
Now, like Simon’s table, this data simplifies matters and does not truly separate luck from poor situational hitting (e.g., hitting into the shift, or speedy runners who beat out throws). But, in these simple terms, the expected BABIP is the regression line, so the dots that sit below the line are hitters who have been fortunate on softly hit balls this season[6]. And the dots that sit above the line are hitters who have been unfortunate on softly hit balls this season[7].
Overall, this exercise taught us that hitting the ball softly less often will help a hitter[8], but the bigger lesson is that cutting down on soft contact is perhaps more important than merely worrying about hard contact.
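The "expected BABIP is the regression line" idea can be sketched as a residual: actual BABIP minus the BABIP the line predicts from a hitter's soft%. The hitters and numbers below are hypothetical, and this sketch regresses BABIP on soft% directly, which may not match the axis orientation of the chart. Under that assumption, a positive residual corresponds to the "fortunate" group and a negative one to the "unfortunate" group.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y as a function of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical hitters: name -> (soft%, BABIP). Not real 2015 data.
hitters = {
    "Hitter A": (14.0, 0.330),
    "Hitter B": (18.0, 0.300),
    "Hitter C": (22.0, 0.310),
    "Hitter D": (26.0, 0.250),
}

soft = [s for s, _ in hitters.values()]
babip = [b for _, b in hitters.values()]
slope, intercept = fit_line(soft, babip)

# Residual = actual BABIP minus the BABIP the line expects at that soft%.
residuals = {
    name: b - (intercept + slope * s) for name, (s, b) in hitters.items()
}

for name, res in residuals.items():
    label = "fortunate" if res > 0 else "unfortunate"
    print(f"{name}: {res:+.3f} ({label})")
```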
1. 1-for-11 on Tuesday night versus Milwaukee
2. Fangraphs separates hits into three main categories: soft contact, medium contact, and hard contact.
3. Determined by an algorithm that calculates it based on hang time, location and trajectory.
4. Batting average on balls in play. So, it does not count strike outs, walks, or home runs as defenders cannot make plays on them.
5. See bottom of post for full table of player statistics. All from fangraphs.com on July 21 unless noted otherwise.
6. Namely, Roberto Perez might not be as good as his numbers indicate and Lonnie Chisenhall might have been lucky to even have his poor numbers.
7. Jason Kipnis, in all his statistical glory, might actually have had bad luck this season here. In addition, the Indians hitters in general have suffered from this definition of bad fortune as the Indians regression line sits above the MLB average.
8. Look at that hard-hitting analysis that breaks convention! Oh wait…nevermind.
28 Comments
I am confused by these results. Hard hit balls do not correlate with hits but softly hit balls do correlate (very strongly it appears, almost textbook-like) with outs… is there a third type of hit that I am unaware of? Or does the analysis show softly-hit balls are bad but everything else is luck?
I changed the channel at hard hit balls followed by soft hits. Sorry, just one more example of where baseball has gone off the rails. I’m with World Series-winning, first-place manager Mike Scioscia on this all day, every day: old school! This doesn’t mean one discounts all analytics, but still, it’s called going overboard, and I don’t think it’s helping the sport of baseball at all.
Peace!
Give someone some time I’m sure they’ll come up with a third kind of hit ball.
Ahh…I should have clarified that there are medium-hit balls quantified too. I will add that after I respond here. Thank you.
Now, for further explanation on the results.
Hard hit balls do correlate with hits. Batting average on hard hit balls is higher than with soft or medium hit balls. However, as you correctly stated, the soft hit balls turning into outs affects batting average much more than hard hit balls turning into hits.
I also want to state that this is still a general observation / hypothesis at this point. It works for the 2015 Cleveland Indians, but I do still need to look at larger samples of players to see if it holds up more broadly, which is why I use the word “might” a fair bit above.
No worries, thanks for giving it a chance. Here’s the conclusion statement for you:
Overall, this exercise taught us that hitting the ball softly less often will help a hitter.
I’d guess the third type would be normally hit balls, which are a majority (at least according to that last chart).
Too late 🙂 (see my reply to Hop)
Also, thank you greatly for the feedback. I love deep-diving into the statistics, but want to convey them in a manner that most people can appreciate without needing to do the deep-dive themselves. So, when I don’t, I need to know. Thanks.
You should have put hard% and soft% on the x-axis. Independent variables!
http://cdn.meme.am/instances/54172792.jpg
The hard way is the only way! Cleveland Indians offensive motto for 2015.
Oh, one more follow-on from our Carlos Santana discussion on Jacob’s article. I believe wRC+ does account for walks more than it should. Here is the same chart on soft%, but using wRC+ instead of BABIP. It looks mostly the same except there is a definitive extra outlier right next to Roberto Perez’s dot in the lower right hand side. That dot is Carlos Santana who makes a dramatic shift due to his walks.
https://waitingfornextyear.com/wp-content/uploads/2015/07/July-21-2015-Cleveland-Indians-soft-contact-vs-wRC-plus.jpg
https://s-media-cache-ak0.pinimg.com/236x/bf/6b/cc/bf6bccd4d498b1db2b259a9b9fe1c702.jpg
You know what I know about Carlos Santana? He should be on the trade block.
I am not seeing the correlation of % hard hit balls to hits. Am I misreading? I do not see .avg rising with % hard hit.
To see the direct correlation of hard% to hits, just the batting average on hard hit balls would need to be shown and then compared with the batting average on all other balls.
Mark Simon’s table above shows that batting average on hard hit balls is anywhere from .630-.760, which would be a ridiculously high batting average obviously.
The BABIP shown in the charts is the full BABIP, not the BABIP from hard hits. The reason for that is to see the effect of hard% on the overall batting average of hitters. The conclusion demonstrated that hard% did not affect BABIP as much as soft% did.
BABIP is a measure of hits, is it not?
I recall reading a similar article on Fangraphs that also supported the notion that Soft% was inversely correlated to BABIP; however, Hard% didn’t have as strong of an opposite effect. I don’t have the time to research, but my guess is that HR’s are what throw off the correlation since they are not included in BABIP and most HR’s are likely hard hit balls. If there was a stat that calculated Batting Average, while excluding walks and strikeouts but including HR’s, that would likely have a very strong correlation with Hard%.
contact. outs and hits. my point there was that it is including soft, medium, and hard hits.
in the end, it’s more important to just make sure an MLB player is making good contact. they don’t have to kill the ball to improve.
Fangraphs are like the Simpsons. Anytime you think you find something on your own, you can usually find out that they did it first 🙂
and, that is a good point on HRs.
Crowdsourcing at its finest, appreciate the Indians flavor here though.
Thanks. And, thanks for reading. I will look for that fangraphs article as I do like to see what others have done.
Well, if Carlos Carrasco is “on the trade block,” then offers for Carlos Santana would certainly be listened to as well.
You would assume wouldn’t you?
Just a thought here- it seems what it really is saying is that an approach that creates more hard contact and more soft contact in equal amounts might be worse than an approach that leaves more in the medium contact range…but…
The other thing is that the BABIP stuff does neglect home runs as well as walks! So Santana is also getting more credit for his home runs with wRC+ than with BABIP.
Especially given that BABIP doesn’t account for home runs, you would expect BABIP to correlate better with soft% than hard%.
So given that BABIP = (H-HR)/(AB-K-HR+SF), we could define a new stat as BABIP+HR = H/(AB-K+SF). This stat would reflect the fact that hard hit balls that are home runs should probably count for something when evaluating how much Hard% should count. I did a brief calculation for Santana and his BABIP is 0.250, but his BABIP+HR is 0.280. Bit of a difference, eh? My guess is hard% would correlate much better with BABIP+HR. And soft% would correlate less well…
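The two formulas in the comment above can be written as small functions to see how much home runs move the number. The stat line used here is hypothetical (it is not Santana's), and `babip_plus_hr` is simply the commenter's proposed stat, not an established metric.

```python
def babip(h, hr, ab, k, sf):
    """Standard BABIP as given above: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)

def babip_plus_hr(h, ab, k, sf):
    """The proposed variant that keeps home runs in: H / (AB - K + SF)."""
    return h / (ab - k + sf)

# Hypothetical stat line for illustration only.
h, hr, ab, k, sf = 80, 10, 330, 60, 3

plain = babip(h, hr, ab, k, sf)
with_hr = babip_plus_hr(h, ab, k, sf)
print(f"BABIP = {plain:.3f}, BABIP+HR = {with_hr:.3f}")
```

For a hitter with no home runs the two values are identical, so the gap grows with how often hard contact leaves the park.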
Agreed and Jon Jensik did bring up the HR point above. I don’t have time to do a full recalculation on the graphs, but when I had looked briefly at what it might do, it would correlate a bit better for some players (such as Santana and any in the upper left), but it would actually push many players away from the line (anyone near or below it). Since the plot has a pretty equal distribution, I’m not sure it would show all that much better.
Makes sense. Thanks for the response. I think it would also make the soft% correlation much less clear though. And the value of a home run is quite high. So that might call the result (about soft contact vs hard contact) into question more than the better correlation with hard%, as in, hard% would be slightly better correlated, soft% would be less correlated, and we would have a much different argument here. But it’s still an interesting question and cool piece! thanks for that and definite shoutout to Mr. Jensik!
Glad you enjoyed it. And yes, there are always more wrinkles that can be thrown into these discussions, which is what makes them so interesting.
For this case, I was focusing more on the value of increasing BABIP. If you want to capture the true value of HRs, then you would need to include all extra base hits. So, perhaps a SLG% on BIP plus HR.
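One possible reading of "SLG% on BIP plus HR" is total bases divided by the same denominator as the BABIP+HR idea discussed earlier in the thread (AB - K + SF). That definition is an assumption made for this sketch, not an established stat, and the stat line is hypothetical.

```python
def slg_on_contact(singles, doubles, triples, hr, ab, k, sf):
    """Total bases per ball put in play, counting home runs as in play.

    Assumed definition for illustration: TB / (AB - K + SF).
    """
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / (ab - k + sf)

# Hypothetical stat line for illustration only.
value = slg_on_contact(singles=50, doubles=18, triples=2, hr=10, ab=330, k=60, sf=3)
print(f"SLG on contact = {value:.3f}")
```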