Jun 03

Predicting No-Hitters: a Fool’s Errand?

I’ve written at length (here and here) about Josh Beckett’s no-hitter and its place in Los Angeles Dodgers history.  In researching those articles, I began to wonder if there was anything related to offensive trends, particularly league-wide, that would be predictive in determining potential no-hitters.  In other words, if I were to crunch some numbers, could I lay a bet in Vegas on the over/under for no-hitters in a season and never work another day in the rest of my life?

Remember, all hard work begins with an end goal of sustained laziness.

My premise is a simple one, actually:  is there any correlation between batting average and/or strikeout percentage (strikeouts / plate appearances) that would help predict the frequency of no-hitters?  I chose these two for two reasons:  they’ve both been in the news lately because of the spate of shutouts across the Majors, and both seemed like logical contributors to a no-hitter.  Poor-hitting teams are probably not likely to, you know, get hits, and according to every journalist out there, strikeouts are bad: bad for offense and bad for the game.

NOTE:  I am a horrible statistician.  I never took a single class, constantly screw up terms, and am probably mean average (a little statistics joke).  If we learned nothing else from The X-Files, it’s this: trust no one.

Before beginning, however, I’d like to define the thresholds for my three key figures in this post:   batting average, strikeout percentage, and no-hitter frequency.  The number of teams has changed drastically over the years, especially when we’re considering a data set that spans 144 seasons (1871 to 2014), so a raw count of no-hitters per season doesn’t buy me anything.  The frequency of a no-hitter occurring per game played, however, works well and can be tracked over time much more easily.

Avg / Std Dev    K% / Std Dev    Freq % / Std Dev
.263 / .013      10.90 / 4.58    .14 / .15

Baseline Figures

In my mind, the standard deviation can define levels of offensive ineptitude.  We would expect the following to be normal: a batting average that falls between .249 and .276; a strikeout percentage that falls between 6.32 and 15.48; and a frequency percentage that falls between 0 (the true lower bound is actually -0.01, but unless Jim Joyce miscalled a few too many close ones at first, I’m not sure how that’s possible) and .29.  Anything that falls outside of those ranges we can safely consider atypical.
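For the curious, the band arithmetic above is just mean plus or minus one standard deviation.  A minimal sketch using the rounded figures from the baseline table (note that with these rounded inputs the lower batting-average bound comes out to .250; the .249 above presumably reflects the unrounded mean):

```python
# Mean / standard deviation pairs from the baseline table above.
stats = {
    "batting_avg": (0.263, 0.013),
    "k_pct":       (10.90, 4.58),
    "nohit_freq":  (0.14, 0.15),
}

# The "normal" range for each figure is one standard deviation either
# side of the mean; anything outside counts as atypical.
for name, (mean, sd) in stats.items():
    low, high = mean - sd, mean + sd
    print(f"{name}: normal range {low:.3f} to {high:.3f}")
```

The no-hitter frequency lower bound really does come out negative (0.14 - 0.15 = -0.01), which is just a reminder that frequency isn't normally distributed near zero.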

When I first began my examination, I came up with this little gem of a table, where I calculated the odds of a given quantity of no-hitters occurring in a season[i].  I do not include 2014 in these tables since the season hasn’t been completed.

Total No-Hitters (Season)    Occurrences    % of Total
0                            32             22.37
1                            39             27.27
2                            25             17.48
3                            26             18.18
4                            10             6.99
5                            4              2.80
6                            4              2.80
7                            3              2.10

No-Hitter Events by Season (Rounded Up)

Unfortunately, this method of sorting out seasons with an unusually high quantity of no-hitters doesn’t make much sense, given that frequency is more relevant than raw count.  So, here’s an updated table based on frequency.

Frequency % Range (Season)    Occurrences    % of Total
0.00                          32             22.37
0.01 – 0.09                   37             25.87
0.10 – 0.19                   38             26.57
0.20 – 0.29                   18             12.59
0.30 – 0.39                   13             9.09
> 0.40                        5              3.50

No-Hitter Frequency by Season (Rounded Up)
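The bucketing behind these tables is simple arithmetic: no-hitters per 100 games played, sorted into the ranges above.  A sketch, assuming you already have each season’s no-hitter count and total games played (the 1968 and 2012 figures below match the ones discussed in this post; the 2005 row is purely illustrative):

```python
def nohit_frequency(no_hitters, games_played):
    """No-hitters per 100 games played, as used throughout this post."""
    return 100.0 * no_hitters / games_played

def frequency_bucket(freq):
    """Assign a season's frequency to the ranges in the table above."""
    if freq == 0.0:
        return "0.00"
    if freq < 0.10:
        return "0.01 - 0.09"
    if freq < 0.20:
        return "0.10 - 0.19"
    if freq < 0.30:
        return "0.20 - 0.29"
    if freq < 0.40:
        return "0.30 - 0.39"
    return "> 0.40"

# (year, no-hitters, games played); e.g. 1968 had 20 teams x 162 games
# = 1620 games, 2012 had 30 teams x 162 = 2430.
sample = [(1968, 5, 1620), (2012, 7, 2430), (2005, 0, 2430)]
for year, nh, games in sample:
    f = nohit_frequency(nh, games)
    print(year, round(f, 2), frequency_bucket(f))
```

Run on those inputs, 1968 lands at 0.31 (the “0.30 – 0.39” bucket) and 2012 at 0.29 (the “0.20 – 0.29” bucket), matching the figures cited later in the post.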

First, then, I should begin my examination by looking into the 18 seasons where the frequency was above what we would normally expect.

Year No-Hitters Freq % Avg Avg Rank K% K% Rank
1880 4 1.18 .245 9 7.96 94
1882 3 0.52 .248 19 4.99 119
1908 6 0.48 .239 3 N/A N/A
1917 6 0.48 .249 21 9.37 74
1898 4 0.43 .271 110 N/A N/A
1876 1 0.38 .265 93 2.88 126
1888 4 0.37 .239 2 4.94 121
1991 7 0.33 .256 45 15.17 30
1990 7 0.33 .258 58 14.87 34
1892 3 0.33 .245 10 8.45 87
1884 5 0.32 .243 5 3.78 124
1905 4 0.32 .248 18 N/A N/A
1951 4 0.32 .261 73 9.71 70
1956 4 0.32 .258 61 12.07 59
1916 4 0.31 .248 17 10.31 65
1962 5 0.31 .258 57 14.09 40
1969 6 0.31 .248 20 15.16 31
1968 5 0.31 .237 1 15.85 23

Above Normal Frequency of No-Hitters

The first thing that becomes painfully obvious from the table above is that 11 of the 18 seasons (61.1%) came during the dead-ball era, and another three came during the 1960s, an era so dominated by pitching that in 1969 MLB actually lowered the pitching mound from 15” to 10” just to help the struggling offenses.  But even using these high-frequency seasons as our data set, I’m left with 11 of 18 (61% again) seasons that fall at or below our .249 batting average threshold, only two of which fall outside the dead-ball era, and only one season that sits above the strikeout percentage threshold.

What if I take another tack?  What if I approach this question by examining only those seasons that fall below the batting average threshold?  This time, though, I’m going to filter out dead-ball era seasons.

Year No-Hitters Freq % Avg Avg Rank K% K% Rank
1968 5 0.31 .237 1 15.85 23
1967 4 0.25 .242 4 15.92 21
1972 3 0.16 .244 7 14.80 35
1965 3 0.18 .246 12 15.70 24
1963 3 0.19 .246 13 15.34 29
1969 6 0.31 .248 20 15.16 31
1966 1 0.06 .249 22 15.44 27

Below Normal Batting Average Seasons

Well, this certainly clears things up, doesn’t it?  I went from examining the dead-ball era and some of the ’60s to almost the entirety of the ’60s.  Still, of these seven seasons, only two (the ones I reviewed previously) were above normal for the frequency of no-hitters, which works out to 28.57% of the representative sample.  If I add back in those dead-ball era seasons, the numbers jump to 11 of 22 seasons (50%), but at that point I might as well be discussing eras of baseball rather than low batting averages.

Moving on to strikeout percentage: is there any way to predict an increased frequency of no-hitters from an above-normal K%?  The short of it is no.  Without adding another long table to this post, there are 25 seasons where the K% has been above 15.48%, and in those 25 seasons there was only one (1968) where the frequency of no-hitters crossed over what we would expect.  The 25 seasons are as follows:  1964-65, 1967-68, 1987, and 1994-2013.  In 2012, there were seven no-hitters and the frequency percentage sat exactly at 0.29, which we’ll generously call a hit, making it two of 25 (8%).
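The “is there any correlation” question can also be asked directly with a Pearson correlation coefficient.  A from-scratch sketch, fed with five (K%, frequency) pairs lifted from the below-average table earlier, purely to show the mechanics (far too few points to conclude anything; a real test would use all 143 seasons):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (K%, no-hitter frequency %) for 1968, 1967, 1972, 1965, 1963,
# taken from the below-normal batting average table above.
k_pct = [15.85, 15.92, 14.80, 15.70, 15.34]
freq  = [0.31, 0.25, 0.16, 0.18, 0.19]
print(round(pearson_r(k_pct, freq), 3))
```

An r near +1 or -1 would suggest a real relationship; values near 0 suggest noise.  Even a cherry-picked pitching-heavy sample like this one is a long way from a betting edge.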

So what does all of this mean?  In my mind, it means I shouldn’t be selling the house and moving to Vegas to become a professional gambler.  The problem with an exercise like this is that a no-hitter is just as much about luck as it is about pitching dominance.  A lot can happen in a game to determine whether a no-hitter is thrown: a quirky bounce of the ball, shadows on the field, the sun changing position, wind, fielders shifted one way or the other (a more modern problem, admittedly), and good old human error in an official scorer’s judgment.  And that’s just off the top of my head.

In conclusion, I don’t see high strikeouts and poor hitting (at the league level) as any indicator of a no-hitter.  That’s not to say there isn’t a better metric out there, and I’ll keep digging, if only to strike it rich, but these two are definitely not it.  A better exercise for seeing the impact of hitting and strikeouts on expected outcomes would be to broaden the search to include games with 2 or fewer hits; more data might let me determine whether there is any true correlation.  That, however, is a question for another time and outside the scope of this article.

[i] I compiled my list from the list of no-hitters on ESPN’s site, with one additional no-hitter from Baseball Reference, bringing my total to 273, eight fewer than referenced on the Baseball Reference web site.  For the purposes of this breakdown, I didn’t find the discrepancy large enough to warrant the extra time comparing the two lists.
