Well, of course he *could.* But what we (and by we, I mean baseball nerds with way too much time on our hands) want to know is, how *likely* is it that Mauer will hit .400?

The conventional wisdom is that it’s no longer possible to hit .400, or at least that it’s much more difficult than it used to be. The absence of any .400 hitters since Ted Williams would seem to confirm that diagnosis. But John Bonnes, the Twins Geek, has a provocative post up today arguing that in fact, it’s getting easier to hit .400. His evidence is simply that of all the players who have come close to .400 since Williams, the majority have done it in the past 15 years. On top of that, some, like George Brett and Tony Gwynn, have come within a few hits of the achievement. On the basis of this observation, the Geek says that maybe Mauer has more of a shot than we think.

This is, on the face of it, intriguing evidence. But the Geek is a numerically astute guy, and so I was a little disappointed that he didn’t mention a sabermetric classic on this topic: the late paleontologist Stephen Jay Gould’s essay on the disappearance of the .400 hitter (that’s not a link to the actual essay, which I couldn’t find online). Gould was arguing against people who thought that the decline of .400 hitting was due to the declining quality of hitters. Gould argued that, paradoxically, the decline of .400 hitting was due to the fact that all players were actually getting *better.* Because the general level of play was higher, it was more difficult for any player to be so far ahead of the pack that he could hit .400.

Gould supported this argument by showing that batting averages had become less *variable.* That is, there are both fewer really good hitters *and* fewer really bad hitters, because everyone is bunched together within a narrower range of batting averages. When he did this analysis originally, back in the 1980s, he painstakingly put together the data by hand, while he was laid up in bed recovering from an illness. But today, of course, we have the statistics at our fingertips. So, using data from the Lahman database, I thought I’d extend Gould’s analysis and see what it has to say about the Twins Geek’s hypothesis.

All the graphs below are based only on players who meet the modern definition of qualifying for a batting title: 3.1 plate appearances per team game played. This wasn’t actually the rule used prior to 1957, but I applied it anyway for simplicity.
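As a sketch, that qualification filter comes down to one comparison. (The argument names here are placeholders, not actual Lahman column names; in the real database, plate appearances have to be assembled from at-bats, walks, and so on.)

```python
def qualifies(plate_appearances, team_games, threshold=3.1):
    """Modern batting-title rule: 3.1 plate appearances per team game played."""
    return plate_appearances >= threshold * team_games
```

For a full 162-game season, this works out to a cutoff of 3.1 × 162 = 502.2 plate appearances.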

First off, here’s a picture that demonstrates what I mean when I talk about the decreasing variation in batting averages. It’s a comparison of the distribution of batting averages in 1900 and in 2000.

You can see that the hitters in 1900 were more spread out than the hitters in 2000. There are more hitters with really high averages, but also more hitters with really low averages. Even though the average hitter had a higher batting average in 1900 (signified by the peak of the curve being farther to the right), there were still more hitters down around the .200 mark (the red line is above the black line at the left end). Back in 1900, a “good glove, no hit” infielder could still find a starting job in a way he couldn’t today.

To get a general idea of how batting average has become less variable over time, we can look at the standard deviation of batting average by season. The standard deviation essentially measures how spread out the distribution of batting averages is. (Technically, it’s the square root of the average squared distance from the mean.) The higher the standard deviation, the more spread out the batting averages are. See this graph, which is adapted from one that Gould originally produced:
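The per-season calculation is simple enough to sketch with the standard library alone; in practice the `(year, average)` pairs would come from the Lahman `Batting` table after applying the qualification filter.

```python
from collections import defaultdict
from statistics import pstdev

def stdev_by_season(rows):
    """rows: iterable of (year, batting_average) pairs for qualifying hitters.
    Returns {year: population standard deviation of that season's averages}."""
    by_year = defaultdict(list)
    for year, avg in rows:
        by_year[year].append(avg)
    return {year: pstdev(avgs) for year, avgs in by_year.items()}
```

A shrinking value from one era to the next is exactly the “bunching together” Gould pointed to.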

You can see that up through the early 1980s, when Gould’s analysis was done, batting average was becoming less and less variable. This happened even as the average *level* of batting average bounced around between “pitcher-friendly” and “hitter-friendly” eras:

But if you look at those graphs, you’ll notice that something happened in the 1990s: batting averages went up overall, and the *variation in averages also went up.* Whether that was because of expansion, or the steroid era, or whatever, I can’t say. But that’s not what I’m interested in explaining. I want to know how easy it is to hit .400 these days. Higher batting averages + more variability should equal a better chance of a .400 hitter. But how much better?

Fortunately, it’s possible to get an answer to this question that’s at least reasonably precise. If you look back at the first graph, you’ll see that the batting averages of all the qualifying hitters in any one season approximate a bell curve, or what’s called a normal distribution. And the nice thing about things that are normally distributed is that we can predict the probability that a normally distributed variable will take on any particular value. If we know what the average of all batting averages is, and we know the standard deviation of batting averages, we can predict the probability that a particular player will hit .400 or above.
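Concretely, that single-hitter tail probability falls out of the normal CDF. Here’s a sketch using only the standard library’s error function; the .270 mean and .030 standard deviation in the comment are made-up numbers for illustration, not values from the actual data.

```python
import math

def p_one_hitter(threshold, mean, sd):
    """P(a hitter's average >= threshold), assuming averages ~ Normal(mean, sd).

    Uses the identity P(Z >= z) = erfc(z / sqrt(2)) / 2 for a standard normal Z.
    """
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# e.g. p_one_hitter(0.400, mean=0.270, sd=0.030) -- a bar more than four
# standard deviations above the mean, so the probability is tiny.
```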

This means that we can predict the probability of hitting .400 in each year. In order to smooth out year-to-year fluctuations in the mean and standard deviation of batting averages, I took the average of the previous five years. Then I calculated, for each year, the probability that some hitter would hit .400 or better that year. For comparison, I also calculated the probabilities for hitting .380 and .390. Keep in mind this is the probability of *any* hitter getting to .400 (based on the number of people who qualified for the batting title that year), not the probability of any one *particular* hitter doing it.
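The step from one hitter to the whole league is a complement rule: if each of n qualifiers independently has probability p of reaching the mark, the chance that at least one does is 1 − (1 − p)^n. A sketch of that step, along with a trailing-mean helper mirroring the five-year smoothing described above:

```python
def trailing_mean(values, window=5):
    """Mean of the last `window` values: the five-year smoothing used above."""
    tail = values[-window:]
    return sum(tail) / len(tail)

def p_any_hitter(p_one, n_hitters):
    """Probability that at least one of n independent hitters clears the bar."""
    return 1.0 - (1.0 - p_one) ** n_hitters
```

The independence assumption is a simplification (hitters face the same pitchers and parks), but it’s good enough for a ballpark estimate of how rare a .400 season is.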

The first thing I have to say here is that Twins Geek was really onto something. The chances of somebody hitting .400 jumped up in the last 15 years, to levels not seen since before World War II.

That’s the good news for Joe Mauer. The bad news is that this trend seems to have reversed itself, and things are back to the way they were in the 1980’s. If you look at the charts above, you’ll see that this is not necessarily because averages have come down overall (although they have, some), but because they’ve become less variable, more bunched together.

The other bad news for Joe, of course, is that even in the batting bonanza of the late 1990s, the chances that anyone would hit .400 were never even 0.5%. It’s just a really hard thing to do. Which is all the more reason that despite all that I’ve said here, I’m going to keep on watching and rooting for Mauer along with the Twins Geek.

Holy LORD, is this a great post. How do I not know about this blog?!? I need to talk to you about some of this math too.

THANKS!

John

By:

Twins Geek on July 19, 2009 at 2:23 am

[…] About ← Could Joe Mauer hit .400? […]

By:

Hitting the big time « Away Games: A Minnesota Twins Blog on July 21, 2009 at 3:35 pm

Good analysis – one of the better statistical posts I have seen in a while.

I would point out that the standard deviation since about 1960 looks pretty flat. I am not sure there is anything there other than statistical noise. I am puzzled by the dramatic drop in the last graph given the less dramatic changes in the other two.

It also appears that steroids may have done a lot more than add home runs to the mix. They may have been the cause of a lot of that burst of offense. Of course that widens the group of suspects for steroid abuse. The question may be how likely is someone to hit .400 without steroids to help.

By:

TT on July 21, 2009 at 11:09 pm

[…] over my post on Joe Mauer and hitting .400, it occurred to me that there was one other question I should have addressed. I noted how unlikely […]

By:

Aesthetics and statistics, or, more fun with the binomial distribution « Away Games: A Minnesota Twins Blog on July 22, 2009 at 4:11 am