Posted by: Peter | July 22, 2009

Aesthetics and statistics, or, more fun with the binomial distribution

For years now, we’ve heard people complain that sabermetrics and data analysis are taking the fun out of the game and ruining it for everyone who just wants to enjoy the myth and the romance of it, without all those grubby numbers.

But to me, the objective and aesthetic sides of appreciating the game are linked. Below are a couple of examples of how examining the numbers helps me appreciate the beauty of this game, beyond the wins and losses.

Beating the odds

As I looked over my post on Joe Mauer and hitting .400, it occurred to me that there was one other question I should have addressed. I noted how unlikely it is, in any given season, that somebody will hit .400. But we might also ask: what are the chances that, in all the years since Ted Williams did it, nobody got to .400 even once? Answering this just means combining all the year-by-year probabilities to figure out the chances that someone, sometime after 1941, would get to a specified batting average. Without going into the boring details, here’s the answer:

[Figure: probability that someone, sometime after 1941, would reach a given batting average]

What I love about this graph is how serendipitously wonderful it is, in terms of the aesthetics of being a baseball fan. Mathematically, there’s nothing special about the number .4; it’s just a number. But to baseball fans, with our oddly numerological fixations, it seems like a magical threshold. So what good fortune that, when you run the probabilities, things come out as they do above. To hit .380 is hard, but someone was liable to do it sometime since 1941, just by chance. To hit .390 is much harder, improbable even, but the fact that someone has managed to do it is merely surprising, not awe-inspiring. And on the other side, to hit .410 or .420 is so outlandishly improbable that we’ll almost certainly never see it (although, on outlandish improbability, see below).

But .400 is something else altogether. It’s sitting right there on the edge of possibility. Maybe someday, someone will get there–but if they do, it will mean literally “beating the odds”, doing something truly improbable. In other words, the “statistical significance” of .400, as we say in the jargon of data analysis, is commensurate with the emotional significance we place on it as fans.
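For the curious, the “boring details” of the combining step can be sketched roughly as follows. The sketch assumes each season is independent, so the chance that nobody reaches the mark in any season is the product of the per-season chances of missing it, and the chance that someone does is one minus that. The actual per-season probabilities come from the earlier Mauer post and aren’t reproduced here, so the flat 1% per season over 68 seasons (1942 through 2009) below is a made-up placeholder, not real data.

```python
from math import prod


def prob_at_least_once(yearly_probs):
    """Chance that something happens in at least one season, given the
    per-season chances in yearly_probs. Assumes seasons are independent."""
    # Chance that it fails to happen in every single season...
    p_never = prod(1 - p for p in yearly_probs)
    # ...and the complement: it happens at least once.
    return 1 - p_never


# HYPOTHETICAL placeholder: a flat 1% chance in each of 68 seasons
# (1942 through 2009). Real per-season values would differ by year.
yearly = [0.01] * 68
print(f"{prob_at_least_once(yearly):.3f}")
```

With these placeholder numbers it prints 0.495: even a 1-in-100 shot per season adds up to roughly a coin flip over 68 seasons, which is how a mark that is rare in any single year can still be “liable” to happen sometime.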

And now for something completely improbable

Finally, I want to return to the wonderful Stephen Jay Gould one more time. In another of his writings on baseball, Gould noted that most of the feats we celebrate in the game are actually not that statistically surprising–with one important exception:

Nothing ever happened in baseball above and beyond the frequency predicted by coin-tossing models. The longest runs of wins or losses are as long as they should be, and occur about as often as they ought to. Even the hapless Orioles, at 0 and 21 to start this season, only fell victim to the laws of probability (and not to the vengeful God of racism, out to punish major league baseball’s only black manager).

But “treasure your exceptions,” as the old motto goes. There is one major exception, and absolutely only one—one sequence so many standard deviations above the expected distribution that it should not have occurred at all. Joe DiMaggio’s fifty-six–game hitting streak in 1941. The intuition of baseball aficionados has been vindicated. Purcell calculated that to make it likely (probability greater than 50 percent) that a run of even fifty games will occur once in the history of baseball up to now (and fifty-six is a lot more than fifty in this kind of league), baseball’s rosters would have to include either four lifetime .400 batters or fifty-two lifetime .350 batters over careers of one thousand games. In actuality, only three men have lifetime batting averages in excess of .350, and no one is anywhere near .400 (Ty Cobb at .367, Rogers Hornsby at .358, and Shoeless Joe Jackson at .356). DiMaggio’s streak is the most extraordinary thing that ever happened in American sports. He sits on the shoulders of two bearers—mythology and science. For Joe DiMaggio accomplished what no other ballplayer has done. He beat the hardest taskmaster of all, a woman who makes Nolan Ryan’s fastball look like a cantaloupe in slow motion—Lady Luck.

Here’s another graph to put this one in perspective. Let’s take a couple of hypothetical hitters. Assume that the “true” talent level of one is that of a .300 hitter, while the other is really a .350 hitter. The latter would be a very, very good hitter, of course, which should make it easier to sustain a hitting streak.

Now assume these guys get five official at-bats, every single night. This isn’t very realistic either, in that it doesn’t account for walks, and injuries, and low-offense games, and so on. So again, this assumption should make it easier to sustain a streak.

So given these assumptions, what’s the chance of sustaining a long streak? Again, it’s simple probabilities, calculated with the binomial distribution. If you’re a true .300 hitter, your chance of getting a hit is .3 for any given at-bat. The chance of getting at least one hit over 5 at-bats is 1-(.7*.7*.7*.7*.7)=0.83, or 83% (.7 is the chance that you won’t get a hit in one at-bat, and you multiply those probabilities to get the chance of going hitless over all five at-bats). The chance of getting a hit in back-to-back games is 0.83*0.83=0.69, or 69%, and so on: for a streak of n games, you multiply 0.83 by itself n times.
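Here is a minimal Python sketch of that arithmetic, under the same two assumptions: every at-bat is an independent chance with probability equal to the hitter’s true average, and every game has exactly five at-bats. Note that it gives the chance of hitting safely in one particular run of games, say the first 56 of a season, not the chance that a streak turns up somewhere in a season or a career, which would need a more involved calculation. The function names are just for illustration.

```python
def p_hit_in_game(avg, at_bats=5):
    """Chance of at least one hit in a game: 1 - (1 - avg) ** at_bats.
    Treats each at-bat as an independent trial with success chance avg."""
    return 1 - (1 - avg) ** at_bats


def p_streak(avg, games, at_bats=5):
    """Chance of hitting safely in every one of a given run of `games`
    consecutive games (e.g. the first 56), under the same model."""
    return p_hit_in_game(avg, at_bats) ** games


for avg in (0.300, 0.350):
    print(f"{avg:.3f} hitter: one game {p_hit_in_game(avg):.2f}, "
          f"two games {p_streak(avg, 2):.2f}, "
          f"56 games {p_streak(avg, 56):.4%}")
```

For 56 games this comes out to roughly 0.003% for the .300 hitter and roughly 0.1% for the .350 hitter.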

So without further ado:

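[Figure: probability of sustaining a hitting streak of a given length, for a true .300 hitter and a true .350 hitter]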

This is what Gould was talking about. You’re getting into crazy territory after 30 games, to say nothing of 50. That streak isn’t just improbable. It’s wildly, absurdly, inhumanly impossible. No one should ever have been able to do it. (Under these assumptions, it’s about a 0.003% chance for the .300 hitter, and only about 0.1% even for the .350 hitter, if you’re wondering.) And so most likely, nobody will ever do it again.

I have to admit, I didn’t care that much about Joe DiMaggio before I learned about all this. I mean, I’m not a Yankee fan, and all this happened before I was born–hell, before my father was born. But looking at those mind-boggling numbers, I find myself impressed that anyone could do what he did. This isn’t about Moneyball–hitting streaks don’t mean much, in terms of wins and losses. Rather, it goes to the part of the game that’s just beautiful and wonderful, and not entirely rational. The part, that is, that makes us want to watch it.


Responses

  1. Yeah, and DiMaggio (I think) essentially did it twice. In the minors, in 1933, he had an even longer hitting streak – 61 games. And that was in the PCL, which (again, I think) was mighty close to major league quality about then.

    He was either VERY lucky, or there’s something there we’re missing.

  2. Do you have an RSS feed? I would like to add you to our blog feed at MNGameDay.com (and on my site).

  3. Ok, so I’m not sure exactly what you are measuring. Are you saying that there is a 0.003% chance that a .350 hitter would have this streak in any random 56-game stretch? Or in their career? I’d like to know what the chances are that someone would have done it in the history of baseball.


