Over at FanGraphs, Carson Cistulli has a nice pair of posts in which he looks back at some Bill James player forecasts from 1993 and 1994, in an attempt to see how the predictions of player value held up. Tom Tango adds further insight by running a regression of career WAR on the “dollar value” James assigned to players. Tango ends up using a polynomial fit of the data, resulting in the following equation:
This is good stuff (although I’m not sure why Tango decided to fix the intercept at 0 in that regression). However, I felt like both posts could have really benefited from some kind of graphical display of the data, especially because I think that looking at a graph changes one’s interpretation of the results somewhat. Fortunately, Carson was nice enough to supply his data. So here are the players from 1994:
The color codes reflect James’s letter grade, or “Z” for ungraded players. The line is a locally weighted regression (“LOESS”) line that shows the general trend.
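For readers who want to see what a trend line like this is doing, here is a minimal sketch of a LOESS-style smoother in plain Python. It is not the code behind the figure: the function names, the `frac` setting, and the use of a local straight-line fit are my assumptions, and the player values are invented purely for illustration. At each point on the x-axis, it fits a weighted straight line in which nearby players count heavily (tricube weights) and distant ones count little or not at all.

```python
def tricube(u):
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_at(xs, ys, x0, frac=0.5):
    """Locally weighted linear fit of ys on xs, evaluated at x0.

    frac is (roughly) the share of points given non-zero weight around x0.
    """
    k = max(2, min(len(xs), round(frac * len(xs))))
    # Bandwidth: distance to the k-th nearest point, nudged up slightly so
    # that point keeps a tiny positive weight.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * (1 + 1e-9) + 1e-12
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw  # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if sxx <= 1e-12 * sw:
        # All weighted points share one x value: fall back to the local mean.
        return my
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


# Hypothetical (dollar value, career WAR) pairs, invented for illustration.
# These are NOT the Bill James numbers.
players = [(5, 0.0), (8, 3.1), (10, 0.4), (12, 9.0), (15, 0.0), (15, 20.5),
           (18, 2.2), (22, 11.0), (27, 14.5), (33, 24.0), (40, 31.5), (48, 45.0)]
dollars = [d for d, _ in players]
war = [w for _, w in players]

grid = [5, 10, 15, 20, 25, 30, 35, 40, 45]
curve = [(x, loess_at(dollars, war, x)) for x in grid]
for x, y in curve:
    print(f"${x:>2}: smoothed WAR {y:.1f}")
```

A smaller `frac` makes the curve follow local wiggles more closely, while a larger one pushes it toward a single straight line, so the choice affects how sharp any apparent threshold looks.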
What does this picture say? Well, the way I read it, it tells us that for the top prospects (those with A or B ratings), James’s dollar values really tell us something. For players who got valuations over $20 or so, there is a clear positive relationship between the dollar ratings and eventual career WAR. What’s more, there is a clear floor under these players: beyond the $25-30 threshold, none of them ended up being total busts.
On the other hand, it doesn’t look like the dollar values below $20 really contain any information. Among all those C and D grade prospects, James’s ratings don’t help us distinguish between surprise stars like Jim Edmonds (valued at $15, 68.1 career WAR) and total washouts like Stanton Cameron (also $15, 0 WAR). If I run a linear regression using only the players valued over $20, I get:
I think this is probably the more useful relationship, since it throws out all the fringe prospects who aren’t really adding any information.
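The restricted regression is easy to sketch. The snippet below is an illustration under my own assumptions, not the actual calculation: the helper names are hypothetical, the data are invented, and I treat “over $20” as a strict cutoff. It fits an ordinary least-squares line with a free intercept separately to the players above and below the threshold, and reports R² for each group as a rough gauge of how much the dollar values explain.

```python
def ols(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    a, b = ols(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical (dollar value, career WAR) pairs, invented for illustration.
# These are NOT the Bill James numbers.
players = [(5, 0.0), (8, 3.1), (10, 0.4), (12, 9.0), (15, 0.0), (15, 20.5),
           (18, 2.2), (22, 11.0), (27, 14.5), (33, 24.0), (40, 31.5), (48, 45.0)]

THRESHOLD = 20  # assumed cutoff: keep only players valued over $20
top = [(d, w) for d, w in players if d > THRESHOLD]
rest = [(d, w) for d, w in players if d <= THRESHOLD]

for label, group in (("over $20", top), ("$20 and under", rest)):
    xs = [d for d, _ in group]
    ys = [w for _, w in group]
    a, b = ols(xs, ys)
    print(f"{label}: WAR = {a:.2f} + {b:.2f} * dollars, "
          f"R^2 = {r_squared(xs, ys):.2f}")
```

With the real data, the interesting comparison would be whether the slope and R² in the lower group are indistinguishable from zero, which is what the scatterplot suggests.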
I can only add my ringing endorsement to Tango’s call for forecasters to release more of this kind of information. I’d love to know whether this pattern is a fluke of this data, or of Bill James as a forecaster, or whether it generalizes. If forecasts are useful for top prospects but not marginal ones, that would be very important to know.
