Evaluation of 2009 Pitcher Forecasts

Dash Davidson, Peter Rosenbloom and Jared Cross*

Is our projection system smarter than a monkey?

To answer this question for our projection system and several others, we compared the accuracy of each system’s 2009 pitcher projections to Marcel the Monkey’s projections.  We previously examined hitter projections in a separate study.

We looked at a pool of 512 pitchers (all pitchers with a minimum of 10 IP and prior MLB stats, or 25 IP and no prior MLB stats, so long as 2 or more systems provided a projection).  When a system did not provide projections for some of these 512 players, it was assigned the Marcel projection for those players, so it could only beat Marcel (or lose to Marcel) on the projections it actually made.  Of these 512 pitchers, 51 did not have prior major league stats and, as a result, all received the same projection from Marcel:

Joe Average, 25 IP, 4.50 ERA, 6.84 K/9, 3.6 BB/9, 1.08 HR/9, 1 W, 0 Sv
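The fallback rule described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not the code we actually ran: the function name, the player ids and the ERA values are all made up for the example.

```python
def fill_with_marcel(system_proj, marcel_proj):
    """Return one projection per player in the pool.

    Uses the system's own projection where it made one and falls back
    to Marcel's projection otherwise, so the system can only gain or
    lose ground on Marcel for players it actually projected.
    """
    return {
        player: system_proj.get(player, marcel_value)
        for player, marcel_value in marcel_proj.items()
    }


# Hypothetical player ids and ERA projections, for illustration only.
marcel = {"pitcher_a": 4.10, "pitcher_b": 3.85, "rookie_c": 4.50}
system = {"pitcher_a": 3.95}  # this system skipped two of the three pitchers

filled = fill_with_marcel(system, marcel)
print(filled)
```

For the two pitchers the system skipped, its filled-in projection is identical to Marcel’s, so those players contribute nothing to the comparison either way.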

Oliver was the most comprehensive system with projections for 510 of the 512 players.  Steamer projected only 43% of the players (covering 60% of the innings) and Sporting News projected only 28% of the players (42% of the innings).  CAIRO, Chone, ZiPS and PECOTA all projected at least 95% of the innings.

To make these results more compelling, and to give a sense of whether the differences between these systems are meaningful, we’re presenting the success of each system relative to Marcel as a W-L record over a 162 game schedule.

ERA Projections

System W L W% z score
ZiPS 100 62 0.618 3.01
CAIRO 95 67 0.585 2.18
Chone 91 71 0.564 1.62
Steamer 90 72 0.558 1.47
PECOTA 88 74 0.541 1.05
Oliver 82 80 0.503 0.08
Marcel 81 81 0.500
Fantistics 80 82 0.492 -0.20
Sporting News 73 89 0.454 -1.18

This is a big win for ZiPS and a very strong showing for CAIRO.  Steamer and PECOTA beat Marcel soundly and Oliver and Fantistics hold their own with the monkey.  Not a great year for Sporting News.

How should you interpret these standings?  The W-L records show you how well each system did when compared to the Marcel system (the baseline projection system).  How likely is it that ZiPS was really no better than Marcel and simply got really lucky this year?  Just as likely as it is that a 100-62 team was really just a .500 club that got lucky.  Not very likely.

For the statistically inclined, what I did was calculate a z-score for the hypothesis that each system was exactly as good as Marcel and then find the winning percentage that gives the same z-score for the hypothesis that a team is really a .500 team over 162 games.
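Here is a minimal sketch of the second half of that step, turning a z-score into a 162-game record.  It assumes wins for a true .500 team follow a binomial distribution, so the standard deviation of winning percentage is 0.5/sqrt(162).  The function name is just for illustration, and how each system’s z-score against Marcel was computed in the first place is not shown here.

```python
import math

GAMES = 162


def record_from_z(z, games=GAMES):
    """Convert a z-score into the equivalent W-L record.

    For a true .500 team, W% has mean 0.5 and standard deviation
    0.5 / sqrt(games), so a z-score maps to W% = 0.5 + z * sd.
    """
    sd = 0.5 / math.sqrt(games)
    win_pct = 0.5 + z * sd
    wins = round(win_pct * games)
    return wins, games - wins, win_pct


wins, losses, win_pct = record_from_z(3.01)  # ZiPS ERA z-score from the table
print(wins, losses, round(win_pct, 3))  # 100 62 0.618
```

Plugging in the z-scores from the tables reproduces the listed records, for example 2.18 gives 95-67.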

None of the systems demonstrated a statistically significant ability to project the ERAs of the 51 players without major league stats.  Granted, 51 players may simply be too small a sample in which to expect significant results when predicting ERA, particularly for guys who didn’t throw that many innings.  This is why I am excited about Oliver and PECOTA projecting several years forward in time.  These multi-year projections might prove useless but, without them, I think it will be difficult to discern which systems have the best grasp on minor league performances and scouting.

So, ZiPS wins big.  Does this mean that you should simply use ZiPS pitching projections and you’ll dominate your fantasy league?  ERA is, of course, only one of five categories in most fantasy leagues, which also rank teams in strikeouts, wins, saves and WHIP.  Standings Gain Points (or SGPs) is a system used to determine how valuable a pitcher was across all five categories.  How good were these systems at projecting total fantasy value?

SGP Projections

System W L W% z score
Fantistics 87 75 0.536 0.92
Marcel 81 81 0.500
Steamer 81 81 0.498 -0.06
Sporting News 81 81 0.498 -0.06
PECOTA 73 89 0.453 -1.19
CAIRO 70 92 0.430 -1.79

Only Fantistics beats Marcel.  In fairness, only Fantistics and Sporting News are aimed primarily at the fantasy audience.  The PECOTA projections used here are from their weighted mean spreadsheet and not from their depth charts, which are better tailored to the fantasy game.  Oliver, Chone and ZiPS aren’t shown here since they didn’t predict saves, one of the components of SGPs.

IP Projections

System W L W% z score
Fantistics 96 66 0.595 2.41
Sporting News 93 69 0.572 1.82
Pecota Depth Charts 91 71 0.564 1.62
Community 81 81 0.500

Looking here only at the systems that try to predict playing time we see the main reason Fantistics is so successful at predicting fantasy value.  Just as it did for hitters, Fantistics excelled at projecting playing time for pitchers.  The “Community” line represents the fan projected playing time that Marcel relied on.

Rounding out the fantasy stats, let’s look at wins, saves and WHIP.

Win Projections

System W L W% z score
Fantistics 102 60 0.627 3.23
PECOTA 95 67 0.586 2.20
Sporting News 92 70 0.568 1.74
Steamer 91 71 0.564 1.64
CAIRO 84 78 0.518 0.45
Chone 82 80 0.504 0.11
Marcel 81 81 0.500
ZiPS 80 82 0.491 -0.22

Another big win for Fantistics, and PECOTA looks very strong here as well.  Not Marcel’s finest hour.

Before looking at saves I offer the following disclaimer:  Steamer left projecting saves to the last minute (right before Dash and Peter and the rest of the Steamers were departing for Florida to tune up for high school baseball season).  So, to the extent that Steamer had a system for projecting saves it was “Hey, Peter, come up with some save numbers for the guys you think are closers.”  “Okay.”  How did he do?

Save Projections

System W L W% z score
Sporting News 99 63 0.610 2.80
Fantistics 98 64 0.602 2.60
PECOTA 92 70 0.571 1.80
Steamer 91 71 0.562 1.57
Marcel 81 81 0.500
CAIRO 71 91 0.437 -1.60

Not bad, Peter.  Not bad.  He didn’t keep up with the experts at Sporting News and Fantistics but he did beat Marcel.

WHIP Projections

System W L W% z score
ZiPS 98 64 0.605 2.66
Steamer 94 68 0.583 2.11
PECOTA 91 71 0.561 1.55
Chone 90 72 0.557 1.46
Oliver 89 73 0.552 1.32
Fantistics 85 77 0.527 0.68
CAIRO 83 79 0.511 0.29
Sporting News 82 80 0.507 0.18
Marcel 81 81 0.500

This impressive showing by ZiPS is no surprise after its dominance in ERA.  My understanding is that Marcel regresses hits for pitchers no more than any other counting stat.  This makes it an easy target when it comes to WHIP.

Now that we’ve covered the fantasy baseball categories, let’s try to figure out what makes ZiPS so good at projecting ERAs by looking at the components of ERA.

K/9 Projections

System W L W% z score
Oliver 92 70 0.571 1.79
PECOTA 92 70 0.568 1.73
Chone 89 73 0.552 1.33
CAIRO 85 77 0.525 0.64
Steamer 85 77 0.524 0.60
ZiPS 82 80 0.505 0.13
Marcel 81 81 0.500
Sporting News 78 84 0.481 -0.48
Fantistics 74 88 0.455 -1.16

Marcel divides this category nicely into two groups: formula based systems above and rotisserie experts below.  Oliver and PECOTA take this category and we’re no closer to figuring out what ZiPS is doing better than everyone else.

Strikeout rate projections provide a better basis on which to judge how well these systems evaluate minor league stats.  Comparing the systems that project a wider pool of players to each other we get:

System Correlation with actual K/9
Oliver 0.535
CAIRO 0.433
PECOTA 0.409
ZiPS 0.210

Oliver does thrive at applying MLEs to determine strikeout rates.

BB/9 Projections

System W L W% z score
Marcel 81 81 0.500 0.00
ZiPS 81 81 0.499 -0.01
Oliver 81 81 0.498 -0.06
Steamer 79 83 0.487 -0.34
PECOTA 76 86 0.468 -0.82
Chone 69 93 0.428 -1.83
Sporting News 68 94 0.422 -1.99
Fantistics 65 97 0.402 -2.50
CAIRO 32 130 0.197 -7.71

Two things jump out at us here.  First, none of the systems beat Marcel!  Okay, ZiPS and Oliver are essentially tied with Marcel but no one beats it.  Second, what the heck happened to CAIRO?  CAIRO excelled at projecting ERAs but how did it manage that with these walk projections?  I went back to the original CAIRO spreadsheet to make sure I hadn’t messed up the data somehow but I can’t find anything wrong with it.  CAIRO simply had very extreme BB/9 projections.  Here are the distributions of BB/9 projections for Marcel and CAIRO side by side:

Percentile Marcel BB/9 CAIRO BB/9
97.5% 2.24 1.15
90% 2.74 1.77
75% 3.13 2.31
50% 3.47 2.88
25% 3.75 3.53
10% 4.07 3.98
2.5% 4.42 4.57

The good news for CAIRO?  This didn’t seem to affect their ERA projections as much as I would have expected.  According to the equation for FIP, every additional BB/9 should increase ERA by a third of a run.  And, controlling for K/9 and HR/9, an additional BB/9 led to an increase of 0.31 to 0.49 in ERA for the other systems.  CAIRO ERAs only increased by 0.19 for every additional BB/9.
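The “third of a run” figure falls straight out of the walk term in FIP, which is 3*BB/IP.  Since BB/9 is nine times BB/IP, one extra BB/9 adds 3/9 of a run.  A quick sketch of that arithmetic (the function name is made up for the example):

```python
def fip_walk_term(bb_per_9):
    """Runs contributed by walks under FIP's 3 * BB / IP term."""
    walks_per_inning = bb_per_9 / 9.0
    return 3.0 * walks_per_inning


# Going from 3.0 to 4.0 BB/9 is one additional BB/9.
increase = fip_walk_term(4.0) - fip_walk_term(3.0)
print(round(increase, 3))  # 0.333
```

This only covers the FIP side of the comparison.  The 0.31 to 0.49 and 0.19 figures came from fitting each system’s projected ERAs to its projected components, which isn’t reproduced here.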

HR/9 Projections

System W L W% z score
PECOTA 97 65 0.600 2.54
CAIRO 96 66 0.590 2.28
Chone 95 67 0.587 2.21
ZiPS 94 68 0.580 2.04
Oliver 85 77 0.525 0.63
Marcel 81 81 0.500
Steamer 78 84 0.481 -0.48
Fantistics 57 105 0.352 -3.78

PECOTA, CAIRO, Chone and ZiPS all win big here.  Steamer used a Marcel-like system to project HR/9 but for 2010 we’ll be using FB% and LD% to project HRs and, hopefully, this will be a major area of improvement for us.  HR/9 is likely an afterthought for Fantistics since it’s not typically a fantasy category.  Sporting News doesn’t even bother to project it.  Fantistics, along with ZiPS, projected HR/9 most aggressively (the broadest distribution) while Marcel and Steamer, limited by their ignorance of batted ball data, projected the narrowest ranges.

Now let’s wrap K/9, HR/9 and BB/9 into a statistic for overall pitching goodness, I’ll call ~ FIP.  I’d use FIP but not all of these systems projected IBB and HBP so I just left them out and calculated:

~ FIP = (HR*13+BB*3-K*2)/IP
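As a concrete check, here is that formula as a small Python sketch, run on Marcel’s “Joe Average” line from the top of the post (25 IP, 6.84 K/9, 3.6 BB/9, 1.08 HR/9, which works out to 19 K, 10 BB and 3 HR).  The function names are just for illustration.

```python
def approx_fip(hr, bb, k, ip):
    """~FIP from counting stats: (HR*13 + BB*3 - K*2) / IP."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * bb - 2 * k) / ip


def approx_fip_from_rates(hr_per_9, bb_per_9, k_per_9):
    """Same statistic computed from per-nine-inning rates."""
    return (13 * hr_per_9 + 3 * bb_per_9 - 2 * k_per_9) / 9.0


joe_average = approx_fip(hr=3, bb=10, k=19, ip=25)
print(round(joe_average, 2))  # 1.24
print(round(approx_fip_from_rates(1.08, 3.6, 6.84), 2))  # 1.24
```

Note that the formula as written leaves out the additive constant that full FIP uses, so the raw number is not on the ERA scale.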

~ FIP Projections

System W L W% z score
Marcel 81 81 0.500
ZiPS 79 83 0.489 -0.29
Steamer 78 84 0.479 -0.54
PECOTA 77 85 0.478 -0.56
CAIRO 76 86 0.468 -0.83
Oliver 75 87 0.465 -0.88
Chone 72 90 0.442 -1.47

That’s right, none of these systems projected real ~FIP better than Marcel’s ~FIP.  In fact, if Marcel had simply made these ~FIP predictions its ERA predictions, only CAIRO and ZiPS would have beaten Marcel in predicting ERA.  Actually, in light of Tango’s recent blog entry, I should point out that Marcel would do better than three of these systems (and do better itself!) if it just relied on its BB and K predictions and calculated ERA = 5.4 + 3*(BB-K)/IP.
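To give a feel for that last formula, here is a small sketch applying it to Marcel’s “Joe Average” line from the top of the post (25 IP with 3.6 BB/9 and 6.84 K/9, or 10 BB and 19 K).  The function name is made up for the example.

```python
def bb_k_era(bb, k, ip):
    """Simple ERA estimate from walks and strikeouts: 5.4 + 3*(BB-K)/IP."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return 5.4 + 3.0 * (bb - k) / ip


joe_average_era = bb_k_era(bb=10, k=19, ip=25)
print(round(joe_average_era, 2))  # 4.32
```

That compares with the 4.50 ERA Marcel actually hands to Joe Average.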

For Steamer, I think the big lessons from all of this are:

  1. Use FB% and LD% to project HR rates.
  2. Don’t use a component ERA formula that doesn’t work as well as FIP.
  3. Just let Peter project saves.
  4. All of these systems have weaknesses.

With some much-needed tinkering we hope Steamer will be significantly improved for the 2010 season.  We plan to run this analysis again after the 2010 season to see if our changes paid off.  Along those lines, I am going to try to collect projections (ideally, with MLBAM IDs) before the season starts so that I’ll have the results right after the season ends.  If you have projections with MLBAM IDs that you want included in next year’s analysis, we’d be happy to oblige.

Comments (5)

  1. MP

    Can you do an analysis of the 2010 season — preferably including a composite analysis of the 2009-2010 seasons? Thanks!

  2. J. Cross

    We’ll have an analysis of the 2010 season up pretty soon (early March) and, hopefully, a composite analysis a little later on.

  3. MP

    Cool. Thanks. Would also be interested in comparing sites that project playing time, such as Rotoworld.
