## Evaluation of 2009 Pitcher Forecasts

Dash Davidson, Peter Rosenbloom and Jared Cross*

Is our projection system smarter than a monkey?

To answer this question for our projection system, as well as several others, we compared the accuracy of each system’s 2009 pitcher projections to Marcel the Monkey’s projections. We have previously examined hitter projections and you can see that study here.

We looked at a pool of 512 pitchers (all pitchers with a minimum of 10 IP and prior MLB stats, or 25 IP and no prior MLB stats, so long as 2 or more systems provided a projection). When a system did not provide projections for some of these 512 players, it was assigned the Marcel projection for those players, so it could only beat Marcel (or lose to Marcel) on the projections it did make. Of these 512 pitchers, 51 did not have prior major league stats and, as a result, all received the same projection from Marcel:

**Joe Average, 25 IP, 4.50 ERA, 6.84 K/9, 3.6 BB/9, 1.08 HR/9, 1 W, 0 Sv**
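The pool rule and the Marcel fallback described above can be sketched in a few lines of Python. This is only our reading of the rule, not the code used for the study: the function names and the dictionary layout are hypothetical, and we assume the "2 or more systems" condition applies to every pitcher.

```python
from typing import Dict

# Hypothetical layout: pitcher name -> {stat name -> projected value}
Projection = Dict[str, float]


def in_pool(ip: float, has_prior_mlb_stats: bool, n_systems_projecting: int) -> bool:
    """Pool rule as we read it: 10 IP with prior MLB stats, 25 IP without,
    and at least 2 systems projecting the pitcher (assumed to apply to all)."""
    min_ip = 10 if has_prior_mlb_stats else 25
    return ip >= min_ip and n_systems_projecting >= 2


def with_marcel_fallback(system: Dict[str, Projection],
                         marcel: Dict[str, Projection]) -> Dict[str, Projection]:
    """Keep the system's own projection where it exists, else use Marcel's."""
    return {pitcher: system.get(pitcher, marcel_proj)
            for pitcher, marcel_proj in marcel.items()}


# Joe Average's line from the text, used for a pitcher the system skipped.
joe_average = {"IP": 25, "ERA": 4.50, "K/9": 6.84, "BB/9": 3.6, "HR/9": 1.08}
marcel = {"Rookie A": joe_average, "Veteran B": {"IP": 180, "ERA": 3.90}}
system = {"Veteran B": {"IP": 200, "ERA": 3.60}}

filled = with_marcel_fallback(system, marcel)
print(filled["Rookie A"] is joe_average)  # True: the system ties Marcel here
```

Because a skipped pitcher gets exactly Marcel's numbers, he contributes nothing to the comparison either way, which is why a system with thin coverage can only win or lose on the players it actually projected.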

Oliver was the most comprehensive system, with projections for 510 of the 512 players. Steamer only projected 43% of the players (covering 60% of the innings) and Sporting News only projected 28% of the players (42% of the innings). CAIRO, Chone, ZiPS and PECOTA all projected at least 95% of the innings.

To make these results more compelling, and to give a sense of whether the differences between these systems are meaningful, we’re presenting the success of each system relative to Marcel as a W-L record over a 162-game schedule.

**ERA Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| ZiPS | 100 | 62 | 0.618 | 3.01 |
| CAIRO | 95 | 67 | 0.585 | 2.18 |
| Chone | 91 | 71 | 0.564 | 1.62 |
| Steamer | 90 | 72 | 0.558 | 1.47 |
| PECOTA | 88 | 74 | 0.541 | 1.05 |
| Oliver | 82 | 80 | 0.503 | 0.08 |
| Marcel | 81 | 81 | 0.500 | — |
| Fantistics | 80 | 82 | 0.492 | -0.20 |
| Sporting News | 73 | 89 | 0.454 | -1.18 |

This is a big win for ZiPS and a very strong showing for CAIRO. Steamer and PECOTA beat Marcel soundly and Oliver and Fantistics hold their own with the monkey. Not a great year for Sporting News.

How should you interpret these standings? The W-L records show you how well each system did when compared to the Marcel system (the baseline projection system). How likely is it that ZiPS was really no better than Marcel and simply got really lucky this year? Just as likely as it is that a 100-62 team was really just a .500 club that got lucky. Not very likely.

For the statistically inclined, what I did was calculate a z-score for the hypothesis that each system was exactly as good as Marcel, and then match that up with the winning percentage that gives the same z-score for the hypothesis that a team is really a .500 team over 162 games.
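The second half of that mapping, from a z-score to a 162-game record, can be sketched as below. This assumes the usual binomial standard error for a true .500 team, which reproduces the table rows we checked; how each system's z-score against Marcel was computed in the first place is not shown here. The function name is ours.

```python
import math

GAMES = 162


def z_to_record(z: float, games: int = GAMES):
    """Convert a z-score into the W-L record of a team whose winning
    percentage sits z standard errors away from .500."""
    # Standard error of winning percentage for a true .500 team:
    # sqrt(0.5 * 0.5 / games)
    se = 0.5 / math.sqrt(games)
    win_pct = 0.5 + z * se
    wins = round(win_pct * games)
    return wins, games - wins, round(win_pct, 3)


print(z_to_record(3.01))   # ZiPS, ERA table    -> (100, 62, 0.618)
print(z_to_record(-7.71))  # CAIRO, BB/9 table  -> (32, 130, 0.197)
```

One standard error is about 0.039 in winning percentage, or a little over six games, so a z-score of 2 corresponds to roughly a 94-68 record.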

None of the systems demonstrated a statistically significant ability to project the ERAs of the 51 players without major league stats. Granted, 51 players may simply be too small a sample in which to expect significant results when predicting ERA, particularly for guys who didn’t throw that many innings. This is why I am excited about Oliver and PECOTA projecting several years forward in time. These multi-year projections might prove useless but, without them, I think it will be difficult to discern which systems have the best grasp on minor league performances and scouting.

So, ZiPS wins big. Does this mean that you should simply use ZiPS pitching projections and you’ll dominate your fantasy league? ERA is, of course, only one of five categories in most fantasy leagues, which also rank teams in strikeouts, wins, saves and WHIP. Standings Gain Points (or SGPs) is a system used to determine how valuable a pitcher was across all five categories. How good were these systems at projecting total fantasy value?

**SGP Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Fantistics | 87 | 75 | 0.536 | 0.92 |
| Marcel | 81 | 81 | 0.500 | — |
| Steamer | 81 | 81 | 0.498 | -0.06 |
| Sporting News | 81 | 81 | 0.498 | -0.06 |
| PECOTA | 73 | 89 | 0.453 | -1.19 |
| CAIRO | 70 | 92 | 0.430 | -1.79 |

Only Fantistics beats Marcel. In fairness, only Fantistics and Sporting News are aimed primarily at the fantasy audience. The PECOTA projections used here are from their weighted mean spreadsheet and not from their depth charts, which are better tailored to the fantasy game. Oliver, Chone and ZiPS aren’t shown here since they didn’t predict saves, one of the components of SGPs.

**IP Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Fantistics | 96 | 66 | 0.595 | 2.41 |
| Sporting News | 93 | 69 | 0.572 | 1.82 |
| Pecota Depth Charts | 91 | 71 | 0.564 | 1.62 |
| Community | 81 | 81 | 0.500 | — |

Looking here only at the systems that **try** to predict playing time, we see the main reason Fantistics is so successful at predicting fantasy value. Just as it did for hitters, Fantistics excelled at projecting playing time for pitchers. The “Community” line represents the fan-projected playing time that Marcel relied on.

Rounding out the fantasy stats, let’s look at wins, saves and WHIP.

**Win Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Fantistics | 102 | 60 | 0.627 | 3.23 |
| PECOTA | 95 | 67 | 0.586 | 2.20 |
| Sporting News | 92 | 70 | 0.568 | 1.74 |
| Steamer | 91 | 71 | 0.564 | 1.64 |
| CAIRO | 84 | 78 | 0.518 | 0.45 |
| Chone | 82 | 80 | 0.504 | 0.11 |
| Marcel | 81 | 81 | 0.500 | — |
| ZiPS | 80 | 82 | 0.491 | -0.22 |

Another big win for Fantistics, and PECOTA looks very strong here as well. Not Marcel’s finest hour.

Before looking at saves I offer the following disclaimer: Steamer left projecting saves to the last minute (right before Dash and Peter and the rest of the Steamers were departing for Florida to tune up for high school baseball season). So, to the extent that Steamer had a system for projecting saves it was “Hey, Peter, come up with some save numbers for the guys you think are closers.” “Okay.” How did he do?

**Save Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Sporting News | 99 | 63 | 0.610 | 2.80 |
| Fantistics | 98 | 64 | 0.602 | 2.60 |
| PECOTA | 92 | 70 | 0.571 | 1.80 |
| Steamer | 91 | 71 | 0.562 | 1.57 |
| Marcel | 81 | 81 | 0.500 | — |
| CAIRO | 71 | 91 | 0.437 | -1.60 |

Not bad, Peter. Not bad. He didn’t keep up with the experts at Sporting News and Fantistics but he did beat Marcel.

**WHIP Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| ZiPS | 98 | 64 | 0.605 | 2.66 |
| Steamer | 94 | 68 | 0.583 | 2.11 |
| PECOTA | 91 | 71 | 0.561 | 1.55 |
| Chone | 90 | 72 | 0.557 | 1.46 |
| Oliver | 89 | 73 | 0.552 | 1.32 |
| Fantistics | 85 | 77 | 0.527 | 0.68 |
| CAIRO | 83 | 79 | 0.511 | 0.29 |
| Sporting News | 82 | 80 | 0.507 | 0.18 |
| Marcel | 81 | 81 | 0.500 | — |

This impressive showing by ZiPS is no surprise after its dominance in ERA. My understanding is that Marcel regresses hits for pitchers no more than any other counting stat. This makes it an easy target when it comes to WHIP.

Now that we’ve covered the fantasy baseball categories, let’s try to figure out what makes ZiPS so good at projecting ERAs by looking at the components of ERA.

**K/9 Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Oliver | 92 | 70 | 0.571 | 1.79 |
| PECOTA | 92 | 70 | 0.568 | 1.73 |
| Chone | 89 | 73 | 0.552 | 1.33 |
| CAIRO | 85 | 77 | 0.525 | 0.64 |
| Steamer | 85 | 77 | 0.524 | 0.60 |
| ZiPS | 82 | 80 | 0.505 | 0.13 |
| Marcel | 81 | 81 | 0.500 | — |
| Sporting News | 78 | 84 | 0.481 | -0.48 |
| Fantistics | 74 | 88 | 0.455 | -1.16 |

Marcel divides this category nicely into two groups: formula-based systems above and rotisserie experts below. Oliver and PECOTA take this category and we’re no closer to figuring out what ZiPS is doing better than everyone else.

Strikeout rate projections provide a better basis on which to judge how well these systems evaluate minor league stats. Comparing the systems that project a wider pool of players to each other we get:

| System | Correlation with actual K/9 |
|---|---|
| Oliver | 0.535 |
| CAIRO | 0.433 |
| PECOTA | 0.409 |
| ZiPS | 0.210 |

Oliver does thrive at applying MLEs to determine strikeout rates.
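For readers who want to run the same kind of comparison on their own data, here is a minimal sketch of a correlation between projected and actual K/9. It assumes the plain Pearson coefficient, which the numbers above may or may not be, and the K/9 values are made up purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up projected and actual K/9 for four pitchers (not study data).
projected_k9 = [6.5, 7.0, 8.2, 9.1]
actual_k9 = [5.9, 7.8, 7.5, 9.6]
print(round(pearson_r(projected_k9, actual_k9), 3))
```

A correlation rewards getting the ordering and spread of pitchers right, so a system can score well here even if its projections are all shifted a little too high or too low.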

**BB/9 Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Marcel | 81 | 81 | 0.500 | 0.00 |
| ZiPS | 81 | 81 | 0.499 | -0.01 |
| Oliver | 81 | 81 | 0.498 | -0.06 |
| Steamer | 79 | 83 | 0.487 | -0.34 |
| PECOTA | 76 | 86 | 0.468 | -0.82 |
| Chone | 69 | 93 | 0.428 | -1.83 |
| Sporting News | 68 | 94 | 0.422 | -1.99 |
| Fantistics | 65 | 97 | 0.402 | -2.50 |
| CAIRO | 32 | 130 | 0.197 | -7.71 |

Two things jump out at us here. First, none of the systems beat Marcel! Okay, ZiPS and Oliver are essentially tied with Marcel, but no one beats it. Second, what the heck happened to CAIRO? CAIRO excelled at projecting ERAs, but how did it manage that with these walk projections? I went back to the original CAIRO spreadsheet to make sure I hadn’t messed up the data somehow, but I can’t find anything wrong with it. CAIRO simply had very extreme BB/9 projections. Here are the distributions of BB/9 projections for Marcel and CAIRO side by side:

| BB/9 Percentile | Marcel (BB/9) | CAIRO (BB/9) |
|---|---|---|
| 97.5% | 2.24 | 1.15 |
| 90% | 2.74 | 1.77 |
| 75% | 3.13 | 2.31 |
| 50% | 3.47 | 2.88 |
| 25% | 3.75 | 3.53 |
| 10% | 4.07 | 3.98 |
| 2.5% | 4.42 | 4.57 |

The good news for CAIRO? This didn’t seem to affect their ERA projections as much as I would have expected. According to the equation for FIP, every additional BB/9 should increase ERA by a third of a run. And, controlling for K/9 and HR/9, an additional walk led to an increase of 0.31 to 0.49 in ERA for the other systems. CAIRO ERAs only increased by 0.19 for every additional BB/9.
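The "third of a run" figure falls straight out of FIP's walk weight of 3: spread over nine innings, one extra walk per nine is worth 3/9 of a run. A quick check, using a helper of our own that takes per-nine rates and ignores HBP and FIP's league constant (neither changes the slope with respect to walks):

```python
def fip_core_per9(k9: float, bb9: float, hr9: float) -> float:
    """Component part of FIP written in per-nine rates:
    (13*HR/9 + 3*BB/9 - 2*K/9) / 9, with no league constant added."""
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9


# Joe Average's rates from earlier in the post, then one extra BB/9.
base = fip_core_per9(k9=6.84, bb9=3.6, hr9=1.08)
one_more_walk = fip_core_per9(k9=6.84, bb9=4.6, hr9=1.08)
print(round(one_more_walk - base, 3))  # 0.333
```

The same arithmetic gives about 1.44 runs per additional HR/9 and about -0.22 per additional K/9.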

**HR/9 Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| PECOTA | 97 | 65 | 0.600 | 2.54 |
| CAIRO | 96 | 66 | 0.590 | 2.28 |
| Chone | 95 | 67 | 0.587 | 2.21 |
| ZiPS | 94 | 68 | 0.580 | 2.04 |
| Oliver | 85 | 77 | 0.525 | 0.63 |
| Marcel | 81 | 81 | 0.500 | — |
| Steamer | 78 | 84 | 0.481 | -0.48 |
| Fantistics | 57 | 105 | 0.352 | -3.78 |

PECOTA, CAIRO, Chone and ZiPS all win big here. Steamer used a Marcel-like system to project HR/9, but for 2010 we’ll be using FB% and LD% to project HRs and, hopefully, this will be a major area of improvement for us. HR/9 is likely an afterthought for Fantistics since it’s not typically a fantasy category. Sporting News doesn’t even bother to project it. Fantistics, along with ZiPS, projected HR/9 most aggressively (the broadest distribution) while Marcel and Steamer, limited by their ignorance of batted-ball data, projected the narrowest ranges.

Now let’s wrap K/9, HR/9 and BB/9 into a statistic for overall pitching goodness, which I’ll call ~ FIP. I’d use FIP, but not all of these systems projected IBB and HBP, so I just left them out and calculated:

~ FIP = (HR*13+BB*3-K*2)/IP
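As a worked example, here is that formula applied to Marcel's Joe Average line from the top of the post. The helper names are ours, and the counting stats are backed out of the per-nine rates over 25 IP.

```python
def tilde_fip(hr: float, bb: float, k: float, ip: float) -> float:
    """~FIP as defined above: (13*HR + 3*BB - 2*K) / IP."""
    return (13 * hr + 3 * bb - 2 * k) / ip


def per9_to_count(rate_per9: float, ip: float) -> float:
    """Turn a per-nine-innings rate into a counting stat over `ip` innings."""
    return rate_per9 * ip / 9


ip = 25
hr = per9_to_count(1.08, ip)  # about 3 home runs
bb = per9_to_count(3.6, ip)   # about 10 walks
k = per9_to_count(6.84, ip)   # about 19 strikeouts
print(round(tilde_fip(hr, bb, k, ip), 2))  # 1.24
```

The result is not on the ERA scale because the formula, as written, leaves off the league constant that full FIP adds. That is harmless for comparing systems on the same pool of pitchers, where the constant would shift every projection and every actual by the same amount.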

**~ FIP Projections**

| System | W | L | W% | z score |
|---|---|---|---|---|
| Marcel | 81 | 81 | 0.500 | — |
| ZiPS | 79 | 83 | 0.489 | -0.29 |
| Steamer | 78 | 84 | 0.479 | -0.54 |
| PECOTA | 77 | 85 | 0.478 | -0.56 |
| CAIRO | 76 | 86 | 0.468 | -0.83 |
| Oliver | 75 | 87 | 0.465 | -0.88 |
| Chone | 72 | 90 | 0.442 | -1.47 |

That’s right, none of these systems projected real ~FIP better than Marcel’s ~FIP. In fact, if Marcel had simply made these ~FIP predictions its ERA predictions, only CAIRO and ZiPS would have beaten Marcel in predicting ERA. Actually, in light of Tango’s recent blog entry, I should point out that Marcel would do better than three of these systems (and do better itself!) if it just relied on its BB and K predictions and calculated ERA = 5.4 + 3*(BB-K)/IP.
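To get a feel for that last formula, here it is applied to Joe Average, whose rates of 3.6 BB/9 and 6.84 K/9 over 25 IP work out to about 10 walks and 19 strikeouts. The function name is ours; the formula is the one quoted above.

```python
def k_bb_era(bb: float, k: float, ip: float) -> float:
    """ERA estimate from walks and strikeouts only: 5.4 + 3*(BB - K)/IP."""
    return 5.4 + 3 * (bb - k) / ip


# Joe Average: roughly 10 BB and 19 K in 25 IP.
print(round(k_bb_era(bb=10, k=19, ip=25), 2))  # 4.32
```

That lands a bit below the 4.50 ERA Marcel hands Joe Average directly, with no home run information used at all.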

For Steamer, I think the big lessons from all of this are:

- Use FB% and LD% to project HR rates.
- Don’t use a component ERA formula that doesn’t work as well as FIP.
- Just let Peter project saves.
- All of these systems have weaknesses.

With some much-needed tinkering, we hope Steamer will be significantly improved for the 2010 season. We plan to run this analysis again after the 2010 season to see if our changes paid off. Along those lines, I am going to try to collect projections (ideally, with MLBAM IDs) before the season starts so that I’ll have the results right after the season ends. If you have projections with MLBAM IDs that you want included in next year’s analysis, we’d be happy to oblige.