
Evaluation of 2009 Pitcher Forecasts

Dash Davidson, Peter Rosenbloom and Jared Cross*

Is our projection system smarter than a monkey?

In order to answer this question for our projection system as well as several others, we compared the accuracy of each system’s 2009 pitcher projections to Marcel the Monkey’s projections.  We have previously examined hitter projections and you can see that study here.

We looked at a pool of 512 pitchers (all pitchers with a minimum of 10 IP and prior MLB stats or 25 IP and no prior MLB stats so long as 2 or more systems provided a projection).  When a system did not provide projections for some of these 512 players, it was assigned the Marcel projection for those players, so it could only beat Marcel (or lose to Marcel) on the projections it did make.  Of these 512 pitchers, 51 did not have prior major league stats and, as a result, all received the same projection from Marcel:

Joe Average, 25 IP, 4.50 ERA, 6.84 K/9, 3.6 BB/9, 1.08 HR/9, 1 W, 0 Sv

Oliver was the most comprehensive system with projections for 510 of the 512 players.  Steamer only projected 43% of the players (covering 60% of the innings) and Sporting News only projected 28% of the players (42% of the innings).  Cairo, Chone, Zips and Pecota all projected at least 95% of the innings.

To make these results more compelling, and to give a sense of whether the differences between these systems are meaningful, we’re presenting the success of each system relative to Marcel as a W-L record over a 162 game schedule.

ERA Projections

System W L W% z score
ZiPS 100 62 0.618 3.01
CAIRO 95 67 0.585 2.18
Chone 91 71 0.564 1.62
Steamer 90 72 0.558 1.47
PECOTA 88 74 0.541 1.05
Oliver 82 80 0.503 0.08
Marcel 81 81 0.500
Fantistics 80 82 0.492 -0.20
Sporting News 73 89 0.454 -1.18

This is a big win for ZiPS and a very strong showing for CAIRO.  Steamer and Pecota beat Marcel soundly and Oliver and Fantistics hold their own with the monkey.  Not a great year for Sporting News.

How should you interpret these standings?  The W-L records show you how well each system did when compared to the Marcel system (the baseline projection system).  How likely is it that ZiPS was really no better than Marcel and simply got really lucky this year?  Just as likely as it is that a 100-62 team was really just a .500 club that got lucky.  Not very likely.

For the statistically inclined, what I did was calculate a z-score for the hypothesis that each system was as good as Marcel and matched that up with a z-score for the hypothesis that a team is really a .500 team given a certain winning percentage.
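As a rough sketch of that matching step, the snippet below converts a z-score into the record of a team that is "really" .500.  It assumes each game is a coin flip, so wins are binomial with a standard deviation of sqrt(162 * 0.25); the function name `z_to_record` is made up for illustration, and this is only one reading of the method described above, though it does appear to reproduce the rows of the ERA table.

```python
import math

def z_to_record(z, games=162):
    """Convert a z-score into the W-L record of an equally lucky .500 team."""
    # Assumption: under the null hypothesis each game is a coin flip, so
    # wins have mean games/2 and standard deviation sqrt(games * 0.25).
    sd = math.sqrt(games * 0.25)
    wins = round(games / 2 + z * sd)
    return wins, games - wins

print(z_to_record(3.01))  # ZiPS ERA z-score from the table above -> (100, 62)
```

A z-score of 0 maps to 81-81, which is why Marcel sits at .500 in every table.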

None of the systems demonstrated a statistically significant ability to project the ERAs of the 51 players without major league stats.  Granted, 51 players may simply be too small a sample in which to expect significant results when predicting ERA, particularly for guys who didn’t throw that many innings.  This is why I am excited about Oliver and Pecota projecting several years forward in time.  These multi-year projections might prove useless but, without them, I think it will be difficult to discern which systems have the best grasp on minor league performances and scouting.

So, ZiPS wins big.  Does this mean that you should simply use ZiPS pitching projections and you’ll dominate your fantasy league?  ERA is, of course, only one of five pitching categories in most fantasy leagues, which also rank teams in strikeouts, wins, saves and WHIP.  Standings Gain Points (or SGPs) is a system used to determine how valuable a pitcher was across all five categories.  How good were these systems at projecting total fantasy value?

SGP Projections

System W L W% z score
Fantistics 87 75 0.536 0.92
Marcel 81 81 0.500
Steamer 81 81 0.498 -0.06
Sporting News 81 81 0.498 -0.06
PECOTA 73 89 0.453 -1.19
CAIRO 70 92 0.430 -1.79

Only Fantistics beats Marcel.  In fairness, only Fantistics and Sporting News are aimed primarily at the fantasy audience.  The PECOTA projections used here are from their weighted mean spreadsheet and not from their depth charts, which are better tailored to the fantasy game.  Oliver, Chone and ZiPS aren’t shown here since they didn’t predict saves, one of the components of SGPs.

IP Projections

System W L W% z score
Fantistics 96 66 0.595 2.41
Sporting News 93 69 0.572 1.82
Pecota Depth Charts 91 71 0.564 1.62
Community 81 81 0.500

Looking here only at the systems that try to predict playing time we see the main reason Fantistics is so successful at predicting fantasy value.  Just as it did for hitters, Fantistics excelled at projecting playing time for pitchers.  The “Community” line represents the fan projected playing time that Marcel relied on.

Rounding out the fantasy stats, let’s look at wins, saves and WHIP:

Win Projections

System W L W% z score
Fantistics 102 60 0.627 3.23
PECOTA 95 67 0.586 2.20
Sporting News 92 70 0.568 1.74
Steamer 91 71 0.564 1.64
CAIRO 84 78 0.518 0.45
Chone 82 80 0.504 0.11
Marcel 81 81 0.500
ZiPS 80 82 0.491 -0.22

Another big win for Fantistics, PECOTA looks very strong here as well.  Not Marcel’s finest hour.

Before looking at saves I offer the following disclaimer:  Steamer left projecting saves to the last minute (right before Dash and Peter and the rest of the Steamers were departing for Florida to tune up for high school baseball season).  So, to the extent that Steamer had a system for projecting saves it was “Hey, Peter, come up with some save numbers for the guys you think are closers.”  “Okay.”  How did he do?

Save Projections

System W L W% z score
Sporting News 99 63 0.610 2.80
Fantistics 98 64 0.602 2.60
PECOTA 92 70 0.571 1.80
Steamer 91 71 0.562 1.57
Marcel 81 81 0.500
CAIRO 71 91 0.437 -1.60

Not bad, Peter.  Not bad.  He didn’t keep up with the experts at Sporting News and Fantistics but he did beat Marcel.

WHIP Projections

System W L W% z score
ZiPS 98 64 0.605 2.66
Steamer 94 68 0.583 2.11
PECOTA 91 71 0.561 1.55
Chone 90 72 0.557 1.46
Oliver 89 73 0.552 1.32
Fantistics 85 77 0.527 0.68
CAIRO 83 79 0.511 0.29
Sporting News 82 80 0.507 0.18
Marcel 81 81 0.500

This impressive showing by ZiPS is no surprise after its dominance in ERA.  My understanding is that Marcel regresses hits for pitchers no more than any other counting stat.  This makes it an easy target when it comes to WHIP.

Now that we’ve covered the fantasy baseball categories, let’s try to figure out what makes ZiPS so good at projecting ERAs by looking at the components of ERA.

K/9 Projections

System W L W% z score
Oliver 92 70 0.571 1.79
PECOTA 92 70 0.568 1.73
Chone 89 73 0.552 1.33
CAIRO 85 77 0.525 0.64
Steamer 85 77 0.524 0.60
ZiPS 82 80 0.505 0.13
Marcel 81 81 0.500
Sporting News 78 84 0.481 -0.48
Fantistics 74 88 0.455 -1.16

Marcel divides this category nicely into two groups: formula based systems above and rotisserie experts below.  Oliver and PECOTA take this category and we’re no closer to figuring out what ZiPS is doing better than everyone else.

Strikeout rate projections provide a better basis on which to judge how well these systems evaluate minor league stats.  Comparing the systems that project a wider pool of players to each other we get:

System Correlation with actual K/9
Oliver 0.535
CAIRO 0.433
PECOTA 0.409
ZiPS 0.210

Oliver does thrive at applying MLEs to determine strikeout rates.

BB/9 Projections

System W L W% z score
Marcel 81 81 0.500 0.00
ZiPS 81 81 0.499 -0.01
Oliver 81 81 0.498 -0.06
Steamer 79 83 0.487 -0.34
PECOTA 76 86 0.468 -0.82
Chone 69 93 0.428 -1.83
Sporting News 68 94 0.422 -1.99
Fantistics 65 97 0.402 -2.50
CAIRO 32 130 0.197 -7.71

Two things jump out at us here.  First, none of the systems beat Marcel!  Okay, ZiPS and Oliver are essentially tied with Marcel but no one beats it.  Second, what the heck happened to CAIRO?  CAIRO excelled at projecting ERAs but how did it manage that with these walk projections?  I went back to the original CAIRO spreadsheet to make sure I hadn’t messed up the data somehow but I can’t find anything wrong with it.  Cairo simply had very extreme BB/9 projections.  Here are the distributions of BB/9 projections for Marcel and Cairo side by side:

BB/9 Percentile Marcel CAIRO
97.5% 2.24 BB/9 1.15 BB/9
90% 2.74 1.77
75% 3.13 2.31
50% 3.47 2.88
25% 3.75 3.53
10% 4.07 3.98
2.5% 4.42 4.57

The good news for CAIRO?  This didn’t seem to affect their ERA projections as much as I would have expected.  According to the equation for FIP, every additional BB/9 should increase ERA by a third of a run.  And, controlling for K/9 and HR/9, an additional walk led to an increase of 0.31 to 0.49 in ERA for the other systems.  CAIRO ERAs only increased by 0.19 for every additional BB/9.

HR/9 Projections

System W L W% z score
PECOTA 97 65 0.600 2.54
CAIRO 96 66 0.590 2.28
Chone 95 67 0.587 2.21
ZiPS 94 68 0.580 2.04
Oliver 85 77 0.525 0.63
Marcel 81 81 0.500
Steamer 78 84 0.481 -0.48
Fantistics 57 105 0.352 -3.78

PECOTA, CAIRO, Chone and ZiPS all win big here.  Steamer used a Marcel-like system to project HR/9 but for 2010 we’ll be using FB% and LD% to project HR’s and, hopefully, this will be a major area of improvement for us.  HR/9 is likely an afterthought for Fantistics since it’s not typically a fantasy category.  Sporting News doesn’t even bother to project it.  Fantistics, along with ZiPS, projected HR/9 most aggressively (the broadest distribution) while Marcel and Steamer, limited by their ignorance of batted ball data, projected the narrowest ranges.

Now let’s wrap K/9, HR/9 and BB/9 into a statistic for overall pitching goodness, I’ll call ~ FIP.  I’d use FIP but not all of these systems projected IBB and HBP so I just left them out and calculated:

~ FIP = (HR*13+BB*3-K*2)/IP
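For concreteness, here is a small sketch of that formula applied to per-nine-inning rates (so IP is 9).  The function name `approx_fip_from_rates` is made up for illustration, and the example input is Marcel's "Joe Average" line from earlier in this post.

```python
def approx_fip_from_rates(hr9, bb9, k9):
    """~FIP = (HR*13 + BB*3 - K*2) / IP, computed from per-9 rates."""
    # Per-9 rates are counts over nine innings, so IP = 9 in the formula.
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9

# Joe Average: 1.08 HR/9, 3.6 BB/9, 6.84 K/9
joe_average = approx_fip_from_rates(hr9=1.08, bb9=3.6, k9=6.84)
print(round(joe_average, 2))  # 1.24
```

Note that the formula as written has no additive constant, so the result is not on the ERA scale; it only ranks and spaces pitchers.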

~ FIP Projections

System W L W% z score
Marcel 81 81 0.500
ZiPS 79 83 0.489 -0.29
Steamer 78 84 0.479 -0.54
PECOTA 77 85 0.478 -0.56
CAIRO 76 86 0.468 -0.83
Oliver 75 87 0.465 -0.88
Chone 72 90 0.442 -1.47

That’s right, none of these systems projected real ~FIP better than Marcel’s ~FIP.  In fact, if Marcel had simply made these ~FIP predictions its ERA predictions, only Cairo and ZiPS would have beaten Marcel in predicting ERA.  Actually, in light of Tango’s recent blog entry, I should point out that Marcel would do better than three of these systems (and do better itself!) if it just relied on its BB and K predictions and calculated ERA = 5.4 + 3*(BB-K)/IP.
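A quick sketch of that walks-and-strikeouts estimator, again using per-nine-inning rates so that IP is 9.  The function name `bb_k_era` is made up for illustration.

```python
def bb_k_era(bb9, k9):
    """ERA = 5.4 + 3*(BB - K)/IP, computed from per-9 rates."""
    # With per-9 rates, IP = 9.
    return 5.4 + 3 * (bb9 - k9) / 9

# Marcel's Joe Average line: 3.6 BB/9 and 6.84 K/9
print(round(bb_k_era(3.6, 6.84), 2))  # 4.32
```

For Joe Average this lands close to the 4.50 ERA Marcel assigns him.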

For Steamer, I think the big lessons from all of this are:

  1. Use FB% and LD% to project HR rates.
  2. Don’t use a component ERA formula that doesn’t work as well as FIP.
  3. Just let Peter project saves.
  4. All of these systems have weaknesses.

With some much needed tinkering we hope Steamer will be significantly improved for the 2010 season.  We plan to run this analysis again after the 2010 season to see if our changes paid off.  Along those lines, I am going to try to collect projections (ideally, with MLBAM IDs) before the season starts so that I’ll have the results right after the season ends.  If you have projections with MLBAM IDs that you want included in next year’s analysis we’d be happy to oblige.

Quantifying Carroll

Projecting playing time accurately is one of the keys to predicting fantasy baseball value and forecasting injury risk is a large part of that.  Rick Wilton recently wrote about, and Tango blogged about, the need for an actuarial table for injuries.

With these thoughts in mind and armed with Josh Hermsmeyer’s Injury Database I decided to evaluate the success of Will Carroll’s Team Health Reports.  Each year, Carroll puts out Team Health Reports prior to the season in which he assigns players green, yellow and red lights based on their level of injury risk.  Will’s THR’s are based on an actuarial table with 12 factors including position, age, team, body mass, injury history, recovery time, role or position change and conditioning.

So, should fantasy players stop at Carroll’s red lights?

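2009 Results:

[Table: Avg DL days and Avg DL stints for green, yellow and red light hitters and starting pitchers.  Values are missing from this copy.]
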
Yes.  For hitters, it looks like green and yellow light players are similarly risky but a red light player spends almost twice as much time on the disabled list.  For starting pitchers, only green lights look safe and by safe I mean “still projected to spend more time on the DL than the average hitter.”  Notice that not only are starters in each group far riskier than hitters in that same group but starters are also more likely to be red-lighted (34% of pitchers to 26% of hitters).

So, can Carroll’s expertise help us project playing time?

Marcel used community projected playing time (an average of fan projections).  I regressed actual 2009 plate appearances against the community’s projected playing time and a “Red Light?” variable that had a value of 1 or 0.

Real PA = 206 + 0.613*Community Forecast – 52.1*Red

According to the regression (with a sample of 252 hitters from 2009), you subtract 52 plate appearances (with a standard error of 22 PA) when you see a red light.  A larger data set would allow us to narrow down the size of the red light effect but with a p-value of 0.02, I’m fairly confident that there is one. Is this just because fans are hopelessly optimistic whereas an expert in projecting playing time is essentially taking all of this actuarial data into account?

In our forecast evaluations, Fantistics proved to be the expert on playing time.  Do red lights add information to their playing time projections?

Real PA = 132 + 0.652*Fantistics – 36.6*Red

From Fantistics, we subtract 37 PA for a red light (with a standard error of 21 PA and p = 0.09).  It looks like there might be room to combine Fantistics’ eerie ability to see into the minds of MLB managers with Carroll’s actuarial table and project playing time even more accurately.  (Note: Peter Rosenbloom has already been playing around with the 2010 THR’s in hopes of using them to improve the upcoming Steamer Projections.)

What about the playing time projections in Baseball Prospectus’s depth charts from last year? They’re already taking Carroll’s actuarial data into account, perhaps.

Real PA = 188 + 0.630*BP – 46.6*Red

From Baseball Prospectus’s depth charts we subtract 47 PA’s for a red light (with a standard error of 22 PA and p = 0.03).
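To make the three regressions concrete, here is a minimal sketch that applies each fitted equation to a hitter and approximates the p-value of the red light coefficient.  The coefficients and standard errors are copied from the text above; the 500 PA projection is a made-up input, the names `FITS`, `expected_pa` and `approx_p_value` are illustrative, and the p-value uses a normal approximation rather than whatever the original regression software reported.

```python
import math

# (intercept, slope on projected PA, red light coefficient, its standard error)
FITS = {
    "Community": (206, 0.613, -52.1, 22),
    "Fantistics": (132, 0.652, -36.6, 21),
    "BP depth charts": (188, 0.630, -46.6, 22),
}

def expected_pa(source, projected_pa, red_light):
    """Plug a playing time projection and a red light flag into the fit."""
    intercept, slope, red_coef, _ = FITS[source]
    return intercept + slope * projected_pa + red_coef * (1 if red_light else 0)

def approx_p_value(source):
    """Two-sided p-value for the red light term (normal approximation)."""
    _, _, red_coef, se = FITS[source]
    return math.erfc(abs(red_coef / se) / math.sqrt(2))

for source in FITS:
    # Hypothetical hitter projected for 500 PA by each source.
    plain = expected_pa(source, 500, red_light=False)
    red = expected_pa(source, 500, red_light=True)
    print(f"{source}: {plain:.0f} PA, {red:.0f} PA with a red light, "
          f"p ~ {approx_p_value(source):.2f}")
```

The approximate p-values come out near the 0.02, 0.09 and 0.03 quoted above; small differences are expected because the coefficients and standard errors here are rounded.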

Finally, I wanted to see whether red hitters underperformed their projections.  Although I wish I had a larger data set here, it looks like they did or, more precisely, yellow and green players outperformed their projections.

[Table: Projected OPS* and Actual OPS* by Injury Rating.  Values are missing from this copy.]

*weighted by 2009 PA


A red rating foretold an OPS loss of 22 points relative to green/yellow (with a standard error of 12 OPS points, p = .06).  Breaking this down by the number of 2009 DL stints instead of Carroll’s injury rating we see:

[Table: performance relative to projection by number of 2009 DL stints.  Values are missing from this copy.]

Clearly, we shouldn’t pay too much attention to the third and fourth rows of this table given the sample size but it seems worth noting that players who had one DL stint ended up 8 points below their projections, while players who were (serious) injury-free outperformed their projections by 20 points.  I’m assuming that the effect of DL time on performance has been studied before but I can’t find anything like that.  I could argue that players who are performing badly are more likely to be placed on the DL but that doesn’t explain the red light effect.  Does this effect hold for previous years? This might be worthy of further investigation.

Evaluation of 2009 Hitter Forecasts

Dash Davidson, Peter Rosenbloom and Jared Cross*

In creating Steamer Projections we analyzed how best to use historical statistics to predict future ones. We broke batting and pitching performance into a series of components and used regression analysis to find the most effective way to combine previous years’ performances and the league average (regression to the mean) to predict future performance.

Now that the season is over and we have concrete data on how our system has performed, we need to find out in what areas our projections were accurate, and in what areas we need to improve.  We intend to add park factors and age effects to our system for the 2010 season but are there other improvements we need to make?

The best way to figure this out is to compare and contrast the results of our system to those of other, “rival”, systems.  One method of comparison was already done for us.  For our analysis, we examined key rate statistics for both hitters and pitchers (OPS and ERA) and also key counting statistics for both of them: Wins, Ks, IP, HRs, RBIs and PAs.  We also decided to analyze our data from a purely fantasy baseball oriented standpoint, choosing the most prudent fantasy-oriented stat, SGPs (Standings Gain Points), and seeing which of the eight systems offered the best projections for fantasy domination, and how and why.

The forecasting systems:

Marcel –  The monkey.  Uses three years’ worth of data, weighting recent years more heavily, adjusting for age and regressing to the mean.  Marcel uses the same weights and age adjustments for each component.  Marcel projected rate stats and used community forecasts for playing time.

Steamer - Our system.  Steamer forecasts used the last 3 years’ worth of data for hitters and pitchers.  Our 2009 projections did not utilize park factors or minor league statistics.  Steamer is similar to Marcel except a) although we always weight recent seasons more heavily, we have different weights for each component and regress some components more heavily than others, and b) we did not take aging into account.  Also, Steamer only projected rate stats (this includes things like “stolen base attempts per times on base” and RBI/AB); we used Pecota’s depth chart projected playing time.

PECOTA - Pecota not only creates a projection for a player X, it retrospectively creates projections for the most X-like players in history and adjusts X’s projection based on how all of the X-like players performed relative to their would-be projections.  Fancy.  The projections we used were from the PECOTA weighted mean spreadsheet, not their fantasy baseball depth charts.

CHONE – CHONE uses 4 years of data for hitters and 3 years for pitchers.  It utilizes batted ball data (the numbers of line drives, pop ups etc.), minor league statistics, batter’s weight and adjusts for league, park and age.

ZiPS – Like CHONE, ZiPS uses 3 years of data for pitchers and 4 years of data for most hitters (3 years for players under 24 and over 38).  It also uses minor league statistics and park factors but has different aging curves for different player types.  Uses GB/FB and handedness to project pitcher’s BABIP.

Sporting News – Sporting News publishes a widely used fantasy baseball guide each year.  Although we don’t know this to be true, we suspect that their projections are created by an expert rather than a formula.  This is the un-Marcel.

Fantistics- We analyzed Fantistics on the advice of Eric Mulkowsky who said that this system was particularly good in projecting playing time.

If we run a similar analysis in future years we would include OLIVER, CAIRO, Baseball Info Solutions (Bill James) and possibly Baseball HQ and other projection systems in the comparison.

Missing data – 475 hitters had 50 or more PA in 2009.  465 of these hitters had projections from each of the big 3 (Chone, Pecota and ZiPS).  438 were projected by Marcel.  We looked at these 438 hitters.  Projection systems that projected fewer players (Steamer Projections and Sporting News were the main guilty parties) were given the Marcel projection for the players they skipped.  This allowed for a comparison of all 438 hitters across systems.  Sporting News and Steamer only projected about 270 players each.  Systems could beat the monkey so long as the projections they actually made were better than Marcel’s.
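The fallback rule is simple enough to sketch in a few lines.  The player IDs and OPS values below are made up for illustration, and `fill_with_marcel` is a hypothetical helper name, not something from our actual spreadsheets.

```python
def fill_with_marcel(system_proj, marcel_proj):
    """Project every Marcel player, using Marcel's number where the system has none."""
    return {player: system_proj.get(player, marcel_value)
            for player, marcel_value in marcel_proj.items()}

# Hypothetical OPS projections keyed by a hypothetical player ID.
marcel = {"player_a": 0.780, "player_b": 0.745, "player_c": 0.810}
partial_system = {"player_a": 0.802, "player_c": 0.795}  # skipped player_b

filled = fill_with_marcel(partial_system, marcel)
print(filled)
```

Because the skipped players carry Marcel's own numbers, they contribute identical errors to both systems and any difference comes from the players the system actually projected.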


We’ll explain the details below but we should point out that we relied heavily on the methods outlined in the following forecast evaluation studies:

Nate Silver’s analysis of his 2007 hitter projections

Tom Tango’s comments on Nate Silver’s work and the ensuing thread.

An article from March of 2009 that’s no longer available.

Let’s start with hitters and our measure of fantasy baseball goodness:

Hitter SGPs

System Avg SGP Stdev SGP R with actual RMSE* Uniqueness (R with Avg)

2008 4.19 3.48 0.653 2.67
Marcel 4.64 2.31 0.697 2.52 0.961
Steamer 4.90 2.53 0.696 2.53 0.961
PECOTA 4.51 2.87 0.657 2.65 0.949
Chone 5.33 2.32 0.687 2.56 0.934
ZiPS 5.17 2.77 0.688 2.55 0.945
Sporting News 5.23 3.11 0.707 2.49 0.963
Fantistics 5.42 3.24 0.723 2.43 0.950
Avg Projection 5.02 2.60 0.729 2.41
Actual (2009) 4.25 3.52

We looked at root mean square error (RMSE) in addition to correlation (R) because correlation mostly tells us whether a system has players projected in the correct order.  A system could have wildly over or under estimated the variance in true performance levels and still be well correlated with actual results.  Such a system would have a high RMSE, however.  Before finding the root mean square errors, all of the projections were “normalized”, meaning that each system’s projections were multiplied by the ratio of the average actual SGPs to the average predicted SGPs so that the normalized systems all projected an average of 4.25 SGPs.  We did this so that systems weren’t overly punished for missing league offensive levels or being optimistic/pessimistic.  (We do the same normalization later on when looking at the RMSE in OPS.)
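Here is a minimal sketch of the two measures, using made-up numbers rather than any system's real projections.  The "too wide" system below has every player in the right order, so its correlation is perfect, but it doubles the true spread and pays for that in RMSE even after normalization.  The function names are illustrative.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    """Plain (unweighted) Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def normalized_rmse(projected, actual):
    """Rescale projections to the actual average, then take the RMSE."""
    scale = mean(actual) / mean(projected)
    return math.sqrt(mean([(p * scale - a) ** 2
                           for p, a in zip(projected, actual)]))

# Made-up SGP values for five players.
actual = [1, 2, 3, 4, 5]
too_wide = [-1, 1, 3, 5, 7]  # right order, twice the spread

print(round(pearson_r(too_wide, actual), 3))        # 1.0
print(round(normalized_rmse(too_wide, actual), 3))  # 1.414
```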

Since Marcel is smarter than most monkeys we know, we added a considerably dumber monkey, “2008”, who expects every player to perform at exactly the same level he did the year before.  This dumber monkey indeed struggled while Marcel finished in the middle of the pack.

Also worth noting, each of the projection systems has a smaller standard deviation across their projections than the standard deviation of actual results from 2008 or 2009.  This is as it should be.  The projection systems are trying to forecast true talent whereas the variance in actual results is a combination of the variance in true talent and the variance in luck.

You’ve probably also noticed that Fantistics beat the pants off the other systems with Sporting News finishing 2nd.  Perhaps it shouldn’t be surprising that projection systems aimed exclusively at winning fantasy baseball leagues did the best at projecting fantasy baseball value.

And, if you want the best linear equation of these systems for projecting actual 2009 SGPs:

Actual = 0.527*Fantistics + 0.409*Chone + 0.243*SportingNews – 0.761            (R2 = 0.547)

While Sporting News projected SGP’s better than Chone, Chone added more information to Fantistics because Chone was the most unique system while Sporting News was the least unique system.

What systems were the most similar?  Chone was the most similar to Zips (R = 0.935).  Steamer was most similar to Marcel (R = 0.954).  Pecota wasn’t that similar to any system but closest to Chone (0.902) and Zips (0.903).  Surprisingly, Fantistics was most similar to Steamer (0.916) and Sporting News (0.913) and Sporting News was most similar to Marcel (0.948).  Most dissimilar systems?  Chone and Fantistics (0.856).

How robust was this result?  Could Fantistics just have gotten lucky?  One quick and dirty way to analyze this is to split the data into halves and see whether we would have come to the same conclusions looking at either half.  We assigned a random number to each player and split our 438 players into two subsets of 219 players based on their random number.

System R (subset 1) R (subset 2)
2008 0.635 0.670
Marcel 0.668 0.726
Steamer 0.680 0.712
PECOTA 0.639 0.675
Chone 0.679 0.695
ZiPS 0.672 0.705
Sporting News 0.694 0.721
Fantistics 0.708 0.738
Avg Projection 0.711 0.748

We could certainly come to different conclusions about the relative quality of Marcel, Steamer and Sporting News based on which half we look at.  Neither our 2008 Monkey nor Pecota looks good in either half and Fantistics wins both halves by a solid margin and for both halves we’d be best off taking the average of the projections.
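The split itself can be sketched as follows.  The seed and the placeholder hitter names are arbitrary, so this will not reproduce our particular halves; it only shows the mechanics of assigning a random number to each player and cutting the sorted list in two.

```python
import random

def split_in_half(players, seed=0):
    """Randomly assign players to two equal-size subsets."""
    rng = random.Random(seed)
    # Give each player a random number and order the pool by it.
    shuffled = sorted(players, key=lambda _: rng.random())
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Placeholder names standing in for the 438 hitters.
hitters = [f"hitter_{i}" for i in range(438)]
subset_1, subset_2 = split_in_half(hitters)
print(len(subset_1), len(subset_2))  # 219 219
```

Each system's correlation with actual results is then computed separately within each subset, and a ranking that holds in both halves is less likely to be a fluke.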

In order to see why Fantistics was successful we need to look at how well each system did in projecting several other metrics.

Hitter OPS

System Avg OPS Stdev OPS R with actual RMSE* Uniqueness (R with avg)
2008 0.763 0.183 0.401 1.84
Marcel 0.775 0.066 0.590 1.62 0.965
Steamer 0.780 0.066 0.567 1.66 0.955
PECOTA 0.763 0.084 0.564 1.66 0.926
Chone 0.770 0.075 0.638 1.55 0.954
ZiPS 0.770 0.083 0.623 1.57 0.964
Sporting News 0.781 0.074 0.568 1.65 0.945
Fantistics 0.787 0.086 0.583 1.63 0.925
Avg Projection 0.775 0.071 0.624 1.57
Actual (2009) 0.769 0.116 -

Note, these numbers (with the exception of stdev) are all weighted by 2009 PA.
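For readers wondering what weighting by PA does to a correlation, here is a small sketch with made-up numbers.  Each hitter counts in proportion to his plate appearances, so a part-timer with a fluky OPS in 60 PA barely moves the result.  The function names and the data are illustrative assumptions, not our actual calculation.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_r(projected, actual, weights):
    """Pearson correlation with each player weighted (e.g. by 2009 PA)."""
    mp = weighted_mean(projected, weights)
    ma = weighted_mean(actual, weights)
    cov = sum(w * (p - mp) * (a - ma)
              for p, a, w in zip(projected, actual, weights))
    var_p = sum(w * (p - mp) ** 2 for p, w in zip(projected, weights))
    var_a = sum(w * (a - ma) ** 2 for a, w in zip(actual, weights))
    return cov / math.sqrt(var_p * var_a)

# Made-up projected OPS, actual OPS and PA for four hitters.
projected = [0.700, 0.750, 0.800, 0.850]
actual = [0.690, 0.900, 0.780, 0.860]   # second hitter had a fluky 60 PA
pa = [600, 60, 550, 500]

by_pa = weighted_r(projected, actual, pa)
unweighted = weighted_r(projected, actual, [1, 1, 1, 1])
print(by_pa > unweighted)  # True
```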

While SGP’s may be the most meaningful metric for fantasy players, OPS is likely seen as more important by sabermetricians.  And, here, Chone dominates with ZiPS coming in 2nd.  Even more impressively, Chone actually beats the average projection.  Fantistics and PECOTA get points for being the most unique systems.  Steamer doesn’t do that well here, beating only PECOTA and there only by a nose.  We have some serious work to do for the 2010 version.  All systems badly beat our 2008 Monkey but only Chone and ZiPS beat Tango’s monkey.

Chone and Zips are similar to each other here (R = 0.956) which makes sense given their methodologies.  The simple systems, Marcel and Steamer, are also similar (0.970).  Fantistics and Sporting News are actually both most similar to Marcel (0.883 and 0.937, respectively).

And, if you want the best equation to project 2009 OPS:

ActualOPS = 0.716*Chone + 0.199*ZiPS + 0.081        (R2 = 0.414)

Although this equation doesn’t do much better than simply using Chone.

Looking at the same two subsets we used for SGPs we get:

System R (subset 1) R (subset 2)
2008 0.455 0.354
Marcel 0.607 0.571
Steamer 0.569 0.565
PECOTA 0.581 0.550
Chone 0.668 0.604
ZiPS 0.640 0.605
Sporting News 0.589 0.543
Fantistics 0.636 0.533
Avg Projection 0.642 0.602

I think this suggests that the difference between Chone and Zips might not be significant given our sample size.  Chone and Zips are the top 2 systems for both subsets and beat Pecota by a healthy margin for both pools of players.  I would feel reasonably confident in saying that Chone and ZiPS are ahead of the pack right now but not at all confident in saying that Chone is better than ZiPS.

So, if Fantistics was only in the middle of the pack in projecting OPS, how did it dominate SGP’s?

Hitter PA

System Avg PA Stdev PA R w/ actual
2008 367 215 0.642
«Marcel» 413 130 0.657
«Steamer» 419 133 0.666
PECOTA 416 150 0.565
Chone 486 95 0.580
ZiPS 457 135 0.583
Sporting News 442 158 0.694
Fantistics 448 181 0.721
Avg Projection 440 123 0.732
Actual (2009) 384 203

Ok, so Chone and ZiPS aren’t really trying to project playing time and, despite their excellence in projecting hitter quality (as evidenced by OPS), don’t do well here.  Pecota doesn’t try to project playing time in their weighted mean forecasts but does in their depth charts (used by Steamer).  The community forecasts that Marcel used do reasonably well, but not as well as the fantasy baseball gurus in projecting playing time.  Limiting this to the systems that try to project playing time (and using their proper names this time) we have:

System R with actual
Community Forecasts 0.657
Pecota Depth Charts for Fantasy 0.666
Sporting News 0.694
Fantistics 0.721

Fantistics does really excel at projecting playing time.  One advantage they present is that they update their projections throughout the offseason and these playing time projections are from immediately prior to the start of the season.  Looking again at the two subsets:

System R Subset 1 R Subset 2
Community Forecasts 0.629 0.683
Pecota Depth Charts 0.662 0.671
Sporting News 0.709 0.683
Fantistics 0.717 0.727

Fantistics wins both subsets.  They look like the authority on playing time.

In our quest to win our fantasy baseball leagues, we also need to project run production (R and RBI) and stolen bases.  Let’s look at how each system did at projecting these, independent of playing time.

(R + RBI)/PA

System Avg Stdev R w/ actual
2008 0.244 0.085 0.410
Marcel 0.246 0.030 0.606
Steamer 0.254 0.032 0.587
PECOTA 0.245 0.040 0.603
Chone 0.251 0.036 0.642
ZiPS 0.250 0.041 0.634
Sporting News 0.251 0.037 0.548
Fantistics 0.253 0.043 0.553
Avg Projection 0.250 0.034 0.645
Actual (2009) 0.241 0.051

Avg and R are weighted by 2009 PA

This is a bit surprising.  It looks like Fantistics was successful in projecting SGPs in spite of missing on Runs and RBIs.  Chone and Zips show up on top here so perhaps successfully projecting R and RBI might hinge on successfully projecting hitter quality as much as anything else.  Anyway, chalk up another win for Chone.

Looking at the two subsets:

System R (subset 1) R (subset 2)
2008 0.444 0.381
Marcel 0.621 0.592
Steamer 0.590 0.584
PECOTA 0.621 0.586
Chone 0.649 0.636
ZiPS 0.645 0.624
Sporting News 0.564 0.537
Fantistics 0.590 0.514
Avg Projection 0.663 0.625

Chone and ZiPS finish one and two in both subsets.  Fantistics looked to have a few fluke values that are drastically affecting their overall results in this category and created their wildly different results across subsets.


System Avg Stdev R w/ actual
2008 0.0172 0.0216 0.774
Marcel 0.0167 0.0130 0.811
Steamer 0.0169 0.0132 0.809
PECOTA 0.0167 0.0160 0.837
Chone 0.0167 0.0155 0.848
ZiPS 0.0172 0.0176 0.842
Sporting News 0.0172 0.0165 0.825
Fantistics 0.0168 0.0179 0.795
Avg Projection 0.0170 0.0151 0.847
Actual (2009) 0.0163 0.0180

Avg and R are weighted by 2009 PA

Another win for Chone but this one is close with Chone, Zips, Pecota and Sporting News all in the same neighborhood.

Here the standard deviations point to something that Steamer is doing wrong.  Steamer, like Marcel, regresses to the league average, and gives everyone stolen bases.  Do we need to use speed scores?  Dash calculated 2008 speed scores (using Bill James’s 5 factor system) to see whether they would have added information to our SB projections but, given our small sample size, they didn’t make a statistically significant improvement.  We’ll be looking for ways to improve our SB projections for the upcoming season.  Will age adjustments do the trick?

Looking at the two subsets:

System R (subset 1) R (subset 2)
2008 0.802 0.748
Marcel 0.819 0.802
Steamer 0.829 0.789
PECOTA 0.851 0.823
Chone 0.831 0.859
ZiPS 0.839 0.842
Sporting News 0.846 0.803
Fantistics 0.814 0.775
Avg Projection 0.856 0.83

The two subsets look pretty different in this case leading us to believe that our one year sample might not be enough to confidently draw conclusions about which systems are the best at projecting stolen bases.  We might be able to distinguish the top 4 from the rest.


We have put some effort into improving our HR predictions for the upcoming season and Greg Rybarczyk was even kind enough to send us hittracker data which we haven’t figured out how to utilize to our advantage yet.  Anyway, we wanted to take a look at how we did projecting HR/AB.

System Avg Stdev R w/ actual
2008 0.0301 0.0238 0.621
Marcel 0.0313 0.0122 0.717
Steamer 0.0319 0.0126 0.713
PECOTA 0.0307 0.0149 0.733
Chone 0.0310 0.0150 0.752
ZiPS 0.0313 0.0159 0.749
Sporting News 0.0319 0.0140 0.720
Fantistics 0.0330 0.0170 0.730
Avg Projection 0.0316 0.0139 0.755
Actual (2009) 0.0320 0.0189

Avg and R are weighted by 2009 PA

Chone and Zips win again here with Chone edging out ZiPS.

It makes sense, perhaps, that Marcel would have a low standard deviation in HR rates since it regresses all components equally but why does Steamer have a low standard deviation in projected HR’s?  Are we not being aggressive enough in our HR projections?

One way to analyze this is to graph actual HR/AB v. predicted HR/AB for each system and look at the slope of each line.  The slope should, in theory, be 1.

System Slope
Marcel 1.023
Steamer 0.988
PECOTA 0.887
Chone 0.909
ZiPS 0.853
Sporting News 0.894
Fantistics 0.800

Actually, by the look of this data, Steamer is being just as aggressive as it should be.  One way to interpret this data is that when Fantistics projects a player to have 50% more HR’s than a peer, he turns out to only have 0.800*50% = 40% more.  Fantistics may be overly aggressive in projecting HR’s.  Still, a weak performance by Steamer projecting individual HR/AB which suggests that we need to regress to a height/weight mean instead of a league mean in our projections.
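The slope check can be sketched with an ordinary least squares fit of actual on predicted rates.  The three data points below are made up so that the projections are spread 25% too wide, and the fit here is unweighted, which is an assumption; the helper name `ols_slope` is illustrative.

```python
def ols_slope(predicted, actual):
    """Slope of the least squares line of actual on predicted."""
    mx = sum(predicted) / len(predicted)
    my = sum(actual) / len(actual)
    cov = sum((x - mx) * (y - my) for x, y in zip(predicted, actual))
    var = sum((x - mx) ** 2 for x in predicted)
    return cov / var

# Made-up HR/AB rates: projections range twice as far from the mean
# as a slope of 1 would allow for these actual results.
predicted = [0.010, 0.030, 0.050]
actual = [0.014, 0.030, 0.046]

print(round(ols_slope(predicted, actual), 3))  # 0.8
```

A slope below 1 means the gaps between players were projected too large, and a slope above 1 means the projections were too timid.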

Also, we have to take our victories where we can get them and Steamer edged out Sporting News (you’d need to see the next decimal place but trust me on this one) in projecting the overall HR rate for this group of players.

Looking at the two subsets:

System R (subset 1) R (subset 2)
2008 0.641 0.600
Marcel 0.743 0.687
Steamer 0.734 0.689
PECOTA 0.751 0.714
Chone 0.778 0.724
ZiPS 0.780 0.715
Sporting News 0.760 0.677
Fantistics 0.747 0.712
Avg Projection 0.778 0.729

ZiPS wins one and Chone wins one but it’s crowded at the top and the evidence might not be compelling enough to say that ZiPS and Chone are better at projecting HR’s than Pecota or Fantistics.


It’s hard to know exactly what to take from this.  Chone and Zips seem to stand out in projecting hitter quality and they have somewhat similar methodologies which gives some hints about how to make good forecasts.  Fantistics succeeds in projecting SGP’s best based largely on its success in projecting playing time which suggests, perhaps, that other systems haven’t put enough thought into how best to project playing time.

It’s worth noting, also, that for the fantasy player, not all playing time is created equal.  If Jose Reyes and a #6 hitter are both projected for the same number of plate appearances and the same number of SGP’s, you’re probably better off taking Reyes.  When he’s playing, he’s getting more plate appearances and when he’s not, he’s on the DL and you can play someone who, although they might project to zero SGP’s over replacement, is better than an empty slot.

Up next: Pitchers