The Steamer Guide to Fantasy Draft Prep

Warning: What follows is a ridiculously Steamer-centric guide to your fantasy baseball preparation this year.

If you’re looking for a fantasy baseball provider that can customize rules to fit your league… and provide sortable Steamer projections, check out Rotovalue.

If you’re looking for a fantasy baseball cheatsheet based purely on Steamer projections, go to Last Player Picked’s Price Guide and select Steamer as your data source, or download one of Brian Jenner’s files from our downloads page and upload it to the Last Player Picked site.

For a cheatsheet based on Steamer/ZiPS rate stats, Fantistics/Rotochamp playing time projections and the Point Shares scoring system, go to Razzball.

Of course, you probably already know you can find Steamer on Fangraphs and that you can see how Steamer compares to other systems and get your hands on some nifty drafting software on the Rotochamp site.

Good luck!

Yu Darvish and Updated Projections

We’ve updated our playing time projections and posted new sheets. The new downloads include a number of players who were missing from our initial forecasts including Yu Darvish.

Yu Darvish 2012 Projections

System       IP   W   L   ERA   WHIP   K/9   BB/9   Value
RotoChamp   200  15   9  3.33   1.17   8.1    1.8     $15
ZiPS        194  13   7  3.62   1.25   7.8    2.1     $11
Fantasy411  196  13   8  3.56   1.18   8.1    2.3     $12
CAIRO       190  14   7  3.44   1.25   7.1    2.7     $12
Steamer     169  12   8  4.01   1.34   7.4    2.9      $9

It’s worth noting that Steamer initially projected an ERA of 4.11 for Darvish until we informed Steamer that Darvish’s fastball has averaged 95.4 mph in spring training. That lowered his ERA to 4.01.

Why are the other systems so much more optimistic? A 4.01 ERA while starting for Texas is an excellent performance, but are we underestimating his greatness?

Projecting Playing Time

As we get ready to revisit Steamer’s 2012 playing time projections, I thought I should take a look at which systems had the most success projecting playing time last year. Many thanks to Rudy Gamble of Razzball and Mike Spiher of Rotochamp who made this possible by supplying the data.

For all of the comparisons below, I adjusted each system’s projections so that the system’s average projection matched the group’s average actual playing time. That way, the overly optimistic systems weren’t penalized for wearing rose-colored glasses.
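Here is a minimal sketch of that adjustment. It assumes a multiplicative rescaling (the post doesn’t say whether the shift was multiplicative or additive), and the function name and sample numbers are hypothetical.

```python
def rescale_to_actual_mean(projected, actual):
    """Scale all of one system's projections by a single factor so that
    their mean equals the mean actual playing time of the same players.

    Multiplicative scaling is an assumption; an additive shift would
    equalize the means too."""
    if not projected or len(projected) != len(actual):
        raise ValueError("need equal-length, non-empty lists")
    proj_mean = sum(projected) / len(projected)
    actual_mean = sum(actual) / len(actual)
    factor = actual_mean / proj_mean
    return [p * factor for p in projected]


# Hypothetical system that is 10% too optimistic across the board.
projected_pa = [660, 550, 440]
actual_pa = [600, 500, 400]
adjusted = rescale_to_actual_mean(projected_pa, actual_pa)
print([round(pa, 1) for pa in adjusted])  # [600.0, 500.0, 400.0]
```

A single factor keeps each system’s ranking of players intact, so what remains in the error is how well it spread playing time across players, not how generous it was overall.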

RMSE in 2011 plate appearance projections:

The first two columns show the root mean square error (RMSE) in projections for the set of players who were projected by all of the systems. The column labelled “Starters” narrows this group further, to those players projected to have 400+ plate appearances by Fantasy411 (which is an average of 14 systems). I added this criterion in order to look at the group of players (230 of them) that fantasy players really care about.

The third and fourth columns include players who were only projected by some of the systems. Here systems without a projection for a given player were assigned a forecast of 0 PA for that player. As before, the Starters column only includes players expected to receive more than 400 PA by Fantasy411.
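A sketch of the RMSE comparison, assuming each system is stored as a dictionary mapping player to projected PA. As in the third and fourth columns, a player the system skipped counts as a 0 PA projection, and the hypothetical `consensus` dictionary stands in for the Fantasy411 numbers used to pick Starters. All names and numbers are made up, and the mean-matching adjustment is left out for brevity.

```python
import math


def rmse(projected, actual):
    """Root mean square error between paired projections and outcomes."""
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_with_missing(system, actual, consensus, starter_cutoff=None):
    """RMSE for one system over every player in `actual`.

    system:    dict player -> projected PA (players may be missing)
    actual:    dict player -> actual PA
    consensus: dict player -> consensus PA, used only for the Starters filter
    Players the system did not project are assigned 0 PA.
    """
    players = sorted(actual)
    if starter_cutoff is not None:
        # "400+" is read here as 400 or more consensus PA.
        players = [p for p in players if consensus.get(p, 0) >= starter_cutoff]
    projected = [system.get(p, 0) for p in players]
    return rmse(projected, [actual[p] for p in players])


actual = {"A": 600, "B": 450, "C": 200}
consensus = {"A": 620, "B": 420, "C": 250}
system = {"A": 600, "B": 450}  # this system never projected player C
print(round(rmse_with_missing(system, actual, consensus), 1))  # 115.5
print(rmse_with_missing(system, actual, consensus, starter_cutoff=400))  # 0.0
```

The zero-fill is what makes early or incomplete systems look bad in the wider player pool: a single missing regular costs the full amount of his playing time.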

The Community forecasts excelled across the board, only getting narrowly edged out by Rotochamp among Starters who were projected by all of the systems. I can’t really brag about Steamer’s success because these projections were done right before the start of the season with the terrific benefit of being able to make comparisons with all of the other forecasts and tweak accordingly. We could have simply used the Community forecasts but we thought we could do even better. We didn’t.

I’ve highlighted two systems, “Fantistics 03/05” and “Fangraphs 03/05”, because these projections came from earlier in the offseason: March 5th, to be exact. The Fangraphs fan projections take a big hit when the study is opened up to all players, since many players didn’t have Fangraphs projections at this point. However, if we limit our focus to those who were projected by all of the systems, the fans look very impressive. Given how much Fantistics improved between March 5th and March 22nd, I suspect that Fangraphs fan projections from closer to the start of the season may be comparable in accuracy to the Community projections.

Would it be possible to improve on the Community projections?

One possibility I explored was whether these projections would be improved by factoring in injury expert Will Carroll’s Traffic Lights. The following table shows the number of days spent on the disabled list in 2011 by players who were given green, yellow and red lights in his Team Health Reports:

Green players were certainly healthier than red players, but as it turns out, the Community knew better than to project high playing time for injury risks:

The players who were given the highest playing time projections by the fans are ALL green and yellow. Fans properly adjusted hitter projections for injury risk. Similarly, while older players got to the plate less often, this wasn’t lost on the fans, whose projections could not be improved by factoring in age.

Projecting Innings Pitched
Is there no such thing as a durable reliever?

Let’s start by looking at days on the disabled list by traffic light.

Now, let’s see which systems had success projecting reliever innings.

Notes: Here “relievers” are defined as players that Steamer expected to pitch in relief prior to the season since I didn’t want to give these systems the benefit of hindsight. The player pool is limited to those players who received a health report.

That’s right, the Traffic Lights did a better job of forecasting relief innings than any of the projection systems. Put another way, none of the fantasy experts did very well. Most of them did roughly as well as “Mean”, a system that projects the same number of innings for every relief pitcher in this group. Why is this?

Notes: In the interest of fairness (fairness in terms of degrees of freedom, that is) I insisted that the Traffic Lights make the same deduction, 7 innings as it turns out, from green to yellow as from yellow to red. The Lights project 58 IP for a green pitcher, 51 for a yellow and 44 for a red. Also, I put an asterisk next to the Rotochamp line because they were unduly hurt by missing projections for a number of these players.
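The equal-step constraint amounts to a two-parameter line fitted through a 0/1/2 coding of the lights. A minimal sketch, assuming ordinary least squares on actual IP (the post doesn’t spell out the fitting method); the “Mean” baseline is included for comparison, and the sample innings are made up.

```python
LIGHT_CODE = {"green": 0, "yellow": 1, "red": 2}


def fit_traffic_lights(lights, innings):
    """Least-squares fit of IP = a + b * k, with k = 0/1/2 for green/yellow/red.

    One common step b (negative when riskier lights pitch less) keeps the
    model at two parameters, the same green-to-yellow and yellow-to-red drop."""
    ks = [LIGHT_CODE[light] for light in lights]
    n = len(ks)
    k_mean = sum(ks) / n
    ip_mean = sum(innings) / n
    sxx = sum((k - k_mean) ** 2 for k in ks)
    if sxx == 0:
        raise ValueError("need at least two different light colors")
    sxy = sum((k - k_mean) * (ip - ip_mean) for k, ip in zip(ks, innings))
    step = sxy / sxx
    intercept = ip_mean - step * k_mean
    return {light: intercept + step * code for light, code in LIGHT_CODE.items()}


def fit_mean(innings):
    """The "Mean" baseline: the same projection for every reliever."""
    return sum(innings) / len(innings)


lights = ["green", "green", "yellow", "yellow", "red", "red"]
innings = [66, 60, 56, 52, 48, 42]  # hypothetical actual IP
print(fit_traffic_lights(lights, innings))  # {'green': 63.0, 'yellow': 54.0, 'red': 45.0}
print(fit_mean(innings))  # 54.0
```

Both models have very little to work with, which is the point: if one fitted line through three colors beats the experts, the experts’ extra detail isn’t adding information.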

Let’s look at Community projected IP and actual IP while grouping pitchers by projected IP rounded to the nearest 10 innings:

The relievers who were expected to throw the most innings fell well short. This was true for relievers who were projected to be elite (based on Steamer ERA projections) and for relievers expected to be healthy (green lights). It raises the question: Are there any relievers who should be projected to throw more than about 60 IP? Here’s my challenge to anyone reading this: make a list of the 20 pitchers that you expect to throw the most relief innings in 2012 and post it in the comments section. Whoever’s list racks up the most innings wins.
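The bucketing behind that comparison can be sketched as follows, assuming plain lists of Community projected IP and actual IP for the same relievers. It rounds halves up (Python’s built-in `round` rounds halves to even, which would send 65 IP to the 60 bucket) and reports the count, mean projected IP and mean actual IP per bucket. The function name and inputs are hypothetical.

```python
import math
from collections import defaultdict


def bucket_by_projection(projected, actual, width=10):
    """Group pitchers by projected IP rounded to the nearest `width` innings.

    Returns {bucket: (count, mean projected IP, mean actual IP)}."""
    groups = defaultdict(list)
    for proj, act in zip(projected, actual):
        bucket = int(math.floor(proj / width + 0.5)) * width  # round half up
        groups[bucket].append((proj, act))
    summary = {}
    for bucket in sorted(groups):
        rows = groups[bucket]
        n = len(rows)
        summary[bucket] = (
            n,
            sum(p for p, _ in rows) / n,
            sum(a for _, a in rows) / n,
        )
    return summary


projected = [42, 48, 58, 63, 71, 78]  # hypothetical Community IP
actual = [40, 45, 52, 50, 55, 58]     # hypothetical actual IP
for bucket, (n, mean_proj, mean_act) in bucket_by_projection(projected, actual).items():
    print(bucket, n, mean_proj, mean_act)
```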

If the Community had made such a list prior to 2011, they would have projected an average of 77 IP from a group that would go on to average 55 (25%ile: 33, 50%ile: 56, 75%ile: 66). Only three players in this group of 20 exceeded their IP projection. One was Alexi Ogando, who was moved into the rotation and did what he could for the group average. The others were Craig Kimbrel and Jonny Venters. The Community top 10 and top 40 underachieved by similar amounts, and this shortfall wasn’t unique to the Community projections; check other systems and you’ll see roughly the same thing. Forecasters overestimated their ability to determine which relievers would throw the most innings.
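The top-20 check can be scripted along these lines. This is a sketch with hypothetical names and data: it ranks players by projected IP, compares the group’s mean projection with its mean actual IP, and lists who beat their projection. `statistics.quantiles` uses its default “exclusive” method, which may not match however the percentiles above were computed, and it needs at least two players.

```python
import statistics


def top_n_shortfall(projected, actual, n=20):
    """Compare the n highest projections with what those players actually did.

    projected, actual: dicts of player -> IP (a missing actual counts as 0)."""
    top = sorted(projected, key=projected.get, reverse=True)[:n]
    proj = [projected[p] for p in top]
    act = [actual.get(p, 0) for p in top]
    return {
        "mean_projected": statistics.mean(proj),
        "mean_actual": statistics.mean(act),
        "quartiles": statistics.quantiles(act, n=4),
        "beat_projection": [p for p in top if actual.get(p, 0) > projected[p]],
    }


projected = {"A": 80, "B": 78, "C": 75, "D": 70, "E": 60}  # hypothetical
actual = {"A": 60, "B": 82, "C": 40, "D": 66, "E": 70}     # hypothetical
result = top_n_shortfall(projected, actual, n=4)
print(result["mean_projected"], result["mean_actual"])  # 75.75 62
print(result["beat_projection"])  # ['B']
```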

Starting Pitchers

Here the systems did much better, and the Community did the best. All of the systems did much better than the Traffic Lights alone and better than a simple system based on Traffic Lights and an ERA projection. The later Fantistics projections did much better than the early Fantistics projections, and the early Fangraphs fan projections didn’t cover enough pitchers to compete. It looks like there’s more to learn about starting pitchers than about hitters during spring training. I looked for ways to improve upon the Community projections and initially thought I’d found something when I saw this:

The relationship between the Community forecast’s residuals and Steamer’s projected ERA was certainly statistically significant (a two-tailed p-value of .0045) and large in size (knock off 20 innings for each point of ERA), but the relationship wasn’t robust when I fiddled with the player pool. It looks like it could instead be explained by a non-linear relationship between projected IP and actual IP. I’ll need to revisit this with more data in hand.
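A sketch of that residual check, using ordinary least squares with the Community residual (actual minus projected IP) as the response and Steamer’s projected ERA as the predictor. The exact setup is an assumption, and the data below are invented. The function returns the slope and its t-statistic; turning that into a two-tailed p-value needs a t distribution with n - 2 degrees of freedom (SciPy’s `scipy.stats.linregress`, for example, reports a two-sided p-value).

```python
import math


def residual_slope(projected_ip, actual_ip, steamer_era):
    """Regress residuals (actual - projected IP) on projected ERA.

    Returns (slope, t_statistic). A slope near -20 corresponds to
    "knock off 20 innings for each point of ERA"."""
    resid = [a - p for p, a in zip(projected_ip, actual_ip)]
    n = len(resid)
    if n < 3:
        raise ValueError("need at least three pitchers")
    x_mean = sum(steamer_era) / n
    y_mean = sum(resid) / n
    sxx = sum((x - x_mean) ** 2 for x in steamer_era)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(steamer_era, resid))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(steamer_era, resid))
    std_err = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / std_err


era = [3.0, 3.5, 4.0, 4.5, 5.0]          # hypothetical Steamer ERA
projected = [180] * 5                    # hypothetical Community IP
actual = [201, 189, 180.5, 169.5, 160]   # hypothetical actual IP
slope, t = residual_slope(projected, actual, era)
print(round(slope, 1), round(t, 1))
```

A linear fit like this is exactly what can be fooled by curvature: if actual IP bends away from projected IP at the high end, and low-ERA pitchers also get the biggest projections, the ERA term can soak up that bend. Adding a squared projected-IP term to the model is one way to check.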

There was also too little data to say for certain whether pitcher playing time forecasts could be improved by using either Will Carroll’s traffic lights or Jeff Zimmerman’s DL projections. The Traffic Lights had the more compelling statistical case and the larger effect size (add 8 innings for a green light and subtract 8 innings for a red light) and I suspect that both systems carry some information that the fans and forecasters aren’t fully incorporating.
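If you wanted to try that effect size anyway, the adjustment is simple arithmetic. This sketch uses the Traffic Lights numbers from the paragraph above (+8 IP for green, -8 for red); leaving yellow unchanged and flooring at zero innings are added assumptions, and the names are hypothetical. Treat it as an illustration, since there isn’t yet enough data to be sure the effect is real.

```python
# Effect sizes from the post; yellow getting no adjustment is an assumption.
LIGHT_ADJUSTMENT_IP = {"green": 8, "yellow": 0, "red": -8}


def adjust_for_light(consensus_ip, light):
    """Shift a consensus innings forecast by the Traffic Light effect,
    never going below zero innings."""
    return max(0, consensus_ip + LIGHT_ADJUSTMENT_IP[light])


print(adjust_for_light(180, "green"))  # 188
print(adjust_for_light(180, "red"))    # 172
```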

So, there’s more work to be done here but I think I’ve seen enough to adjust Steamer’s 2012 playing time projections. We should have new downloads with updated playing time projections along with a handful of new players (including Yu Darvish!) ready late tonight or tomorrow.

Last Player Picked (updated)

Update: Thanks to Brian Jenner we now have .csv files for Last Player Picked tailored to the ESPN and Yahoo! position eligibility.

If you haven’t been there already, you should really check out Mays Copeland’s terrific website that creates fantasy baseball cheat sheets.

Steamer Projections for lastplayerpicked.com

Last Player Picked allows you to select from among several data sources or even upload your own projections. You can select your league’s categories or enter your league’s point system. On our downloads page you will now find .csv files that can be uploaded to Last Player Picked to create a Steamer-based cheat sheet specific to your league scoring system.

Let me know if you need more categories or find any bugs.

Nate Silver’s Confidence

Update: Here’s Nate Silver’s thoughtful analysis of the accuracy of his projections from last week. Nate also points out that the errors aren’t necessarily normally or even symmetrically distributed, as I assumed in the analysis below.

On his fivethirtyeight blog, Nate Silver has made 55 projections for 2012 Republican Primaries, each one including a projected mean and a 90% confidence interval.

Earlier in the primary season, Tom Tango pointed out that Nate was too accurate, meaning that his error bars had been wider than needed to that point.

Now that there’s more data, I’ll revisit the question: Are Nate’s confidence intervals too wide? Too narrow?

It turns out that 50 of his 55 projections (darn close to 90%) are within his 90% confidence intervals. Of course, the errors are correlated. For instance, if you were too bullish on Ron Paul in Virginia it implies that you were too pessimistic about Mitt Romney. To get a better sense of the distribution of Nate Silver’s misses I followed Tango’s lead and calculated a z-score for each of his predictions. Here’s a histogram showing the distribution of the z-scores:

This looks fairly reasonable. You can see the two largest misses from Massachusetts and Virginia at the extremes.
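A common way to get such z-scores is to turn each 90% interval into a standard deviation. A minimal sketch, assuming a normal, symmetric interval (which, as noted in the update at the top, Silver’s errors need not be): the interval width divided by 2 × 1.645 gives sigma, and the miss divided by sigma gives z. Function names and the sample forecast are hypothetical.

```python
from statistics import NormalDist

# A 90% two-sided normal interval spans mean +/- Z90 * sigma (Z90 is about 1.645).
Z90 = NormalDist().inv_cdf(0.95)


def z_score(low, high, projected, actual):
    """Standardized miss, with sigma backed out of the 90% interval."""
    sigma = (high - low) / (2 * Z90)
    return (actual - projected) / sigma


def coverage(forecasts):
    """Fraction of (low, high, actual) tuples whose actual lies in [low, high]."""
    hits = sum(low <= actual <= high for low, high, actual in forecasts)
    return hits / len(forecasts)


# Hypothetical race: 40% projected, 90% interval 32% to 48%, actual 45%.
print(round(z_score(32, 48, 40, 45), 2))  # 1.03
```

For well-calibrated intervals, about 90% of forecasts should land inside (50 of 55 is roughly 0.91), and the z-scores should look like draws from a standard normal, with only about 10% beyond ±1.645.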

So, how accurate is Nate Silver? Quite possibly, just as accurate as he says he is.

Download the data set used in the above analysis:
Nate Silver 2012 Primary Forecasts

Stretching Out Chris Sale

Right now, we’re hedging our bets on Chris Sale, projecting that 29% of his innings come in relief. We have him throwing 129 IP with a 3.60 ERA, 9 wins and 15 holds. In this role he generates an impressive 9.8 K/9 with his 94.5 mph fastball. This line gives him $9.30 of fantasy value, making him our 72nd-ranked pitcher.

But what if you don’t think Sale will make the rotation?

If we give Sale 64 IP as a setup man, Steamer forecasts a 3.34 ERA, a superb 10.5 K/9 and a fastball averaging 95 mph. Unfortunately, even with a handful of saves, this line only provides $3.00 of fantasy value, making him roughly the 112th-best pitcher available.

And if we let him face 850 batters over 30 starts and 200 innings? His strikeout rate falls to 9.5 K/9 and his ERA ticks up a bit to 3.74. He also wins 13 games, however, and throws his fastball at 94.3 mph. This makes him a $15.60 player, 22nd in our pitcher rankings.
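As a sanity check, the hedged 129 IP line looks a lot like an innings-weighted average of the setup and starter lines. A minimal sketch, assuming a simple 29%/71% weighting (the post doesn’t describe how Steamer mixes roles): K/9 and velocity come out at the 9.8 and 94.5 quoted for the hedged line, but ERA lands at 3.62 rather than 3.60, so Steamer’s actual blending presumably differs in some detail.

```python
def blend(relief_value, starter_value, relief_share):
    """Innings-weighted average of a relief-role value and a starter-role value."""
    return relief_share * relief_value + (1 - relief_share) * starter_value


RELIEF_SHARE = 0.29  # share of Sale's projected innings that come in relief

print(round(blend(10.5, 9.5, RELIEF_SHARE), 1))   # K/9: 9.8
print(round(blend(95.0, 94.3, RELIEF_SHARE), 1))  # fastball mph: 94.5
print(round(blend(3.34, 3.74, RELIEF_SHARE), 2))  # ERA: 3.62
```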

Of course, Steamer doesn’t know how well his fastball will hold up over a full season and doesn’t know about his excellent changeup. But, based on what it does know, Steamer is bullish on Mr. Sale.

2012 Steamer Hitter Projections

Update: Version 3.0 of the pitcher projections is out. We adjusted Steamer to account for the lower run environment over the last couple of years.

The 2012 Steamer Hitter Projections are now available for download.

We’ve also uploaded version 2.0 of our pitcher projections. This version includes the projected fastball velocities that Steamer uses in its pitcher forecasts. For pitchers with prior MLB experience, we generated these numbers using PITCHf/x data from Fangraphs. For the guys who haven’t yet made it to the show, Peter deserves a great deal of credit for laboriously typing in fastball velocities from the 2012 Minor League Baseball Analyst.

Enjoy and thanks in advance for your input!

Steamer Pitchers 2012 v 1.0

The initial run of the 2012 Steamer Pitcher Projections is now available for download. We’ll update our playing time projections more than once in the coming weeks and update our projections based on trades and signings. We’ll announce new downloads and new posts on our new twitter feed: @steamerpro.

We’re constantly working to perfect our system and to find stats that are left unprojected and overlooked by the rest of the baseball world and the other projection systems, so let us know what you’d like to see. Hopefully we can incorporate your ideas to further sculpt the system.

P.S. Hitter projections should be done within the week.

Full Steam Ahead

Matt Swartz recently tested the 2011 player projections. Steamer held its own among the batter forecasters, landing in the middle of a tightly clustered pack with the other sabermetric systems. Steamer differentiated itself with its pitcher projections, however, which stood out from the pack and were the most accurate by each of Matt’s three measures.


This is the second straight year that Steamer’s pitcher projections came out on top, as evidenced by MGL’s analysis of the 2010 projections. Our biggest improvement between 2010 and 2011 was actually in our hitter projections, which, after some careful tweaking, fared considerably better in 2011.

So, where do we go from here?

This year, we’ll be leaving our pitcher and hitter algorithms largely intact. In order to make our projections all the more useful to fantasy players, we’re aiming to provide our projections earlier and to have the most accurate playing time forecasts in the business.

Stay tuned!