Feb 17

## Position Players: a Little More Average

Matt Wieters (32) swings the bat, probably hitting the ball in the process.

Author’s note: I began this journey attempting to determine approximately how much a team should spend on starting pitching relative to the overall team budget. The first two posts are here and here with an additional article here. I have as yet been unable to find any kind of answer, and honestly, I’m probably further away from a legitimate answer than when I first started. Oh, and there’s also the little matter of being way off with my numbers yesterday, which was so egregious that I had to update the post because I’m too dumb to perform basic math. But, I keep trying.

As my daughter sings, quoting Daniel Tiger, “Keep trying, you’ll get better. Try. Try. Try.”

In a previous post I discussed how the average salary for a starting pitcher has grown by nearly 454.35% since 1985, outpacing the growth in what the average Major League team has spent on total salary over the same timeframe (363.7%). After adjusting salary figures to 2015 dollars using the Consumer Price Index, starting pitchers have jumped from an average of around $1.1M in 1985 to nearly $6M in 2013.
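If you want to follow along with the arithmetic, here's a quick sketch of the adjustment and growth calculations in Python (my actual work was done in R; the CPI index values below are illustrative placeholders, not official BLS figures):

```python
# Illustrative CPI index values -- placeholders, not official BLS data.
CPI = {1985: 107.6, 2013: 233.0, 2015: 237.0}

def to_2015_dollars(amount, year, cpi=CPI):
    """Scale a nominal dollar amount from `year` into 2015 dollars."""
    return amount * cpi[2015] / cpi[year]

def pct_increase(old, new):
    """Percent increase from old to new."""
    return (new - old) / old * 100

# A nominal 1985 salary of $500K works out to roughly $1.1M in 2015 dollars.
print(round(to_2015_dollars(500_000, 1985)))

# Growth from the (already adjusted) rounded averages quoted above:
print(round(pct_increase(1.1e6, 6.0e6), 1))  # landing near the mid-400s
```

The exact 454.35% in the post comes from the unrounded averages; the rounded $1.1M-to-$6M figures land in the same neighborhood.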

In relation to other positions, is that atypical? I didn’t have an answer in the previous post, but I have one now.

Methodology

I gathered my data using Sean Lahman’s database in R, which includes salary figures up to and including 2013. There is a newer version for Access and in csv with 2014, but since I’ve been using 2013 as my cutoff point in earlier posts, I’m sticking with it. Adding an additional year didn’t seem all that important.

For this post I gathered all the players who had played a position in the field from 1985 to 2013, separated them into position groups, and performed some dplyr, mean magic. For those players listed with multiple positions in the field, Ben Zobrist for example, I used the position where they appeared the most.

That’s it.
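For the curious, the grouping step looks something like this pandas sketch (the post itself used R and dplyr; the column names below are loosely modeled on Lahman's Fielding and Salaries tables, not the exact schema):

```python
import pandas as pd

# Toy rows standing in for Lahman's Fielding and Salaries tables.
fielding = pd.DataFrame({
    "playerID": ["zobrisb01", "zobrisb01", "wietema01"],
    "yearID":   [2013, 2013, 2013],
    "POS":      ["2B", "SS", "C"],
    "G":        [47, 112, 140],   # games played at each position
})
salaries = pd.DataFrame({
    "playerID": ["zobrisb01", "wietema01"],
    "yearID":   [2013, 2013],
    "salary":   [7_000_000, 5_500_000],
})

# For multi-position players (the Zobrists of the world), keep the position
# where they appeared the most.
primary = (fielding.sort_values("G", ascending=False)
                   .drop_duplicates(["playerID", "yearID"]))

# Join salaries and take the mean by year and position group.
avg = (primary.merge(salaries, on=["playerID", "yearID"])
              .groupby(["yearID", "POS"])["salary"].mean())
print(avg)
```

With those toy numbers, Zobrist gets bucketed at shortstop (112 games beats 47) and each group's average salary falls out of the final `groupby`.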

Play the Field

Before gathering the data and running the numbers, I assumed that catchers were likely to be poorly compensated compared to the other position groups. I was biased. Like NFL running backs, nobody in his or her right mind would pay big bucks to a position that takes such a physical beating, except for the rare Buster Posey or Yadier Molina. These are transcendent talents, MVP types, who deserve big bucks.

Nobody pays Jose Lobaton $3M as insurance for Wilson Ramos’ inevitable injury. GMs might do that with starting pitchers—smart teams like the Yankees take a flyer on Scott Baker for $1.5M or Gavin Floyd for $4M—but Neal Huntington isn’t sitting in his PNC Park office devising ways to overpay Francisco Cervelli.

I underestimated just how poorly catchers have fared. Since 1985, catchers (the guys who are poetically referred to as “field generals” and are well represented among real-life managers) have seen the average salary increase from $1.03M to $2.7M in 2013, or right around what starting pitchers averaged in 1998. First basemen haven’t averaged less than $2.8M since 1996. Catchers’ overall average salary has increased by 165.48% in the last 30 years. The cumulative rate of inflation has increased 120% in the same period. For a highly skilled job that only a few people on the planet are capable of doing at such a level, outpacing inflation by roughly 45 percentage points isn’t exactly making little kids give up their dreams of being investment bankers for shin guards and a mitt. It’s a thankless job behind the dish.
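For transparency, here's the arithmetic behind that inflation comparison as a quick Python check; the two growth figures are the ones quoted above, and everything else falls out of them:

```python
catcher_growth = 1.6548   # catchers' average salary grew 165.48% since 1985
inflation      = 1.20     # cumulative CPI inflation over the same stretch

# Gap between the two growth rates, in percentage points:
gap_points = (catcher_growth - inflation) * 100
print(round(gap_points, 2))   # 45.48

# Growth of catcher salaries in real (inflation-adjusted) terms:
real_growth = (1 + catcher_growth) / (1 + inflation) - 1
print(round(real_growth * 100, 1))   # about 20.7% real growth
```

So the nominal gap is about 45 points, which works out to roughly 21% growth once inflation is netted out. Either way: not exactly ace-starter money.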

Out on Spotrac there are only five catchers with an average annual salary above $10M. Compare that to second base (six), shortstop (seven), third base (seven), or first base (a whopping 14, with eight players averaging more than $20M a year). Catchers have to feel like second-class citizens when the infielders dine together, and while they’re not exactly forced to carb load at buffets or feast at McDonald’s to make ends meet, images of a hobbled Jake Taylor from Major League in a Mexico hotel still come to mind.

Even second basemen, a group I thought would follow closely behind catchers, saw their average increase by 239.1% since ’85. They were third lowest, with center fielders (neither Mike Trout’s extension nor Jacoby Ellsbury’s deal with the Yankees had been signed yet) being second lowest.

Well, here, in tabular format, are the average salaries over the years across all positions, presented in thousands of dollars:

| Year | C | 1B | 2B | SS | 3B | LF | CF | RF | OF |
|------|---|----|----|----|----|----|----|----|----|
| 1985 | 1028.15 | 1296.47 | 1000.17 | 913.94 | 1036.36 | 1054.62 | 1074.98 | 1051.24 | 1058.75 |
| 1986 | 751.56 | 1132.31 | 812.59 | 790.86 | 1095.68 | 974.54 | 927.39 | 1082.48 | 995.16 |
| 1987 | 841.57 | 1238.46 | 805.71 | 720.41 | 881.45 | 974.81 | 864.01 | 991.68 | 949.06 |
| 1988 | 737.02 | 1190.78 | 792.94 | 868.18 | 976.21 | 964.34 | 851.00 | 1236.86 | 1013.14 |
| 1989 | 761.59 | 1340.19 | 778.17 | 934.18 | 874.11 | 1044.81 | 1188.23 | 1017.69 | 1074.95 |
| 1990 | 702.39 | 1451.87 | 1055.44 | 919.10 | 770.14 | 1100.40 | 1058.42 | 1120.80 | 1093.78 |
| 1991 | 943.82 | 2305.63 | 1323.72 | 1492.67 | 1205.21 | 1809.95 | 1759.62 | 1984.08 | 1853.74 |
| 1992 | 1174.10 | 2857.89 | 1683.42 | 1257.02 | 1415.05 | 2060.09 | 1921.20 | 2238.63 | 2095.36 |
| 1993 | 1090.12 | 2062.14 | 1468.68 | 1460.51 | 1669.39 | 1556.17 | 1857.96 | 1900.73 | 1762.65 |
| 1994 | 1133.72 | 2219.48 | 1465.21 | 1519.08 | 1763.64 | 1745.65 | 2078.83 | 1994.49 | 1919.48 |
| 1995 | 1027.43 | 2673.50 | 1257.13 | 1761.61 | 1577.06 | 1706.44 | 1769.38 | 1888.81 | 1790.90 |
| 1996 | 1114.58 | 2775.78 | 1409.47 | 1852.43 | 1512.56 | 1874.23 | 1924.78 | 2144.24 | 1980.93 |
| 1997 | 1396.93 | 3968.70 | 1837.51 | 1752.79 | 1971.74 | 2092.04 | 2033.59 | 2223.42 | 2124.60 |
| 1998 | 1247.41 | 3368.30 | 1810.55 | 1526.04 | 2010.20 | 2149.51 | 2565.53 | 1967.56 | 2209.02 |
| 1999 | 1529.94 | 3074.27 | 2318.23 | 1744.69 | 2154.05 | 2445.37 | 2113.07 | 3042.51 | 2536.76 |
| 2000 | 2107.40 | 4690.18 | 3050.42 | 2517.91 | 2368.18 | 2783.98 | 2612.51 | 4245.62 | 3211.35 |
| 2001 | 2109.25 | 4528.74 | 3054.90 | 3495.07 | 2509.80 | 3818.40 | 3020.20 | 4006.72 | 3610.47 |
| 2002 | 2503.09 | 5267.33 | 2018.84 | 4023.06 | 3446.74 | 4169.60 | 2917.23 | 4712.38 | 3954.39 |
| 2003 | 2605.07 | 5049.74 | 2055.66 | 4321.11 | 3189.12 | 4438.54 | 3382.56 | 5326.22 | 4403.79 |
| 2004 | 2228.22 | 4983.67 | 2051.49 | 2876.17 | 3432.22 | 3712.69 | 3522.98 | 4419.46 | 3897.82 |
| 2005 | 2521.38 | 4410.33 | 2269.13 | 2997.25 | 4049.87 | 3647.19 | 4007.00 | 4708.53 | 4120.56 |
| 2006 | 2660.64 | 4275.45 | 2399.61 | 3441.42 | 4712.17 | 3917.14 | 3480.35 | 4552.58 | 3963.89 |
| 2007 | 2511.26 | 5217.77 | 2407.17 | 4002.36 | 4398.48 | 4667.68 | 4185.91 | 3296.03 | 4042.52 |
| 2008 | 2494.25 | 5985.38 | 2631.44 | 3838.70 | 4829.44 | 4420.94 | 2754.63 | 4758.37 | 3988.74 |
| 2009 | 2428.54 | 5867.76 | 2640.78 | 3481.00 | 5025.94 | 4669.66 | 3238.05 | 4582.75 | 4178.14 |
| 2010 | 2170.81 | 6499.72 | 3607.18 | 3355.17 | 4357.31 | 3784.13 | 3961.51 | 4004.26 | 3902.70 |
| 2011 | 2177.06 | 6207.74 | 3235.61 | 2883.77 | 4575.96 | 4607.30 | 3069.23 | 4969.16 | 4281.58 |
| 2012 | 2359.67 | 5224.50 | 3322.54 | 2867.97 | 4154.01 | 4243.03 | 2914.37 | 4981.50 | 4028.01 |
| 2013 | 2729.52 | 5728.06 | 3618.97 | 3774.48 | 4575.73 | 4282.33 | 3645.41 | 5459.32 | 4411.58 |

Average Positional Player Salary in MLB Since 1985

If you needed further evidence that there was collusion in baseball in the mid-80s, the table above certainly offers a bit of insight. The average salary for position players decreased by 13.1% from 1985 to 1987. Only third basemen and right fielders saw an increase in average salary in 1986 (even starters saw the average salary drop 7.16%).
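That 13.1% figure can be eyeballed from the table; a simple unweighted check (a hypothetical Python sketch, since the real figure presumably weights by the number of players at each position) lands in the same neighborhood:

```python
# Average salaries (in $1000s) by position, from the table above (excluding
# the aggregate OF column).
avg_1985 = [1028.15, 1296.47, 1000.17, 913.94, 1036.36, 1054.62, 1074.98, 1051.24]
avg_1987 = [841.57, 1238.46, 805.71, 720.41, 881.45, 974.81, 864.01, 991.68]

mean_1985 = sum(avg_1985) / len(avg_1985)
mean_1987 = sum(avg_1987) / len(avg_1987)

change = (mean_1987 - mean_1985) / mean_1985 * 100
print(round(change, 1))  # about -13.5, close to the quoted -13.1%
```

The small difference comes down to weighting; either way, average position-player pay clearly shrank during the collusion years.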

Speaking of right fielders, they were the only position group to outpace the average increase in team spending, coming in just below the standard set by the starting pitchers. Right fielders increased by 419%. Some of those big-money right fielders back in ’86-87 had names like Dave Winfield, Dale Murphy, and Jesse Barfield. In 2015 dollars, each of those gentlemen earned at or just above $4M. Compared to the top earners of today, namely Giancarlo Stanton and Matt Kemp, that’s not all that overwhelming, but Winfield, Murphy, and Barfield were earning roughly 276% more than the average. Kemp is earning 266% more than the average right fielder, so it’s comparable (still obscene). Of course I’m discussing outrageous amounts of money.

I haven’t yet looked into relievers, though if I guessed I’d imagine their salaries might be in the same ballpark (percentage-wise) as third and first basemen (314% or thereabouts).

Matt Wieters photo credit: Matt Wieters via photopin (license)

Feb 16

## Pitchers & Teams: Starting with Average

Cliff Lee was one of the highest paid starting pitchers in 2013. In 1985, the owners probably would have paid him bupkis.

I haven’t abandoned the idea of attempting to identify what teams should spend on starting pitching. Or if it’s even possible. Or if there’s any reason to question the whys and hows of Major League teams in regard to overall spending, and maybe I should just wait for spring training and be distracted by actual baseball. If you’re interested, the first two posts are here and here (with a quasi-related article here).

Why so much time between posts? Grad school. Also, I’m trying to account (ba-da-bum) for the sizable gap between the knowledge needed to discuss this issue and my current level of knowing—or, in another word, studying.
Price of Pitching

I began reading Pay Dirt by Quirk and Fort, which discusses a great many things about the business of professional sports (it’s in the book’s title and everything: Pay Dirt: The Business of Professional Team Sports), but one section in particular discusses the average salary of MLB players and why they make so much. I won’t go into all the details, but I was curious how the rise in average salaries across positions matched up against the increase in overall team salaries over the years. Was there a position that benefited more than all the others? What position was neglected? In essence, is there some hidden value? Since I’m currently working on starting pitchers and had that data handy, I thought I’d look here first.

Adjusting dollar figures to their 2015 equivalents using the consumer price index1, I noticed that starting pitchers have fared well compared to overall team spending. Since 1985, starting pitchers have seen their average salary increase by 454.35%, from slightly over $1M to slightly under $6M in 2013. In comparison, average team spending has increased by 363.7%, from $22.17M in 1985 to $102.8M in 2013.

If you like visuals (of course you do), here are a few line graphs to illustrate the findings:

Average Salary for Starting Pitchers Since 1985

Average Team Spending Since 1985

There’s a legitimate argument to be made for excluding data earlier than 1990-91. In 1987 the owners were found guilty of collusion, conspiring to suppress the cost of free agents, so any associated money for players is sketchy at best. It would likely take 3-4 years after the decision for the numbers to shake out, and you absolutely see a huge jump in both team spending and the average starting pitcher salary in 1991 (62.8% and 32.5% respectively). There’s another sizable jump in 1992 as well, with team spending up 27.6% from the year prior and starting pitchers averaging 23.6% more.
I keep the information because I’ve introduced it in earlier posts. If we understand it’s there, we can at least deal with it. Since I’m discussing the percentages, here’s a line graph with both starting pitchers and team spending:

Yearly Salary % for Starters and Teams

From 1994-95, there’s a 12.4% drop in average salary for starters, which I found odd considering that in 1994 Jimmy Key was the max earner in my data, making $5.35M ($8.55M in modern-day dollars), while in 1995 David Cone made $8M ($12.4M) in the last year of a four-year deal he’d signed with the Royals in 1992. He started that season with the Royals, was traded to the Blue Jays in April, then traded in late July to the Yankees. 1995 is largely the reason we think of Cone as a hired gun.

In terms of actual dollars, the drop from 1994 to ’95 was about three hundred thousand in current dollars. That’s a sharp decline, but perhaps it’s related more to my data and the players listed. Perhaps someone was hurt. 1995 was the rookie year for Hideo Nomo, Andy Pettitte, Steve Sparks, and Ismael Valdez. Heck, 1995 was the season the Mets debuted Bill Pulsipher and Jason Isringhausen, raising my hopes to unreasonable levels that once Paul Wilson arrived they’d be unstoppable. Lesson learned: don’t believe the hype. Public Enemy. Because.

Anyway, salaries jumped back to 1994ian levels by ’97 and have typically increased every year since. In comparison, average team spending has been much steadier over the years. There were large spikes in 1997, 1999, and 2001, but average spending has increased by around 2-3% every year since ’01.

It is interesting, however, to see how starting pitchers are used less and less but make more and more (showing, at least in one way, that teams see the importance of starters). In 1985 (with four-man rotations, no less) starters averaged 140 2/3 innings pitched (with the median coming in at 148 1/3), while in 2013 that number had dropped to 113 2/3 (116.5 median).
That’s not a bad life, is it? Make 454.35% more while pitching nearly 24% fewer innings. Innings aren’t the only difference. Complete games have dropped from an average of 3.5 in ’85 to 0.5 in ’13, and games started from 21.5 to 18.7. Bah. I’m not here to talk about the good old days or try to argue something else entirely. Just an observation.

My curiosity has gotten the better of me, and in posts to come I’ll look at the other positions and see how they compare. I haven’t forgotten my original question, and I’m not refusing to give an answer. I just need to figure out how to answer it with some certainty.

1. I’m sure there’s a better way and my methodology is flawed. Alas. I’m not a finance expert, so if there’s a better method I’m open to hearing, learning, and applying it.

Feb 08

## An Addition – 1993 Cleveland Indians

As well as using 18 starting pitchers in a season, the 1993 Cleveland Indians were also struck by tragedy in spring training when pitchers Steve Olin, Tim Crews, and Bob Ojeda were involved in a boating accident that left Olin and Crews dead from “blunt force trauma to the head” and injured Ojeda. A very good write-up of the events can be found here, as well as links to an interview with former manager Mike Hargrove and a transcript of an ESPN “Outside the Lines” retrospective on the accident.

I missed that in my article from Friday. It would be disingenuous to suggest this is directly related to Friday’s article, or, more to the point, that the boating accident in spring training in any way contributed to the various starters used, but with a little more research, I could have worked the story into the particulars.

Feb 06

## Texas Rangers Use a Lot of Starters

Yu Darvish is barely related to this article at all. Here’s a picture anyway.

Working on my articles (parts one and two) where I begin to make sense1 of what teams are spending on starting pitching, I noticed that I had some data that needed to be cleaned.
More specifically, I noticed that for one reason or another the 1985 and 1987 Texas Rangers didn’t have any money apportioned for their bullpen. Well, after a quick check on Fangraphs, I can assure you that the Rangers teams in both ’85 and ’87 had relievers, used them quite frequently, and even had a few players with saves and everything. The way I defined a starting pitcher needed to be reworked, which will play into later posts, but the Texas Rangers’ starters did not toss 162 complete games.

What I did notice, however, is that in both ’85 and ’87 the Rangers used a lot of starting pitchers. The team used 14 starters in 1985 and 12 in 1987. Wow. In a 162-game season, the Rangers were basically trying a new starter every 11.5 to 13.5 games. They weren’t, certainly. Pitchers make spot starts from time to time, but those seemed like a lot of pitchers to begin a game for a team with winning aspirations.2

Is it a lot, though? Just because I think that sounds high, maybe using 12-14 starters in a season is something all teams do. So, I decided to look into it just a little bit. This is for both of us. I needed a break from trying to figure out rotation salaries. You needed a break from reading all of that. This isn’t completely off topic. I did technically come across this while researching the other articles.

The first thing I realized is that teams evaluate a lot of young players as the seasons draw to a close and the rosters expand in September. So, to keep the data relatively equal (and to not penalize good teams in pennant chases for not using additional starters down the stretch), I removed all of the players that made their debut in September of that particular year. I then looked at the numbers again, looking at things like the median and average—you know, the typical stuff one looks at in a situation like this.

12 starters isn’t all that high.
Heck, since 1985 teams have used an average of 10 starters per season, with the middle 50% coming in between nine and 11. With a standard deviation of 2.09, the ’87 Rangers were fairly typical. Not particularly noteworthy, except that year Charlie Hough started 40 games, which is the last time a Major League starter has started 40 or more games and, in all probability, will remain the last. That was neat. I like unique things like that. Makes these Easter egg hunts all the more worth it.

The ’85 Rangers were still shy of two deviations from the mean. Going by the 68-95-99.7 rule, which states that 68.27% of all data falls within one standard deviation of the mean, 95.45% within two, and 99.73% within three, neither the ’85 nor the ’87 teams are all that particularly interesting. Using 12 and 14 starting pitchers in a season is greater than average (even greater than the Rangers’ team average over that time), but that doesn’t place them in extraordinary company or anything.

If we want to discuss an extraordinary use of starting pitching, we can look to the 1993 Cleveland Indians. They used 18 starters that season, nearly four deviations from the mean, and it probably won’t shock you to learn that they finished 76-86 on the season. Two years later they would win 100 games, but in the early 90s they were churning through starters and suffering through sub-.500 records. On that ’93 team, they gave starts to a 37-year-old Bob Ojeda, a 36-year-old Matt Young, and a 35-year-old Mike Bielecki.

Since 1985 there have been 20 occurrences of a team using 15 or more starters in a season, and Texas has three of those. In both 2003 and 2004 they used 16 starters, while in 2008 they used 15. The Kansas City Royals also managed to use 15 or more starters three times: in 1992 and 2003 the October Wunderkinds used 15 starters each, while in 2006 they used 17.
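Those distances from the mean are easy to verify; here's a quick Python check using the mean and standard deviation quoted above:

```python
mean_starters = 10.0   # average starters used per team-season since 1985
sd_starters = 2.09     # standard deviation quoted above

def z_score(x, mean, sd):
    """How many standard deviations x sits from the mean."""
    return (x - mean) / sd

print(round(z_score(18, mean_starters, sd_starters), 2))  # '93 Indians: 3.83
print(round(z_score(14, mean_starters, sd_starters), 2))  # '85 Rangers: 1.91
print(round(z_score(12, mean_starters, sd_starters), 2))  # '87 Rangers: 0.96
```

So the '93 Indians sit nearly four deviations out, while the '85 Rangers stay shy of two, just as described.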
The Royals are one of four teams that have used a number of starters at or in excess of three standard deviations from the mean: the ’93 Indians, the 1996 Pittsburgh Pirates with 17, the 2003 Cincinnati Reds also at 17, and the 2006 Royals. Out of 828 individual team seasons, that sort of thing happens 0.48% of the time. Only the Indians topped 70 wins out of those teams.

If you’re wondering which teams averaged the most and fewest starters over the 29 years dating from 2013 back to ’85, I have that information as well. Technically, the Colorado Rockies have averaged 11 starters, which led this group, but since they began play in 1993 their data isn’t entirely equivalent. Over the full 29 years, Texas has averaged 10.97 starters, with the New York Yankees and the Royals tied for second at 10.83. The Atlanta Braves averaged the fewest with 8.28. The latter definitely speaks to the quality of their starters but also their durability.

The table below lists the values for all of the teams over the years. For clarity’s sake, I cleaned up the data so that all iterations of the Angels, Marlins, and Expos/Nationals are categorized by franchise.
| Team | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max | Range |
|------|-----|--------------|--------|------|--------------|-----|-------|
| ARI | 6 | 8.75 | 10 | 9.312 | 10.25 | 11 | 5 |
| ATL | 6 | 7 | 8 | 8.276 | 9 | 12 | 6 |
| BAL | 8 | 9 | 11 | 10.59 | 12 | 14 | 6 |
| BOS | 7 | 9 | 10 | 10.31 | 11 | 13 | 6 |
| CHA | 6 | 8 | 9 | 9.414 | 11 | 14 | 8 |
| CHN | 7 | 8 | 9 | 9.759 | 10 | 15 | 8 |
| CIN | 6 | 9 | 10 | 10.17 | 11 | 17 | 11 |
| CLE | 6 | 9 | 10 | 10.72 | 12 | 18 | 12 |
| COL | 8 | 10 | 11 | 11 | 12 | 15 | 7 |
| DET | 6 | 9 | 10 | 10.34 | 12 | 15 | 9 |
| HOU | 6 | 8 | 9 | 9.103 | 10 | 14 | 8 |
| KCA | 7 | 9 | 11 | 10.83 | 12 | 17 | 10 |
| LAA | 6 | 8 | 9 | 9.552 | 10 | 16 | 10 |
| LAN | 6 | 8 | 9 | 9.103 | 10 | 12 | 6 |
| MIA | 8 | 9 | 10 | 9.857 | 11 | 12 | 4 |
| MIL | 6 | 9 | 10 | 10.07 | 11 | 13 | 7 |
| MIN | 7 | 8 | 10 | 9.517 | 11 | 13 | 6 |
| NYA | 7 | 9 | 11 | 10.83 | 12 | 15 | 8 |
| NYN | 6 | 8 | 10 | 9.862 | 11 | 13 | 7 |
| OAK | 6 | 9 | 10 | 9.724 | 11 | 13 | 7 |
| PHI | 7 | 9 | 10 | 10.34 | 12 | 15 | 8 |
| PIT | 7 | 9 | 10 | 10.28 | 11 | 17 | 10 |
| SDN | 7 | 9 | 10 | 10.41 | 12 | 15 | 8 |
| SEA | 5 | 9 | 10 | 10.14 | 12 | 15 | 10 |
| SFN | 6 | 8 | 10 | 9.655 | 12 | 15 | 9 |
| SLN | 6 | 8 | 9 | 9.655 | 10 | 14 | 8 |
| TBA | 6 | 7.75 | 10 | 10 | 12 | 14 | 8 |
| TEX | 7 | 9 | 11 | 10.97 | 12 | 16 | 9 |
| TOR | 7 | 9 | 10 | 10.07 | 12 | 13 | 6 |
| WAS | 7 | 9 | 11 | 10.62 | 12 | 14 | 7 |

Summary of Starters Used by Major League Franchises Since 1985

Also, here’s a colorful boxplot of this same information. I’m doubling up because I thought you deserved a graphic. Also, the colors remind me of Otter Pops, and that makes me happy.

That’s about all I have for right now. This post was more a diversion than analysis. It was interesting. Maybe one day soon I’ll try to determine if the use of more starters means anything or not. Does average starter age correlate to the number of starters used, or career WAR, FIP, whatever? I don’t know. There’s probably a lot that can be done with this information. Just not today.

Yu Darvish photo credit: Texas Rangers Pitcher Yu Darvish via photopin (license)

1. Hasn’t happened yet.

2. Texas finished 62-99 in 1985 and 75-87 in 1987, so maybe survival was more interesting by the end of the season. They did finish 87-75 in ’86, so it’s not like they went into ’87 as a rebuild.

Feb 02

## Starting Pitching, the Mets, and Budgets

Matt Harvey is awesome. Here is an obligatory picture.
In the first post in this series, I explained that an article for District on Deck started me questioning the logic of spending big on starting pitching. Of course, that led me to wonder what teams should spend on starting pitching, or if there’s any real answer to that question, and that led me here, to the second part of an undetermined number of posts meant to find some sort of answers.

I’m not naïve enough to believe that I’m going to figure this out. These franchises have teams of business analysts, economists, statisticians, lawyers, and spiritual counselors to facilitate a long-term strategy. Me? I’m some guy who’s spending the offseason learning a few things, abusing a few terms, and otherwise making a big mess of an incredibly complex subject. I forgive you if you skip out on these posts. My feelings won’t be hurt. If you stay, however, we’ll spend some time working through fun things like the Gini coefficient, Lorenz curves, more boxplots, and snarky comments about the Mets.1

Data

Before I begin with any critical examination, let me first explain my methodology. I gathered all my figures from two places: Sean Lahman’s database and Fangraphs. Lahman’s database, available in R as well, includes salary information dating back to 1985, while Fangraphs allows me to pull the fWAR for teams dating back to baseball’s beginnings. I won’t go back that far because that would be silly and it wouldn’t match up with the salary information previously mentioned, but there you are. I’m looking at data from 1985 to 2013. I hope 29 years of data will reveal some trends.

Also, in using Lahman’s database, it became apparent that for players who were traded in season, the salary information for the traded player was not listed for the acquiring team.
This made it much easier on my part, since I didn’t have to determine a way to prorate that salary across service time (dealing with the possibility of injuries being a big hurdle), but it does leave the actual salaries less than 100% accurate in terms of what teams paid. The difference is huge for the people who cut the checks, but for this examination it won’t matter.

Why not 2014? 2014 is available over at Spotrac and was handy for the Max Scherzer article I wrote, but I decided against using it here. Lahman’s data does not include 2014 yet, so I wanted to keep the source consistent.

I also calculated the sum of total team salary and rotation salary. That’s it. Team salary is obvious. Rotation salary includes every player who started at least one game for the team that season. It doesn’t really matter if it was just one start or 32 starts; the player’s salary is included in the total. Also, a more worrisome issue: the total does not include any players who missed the entire season. For example, the Twins paid Scott Baker $6.5M in 2012 even though he missed the season due to TJ surgery. His salary is not included in this examination, but there’s little to be done about that. I wanted to be honest with you. You deserve to know. Our relationship should be built upon trust, after all.
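The rotation-salary calculation itself is simple; here's a hypothetical pandas sketch (the column names are stand-ins for Lahman's actual schema, and my real work was done in R):

```python
import pandas as pd

# Toy rows standing in for Lahman's Salaries table plus a games-started count.
players = pd.DataFrame({
    "teamID": ["NYN"] * 4,
    "yearID": [2013] * 4,
    "salary": [5_000_000, 500_000, 2_000_000, 1_000_000],
    "GS":     [32, 1, 0, 0],   # games started
})

# Team salary: everyone on the roster.
team_salary = players.groupby(["teamID", "yearID"])["salary"].sum()

# Rotation salary: anyone with at least one start counts in full,
# whether it was 1 start or 32.
rotation_salary = (players[players["GS"] >= 1]
                   .groupby(["teamID", "yearID"])["salary"].sum())

share = rotation_salary / team_salary
print(share)  # rotation's share of total payroll
```

With these toy numbers, the one-start spot starter's full salary goes into the rotation total, exactly the simplification described above.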

New York, New York

Before attempting to find trends across all of baseball, I wanted to examine a team I’ve some familiarity with over the years: the New York Mets. Frequent readers of my blog (thanks, dad!) will recognize my proclivity in discussing the Mets, so choosing them should come as no surprise here. I thought it would be interesting to work out my ideas with the team I’ve grown up with, and in this way, before attempting to find some common understanding for all of baseball I’d find it with the Mets first.

Intuitively this approach makes sense as well. If I’m going to test a theory shouldn’t I construct that theory first? I’ll use the Mets to do this, spend lots of words in the process, and I’ll probably mention a few economic principles that don’t necessarily apply. Doesn’t that sound like fun? You bet it does.

| 0% | 25% | Median | Mean | 75% | 100% |
|----|-----|--------|------|-----|------|
| 0.092 | 0.253 | 0.279 | 0.284 | 0.34 | 0.45 |

Summary of Mets Rotation / Total Salary Since 1985

I’ll start the discussion with the table above. It’s a fairly straightforward table with the minimum, maximum, median, mean, and interquartile range for the percentage of team dollars allocated to starting pitchers since 1985. This doesn’t really mean much in and of itself. The table neither tells us why in 2011 the Mets allocated just 9% of team salary to the starters nor why in 1990 about 45% went to Dwight Gooden, Bob Ojeda, David Cone, Ron Darling, Sid Fernandez, and Frank Viola.

We’re not ending here. This is simply the beginning.

Here’s a scatterplot of the percentages with lines across representing the average and the interquartile range. This is a nice visual. For the most part, in the last 29 years the Mets have neither blown the budget on starting pitching nor pinched their pennies. This is relative to internal team philosophy and not associated with the larger world of MLB, but we’ll get to that in time. For now, just from the numbers above and the scatterplot below, the Mets appear to be a team that budgets 25-34% of their dolla, dolla bills toward the starters.

I have a few questions that immediately come to mind, which I’ll work through. One thing that I notice from the information presented is that I can’t determine what kind of philosophy the Mets have embraced over the years. Do they value hitting over pitching? Is what they spend on starters in line with what other teams spend? Also, the most immediate question that comes to mind is this: was it worth it?

Did increasing or decreasing the percentage of money directed towards starting pitching have any effect on win shares? I suspect that it doesn’t. I suspect that there’s very little correlation at all, but it’s easy enough to find out: 0.217. The closer to 1 the stronger the correlation, and our value of 0.217 might as well be nonexistent. I created a scatterplot and it looked like Braille. This is not the variable you’re looking for.

FIP was even worse. It was -0.089. I even went so far as to calculate the average age of the starters for each season to see if there was any correlation between average age and fWAR (there wasn’t), percentage of monies (nope), and FIP. Well, with FIP there was a correlation coefficient of .587, which still doesn’t mean all that much, but it’s the best I’ve seen so far. Maybe there’s something there with age that I’m overlooking. I’ll keep that one in the dataset just in case.
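For anyone wanting to replicate the correlation step, it's a one-liner in most tools. Here's a hypothetical Python version with made-up numbers purely to show the mechanics (my actual coefficients came from the Mets data in R):

```python
import numpy as np

# Made-up percent-of-payroll and rotation fWAR values -- illustrative only.
pct_on_starters = [0.28, 0.31, 0.25, 0.34, 0.29, 0.22]
rotation_fwar   = [12.1, 9.8, 14.0, 10.5, 11.2, 13.3]

# Pearson correlation coefficient: values near 0 mean essentially no
# linear relationship between the two series.
r = np.corrcoef(pct_on_starters, rotation_fwar)[0, 1]
print(round(r, 3))
```

In R the equivalent is simply `cor(pct_on_starters, rotation_fwar)`; either way, a coefficient like 0.217 or -0.089 is close enough to zero to call noise.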

Looking at the rest of MLB since 1985, the Mets have remained in the same vicinity as what other teams are budgeting towards starting pitching. Here’s the summarized data from the other franchises:

| 0% | 25% | Median | Mean | 75% | 100% |
|----|-----|--------|------|-----|------|
| 0.034 | 0.209 | 0.270 | 0.271 | 0.332 | 0.705 |

Summary of MLB Rotation / Total Salary Since 1985

My eyes nearly popped out of my head when I saw that a team spent nearly 70.5% of their payroll on starters. The 2013 Tigers? No. That number belongs to the 1987 Rangers, which makes sense because my data currently shows zero dollars spent on their bullpen. Eh. I need to clean up the data in a few places, then. Anyway, judging by the summarized data, I can at least begin to see that the Mets have allocated more towards starting pitching than the rest of MLB since 1985. That’s something. I don’t know why they do what they do—did they have more of the top pitchers over that time? Did they employ the most expensive starters in the game? Did they pay a premium to pitch in NY?

Still lots of questions.

To determine the Mets’ philosophy towards filling out a 25-man roster, I’ll just retrace my steps and look at the percentage of funds spent on the position players. Here are the min, max, median, mean, and interquartile ranges:

| 0% | 25% | Median | Mean | 75% | 100% |
|----|-----|--------|------|-----|------|
| 0.396 | 0.5021 | 0.564 | 0.567 | 0.597 | 0.780 |

Summary of Mets Position Players / Total Salary Since 1985

The scatterplot further illustrates the summarized data above. There were peaks when the Mets allocated large portions of their budget toward position players. Notably, the team directed 77% of their budget in 1985 into the pockets of position players such as Gary Carter, Keith Hernandez, and George Foster. 1986 was more of the same, with 70.5% going to the men shagging fly balls and fielding grounders. Similar to the scatterplot from the rotation, in 1990 the team allocated less than 40% of the budget to fielders, while in 2011 the team directed 78% to the likes of David Wright, Jose Reyes, and Jason Bay. That year Bay topped 100 games (reaching 123) for the only time with the Mets, and Reyes won the batting title. R.A. Dickey would win the Cy Young the following season. Other than Wright missing a third of the season due to a stress fracture in his lower back and the team losing 85 games overall, things weren’t so bad. Dickey’s pitching was amazing to watch.

I digress.

I’ll end here for now. I still have lots of questions without too many answers, and in the next post I’ll see if I can clear away all of this confusion.

Matt Harvey photo credit: chrisswann26 via photopin cc

1. I don’t know when it became necessary to watch sports, have an MBA, and study statistics to write a blog but it’s reaching that point. Sorry for my whining interlude.

Jan 26

## Profligate GMing: How Much Is Too Much?

I’m working on an article for District that discusses the Nationals’ recent signing of Max Scherzer and how their starters’ salaries are beginning to get expensive. In short, the premise of the article is to discuss how much ball clubs, particularly the Nats, are spending in real dollars per win share produced by their starters. It’s interesting, but due to word count limitations and Fansided’s policy not to bore the hell out of their readers, I couldn’t get too in depth. Fortunately, I have no such “No Boredom” policy at the Natty.

In addition to encouraging yawns, I’d also like to stress that a lot of what I’m doing comes from fumbling my way through statistics and R. It’s a certainty that I will misuse terms, fail to understand the basics of statistical analysis, and include the wrong charts to support my arguments.  You’ve been warned.

Setup

How much should teams spend on starting pitching? Scherzer signed for 7 years/$210M, but because of how the money is being paid by the Nats (with $105M deferred from 2022-28) the deal actually comes out to around $191M in present-day value. In 2020 and 2021, the Nats will be paying him $35M or so. That’s pretty extreme. Will he be worth it? By 2020-21, there’s a 99.99% chance that $35 million will look every bit as awful as it looks right now, but by how much?
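The present-value math behind that $191M figure is a straightforward discounting exercise. I can't reproduce the exact number without the contract's actual payment schedule and discount rate, so the Python sketch below uses illustrative assumptions (an even $15M/yr paid 2015-2021 plus $15M/yr deferred to 2022-2028, discounted at 7%) just to show the mechanics:

```python
# Scherzer's deal, roughly: $210M nominal, half of it deferred.
# Payment schedule and 7% rate are illustrative assumptions, not the
# contract's actual terms.
paid_now = [(year, 15_000_000) for year in range(2015, 2022)]
deferred = [(year, 15_000_000) for year in range(2022, 2029)]

def present_value(payments, rate, base_year=2015):
    """Discount each (year, amount) payment back to base_year dollars."""
    return sum(amt / (1 + rate) ** (yr - base_year) for yr, amt in payments)

pv = present_value(paid_now + deferred, rate=0.07)
print(round(pv / 1e6, 1))  # well under the $210M sticker price
```

The point survives any reasonable rate: pushing half the money out to 2022-28 makes the deal worth meaningfully less than its nominal $210M.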

Others have looked into the expected return on Scherzer, notably SI’s Jay Jaffe, who addresses Scherzer’s expected break-even date (if any). Super. The Nats will be overpaying for an aging starter by ‘21, but how does that expensive starter affect the rest of the Nats’ plans for filling out the rotation? In short, what I’m attempting to identify is how much teams typically spend on their five-man rotation.

Is there a certain percentage of a team’s payroll that should be allocated to starting pitching? Where does it reach a point of diminishing returns? Philadelphia spent a whopping $71.5M on starting pitching in 2014 as per Spotrac, and for that bill they received the second lowest return by fWAR (7.6) in the Majors. Cliff Lee missed 110 games; A.J. Burnett realized he left his heart and control in Pittsburgh; and Cole Hamels pitched well, but not $22.5M well, and the team sort of stunk and finished last in the East. Relying on aging pitchers, no matter the bona fides, is a minefield of risk and reward. Caveat emptor, since I’m exhausting my basic Latin.

Numbers Are Fun

Using Spotrac, it’s a simple enough operation to determine how much teams were spending on starting pitching over the last few seasons. In 2013-14, the median cost for a starting rotation in the Major Leagues came in around $28-29M, with a few cheapskates like the Marlins ($3.1M and $3.8M) and the Indians ($5M in ‘14) and a few high rollers like the Giants ($70.3M and $64.2M) and the Dodgers ($74.2M and $77.4M). The median is a nice, comfortable neighborhood. For all of those big bucks spent, the median cost of a win came in at $2.3M and $2.6M over the last two years.

Spending more didn’t necessarily make a team better while outfitting a staff with young, affordable talent didn’t make them demonstrably worse. Jose Fernandez in 2013 is a great example. Teams that finished at .500 or above did pay more for their starting five over the two years with the median at $29.8M and$37.3M respectively with an a win costing roughly the same, between $2.3M and$3.2M. That didn’t necessarily shock me. I expected the better teams to have staffs with older, established pitchers.

The important point here is that these are the median values. Some teams paid much more to fill out their respective rotations, with an associated higher cost per win, while Cleveland paid less than $300K per win in 2014. Also, note that I’m discussing win shares and not digits in the standings.

None of this, though, really means anything. It’s a certainty that owners and team executives would love to replicate the Indians’ success from last year, pay around $5M collectively for five starters, and finish above .500. Then again, every fan would want their team to spend like the Nats, forget about trading either Jordan Zimmermann or Doug Fister, extend Stephen Strasburg, and make it rain Benjamins in D.C. Heck, while they’re at it, why not let Fister walk and sign David Price next offseason too?

It really doesn’t work like that.

To begin answering the question of where’s a good place to be, cost-wise, when planning a pitching staff, it might be worth looking at what actually keeps costs under control and why Jon Lester and Scherzer both got paid.

The Middle

It likely won’t come as any great surprise to you that talent isn’t evenly distributed across the 30 big league clubs. Each pitching staff doesn’t employ precisely one Corey Kluber, one Garrett Richards, one Anibal Sanchez, etc. While every team certainly wishes it had a roster of All Star to solid hurlers, the ability to do so is subject to identifying and coaching talented players, physical development, opportunity, and a little luck. Richards’ knee injury proves that sometimes bad luck derails even the best seasons. This also isn’t an exhaustive list. Nutrition, psychological hang ups, etc.

Following the same logic, since talent isn’t exactly even across MLB, production is also not equitably distributed. This point is obvious, but I’m making it. There’s a reason that every year we read Top 10 lists that rank the best pitching staffs and fans secretly envy the Dodgers for their pitching riches. Here’s a bar graph to illustrate the point regarding production.1

R has a nifty function, `cut`, to handle this sort of job, and taking all the data since 1985 for qualified starters, it’s easy enough to see how starting pitchers have fared. 50.4% of qualified starters produced a season worth between 0 and 3 bWAR (see the footnote for why I switched to bWAR for this discussion). 66.7%, or two-thirds, of all seasons fell between 0 and 4, and nearly 79% between 0 and 5. Baseball-Reference defines 5+ as an All Star quality season while 0, of course, is replacement level. So, 79% of all starters who actually qualified for the ERA title dating back nearly 30 years were replacement level to borderline All Star, with just over a third, or 37.6%, at 2 bWAR or below.
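I did the binning in R with `cut`; for anyone who’d rather follow along in Python, here’s a rough equivalent using `pandas.cut`. The bWAR values below are made up for illustration, not the actual qualified-starter data (with the real data, the 0-3 bin holds the 50.4% cited above).

```python
import pandas as pd

# Hypothetical bWAR values for ten qualified starter-seasons (illustrative only)
bwar = pd.Series([1.2, 2.8, 0.4, 5.1, 3.6, 7.2, 2.1, 4.4, 0.9, 6.0])

# Bin each season the same way as above: 0-3, 3-4, 4-5, and 5+
bins = pd.cut(bwar, bins=[0, 3, 4, 5, float("inf")],
              labels=["0-3", "3-4", "4-5", "5+"])

# Share of seasons landing in each bin, as percentages
pct = bins.value_counts(normalize=True).sort_index() * 100
print(pct)
```

Same idea as R’s `cut`: right-closed intervals, so a 3.0 season lands in the 0-3 bin.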

*Update

Inspired by a paper I’m reading on the economics of baseball between 1985-2002, I’ve decided to include a boxplot of all seasons by bWAR since 1985. From this graph you can see the interquartile range for all qualified starters in a given year along with the mean. It also shows a few extreme outliers…aka Dwight Gooden in 1985.

Interquartile Range of Qualified Starters by bWAR Since 1985

Want to know why a guy like Scherzer is sleeping atop C-notes? Seasons like his last two, worth 6.7 and 5.5 bWAR, have occurred just 14.2% of the time. That doesn’t even account for all the seasons of spot starts and breakdowns due to injury. If you’re betting a guy can stay healthy, like you do with all pitchers, you might as well try to hire the ones that give you a chance for something extraordinary.

By this logic, then, wouldn’t it be in a team’s best interest to sign the best free agent pitchers, budget be damned?

Presuming the team is owned by the Guggenheim Partners, this strategy might work if it weren’t for things like age and decline. The table below will help explain what teams might expect from their aging free agent starters.

| Age | Count | Mean | Median |
|-----|-------|-------|--------|
| 19 | 8 | 2.750 | 2.70 |
| 20 | 29 | 3.231 | 2.90 |
| 21 | 21 | 2.400 | 2.40 |
| 22 | 58 | 2.751 | 2.35 |
| 23 | 131 | 2.797 | 2.70 |
| 24 | 184 | 2.929 | 2.75 |
| 25 | 238 | 3.107 | 2.90 |
| 26 | 258 | 3.034 | 2.90 |
| 27 | 240 | 2.753 | 2.60 |
| 28 | 227 | 2.925 | 2.70 |
| 29 | 192 | 2.919 | 2.80 |
| 30 | 183 | 2.765 | 2.50 |
| 31 | 179 | 2.825 | 2.40 |
| 32 | 134 | 2.682 | 2.45 |
| 33 | 114 | 2.583 | 2.50 |
| 34 | 94 | 2.767 | 2.70 |
| 35 | 71 | 2.698 | 2.40 |
| 36 | 52 | 2.842 | 2.85 |
| 37 | 36 | 2.988 | 2.55 |
| 38 | 24 | 3.395 | 3.30 |
| 39 | 20 | 3.155 | 2.85 |
| 40 | 22 | 3.618 | 3.50 |
| 41 | 16 | 2.481 | 2.50 |
| 42 | 8 | 3.050 | 2.15 |
| 43 | 4 | 1.675 | 1.50 |
| 44 | 5 | 2.480 | 2.40 |
| 45 | 3 | 1.866 | 2.00 |
| 46 | 3 | 0.433 | 0.30 |
| 47 | 1 | 0.900 | 0.90 |

Mean & Median bWAR by Age

This list includes both mean and median just to show I’m not hiding anything. The median is nice because it dismisses the outliers, and we can look into the data with a general idea of how the middle succumbs to age.
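A table like this is one group-and-summarize away. I built mine in R with dplyr; here’s the same idea sketched in Python with pandas, using made-up (age, bWAR) pairs rather than the real starter data.

```python
import pandas as pd

# Hypothetical (age, bWAR) pairs for qualified starter-seasons (illustrative only)
df = pd.DataFrame({
    "age":  [25, 25, 26, 30, 30, 30],
    "bwar": [3.1, 2.7, 4.0, 2.5, 1.9, 6.2],
})

# Count, mean, and median bWAR at each age, mirroring the table above
summary = df.groupby("age")["bwar"].agg(["count", "mean", "median"])
print(summary)
```

The median column is the one doing the heavy lifting here, since it shrugs off a single monster season the way the mean can’t.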

Regardless of which figure you flavor the argument with, or if you believe in this sort of thing at all, both mean and median show that production typically peaks around ages 25-26, stays strong through age 29, then begins to decline. Want to know at what age most free agent pitchers will be signing that lucrative contract?

Since 1985, the average age for a pitcher starting 60% of his appearances is 23.7 years old. That doesn’t even include an innings limit. It doesn’t matter, though, since even with an innings limit the age jumps only to 23.9. Figure on six to seven years of team control (seven, the way teams manipulate service time now), and a pitcher will probably hit free agency as he turns 30 or 31, just as he enters those decline years.

What about the extremely productive starters, the ones that produce at an All Star or greater level? How is their production affected by age? I’m glad that you asked. Here’s another table for all pitchers who’ve produced at least one season of greater than 5 bWAR since 1985.

| Age | Count | Mean | Median |
|-----|-------|-------|--------|
| 20 | 4 | 5.650 | 4.60 |
| 21 | 15 | 2.773 | 3.40 |
| 22 | 29 | 3.424 | 3.70 |
| 23 | 53 | 3.994 | 3.80 |
| 24 | 80 | 3.930 | 4.00 |
| 25 | 106 | 4.240 | 3.95 |
| 26 | 97 | 4.280 | 4.10 |
| 27 | 95 | 4.004 | 4.10 |
| 28 | 93 | 4.246 | 4.30 |
| 29 | 88 | 3.832 | 3.90 |
| 30 | 84 | 3.806 | 3.90 |
| 31 | 91 | 3.788 | 3.40 |
| 32 | 69 | 3.657 | 3.40 |
| 33 | 57 | 3.318 | 3.00 |
| 34 | 52 | 3.723 | 3.15 |
| 35 | 44 | 3.634 | 3.50 |
| 36 | 36 | 3.256 | 3.15 |
| 37 | 29 | 3.314 | 2.90 |
| 38 | 22 | 3.623 | 3.65 |
| 39 | 18 | 3.300 | 3.05 |
| 40 | 18 | 4.028 | 4.20 |
| 41 | 14 | 2.779 | 2.55 |
| 42 | 7 | 3.229 | 2.40 |
| 43 | 4 | 1.675 | 1.50 |
| 44 | 4 | 2.500 | 2.15 |
| 45 | 2 | 2.400 | 2.40 |
| 46 | 2 | -0.200 | -0.20 |

Mean & Median bWAR by Age for Starters With a 5+ bWAR Season

This table shows their production remains much higher overall, but the decline still kicks in, though not until age 31.

Filling out a rotation full of big money free agents, no matter how talented, is great in theory, but as you’ve probably heard before, you’re paying for what that player has done in the past. Going forward, it’s not worth it. If a team can get similar, if slightly reduced, value from a young pitcher at a cheaper cost, it just makes sense.

The Verdict

Just from these two examples, we can see that it is extremely difficult to hire every mega superstar pitcher due to scarcity, and it’s foolish to try to do so since you’re paying for a pitcher’s best early years after the fact. Maybe as a team executive you’re hoping for the next Roger Clemens, but what happens if you end up with Ubaldo Jimenez or Ben Sheets? Take a look at Bret Saberhagen. Once he hit 31, he hurt his shoulder and was never the same again. He was an amazing pitcher who got bitten by injury.

Like I said earlier, all pitchers are injury risks, so my point isn’t Saberhagen was injured because he pitched into his 30s. The point is that 30 years of history has shown that aging pitchers (aging players for that matter) will decline.

I’ll come back later to discuss the cost of pitching staffs since 1985, percentage of team payroll, and production.
Max Scherzer photo credit: Keith Allison via photopin cc

1. I’m switching gears here and using Baseball-Reference’s version of WAR only because the gathering of that data for next point—starting pitching by age—was easier from their site and I wanted to remain consistent. Also, the use of 1985 is rather arbitrary but it’s the earliest date in Sean Lahman’s database for salary information, so I’m sticking with it for the remainder of this exercise.

Jan 14

## Billy Beane, the A’s, and Knowing Things

Josh Donaldson (20) was traded to the Toronto Blue Jays in November of 2014. He was pretty good.

Maybe it shouldn’t come as any great surprise that Oakland GM Billy Beane and his staff of brainiacs is reshaping the A’s roster rather than developing. It is. A little. It shouldn’t be, though, if we were paying attention. Since Beane took over as the Oakland GM after the ’97 season, the A’s have climbed from a 65-97 team to perennial contender, finishing .500 or above 12 times and winning the West six times.

That doesn’t make the A’s super special. The Angels have also won the West six times in that same span, but while the Angels haven’t had a payroll below 100 million since 2005 (and averaging 150.8 million since 2012) the A’s have done it by spending on average 56.6 million over the last 15-years. This isn’t about economics. Don’t worry. I’m not going to argue Beane should win a Nobel for achieving similar results while spending half of what the Angels did over the same 15-year period.

No. The A’s tailspin last year was too fresh. Once leading the West by as many as six games as of June 19th, the A’s hit rock bottom and ended up 10-games back, barely eking out the second wild card over rival Seattle. The A’s had the stink of a loser amongst a crowd that likes plucky upstarts and good stories. It was only fitting the Royals came from behind in the wild card game. The A’s were losers.

That just proved it.

Trading Josh Donaldson seemed extreme. Even now, seeing what I’ve seen, I don’t really like the move all that much because Donaldson has been so good since becoming a full-time player in mid-August of ’12. I don’t care that he’s 29. Entering his first year of arbitration eligibility (as a Super 2), he’ll still essentially be playing for free. Hey, the A’s got Brett Lawrie and Sean Nolin. Whatever.

I’ll just move along.

Then Beane traded Brandon Moss to Cleveland, Jeff Samardzija to the White Sox, and Derek Norris to the Padres. Yoenis Cespedes had already been jettisoned last season to bring in Jon Lester, who has since signed with the Cubs. Jed Lowrie signed with the Astros, so the starting shortstop was gone too, and the A’s still intended to play Eric Sogard.

It’s all good, though. Beane traded for Ike Davis.

So as the baseball world gushes over that trickster Beane for trading for Ben Zobrist and Yunel Escobar and restocking his farm system with live, young arms, it struck me completely by accident that it shouldn’t have seemed tricky at all. The A’s were never so bad that they needed to be dismantled.

Run Differential – Again

As I’ve spent the past few days revisiting the year of the original Jurassic Park and The Sandlot I came across something that was sort of surprising: based on a run differential of +157, the A’s were expected to win 99 games last year. I’m sure it’s been mentioned elsewhere, and I just never caught it. There’s a lot to read. I consume information anymore yet rarely retain it. The modern curse.

99 games is one more than the Angels won last year. To be 11 games short of expectation is extreme. The A’s were damn unlucky.

In fact, as I’ve been waxing pathetic about the 1993 Mets for underachieving based upon expectations (here and here), I failed to notice that the 2014 A’s had the 15th worst residual (difference between actual winning percentage and expected winning percentage) mark since the beginning of the 20th century. Out of 2400 different teams, the A’s were the 15th worst, historically snake bitten.

I re-ran the numbers, determining a new exponent to use in the Pythagorean expectation formula based on all data since 1962. I used 1962 as my delineation point since that was the first year both leagues played a full 162-game schedule. My new exponent dropped to 1.856, making my formula look like this:

$\frac{\text{runs scored}^{1.856}}{\text{runs scored}^{1.856} + \text{runs allowed}^{1.856}}$
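As a sanity check, the formula is easy to compute directly. Here’s a minimal Python sketch (my analysis was done in R), fed the 2014 A’s totals of 729 runs scored and 572 allowed:

```python
def pythag_wpct(rs: float, ra: float, k: float = 1.856) -> float:
    """Pythagorean expected winning percentage with the fitted exponent."""
    return rs**k / (rs**k + ra**k)

# 2014 A's: 729 runs scored, 572 runs allowed, over a 162-game season
expected_wins = round(162 * pythag_wpct(729, 572))
print(expected_wins)  # 99
```

That 99 matches the A’s expectation discussed below.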

The new formula didn’t change the A’s win expectations very much, but excluding all the early 20th century teams from my list bumped the A’s all the way to 8th in the unlucky list. For fun, I’m including the bottom ten (the Bin Ten?) that failed to meet expectations:

| Year | Team | Run Diff | W-L | Expected W-L | Residual |
|------|------|----------|--------|--------------|----------|
| 1993 | Mets | -72 | 59-103 | 73-89 | -0.0887 |
| 1986 | Pirates | -37 | 64-98 | 77-85 | -0.0798 |
| 1984 | Pirates | 48 | 75-87 | 87-75 | -0.0747 |
| 1967 | Orioles | 62 | 76-85 | 88-73 | -0.0740 |
| 1975 | Astros | -47 | 64-97 | 76-86 | -0.0708 |
| 1999 | Royals | -65 | 64-97 | 75-86 | -0.0686 |
| 2006 | Indians | 88 | 78-84 | 89-73 | -0.0678 |
| 2014 | A’s | 157 | 88-74 | 99-63 | -0.0675 |
| 1972 | Orioles | 89 | 80-74 | 90-64 | -0.0669 |
| 1993 | Padres | -93 | 61-101 | 71-91 | -0.0642 |

Great Expectations – Fail

That’s a fairly startling list. The 2014 A’s outscored their opponents by 157 runs, and that’s by far and away the most until way down in my list I hit the 1990 Yankees in the 48th spot, a team that won 91 games with a run diff of 162. Okay, so wow. The A’s ran into some bad mojo last year and things went sideways.

There’s an argument to be made here that Beane and the A’s acted hastily, but to call the A’s unlucky and end it at that isn’t much. Even if it’s kind of historic in its magnitude doesn’t mean all that much. I guess I worry about this nifty little experiment of mine defining an argument rather than using it for support. It’s like that saying about when your only tool is a hammer everything looks like a nail.

My first thought was close games. You can’t outscore teams by that many runs and win all the close games too. It doesn’t really work like that. Going over the game logs at retrosheet.org, the A’s were tied with Toronto as the fourth worst team in the Majors last season by winning percentage in one-run games, going 21-28 (.429). That’s bad, sure, but not epically bad. Seattle was 18-27 (.400) and finished one game back of the A’s.

In the larger context of things, the A’s 21-28 mark is fairly tame as far as losing one-run games is concerned. Dating back to 2000 (I stopped my search there since it was the first Beane-administered team to finish first), the A’s WP of .429 sits 83rd worst out of 450. Okay, bottom 20%, but not a real explanation.

That’s all teams, however. The A’s outscored their opponents by a lot. Certainly, they’re unique in that regard.

Eh. For all teams that have outscored their opponents by 100+ runs in a season—teams that are all good teams with pennant aspirations—the 2014 A’s were eighth worst out of 84. Bad, sure, but Atlanta won 101 games in 2003 and owned a worse winning percentage (.405) in one-run games. Since 2000, Oakland was the only team to outscore its opponents by 150 or more runs and fail to win 90. The 2003 Houston Astros were the next closest to the feat, outscoring their opponents by 128 runs and winning only 87 games. That team was expected to win 94 games by run differential and finished a measly one game back of the Cubs in the Central.

In all of the aughts, there are only six out of 84 teams (or 7%) to win fewer than 90 games while outscoring their opponents by 100 or more runs. It’s an incredibly difficult thing to do. It’s certainly not due exclusively to one-run games.

You also don’t remake a team based on something like that. Records in one-run games are essentially meaningless in determining quality of a ball club. A 15-14 win looks the same as a 1-0 when collating data. For instance, between 2005 and 2009, the Arizona Diamondbacks never once failed to win more one-run games than they lost. In those five seasons, they were an incredible 141-106 in close games (a .571 winning percentage) while having an overall record of 395-415 (.488).

In 2013, the A’s went 30-20 in one-run games, and since 2000 the team has nearly an identical winning percentage in one-run games, .540, as they do in winning percentage overall, .545. Sometimes things happen.

Widening the criteria a bit, I looked at the W-L record where the score was decided by one or two runs. The A’s were 10-13 in games decided by two runs, bringing their overall total in such games to 31-41 (.431). That winning percentage was the second lowest for teams outscoring their opponents by over 100 runs. Only the 2012 Cardinals had a worse WP (.394), and that team lost to the World Champion Giants in the NLCS. If you’re wondering whether it makes much difference if I just look at winning clubs, not necessarily large run differential clubs, the answer is no. For all teams that finished at .500 or above since 2000, the A’s were the fourth worst team in terms of WP in games decided by one or two runs.

Expanding this out further, games decided by three or fewer runs put the A’s at the top of the list. They went 14-13 in games decided by exactly three runs, making their total record in games decided by three or fewer 45-54, for a WP of .455. It’s not unprecedented to do poorly in games decided by three or fewer runs and still win in droves. It is uncommon, however, to see a team outscore its opponents by so many runs and have a losing record in them. The 2007 Yankees did it. They managed to outscore their opponents by 191 runs yet go 36-42 in games decided by three or fewer.

Another thing you typically don’t see is a team winning 90+ games while having a losing record in games decided by three or fewer runs. It does happen, as recently as 2013 when the Dodgers won 92 games while going 7-13 under these conditions, but they also went 25-21 in one-run games and 26-15 in two-run games. Playing sub-.500 ball in both one- and two-run games has happened just 15 times for teams winning at least half their games. Only three of those teams—the 2007 Yankees, the 2001 Cardinals, and the 2012 Rays—won 90 or more games, and only the 2008 Indians (an 81-81 team) had a losing record in games decided by exactly three runs. The A’s WP of .519 in three-run games was the second worst under these conditions.

There are a few ways to look at this, and we can debate whether the A’s were unlucky or just not good enough. I simply found the exercise interesting. I ran the numbers through R, identifying a fairly tenuous relationship between overall winning percentage and winning percentage in games decided by three or fewer runs. The correlation coefficient was .55, but that was against teams that sat at .500 or better. For all teams since 2000, champions and dregs alike, the coefficient jumped to .684. Still not terribly exciting. There is a stronger relationship, though, between total wins and WP in games decided by three or fewer: .848. Ideally we’d want that number as close to 1 as possible, but that’s what I’m working with.

Well, here we go. After running a linear regression to predict wins based upon the A’s actual winning percentage in games decided by three or fewer runs, the A’s were . . . surprise, a 74-win team last season.
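The regression itself is nothing fancy: fit wins against close-game winning percentage, then plug in the A’s .455. A quick Python sketch with numpy; the six (close-game WP, wins) pairs are invented for illustration, not the real league data I ran in R.

```python
import numpy as np

# Hypothetical league data: WP in games decided by <= 3 runs vs. season wins
close_wp = np.array([0.40, 0.45, 0.50, 0.52, 0.55, 0.60])
wins = np.array([68, 74, 81, 84, 88, 95])

# Ordinary least squares line: wins = slope * close_wp + intercept
slope, intercept = np.polyfit(close_wp, wins, 1)

# Predict wins for a team playing .455 ball in close games, like the 2014 A's
pred = slope * 0.455 + intercept
print(round(pred))
```

With the actual 2000-2014 data, the same plug-in lands the A’s at 74 wins.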

Just for fun, here’s a table of what last year’s teams actually won, were expected to win based upon run differential, and would have won based on their record in games decided by three or fewer runs:

| Team | R | RA | Wins | Exp Wins (Run Diff) | Pred Wins (≤3-Run Games) |
|------|-----|-----|------|---------------------|--------------------------|
| ARI | 615 | 742 | 64 | 67 | 70 |
| ATL | 573 | 597 | 79 | 78 | 80 |
| BAL | 705 | 593 | 96 | 94 | 90 |
| BOS | 634 | 715 | 71 | 72 | 76 |
| CHA | 660 | 758 | 73 | 71 | 75 |
| CHN | 614 | 707 | 73 | 70 | 76 |
| CIN | 595 | 612 | 76 | 79 | 75 |
| CLE | 669 | 653 | 85 | 83 | 88 |
| COL | 755 | 818 | 66 | 75 | 61 |
| DET | 757 | 705 | 90 | 86 | 91 |
| HOU | 629 | 723 | 70 | 71 | 70 |
| KCA | 651 | 624 | 89 | 84 | 94 |
| LAA | 773 | 630 | 98 | 96 | 92 |
| LAN | 718 | 617 | 94 | 92 | 88 |
| MIA | 645 | 674 | 77 | 78 | 80 |
| MIL | 650 | 657 | 82 | 80 | 80 |
| MIN | 715 | 777 | 70 | 75 | 76 |
| NYA | 633 | 664 | 84 | 77 | 90 |
| NYN | 629 | 618 | 79 | 82 | 75 |
| OAK | 729 | 572 | 88 | 99 | 74 |
| PHI | 619 | 687 | 73 | 73 | 73 |
| PIT | 682 | 631 | 88 | 87 | 86 |
| SDN | 535 | 577 | 77 | 75 | 83 |
| SEA | 634 | 554 | 87 | 91 | 82 |
| SFN | 665 | 614 | 88 | 87 | 91 |
| SLN | 619 | 603 | 90 | 83 | 94 |
| TBA | 612 | 625 | 77 | 79 | 72 |
| TEX | 637 | 773 | 67 | 67 | 75 |
| TOR | 723 | 686 | 83 | 85 | 82 |
| WAS | 686 | 555 | 96 | 97 | 90 |

Wins and Expected Wins

Most of these are relatively close except for Oakland. Based upon their record in games decided by three runs or fewer, they should have won 14 fewer games, far and away the worst mark amongst the 30 teams.

Here’s a fancy scatterplot to say thanks for reading:

I’m not here to explain how things went wrong or why they went wrong or when they did. For all I know, things went exactly as Beane intended them to go because he secretly coveted a corner infield of Lawrie and Davis, which sounds every bit like the traveling vaudeville show in White Christmas.

My whole point is that we shouldn’t be surprised by Beane and the remodel. This team was good enough to win 100 games and then it happened.

Josh Donaldson – photo credit: Keith Allison via photopin cc

Jan 12

## Pythagoras, Bill James, and the Mets

Yesterday I used a simple linear regression to determine expected wins based on run differential (runs scored minus runs allowed). What about Bill James’ Pythagorean expectation? So, just to be thorough (sort of), I went ahead and looked at the difference between what the ’93 Mets should have won based on James’ formula and the 59 games the team actually won.

Oy vey.

By Pythagorean expectation, the ’93 team fell from the 36th worst team in Major League history1 to the 12th worst, even accounting for all the number wonkiness from the 19th century clubs. The good news is that two teams from the 20th century were worse in this regard: the 1905 Chicago Cubs squad that won 92 games (seventh worst) and the 1911 Pittsburgh Pirates squad that won 85. Of course, those are the Cubs that featured the famed trio of Tinker, Evers, and Chance along with Mordecai Brown. The Pirates featured Honus Wagner, Max Carey, and Fred Clarke.

Either by linear regression or Pythagorean expectation, the ’93 team should have won 73 games. The residual (recall that the residual is the error between the expected win total based on run differential and what actually occurred) grew a little worse, however, dropping from -0.0849 to -0.0851.

Here’s an updated scatterplot to be thorough:

I even took this a step further and determined a new exponent to use in James’ formula that would more closely align with actual winning percentages over the years2. For this pass, I eliminated all the teams prior to 1900 since there were fewer games, too much data needed to be cleaned, and honestly, I figured 19th century teams wouldn’t be of much value. Anyway, after running a linear regression to determine the new exponent, I came up with 1.861.

So, my new formula looks like this:

$\frac{\text{runs scored}^{1.861}}{\text{runs scored}^{1.861} + \text{runs allowed}^{1.861}}$
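The exponent fit works by taking logs: if $\frac{W}{L} = \left(\frac{RS}{RA}\right)^k$, then $\log(W/L) = k \cdot \log(RS/RA)$, and k falls out of a regression through the origin. The book does this in R; here’s a Python sketch on simulated team-seasons (synthetic data, not the real 1900-2013 set) showing the method recovering a known exponent:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 600
rs = rng.uniform(550, 900, n)  # simulated runs scored per team-season
ra = rng.uniform(550, 900, n)  # simulated runs allowed
TRUE_K = 1.861                 # exponent we hope to recover

# Winning percentages from the Pythagorean model, plus a little noise
wpct = rs**TRUE_K / (rs**TRUE_K + ra**TRUE_K)
wpct = np.clip(wpct + rng.normal(0, 0.015, n), 0.05, 0.95)
wins = wpct * 162
losses = 162 - wins

# log(W/L) = k * log(RS/RA): regress through the origin to estimate k
x = np.log(rs / ra)
y = np.log(wins / losses)
k_hat = (x @ y) / (x @ x)
print(k_hat)
```

Run against the actual post-1900 records, this is how the 1.861 above falls out.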

How did the ’93 Mets do? Well, since this is a post about the 1993 Mets and I’m all about schadenfreude, the 1993 team bottomed out.  They were, based on the difference between what they should have won and what they actually did, the worst team since the turn of the 20th century.

And . . . the scatterplot:

Another way of looking at this information is that the ’93 Mets were the unluckiest team of the last 114 years. Sure. We’ll call that a silver lining. I like to think the unluckiest ones were those of us who rooted for this team back then. I do miss Howard Johnson, though. I loved Ho-Jo when I was a kid.

Being the inquisitive sort, you’re probably wondering which teams exceeded their expected win totals the most over the last 114 years. Since you asked, here’s a top ten list of the teams that most exceeded expectations (expected wins are rounded up):

| Year | Team | Wins | Expected Wins | Residual |
|------|------|------|---------------|----------|
| 1905 | Detroit Tigers | 79 | 65 | 0.091 |
| 1981 | Cincinnati Reds | 66 | 57 | 0.086 |
| 2004 | NY Yankees | 101 | 89 | 0.075 |
| 1954 | Brooklyn Dodgers | 92 | 81 | 0.074 |
| 2008 | LA Angels | 100 | 88 | 0.074 |
| 1972 | NY Mets | 83 | 71 | 0.074 |
| 1984 | NY Mets | 90 | 78 | 0.072 |
| 1981 | Baltimore Orioles | 59 | 52 | 0.071 |
| 2005 | Arizona Diamondbacks | 77 | 66 | 0.070 |
| 1917 | St. Louis Cardinals | 82 | 71 | 0.069 |

Great Results – Not So Great Expectations

Two Mets teams made this list. Redemption! See, book learnin’ is fun.

1. Technically, we’re discussing professional baseball since the years begin in 1871 and the National League we all know and love wasn’t founded until 1876, but I’ll use Majors here for simplicity’s sake.
2. All of this can be found in the book Analyzing Baseball Data with R by the way, so it’s not like I’m some math wizard breaking new ground.

Jan 11

## In Retrospect, 1993 Mets Still Awful

As if I need further reminders that my high school years were miserable, I discovered today—completely by accident—that the 1993 Mets were the worst team of the last 84+ years (basically since the Babe Ruth- and Lou Gehrig-led 1931 Yankees) in actual wins versus expected wins based upon run differential.

Using a linear regression (trying things out from the book Analyzing Baseball Data with R), the 1993 Mets were expected to win 73 games with a run differential of -72. The Mets, in all of their absolute horridness, managed to win just 59. The residual of -0.089 (the residual here is the error between the expected win total based on run differential and what actually occurred) is the lowest in the Majors since 1931. The ’93 squad is technically the 36th worst team in this regard, but every team “above” them on the list played baseball in the 19th century except for a 1931 Yankees team that scored 1,067 runs, the sixth most in the history of baseball. Ruth and Gehrig each hit 46 home runs that year. Oh, and the ’31 Yankees won 94 games but were expected to win 108, which amazingly enough is only one game better than the Philadelphia Athletics actually finished.1
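The linear-regression version boils down to fitting wins against run differential and evaluating the line at -72. A Python sketch with numpy; the six (run differential, wins) pairs below are invented, chosen to echo the familiar ten-runs-per-win rule of thumb rather than the actual historical data I ran through R.

```python
import numpy as np

# Hypothetical (run differential, wins) pairs (illustrative only)
run_diff = np.array([-120, -60, 0, 50, 100, 160])
wins = np.array([69, 75, 81, 86, 91, 97])

# Fit wins = slope * run_diff + intercept by least squares
slope, intercept = np.polyfit(run_diff, wins, 1)

# Evaluate the line at the '93 Mets' run differential of -72
pred = slope * (-72) + intercept
print(round(pred))
```

With the full historical data, the same plug-in yields the 73 expected wins cited above.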

Here’s the scatterplot of the seasons since 1962. It looks the same if you go back further, so trust me on the actual “worst” part:

All of those orange circles are the Mets teams throughout the years, and the one at the very bottom represents . . . well, you get the idea.

It was the 1992 team that Bob Klapisch and John Harper wrote about in The Worst Team Money Could Buy, which was published in 1993. Let’s just say that 1993 wasn’t the best year to be a New York Mets fan.

On the bright side, it was Bobby Bonilla’s best year as a Met. It was also the year Vince Coleman threw a firecracker the equivalent of a quarter stick of dynamite out of a parked car at Dodger Stadium, injuring three people, including an 11-year-old boy and a 1-year-old girl. You can read more about that team over at Amazin’ Avenue if you want.

Sigh.

At least I can bury the pain of that season with memories of being an awkward teen.

1. Based on Bill James’ Pythagorean expectation, which is something different, the ’31 Yanks should have won “only” 100 games.

Jan 07

## Sigh. Mike Piazza Missed Again.

Mike Piazza and former manager Tommy Lasorda.

Statistically speaking, when you get all advanced metricky, Mike Piazza’s best seasons were his youthful ones spent with the Dodgers. He produced the best season ever by bWAR for a catcher (narrowly passing Johnny Bench, 8.7 to 8.6), and he owns two of the top 13. Both of those came in Los Angeles, in 1997 and 1993 respectively, as he redefined the catcher from broken-kneed game manager to offensive force. He’s one of only five men to hit 40 or more homers playing primarily as a backstop (Piazza and Bench both did it twice), and he owns three of the top 10, five of the top 12, and nine of the top 27 seasons (think about him owning a third of those) ever for homers by a catcher. Not all of those were with the Dodgers. His best three seasons by bWAR, and part of a fourth, were, though.

I’m not arguing he was better then. I’m not arguing much of anything. I’m just stating a fact. Clearing my throat.

Looking back over the numbers from the years now called the Steroid Era, or the Selig Era as Joe Posnanski does, is similar to scanning the statistics after a season of MVP Baseball set to Rookie. Those 150 stolen bases and 40 straight starts of 15+ strikeouts sure seemed like an accomplishment at the time—you won a trophy after all—but in retrospect, maybe six players breaking Maris’ home run record was a bit much. Well, maybe not. Maybe it jumped the shark when you did it by August.

In that regard, leading into a discussion of Mike Piazza’s best years by presenting his offensive bona fides is akin to offering a savvy sommelier a watered-down bottle of Sine Qua Non. Things sure look okay at first glance, but when you dig a little deeper, it’s difficult to swallow and keep down. When Bench led the Majors in homers in 1970 with 45, six players topped 40 that year.1 When Piazza hit 40 home runs in 1997, he was tied for eighth with four others, and 12 men hit 40+. Mark McGwire and Ken Griffey, Jr. each hit over 50, with Larry Walker at 49. What the hell do I make of those numbers? They’re so cartoonish it’s impossible to use them as the foundation of an argument. But I’m not arguing anything. I’m just chatting with friends.

In 1972, Bench was the only player to reach 40 home runs. 1999 was the year after the famed home run record chase between McGwire and Sammy Sosa, after McGwire hit 70 home runs, but it still saw McGwire and Sosa top 60 (65 and 63 respectively) and 13 men hit 40 or more with another seven within three of the goal.

At the time I was amazed. Now I’m just amazed that an entire decade of my baseball watching life is a statistical anomaly.

All of this is important only in the sense that yesterday the Hall of Fame voting came and went and Piazza missed the cut. Again. He was close. His overall percentage rose from 62.2 last year to 69.9. He missed election by a scant 28 votes, and if history tells us anything it’s that he’ll likely be elected very soon.

I don’t know how I feel about any of that. I guess I’m upset. Technically Piazza’s best years were his earliest with the Dodgers, but the ones I remember most fondly are the years spent with the Mets. Mike Piazza helped the Mets make the playoffs for the first time since 1988 and the World Series for the first time since 1986. The 1990s were always something of dark time in Flushing for the Mets, and Piazza was part of a team that actually made baseball fun to watch again. No team could every supplant that ’86 squad as my all-time favorite team, but that ’99 team ranks right up there as one of my favorites. I could watch highlights of Turk Wendell and John Olerud all day if my wife let me.

So, I lead in with numbers because a paragraph of raindrop gifs for tears just wouldn’t send the right message.

I imagine a lot of voters are like me and aren’t quite sure what to make of the years when everyone grew to unimaginable proportions with numbers to match. Still, though, after all these years, I wonder if Roger Clemens throwing the bat at Piazza was the first indicator of ’roid rage or his undeniable douchebaggery:

You’re right. Douchebaggery of course.

I remember in 2001 watching the Mets play the Orioles at Camden Yards. At the time, I couldn’t believe how large both Piazza and Benny Agbayani were. Two of the largest human beings I’d ever seen in person. I felt awful for Josh Towers, the O’s starting pitcher. In comparison, he looked tiny, like a high school freshman. Piazza crushed a homer to center in that game, giving me something to cheer about. I left Camden that night thinking two things: Steve Trachsel really isn’t that awful, and Selig needs to make these fields bigger because the players are outgrowing the dimensions.

Like my thoughts on the matter, this post isn’t entirely coherent. I apologize. I feel a little like Stephen Dedalus from Ulysses. Stream of consciousness.

In my mind, Piazza easily passes the duck test. If he looks like a HOFer, then he’s probably a HOFer. I don’t even consider Piazza’s candidacy open for any serious debate. He was amazing and deserves to be in. That he’s now missed election three years running is so absurd that it defies all logic, except that the entire era is absurd. Luis Gonzalez, a man who weighed about 200 pounds, jacked 57 home runs in 2001, which is three more than Mickey Mantle’s best season and three shy of Babe Ruth’s. So, yeah, it’s difficult to be that outraged over the HOF results when nothing at all makes sense.

I guess I’m not upset. I’m disappointed. Others aren’t going to see Piazza how I do, and that’s fine. I look at the numbers, and I see memories rather than columns and rows. I’m not disappointed with the voters because they did what they did and probably had a good reason for it…unless they voted for Aaron Boone to get in.

I’m disappointed because I can’t summon any level of anger or indignation. Not even an inkling. A part of my baseball watching life has to be discussed almost apologetically, with a sigh and a sad shake of my head, and when an oversight like Piazza missing the cutoff happens again I just shrug and say, “Sure, I understand.”

Whatever. I guess there’s always next year.

Mike Piazza photo credit: iccsports via photopin cc

1. There were only 24 teams then, so take that into consideration as well. This isn’t a discussion of who was the greatest catcher of all-time, so I’m not going to dive too deeply into these numbers.