Caveat Emptor No More – The Mike Hampton Rule

In completely unshocking news, Mike Hampton’s hurt again. Now, he’s saying it’s just a minor groin strain and he won’t miss his next start, but you have to think the Braves are a little worried since Mike hasn’t thrown a pitch for them since 2005.

The fact that Hampton has been paid $29 million the last two years not to pitch got me thinking about all the contracts that teams wish would just go away. Let’s say MLB instituted its own version of the Allan Houston Rule (let’s call it the Mike Hampton Rule), but instead allowed each team to completely void one contract (and paid it out of the central fund to avoid any grievances). Who would each team choose to get rid of?

For this exercise, past performance doesn’t matter. It’s a completely unscientific attempt to pick the one player on each team who is most overpaid for his likely performance going forward. Notice I didn’t say projected performance because that would require me to look at something like PECOTA, and that would take a lot of the fun out of this.

We’ll break this down by division and cover them all over the next few days. All contract information is from Cot’s Baseball Contracts and Baseball Reference.

National League East

Atlanta Braves

Well, this is the last year of the aforementioned Hampton contract*, since I can’t imagine the Braves picking up the $20 million option for 2009. But they’re still on the hook for $15 million this year, although Colorado pays the $6 million buyout for 2009. And judging by the fact that he hasn’t pitched in two and a half years, it’s not like the Braves are going to miss his contribution. The only other possibility is Mark Kotsay, who’s in the last year of a contract that will pay $8 million this season. He’s going to be overpaid unless he bounces back both offensively and defensively, but his contract doesn’t really put up much of a contest against Hampton.

*How crazy is that Hampton contract, by the way? He never once pitched for Florida, but ended up costing them $23.5 million. And Colorado, who already put in $23 million for just two years of below-average pitching (98 and 78 ERA+), owes him the $6 million buyout and then $19 million of deferred money (plus interest) for his signing bonus.

Florida Marlins

Florida is a completely different ballgame. According to Cot’s, they have one player under contract beyond 2008*. And since Andrew Miller was a core piece of the Miguel Cabrera deal, I don’t see him going anywhere. So it comes down to the three highest paid players on the team: Kevin Gregg at $2.5 million, Luis Gonzalez at $2 million and Mark Hendrickson at $1.5 million. Yes, you read it correctly. The highest paid player on the team is making $2.5 million.

Gregg is a pretty good relief pitcher who was the Marlins’ closer last year. He should easily be worth the $2.5 million. Gonzalez is an aging OF who doesn’t seem to have a lot left in the field, but is still passable at the plate. For $2 million, you can’t complain too much. Mark Hendrickson, on the other hand, has had one good half season and gives up a lot of hits. In front of Florida’s putrid defense (although it almost has to be better this year… right?) I can’t see Hendrickson putting up an ERA much south of 6.00. So even though he was just recently signed as a free agent, he’d be the one I let go.

*Wow, you have to be thinking Loria is either milking the revenue sharing (well, duh) or looking to sell the team after the new stadium is built. I can’t believe there’s only one player with a multi-year deal on the team. And that was signed by the Tigers. I can only think he’s trying to hold down the team’s long-term liabilities so the balance sheet will look better when he sells the team. Or he’s just an immoral d-bag. One or the other.

New York Mets

There’s really no contract on the Mets that is too outlandish. Obviously Johan Santana’s monster deal carries some risk at the back end, and he’s one torn ligament away from making this list, but that can be said about almost any highly paid pitcher. Carlos Beltran is signed to a pretty reasonable deal for arguably one of the top players in the league. I’m not sure he’ll be worth $18.5 million a year in 2011, but it’s too early to forecast his decline. That leaves Carlos Delgado. He had a pretty bad year in 2007 (by his standards) and was definitely not worth the $14.5 million he was paid. He’s owed $16 million this year with an option for 2009 that will cost $4 million to buy out*. If I were the Mets, I don’t know if I’d take the chance he rebounds. His offense has dropped pretty precipitously over the last two years and he’s already gotten hurt this spring. However, it’s the Mets. They have money and right now they need warm bodies who can be propped up at the plate and hit. I’m not sure the Mets would take advantage of the Hampton rule, but if so, then Thin Carlos is their man.

* Delgado’s contract carries a $16 million option for 2009 that vests based on his rank in the MVP voting. Cot’s doesn’t have any details on where he needs to place, but Delgado finished 6th in 2005 and 12th in 2006. Does anyone know the status of that option?

Philadelphia Phillies

The Phillies’ Hampton Rule representative would likely differ based on who you asked. If you asked the Philly Phaithful, they’d probably say Pat Burrell. He’s finally in the last year of the 6 year / $50 million contract he signed back in 2003, and the fans have never thought he lived up to it.* Still, he’s only owed $14 million more and he’s delivered an OPS+ in the mid 120s for the last three years. Now, that’s no great shakes for an adventurous fielding corner outfielder, but there’s a far more egregious thief of Bill Giles’ money.

For some reason, Pat Gillick thought it would be a good idea to sign Adam Eaton to a 3 year deal that will pay him a total of $16.5 million over the next two seasons (including the $500 thousand buyout of the 2010 option). Now $8 million a year for a starter isn’t too much above market price, but you’d want that starter to at least be somewhere close to league average, right? About the only positive for Eaton is that he did make 30 starts last year, a number he hadn’t hit since 2004. The bad news is he gave up 117 runs in those 30 starts for a sparkling 6.29 ERA. It makes you wonder how the Phillies won the division last year running Eaton out there every 5 days.

* I know it’s cliche to talk about Phillies’ fans, but they really do seem to have trouble appreciating what they have. Pat Burrell is obviously nothing special, but he’s not worth the derision some fans seem to have for him. I mean I can understand it somewhat. I was in college in Philly when he came up and he was viewed as a potential superstar. He never reached those heights, despite being paid like he had. Of course, I wouldn’t mention how he’s a disappointment to Mets fans. Anyway, Burrell is just another link in a long chain that reaches back to Mike Schmidt. Schmidt was obviously never driven out of town, but Scott Rolen and Bobby Abreu were, if not pushed out the door, at least very strongly encouraged to leave. And I’d consider Rolen and Abreu to be the team’s two best players between Schmidt and Utley. Sometimes I just don’t understand people.

Washington Nationals

The Nationals are a team that doesn’t really have too many albatross contracts, especially considering they’re about to get a huge revenue jump from moving into a new stadium. Austin Kearns is somewhat overpaid at $13 million over the next two seasons (with a $10 million option for 2010 or a $1 million buyout). But he’s shown some potential in the past and he’s a toolsy outfielder – there’s no chance that Bowden will get rid of him*. The easy choice for the Nats is Cristian Guzman. Brought in as one of the first “big” signings of the Washington era, he couldn’t even hold his starting job beyond one season (to be fair he missed the entire 2006 season, but that was probably a good thing for Washington). In that one season, Guzman managed an OPS+ of 53. And lest you think he made up for it with defense, he was rated one of the worst shortstops in the majors. He’s in the last year of his deal now and only owed another $4.25 million, which really isn’t too much. But there shouldn’t be much expectation he repeats his performance from last year (which was actually pretty solid in a small sample size). And really, who else was there to choose from? Paul LoDuca and Dmitri Young** aren’t paid much more than Guzman and should deliver quite a bit more.

* Does any team have a more intriguing set of youngish toolsy outfielders than the Nats? Between Kearns, Lastings Milledge, Wily Mo Pena and Elijah Dukes, there’s a whole lot of unrealized potential there. The Dodgers with Matt Kemp and Andre Ethier have a lot of promise, but who knows if they’ll both play. The Diamondbacks with Chris Young and Justin Upton and the Rays with B.J. Upton, Carl Crawford and Rocco Baldelli are contenders as well.

** I just saw on Dmitri Young’s Baseball Reference page that his nickname is “Meat Hook”. I don’t really have anything to add, I just thought it was pretty cool.

Check back tomorrow or Monday for the NL Central.

How to link PITCHf/x to Retrosheet

Update 04/28/2008 – I uncovered a bug in the parser script that was causing the nightly update to fail for all dates after the 10th of the month. Grab the new ZIP file for the fix. Also, you’ll need to run the parser manually starting from April 10. If you need any help send a comment or an email.

The hot area of study in baseball today is detailed pitch analysis made possible by data from the PITCHf/x system. Analysts like John Walsh at the Hardball Times, Dan Fox from Baseball Prospectus and Mike Fast (among many others) are producing some amazing research on identifying pitch types, the consistency of release points and many other topics that were impossible to study before having the detailed PITCHf/x data. Mike Fast provides a running catalog of PITCHf/x studies at his FastBalls blog.

Background

Last year, Mike also provided his method for capturing the PITCHf/x data and storing it in a relational database. He details the steps needed to download the XML from MLB, to parse it, and to write it to a MySQL database. These instructions are a great way to get started in downloading and analyzing the data, but there were a few areas for improvement I saw – namely, the process as described requires you to manually run the programs, and there’s no easy way to tie the PITCHf/x data to the play-by-play data from Retrosheet.

Let me take this in reverse order. Tying the data to Retrosheet is important if you want to pull in any information that’s not captured in the PITCHf/x data. In my case, I’m interested in the pitcher/catcher relationship, and that’s not explicitly available from PITCHf/x. But creating the relationship to Retrosheet isn’t necessarily that easy. First off, PITCHf/x data can be downloaded nightly throughout the season, whereas Retrosheet releases a complete season during the winter. Because of the time lapse, you need to anticipate what the Retrosheet data will look like while parsing the PITCHf/x data.

Mike provides both a spider to download the PITCHf/x information and a parser to transform the data and store it in a database. Both are written in Perl and are based on the Baseball Hacks book by Joseph Adler. The spider does exactly what I want, so that’s unchanged. However, I did need to make changes to the parser. Since I’m not great in Perl, I started from scratch on a parser using the Python language. The parser code can be found here and here. Don’t worry about downloading them now, I’ll provide a ZIP file at the end of the post that contains the whole package of code. The parser takes the PITCHf/x data and builds a Retrosheet-like game id and event number. Storing this forced me to change Mike’s database structure as well. A SQL script for creating the new structure can be found here. As with Mike’s setup, this is a MySQL database and everything he talks about still applies.
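To make the idea of a Retrosheet-like game id concrete, here is a minimal sketch of how one could be built from a Gameday directory name. This is not the code from the ZIP file. It assumes directory names of the form gid_2007_10_01_sdnmlb_colmlb_1 (away team, then home team, then game number) and relies on the Retrosheet convention of home team code, date, and a final digit that is 0 for a single game and 1 or 2 for a doubleheader. The function name, the doubleheader flag and the team_codes override table are all hypothetical, added only for illustration.

```python
import re

# Assumed layout of a Gameday directory name:
# gid_<year>_<month>_<day>_<away><league>_<home><league>_<game number>
GID_PATTERN = re.compile(
    r"^gid_(\d{4})_(\d{2})_(\d{2})_([a-z]{3})[a-z]{3}_([a-z]{3})[a-z]{3}_(\d)$"
)


def retrosheet_game_id(gid, doubleheader=False, team_codes=None):
    """Build a Retrosheet-style id: home team code + YYYYMMDD + game number."""
    match = GID_PATTERN.match(gid)
    if match is None:
        raise ValueError("unrecognized gid: %r" % gid)
    year, month, day, _away, home, game = match.groups()
    # Hypothetical override table, in case a club's code differs between
    # the two sources; by default the Gameday code is simply upper-cased.
    home_code = (team_codes or {}).get(home, home.upper())
    # Retrosheet uses 0 for a single game and 1 or 2 within a doubleheader,
    # so the caller has to say whether the date was a doubleheader.
    game_number = game if doubleheader else "0"
    return home_code + year + month + day + game_number


print(retrosheet_game_id("gid_2007_10_01_sdnmlb_colmlb_1"))
```

The doubleheader flag is the part that requires anticipating what Retrosheet will publish: the directory name alone cannot tell you whether a game numbered 1 was the only game that day, so a real parser would have to look at the rest of that day's games.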

I found my parser to match up very, very well with the Retrosheet data from 2007. The only discrepancy I uncovered was that the PITCHf/x data is sometimes missing the very last at-bat of the game. If it were only a handful of times, I would shrug it off, but it appears to have happened over 400 times, or basically once every six games. I’ve confirmed that it’s not my code – the XML files really are missing the last at-bat. I can’t explain why it happens so frequently, but hopefully it’s something that will be resolved this season. I am aware of a few issues with my parser. Just like Mike, I don’t handle mid at-bat pitching changes well. I will also be missing the pitches for the partial plate appearance when a runner ends up making the third out on the bases. This is because I use the Retrosheet event number as part of the unique identifier for pitches, and when the plate appearance is partial there is no Retrosheet event number. I don’t think I’m missing anything else major, but please let me know if you find anything.

Software Needs

The setup as described by Mike requires Perl and MySQL. I’m adding Python (and some libraries) to that list. I’m not going to rewrite how to set up Perl or MySQL – Mike does a very good job of explaining what’s needed there. I will share how to get Python going though.

First, download and install the Python language interpreter. Next, download and install the EasyInstall package. This will make your life a whole lot easier going forward when you try to install other packages. We’re not really going to be using the full power of EasyInstall, but if you’re going to be doing more with Python you should really understand how it works. Finally, download the mysql-python file, which contains the code that allows you to connect to your database. You can download it in a variety of packaging formats. Personally, I’ve had good luck using the .egg format, but feel free to experiment with the others if you want. If you’ve downloaded the .egg format, go to where you’ve installed EasyInstall. For me, this was C:\Python25\Scripts. Run easy_install.exe, pointing it to where you downloaded the mysql-python .egg file. For example, easy_install.exe C:\Download\MySQL_python-1.2.2-py2.4-win32.egg. You’ve now got everything you need to run the new scripts, so let’s talk about how they work.

The Scripts

As I mentioned above, I use Mike’s spider software, so I’m not going to go into details about that. I will talk about the parser though. Open up a command window and navigate to where you downloaded the scripts. Type python pitchfxparser.py -h. This should give you some instructions on how you can use the script. Basically, you need to provide a location that represents the top-level directory for the PITCHf/x files – mine is C:\Baseball\pitchfx\games. You can also specify which dates to parse by adding arguments for year, month and day. If you don’t provide any date arguments it will only parse yesterday’s information.

Let’s look at a couple of examples. Say you wanted to parse the entire 2007 season. You would use the following command:

python pitchfxparser.py -l "C:\Baseball\pitchfx\games" -y 2007

If you wanted to only parse the games in October of 2007, you would use this command:

python pitchfxparser.py -l "C:\Baseball\pitchfx\games" -y 2007 -m 10

If you only wanted the games from October 1, 2007, use this command:

python pitchfxparser.py -l "C:\Baseball\pitchfx\games" -y 2007 -m 10 -d 1

And finally, if you only wanted the games from yesterday (whatever date yesterday turns out to be), just use this command:

python pitchfxparser.py -l "C:\Baseball\pitchfx\games"
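For anyone curious how options like these can turn into a list of dates, here is a minimal sketch of the rule described above: a year alone means the whole year, year plus month means that month, all three mean a single day, and no date arguments means yesterday. It is an illustration and not the code inside pitchfxparser.py. The function names are hypothetical, it uses the argparse module from current Python versions, and the real script may handle incomplete combinations (a month without a year, say) differently from the errors raised here.

```python
import argparse
import calendar
import datetime


def build_arg_parser():
    """Command-line options mirroring the -l, -y, -m and -d flags above."""
    parser = argparse.ArgumentParser(description="Sketch of the date options")
    parser.add_argument("-l", dest="location", required=True,
                        help="top-level directory of the PITCHf/x files")
    parser.add_argument("-y", dest="year", type=int)
    parser.add_argument("-m", dest="month", type=int)
    parser.add_argument("-d", dest="day", type=int)
    return parser


def dates_to_parse(year=None, month=None, day=None, today=None):
    """Return the list of dates selected by the given arguments."""
    today = today or datetime.date.today()
    if year is None and month is None and day is None:
        # No date arguments at all: only yesterday is parsed.
        return [today - datetime.timedelta(days=1)]
    if year is None or (day is not None and month is None):
        # Assumption of this sketch: incomplete combinations are rejected.
        raise ValueError("give -y, -y -m, or -y -m -d")
    months = [month] if month is not None else range(1, 13)
    dates = []
    for m in months:
        last_day = calendar.monthrange(year, m)[1]
        days = [day] if day is not None else range(1, last_day + 1)
        dates.extend(datetime.date(year, m, d) for d in days)
    return dates


args = build_arg_parser().parse_args(["-l", "games", "-y", "2007", "-m", "10"])
print(len(dates_to_parse(args.year, args.month, args.day)))
```

Passing today in as a parameter is only there to make the yesterday case easy to check without waiting for a particular date.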

Getting a Nightly Update

The last piece of the puzzle is setting the scripts to automatically run every night. I’m going to provide the instructions on how to do this for Windows. For those of you running Linux (or if there are any BSD or OpenSolaris users out there), you’ll want to look into cron jobs.

The first thing you need to do is create a Windows batch (.bat) file that will run both scripts in order. I’ve already written one, but it’s a very easy thing to do. The really important thing is to make sure you have your directories identified correctly. In my batch file, I assume everything is in the same directory as the batch file itself.
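If you would rather not write a batch file, the same "run both scripts in order" step can be sketched in Python. This is an alternative illustration, not the batch file from the ZIP. The spider file name spider.pl and the games subdirectory are assumptions, and the function names are hypothetical. As in my batch file, everything is assumed to live in one directory, which you pass in as base_dir.

```python
import os
import subprocess
import sys


def build_commands(base_dir, spider_name="spider.pl"):
    """Return the two commands to run, in order: spider first, then parser."""
    # Assumption: the spider writes into a "games" folder under base_dir.
    games_dir = os.path.join(base_dir, "games")
    return [
        ["perl", os.path.join(base_dir, spider_name)],
        [sys.executable, os.path.join(base_dir, "pitchfxparser.py"),
         "-l", games_dir],
    ]


def run_nightly(base_dir, runner=subprocess.check_call):
    """Run the spider and then the parser from base_dir."""
    for command in build_commands(base_dir):
        # check_call raises if a step fails, so the parser is skipped
        # whenever the download did not finish cleanly.
        runner(command, cwd=base_dir)


# Example call (the path is the author's own setup; adjust to yours):
# run_nightly(r"C:\Baseball\pitchfx")
```

With no date arguments the parser falls back to yesterday, which is exactly what a nightly job wants.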

Next, you’ll create a Windows scheduled task. Go to your Windows Control Panel (the full or advanced version) and click on “Scheduled Tasks” followed by “Add Scheduled Task.” You should see a dialog that looks like this:

Scheduled Task - Screen 1

Click “Next”. The next screen will ask you to select a program to schedule. Click on “Browse” and select the batch file you created. Then click “Next”. You should see a screen asking how frequently you’d like to perform the task.

Scheduled Task - Screen 2

Select “Daily” and then click “Next”. After this you’ll be asked what time you want to run the program. Remember, the parser is set up to parse the previous day’s results, so you’ll want to run it after midnight. I use 8:00 AM EST since I don’t necessarily know what time the West Coast night games are going to end and this seemed safe enough. Hopefully I don’t need to mention that it needs to be a time when your computer is turned on.

Scheduled Task - Screen 3

Enter whatever time you want, make sure the task will run every day and choose a start date. If you want, you can wait until Opening Day, but spring training games are currently available. Click “Next”.

Now you reach the really critical part. This is where you enter your Windows user name and password. If you do not provide a password, the task will not run. If you’re like me and don’t have a password set up to log into Windows when it starts, you’ll need to set one up. Look through Windows Help or on Microsoft’s site if you need more information.

Scheduled Task - Screen 4

Click “Next” and you’ll be shown a success screen which should look something like this:

Scheduled Task - Screen 5

Congratulations, you have successfully set your computer up to automatically download and parse the PITCHf/x data every day.

Now you’re ready to set out analyzing the data. I’ll provide the link to Mike’s wonderful library of PITCHf/x resources again, in case you’re looking for some help on what it all means.

Resources

Pitchfx.zip - a zip file containing the database definition file, both parsers and a sample batch file
