Grandpa Gets All the Calls
Posted by Dan Turkenkopf - 10/04/08 at 08:04:18 pm
Probably the most common response to my look at catcher framing was that the pitching staff had to have something to do with the results. I totally agree, but I was curious about how the staff would affect the number.
I thought of three possible “biases” that could change how a pitcher is judged by an umpire: age, reputation (by which I mean success) and early-game wildness. I’m sure there are probably others, and I’m willing to take requests for study.
Anyway, these will probably be combined into a longer post for Beyond the Box Score later, but I thought I’d post the results here as I got them.
The first test I looked at was by age. I broke down all the pitchers from 2007 into three categories - under 25, between 26 and 35, and older than 35. The age breakdown was somewhat arbitrary, but I wanted to get a young group and an old group, with the hypothesis that umpires were kinder to the older pitchers.
Turns out that young pitchers saw missed calls cost them .37 runs per game, while older pitchers benefited by .23 runs per game. The middle age bracket gained .14 runs per game.
There are a couple of potential flaws in this study that I’ll point out, but I don’t think they’re too serious. First, the release of the Lahman database doesn’t include the Retrosheet ids for rookies from 2007, so I didn’t have a birth year for them - although my presumption is that most will fall in the bottom group. I eliminated them from the study, which lowers both the number of opportunities and the number of missed calls for the young pitchers. If anything, this dampens the measured effect, and younger pitchers may actually have been hurt even more.
Second, there’s definitely a selection bias in play. Older pitchers are those who have pitched well enough to stay around, while younger pitchers may be drummed out of the league under 25. I’m not sure how to correct for that yet, but I might have a better idea when I try to measure how big a role reputation plays.
Long story short, younger pitchers appear to lose at least .6 runs per game to older pitchers based on umpires’ calls. This may not be solely related to age, but to a combination of factors that are correlated with age, which I’ll try to examine over the next few days.
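For anyone checking the arithmetic, here is a minimal sketch of where the .6 comes from. The three per-game figures are the ones quoted above; the group labels and the `gap` helper are just names made up for this illustration.

```python
# Runs per game gained (+) or lost (-) on missed calls, as quoted in the post.
# The keys "young", "middle" and "old" are illustrative labels only.
RUNS_PER_GAME = {"young": -0.37, "middle": 0.14, "old": 0.23}


def gap(group_a, group_b):
    """Runs per game by which group_a comes out ahead of group_b."""
    return round(RUNS_PER_GAME[group_a] - RUNS_PER_GAME[group_b], 2)


if __name__ == "__main__":
    print("old vs young:", gap("old", "young"))
    print("middle vs young:", gap("middle", "young"))
```

The old-versus-young gap is the .6 figure; the same subtraction puts the middle bracket about half a run per game ahead of the young group.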
Pitch Classifier
Posted by Dan Turkenkopf - 10/04/08 at 07:04:42 am
I’ve decided to start work on a pitch classifier similar to what Josh Kalk describes over at from small ball to the long ball. I know he’s already built a good one, and can generate some really cool graphs from it, but he hasn’t released the algorithm, and it’s a good opportunity for me to use my Programming Collective Intelligence book. I know he’s got some advantages over me… rumor has it that he’s a theoretical physicist (and so is John Walsh)… but it should be a fun attempt nonetheless.
The goal is to use the PITCH f/x data to automatically classify each pitch as a fastball, curveball, whatever. Besides the obvious usefulness of this data for evaluating pitchers, I think it can be used to enhance some of my work with catchers.
Anyway, this could be a fairly long process (and there may be some other interesting things coming along with it). I figure I’ll post progress updates here. After all, what is a blog but a place for me to blather on about things no one is interested in?
Pitch Framing - Is it Real?
Posted by Dan Turkenkopf - 05/04/08 at 01:04:00 pm
My latest post at Beyond the Box Score looks into whether certain catchers do better than others at convincing the umpire to call close pitches strikes, and how many runs might be saved by turning balls into strikes. The results shocked me, and even caused me to question the outcome. Check it out and let me know what I did wrong.
The Legend of Clutch
Posted by Dan Turkenkopf - 02/04/08 at 06:04:02 pm
Colin has an excellent post over at The Other Fifteen discussing the importance of narrative in how we watch and talk about baseball. In his words:
Now, when we discuss baseball statistics, what we’re really talking about is an aggregation of these records; we take the events and collect them. That’s true about the old-school triple crown stats - batting average, home runs and runs batted in - or the “new age” VORP and Win Shares. Baseball statistics are, at their heart, simply a summary of what occurred on the field of play.
But when we collect statistics in that way, we tend to do so in a way that dissociates them from their narrative context. A player’s VORP counts hits against a hated division rival in the midst of a tight race exactly the same as hits against a 30-year-old journeyman pitcher playing out the string for a basement dweller in September. Baseball statistics don’t seem to have any understanding of the fact that Yankees players are attractive, handsome stars, and that Kansas City Royals players… aren’t. There isn’t a baseball stat that measure how athletic and, really, balletic Derek Jeter looks when he does that mid-air throw or dives for a ground ball.
And yet all of those things are important, if not essential, in forming a narrative of a baseball season. All of those things add a sense of excitement and drama to baseball. And they’re the things that first attracted most of us to baseball - yes, even the Dread Sabermetricians.
He goes on to suggest that the very human need for a narrative is what leads to an inordinate focus on clutch performance and clutch performers. He offers a truce to sportswriters - declaring he’ll stop proclaiming that clutch isn’t real if they refrain from relying upon it as THE key piece of evaluating a player.
I think Colin hits the nail on the head with the importance of the story arc to our enjoyment of the game and the perceived conflict with statistics. I’m not yet ready to throw in the towel and say the two positions are incompatible though. I know there haven’t been any studies that have clearly identified clutch performers, but that merely suggests a lack of skill rather than proves it (something we all know deep down, but Bill James still felt the need to point out because we sure weren’t acting like that was the case). And I’m not completely on board with WPA - but that’s mainly due to how it’s calculated rather than any deep-seated angst against the concept of context-sensitive value.
The problem I see with Colin’s approach to resolving the stalemate between believers and non-believers is that it doesn’t resolve anything. He admits this himself, saying “This isn’t a cry for fusion, or balance, or peaceful coexistence.” But his call for a truce ends the conversation. Each side can continue happy in their supposed knowledge with little to no challenge. With no challenge, there’s often no learning. And both sides will be poorer for it.
Hat Tip: The Book Blog
Switching a Ball to a Strike
Posted by Dan Turkenkopf - 02/04/08 at 08:04:11 am
One of the cool things (ok, I’m a baseball nerd) about analyzing data is how often you discover little interesting facts that you didn’t know before.
While researching my current work on whether catchers have an effect on whether a pitch is called a strike or a ball (look for it tomorrow or Friday at Beyond the Box Score </shameless plug>), I realized I didn’t know what switching a ball to a strike was really worth. A Google search turned up some information on the run value of a given count, but not the information I was looking for. So I took a little detour from my planned study and decided to calculate the value myself.
Using data found in this thread at the Book Blog and the 2006 Major League splits from Baseball-Reference, I determined the value of a ball and a strike for every count (from the point of view of the pitcher). The difference between those numbers is the value of switching a ball to a strike at each count. Then I took the average of the values at each count, weighted by the number of plate appearances at that count, to get the final number of .161 runs.
In the interest of full disclosure, and for those who might be interested in the breakdowns, here’s the data.
| Balls | Strikes | wOBA | LW | RV Ball | RV Strike | RV B->S | PA | Weighted RV |
| 0 | 0 | 0.332 | 0.0000 | -0.0339 | 0.0426 | 0.0765 | 188071 | 0.0217 |
| 0 | 1 | 0.283 | -0.0426 | -0.0269 | 0.0617 | 0.0886 | 87779 | 0.0117 |
| 0 | 2 | 0.212 | -0.1043 | -0.0217 | 0.2007 | 0.2224 | 33467 | 0.0112 |
| 1 | 0 | 0.371 | 0.0339 | -0.0626 | 0.0496 | 0.1122 | 77323 | 0.0131 |
| 1 | 1 | 0.314 | -0.0157 | -0.0504 | 0.0670 | 0.1174 | 72385 | 0.0128 |
| 1 | 2 | 0.237 | -0.0826 | -0.0461 | 0.2224 | 0.2685 | 48727 | 0.0197 |
| 2 | 0 | 0.443 | 0.0965 | -0.1104 | 0.0617 | 0.1722 | 27566 | 0.0072 |
| 2 | 1 | 0.372 | 0.0348 | -0.1026 | 0.0713 | 0.1739 | 38594 | 0.0101 |
| 2 | 2 | 0.290 | -0.0365 | -0.0983 | 0.2685 | 0.3667 | 39862 | 0.0220 |
| 3 | 0 | 0.570 | 0.2070 | -0.1170 | 0.0696 | 0.1866 | 9512 | 0.0027 |
| 3 | 1 | 0.490 | 0.1374 | -0.1866 | 0.0757 | 0.2623 | 16577 | 0.0066 |
| 3 | 2 | 0.403 | 0.0617 | -0.2623 | 0.3667 | 0.6290 | 23156 | 0.0220 |
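For anyone who wants to check the arithmetic, here is a rough Python sketch that rebuilds the last three columns from the LW and PA columns of the table above. Two things in it are assumptions rather than numbers stated in the post: the run values for a walk (0.3240) and a strikeout (-0.3050) are back-calculated from the table rows (for example, 0.2070 + 0.1170 at 3-0 and -0.1043 - 0.2007 at 0-2), and the function names are made up for illustration. Like the table, it treats any strike with two strikes as a strikeout.

```python
# LW (run value of the count, batter's view, relative to 0-0) and plate
# appearances for each (balls, strikes) count, copied from the table above
# and rounded to four decimals.
COUNTS = {
    (0, 0): (0.0000, 188071),
    (0, 1): (-0.0426, 87779),
    (0, 2): (-0.1043, 33467),
    (1, 0): (0.0339, 77323),
    (1, 1): (-0.0157, 72385),
    (1, 2): (-0.0826, 48727),
    (2, 0): (0.0965, 27566),
    (2, 1): (0.0348, 38594),
    (2, 2): (-0.0365, 39862),
    (3, 0): (0.2070, 9512),
    (3, 1): (0.1374, 16577),
    (3, 2): (0.0617, 23156),
}

# ASSUMPTION: terminal run values inferred from the table, not stated in the post.
WALK_LW = 0.3240
STRIKEOUT_LW = -0.3050


def lw_after(balls, strikes):
    """Run value of the state reached once the count is (balls, strikes)."""
    if balls == 4:
        return WALK_LW
    if strikes == 3:
        return STRIKEOUT_LW
    return COUNTS[(balls, strikes)][0]


def ball_to_strike_value(balls, strikes):
    """Runs the pitcher gains when a ball at this count becomes a strike."""
    return lw_after(balls + 1, strikes) - lw_after(balls, strikes + 1)


def weighted_ball_to_strike_value():
    """Average of the per-count values, weighted by plate appearances."""
    total_pa = sum(pa for _, pa in COUNTS.values())
    weighted = sum(
        ball_to_strike_value(b, s) * pa for (b, s), (_, pa) in COUNTS.items()
    )
    return weighted / total_pa


if __name__ == "__main__":
    for (b, s) in sorted(COUNTS):
        print(f"{b}-{s}: {ball_to_strike_value(b, s):.4f}")
    print(f"weighted: {weighted_ball_to_strike_value():.3f}")
```

The per-count values should line up with the RV B->S column, and the weighted figure comes out at about .161.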


