It's really, really late, I'm bleary-eyed from staring at a computer screen for way too many consecutive hours, and I should be in bed. But I have to share this with you.
Long-time readers (and any friend I've gotten on the phone or IM for any length of time) know I've spent a lot of time lately working on a program to take plain-text game logs (like this one, for instance) and turn them into standardized data files (specifically, data files in the format used by Retrosheet.org). The goal is to drastically increase the amount and the quality of Minor League statistics publicly available.
Standardized play-by-play data for all of Minor League Baseball opens up a lot of possibilities, especially for prospect evaluation. Baseball America has recently begun publishing lefty-righty splits for the minors, but what about home/road numbers? Situational stats like RISP? Or any of dozens of other things?
As far as the programming side of things goes, I'm basically done. My program has parsed the gamelogs for just about every triple-A and double-A game played through about April 27th this year. I'm still tinkering with various ways to present this data. For now, you'll get a bunch of stodgy-looking HTML tables, but eventually I plan on moving to a player-by-player format like that of David Pinto's day-by-day database or the player pages at Retrosheet. For the presentation, I'm mostly dreaming stuff up--if you've got ideas, I'd love to hear them.
In the meantime, I proudly present my first small wave of data. Here's a sneak preview, updated through April 29th:
Split | AB | H | 2B | 3B | HR | BB | IW | HP | K | SH | SF | GDP | BA | OBP | SLG | OPS | BABIP | GB | LD | FB | P | B | unk |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Overall | 87 | 25 | 6 | 1 | 1 | 6 | 0 | 0 | 11 | 0 | 1 | 4 | .287 | .330 | .414 | .744 | .320 | 38 | 15 | 14 | 3 | 6 | 1 |
vs LHP | 32 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 2 | .219 | .219 | .281 | .500 | .250 | 17 | 4 | 3 | 2 | 2 | 0 |
vs RHP | 55 | 18 | 4 | 1 | 1 | 6 | 0 | 0 | 7 | 0 | 1 | 2 | .327 | .387 | .491 | .878 | .362 | 21 | 11 | 11 | 1 | 4 | 1 |
Home | 34 | 13 | 4 | 0 | 1 | 3 | 0 | 0 | 3 | 0 | 0 | 3 | .382 | .432 | .588 | 1.021 | .400 | 17 | 6 | 6 | 1 | 1 | 0 |
Road | 53 | 12 | 2 | 1 | 0 | 3 | 0 | 0 | 8 | 0 | 1 | 1 | .226 | .263 | .302 | .565 | .267 | 21 | 9 | 8 | 2 | 5 | 1 |
0 out | 31 | 8 | 3 | 0 | 0 | 3 | 0 | 0 | 4 | 0 | 0 | 3 | .258 | .324 | .355 | .678 | .296 | 13 | 6 | 2 | 1 | 5 | 0 |
1 out | 27 | 10 | 1 | 0 | 0 | 2 | 0 | 0 | 4 | 0 | 1 | 1 | .370 | .400 | .407 | .807 | .435 | 12 | 5 | 5 | 0 | 1 | 1 |
2 out | 29 | 7 | 2 | 1 | 1 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | .241 | .267 | .483 | .749 | .240 | 13 | 4 | 7 | 2 | 0 | 0 |
none on | 51 | 12 | 1 | 0 | 1 | 5 | 0 | 0 | 6 | 0 | 0 | 0 | .235 | .304 | .314 | .617 | .250 | 26 | 7 | 5 | 1 | 6 | 0 |
men on | 36 | 13 | 5 | 1 | 0 | 1 | 0 | 0 | 5 | 0 | 1 | 4 | .361 | .368 | .556 | .924 | .419 | 12 | 8 | 9 | 2 | 0 | 1 |
RISP | 24 | 9 | 3 | 1 | 0 | 1 | 0 | 0 | 3 | 0 | 1 | 2 | .375 | .385 | .583 | .968 | .429 | 6 | 6 | 7 | 2 | 0 | 1 |
By the way, the rightmost six columns are "batted ball data": from left to right, ground balls, line drives, fly balls, popups, bunts, and unknown. (Why some batted balls are unknown is a story for another day.)
If you click here, you'll get that set of data (in, as I said, stodgy table form) for every batter on the Nashville Sounds (like Gwynn, through the game of April 29th). You'll hear much more about this in the coming days. Pretty soon, that data (and more) will be available for every player in all of Minor League Baseball.
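If you want to check the rate columns in the table against the counting columns, here is a minimal Python sketch. It is not the code behind these tables: the names `BattingLine` and `rate_stats` are made up for illustration, BA, OBP, SLG, and OPS use the standard definitions, and the BABIP formula (no sacrifice flies in the denominator) is an assumption inferred from the fact that it reproduces the numbers shown. It also assumes the BB column already includes intentional walks.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one split (hypothetical container, not the author's)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int   # assumed to include intentional walks
    hp: int
    k: int
    sf: int


def _ratio(num: int, den: int) -> float:
    # Avoid a ZeroDivisionError on empty splits.
    return num / den if den else 0.0


def _fmt(x: float) -> str:
    # Baseball style: three decimals, no leading zero (.287, but 1.021).
    return f"{x:.3f}".lstrip("0")


def rate_stats(line: BattingLine) -> dict[str, str]:
    # Total bases: each hit counts once, plus extra bases for 2B/3B/HR.
    total_bases = line.h + line.doubles + 2 * line.triples + 3 * line.hr
    ba = _ratio(line.h, line.ab)
    obp = _ratio(line.h + line.bb + line.hp,
                 line.ab + line.bb + line.hp + line.sf)
    slg = _ratio(total_bases, line.ab)
    # Assumption: BABIP = (H - HR) / (AB - K - HR), which matches the table.
    babip = _ratio(line.h - line.hr, line.ab - line.k - line.hr)
    return {
        "BA": _fmt(ba),
        "OBP": _fmt(obp),
        "SLG": _fmt(slg),
        "OPS": _fmt(obp + slg),  # summed before rounding
        "BABIP": _fmt(babip),
    }


# The "Overall" row from the table above.
overall = BattingLine(ab=87, h=25, doubles=6, triples=1, hr=1,
                      bb=6, hp=0, k=11, sf=1)
print(rate_stats(overall))
# {'BA': '.287', 'OBP': '.330', 'SLG': '.414', 'OPS': '.744', 'BABIP': '.320'}
```

OPS is summed from the unrounded OBP and SLG, which is why the Home row shows 1.021 rather than .432 + .588 = 1.020.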