Retrosheet’s bgame program takes a Retrosheet event file and creates a game summary. Again, I’ll use the Colorado Rockies @ Los Angeles Dodgers game on 4/9/2007 as an example. Here is the event file.  Here is a step-by-step guide for using bgame.exe.

  1. Download bgame.exe from Retrosheet or here, and unzip the executable file.
  2. Put the event file (or a text file that includes many event files) in the same directory as the bgame.exe program.
  3. You also need to have a team file in the same directory.  A team file is a text file that lists every team, its league, and its three-letter abbreviation.  The team file must have a filename of TEAMYYYY.  For example, here is TEAM2007.
  4. Open the Windows Command Prompt.
  5. Navigate to the directory where you stored bgame.exe, the event file, and the team file.
  6. Type the command: bgame -y 2007 dodgers_rockies040907.evn (or the name of your event file).  -y 2007 specifies the year of the game.  If you are generating game info for a game from 1960, make sure you use -y 1960.
  7. If you want to output the game info to a text file, use the command bgame -y 2007 dodgers_rockies040907.evn > dodgers_rockies_040907_gameinfo.txt.
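
If you would rather script steps 5 through 7 than type them, here is a minimal Python sketch.  It uses only the -y flag and the file layout described above.  The function names, and the assumption that bgame.exe sits in the working directory under exactly that name, are mine and not part of Retrosheet's tools.

```python
import subprocess
from pathlib import Path


def build_bgame_command(year, event_file, exe="bgame"):
    """Return the argument list for the bgame call shown in step 6."""
    return [exe, "-y", str(year), str(event_file)]


def run_bgame(year, event_file, output_file, workdir="."):
    """Run bgame inside workdir and save its output to output_file (step 7)."""
    workdir = Path(workdir).resolve()
    # Step 3: the TEAMYYYY file has to sit next to the event file.
    team_file = workdir / f"TEAM{year}"
    if not team_file.exists():
        raise FileNotFoundError(f"missing team file: {team_file}")
    # Assumption: the executable is stored in workdir as bgame.exe.
    exe = str(workdir / "bgame.exe")
    with open(workdir / output_file, "w") as out:
        subprocess.run(
            build_bgame_command(year, event_file, exe=exe),
            cwd=workdir,   # step 5: work from the directory holding the files
            stdout=out,    # step 7: same effect as the > redirect
            check=True,    # raise if bgame exits with an error
        )


if __name__ == "__main__":
    run_bgame(2007, "dodgers_rockies040907.evn",
              "dodgers_rockies_040907_gameinfo.txt")
```

Since bgame is a Windows program, this only does something useful on Windows (or wherever you are able to run the executable).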

Here is the output from bgame.exe.

The output is a comma-delimited file that contains the following fields:

0       game id - This is formatted (Home Team Abbreviation + YYYY + MM + DD + Game Number (see below)), the same as the id record in the event file.
1       date - This is formatted YYMMDD
2       game number - This field shows 0 if only one game was played between the two teams on that day.  If a double-header was scheduled, this field will show either 1 or 2.
3       day of week - Monday, Tuesday, etc
4       start time - This field contains digits only, with no colon.  For instance, 3:30 would be 330.  All times are assumed to be PM.
5       DH used flag - This field displays a T (true) if the designated hitter rule was used.  Otherwise, this field is F (false).
6       day/night flag - This field is D or N.
7       visiting team - This is the three-letter abbreviation of the visiting team.
8       home team
9       game site - Every ballpark has a special Retrosheet code.  The code is displayed in this field.  Click here for a listing of ballpark codes.
10      visiting starting pitcher - This is the unique Retrosheet ID of the visiting team's starting pitcher.
11      home starting pitcher
12      home plate umpire - This is the unique Retrosheet ID of the home plate umpire.
13      first base umpire
14      second base umpire
15      third base umpire
16      left field umpire - Big games have two additional umpires in the outfield.  This is the unique Retrosheet ID of the left field umpire.  If no left field umpire was used, this field is blank.
17      right field umpire
18      attendance - The game's attendance
19      PS scorer - The name of the scorer.
20      translator - The name of the translator.
21      inputter - The name of the inputter.
22      input time - The time that the game was input.
23      edit time - The time that the game was edited.
24      how scored - How the game was scored: live, online, tv, radio, etc.
25      pitches entered? - Whether the full pitch sequence, only the final count, or just the result of each at-bat was entered.  This field shows: pitches, count, or none.
26      temperature - The temperature of the game, in Fahrenheit.
27      wind direction - The wind direction (fromcf, fromlf, fromrf, rtol, ltor, tolf, torf, tocf, unknown)
28      wind speed - The wind speed, in MPH.
29      field condition - The field condition (dry, wet, soaked, unknown)
30      precipitation - drizzle, none, rain, showers, snow, unknown
31      sky - cloudy, dome, night, overcast, sunny, unknown
32      time of game - The duration of the game, in minutes.
33      number of innings - The number of innings in the game.
34      visitor final score - The number of runs scored by the visiting team.
35      home final score
36      visitor hits - The number of hits by the visiting team.
37      home hits
38      visitor errors - The number of errors committed by the visiting team.
39      home errors
40      visitor left on base - The number of runners left on base by the visiting team.
41      home left on base
42      winning pitcher - The unique Retrosheet ID for the winning pitcher.
43      losing pitcher - The unique Retrosheet ID for the losing pitcher.
44      save for - The unique Retrosheet ID for the player who earned the save.
45      GW RBI - The unique Retrosheet ID for the player who hit the game-winning RBI.  This used to be an official MLB statistic.
46      visitor batter 1 - The unique Retrosheet ID of the first hitter for the visiting team.
47      visitor position 1 - The numeric position of the first hitter for the visiting team (1=P, 2=C, 3=1B, 4=2B, 5=3B, 6=SS, 7=LF, 8=CF, 9=RF).
48      visitor batter 2
49      visitor position 2
50      visitor batter 3
51      visitor position 3
52      visitor batter 4
53      visitor position 4
54      visitor batter 5
55      visitor position 5
56      visitor batter 6
57      visitor position 6
58      visitor batter 7
59      visitor position 7
60      visitor batter 8
61      visitor position 8
62      visitor batter 9
63      visitor position 9
64      home batter 1
65      home position 1
66      home batter 2
67      home position 2
68      home batter 3
69      home position 3
70      home batter 4
71      home position 4
72      home batter 5
73      home position 5
74      home batter 6
75      home position 6
76      home batter 7
77      home position 7
78      home batter 8
79      home position 8
80      home batter 9
81      home position 9
82      visiting finisher (NULL if complete game) - The final pitcher for the visiting team.
83      home finisher (NULL if complete game)
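
Counting commas to find a field gets old quickly, so here is a small Python sketch that turns one line of bgame output into a dictionary keyed by field name.  The field names are labels I made up to match the list above; bgame itself does not print a header row.  I use the csv module so that values are read correctly whether or not they are wrapped in quotes.

```python
import csv
import io

# Labels for fields 0-45, in the order listed above (my own naming).
FIXED_FIELDS = [
    "game_id", "date", "game_number", "day_of_week", "start_time",
    "dh_used", "day_night", "visiting_team", "home_team", "game_site",
    "visiting_starting_pitcher", "home_starting_pitcher",
    "umpire_home", "umpire_1b", "umpire_2b", "umpire_3b",
    "umpire_lf", "umpire_rf", "attendance", "scorer", "translator",
    "inputter", "input_time", "edit_time", "how_scored",
    "pitches_entered", "temperature", "wind_direction", "wind_speed",
    "field_condition", "precipitation", "sky", "time_of_game",
    "innings", "visitor_score", "home_score", "visitor_hits",
    "home_hits", "visitor_errors", "home_errors", "visitor_lob",
    "home_lob", "winning_pitcher", "losing_pitcher", "save_for",
    "gw_rbi",
]

# Fields 46-81: batter ID and position for each lineup slot, visitors first.
LINEUP_FIELDS = [
    f"{side}_{kind}_{slot}"
    for side in ("visitor", "home")
    for slot in range(1, 10)
    for kind in ("batter", "position")
]

# Fields 82-83: the final pitcher for each team.
FIELD_NAMES = FIXED_FIELDS + LINEUP_FIELDS + ["visitor_finisher", "home_finisher"]


def parse_bgame_line(line):
    """Map one comma-delimited bgame line onto the 84 field names."""
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(row)}")
    return dict(zip(FIELD_NAMES, row))
```

With that in place, parse_bgame_line(line)["home_team"] gives you field 8 without any counting.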

Retrosheet has developed three (Windows-only) programs that work with their play-by-play event files.  An event file is essentially a text-based representation of an entire baseball game.  Retrosheet offers event files for nearly every MLB game played since 1953.  For example, here is the event file from the Colorado Rockies @ Los Angeles Dodgers game on 4/9/2007.

The first program that I’ll outline is box.exe.  Box.exe creates a box score from the Retrosheet event file.  Here’s how you can use box.exe.

  1. Download box.exe from Retrosheet or here, and unzip the executable file.
  2. Put the event file (or a text file that includes many event files) in the same directory as the box.exe program.
  3. You also need to have a team file in the same directory.  A team file is a text file that lists every team, its league, and its three-letter abbreviation.  The team file must have a filename of TEAMYYYY.  For example, team2007.
  4. Open the Windows Command Prompt.
  5. Navigate to the directory where you stored box.exe, the event file, and the team file.
  6. Type the command: box -y 2007 dodgers_rockies040907.evn (or the name of your event file).  -y 2007 specifies the year of the game.  If you are generating a box score for a game from 1960, make sure you use -y 1960.
  7. If you want to output the box score to a text file, use the command box -y 2007 dodgers_rockies040907.evn > dodgers_rockies_040907_boxscore.txt.

Here is the output from box.exe.

I wrote a script that automatically grabs the current roster of every MLB team and saves it as a ROS file.  This script runs every day at 12:00 PM.  These files are available from the .ROS Files tab at the top of every page.

ROS files are used by Retrosheet’s box.exe, bevent.exe, and bgame.exe programs.  They are plain-text, comma-delimited files having the format: ID, Last Name, First Name, Batting Hand, Throwing Hand, Team Abbreviation, Position Abbreviation.

ID can be any unique ID for the player.  Retrosheet has a unique ID for each player based on first and last name.  For instance, Derek Jeter is jeted001.  Other databases use different formats.  The files that I generate use the six-digit unique ID used by mlb.com.

The batting hand value can be either R, L, or B (both – switch hitter).

The throwing hand value can be either R or L.

The team value is a three-letter abbreviation for each team.  Retrosheet has established abbreviations for each team.  They are: ANA, ARI, ATL, BAL, BOS, CHA (Chicago American – White Sox), CHN (Chicago National – Cubs), CIN, CLE, COL, DET, FLO, HOU, KCA, LAN, MIL, MIN, NYA, NYN, OAK, PHI, PIT, SDN, SEA, SFN, SLN, TBA, TEX, TOR, and WAS.

The position value can be either: P, C, IF, OF, or DH.

As an example, here are the first few lines from the current Atlanta Braves’ ROS file.

407924,Acosta,Manny,B,R,ATL,P
430831,Bennett,Jeff,R,R,ATL,P
430641,Boyer,Blaine,R,R,ATL,P
499107,Bueno,Francisley,L,L,ATL,P
435658,Campillo,Jorge,R,R,ATL,P
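
If you want to read one of these files in your own code, here is a minimal Python sketch that follows the seven-column layout described above.  The RosterEntry type and the parse_ros function are names I picked for illustration, and the checks cover only the batting and throwing hand values listed in this post.

```python
import csv
import io
from typing import NamedTuple


class RosterEntry(NamedTuple):
    player_id: str
    last_name: str
    first_name: str
    bats: str      # R, L, or B (switch hitter)
    throws: str    # R or L
    team: str      # three-letter team abbreviation
    position: str  # P, C, IF, OF, or DH


def parse_ros(text):
    """Parse the contents of a ROS file into a list of RosterEntry rows."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 7:
            raise ValueError(f"expected 7 fields, got {len(row)}: {row}")
        entry = RosterEntry(*(field.strip() for field in row))
        if entry.bats not in {"R", "L", "B"}:
            raise ValueError(f"bad batting hand for {entry.player_id}")
        if entry.throws not in {"R", "L"}:
            raise ValueError(f"bad throwing hand for {entry.player_id}")
        entries.append(entry)
    return entries


sample = """407924,Acosta,Manny,B,R,ATL,P
430831,Bennett,Jeff,R,R,ATL,P
"""
roster = parse_ros(sample)
print(roster[0].last_name, roster[0].bats)  # prints: Acosta B
```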

I’ve been a baseball fan for my entire life.  I know that one of the first elements of the sport that truly engaged me was the scorecard.  I’ve always been intrigued by the way that an entire game can be completely described on a single sheet of paper.  No two games are identical, yet a small handful of letters, numbers, and symbols can represent any on-field situation.

When I was younger, my scorecards were much more graphical than they are today.  Initially, I found it easiest and most enjoyable to draw pictures to represent the on-field activity.  Later, especially after I developed an interest in computers and information technology, I set about finding ways to use the computer to process the information that I recorded on my scorecards.  As a result, those pretty pictures were replaced by letters and numbers.

Over the past several years, I’ve been working on a variety of projects based on collecting, storing, and analyzing baseball data on the computer.  For a long while, I’ve been thinking about building a web site to showcase these projects and to encourage me to keep developing them.

And so… what better time than one day after the conclusion of the 2008 World Series to get started?

I proudly introduce you to Pitch-By-Pitch.