Two great tastes that go great together.
One of the greatest things ever is the Lanham Baseball Database. It’s a database of all baseball statistics since 1871, suitable for random SQL queries. Not quite pitch by pitch, but season by season. Note that retrosheet has made their boxscore database for all games since 1971 available, at least in limited fashion, so you can do a day by day database, which Baseball Musings has done. One can only imagine what sort of amazing data we’ll be able to get in 2071.
As an example of the kind of fun you can have, here’s a query to figure out the most recent world series win, loss and appearance for every team in the league:
SELECT
teams.franchid,
MAX(winyear) AS lastwin,
MAX(loseyear) AS lastloss,
MAX(COALESCE(winyear, loseyear)) as lastappearance
FROM teams
LEFT JOIN (
SELECT a.yearid AS winyear,
null as loseyear,
franchid
FROM seriespost a
JOIN teams b
ON (a.yearid = b.yearid AND
a.teamidwinner = b.teamid AND
a.lgidwinner = b.lgid AND a.round = 'WS')
UNION SELECT null AS winyear,
a.yearid AS loseyear,
franchid
FROM seriespost a JOIN
teams b ON (
a.yearid = b.yearid AND
a.teamidloser = b.teamid AND
a.lgidloser = b.lgid AND
a.round = 'WS')) years on (teams.franchid = years.franchid)
WHERE teams.yearid = 2004
GROUP BY teams.franchid
ORDER BY lastappearance desc;
And here’s the output:
franchid | lastwin | lastloss | lastappearance |
---|---|---|---|
HOU | |||
MON | |||
TEX | |||
TBD | |||
COL | |||
SEA | |||
BOS | 2004 | 1986 | 2004 |
STL | 1982 | 2004 | 2004 |
NYY | 2000 | 2003 | 2003 |
FLA | 2003 | 2003 | |
ANA | 2002 | 2002 | |
SFG | 1954 | 2002 | 2002 |
ARI | 2001 | 2001 | |
NYM | 1986 | 2000 | 2000 |
ATL | 1995 | 1999 | 1999 |
SDP | 1998 | 1998 | |
CLE | 1948 | 1997 | 1997 |
TOR | 1993 | 1993 | |
PHI | 1980 | 1993 | 1993 |
MIN | 1991 | 1965 | 1991 |
OAK | 1989 | 1990 | 1990 |
CIN | 1990 | 1972 | 1990 |
LAD | 1988 | 1978 | 1988 |
KCR | 1985 | 1980 | 1985 |
DET | 1984 | 1940 | 1984 |
BAL | 1983 | 1979 | 1983 |
MIL | 1982 | 1982 | |
PIT | 1979 | 1927 | 1979 |
CHW | 1917 | 1959 | 1959 |
CHC | 1908 | 1945 | 1945 |
(30 rows)
Note that the database doesn’t yet have 2005 data. But you get the picture.