Koufax’s perfect game — the tale of the data


On Sept. 9, 1965, I was lucky enough to watch the perfect game Sandy Koufax pitched at Dodger Stadium. Some say this was the greatest baseball game ever played, the most perfect of perfect games. Koufax pulled it off with the most strikeouts of any perfect game (14), and with the least run support. The victorious Dodgers scored their lone run without a hit, and for most of the game it looked to be a freakish double no-hitter, as the Cubs’ Bob Hendley also pitched the game of his career.

All of us watching that day, including my Little League teammates and our fathers, came as close as we’re likely to get to seeing perfection unfold. For me, the event proved to be a watershed moment in my lifelong obsession with data, and thus a major development in my future career in biology.

Earlier that year, the Dodgers led a campaign to teach local kids how to score a baseball game, recording data on the outcome of every at-bat. Somewhere around the fifth inning, we realized the game might go down in baseball history. I was pleased I was keeping a detailed record of every at-bat as taught, including which fielders were involved and so on. Eventually I noticed that my friend Kyle also was keeping score — by marking every failed at-bat with a simple “X” and nothing more. His whole data sheet on the Cubs was one long string of Xs.


Despite feeling like the superior data collector, I later realized that my own data also had shortcomings, and in fact failed to capture the game’s most dramatic moment. In the seventh inning Koufax had pitched himself into a 3 and 0 count: three balls and no strikes, or one bad pitch away from messing up perfection. That was the moment when all sound left the stadium and the suspense of the game burned into our boyhood souls.

My data about how each batter was put out was competent, but it didn’t record Koufax’s pitch-by-pitch performance and therefore failed to capture how close he’d come to failure. In gathering data, whether about ballplayers or bacteria (my specialty), we must exercise not mere diligence but also foresight and imagination.

Baseball and biology came to this realization at about the same time and, over the last 10 to 15 years, have taken a keen interest in analyzing old data for new insights. The old data are often sufficiently complete for us to discover new “laws” of baseball, invent new guidelines for recruiting and managing players, or, in science, to reveal new perspectives on how organisms live, grow and evolve. Yet in far too many cases, fresh scrutiny of old data reveals painful omissions that merely prove science has missed an opportunity.

Cathy Lozupone and Rob Knight at the University of Colorado recently sought to identify the ecological transitions that are most difficult for bacteria by analyzing previously published data on bacteria present in different environments. They found that, over billions of years of evolution, the most difficult transitions for bacteria were from saline (salty) to non-saline environments and vice versa. Frustrating to Lozupone and Knight (and others among us), however, was that most of the original data characterized the habitat simply as saline or not saline. Measurements of the exact salinity would have enabled them to pinpoint the salt concentration that has been most difficult to cross in bacterial evolution.

The truth is, previous standards of data collection in biology were typically limited to what might be interesting for the experiment at hand, or perhaps for some future experiment in the same laboratory. Fortunately, biologists are increasingly expected to anticipate likely uses by others of the data we gather and are taking pains to do so. This is a good thing.

In baseball, the invention of the pitch-by-pitch record, reliably available only since 1988, has turned out to be important for much more than capturing the drama of one riveting seventh inning long ago. We now understand its utility for managing a pitcher’s productivity, health and longevity. We don’t need to go beyond Koufax to see that throwing too many pitches over too short a time can lead to debilitating injury. Despite suffering excruciating pain from arthritis in his elbow, he sometimes pitched under conditions that today would be considered abusive — entire games on just two days’ rest, for example. Baseball eventually figured out what constitutes abuse through analysis of the effects of game and season pitch counts, days of rest and age of the pitcher on performance and injury.

Biology and baseball are widely different pursuits, but both benefit from copious, thoughtful data gathering. We should keep in mind that the data we collect today could someday bear fruit when analyzed in currently unimagined ways. As with biological data, honest, precise and complete baseball data will help extend the limits of human physiology and anatomy — and perhaps allow some future Sandy Koufax to throw another legendary game.

Frederick M. Cohan is a professor of biology at Wesleyan University in Connecticut.