A staple of college psychology courses is the story of polio and “spongy tar.” Decades ago, researchers noticed that polio rates were higher during periods when the tar in children’s playgrounds was spongier. They mistakenly concluded that spongy tar causes polio and raised an alarm; some schools went so far as to dig up their tar playgrounds. Only later did scientists realize that the soft tar and the polio were both symptoms of something else: hotter temperatures. Polio tended to be a summertime problem, and tar softens in hot weather.
The lesson: Correlation does not imply causality. Just because two things are happening at the same time doesn’t mean that one led to the other.
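The confounding at work in the spongy-tar story can be sketched with a small simulation (the numbers here are illustrative assumptions, not real data): when a hidden factor such as heat drives two variables, they correlate strongly even though neither causes the other.

```python
import random

random.seed(0)

# Hypothetical illustration of a confounder: daily heat drives both
# tar softness and polio incidence, so the two correlate even though
# neither causes the other.
days = 1000
heat = [random.gauss(0, 1) for _ in range(days)]        # the confounder
tar = [h + random.gauss(0, 0.5) for h in heat]          # softness driven by heat
polio = [h + random.gauss(0, 0.5) for h in heat]        # incidence driven by heat

def corr(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

print(corr(tar, polio))  # strongly positive, despite no causal link
```

Controlling for the confounder (comparing tar and polio only among days of similar heat) would make the apparent relationship largely vanish, which is exactly the check the original alarm skipped.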
In this era of more robust and carefully vetted science, when policy decisions supposedly spring from research rather than hunches, the spongy-tar theory would probably not get much traction. Even so, studies aren’t always what they seem — or what they’re made out to be. Sometimes they’re misanalyzed, and the wrong message is gleaned from them. Sometimes they can’t be reproduced by subsequent studies, or the results aren’t as clear-cut as the first studies suggested. (Vitamin E, anyone?) At times there are unintended flaws or biases in the study design, and at other times the findings are misrepresented or overblown.
One of the most recent reminders of this came from a massive study of studies. As reported in August in the journal Science, a group of scientists attempted to reproduce 100 psychology studies, all of which had been published in leading journals. Their finding: More than half the time, the results of those studies couldn’t be replicated. Many of the originals were considered seminal studies, frequently cited in other research.
The findings of the Reproducibility Project caused some snickering among those who had always considered the “soft sciences” of psychology and sociology to be sketchy fields. But even research in the hard sciences, including medicine, often has similar problems. A 2012 study found that when medical research reports a “very large effect,” follow-up studies usually find a much smaller effect, or none at all.
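That shrinkage pattern is a statistical phenomenon sometimes called the “winner’s curse,” and a toy simulation can show it (the numbers below are assumptions for illustration, not the 2012 study’s data): if only studies that happen to show very large effects get attention, follow-up studies, free of that selection, revert toward the modest true effect.

```python
import random

random.seed(1)

# Toy model of effect-size shrinkage ("winner's curse"): every true
# effect is modest, but noisy initial estimates get noticed only when
# they look very large. Follow-ups are not selected that way, so they
# land near the true value.
true_effect = 0.2     # assumed modest true effect size
noise = 0.5           # assumed sampling noise in each study's estimate
trials = 10000

initial = [true_effect + random.gauss(0, noise) for _ in range(trials)]
large = [e for e in initial if e > 1.0]        # the "very large effect" papers
followups = [true_effect + random.gauss(0, noise) for _ in large]

print(sum(large) / len(large))          # inflated, well above the true effect
print(sum(followups) / len(followups))  # close to the true effect
```

Nothing fraudulent needs to happen for the first estimates to be wrong; selecting on extreme results is enough.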
Science is essential to our daily functioning and to our ability to understand the universe, nature and ourselves. Its benefits are almost unfathomable, especially when scientists build a body of multiple studies that support and round each other out. That’s how people learned about the horrific effects of smoking, and were given warnings about climate change long before the glaciers began to disappear. But scientific studies have their limitations; problems arise when the results are mishandled by the scientific community or when politicians and advocacy groups seize on studies that back their own beliefs without waiting for more research. We are confronted by mounting piles of studies about standardized testing, green coffee-bean extract, paleo diets, vitamin use by older men and sexual assault on campus. And in too many cases, before the findings have been confirmed by other studies, they become the basis for approving new drugs, or they set off new diet fads or lead to new policies.
A report that a nutritional supplement “may” lead to weight loss or reduced risk of Alzheimer’s disease is enough to send many people running to the store, even when the findings are preliminary and call for more research.
Various factors add to the problem. Researchers, attempting to attract grants and have their work accepted for publication, focus their research on sexy subjects that they hope will yield novel, dramatic findings, says John P.A. Ioannidis, a Stanford University epidemiologist. They pursue the aspects of their research that in preliminary stages show provocative results — and, conversely, when they find little or no effect, they tend to abandon those avenues of research, even though those results can be important, too. Journals also are looking for hot new topics, and university press offices tend to hype their faculty’s research in an attempt to draw new funding, which is in ever-tighter supply. The media, to fill their websites, have at times published press releases verbatim, or failed to report on the limitations in the research. Sometimes there are unintended biases in studies, or the researchers examined too small a group or set of questions. And occasionally, the pressure to be a research star leads to outright fraud.
What’s more, there isn’t much grant money around for trying to follow up on the research published by others. No one wants to fund replication studies, Ioannidis said.
Consider the case of Robert Martinson, a sociologist who in 1974 published an attention-grabbing essay based on a survey of 231 studies of prison rehabilitation programs. Though his role in the survey was relatively minor compared with those of his two colleagues, Martinson was the one who became famous for his assertion in the follow-up essay that almost no rehabilitation programs work. That conclusion went beyond the underlying research, which found that although the specific programs studied hadn’t worked, the techniques behind them might still be sound.
Martinson thought his essay, which came to be known as the “Nothing Works” paper, would lead to less-harsh sentencing, by making the public realize that prison wasn’t the best place to try to rehabilitate convicts. Instead, to his horror, it became the justification for wiping out prison rehabilitation programs.
Scientific research has transformed the world for the better, and has the potential to do far more. But policymakers, academics and educators must make certain that scientific caution isn’t lost in the race for the newest, hottest finding. Schools should be teaching students how to think critically about the research findings presented to them. And politicians and regulators should avoid the temptation to turn each new bit of research into policy without widespread scientific consensus that the matter at hand has been settled.