COLUMN ONE : Statistics Can Throw Us a Curve : Controversy over a book linking race and IQ shows the potential pitfalls in analyzing data. Most people think numbers don’t lie, but mathematicians know better.
There is a direct correlation, mathematicians have found, between children’s achievement on math tests and shoe size. A clear signal that big feet make you smarter?
And what about the striking link, documented in the early part of this century, between increasing pollution and rising birth rates in the Los Angeles Basin? Does breathing bad air make people fertile?
And what, for that matter, should be made of studies that connect skin color with IQ scores? Does that mean that race can make you dumb, or smart?
Certainly that is what the authors of “The Bell Curve”--Charles Murray of the American Enterprise Institute and the late Richard Herrnstein of Harvard--would have us believe.
Their controversial book trots out an arsenal of mathematical artillery to bolster their proposition that intelligence is mostly inherited, that blacks have less of it and that little can be done about it. Reviewers--not to mention readers--have admitted to shellshock in the face of such a barrage of statistics, graphs and multiple regression analysis.
And surely, numbers cannot lie. Or so most people believe.
Mathematicians, however, know better.
Correlation, they say, does not necessarily mean causation. Correlation means only that one thing has a relationship with another. Causes sometimes can get lost in a tangled web of competing factors so impenetrable that even sophisticated mathematical sifting fails to sort them out.
Individual studies showing one result can be contradicted by larger studies analyzing the same data. Background statistical noise drowns out signals as readily as radio static garbles one’s favorite song. To top it all, some scientists even suggest that humans ultimately may be ill-suited for seeing through the veil of statistics to the real relationships of cause and effect.
These numerical obfuscations explain, among other things, why studies can indicate one day that oat bran lowers cholesterol, and a few years later, show that it has no more effect than good old refined wheat.
The stories told in numbers have profound effects on the design of personal and social agendas. Sometimes statistical correlations point the way to significant findings that result in major policy changes. For example, the correlation between lung cancer and smoking motivated scientists to find direct causal links.
But misinterpreting statistics--even inadvertently--is an old problem that goes far beyond matters of race and IQ. In fact, it’s difficult to find an area of life where it doesn’t apply.
“The truth is, you can make a correlation between almost anything,” said Temple University mathematician John Allen Paulos, whose research revealed the connection between feet and ability in math. “It’s the mystique of precision.”
Psychologist and statistician Rand Wilcox of USC concurred: “Correlation doesn’t tell you anything about causation. But it’s a mistake that even researchers make.”
Indeed, correlations may be nothing more telling than coincidence. Or timing.
For example, studies routinely reveal a strong statistical link between divorced parents and troubled adolescents. But it is also true that adolescents are attracted to trouble no matter what parents do.
“The Bell Curve,” some experts say, is a more complex variation on this theme. “It’s quite possible that two things move together, but both are being moved by a third factor,” Stanford statistician Ingram Olkin said.
Paulos points out that almost anything that correlates with high IQ is also associated with high income. This conclusion comes as no surprise, given that affluent parents can more easily afford better schools, more books and computers and generally raise healthier, better-nourished children.
Studies of IQ and race, experts say, may mask the stronger relationship between white skin and wealth.
“The most reasonable argument against ‘The Bell Curve,’ ” Paulos said, “is that disentangling these factors may be impossible.”
Medical studies are rife with correlations that may or may not be meaningful. Several years ago, according to Wilcox, a study concluded that Japan’s low-fat diet was correlated with a high incidence of stomach cancer compared to U.S. rates.
“The speculation was that our high-fat diet somehow prevented stomach cancer,” Wilcox said. “Then it turned out that it wasn’t the low-fat diet (that contributed to cancer). It was soy sauce.”
Mark Lipsey of Vanderbilt University is involved in a study of the relationship between alcohol use and violent behavior. “People believe that alcohol is causative,” he said. “But the research base is not adequate to support that conclusion. It may be that the same kind of people who are prone to violence are prone to alcohol abuse.”
Sometimes, a seemingly causal factor is a “proxy” for something else, he said. Many gender differences fall into this category.
A number of studies show differences in the math abilities of boys and girls. “It’s obviously not the gonads,” he said. “It would be hard to link that with math ability.”
Instead, some experts say, society has a way of subtly prodding each sex in a certain direction. Racing Hot Wheels, for example, teaches boys about velocity, momentum and spatial relationships, while playing house teaches girls to be passive. Teachers encourage boys to be more analytical, girls to be “good.”
Even studies of twins that purport to prove inheritance of behavioral characteristics may be explainable by other factors.
Genetics may not be the main reason that identical twins raised apart seem to share so many tastes and habits, said Richard Rose, a professor of medical genetics at Indiana University. “You’re comparing individuals who grew up in the same epoch, whether they’re related or not,” said Rose, who is collaborating on a study of 16,000 pairs of twins. “If you asked strangers born on the same day about their political views, food preferences, athletic heroes, clothing choices, you’d find lots of similarities. It has nothing to do with genetics.”
Comparing more than one factor always complicates the issue. When dealing with income, age, race, IQ and gender, untangling the effects of these covariates, as statisticians call them, can be an almost insurmountable task.
Impressive-sounding statistical methods such as multiple regression analysis are said to eliminate this confusion by controlling for certain variables, erasing their effects. To see what effect shoe size really has on math scores, for example, one might control for grade level: older children have both bigger feet and more years of math instruction, so grade would otherwise confuse the results, and only a comparison of children in the same grade would be meaningful. But mathematically erasing influences that shape life as pervasively as race, income and gender is far more difficult.
“There are lots of ways to get rid of (these variables),” Wilcox said, “but there are also a million ways that (the methods) can go wrong.”
“The Bell Curve” overflows with statistical analyses that purport to control for numerous variables. The income difference between blacks and whites wouldn’t be so extreme, the authors argue, if only the IQs of blacks were as high as those of whites. Using regression analysis, they control for IQ, effectively seeing what would happen if it were equal for both groups. This mathematical manipulation, the authors say, reduces the difference between poverty rates for blacks and whites by 77%, an impressively precise statistic. This suggests, they say, that income differences are primarily the result of IQ rather than of a family’s economic status.
But mathematicians like Stanford’s Olkin take a more skeptical view of what it means to control for anything. “It’s a bad term because it can mean many different things,” he said. “It can help you predict, but it doesn’t help you determine causality.”
Knowing who goes to church in a community, he said, can help predict who gets burglarized--because “people who go to church frequently leave their (home) doors open. But it doesn’t mean that you cause burglaries by going to church.”
Even if the statisticians could somehow unweave this web, “it’s still just glorified correlation,” Paulos said. “You still don’t know anything about causes.”
The best analysis of what they see as the statistical sleight of hand in “The Bell Curve,” Olkin and other experts said, was done by Harvard professor Stephen Jay Gould, who has written volumes about attempts to subvert science for the purpose of “proving” that one race, gender or ethnic group is superior.
Gould argues that the way “The Bell Curve” uses multiple regression analysis to “prove” the strong correlation between IQ and poverty violates all statistical norms. In particular, he said, the graphs in “The Bell Curve” do not show the strength of these correlations, which turn out to be very weak. “Indeed, very little of the variation in social factors,” he said, is explained by either IQ or parents’ socioeconomic status. Although “The Bell Curve’s” authors acknowledge in the book that some of the correlations are weak, they say they are strong enough to use as a basis for their conclusions about race and intelligence.
Murray, one of the book’s co-authors, has responded in various publications that criticisms have been unfair and have blatantly misrepresented what he and Herrnstein wrote.
Comparing groups--as “The Bell Curve” compares blacks and whites--complicates the matter even further. Because you can’t compare everyone in one group with everyone in another, most studies compare averages. And “average” is about the slipperiest mathematical concept ever to slide into popular consciousness.
Let’s say the payroll of an office of 15 workers is $1,977,500--and the boss brags that the average salary is about $131,833.
But what if the boss takes home $1 million, pays her husband $500,000 as vice president, and pays two other vice presidents $200,000 each? Then the remaining 11 workers share just $77,500, an average of about $7,045 apiece. Yet nothing is technically wrong with the math.
Rather, something is wrong with the choice of “average.” In this case, using the average known as the arithmetic “mean” (dividing the total by the number of workers) disguises gross disparities. The median (the salary of the person in the middle when all 15 salaries are ranked) would provide a more realistic “average”--$10,000. One could also use the mode, or most common number in the list--$5,000.
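The three averages for this hypothetical office can be checked with Python’s `statistics` module. The salary levels come from the article’s infobox; the head counts for department managers (4) and staff members (6) are not stated there and are inferred here as the only split that makes 15 people earn $1,977,500 in total.

```python
import statistics

# The infobox's hypothetical payroll. Head counts for managers (4) and staff (6)
# are inferred so that 15 people earn $1,977,500 in total.
payroll = (
    [1_000_000]        # president
    + [500_000]        # vice president
    + [200_000] * 2    # senior vice presidents
    + [10_000] * 4     # department managers
    + [7_500]          # assistant manager
    + [5_000] * 6      # staff members
)
assert len(payroll) == 15 and sum(payroll) == 1_977_500

mean = statistics.mean(payroll)        # total divided by head count
median = statistics.median(payroll)    # middle (8th) of the 15 ranked salaries
mode = statistics.mode(payroll)        # most common salary
rest = statistics.mean(payroll[4:])    # everyone except the four executives

print(f"mean ${mean:,.0f}  median ${median:,}  mode ${mode:,}")
print(f"average for the other 11 workers: ${rest:,.0f}")
```

The mean lands far above what 11 of the 15 employees earn, because four salaries carry almost the entire total.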
A bell curve plots the so-called “normal” distribution of probabilities. In a perfect bell curve, the mean, median and mode coincide, so it does not matter which “average” is used. In plotting IQ scores, for example, the vast majority of people are in the middle of the curve, with the Forrest Gumps and Albert Einsteins almost alone on the tails.
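The claim that the three averages coincide on a perfect bell curve can be checked with Python’s `statistics.NormalDist`. The scale used here, a mean of 100 and a standard deviation of 15, is the usual convention for IQ tests and is an assumption of this sketch. It also shows how crowded the middle is: about two-thirds of the curve lies within 15 points of the center, and little more than a tenth of 1 percent lies above 145.

```python
from statistics import NormalDist

# Assumed scale: the conventional IQ norming of mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# On a perfect bell curve, all three "averages" sit at the same point.
mean = iq.mean
median = iq.inv_cdf(0.5)   # the score that splits the curve in half
mode = iq.mode             # the peak of the curve

# Most of the area is near the middle; very little is out on the tails.
within_1sd = iq.cdf(115) - iq.cdf(85)
within_2sd = iq.cdf(130) - iq.cdf(70)
above_145 = 1 - iq.cdf(145)

print(f"mean {mean}, median {median}, mode {mode}")
print(f"within 15 points of 100: {within_1sd:.1%}")
print(f"within 30 points of 100: {within_2sd:.1%}")
print(f"above 145:               {above_145:.2%}")
```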
But the assumption that the distribution is normal is “almost never true,” Wilcox said. “And if you violate that assumption ever so slightly, it can have an unusually large impact. I could draw a curve that would look exactly like (the perfect bell curve) but it could have a very different meaning.”
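One standard way to see Wilcox’s point, supplied here as an illustration rather than taken from his remarks, is the so-called contaminated normal: a population that is 90 percent an ordinary bell curve and 10 percent a much wider one with the same center. The 10 percent share and the width of 10 are assumed values. The heights of the two curves never differ by more than about 0.04, so plotted side by side they look much alike, yet the mixture’s standard deviation is more than three times larger.

```python
import math
from statistics import NormalDist

standard = NormalDist(0, 1)
wide = NormalDist(0, 10)
p = 0.10  # assumed share of the population drawn from the wide curve

def mixed_pdf(x):
    """Height of the contaminated curve: 90% standard, 10% wide."""
    return (1 - p) * standard.pdf(x) + p * wide.pdf(x)

# Largest gap between the two curves' heights on a fine grid from -4 to 4.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(mixed_pdf(x) - standard.pdf(x)) for x in grid)

# Both components share a center, so the mixture's variance is just the
# weighted average of the two variances.
mixed_sd = math.sqrt((1 - p) * standard.variance + p * wide.variance)

print(f"largest difference in curve height: {max_gap:.3f}")
print(f"standard deviation: {standard.stdev:.2f} vs {mixed_sd:.2f}")
```

Any method that leans on the standard deviation, as many comparisons of group averages do, would treat these two nearly identical-looking curves very differently.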
The difference of 15 points between the mean IQs of blacks and whites, as proposed in “The Bell Curve,” could be very misleading, Wilcox said. “The median could be a lot smaller,” he said.
“Even the title--'The Bell Curve'--is a red flag, because it assumes a perfectly normal distribution. And no group is normal. If you have one unusual person, that can have an unusually large impact.”
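Wilcox’s warning about a single unusual person is easy to demonstrate with a handful of invented scores. Adding one extreme value lifts the mean from 100 to 110, while the median moves only from 100 to 100.5.

```python
import statistics

# Eleven invented test scores, then the same group plus one unusual person.
scores = [88, 92, 95, 97, 99, 100, 101, 103, 105, 108, 112]
with_outlier = scores + [220]

for label, data in (("without outlier", scores), ("with outlier", with_outlier)):
    print(f"{label:>15}: mean {statistics.mean(data):6.1f}, "
          f"median {statistics.median(data):6.1f}")
```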
Recently, statisticians have discovered yet another reason to use caution in reviewing studies. A technique known as meta-analysis--an analysis of analyses that pools data from many studies on the same subject--can produce results that apparently contradict many of the individual studies.
Hundreds of studies concluded that delinquency prevention programs did negligible good. But a meta-analysis by Lipsey showed a small but real positive effect: a 10% reduction in juvenile crime. At the same time, he found that “scare ‘em straight” programs led to higher delinquency rates compared to those of control groups.
Meta-analysis works, Lipsey explained, by clearing the background “noise” that comes from doing research in the real world, instead of in a laboratory. A teen-ager could have a bad memory or decide he doesn’t trust the interviewer; or the interviewer could have an off day.
Even objective measures such as arrest records have statistical noise, Lipsey said. “That may vary from officer to officer. It’s not just a function of how the kid does.” Sampling errors are common, he said. “From the luck of the draw, you get a group of kids that is particularly responsive or resistant. And all those quirks come through in that study.”
Individual studies, amid this buzz, may not find a statistically significant effect. By pooling data with meta-analysis, however, “the noise begins to cancel out,” Lipsey said. “Suddenly you begin to see things that were in the studies all along but were drowned out.”
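A common textbook way to pool studies is fixed-effect inverse-variance weighting. It is assumed here because the article does not describe Lipsey’s exact procedure. Each study’s estimate is weighted by how precise it is, and the pooled estimate’s margin of error shrinks as studies are added. The eight effect sizes and standard errors below are invented. None of them clears the usual 1.96 threshold for statistical significance on its own, but the pooled estimate does.

```python
import math
from statistics import NormalDist

# Hypothetical studies: (estimated effect, standard error). All invented.
studies = [
    (0.12, 0.08), (0.05, 0.10), (0.10, 0.07), (0.15, 0.09),
    (0.02, 0.06), (0.09, 0.08), (0.11, 0.10), (0.08, 0.07),
]

def z_score(effect, se):
    return effect / se

def two_sided_p(z):
    return 2 * (1 - NormalDist().cdf(abs(z)))

for effect, se in studies:
    z = z_score(effect, se)
    print(f"study effect {effect:.2f}: z = {z:.2f}, p = {two_sided_p(z):.2f}")

# Fixed-effect pooling: weight each study by 1 / se^2 (its precision).
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / sum(weights)
pooled_se = 1 / math.sqrt(sum(weights))   # smaller than any single study's se
pooled_z = z_score(pooled, pooled_se)
print(f"pooled effect {pooled:.3f}: z = {pooled_z:.2f}, p = {two_sided_p(pooled_z):.4f}")
```

The fixed-effect model assumes every study measures the same underlying effect; when studies differ in setting or design, a random-effects model, which widens the pooled margin of error, is often preferred.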
Another dramatic reversal in the story numbers tell came in a meta-analysis released in April on school funding’s effect on pupil performance. Previously, studies suggested that pouring money into teacher salaries and smaller class size made a negligible difference. But when Larry Hedges of the University of Chicago reviewed several dozen studies conducted between 1954 and 1980, he found that money made a big difference.
“People who didn’t want to pay more for schools used to cite studies showing that funding didn’t make any difference,” he said. “So these results were very influential.”
In the end, a correlation is no more than a hint that a relationship might exist. Without a plausible mechanism--that is, a way that one thing might cause another--it’s practically useless.
Therefore, it’s unlikely that the surge in Wonderbra sales caused the recent Republican election sweep, even though the trends were closely linked in time. On the other hand, studies linking rising teen-age obesity to increased hours of TV viewing at least offer a way to get from cause to effect without straining credibility.
“The Bell Curve,” critics say, ultimately sinks under the absence of a realistic mechanism for linking race to IQ. Evolution is too slow and the differences between races are too muddled and too small to account for the apparent statistical divergence, according to Gould and others. To do the kinds of experiments necessary to prove the link in humans would be unthinkable, said mathematician William Fleishman of Villanova University. Such research would have to involve random mating and perfectly controlled environments.
“Here we seem to have these highly heritable traits,” he said. “But what is it we know about what’s really important to the successful education of young children?”
Every correlation, he said, should come with an automatic disclaimer. “There’s a big logical fallacy here. What you need is a mechanism. But the numbers can be oh so seductive. . . .”
Curiously, the very reason that people are prone to jump to conclusions based on tenuous correlations may have something to do with humans’ genetic endowment, according to Paul Smith, who has been analyzing social statistics since the early 1970s.
“You and I don’t have a statistical facility in our brains,” said Smith, who is at the Children’s Defense Fund. “We are primates evolved to gather fruit in the forest and when possible to reproduce, and I think it’s marvelous that we can do what we do.
“But we have to exercise almost intolerable discipline to not jump to conclusions. There might be a banana behind that leaf, or it might be the tiger’s tail. The one who makes the discrimination best and moves fastest either gets the banana or gets away from the tiger. So this leaping to conclusions is a good strategy given that the choices are simple and nothing complicated is going on.
“But at the level of major social policy choices, (jumping to conclusions) is a serious concern.”
(BEGIN TEXT OF INFOBOX / INFOGRAPHIC)
What’s the Average?
Mathematically speaking, there are three specific kinds of averages: mean, median and mode. Assume that a company employs 15 people at an “average” annual salary of $131,833. This impressive average salary is derived by dividing the total payroll ($1,977,500) by the number of workers, which gives the arithmetic mean. But the median salary--the middle of all the values--would give a far better idea of the pay scales. The mode, or most common salary, is more paltry still. Here is how the annual earnings of individuals in this hypothetical company would look:
Mean salary: $131,833
PRESIDENT: $1 million
VICE PRESIDENT: $500,000
SENIOR VICE PRESIDENTS: $200,000
DEPARTMENT MANAGERS: $10,000 (median salary)
ASSISTANT MANAGER: $7,500
STAFF MEMBERS: $5,000 (mode)