Schools Learn Perils of Using a Single Test

TIMES EDUCATION WRITERS

With public education in New York City thrown into disarray by a test scoring flub, educators are sounding the alarm anew: It is dangerous to rely on the scores from a single standardized test to make life-altering decisions about students or schools.

Faulty data supplied by test publisher CTB/McGraw-Hill mistakenly sent more than 8,600 students to summer school in the nation’s largest school district, it was revealed last week. Many of them then failed an exam given at the end of summer to determine whether they would advance to the next grade.

Similar scoring foul-ups, not always with such extreme consequences, have roiled California and other jurisdictions of late.

CTB/McGraw-Hill, which published the Terra Nova exam used in New York City, has also acknowledged scoring flaws in Indiana, Nevada, South Carolina, Tennessee and Wisconsin. Officials in those states were still assessing the damage.

The Stanford 9, a similar test used in California, was plagued over the summer by scoring flaws blamed on test publisher Harcourt Educational Measurement. Schools eager to gauge the effect of the state’s anti-bilingual education measure initially got a misleadingly rosy picture of student performance.

Although the usefulness of standardized tests is widely acknowledged, they were never intended to be the sole indicator of student and school achievement.

Educators who criticize their high-stakes use call the errors and ensuing fallout inevitable given the national obsession with school accountability and the rush to embrace ways to measure it.

“It’s downright dangerous to attach rewards and punishments to these tests, which are so clearly fallible,” said Robert Schaeffer of Fair Test, a Boston-based, anti-testing organization. “This ought to be a wake-up call to every politician in the country who thinks they can gain votes and perhaps improve schools by making high-stakes standardized tests the linchpin of so-called education reform.”

In some school districts, test scores are now being used to determine not only which students pass or fail a grade but also which schools are celebrated or sanctioned and which superintendents or principals get fired or reassigned.

Linking Success With Bonuses

For now in California, Stanford 9 results are the only element in place in Gov. Gray Davis’ programs to prod schools to improve. Other data--teacher and student attendance and graduation rates--will not be available until at least 2002, said William L. Padia, who heads the state Department of Education’s office of policy and evaluation.

Last weekend, Davis said a school could earn a bonus of $150 per pupil for improving its average score by at least 5 percentile points. And under a new law, the staffs of schools that have extraordinary success boosting scores could be eligible for bonuses of as much as $25,000--an idea that makes critics cringe with fear that teachers will scale back instruction and “teach to the test.”

The highly lucrative testing industry has benefited from the growing national focus on developing academic standards and requiring students to demonstrate what they know. Although tests have long been part of the educational landscape, they have received renewed impetus during the Clinton administration.

In an interview with The Times this week, U.S. Education Secretary Richard W. Riley agreed that accuracy must be a top priority but defended exams as an important tool for improving schools.

“You can’t improve schools unless you can measure what they’re doing,” Riley said.

However, measuring results takes time, he added. “You have to have a right good bit of testing before you get a handle on where things are,” he said. “The main question is, are the tests good, are they testing the right thing?”

In addition, he said, any time test scores are used to hold pupils accountable for their achievement, “you need to do all you can to prevent mistakes.”

Until recently, results of standardized tests were not widely publicized, nor were high stakes attached to them, said Michael Kirst, a Stanford University education professor.

California began providing test results for individual students two years ago, after three decades of not doing so. Harcourt Educational Measurement, which receives $23 million to operate the testing effort in California, now tests more than 4 million children a year in the state and must turn around the scores in a matter of weeks--a “burden of precision” that Kirst noted did not exist in the past.

Yet, even under perfect conditions, tests such as the Stanford 9 and Terra Nova provide only a rough approximation of how one student compares to another.

David Rogosa, a Stanford education professor who studies test characteristics, has just evaluated the accuracy of the Stanford 9, using Harcourt’s own data.

He concluded that there is a 70% chance that a ninth-grader whose math skills are truly average--meaning that if perfectly measured the student would be at the 50th percentile--will actually get a score more than 5 percentile points above or below that. (Rogosa has posted an accuracy guide for parents at https://www.cse.ucla.edu.)

Rogosa, a proponent of standardized testing, said that given the “wobble” in scores, it is likely that many students would be misclassified, with potentially disastrous long-term results.

Although testing advocates often contend that scores for an entire classroom or a school are far more accurate, Rogosa says that’s not necessarily so. To register a gain that is statistically significant, he said, the average score for a classroom of 25 students must go up 12 to 15 points.

That calls into serious question ideas such as Davis’ to reward California teachers based on smaller test score gains.

In New York City’s case, officials settled on a precise cutoff to determine students’ summertime fate: the 15th percentile.

Waging a closely watched campaign to boost performance, officials had arbitrarily determined that third-, sixth- and eighth-grade students scoring below that level on the exam in reading or math would be required to enroll in summer school. Eighty-five percent of students in the national norm group would have scored at or above that level.

All told, 34,000 students with scores below the 15th percentile were ordered to attend. After summer school ended, officials learned that more than 8,600 of them, whose reported scores fell a hair below the cutoff, had been assigned in error.

The glitch happened when CTB/McGraw-Hill, based in Monterey, Calif., translated raw scores to show how students compared with a national sample. About 3% of the data used were incorrect.

Of the 8,600, nearly 3,500 either never showed up for summer school or failed a second test that the city used to determine whether summer school students should be promoted. More than 5,100 other students went to summer school and passed the test.

Members of the first group will now be promoted along with the others, according to Karen Crowe, a spokeswoman for the New York City Board of Education and schools Chancellor Rudy Crew. As school got underway, officials were still scrambling to notify all of those affected.

However, she added, families of students who failed the test after attending the summer session will get letters from Crew noting his concern about the students’ progress. In many cases, she suspected, parents, teachers and principals will decide mutually to hold those children back.

CTB/McGraw-Hill President David Taggart, visibly shaken, apologized last week for the blunder.

“Maybe the word ‘apology’ isn’t enough,” he said under the glare of television lights and the forceful questioning of board members. According to Crowe, Taggart vowed that his company would cover the city’s costs for operating summer school for those children and for reprinting citywide test results.

Attempting to put the best face on the situation, New York Mayor Rudolph Giuliani argued that parents of the affected children should be grateful because the students were all faltering academically and got a chance for more schooling at taxpayers’ expense.

Other Criteria for Decisions

Still, the error proved expensive for the city and inconvenient for families, many of whom canceled vacations or scrambled to make unexpected child-care arrangements. The children, meanwhile, were penned up in sauna-like classrooms during a sweltering season.

Under a new policy presented several weeks ago, the city’s students this year will be judged on the basis of grades and attendance as well as on results on citywide tests.

That, test company officials said, would be more in keeping with their view that a single test should not drive key decisions about children’s academic lives.

The Long Beach Unified School District two years ago became one of the first in California to require lagging students to attend summer school or even repeat a grade. But Supt. Carl A. Cohn said the district uses test scores, classroom grades and examples of student work to identify those students.

“For those school systems that took some time to talk about this with the people who would be most affected by it, everybody said do not use a single, standardized test, and we’ve moved to implement it in that fashion,” Cohn said.

Cohn co-chairs a panel designing the formula that will drive the state’s new school accountability system. Unfortunately, he said, that formula rests for now, perilously, on results of the Stanford 9.

“We’re just finding out,” he said, “that there’s an awful lot that can go wrong.”
