Scoring of School Tests Found to Be Inaccurate: Education: Officials concede they used only a handful of tests to rank some campuses in statewide exams.

TIMES STAFF WRITERS

State education officials broke their own rules in scoring California’s new achievement tests, counting so few examinations that the scores for hundreds of schools may be wildly inaccurate and the results in half the cases are less precise than was promised, a Times computer analysis shows.

Among the most extreme examples found in The Times study are a San Bernardino County middle school where only 1% of the math tests were counted, and two schools--Vicentia Elementary in Corona and Columbia Elementary in Tuolumne County--where results were based on exactly one student’s work.

At Roosevelt Elementary in Indio, 169 students took the tests, but the state graded only four papers, then reported that everyone there scored at the lowest possible level, showing “little or no mathematical ability.”

“I kept reading (the results) over and over again. . . . I kept trying to turn the bar graph upside-down. It really had me panicked,” Principal Kennedy Rocker said. “It’s not even a snapshot. It’s a very blurry Polaroid with some Vaseline over the lens.”

Short of time and money in the first year of the revolutionary California Learning Assessment System, which 1 million students took last spring at a cost of $15 million, education officials decided to score only a fraction of the tests at each school, promising that results would remain statistically sound.

But when told of The Times analysis, several independent statisticians agreed that although the scores in reading, writing and math may offer a general picture of California students as a whole, sampling error makes them too imprecise to be used for judging most districts or individual schools.

“It appears they lost control,” said Lee Cronbach, a retired Stanford professor of education who has studied testing for 60 years. “(Sampling) is certainly something that can be managed, but in the first run it can go wrong, and it apparently went wrong here.”

State officials acknowledged dozens of sampling problems with the scoring system, and said last week that reporting results on schools where the most extreme snafus occurred was a serious error. They said corrections and apologies would go out to some schools this month, while a new group of students takes the tests with a state promise that this time many more exams will be scored.

Still, education officials strongly defended CLAS, a new approach to testing that evaluates students’ thought processes, as well as their ability to derive correct answers, and measures performance against tough statewide standards.

“When you’re beginning a program as massive as this one, with limited funding, there have to be choices made,” Acting Supt. of Public Instruction William D. Dawson said. “The expectation is that it’s going to be simple, it’s going to be perfect and it’s going to be painless the first time through. That’s impossible. It’s just an absolutely extraordinary accomplishment to have moved as far as we have as constructively as we have.”

Results released last month to schools and the media painted a grim portrait of student achievement across California, with an especially woeful performance in math, where at least a third of the students statewide showed little or no understanding of basic concepts.

In their attempt to obtain valid samples from the 1993 tests, CLAS architects set some guidelines: At least 25% of the exams taken at every school were to be scored. The percentage varied by school, but the state said the number should never fall below 44 tests for fourth grade, 70 for eighth grade and 80 for sophomores. At the smallest schools, all exams were to be counted.
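Those guidelines can be read as a simple floor on sample size. Here is a minimal sketch of that rule in Python, for illustration only: the state did not publish its sampling procedure as code, and the required percentage varied by school (Cahuenga Elementary, for instance, was assigned 46 of 95 papers), so this computes only the stated lower bound.

```python
# Hypothetical sketch of the stated sampling floor -- not the state's code.
# Rule, as described above: score at least 25% of the exams at each school,
# never fewer than the per-grade minimum, and score every exam at the
# smallest schools.

GRADE_MINIMUMS = {4: 44, 8: 70, 10: 80}  # minimum tests to score, per grade

def minimum_sample(tests_taken: int, grade: int) -> int:
    """Smallest sample consistent with the stated guidelines."""
    floor = GRADE_MINIMUMS[grade]
    if tests_taken <= floor:
        return tests_taken           # smallest schools: count every exam
    quarter = -(-tests_taken // 4)   # ceiling of 25% of the tests taken
    return max(floor, quarter)

# Terrace Hills Junior High: 350 eighth-grade math tests taken, 2 scored.
print(minimum_sample(350, 8))  # 88 -- the two actually scored fall far short
```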

However, using the state’s data, The Times analysis found that:

* Overall, the guidelines were broken more than 11,000 times, meaning that in 49.8% of the cases, individual school results were based on smaller samples than the state intended. In about half of those cases, errors were minor, with samples falling one to five tests short of the guidelines. But in about 400 instances, samples were at least 15 tests off.

* The most serious errors, where fewer than 25% of tests were scored, occurred in 148 cases, mostly in elementary schools. In general, the sampling problems were the worst at elementary schools, where the guidelines were broken more often than they were followed.

* On the writing segment, more than 60% of the small schools did not have all their tests scored as the guidelines require.

* In Los Angeles County, there were more than 50 cases--most of them in the Los Angeles Unified School District--of the extreme violation in which fewer than 25% of a school’s tests were scored.

Across the county, the overall sampling guidelines were violated in 53% of the cases. For fourth-graders, samples fell below the standard in 81% of the schools on the writing segment and 70% in reading.

State officials agree that it would be best to avoid sampling and have announced plans to count all tests and report individual results when the CLAS system--which is expected to cost $55 million a year--is fully in place: “Every student, every paper, that’s the plan,” Dawson said.

CLAS Director Dale Carlson said that since the 1993 results were released last month, the contractor that handled the scoring has discovered sampling errors at about 70 schools. But The Times analysis found that in at least twice that number of instances, not even the required minimum of 25% of tests were scored.

Although that magnitude of sampling error directly spoiled the scores at only about 1% of the state’s schools, the imprecise data was factored into district averages and other comparisons used for the 7,000 schools that took the test.

“It was a waste of money,” said Barbara Anderson, principal of Santee Elementary in San Jose, where 90 children took the tests but only nine or 10 were scored in each subject. “It’s a difficult test to give, it’s a hard test for students, and you’d at least like to have some (accurate) results.”

In retrospect, Carlson said, he wishes the state had not publicized data based on so few students’ work in schools such as Anderson’s. He called the situation very disappointing.

“It’s hard to think of anything that can go wrong that didn’t go wrong this year,” Carlson said. “A lot of mistakes were made on a lot of people’s part--most all of them were related to it being the first time.”

The private firm hired to process the tests, CTB Macmillan/McGraw-Hill, is trying to figure out how the errors occurred, Carlson said. So far, officials say the problems resulted from answer sheets getting lost at the contractor’s office, arriving there defaced, or being split into groups that caused processors to believe that fewer students at certain schools took the tests.

Winnie Young, CLAS project director for the company, said the state Department of Education instructed her not to discuss the tests with the media.

Unlike previous standardized tests, which used multiple-choice answers scored by a computer, the CLAS tests include essays, diagrams and explanations that must be judged by people. Last summer, about 2,000 public schoolteachers statewide were paid $100 a day to score a random sample of tests. Overall, 48.7% of the state’s tests were scored, and because most students took three tests, that left 1.5 million answer sheets untouched.
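The arithmetic behind that figure is easy to check. A quick sketch, assuming the story’s round numbers of 1 million test-takers and three answer sheets apiece:

```python
# Back-of-the-envelope check of the unscored-sheet count, using the story's
# round numbers; both inputs are approximations.
students = 1_000_000            # students who took CLAS last spring
sheets = students * 3           # most students took three tests
scored_fraction = 0.487         # 48.7% of tests scored statewide

unscored = sheets * (1 - scored_fraction)
print(f"{unscored:,.0f} answer sheets left unscored")  # 1,539,000
```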

Each graded test was evaluated against the tough new achievement standards and assigned a score of 1 to 6. Then every school, district and county in the state was told what percentage of their students scored at each level.

All test scores contain some degree of “standard error,” a statistical term that measures the level of precision. CLAS results have higher standard error than previous tests because students are given a diverse set of problems and many of their answers are explanatory rather than absolute.

Small samples amplify standard error, in some cases raising it so high that the results have little meaning, said Gary Phillips, associate commissioner of the U.S. Department of Education’s National Center for Education Statistics.
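The textbook standard error of a percentage estimated from a simple random sample is sqrt(p(1 - p)/n), so a shrinking sample inflates the error quickly. A minimal illustration, using a hypothetical 30% figure rather than any state calculation (and omitting the finite-population correction, which would trim these numbers slightly):

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n scored tests."""
    return math.sqrt(p * (1 - p) / n)

# Suppose a school's true share of students at the lowest level is 30%
# (a hypothetical figure). Watch the error grow as the sample shrinks.
for n in (400, 44, 4):
    print(f"n = {n}: standard error = {standard_error(0.30, n):.1%}")
# n = 400: standard error = 2.3%
# n = 44: standard error = 6.9%
# n = 4: standard error = 22.9%
```

With only four papers scored, as at Roosevelt Elementary, the margin of uncertainty can swamp the measurement itself.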

“If you’re going to be giving a test that has high stakes . . . if it’s going to affect the lives of the students, the schools and the state, then you want to have as low a standard error as possible,” Phillips said.

Eva Baker, a UCLA professor of education who runs the Center for Research on Evaluation, Standards and Student Testing, cautioned people against choosing schools or changing curricula based on the test scores.

“I don’t think people should be massively running around moving people about at this point,” Baker said. “High-stakes decisions ought not to be made until they have the kinks worked out.”

Even though educators warn against it, the public typically uses standardized tests as a barometer for judging schools.

Real estate agents carry lists of the scores, encouraging buyers to move into neighborhoods where the numbers are highest. With new state laws allowing open enrollment, public schools compete with one another for students, and many parents see CLAS as the simplest way to measure schools’ success.

So to Steve Simpkins, principal at Canyon Hills Junior High in Chino, the CLAS results are “less than useless.”

“When incorrect scores come out, people remember that first information,” he said. “We’re going to have to do damage control.”

The Los Angeles Unified School District, the state’s largest, included schools with some of the worst sampling mistakes: 51 schools where less than 25% of the tests were scored, 37 of them among fourth-graders, according to The Times analysis.

Throughout the district, the guidelines were broken in some way more than 1,100 times, or 61% of the total.

Statewide at the eighth-grade level, more than half of the schools had fewer than half of their math tests scored. Similarly, fewer than half the sophomore math tests were scored at 49% of California’s high schools.

But the sampling problems were most widespread at the fourth-grade level, where thousands of schools--76% in writing, 65% in reading and 30% in math--had fewer tests scored than promised by the state guidelines.

According to those rules, any school with 49 students or fewer should have had all of its tests scored. But the data shows that about 500 such schools, more than half of the total, did not have 100% counted in the writing or math sections. In reading, more than 400 of these small schools, 48% of the total, did not have 100% of their tests counted.

“That is an unfortunate occurrence,” said Dawson, the state’s top education official.

In writing, 43 schools had less than 25% of their tests scored, while in reading, there were 37 such schools. On the math tests, 21 schools had less than 25% of their tests scored.

Like about half a dozen other schools, Cahuenga Elementary in Los Angeles makes the list in all three categories. At Cahuenga, only eight of 95 papers were graded in each area, instead of 46, as the guidelines require.

“I work so hard. . . . Our test scores have always shown that we were above the state average,” Cahuenga Principal Lloyd Houske said. “To drop to the lowest level--it was terrible.”

At Lerdo Primary in Kern County, 106 students took the exams, but only 12 were scored in reading and 13 in writing (the sampling guidelines call for 46). At Muir Elementary in Fresno, only 25 of 151 writing papers were graded (it should have been 48). At Excelsior Elementary in Garden Grove, the math results are based on five of the 30 students tested (the guidelines require 30).

“What do you get for what you’re paying?” asked Christine Olsen, spokeswoman for a Sacramento County district where five of 72 math tests were scored for one school. “There are a lot of things that are better about CLAS (than previous tests). . . . But (these results are) just flat not accurate.”

The state’s scoring guidelines instructed that “the larger the school, the larger the number of students sampled,” but in fact, some large schools had results based on tiny numbers of students.

About four dozen of the state’s junior high and high schools had less than 25% of their tests counted, breaking the main sampling rule.

Some examples: two of 350 math tests were scored at Terrace Hills Junior High in San Bernardino County, 14 of 360 tests at Sequoia Junior High in Simi Valley, 16 of 305 tests at Tetzlaff Junior High in Cerritos, 41 of 494 tests at Mar Vista Middle School in San Diego County and 74 of 590 tests at Portola Middle School in Tarzana.

“It’s ludicrous,” said Rollin Grider, director of curriculum for the district that includes Terrace Hills. “These results are not valid. We can’t draw any conclusions from (them). They told us they’d get us some new results, so we’re waiting for those. And if they don’t, we’ll scream.”

Another key question educators are asking is: Which students were counted? The state’s sampling was done randomly to ward off bias, but some administrators say that in schools in which the samples were too small, the group often lacked diversity and did not reflect the student population.

Cahuenga Elementary’s student body is one-third Asian American and two-thirds Latino, but the tests that were scored all belonged to students of Asian heritage, Houske said.

The problem was flipped at Santee Elementary in San Jose, where only nine or 10 students were scored on each portion of the exam. Santee is 40% Asian American, but no Asian American students’ tests were graded, according to the principal.

Further, four of the tests graded in each area belonged to students with learning disabilities, although only 3.8% of the students overall have such problems.

“It wasn’t our population. The CLAS test is a better assessment tool than the former tests were, but they need to make sure there’s enough time and energy and money to correct them and give them back to schools so they have correct information,” Principal Anderson said.

“It’s not a nice feeling to know that the public is seeing information like this. They don’t know what we know about it. They don’t know it’s inaccurate.”

O’Reilly is The Times’ director of computer analysis. The story was written by Wilgoren.

Widespread Under-Sampling

State guidelines for scoring CLAS tests at individual schools were violated 49.8% of the time, although in most cases samples were only a few tests short. Here are the percentages of schools in which samples were smaller than planned:

CALIFORNIA

             Reading   Writing   Math   Overall
Grade 4        65%       76%      30%     57%
Grade 8        21%       33%      45%     33%
Grade 10       26%       44%      60%     43%

LOS ANGELES UNIFIED*
Grade 4        79%       88%      42%     70%
Grade 8        18%       18%      69%     35%
Grade 10       28%       47%      60%     45%

LOS ANGELES COUNTY
Grade 4        70%       81%      34%     62%
Grade 8        16%       21%      53%     30%
Grade 10       24%       40%      63%     42%

ORANGE COUNTY
Grade 4        72%       75%      28%     58%
Grade 8        13%       16%      49%     26%
Grade 10        6%       27%      61%     31%

* Largest school district in California

Source: State Department of Education

Researched by RICHARD O’REILLY and JODI WILGOREN / Los Angeles Times

Scoring CLAS

Because educators lacked the time and money to grade every paper, only 48.7% of the nearly 3 million California Learning Assessment System (CLAS) tests were scored statewide. Here is a breakdown by grade and testing area:

             Reading   Writing   Math   Overall
Grade 4        54%       53%      60%     56%
Grade 8        47%       46%      42%     45%
Grade 10       46%       44%      40%     43%

Source: State Department of Education

Researched by RICHARD O’REILLY and JODI WILGOREN / Los Angeles Times
