2 Experts Say Stanford 9 Test Has Many Flaws

TIMES EDUCATION WRITER

California has invested millions of dollars to make the Stanford 9 test the foundation of its school reform drive, but two experts--the first to provide an independent assessment of the exam--contend that it is poorly constructed and culturally biased.

The exam, published and copyrighted by Harcourt Educational Measurement of San Antonio, is kept secret so that the same test items can be used year after year with very few changes--an approach that holds down costs but has prevented public scrutiny of the high-stakes examination. School personnel who handle or administer the exam sign affidavits that they will not copy or disseminate the test.

The Times obtained a copy of the sixth-grade exam in April and provided it to the two experts for an assessment. Several other leaders in the testing field declined to analyze the exam, citing a variety of reasons.

With the third annual test results due to be released by the state Monday, the two experts who did peruse the test contend that the Stanford 9 standardized test is not a good measure of whether schools are doing a better job in the classroom.

“The truth is it’s a fundamentally flawed [testing] system,” said W. James Popham, an emeritus professor of education at UCLA.

“Students’ scores are almost certain to be meaningfully contaminated by factors that have little to do with the effectiveness of a teaching staff’s instructional efforts,” he said.

As a result, he said, California should not reward or punish students, teachers and schools strictly on the basis of test scores as the state plans to do.

Robert Schaeffer, public education director for the National Center for Fair and Open Testing, known as Fair Test, a nonprofit advocacy group highly critical of standardized tests, sounded a similar theme. “If the public could look at the content of these exams, people would be appalled at what’s being used to measure educational quality,” he said.

In a statement, Harcourt acknowledged that criticisms of standardized tests have “some merit” but added that they are nothing new.

“There is some merit in the criticisms, but it is limited,” said Eugene T. Paslov, president of Harcourt Educational Measurement.

“The Stanford 9 . . . is one of the finest instruments of this type in the country,” Paslov said. “It is not the only information that is required to determine how well students are doing, but it is a valuable source of information that helps educators make thoughtful decisions about students’ academic progress.”

For Popham, a chief problem with the Stanford 9 is that too many items measure what he would term inherited aptitudes or what children have learned at home, rather than what has been taught at school. Several items that depend on knowledge of computers, he noted, might put children from low-income homes at a disadvantage.

“Too many of the items are apt to be answered correctly more often by youngsters whose families are well off . . . [or] by children whose parents completed higher levels of education,” he said.

Popham is an advocate of assessment but said many standardized achievement tests are being used inappropriately. He tends to dislike “norm-referenced” tests like the Stanford 9 that compare students to a national sample. He prefers “criterion-referenced” tests that grade students on an absolute standard, rather than a curve.

Wording of Questions Criticized

Both Schaeffer and Popham complained of sloppy or trickily worded passages and answers.

In one segment about a legendary Philadelphia retailer, the student reads that the merchant shortened long work hours for his employees, provided medical benefits and started a school for them. It also contains this information:

John Wanamaker introduced many “firsts.” His was the first store to be lit by electricity rather than gas. It had the first mail-order catalog, first full-page ad in a newspaper and first restaurant in a store. It was the first to total one million dollars in sales in one day.

[A subsequent store] was the first to be fully equipped with telephones and protected by a fire alarm system.

Here is a question posed to students:

John was the first store owner to--

F. offer a lay-away plan

G. pay his workers weekly

H. sell a variety of things

J. give his workers medical benefits

Arguably none of the responses is correct based on the text, Popham and Schaeffer agreed.

Another item, in a section of questions that is supposed to test students on their knowledge of California reading standards, also troubled both experts.

An “application” for a community library card specifies which materials may be checked out and for how long. Among other details, it states:

Videotapes and magazines may not be renewed; a book may be renewed once if another borrower has not requested it. . . . Certain reference materials, such as encyclopedias, may not be taken from the library.

Here is a related question:

With regard to the library’s lending policies, which of these statements is not true?

F. A book may be renewed once if no one is waiting for it.

G. An encyclopedia may not be checked out at all.

H. A magazine may be renewed as many times as the borrower needs it.

J. A videotape may not be renewed.

At the least, Schaeffer said, “the reader has to wade through complicated double negatives to reject” two incorrect answers (G and J). The correct answer (H) is clear enough, assuming the student grasps the concept of flagging an answer that is not true.

Issues Raised About 325 Items on Test

Both Popham and Schaeffer raised questions about dozens of the 325 items on the test. The language section, Popham said, “simply oozes cultural bias.” Both noted that the spelling section is more a proofreading test, requiring students to single out which option is spelled incorrectly.

Many math items, they said, depend on knowledge of sometimes obscure terms, and many involve a great deal of reading, hindering children who lack English fluency.

Given such weaknesses, both men said California is misguided in using Stanford 9 results to gauge educational quality and make high-stakes decisions.

As of now, nearly $1 billion in bonuses hang in the balance for teachers, students and principals, even janitors, at schools that improve by a prescribed amount. Some students who score poorly will be at risk of being held back; pupils who do well could qualify for scholarships, regardless of financial need. In extreme cases, some low-performing schools could risk a takeover by the state and principals could lose their jobs.

Scott Hill, chief deputy superintendent over accountability with the California Department of Education, says that too much weight has been attached to the Stanford 9. But he emphasized that the state is working to develop a better assessment tool and a more wide-ranging accountability system.

One key to that will be improving the portions of the Stanford 9 that assess how well students are learning material specified in California’s new academic standards. Some standards-based questions are included now, but state officials acknowledged that those portions are not yet reliable.

Meantime, Hill said, “we’re using the [Stanford 9] to the best of our and its ability to fulfill state policy.”

For now, the state is expected to announce Monday that overall scores improved.

Many districts have reported substantial gains over last year’s scores. Alhambra, which saw sizable jumps almost across the board, credited an emphasis on teaching the state’s prescribed curriculum.

ABC Unified School District in Cerritos attributed significant gains at many of its schools to focused attention on reading, particularly at traditionally low-performing schools.

Still, even such good news is certain to generate debate. Do the gains result from growing familiarity with the exam? Or are instructors doing a better job of teaching and students of mastering the basic skills that the exam aims to assess?

More cynically, do increases reflect a penchant by teachers to “teach to the test”--or even to cheat--that results from heaping extraordinarily high stakes, such as cash bonuses, onto an assessment tool that was not designed to handle them?

Stanford 9 scores are to be posted at noon Monday on the California Department of Education Web site, https://www.cde.ca.gov.

*

Times researcher Maloy Moore contributed to this story.

(BEGIN TEXT OF INFOBOX / INFOGRAPHIC)

Problematic Test

Two test experts who analyzed a copy of the sixth-grade Stanford 9 exam for The Times said that it was flawed for a variety of reasons. Here are three questions from the test and how the experts rated them:
