[Photo caption: John Deasy, incoming superintendent of schools in Los Angeles, says of the change: “We are not questing for perfect. We are questing for much better.” (Irfan Khan, Los Angeles Times / March 28, 2011)]
In Houston, school district officials introduced a test score-based evaluation system to determine teacher bonuses, then — in the face of massive protests — jettisoned the formula after one year to devise a better one.

In New York, teachers union officials are fighting the public release of ratings for more than 12,000 teachers, arguing that the estimates can be drastically wrong.

Despite such controversies, Los Angeles school district leaders are poised to plunge ahead with their own confidential "value-added" ratings this spring, saying the approach is far more objective and accurate than any other evaluation tool available.

"We are not questing for perfect," said L.A. Unified's incoming Supt. John Deasy. "We are questing for much better."

As value-added analysis is adopted — if not embraced — across the country, much of the debate has focused on its underlying mathematical formulas and their daunting complexity.

All value-added methods aim to estimate a teacher's effectiveness in raising students' standardized test scores. But there is no universal agreement on which formula can most accurately isolate a teacher's influence from other factors that affect student learning — and different formulas produce different results.

Nor is there widespread agreement about how much the resulting ratings should count. Tensions are all the greater because the stakes for teachers are high as more districts consider using the evolving science as a factor in hiring, firing, promotions, tenure and pay.

"It is too unreliable when you're talking about messing with someone's career," said Gayle Fallon, president of the Houston Federation of Teachers.

She said many teachers don't understand the calculations. The general formula for the "linear mixed model" used in her district is a string of symbols and letters more than 80 characters long:

y = Xβ + Zv + ε

where β is a p-by-1 vector of fixed effects; X is an n-by-p matrix; v is a q-by-1 vector of random effects; Z is an n-by-q matrix; E(v) = 0, Var(v) = G; E(ε) = 0, Var(ε) = R; and Cov(v, ε) = 0. The covariance of the scores is then V = Var(y) = Var(y − Xβ) = Var(Zv + ε) = ZGZᵀ + R.

"It's doctorate-level math," Fallon said.
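The identity at the end of the formula, V = ZGZᵀ + R, can be checked numerically. The sketch below builds small, made-up X, Z, G and R matrices (the dimensions follow the article's definitions; all values are illustrative, not Houston's actual model) and confirms by simulation that the covariance of y matches ZGZᵀ + R.

```python
import numpy as np

# Illustrative instance of the linear mixed model y = X*beta + Z*v + eps.
# Per the article's definitions: beta is p-by-1 (fixed effects), v is q-by-1
# (random effects), X is n-by-p, Z is n-by-q. All numbers here are made up.
rng = np.random.default_rng(0)
n, p, q = 6, 2, 3

X = rng.normal(size=(n, p))     # fixed-effects design matrix
Z = rng.normal(size=(n, q))     # random-effects design matrix
beta = np.array([1.0, -0.5])    # fixed effects (illustrative values)

G = 0.4 * np.eye(q)             # Var(v) = G
R = 0.25 * np.eye(n)            # Var(eps) = R

# The article's final identity: V = Var(y) = Z G Z^T + R
V = Z @ G @ Z.T + R

# Monte Carlo check: simulate many draws of y, compare sample covariance to V
draws = 200_000
v = rng.multivariate_normal(np.zeros(q), G, size=draws)
eps = rng.multivariate_normal(np.zeros(n), R, size=draws)
y = (X @ beta) + v @ Z.T + eps

V_hat = np.cov(y, rowvar=False)
print(np.max(np.abs(V_hat - V)))  # small: simulation agrees with Z G Z^T + R
```

Since v and ε are uncorrelated with mean zero, the fixed part Xβ only shifts y, and the covariance comes entirely from Zv + ε, which is exactly what the simulated covariance reproduces.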

In essence, value-added analysis involves looking at each student's past test scores to predict future scores. The difference between the prediction and students' actual scores each year is the estimated "value" that the teacher added — or subtracted.
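The paragraph above describes the core computation. A minimal sketch of that idea, with entirely made-up scores and a simple one-variable prediction (real district models use the far richer mixed-model machinery shown earlier), looks like this:

```python
import numpy as np

# Toy illustration of value-added analysis: predict each student's current
# score from last year's score, then average the prediction errors
# (residuals) by teacher. All data below are simulated, not real scores.
rng = np.random.default_rng(1)

n_students = 300
prior = rng.normal(600, 50, n_students)      # last year's test scores
teacher = rng.integers(0, 3, n_students)     # each student's teacher (0-2)
true_effect = np.array([-5.0, 0.0, 5.0])     # hidden "true" teacher effects
current = (50 + 0.92 * prior
           + true_effect[teacher]
           + rng.normal(0, 10, n_students))  # this year's scores, with noise

# Predict current scores from prior scores (ordinary least squares line)
slope, intercept = np.polyfit(prior, current, 1)
predicted = intercept + slope * prior
residual = current - predicted               # actual minus predicted

# A teacher's value-added estimate: the mean residual of their students
for t in range(3):
    print(f"teacher {t}: value-added estimate {residual[teacher == t].mean():+.1f}")
```

With enough students the estimates recover the hidden teacher effects, which is the method's appeal; the debates in the article concern what happens when classes are small, students move, or factors like language and poverty are left out of the prediction.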

The Times released a value-added analysis of about 6,000 L.A. Unified elementary school teachers in August that was based on district data. Before school ends, L.A. Unified plans to release its own analysis, confidentially providing teachers with their individual value-added scores. For at least the first year, the teachers' scores will not be used in formal evaluations; whether they are ultimately used is subject to negotiation with the union.

Deasy and many others argue that value-added analysis is far more useful than the common practice of dispatching administrators to classrooms, where they often make pro forma observations. These reviews overwhelmingly result in "satisfactory" ratings, which may or may not be deserved.

In designing its model, the nation's second-largest school district has wrestled with myriad questions: whether to tweak the model to account for the students in a class who don't speak fluent English, for example, or for those who moved from one school to another during the academic year.

Should value-added models take student race and poverty into account, even if it means having lower expectations for some races and higher ones for others?

Deasy said these were among the most difficult questions the district grappled with. Theoretically, value-added models inherently account for these differences, because each student's performance is compared each year with the same student's performance in the past, not with the work of other students. But many experts say further statistical adjustments are necessary to improve accuracy.

A 2010 study of 3,500 students and 250 teachers in six Bay Area high schools by researchers at Stanford University and UC Berkeley found that, under their model, teachers with more African American and Latino students tended to receive lower value-added scores than those with more Asian students.

Dan Goldhaber, a professor at the University of Washington Bothell, said that there is no definitive answer on the race question but that most specialists in the field support factoring it in because research overwhelmingly shows that it is correlated with student performance.