Monday, August 23, 2010

Value-added measurements: not as easy as it sounds

The biggest education story recently has been the L.A. Times reporting on how student test scores do and do not improve in the classrooms of "more than 6,000 third- through fifth-grade teachers for whom reliable data were available." The report used "value added analysis" in which "each student's performance is compared with his or her own in past years, which largely controls for outside influences often blamed for academic failure: poverty, prior learning and other factors." The article discussed four teachers by name and linked to a database showing results on the others, again by name.
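The core idea can be sketched in a few lines of code. This is only an illustration of the "compared with his or her own past performance" logic, not the Times' actual model, which used a more elaborate regression; the record format and variable names here are my own assumptions.

```python
# Minimal sketch of the value-added idea (illustration only, not the
# Times' actual model): predict each student's current score from that
# student's prior-year score, then credit or debit the teacher with the
# average residual of his or her class.
from collections import defaultdict

def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.
    Returns each teacher's average gain relative to the overall trend."""
    n = len(records)
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    mp = sum(priors) / n
    mc = sum(currents) / n
    # Simple least-squares slope of current score on prior score
    cov = sum((p - mp) * (c - mc) for _, p, c in records) / n
    var = sum((p - mp) ** 2 for p in priors) / n
    slope = cov / var
    # Each student's residual: actual score minus predicted score
    residuals = defaultdict(list)
    for teacher, p, c in records:
        predicted = mc + slope * (p - mp)
        residuals[teacher].append(c - predicted)
    # A teacher's "value added" estimate is the class's mean residual
    return {t: sum(r) / len(r) for t, r in residuals.items()}
```

A teacher whose students consistently beat their own prior-score predictions gets a positive estimate; one whose students fall short gets a negative one.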

After a week of letting the issue simmer in my thinking, I have some thoughts about the method.

First, the analyses are worth creating. This kind of data on student progress could be quite useful for professional development, first for individual teachers' individual reflection, and second for individual conversations with supervisors about next steps in professional development.  Separate from evaluation decisions, tenure decisions, promotion decisions, intervention decisions, and removal decisions, the information could help teachers improve their craft and help school leaders support teachers in making those improvements.

A recent national study reported that:
If only three years of data are used for estimation (the amount of data typically used in practice), Type I and II errors for teacher-level analyses will be about 26 percent each. This means that in a typical performance measurement system, 1 in 4 teachers who are truly average in performance will be erroneously identified for special treatment, and 1 in 4 teachers who differ from average performance by 3 to 4 months of student learning will be overlooked. 
Most elementary teachers have 25 or fewer students. Middle school teachers often have more students, but shorter periods for working with each one.  Given those small numbers, statistical weakness should not come as a surprise.
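A small simulation shows why those error rates happen. In the sketch below, every teacher is truly average (a true effect of zero), yet classes of 25 regularly look notably above or below average just from student-level noise. The spread of student gains and the flagging cutoff are assumed illustrative numbers, not the cited study's actual parameters.

```python
# Illustrative simulation (not the cited study's model): with a true
# teacher effect of zero, how often does a class of 25 look clearly
# above or below average purely from student-level noise?
import random

random.seed(1)
CLASS_SIZE = 25
STUDENT_SD = 10.0   # assumed spread of individual student gains (score points)
CUTOFF = 3.0        # assumed threshold for flagging a class mean as unusual
TRIALS = 10_000

flagged = 0
for _ in range(TRIALS):
    # Every simulated teacher is truly average: class gains are pure noise
    gains = [random.gauss(0.0, STUDENT_SD) for _ in range(CLASS_SIZE)]
    class_mean = sum(gains) / CLASS_SIZE
    if abs(class_mean) > CUTOFF:
        flagged += 1

print(f"Truly average teachers flagged: {flagged / TRIALS:.0%}")
```

With these assumed numbers, roughly one truly average teacher in eight gets flagged in a single year; averaging several years of data shrinks the noise, but as the national study reports, even three years leaves error rates around 26 percent for distinguishing finer differences.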

The value-added approach requires "before" and "after" test scores. Using Kentucky's current state assessments, only reading and mathematics teachers in grades four through eight could be analyzed this way. In science, history, and other subjects, we do not test every year. Even in reading and math, we do not have grade-to-grade results we can use for teachers in third grade and earlier or in ninth grade or higher.  Future testing could provide additional data and options--but those approaches are in the design phase and their funding is in limbo.

In current discussions, many educators are clearly very wary of value-added analysis.  Some of the wariness rests on specific evidence, including examples of plans that ignore the best methods available and information on the uncertainties that remain even when the best methods are applied. Some comes from wanting to hear the details before endorsing any approach. Some does, frankly, seem to come from general uneasiness with change, and at least a little comes from resistance to responsibility.

The thing is, if teachers do not trust and respect an evaluation system, it will not work. That is, an evaluation system without substantial buy-in will not lead to increases in teaching quality that in turn raise student performance.  Accordingly, any proposal for using value-added data in evaluations has to meet two standards: technical soundness and teacher acceptance.

As I read the current discussion, that teacher acceptance will have to be earned through careful listening, careful responses, careful design, careful explanations of the design, and very careful implementation of any design that gets adopted.  That is the kind of work Commissioner Holliday has requested from the task forces now looking at Kentucky evaluation methods, and I'm honored to be participating in those efforts.

If the reports are to be created on individual teachers, I think they should be confidential rather than public documents.

It is one thing to imagine a principal using the value-added analysis in the context of observing the teacher in the classroom, discussing the data with the teacher, offering the teacher support to improve, and checking whether the teacher uses that support--and doing all of that based on careful training in a system that allows appeals and reviews of the principal's judgments.  I think it is possible to do that in a way that delivers for students and is fair to teachers.

It is quite another to imagine parents pulling the analysis from a website, separated from all other information and controls.  In that second model, there is a huge opportunity for good teachers to have their skill and effort devalued.   I do not see a way to make that fair to teachers.  Further, trading off parents' knowledge against teachers' concerns, I do not see a way it nets out to helping improve student performance.

I can add that I think public release of individual teacher data will be politically impossible.  Educators will fight that part fiercely enough to win.  Even as I add that point, I do not want that political calculus to be the main argument.  The main argument should be that we cannot make it fair and we cannot make it helpful to students.


  1. As Commissioner Holliday said at the TEK talk, “what gets measured gets done.” Our assessment system needs to measure individual student growth and that information must be given to parents if they are to be effective partners in improving student growth. You mention limiting information on individual teachers to confidential documents just for the principal. That would cut the SBDM out of an important part of their responsibility to improve student achievement. I’m very disappointed to read this analysis and it seems like we’re setting ourselves up to miss out on another valuable opportunity at RTTT funding in the future because of a lack of vision.

  2. Anon,

    I'm glad to hear your different opinion on the issue. Quick factual note: School councils have never participated in the evaluation of individual teachers or in decisions about renewing, tenuring, disciplining, or dismissing individual teachers.

  3. A proficient school council keeps its focus on student achievement. If teacher A’s students have high growth and teacher B’s have low growth, an effective SBDM needs to be asking why this is happening. As you pointed out, this does not mean they need to be renewing, tenuring, disciplining, or dismissing individual teachers. If student growth data is withheld from the SBDM, you’re tying one hand behind their back.

