Friday, October 22, 2010

Limits of value-added teacher evaluations

In the debate over how (and whether) to use student test scores in evaluating teachers, the "value-added" approach calls for looking not simply at how students score, but at how much they improve from year to year--at what value having a particular teacher seems to add to a student's academic performance.

The idea has some intuitive appeal, but it comes with some serious problems.

The largest may be that it's hard to get enough student results to be sure we're seeing a fair representation of a teacher's impact. The big Sanders studies in Tennessee were, among other things, big: their findings reflect data from many teachers and many, many students. Converting the method to give credible information about individual teachers is a tougher challenge. Back in July, a report from the National Center for Education Evaluation highlighted just how big that challenge could be:
The simulations suggest that, if three years of single classroom data per year are available for each teacher and the system aims to identify low performing teachers, then 1 in 4 teachers who are truly average in performance may be erroneously classified as performing significantly below the district average. They also suggest that 1 in 4 teachers whose true performance is lower than the district average by 3 to 4 months of student learning per year may be overlooked. In contrast, with 10 years of classroom data per teacher (or, for example, two years of data for teachers who teach five classrooms per year), the error rates fall to 11 percent. Similar error rates would be observed if the system aims to identify high performing teachers.
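The intuition behind those error rates is straightforward: a teacher's measured effect in any one year is a noisy estimate, and averaging more years shrinks the noise. Here's a minimal Monte Carlo sketch of that idea. It is not the NCEE's model; the noise level, cutoff, and teacher counts are invented for illustration, and it only shows the direction of the effect (more years of data, fewer truly average teachers falsely flagged as below average):

```python
import random
import statistics

def false_flag_rate(years, n_teachers=10000, noise_sd=1.0, seed=0):
    """Fraction of truly AVERAGE teachers (true effect = 0) whose
    multi-year average estimate falls below a fixed cutoff, purely
    because of year-to-year classroom noise.

    Illustrative only: noise_sd and the cutoff are hypothetical
    values, not parameters from the NCEE report.
    """
    rng = random.Random(seed)
    cutoff = -0.5  # hypothetical "significantly below average" line
    flagged = 0
    for _ in range(n_teachers):
        # One noisy value-added estimate per year, true effect = 0.
        yearly_estimates = [rng.gauss(0.0, noise_sd) for _ in range(years)]
        if statistics.mean(yearly_estimates) < cutoff:
            flagged += 1
    return flagged / n_teachers

# Averaging over more years pulls estimates toward the truth,
# so fewer average teachers land below the cutoff by chance.
few_years = false_flag_rate(years=3)
many_years = false_flag_rate(years=10)
print(few_years, many_years)
```

With these made-up numbers the 3-year rate comes out well above the 10-year rate, mirroring the report's pattern (roughly 1 in 4 versus 11 percent), though the exact figures depend entirely on the assumed noise and cutoff.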
And then, here are three more challenges to consider:
• Not all subjects and grades currently have mandated tests. As a result, if teacher effectiveness measures were limited to a score based on teachers' contributions to student performance on standardized tests, the feedback would exclude the majority of teachers—all of whom have an important role in student learning. This issue could be resolved by additional tests, but tests are resource- and time-intensive (for both students and teachers) and are highly variable in quality.
• Value-added measures, which determine a teacher's unique contribution to each student's performance, offer fair comparisons among teachers within a system, but they do not and cannot help teachers understand why one teacher is more successful than another. Teachers with the highest and lowest value-added scores are both left to speculate about what they did to merit their scores. More important, the scores do not suggest what a teacher would have to change to improve his/her effectiveness in the classroom.
• For some teachers, particularly those early in their careers, consequential performance judgments would be made based on the test performance of relatively few students. Though this concern diminishes over time, multiple measures could allow more accurate judgments earlier in a teacher's career when they could have a significant impact on a teacher's professional growth.
Those three points are part of the logic behind the Gates Foundation's work on Measuring Effective Teaching, and are taken from the report on that work that I discussed earlier this month.

That's four problems: getting enough data for credible statistics, untested subjects and grades, no information on how to improve, and data that arrives later than it's needed.

Taken together, those problems show that sound evaluation systems will need to include other kinds of information: observations of teachers working with students and with colleagues, and examination of student work, beyond the data pulled from state assessment systems.

Value-added data is never going to be the main solution to the challenge of helping each teacher become increasingly effective over the course of his or her professional career.

1 comment:

  1. Hi Susan,

    Your critiques of value added are accurate and fair. However, it's important to note that correctly capturing student-to-teacher alignment (which most districts and states don't do very well!), setting a high level of statistical sensitivity for when value added makes an inference of teacher quality, and using multiple years of analysis in addition to multiple years of student data can all allow the value-added method to make a much more accurate inference than some critiques have claimed.

    That said, your point that VAM should never be the MAIN solution for either determining or improving teacher quality is right on. To be fair to those who advocate for the utility of the value-added model in interpreting data and improving instruction (myself included!), I can't think of anyone who says it should be the sole and final determinant or source of information. It should always be used in the context of multiple measures and a coherent system of professional improvement.

    Even where value added has the capacity to provide some limited information on instruction, you are also right that it only works for tested subjects and grades, which in most systems cover only 30%-40% of teachers.

    This calls for us to be more creative, inclusive, and exploratory of other ways of setting high levels of rigor/quality for our students and measuring progress toward those marks.

    VAM is important and can be incredibly useful. But it's limited in several important ways, as you have highlighted.

    Thanks much for this piece and have a great weekend. Go Cats!

    Jason Glass


Updates and data on Kentucky education!