The idea has intuitive appeal, but it comes with serious problems.
The largest may be that it's hard to get enough student results to be sure we're seeing a fair representation of teacher impact. The big Sanders studies in Tennessee were, among other things, big studies. The findings reflect data from many teachers and many, many students. Converting the method to give credible information on individual teachers will be a tougher challenge. Back in July, a report from the National Center for Education Evaluation highlighted how big the challenge could be:
The simulations suggest that, if three years of single classroom data per year are available for each teacher and the system aims to identify low performing teachers, then 1 in 4 teachers who are truly average in performance may be erroneously classified as performing significantly below the district average. They also suggest that 1 in 4 teachers whose true performance is lower than the district average by 3 to 4 months of student learning per year may be overlooked. In contrast, with 10 years of classroom data per teacher (or, for example, two years of data for teachers who teach five classrooms per year), the error rates fall to 11 percent. Similar error rates would be observed if the system aims to identify high performing teachers.
And then, here are three more challenges to consider:
• Not all subjects and grades currently have mandated tests. As a result, if teacher effectiveness measures were limited to a score based on teachers’ contributions to student performance on standardized tests, the feedback would exclude the majority of teachers, all of whom have an important role in student learning. This issue could be resolved by additional tests, but tests are resource- and time-intensive (for both students and teachers) and are highly variable in quality.
• Value-added measures, which determine a teacher’s unique contribution to each student’s performance, offer fair comparisons among teachers within a system, but they do not and cannot help teachers understand why one teacher is more successful than another. Teachers with the highest and lowest value-added scores are both left to speculate about what they did to merit their scores. More important, the scores do not suggest what a teacher would have to change to improve his/her effectiveness in the classroom.
• For some teachers, particularly those early in their careers, consequential performance judgments would be made based on the test performance of relatively few students. Though this concern diminishes over time, multiple measures could allow more accurate judgments earlier in a teacher’s career when they could have a significant impact on a teacher’s professional growth.
Those three points are part of the logic behind the Gates Foundation's work on Measuring Effective Teaching, and they are taken from the report on that work, which I discussed earlier in the month.
That's four problems: getting enough data for credible statistics, untested subjects, lack of information on how to improve, and data that comes later than needed.
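The first of those problems, too little data per teacher, can be illustrated with a rough back-of-the-envelope calculation. The Python sketch below is not the method used in the National Center for Education Evaluation report. It simply assumes that a teacher's estimated effect is the average of noisy yearly results, so the noise shrinks with the square root of the number of years. The function name `false_flag_rate` and the values for `noise_sd` and `cutoff` are made up for illustration, so the printed rates should not be read as the report's figures.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def false_flag_rate(years: int, noise_sd: float = 1.0, cutoff: float = -0.4) -> float:
    """Chance that a truly average teacher (true effect 0) is flagged as low performing.

    Assumptions (illustrative only, not taken from the report):
    - each year's result is the true effect plus noise with standard deviation noise_sd
    - the estimate is the average over `years` years of results
    - the system flags any teacher whose estimate falls below `cutoff`
    """
    # Averaging more years shrinks the noise in the estimate.
    standard_error = noise_sd / math.sqrt(years)
    return normal_cdf(cutoff / standard_error)


rates = {years: false_flag_rate(years) for years in (1, 3, 10)}
for years, rate in rates.items():
    print(f"{years:>2} year(s) of data: {rate:.1%} of average teachers flagged")
```

Under these assumptions, the share of average teachers who are wrongly flagged drops as more years of data accumulate, but it does not reach zero. That is the same qualitative pattern the report describes.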
Taken together, those problems show that sound evaluation systems will need to include other kinds of information, taken from observing teachers working with students and with colleagues and from looking at student work beyond the data pulled from state assessment systems.
Value-added data is never going to be the main solution to the challenge of helping each teacher become increasingly effective over the course of his or her professional career.