Student Evaluations and Teacher Performance

A new study seems to show that student evaluations of teachers are something other than a popularity contest.  Sam Dillon for NYT:

How useful are the views of public school students about their teachers? Quite useful, according to preliminary results released on Friday from a $45 million research project that is intended to find new ways of distinguishing good teachers from bad.

Teachers whose students described them as skillful at maintaining classroom order, at focusing their instruction and at helping their charges learn from their mistakes are often the same teachers whose students learn the most in the course of a year, as measured by gains on standardized test scores, according to a progress report on the research.

Financed by the Bill and Melinda Gates Foundation, the two-year project involves scores of social scientists and some 3,000 teachers and their students in Charlotte, N.C.; Dallas; Denver; Hillsborough County, Fla., which includes Tampa; Memphis; New York; and Pittsburgh.  The research is part of the $335 million Gates Foundation effort to overhaul the personnel systems in those districts.

Statisticians began the effort last year by ranking all the teachers using a statistical method known as value-added modeling, which calculates how much each teacher has helped students learn based on changes in test scores from year to year.   Now researchers are looking for correlations between the value-added rankings and other measures of teacher effectiveness.

Research centering on surveys of students’ perceptions has produced some clear early results.  Thousands of students have filled out confidential questionnaires about the learning environment that their teachers create. After comparing the students’ ratings with teachers’ value-added scores, researchers have concluded that there is quite a bit of agreement.  Classrooms where a majority of students said they agreed with the statement, “Our class stays busy and doesn’t waste time,” tended to be led by teachers with high value-added scores, the report said.  The same was true for teachers whose students agreed with the statements, “In this class, we learn to correct our mistakes,” and, “My teacher has several good ways to explain each topic that we cover in this class.”

I’m quite skeptical of these findings. That’s not surprising, in that I’ve developed rather strong opinions on teaching evaluations from personal experience with them as a college professor. My own ratings in survey courses were all over the place, sometimes wildly different across sections of the same class in a given semester, with open-format questions yielding precisely opposite conclusions. In the upper sections, populated mostly by political science majors, my evaluations followed a curve nearly identical to the student grades.

Granted, that’s anecdotal, with a very small “n.” And maybe elementary school students are more insightful and honest than their college peers.

But three other questions jump immediately out at me.

First, in which direction does the causality flow? Maybe the type of student who performs well on tests is simply more likely to give the teacher credit, rather than the teaching style driving both the ratings and the test performance.

Second, how do the correlations hold up longitudinally? That is, do the same teachers get similar ratings and test results every year? If not, the findings may simply reflect a teacher drawing a randomly generous allocation of bright, well-behaved students in a given rating period, which is then mistaken for a sign of good teaching.

Third, what other variables are at work here? From a scan of the PDF of the Gates Foundation report, it appears that the research team proceeded as if teaching styles and test outcomes were the only factors. But we know that parental involvement, parental education, home life, child nutrition, and all manner of other things really, really matter. Maybe the “good” teachers mostly taught in wealthy schools with involved parents who ensured their kids went off to school with a good breakfast, a decent night’s sleep, and their homework mastered. If there’s no control for this sort of thing, the study is completely worthless.

FILED UNDER: Environment, Uncategorized
James Joyner
About James Joyner
James Joyner is Professor and Department Head of Security Studies at Marine Corps University's Command and Staff College. He's a former Army officer and Desert Storm veteran. Views expressed here are his own. Follow James on Twitter @DrJJoyner.


  1. john personna says:

    It sounds like they asked the right questions. Rather than “how do you like this teacher” it was “how does this teacher maintain order,” etc.

    Now, all normal human asymmetries probably apply. Taller and better looking teachers probably have an easier time maintaining order, etc.

  2. The idea of expecting anything substantive out of a survey of elementary schoolchildren’s opinions about their teachers strikes me as pretty silly, actually.

  3. just me says:

    I am not sure I agree with you.

    I think there is a difference between a college student evaluation and elementary-aged kids, at least in what they may value in a teacher.

    I suspect classroom management isn’t really an issue with college-aged kids. Most college students are there to learn and aren’t going to require management and behavioral cues and redirections in the same sense that elementary-aged kids do. A college professor doesn’t need a time-out chair in the classroom; a 4th grade teacher generally does. So college students are going to be looking for different things in a teacher.

    My girls will tell you that their least favorite teachers are those who had poor control of their classroom and I know as well that poor classroom management generally means taking time away from learning.

    I also think college isn’t generally a busy-work kind of setting; elementary school and middle school often are (whether that is a good or bad thing is debatable).

    So I am actually not surprised that kids who viewed their classrooms positively also did well on tests. A well-managed classroom is generally going to lead to more learning opportunities and fewer distractions.

  4. just me says:

    The idea of expecting anything substantive out of a survey of elementary schoolchildren’s opinions about their teachers strikes me as pretty silly, actually.

    Not sure I agree either.

    My kids could have told you which teachers they liked and why, and I know my daughter loathed her 5th grade Language Arts teacher because she had no control of the classroom. She felt like she wasn’t learning anything.

    I do think there are always going to be the students who won’t take a survey seriously, but if the questions are constructed well, I think you could get something substantive. At the very least any kid can tell you whether or not the teacher maintains order of the classroom.

  5. michael reynolds says:

    So we use surveys and test results to test the effectiveness of teachers who teach to the tests. The only thing missing is a test of the test which the test-teaching teachers test on. Well, that and a test of the test-writers who create both the tests and the tests of the test. Or would a survey of test questions asked by teachers who’ve been surveyed be better?

    I know this guy whose SAT math score was 85th percentile but had never gone beyond 10th grade geometry (where he earned a C but deserved an F) and literally cannot multiply fractions, let alone perform algebra or even define the meaning of the word “calculus.” In fact I’ve known this guy literally my whole life.

  6. Trumwill says:

    Questioning causality is smart, but even so… I think I am with Just Me. Up to about maybe high school, I think that my internal evaluations were actually pretty accurate. I look back at the teachers I would have given low grades back then and… I would still give them low grades. Same with high grades.

    It’s about high school that there starts to be a divergence. Some of the high school teachers that I really didn’t like at the time actually get high marks from me now and vice versa. And at the college level, it’s even more so. I think it’s a combination of diverging priorities (as you get closer to college, getting better grades becomes more important and so you find reasons not to like teachers from whom you don’t get a good grade) and increased specialization (I happen to not like most of my science teachers. Not coincidentally, I am uninterested in science).

  7. Zelsdorf Ragshaft III says:

    I am shocked Dr. Joyner and Mataconis disagree with the idea that evaluation of teachers by students could be both accurate and useful. Actually there is a solid way to figure out who is a teacher and who is drawing a paycheck. Those that grade on a curve are drawing a paycheck. If you get less than 60% correct, you fail.

  8. Grewgills says:

    Speaking as someone who has taught middle school (never again), high school, and college, and subbed at all levels, I think the questions about classroom management and wasted time are likely to be answered accurately by most elementary and high school students, as are questions about fairness. They will still complain about teachers they find hard, but will generally give accurate responses on those questions. I choose to believe the same about college students because, in my relatively limited experience, I have been given good reviews despite few students earning As.