Scaling Down Inequality
Gender bias in student evaluations of college professors diminishes considerably when the rating scale is changed.
Here’s the paper abstract:
Quantitative performance ratings are ubiquitous in modern organizations—from businesses to universities—yet there is substantial evidence of bias against women in such ratings. This study examines how gender inequalities in evaluations depend on the design of the tools used to judge merit. Exploiting a quasi-natural experiment at a large North American university, we found that the number of scale points used in faculty teaching evaluations—whether instructors were rated on a scale of 6 versus a scale of 10—significantly affected the size of the gender gap in evaluations. A survey experiment, which presented all participants with an identical lecture transcript but randomly varied instructor gender and the number of scale points, replicated this finding and suggested that the number of scale points affects the extent to which gender stereotypes of brilliance are expressed in quantitative ratings. These results highlight how seemingly minor technical aspects of performance ratings can have a major effect on the evaluation of men and women. Our findings thus contribute to a growing body of work on organizational practices that reduce workplace inequalities and the sociological literature on how rating systems—rather than being neutral instruments—shape the distribution of rewards in organizations. [emphases mine]
Drum highlights this explanation from later in the paper:
Drawing from a complementary survey experiment, we show that this effect is not due to gender differences in instructor quality. Rather, it is driven by differences in the cultural meanings and stereotypes raters attach to specific numeric scales. Whereas the top score on a 10-point scale elicited images of exceptional or perfect performance—and, as a result, activated gender stereotypes of brilliance manifest in raters’ hesitation to assign women top scores—the top score on the 6-point scale did not carry such strong performance expectations. Under the 6-point system, evaluators recognized a wider variety of performances—and, critically, performers—as meriting top marks. Consequently, our results show that the structure of rating systems can shape the evaluation of women’s and men’s relative performance and alter the magnitude of gender inequalities in organizations. [emphases presumably Drum’s]
In other words, students viewed a 9 or 10 on a scale of 1-10 as implying true brilliance, and they were reluctant to evaluate female instructors as brilliant. However, a 6 on a scale of 1-6 doesn’t carry the same connotations. Students interpret it as really good, but not necessarily brilliant. Because of that, they were perfectly happy to evaluate the top female instructors with the top evaluation.
Do you believe this? Do I believe it? Beats me. The sample size in the study is large, so that’s not a problem. The switch to a 6-point scale was unrelated to gender concerns, so that’s not an issue. The modeling appears to be reasonable. And the change in results is large. The effect sure seems real, but it’s still anyone’s guess about why the effect is real and why it’s so large. Given my respect for cognitive biases like framing effects, the authors’ explanation seems OK to me, but it’s still a bit of a guess. I’d sure like to hear a few other people weigh in.
The authors both have PhDs in sociology from Harvard and are tenured at top-drawer universities, and the article is forthcoming in what I believe to be the top journal in their field, so I’ll defer to their expert judgment, especially since I’ve only skimmed the article.
One thing that occurs to me about the specific scale (6 versus 10) is that the latter number has a particular connotation that might bring out gender bias in a way the former does not. That is, it has long been a custom for boys and men to rate girls and women on their physical attractiveness using a scale of 1 to 10. I first became aware of this phenomenon in 1979, following publicity for the movie “10” starring Bo Derek, although the practice may well long predate that film. So, it’s quite possible that male students asked to rate a female professor on a scale of 1 to 10 will subconsciously factor in her sexual desirability in a way that they wouldn’t with a male professor. And, on a 6-point scale, that connotation simply wouldn’t be introduced.
That’s pure conjecture, of course, and I’m not sure offhand how one would even go about testing it.