One of the things I mentioned to commenters in this post is that probability theory (and statistics) can sometimes lead to counter-intuitive results. One example is Simpson’s Paradox: the reversal of the direction of a comparison or association when data from several groups are combined into a single group (source). So how does it work? Here is a simple example:
A firm has opened a new plant and needs to fill 455 jobs: 70 management openings and 385 blue collar openings. 200 women apply for the management positions and 100 women apply for the blue collar positions; 20% and 85% of them, respectively, are hired. For the men, 200 apply for the management positions and 400 for the blue collar positions, and the respective hiring rates are 15% and 75%. Looks pretty good from a diversity standpoint, no?
Well, it turns out that a little over 58% of the women applying for jobs at that plant were turned away (175 of 300), whereas only 45% of the men were (270 of 600). So clearly the company is discriminatory.
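The arithmetic behind those two views of the same data can be checked with a few lines of Python. This is only a sketch: the figures are the ones from the example, while the dictionary layout and the function name rejection_rate are my own choices for illustration.

```python
# Applicants and hire rates from the example; the layout is illustrative.
applicants = {
    "women": {"management": 200, "blue_collar": 100},
    "men": {"management": 200, "blue_collar": 400},
}
hire_rate = {
    "women": {"management": 0.20, "blue_collar": 0.85},
    "men": {"management": 0.15, "blue_collar": 0.75},
}


def rejection_rate(group):
    """Share of a group's applicants turned away across both job types."""
    total = sum(applicants[group].values())
    hired = sum(
        applicants[group][job] * hire_rate[group][job]
        for job in applicants[group]
    )
    return 1 - hired / total


for group in applicants:
    print(f"{group}: {rejection_rate(group):.1%} turned away")
# women: 58.3% turned away
# men: 45.0% turned away
```

Within each job type the women are hired at the higher rate, yet the combined figure points the other way.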
What is going on here is Simpson’s Paradox. Because so many more men applied for the blue collar jobs, where most applicants get hired, while two thirds of the women applied for the scarce management jobs, the firm was pretty much going to come out looking bad unless it did something like hire every woman applying for a blue collar job and give women roughly 74% of its management positions (about 52 of the 70), which is what it would take to equalize the overall hiring rates. This is one reason why people have to be careful when looking at statistics. What looks good or bad at one level of aggregation can change at another. Similar things have been found for women’s salaries. We have all heard the phrase “lying with statistics”; well, this is how it is often accomplished. This is why I like to get data in as disaggregated a form as possible. So keep this in mind the next time you see comparisons made at higher levels of aggregation, such as men’s salaries vs. women’s salaries.
(Source for the example above.)
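To see how much of the aggregate gap comes from who applied for what, it helps to write each group's overall hiring rate as a weighted average of its per-job rates, with the weights being the share of that group's applications going to each job type. The sketch below does that and then adds a hypothetical scenario that is not in the original example: the women's hiring rates applied to the men's application mix. The names mix and overall_hire_rate are mine.

```python
# Hire rates from the example; the "men's mix" scenario is hypothetical.
hire_rate = {
    "women": {"management": 0.20, "blue_collar": 0.85},
    "men": {"management": 0.15, "blue_collar": 0.75},
}
# Share of each group's applications going to each job type.
mix = {
    "women": {"management": 200 / 300, "blue_collar": 100 / 300},
    "men": {"management": 200 / 600, "blue_collar": 400 / 600},
}


def overall_hire_rate(rates, weights):
    """Aggregate hire rate as a weighted average of per-job hire rates."""
    return sum(rates[job] * weights[job] for job in rates)


actual_women = overall_hire_rate(hire_rate["women"], mix["women"])
actual_men = overall_hire_rate(hire_rate["men"], mix["men"])
# Counterfactual: women's hire rates combined with the men's application mix.
women_with_mens_mix = overall_hire_rate(hire_rate["women"], mix["men"])

print(f"women, actual mix: {actual_women:.1%} hired")        # 41.7%
print(f"men, actual mix:   {actual_men:.1%} hired")          # 55.0%
print(f"women, men's mix:  {women_with_mens_mix:.1%} hired")  # 63.3%
```

Holding the application mix fixed, the women come out ahead, which is what the per-job rates said all along; the reversal is produced entirely by the weights.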