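Let’s consider the following table (patients who had a heart attack, out of the patients in each group):

                Control group        Treatment group
Female          1 of 20 (5%)         3 of 40 (7.5%)
Male            12 of 40 (30%)       8 of 20 (40%)
Total           13 of 60 (21.7%)     11 of 60 (18.3%)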

This is not a randomized experiment: the data are observational.

By looking at the data, you can observe the following:

• Females in the control group have a 5% chance of having a heart attack, versus 7.5% in the treatment group. (Taking the drug increases heart attack risk for females.)
• Males in the control group have a 30% chance of having a heart attack, versus 40% in the treatment group. (Taking the drug increases heart attack risk for males.)
• Overall, people in the control group have a 21.7% chance of having a heart attack, versus 18.3% in the treatment group. (Taking the drug decreases heart attack risk overall.)

How can this be? The drug is bad for females and bad for males, but good for people as a whole?

If you think my arithmetic is wrong, you are probably confused by how fractions behave when they are aggregated. It is possible that A/B > a/b and C/D > c/d, yet (A+C)/(B+D) < (a+c)/(b+d). Here, 3/40 > 1/20 for females and 8/20 > 12/40 for males, yet (3+8)/(40+20) = 11/60 < (1+12)/(20+40) = 13/60. So the case presented above is real: the reversal comes from the uneven split of females and males between the control and treatment groups.
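To check the arithmetic, here is a small Python sketch that recomputes every rate from the cell counts using exact fractions. The counts come from the percentages and group sizes given in this post; the dictionary layout and the helper name `rate` are illustrative choices, not part of any library.

```python
from fractions import Fraction

# (heart attacks, patients) per cell, derived from the percentages and
# group sizes in the text. The dict layout is only for illustration.
counts = {
    ("female", "control"): (1, 20),
    ("female", "treatment"): (3, 40),
    ("male", "control"): (12, 40),
    ("male", "treatment"): (8, 20),
}

def rate(cells):
    """Pooled heart attack rate (total attacks / total patients) over cells."""
    attacks = sum(a for a, _ in cells)
    patients = sum(n for _, n in cells)
    return Fraction(attacks, patients)

# Within each gender, the treatment group has the higher rate.
for gender in ("female", "male"):
    control = rate([counts[(gender, "control")]])
    treatment = rate([counts[(gender, "treatment")]])
    print(gender, control, treatment, treatment > control)

# Pooling the cells (adding numerators and denominators) reverses the comparison.
genders = ("female", "male")
overall_control = rate([counts[(g, "control")] for g in genders])
overall_treatment = rate([counts[(g, "treatment")] for g in genders])
print("overall", overall_control, overall_treatment, overall_treatment > overall_control)
```

The last line prints `overall 13/60 11/60 False`: both per-gender comparisons favour the control group, while the pooled comparison favours the treatment group.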

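The real question is why the drug is bad for females and bad for males, but good for people as a whole. The answer is that gender also affects the decision to take the drug: 40 of the 60 females took it, while only 20 of the 60 males did. Because females have a much lower baseline risk and make up two-thirds of the treatment group, they pull the treatment group's overall rate down. Gender affects both who takes the drug and who has a heart attack, so gender is a confounder.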

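We can adjust for the confounder by computing the heart attack rate separately for females and males and then averaging the two, for both the control group and the treatment group. A simple average (dividing by two) works because there are equal numbers of males and females. The adjusted risk is (5% + 30%) / 2 = 17.5% in the control group and (7.5% + 40%) / 2 = 23.75% in the treatment group, so a drug that is bad for females and bad for males is also bad for the population as a whole.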
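The same adjustment written out as code: a minimal sketch of the back-door adjustment formula, P(attack | do(x)) = Σ_z P(attack | x, z) · P(z), where z ranges over gender. It assumes gender is the only confounder, as in the story above; the names `risk`, `p_gender` and `adjusted_risk` are hypothetical.

```python
from fractions import Fraction

# P(heart attack | group, gender), read off the table.
risk = {
    ("control", "female"): Fraction(1, 20),
    ("control", "male"): Fraction(12, 40),
    ("treatment", "female"): Fraction(3, 40),
    ("treatment", "male"): Fraction(8, 20),
}

# P(gender): 60 females and 60 males out of 120 patients.
p_gender = {"female": Fraction(60, 120), "male": Fraction(60, 120)}

def adjusted_risk(group):
    """Weight each gender-specific risk by P(gender), not by P(gender | group)."""
    return sum(risk[(group, g)] * p_gender[g] for g in p_gender)

control = adjusted_risk("control")
treatment = adjusted_risk("treatment")
print(float(control), float(treatment))  # 0.175 0.2375
```

The key design choice is the weight: using the population share of each gender, rather than its share within each group, removes the effect of females being over-represented in the treatment group.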

Does Simpson’s Paradox exist in the real world?

• In an observational study published in 1986, open surgery to remove kidney stones had a better success rate than endoscopic surgery for small kidney stones, and also for large kidney stones. However, it had a lower success rate overall. The paradox arises because large stones, which have lower success rates under either procedure, were more often treated with open surgery.
• In a study in 1995, smokers had a higher survival rate over twenty years than non-smokers. However, non-smokers had a better survival rate in six of the seven age groups, with a minimal difference in the seventh. Why did smokers have a higher survival rate overall? Because age affects both survival and whether a person smokes: the average smoker was younger.

Let’s consider another data generating process:

Now, instead of a confounder, we have a mediator: blood pressure. The numbers are exactly the same as above, with low blood pressure taking the place of female and high blood pressure taking the place of male.

• The drug lowers blood pressure: 40 of the 60 patients in the treatment group have low blood pressure, versus 20 of the 60 in the control group.
• Low blood pressure carries a lower heart attack risk: 4 of the 60 patients with low blood pressure had a heart attack, versus 20 of the 60 with high blood pressure.
• Overall, the drug lowers the risk of a heart attack: 11/60 in the treatment group versus 13/60 in the control group.

We should not control for blood pressure here; the overall result is the correct one. If we do control for blood pressure, taking the drug appears to raise the risk of a heart attack:

• In the low blood pressure group, the risk of a heart attack is 7.5% in the treatment group versus 5% in the control group.
• In the high blood pressure group, the risk is 40% in the treatment group versus 30% in the control group.
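The contrast with the gender example can be written out with the same numbers. In the worked equations below, Y stands for a heart attack, X for the group and M for blood pressure (symbols introduced here for illustration). Assuming the mediator diagram, there is no back-door path from X to Y, so the plain conditional rate already measures the drug's total effect. It averages the blood-pressure-specific rates with weights P(M | X), which differ between the groups precisely because the drug shifts blood pressure.

```latex
\begin{aligned}
P(Y=1 \mid X=\text{treatment})
  &= \sum_{m} P(Y=1 \mid X=\text{treatment}, M=m)\, P(M=m \mid X=\text{treatment}) \\
  &= 0.075 \cdot \tfrac{40}{60} + 0.40 \cdot \tfrac{20}{60} = \tfrac{11}{60} \approx 18.3\% \\
P(Y=1 \mid X=\text{control})
  &= 0.05 \cdot \tfrac{20}{60} + 0.30 \cdot \tfrac{40}{60} = \tfrac{13}{60} \approx 21.7\%
\end{aligned}
```

Averaging the same stratum rates with fixed weights, as in the gender adjustment, compares the groups within each blood pressure level and so leaves out the part of the drug's effect that works through lowering blood pressure.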

Some would say that whether we should control for a variable depends on when it is measured. In the gender example, gender is determined before the drug is taken, so we control for it. In the blood pressure example, blood pressure is measured after the drug is taken, so we do not. Timing alone is not a reliable rule, though. Consider the following example:

B can happen after X. But if you control for B, then you will open the backdoor path.
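The diagram for this example is not reproduced here, so the Python sketch below assumes one structure in which conditioning on B opens a back-door path: X ← A → B ← C → Y, with no arrow from X to Y (the variables A and C, and the specific mechanisms, are hypothetical). It enumerates four equally likely worlds exactly and shows that X and Y are independent until we condition on B.

```python
from fractions import Fraction
from itertools import product

# Assumed structure (illustrative, not the original diagram):
#   X <- A -> B <- C -> Y, and X has no effect on Y.
# A and C are independent fair coins; X copies A, Y copies C, B = A or C.
worlds = []
for a, c in product((0, 1), repeat=2):
    worlds.append({"A": a, "C": c, "X": a, "Y": c, "B": int(a or c)})

def p_y_given(x, b=None):
    """P(Y = 1 | X = x), optionally also conditioning on B = b.
    Every world has probability 1/4, so counting worlds is enough."""
    selected = [w for w in worlds if w["X"] == x and (b is None or w["B"] == b)]
    return Fraction(sum(w["Y"] for w in selected), len(selected))

# Without conditioning on B, X tells us nothing about Y.
print(p_y_given(1), p_y_given(0))            # 1/2 1/2
# Within B = 1, X and Y become dependent although X does not cause Y.
print(p_y_given(1, b=1), p_y_given(0, b=1))  # 1/2 1
```

Nothing in this toy depends on when B is recorded; what matters is that B is a common effect (a collider) of A and C, so holding it fixed links X to Y through the path X ← A → B ← C → Y.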

The lesson is that we should not only look at the data but also understand the process that generated it, and then decide which variables to adjust for based on that process.